Tera-Tom on Teradata Basics

by Morgan Jones and Tom Coffing
ISBN: 0970498012
Coffing Data Warehousing © 2001 (125 pages)

Both management and IT will understand this masterpiece, written by the world's top authorities on Teradata and data warehousing, describing how Teradata is built to achieve data warehouse utopia.

Table of Contents

Introduction
Teradata - The Shining Star
Teradata Databases, Users and Space
Data Protection
Loading the Data
Conclusion - A Final Thought on Teradata
Tera-Tom on Teradata Basics - Teradata Explained Through Unimaginable Simplicity

Tera-Tom on Teradata Basics - Teradata Explained Through Unimaginable Simplicity

Tom Coffing
Morgan Jones

First Edition 2001

Web Page: http://www.Tera-Tom.com
E-Mail addresses: Tom: Tcoffing@aol.com

Teradata®, NCR™, and BYNET® are registered trademarks of NCR Corporation, Dayton, Ohio, U.S.A.; IBM® and DB2® are registered trademarks of IBM Corporation; ORACLE® is a registered trademark of Oracle; SYBASE® is a registered trademark of SYBASE; ANSI® is a registered trademark of the American National Standards Institute. In addition to these product names, all brands and product names in this document are registered names or trademarks of their respective holders.

Coffing Data Warehousing shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book or from the use of programs or program segments that are included. The manual is not a publication of NCR Corporation, nor was it produced in conjunction with NCR Corporation.

Copyright © 2001 Coffing Publishing




All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions; neither is any liability assumed for damages resulting from the use of information contained herein. For information, address:

Coffing Publishing

7810 Kiester Rd., Middletown, OH 45042

0-9704980-1-2

All terms mentioned in this book that are known to be trademarks or service marks have been stated. Coffing Publishing cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.

Acknowledgements and Special Thanks

This book is dedicated to Americans and friends of liberty and freedom.

We also want to thank our wives, Leona Coffing and Janie Jones.

Thanks to a great editor and friend, Cheryl N. Buford.

Introduction

Overview

A full 40% of Fortune's U.S. Most Admired companies use Teradata. What do they know that your company needs to know? I've been in the computer business for more than 27 years. I've witnessed so much since the early days of punch cards, assembler languages, and COBOL programming. With that in mind, the most magnificent, ingenious technology that I've ever seen is a database from the NCR Corporation called Teradata.

"The wave of the future is coming and there is no fighting it."

Anne Morrow Lindbergh

Teradata is absolutely the wave of the future in data warehousing. I introduced this technology to a great friend, Morgan Jones. He immediately recognized that Teradata is the gold standard for all data warehousing, and as a result we've partnered to write this book. So sit back, relax, and enjoy. With our guidance, you will soon realize why Teradata is the greatest technology on the planet.

The Ten Rules of Data Warehousing

What weapon was deemed so powerful that experts claimed it would end all wars? Believe it or not, it was the crossbow. Throughout history, people have improved technology and advanced society through foresight and ingenuity. Just when we think something is impossible, it becomes a reality. Who would have dreamed we could send a person to the moon, or that someone could run a mile in under four minutes? Ingenuity and the desire to improve are attributes of the human race, and both are found in numerous avenues, from sports to business.

"Expect the unexpected or you won't find it."

Roger von Oech

When Frank Lloyd Wright began to design the Imperial Hotel in Tokyo, he discovered the unexpected: just eight feet below the surface of the ground lay a sixty-foot bed of soft mud. Since Japan is a land of frequent shakes and tremors, Wright was faced with what appeared to be an insurmountable obstacle. This gave him an idea. Why not float the Imperial Hotel building on the bed of mud and let it absorb the shock of any quake? Critics and cynics alike laughed at such an impossible idea. Frank Lloyd Wright built the hotel anyway. Shortly after the grand opening of the hotel, Japan suffered its worst earthquake in fifty-two years. All around Tokyo buildings were destroyed, but the Imperial Hotel stood firm.

For a long time, the mainframe and OLTP industry laughed at those who recommended the data warehouse design principles set forth in this book. But those companies that build one based upon these rules will join the ranks of the elite. Consider this: ten of the top 13 global communications companies use Teradata, nine of the top 16 global retailers use Teradata, and eight of the top 20 global banks use Teradata.

The ability to continually improve is one of Teradata's greatest strengths. The database was designed in 1976 and has continually improved ever since. Teradata has averaged one data warehouse installation per week for the past decade. Through continual improvement based on customer feedback from many of the largest data warehouse sites, Teradata has been able to identify itself as the data warehouse of choice for award-winning data warehouses.

This book begins with the 10 cardinal rules to follow for data warehouse success. It illustrates how Teradata helps customers follow these rules. Then it explains the brilliance of how Teradata works. By the end, the reader will have a real grasp of essential Teradata concepts.

Rule 1 - Start Building Towards A Central Data Warehouse

Moments after midnight on July 30, 1945, the Navy cruiser USS Indianapolis suffered a fatal torpedo hit from a Japanese submarine. It had been traveling unescorted through the Philippine Sea. Within 12 minutes of the deadly hit, the ship sank. Over 300 men were killed and nearly 900 were stranded in shark-infested seas. Tragically, those who survived until daylight faced four torturous days in the water and battled continuous shark attacks before being stumbled upon by a passing ship. In the end, only 316 souls survived. With a crew of 1,199 people, this was one of the worst military disasters of World War II for the United States.

Most people assume that war is cruel, but the heart-wrenching story above becomes even more tragic when the following facts are revealed. First, the ship's captain did not have all of the facts, and second, the Navy did not provide the captain with a single version of the truth. The captain's request for a destroyer escort was denied even though the regional Naval command knew another ship had been attacked just two days earlier, plus multiple enemy sightings had occurred within the previous five days. Not only were these crucially relevant facts withheld, but the captain of the Indianapolis was also told that his passage route was clear and there would be no need for a destroyer escort.

"To withhold news is to play God."

John Hess


Had everyone involved with the USS Indianapolis adhered to a single version of the truth, with detail data to back them up, this disaster may never have occurred. Likewise, if your company doesn't maintain detail data in a Centralized Data Warehouse, you will never know which version of the truth to believe. Each division of a business will have its own view of the truth. Summarized data, such as a data mart, does have its place in knowledge management, but it should always be built from the detail data within the central data warehouse.

Most companies don't have a Central Data Warehouse. Why? Because they don't have proper leadership or direction. Company leaders often let different branches of the company create data marts that are effective short-term solutions. These solutions are based on departmental leadership that is most interested in short-term solutions. Such leaders don't plan on being with a particular department forever, so they are only interested in keeping things simple, controlled, and beneficial to them.

"We're all in this alone."

Lily Tomlin

For example, imagine a company that made cars on an assembly line. Instead of using a giant plant with the latest and greatest technology, the company builds cars in 300 small garages. Each garage is owned by a different department and has different needs. In addition, every user has his or her access restricted to his or her garage. With this structure, leaders feel safe, but building cars is a logistical nightmare. In fact, just moving cars from one garage to the next would be a joke. This scenario may seem simple-minded, but that is how most data warehouses are built. Each part of some data warehouses operates alone.

Now imagine a giant car assembly plant where the assembly line was managed by the idea of "There is no 'I' in Team." This plant would continually improve processes, finding better ways to work together. Everyone has an idea what the others are doing, and new ideas are welcome. Management is able to run the entire plant with one team of dedicated professionals, and decisions are made cooperatively, concisely, and clearly.

This style of management is the idea behind a central data warehouse. From the top layer of management down through the entire company, they are one solid team. A team experienced in data warehousing saves valuable money and resources, plus users can manage the entire data warehouse. Executives may ask any question targeted to any part of the business. Decisions are made with long-term vision, and every employee is confident that when they need answers, the data warehouse will provide them.

"If I have seen further, it is by standing on the shoulders of giants."

Isaac Newton

When asked how he had discovered the Law of Gravity, Isaac Newton did not grab all of the glory for himself. He claimed that his work stood on the foundation of those early scientists who had gone before him. Likewise, a central data warehouse allows users to stand on the shoulders of another giant. This giant, built right, allows major corporations to make decisions and act on those decisions quickly.

In 1993, I was asked to train one of the world's largest retailers on its Teradata data warehouse. I flew to Bentonville, Arkansas, and an employee met me at the airport, then escorted me to the classroom. As we walked down the hallways, most employees seemed to be at a pace I had never seen before. They were practically running. I asked, "What's up? Why is everyone hurrying?" The employee replied, "It's work time." I was shocked. In all of the places I had previously worked, we strolled. This place had a leadership that I've never encountered... anywhere. H. Ross Perot described this kind of team when he said, "When building a team, I first look for people who love to win; if I can't find any of those, then I look for people who hate to lose." This was a concise team of employees so motivated and so empowered that they thought they could take over the world.


As I grew to know the team, I asked them how long it took top management to make a decision. And how long did it take to implement that decision at thousands of stores nationwide? They simply said, "About two hours." I was amazed. Today, this team continues to have one of the single greatest data warehouses ever built. They use it extensively, and it grows stronger every day.

While I was visiting with this team, management decided at one point that stores across the country should place Halloween displays and candy near the cash registers. In less than two hours, stores moved their Halloween candy from the normal candy aisles to end-caps near the cash register. Every store participated but one.

When asked why he didn't participate, the store manager said he had simply run out of time to create the displays, plus move the Halloween candy from his normal candy aisle to the end-caps. Management was ticked. Telling the manager they would get back to him, they then asked the DBA to query the data warehouse to see how much this snafu had cost the company. The DBA came back and reported that the store actually sold almost the same amount of Halloween candy as forecasted. Management was surprised and, honestly, a little disappointed with the answer. But then the DBA added, somewhat sheepishly, "I found something else, too." "Go ahead," replied members of the management team. He said, "I found out they actually sold about 40% more normal candy than we forecasted for this holiday." Management got on the phone immediately and told the other thousand stores, "Move those goblins and Halloween candy back to the normal candy aisles!"

What that DBA did was to use his instinct and the data warehouse to find out exactly what was going on with the business at that time. He was armed with a system that had cross-functional analysis. A central data warehouse gives good management great confidence because they see the whole picture. When users can ask any question, at any time, and on any data, their knowledge is unlimited.

Most Teradata Central Data Warehouse sites will tell you most of their Return On Investment (ROI) came from areas they never suspected. Thomas Jefferson once said, "We don't know one millionth of a percent about anything." When we explained Teradata to Jefferson, he did not build another Monticello, but he did retract his statement. Companies with a centralized data warehouse know about a million percent more than companies that have invested in stovepipe applications and 300 different data marts.

Actually, any company planning on competing in this millennium must think long-term and begin building a centralized data warehouse. If not, that company will be on the short end of the stick when competing with a company that chose to build one. That thought should sound scarier than a goblin near the cash registers on Halloween.

If you think about it, every major decision in business makes someone happy. If you are armed with facts supported by a central data warehouse and you do your homework, your business decisions will make your shareholders happy. However, if you are making decisions with a data mart strategy, those decisions are more likely to make your competitors happy.

There are many companies that are fearful of such an undertaking. They want a central data warehouse but wonder, "What if it fails? Which database should we choose? What type of hardware do we need? Should we do an RFP?" Decisions, decisions. It would literally take me about 30 seconds to make a decision on Teradata. There would be no RFP. We used to wade in swimming pools of data; today we are swamped in oceans of data. Teradata is built for this type of environment. This book explains the fundamentals of Teradata. Anyone with any experience or knowledge about data warehouse environments will clearly see why Teradata is the best solution.

Rule 2 - Build for the User

"A learned person is not one who gives the right answers; it is the one who asks the right questions."


Claude Levi-Strauss

The user is the heart of the data warehouse, and users get better with each day of experience. The user makes decisions that affect the company's bottom line. That's why the data warehouse is built around the business user. Building a data warehouse is simple: find out what data the business users need and what type of queries they want to ask but are not able to ask today. Then find out if the data is available and if the queries can be attained. With those answers, you will exceed users' expectations.

An experienced data warehouse user is usually shocked when he or she first uses Teradata. Its sheer power and flexibility enable users to ask questions they have never been able to ask before. On a recent consulting trip of mine, a young DBA got antsy when a particular query took more than a minute or so with Teradata. So I asked, "Well, how long did that same query take with your OLTP-based data warehouse?" He retorted, "We couldn't even run this query on the old system." I said, "So what's wrong with two minutes?" He added, "You know, some of our business users are so used to how long our queries used to run that they will be sitting, staring at the screen, without realizing that Teradata has already brought back the answer." With Teradata, users can expand their thinking by using intuition and keen business sense without technology barriers.

The building of an enterprise data warehouse begins with top management, but then cascades down to a relationship between the IT department and the business user community.

The IT department must realize they have a supporting role. That role is to please the business user by making data available so the business user can easily ask questions and get answers. It's also the IT department's role to build a system that allows users to ask questions on their own without IT intervention. Forget about building a system where users ask IT to run the queries for them. When users need information, the IT department should eventually be able to say, "Ask the question yourself... it is all available to you."

The business users are actually the stars; however, the entire business community must take responsibility for the warehouse's success. These users must continually educate themselves and other users on the capabilities of the data warehouse, new tools, and new techniques that will enhance its potential. Those same users must help IT help them. If both understand their respective roles and work together to help the company, then the data warehouse will be a huge success.

Rule 3 - Let the IT Department Lead the Way to User Utopia

Few sports challenges are as grueling or demanding as the Tour de France. But victory at this event eluded Lance Armstrong, a powerful young cyclist from Austin, Texas. Lance excelled in individual competition, even winning the World Championships. But despite his hard work, Lance could not overcome the Europeans' strong and proud tradition at the Tour de France. A few years ago, Lance was thrown into the battle of his life, not against others, but against himself. He discovered that he had cancer and was given virtually no chance of surviving. Suddenly he found out how little cycling really meant in life. With all his might, Lance battled his way back to health, beating the odds. Now he found out how very much cycling could mean in life. His bicycle became a tool to reclaim the future. He found a spot as a team member for the U.S. Postal Service team. With a new perspective and a new depth of character, Lance led that team to victory in the next Tour de France. And he repeated this victory again for the next two years.

To win the premier event in the cycling world, Lance Armstrong had to totally rethink his role. In the same way, the key members of any company seeking success with its data warehouse must rethink their roles. The IT department plays a key role in a data warehouse. What do users know about technical issues? Not enough to build a data warehouse. So technical issues are the responsibility of the IT department. The danger with this train of thought is that while the IT department has years of experience with handling company transactions through production databases and applications, most are new at data warehousing. A data-warehousing environment can be extremely different from anything an IT department has ever built or used before. Therefore, it's a bad idea to build a data warehouse without the help of experienced people.

An OLTP environment gets more and more predictable each month. It is designed to be tweaked and tuned in order to maximize a company's environment. On the other hand, a data warehouse is an unpredictable environment where the only way to gain control is to actually give up control. In data warehousing, the user must be allowed the freedom to ask the questions, and they will blossom in an environment where flexibility is accepted and welcomed.

"The only sure weapon against bad ideas is better ideas."

A. Whitney Griswold

If the IT department decides to build hundreds of data marts that will please each and every department, then they are missing the boat. Data warehouse experience is a hard teacher because it gives the test first and the lesson afterwards. Abraham Lincoln once said, "A house divided cannot stand." With that in mind, build the data warehouse so it will stand strong for a long time.

What's the formula? First and foremost, start by building your data warehouse around detail data. Bring transaction data, along with key details from the OLTP systems, into the data warehouse. Then, as known queries are identified, build data marts to enhance their performance, and also insist that data marts are created and maintained directly from the detail data. Doing so will build a foundation that will stand.

Next, the IT department needs to keep an open mind about creating an environment called "User Utopia." Have you ever been there? In User Utopia, the user confidently asks queries without fear of being charged by the minute. The user has meta-data, so he or she becomes intimate with the data, then makes informed decisions. The user should also be able to ask monster queries with the full backing of IT. Recently, on one such query, the IT department wanted to pull the plug. But the DBA held out, granting the user more time. When the query finished running, the information it brought back from the detail data saved the company millions of dollars. Overall, a user will get the majority of his or her answers back quickly from data marts, but he or she also needs the capability of going back to the detail data for more information. This is User Utopia.

Here is the message for IT: Don't follow the idea that "if you build it, they will come." Instead, become a leader... go to the users and build it together.

Rule 4 - Build the Foundation Around Detail Data

Business is always trying to predict the unpredictable. The U.S. Air Force Reserve's 53rd Weather Reconnaissance Squadron is a special force that flies its planes directly into tropical storms and hurricanes. Using a WC-130 Hercules aircraft, they fly into storms at low altitudes, between 1,000 and 10,000 feet, taking weather readings that are relayed to the National Hurricane Center in Florida. They measure wind speeds, measure the pressure and structure of the storm, and most importantly, locate the eye of the storm. The data collected by these "Hurricane Hunters" is used to determine when and where a storm might hit the coast and how strong it will be at that time. Teradata has no fear of detail data; its virtual processors will fly right into the thick of your data warehouse to bring back valuable information for decision support. You see, Teradata enables you to understand the storms in your business today while helping you predict when and where the next storm will hit tomorrow.

I estimate that 80% of today's data warehouses are built on summary (summarized) data. Therefore, 80% of all data warehouses will never come close to realizing their full potential. Your data warehouse does not have to be one of them.


"A bird does not sing because it has the answers; it sings because it has a song."

A data warehouse built on detail data does not sing because it has a song; it sings because it has the answers. When you capture detail data, answers to an infinite number of questions are available. But if this is truly the case, then why doesn't everybody build around detail data? Well, there are two reasons. One is price. Like a bird, many companies decide to go "cheap, cheap." But watch out! The real expense is not the cost of the data warehouse; it is the money that you will not make without one. The second reason is power. Many companies don't have the wingspan to fly through the detail, so they soar with the summary. In addition, some companies don't want to pay for the disk space it actually takes to keep detail data, but believe me, that cost is a small price to pay for success.

"Once you miss the first buttonhole, it becomes difficult to button your shirt."

Many companies use the same database for their data warehouse as they have done for their OLTP system. This is a critical mistake. In essence, they have missed the first buttonhole and most likely will lose their shirt on their data warehouse adventure.

At this point, companies no longer have a choice of using detail data. They must summarize for performance reasons. As one Marine told his boot camp soldiers jokingly, "The beatings will continue until the morale improves." Similarly, a database designed for OLTP takes a continual beating when it tries to query large amounts of detail data.

Companies building true data warehouses don't compromise on price and will have a data warehouse that is built for decision support, not one that specializes in OLTP. With this decision, you have buttoned the first buttonhole and are well on your way to reaching the top.

Detail data is the foundation that data warehouses are built upon. Users can ask any question, anytime; conduct data mining, OLAP, ROLAP, SQL, and SPL functions; build data marts directly from the detail data; and easily maintain and grow the environment on a daily basis. Now that's a tune well worth singing. Make a note of it.

Rule 5 - Build Data Marts from the Detail

"You cannot teach a man anything; you can only help him find it within himself."

Galileo

Galileo was a smart man. How did he know so much about life and data marts? When we explained data marts to Galileo, he said, "You cannot build a data mart directly from the OLTP systems; you can only build a data mart directly from the detail within." He was right.

Many companies build data mart after data mart directly from the OLTP systems, and their universe begins to revolve around continual maintenance. Then, as things get worse, as Galileo predicted, their universe begins to revolve around the son: the son of a gun sent in to replace them.

Why does this happen? At first things work out great, but soon there are more and more requests for additional information. As a result, more and more data marts are created, and soon the system looks like a giant spider web. Different data marts start to yield different results on like data, and the actual maintenance of this complicated spider web takes up most of IT's time. Meanwhile, short-term dreams turn into long-term nightmares like this one: A man and his wife had had a big argument just before he went on a business trip. Feeling rather contrite about his harsh words, he arranged to send his wife some flowers and asked the florist to write on the card, "I'm sorry, I love you." The beautiful bouquet arrived at the door. But then his wife read the words the florist had actually written in haste: "I'm sorry I love you."

The top reasons to build data marts directly from detail data are:

• Users can get answers from the data mart but must validate their findings or check out additional information from the detail that built it.
• There is only one consistent version of the truth.
• Maintenance is easy.

If a user comes up with a data mart answer that does not make sense, then he or she has the ability to drill down into the detail and investigate. Sometimes summary data can spark interest, and finding out the "why" can result in big bucks.

If users don't trust the data, they won't use the system. When a data warehouse is built on a foundation of detail data and then data marts are erected from that foundation, you have a winning combination. The results will always be consistent and trustworthy. However, you should only build data marts when there is a credible business case, and you should be ready to drop them when they are no longer needed. The life span of a data mart is relatively short compared to that of its mother and father (better known as the detail data). If you build data marts from the detail, they are easy to manage, easy to drop, and easy to change.
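To make the idea concrete, here is a minimal sketch of a data mart that is built from, validated against, and dropped independently of the detail data. It uses Python's built-in sqlite3 module purely as a stand-in for a warehouse database; the table names (sales_detail, mart_sales_by_store_category) and the sample rows are hypothetical and are not taken from any Teradata system.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")

# Hypothetical detail table: one row per individual sale.
conn.execute(
    "CREATE TABLE sales_detail ("
    "store_id INTEGER, category TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_detail VALUES (?, ?, ?, ?)",
    [
        (1, "candy", "2001-10-30", 4.0),
        (1, "candy", "2001-10-31", 6.0),
        (1, "toys", "2001-10-31", 20.0),
        (2, "candy", "2001-10-31", 5.0),
    ],
)


def build_mart(connection):
    """(Re)build the summary data mart directly from the detail rows."""
    connection.execute("DROP TABLE IF EXISTS mart_sales_by_store_category")
    connection.execute(
        "CREATE TABLE mart_sales_by_store_category AS "
        "SELECT store_id, category, SUM(amount) AS total_amount "
        "FROM sales_detail GROUP BY store_id, category"
    )


def drill_down(connection, store_id, category):
    """Return the detail rows behind one summary figure."""
    return connection.execute(
        "SELECT sale_date, amount FROM sales_detail "
        "WHERE store_id = ? AND category = ? ORDER BY sale_date",
        (store_id, category),
    ).fetchall()


build_mart(conn)

# Fast answer from the data mart.
mart = conn.execute(
    "SELECT store_id, category, total_amount "
    "FROM mart_sales_by_store_category ORDER BY store_id, category"
).fetchall()

# Validate one summary figure against the detail that built it.
detail = drill_down(conn, 1, "candy")

# The mart is disposable: dropping it loses nothing, because the
# detail data remains and the mart can be rebuilt at any time.
conn.execute("DROP TABLE mart_sales_by_store_category")
```

Because the summary table is derived entirely from the detail table, the two cannot disagree, and rebuilding or dropping the mart never touches the detail rows.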

Rule 6 - Make Scalability Your Best Friend

"Plan your life for a million tomorrows, and live your life as if tomorrow may be your last."

Morgan Jones

The roar of class-6 rapids on a river in Suriname can be almost deafening against the dense walls of the jungle. Especially when you are 9 years old. Our mission was to lower our canoe down the waterfall with ropes. The Trio Amer-Indian who anchored our 40-foot dugout canoe let go of the anchor rope too quickly. Without warning, the heavy boat began a freefall through the rocky water with my father hanging onto the side for dear life. He disappeared under the rocky waters, and I knew for sure we had lost him. My heart pounded against my chest. As I rallied myself to grasp this loss as only a nine-year-old can, the Indians abruptly began cheering wildly above the roar of the river. My dad had resurfaced a hundred yards downstream, battered and bruised, but he was alive. In just one short minute, I determined that I would love my family every day as if there were no tomorrow.

As I made my family my best friend, a data warehouse must make scalability its best friend. A data warehouse that does not scale will have no tomorrow. It is only a matter of time until the warehouse disappears in rocky waters, never to come up for air. Don't let go of the anchor rope!

The data-warehousing environment will throw obstacles in your way every single day. A data warehouse must be planned to meet today's needs. But it must also be capable of meeting tomorrow's challenges. The future cannot be predicted, so plan for unlimited growth, or linear scalability, both vertical and horizontal. There are so many data warehouses that start out with sizzling performance, but as they grow, they eventually and inevitably hit the scalability wall. However, before they hit the wall, there is a pattern of diminishing performance.

A data warehouse designed without scalability in mind is doomed before it is begun. It can never reach its potential. Take the scalability question out of the equation by investing in a database that allows you to start small but grows linearly.

In today's fast-paced world, Gigabytes soon become Terabytes. It may not sound like much, but it weighs a ton on the shoulders of giants. Listen to these measurements and pick your data warehouse's life span. For example, if you lived for a million seconds (Megabyte), you would live for about 11.6 days. In comparison, if you lived for a billion seconds (Gigabyte), you would live for about 31.7 years. Plus, if you lived for a trillion seconds (Terabyte), you would live for 31,688 years.

How nice it would be if, on your 31,688th birthday, people would say, "You sure look good for your age."
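The life spans above are easy to check. Here is a minimal Python sketch of the arithmetic. It assumes an average year of 365.25 days, and the function names are ours, made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years


def seconds_to_days(seconds):
    """Convert a number of seconds to days."""
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds):
    """Convert a number of seconds to years of 365.25 days."""
    return seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)


if __name__ == "__main__":
    print(f"Million seconds:  {seconds_to_days(10**6):.1f} days")
    print(f"Billion seconds:  {seconds_to_years(10**9):.1f} years")
    print(f"Trillion seconds: {seconds_to_years(10**12):,.0f} years")
```

Each step up multiplies the life span by a thousand, which is exactly what happens to a warehouse that grows from Megabytes to Gigabytes to Terabytes.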

Data warehouses hit the wall of scalability because they cannot grow at the same rate as the amount of data being gathered. Teradata allows for unlimited linear scalability. Linear Scalability is a building block approach to data warehousing that ensures that, as building blocks are added, the system continues at the same performance level.

This is why the largest data warehouses in the world use Teradata. I was lucky to be in the right place at the right time and taught during the beginning stages at what are considered the two largest data warehouse sites in the world: Southwestern Bell (SBC) and Wal-Mart.

Wal-Mart's data warehouse started with less than 30 gigabytes, and SBC's started with less than 200 gigabytes and 100 users. Both warehouses:

• Started small and simple
• Used Teradata from the beginning
• Have built the largest Enterprise Data Warehouse in their respective industries
• Continue to realize additional Return On Investment (ROI) on an annual basis
• Have grown to more than 10 Terabytes of data and are still growing
• Have thousands of users (some estimates are shocking)
• Have educated and experienced data warehouse staffs
• Have educated and experienced data warehouse users
• Experience continual growth without boundaries
• Have experienced linear performance by Teradata in every single upgrade (from gigabytes to terabytes and from terabytes to tens of terabytes)
• Both companies are impressed with Teradata's power and performance
• And both SBC and Wal-Mart are committed to the excellence of Teradata

A data warehouse is built in small building blocks. Linear Scalability is described in three ways:

First, building blocks are added until the performance requirements of your environment are met (Guaranteed Success).

Second, every time the data doubles, building blocks are doubled and the system maintains its performance level (Guaranteed Success); and

Third, any time the environment changes, building blocks are added until performance requirements are met (Guaranteed Success).

Scalability is not just about growing the data volume. It also means increasing the number of users. Many systems work flawlessly until as few as 5 users are added; then they slow to a crawl. Companies need a system where growth and performance are easily calculated and implemented. That means the number of users, the size and complexity of queries, the volume of data, and the number of applications being used can be calculated and compared to the current system's actual size. If more power, speed, or size is needed, the company can simply add building blocks to the system until the requirements are met.
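To make "add building blocks until the requirements are met" concrete, here is a rough Python sketch. It assumes capacity grows in direct proportion to the number of blocks, which is the linear scalability premise. The per-block capacities (gb_per_block, users_per_block) are hypothetical figures for illustration, not Teradata specifications.

```python
import math


def blocks_needed(data_gb, users, gb_per_block, users_per_block):
    """Smallest number of building blocks that covers both data and users.

    Assumes each block adds a fixed, hypothetical amount of capacity.
    """
    by_data = math.ceil(data_gb / gb_per_block)
    by_users = math.ceil(users / users_per_block)
    return max(by_data, by_users, 1)


if __name__ == "__main__":
    # Hypothetical starting point: 200 GB, 100 users.
    print(blocks_needed(200, 100, gb_per_block=100, users_per_block=50))
    # The data doubles, so the blocks double.
    print(blocks_needed(400, 100, gb_per_block=100, users_per_block=50))
```

The point of the sketch is only that, under linear scalability, sizing becomes simple arithmetic instead of guesswork.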

Rule 7 - Model the Data Correctly

You will find only what you bring in

Yoda, Jedi Master in Star Wars

We model a database for the same reasons that Boeing builds an aircraft model to test flight characteristics in a wind tunnel. It's simpler and cheaper to model than to reconstruct the plane by iterations until you get it right.

A proper data model should be designed to reflect the business components and possible relationships.

Here are three rules for modeling data in a data warehouse:

1. Model the data quickly
2. Normalize the detail data
3. Use a dimensional model for data marts

The 3rd Normal Form holds that each column in a table should be directly related to the primary key, the whole key, and nothing but the key. Data is placed into tables where it makes the most sense and has no repeating groups, derived data, or optional columns. This allows users to ask any question at any time on all data within the enterprise. Users do not have to strive for 3rd Normal Form, but should normalize the data the best they can. There will be fewer columns in a table but a lot more tables overall. This model is easier to maintain, incredibly flexible, and allows a user to ask any question on any data at any time.

A Star-Schema model is comprised of a fact table and a number of dimension tables. The fact table is a table with a multi-part key. Each element of the key is itself a foreign key to a single dimension table. The remaining fields in the fact table are known as facts and are numeric, continuously valued, and additive. Facts can be thought of as measurements taken at the intersection of all of the dimensions. Dimension attributes are mostly textual and are almost always the source of constraints and report breaks. This model enhances performance on known queries, in other words, queries users run repeatedly day after day.
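A tiny example may help picture a Star-Schema. The sketch below uses Python's built-in sqlite3 module as a stand-in database (not Teradata). The table names, column names, and data are all invented for illustration. It builds one fact table whose multi-part key points at two dimension tables, then runs a typical "known query" that sums an additive fact by a textual dimension attribute.

```python
import sqlite3


def build_star_schema():
    """Create a toy star schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables: mostly textual attributes.
        CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, category TEXT);

        -- Fact table: multi-part key, each part a foreign key to one dimension.
        CREATE TABLE sales_fact (
            store_id INTEGER REFERENCES store_dim(store_id),
            product_id INTEGER REFERENCES product_dim(product_id),
            sales_amount REAL,  -- numeric, additive fact
            PRIMARY KEY (store_id, product_id)
        );

        INSERT INTO store_dim VALUES (1, 'Dayton'), (2, 'San Diego');
        INSERT INTO product_dim VALUES (10, 'Toys'), (20, 'Books');
        INSERT INTO sales_fact VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 75.0);
    """)
    return conn


def sales_by_city(conn):
    """Sum the additive fact, constrained and grouped by a dimension attribute."""
    rows = conn.execute("""
        SELECT s.city, SUM(f.sales_amount)
        FROM sales_fact AS f
        JOIN store_dim AS s ON s.store_id = f.store_id
        GROUP BY s.city
        ORDER BY s.city
    """).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(sales_by_city(build_star_schema()))
```

Notice that the dimension attribute (city) supplies the grouping, while the fact (sales_amount) is what gets added up.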

Most database modelers prefer to create a logical model in 3rd Normal Form, but most database engines are overcome by physical limitations, so they must compromise the model. The four most difficult functions for a database to handle are:

• Join tables
• Aggregate data
• Sort data
• Scan large volumes of data

In order to get around these system limitations, vendors will suggest a model to avoid joins, use summarized data to avoid aggregation, store data in sorted order to avoid sorts, and overuse indexes to avoid large scans.

With these limitations, vendors are also going to avoid being able to compete. That is like placing a ball and chain around the runner's leg and saying, "I wish you all the best in the marathon." Come on! Whose side are these vendors really on?

Teradata is the only database engine I have seen that has the power and maturity to use a 3rd Normal Form physical model on databases exceeding a terabyte in size. Because of physical limitations, other databases have had to use a Star-Schema model to enhance performance, but have given up the ability to perform ad-hoc queries and data mining.

A normalized model is one that should be used for the central data warehouse. It allows users to ask any question at any time on information from any place within the enterprise. This is the central philosophy of a data warehouse. It leads to the power of ad-hoc queries and data mining, whereby advanced tools discover relationships that are not easily detected but do exist naturally in the business environment.

A Star-Schema model enhances performance on known queries because we build our assumptions into the model. While these assumptions may be correct for the first application, they may not be correct for others. Flexibility is a big issue, but data marts can be dropped and added with relative ease if each is built directly from the detail data.

Remember: build the data warehouse around detail data using a normalized model. Then, as query patterns emerge and performance for well-known queries becomes a priority, Star-Schema data marts can be created by extracting summarized or departmental data from the centralized data warehouse. The user will then have access to both the data marts for repetitive queries and the central warehouse for other queries.

Because data marts can be an administrative nightmare, Teradata enables Star-Schema access without requiring physical data marts. By setting up a join index as the intersection of your Star-Schema model, you can create a Star-Schema structure directly from your 3rd Normal Form data model. Best of all, once it is created, the data is automatically maintained as the underlying tables are updated.

Keep in mind, 80% of data warehouse queries are repetitive, but 80% of the Return On Investment (ROI) is actually provided by the other 20% of the queries that go against detailed data in an iterative environment. By using a normalized model for your central data warehouse and a Star-Schema model on data marts, you can enhance the possibility of realizing that 80% Return on Investment and still enhance the performance on 80% of your queries.

Rule 8 - Don't Let a Technical Issue Make Your Data Warehouse a Failure Statistic

Experience is a hard teacher because she gives the test first, the lesson afterwards

Scottish Proverb

Did you know that 3/4ths of the people in the world hate fractions, and that 40% of the time a data warehouse fails, it is because of a technical issue? There are many traps and pitfalls in every data warehouse venture. One winter day a hunter met a bear in the forest. The bear said, "I'm hungry. I want a full stomach." The man replied, "Well, I'm cold. I would like a fur coat." "Let's compromise," said the bear, and he quickly gobbled up the hunter. They both got what they asked for. The bear went away with a full belly, and the man left wrapped in a fur coat. With that in mind: good judgment comes from experience, and experience comes from bad judgment. You have shown good judgment by reading this book, so let our experience keep your company from having a bad data warehouse experience.

Author Daniel Boorstin wrote in The Discoverers, "The greatest obstacle to discovering the shape of the earth, the continents, and the oceans was not ignorance but rather the illusion of knowledge." There is a lot of "illusion of knowledge" being spread around in the data-warehousing environment. Before you decide on any data warehouse product, ask yourself and the vendor these questions:

• As my data demands increase, will the system be able to physically load the data? Our experience shows that many systems are not capable of handling very large volumes of data. Do the math.
• As the data grows in volume, can the system meet the performance requirements? Do the math.
• As the number of users grows, will the system be able to scale? Do the math.
• As my environment changes, will the system be flexible enough to allow changes quickly and easily? Do the math.
• Will the system need so many Database Administrators (DBAs) that my system's cost skyrockets? Do the math.
• If we suddenly merged with another company and needed to incorporate their mainframe or LAN environment, would the system be able to connect and include them? Do the math.
• Can I continue to meet my batch window timeframes? Do the math.
• Could I become the hero of the company one day, only to have some technical glitch blamed on me because of my poor foresight, and be thrown out of the company into a giant mud puddle? Do the bath.

Rule 9 - Take a Building Block Approach

Be not afraid of growing slowly; be afraid only of standing still

Chinese Proverb

Ever since Vasco de Balboa discovered the Pacific coast of Panama in 1513, kings and businessmen alike dreamed of the impossible: to cut a waterway across the mountainous isthmus, creating a shortcut between the Atlantic and Pacific Oceans. Those dreams turned into reality during the Industrial Revolution. It took almost forty years of trial and error before the world's greatest engineering feat since the Pyramids was completed in 1914. Ships move through the locks of the canal, rising 85 feet above sea level before they descend to the opposite side. Since its grand opening in 1920, the Panama Canal has revolutionized trans-oceanic traffic, joining East and West. Its 50-mile stretch saves every vessel about 8,000 extra nautical miles of travel around the bottom tip of South America. Several modifications have been engineered through the years to accommodate the increasing size of ships.

Data warehouses, like the Panama Canal, must be built over time and changed over time to meet new demands. A data warehouse must grow with the environment, but the environment is unpredictable. All sailors know that they can't direct the wind, but that they can adjust their sails. In comparison, all data warehouse users know they can't direct the environment, but they can adjust their warehouse. Sometimes the data warehouse will grow quickly and sometimes it will grow slowly, but it should always be growing.

So take a building block approach to data warehousing. Teradata allows you to expand without boundaries, one building block at a time. Plus, adding on building blocks is easy.

There are two aspects to a building block approach. First, you need to add applications to your data warehouse in three- to six-month intervals. Once the first application works, you are ready for more projects. As you become more experienced with this approach, you can add multiple projects in parallel by involving multiple organizations.

The second aspect of the building block approach is in the actual data warehouse architecture. It doesn't matter if yours is the smallest data warehouse in the world, the largest, or falls somewhere in between: power and scalability always fuel success.

Not long ago, a customer flew out to San Diego for a Teradata demonstration and benchmark. The benchmark ran late into the evening, but the numbers were more than 50% better than the competition. The customer was extremely impressed, but before buying he demanded to see the system scalability that everyone had been talking about. Although it was already late, a Teradata employee was called in the middle of the night, arrived within 10 minutes (in pajamas), hooked up the building blocks, and ran a utility called config. She ran another called reconfig, and in less than two hours the system size doubled.

As the environment changes in terms of users, data, complexity, capacity, batch windows, time changes, events, or opportunities, users should be able to continue building applications and architecture. The more a Teradata system grows, the more Teradata outshines the competition.

Rule 10 - Buy a Teradata Data Warehouse

Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing had happened

Winston Churchill

Winston Churchill led Britain through World War II during what he called that country's finest hour. When users see consistent data, the system too is in its finest hour. Teradata gives users the ability to ask questions they could never ask before. Users trust Teradata because of its industry performance and reputation, and because it never gives in. Constant use gives users optimal business experience, and no matter what a user asks, the system responds with a hearty "Yes, Sir!"

When we explained Teradata to Churchill, he said:

"A data WARe-house that consists of 250 Data marts is like poison, and if I were the MIS department responsible for maintaining them, I'd take it."

Teradata guarantees an Enterprise Data Warehouse with no scalability issues. Data loads like lightning, and system administration is a breeze. You can pick the performance level that meets your requirements for today and forever. The database can be normalized around detail data, and because of Teradata's power, users have the flexibility to ask any question at any time on any data.

All other databases are suspect in data loading capabilities, scalability, reference sites, decades of data warehouse experience, flexibility, system administration difficulties, and inability to handle the complex queries of today's users. These users are good.

Teradata - The Shining Star

Overview

Teradata has always been at the top of the data warehouse game, even if the experts weren't bright enough to know it. The vision of the original designers was tremendous. It was so far to the left of genius that most thought the idea was impossible.

Only he who attempts the ridiculous may achieve the impossible

Don Quixote

The Teradata database was originally designed in 1976, and many of the fundamental concepts still remain today. Nearly 25 years later, Teradata is still considered ahead of its time.

In 1976, IBM mainframes dominated the computer business. Everyone who was anyone had an IBM mainframe. However, the original founders of Teradata noticed that it took about 4½ years for IBM to produce a new mainframe. They also noticed a little company called Intel. Intel created a new PC chip every 2½ years. With mainframes moving forward every 4½ years and PC chips every 2½ years, Teradata recognized its vision: to network enough PC chips together that the mainframe would be overpowered, yet costs would be hundreds of times cheaper than a mainframe. The Teradata team estimated the power surge would come in 1990.

IBM laughed out loud. They said, "Let's get this straight… you are going to network a bunch of PC chips together and overpower our mainframes? That's like plowing a field with 1,000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as just a bunch of PCs in a cabinet.

Teradata was convinced it could produce a product that would power large amounts of data and achieve the impossible: using PC technology in mainframe territory. Its founders agreed with Napoleon Bonaparte, who asserted, "The word 'impossible' is not in my dictionary." Sure enough, when we looked in his dictionary, that word was not there. And it is not in Teradata's Data Dictionary either. The Teradata team set two goals: build a database that could

• Perform parallel processing, and
• Accommodate a Terabyte of data

Driving in the car one evening, Morgan's eight-year-old daughter Kara piped up from the back seat, "Daddy, can you buy Teradata in the store? I mean, what does Teradata really do?" Morgan thought for a moment and then replied, "Do you remember when you went on the Easter egg hunt last spring? Well, imagine that we had fifty eggs and you were the only child there. If I asked you to find all the purple eggs, would you be able to do that?" Kara said, "Sure! But it might take me a long time." Morgan continued, "What if we now let fifty children go in and I asked them to show me all of the purple eggs? How long would that take?" His daughter responded, "It wouldn't take any time at all, because each child would only have to look at one egg." That is precisely how Teradata works. It divides up huge tasks among its processors and tackles each portion simultaneously with amazing speed. And it doesn't matter if you have a trillion eggs in your basket.

In 1984, the DBC/1012 was introduced. Since then, Teradata has been the dominant force in data warehousing. Teradata got the chickens plowing and is considered outstanding. Meanwhile, IBM's plow is out rusting in its field.

Parallel Processing

An invasion of armies can be resisted, but not an idea whose time has come

Victor Hugo

The idea of parallel processing gives Teradata the ability to have unlimited users, unlimited power, and unlimited scalability. This is an idea whose time has come. And it all starts with something called parallel processing. So what is parallel processing? Let us explain.

It was 10 p.m. on a Saturday night, and two friends were having dinner and drinks. One of the friends looked at his watch and said, "I have to get going." The other friend responded, "What's the hurry?" His friend went on to tell him that he had to leave to do his laundry at the Laundromat. The other friend could not believe his ears. He responded, "What? You're leaving to do your laundry on a Saturday night? Do it tomorrow!" His buddy went on to explain that there were only 10 washing machines at the laundry. "If I wait until tomorrow, it will be crowded and I will be lucky to get one washing machine. I have 10 loads of laundry, so I will be there all day. If I go now, there will be nobody there, and I can do all 10 loads at the same time. I'll be done in less than an hour and a half."

This story describes what we call "Parallel Processing." Teradata is the only database in the world that loads data, backs up data, and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds… even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every database's clock in performance tests.

After enlightenment, the laundry

Zen Proverb

After parallel processing the laundry, enlightenment

Teradata Zen Proverb

With the computer world seeing Terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, this is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata has unlimited power, grows without boundaries, and was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

A ship in harbor is safe, but that's not why ships are built

John Shedd

In 1805, the pivotal Battle of Trafalgar matched Britain's flotilla of battleships against the mighty combined French and Spanish fleet. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to sail circles around the enemy fleet, attacking it at its most vulnerable point, the stern. That battle paralyzed the enemy fleet and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating the mainframes on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive, and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed managing terabytes of data.

A Personal Computer (PC) is made up of the following components:

Processor Chip – This is the brain of the computer. All tasks are done at the direction of the processor.

Memory – This is the hand of the computer. The memory allows data to be viewed, manipulated, changed, or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive – This is the spine of the computer. The hard drive stores data, applications, and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor applies your changes to the document while it is held in memory. Upon completion, you close the document and the processor writes all the changes to the disk where it is stored.

In the picture below, we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight best friends.

Teradata Spreads Data over Multiple Processors

I don't mind starting the season with unknowns. I just don't like finishing the season with them

Coach Lou Holtz

With Teradata, you will never finish with any unknowns about your business; you can know it all. One reason this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The next Teradata example shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask for a list of best friends, both processors will retrieve data in parallel and return combined results over the connecting network. This arrangement could easily double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four records, and simultaneously the other processor reads the remaining four records. So how could we double the speed of this system again?
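The two-processor example can be mimicked in a few lines of Python. This is only a sketch of the idea, not how Teradata is implemented. The friend names are placeholders. Rows are dealt out round-robin as a stand-in for whatever placement rule the real system uses, and threads stand in for processors.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder rows for the Best_Friends table.
BEST_FRIENDS = [f"Friend {i}" for i in range(1, 9)]


def distribute(rows, processors):
    """Deal rows out round-robin so each processor holds an even share."""
    return [rows[i::processors] for i in range(processors)]


def read_all(partitions):
    """Each worker reads only its own partition; the results are then combined."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        parts = list(pool.map(list, partitions))
    return [row for part in parts for row in part]


if __name__ == "__main__":
    partitions = distribute(BEST_FRIENDS, 2)
    print([len(p) for p in partitions])  # four rows on each processor
    print(read_all(partitions))
```

All eight rows still come back, but no single worker had to read more than four of them.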

Teradata has Linear Scalability

Every ceiling, when reached, becomes a floor upon which one walks and now can see a new ceiling

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." This allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck.

Notice in the system below there are four processors, and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since data is spread evenly over four processors, Teradata reads two rows on each of the four processors simultaneously. Now the system is four times faster than the single-processor example.

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the processors (called AMPs) and processed in parallel.
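The Divide and Conquer arithmetic is simple enough to write down. The Python sketch below is an idealized model: it assumes rows are spread evenly and that reading time depends only on how many rows the busiest processor holds. The function names are ours.

```python
import math


def rows_per_processor(total_rows, processors):
    """Rows the busiest processor must read when rows are spread evenly."""
    return math.ceil(total_rows / processors)


def relative_speed(total_rows, processors):
    """Speed compared to one processor reading every row (idealized)."""
    return total_rows / rows_per_processor(total_rows, processors)


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, rows_per_processor(8, n), relative_speed(8, n))
```

Run against the eight-row Best_Friends table, the model reproduces the progression described above: one processor reads eight rows, two read four each, and four read two each.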

A Logical View of the Teradata Architecture

You are either making history or you are history

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when it actually never had a chance to succeed. In ancient days, Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies too will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and is then given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks the SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain the information from their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.

Parsing Engine (PE)

Even a stopped clock is right twice a day

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards." Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and the design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and Southwestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions, some of which have never been asked before in their respective industries. This experience allows users to ask any question regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions, and Teradata returns the answers.

Access Module Processor (AMP)

Wise men talk because they have something to say fools talk because they have to say something

Plato

Two men decided to go ice fishing They found a good spot on some ice and began digging As soon as they

finished the hole they heard a voice from above saying There are no fish here Taking that as a sign theymoved about thirty feet and began digging again A second time they heard the voice saying There are no fishhere So they moved another thirty feet and began to dig a third hole This time the impatient voice spoke fromabove There are no fish here in this ice skating rink Some people just dont listen But this is never the casewith Teradatas Access Module Processors

The Access Module Processor (AMP) is a processor of little words It keeps its mouth shut and its ears openEach AMP listens to the PE via the BYNET network for instructions Each AMP retrieves data from its disk or writes data to its disk The AMP is the worker bee of the system It is the perfect employee It never complains

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 2158

rarely calls in sick and lives to take direction from its boss the Parsing Engine (PE) The best example is tothink of each AMP as a computer processor attached to its own disk

Every AMP has its own disk and its the only AMP allowed to read or write data to that disk This action isreferred to as a Shared-Nothing architecture Although AMPs are the perfect workers they are not the perfect playmates Even as children AMPs would never share toys with other AMPs on the playground Each AMP hasits own disk and it shares this with no other AMP hence a Shared-Nothing architecture

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP reads only the rows on its own disk. Because this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs while others have more than 2,000.
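The "slowest AMP" remark can be made concrete with a small sketch. The per-AMP timings below are made-up illustrative numbers, and `query_time` is a hypothetical helper, not a Teradata function. It assumes only what the text says: the AMPs work at the same time, so the query finishes when the last AMP finishes.

```python
# Hypothetical per-AMP scan times in seconds (illustrative numbers only).
def query_time(amp_times):
    """AMPs work in parallel, so the query ends when the slowest AMP ends."""
    return max(amp_times)

even = [2.0, 2.1, 1.9, 2.0]      # rows spread evenly over four AMPs
skewed = [0.5, 0.5, 0.5, 6.5]    # same total work, but one overloaded AMP

print(query_time(even))    # 2.1
print(query_time(skewed))  # 6.5
```

Both lists add up to the same amount of work, yet the skewed layout takes roughly three times as long. That is why even distribution matters.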

The BYNET

Even if you're on the right track, you'll still get run over if you just sit there

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax;
• The PE checks the user's security rights;
• The PE comes up with a plan for the AMPs to follow;
• The PE passes the plan along to the AMPs over the BYNET;
• The AMPs follow the plan and retrieve the data requested;
• The AMPs pass the data to the PE over the BYNET; and
• The PE then passes the final data to the user.
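The seven steps above can be walked through in a toy model. Everything in this sketch is an assumption made for illustration: the names `AMP_DISKS`, `ALLOWED_USERS`, `bynet_send`, and `run_query` are hypothetical, the "syntax check" is a one-line stand-in, and nothing here is a Teradata API.

```python
# Toy model of the request flow; all names are illustrative, not Teradata APIs.
AMP_DISKS = {1: ["row-a"], 2: ["row-b"], 3: ["row-c"], 4: ["row-d"]}
ALLOWED_USERS = {"tom"}

def bynet_send(message, amps):
    """Stand-in for the BYNET: deliver one message to each listed AMP."""
    return {amp: message for amp in amps}

def run_query(user, sql):
    if not sql.strip().upper().startswith("SELECT"):    # 1. syntax check
        raise ValueError("syntax error")
    if user not in ALLOWED_USERS:                        # 2. security check
        raise PermissionError("no access rights")
    plan = "read all rows"                               # 3. PE builds the plan
    delivered = bynet_send(plan, AMP_DISKS)              # 4. plan goes out over the BYNET
    answers = [AMP_DISKS[amp] for amp in delivered]      # 5. each AMP reads only its own disk
    rows = [row for part in answers for row in part]     # 6. rows come back over the BYNET
    return rows                                          # 7. PE hands the result to the user

print(run_query("tom", "SELECT * FROM some_table"))
```

The point of the sketch is the division of labor: the PE never touches a disk, and an AMP never sees the SQL text, only the plan.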

Teradata Building Block Approach

Better a diamond with a flaw than a pebble without one

Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them.

Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about … the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database stores data about one topic in one table and data that pertains to another topic in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, Item No, Quantity, and Customer ID.

An example of each table follows:

CUSTOMER TABLE (called CustomerTable)

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillards       Atlanta    106227              Smith

ORDER TABLE (called OrderTable)

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest unit that can be inserted into a table. A column is the smallest unit within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null, sometimes represented by a "?".

One column, or a combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (e.g., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                        FOREIGN KEY
Not optional                       Optional
Comprised of one or more columns   Comprised of one or more columns
Can only have one PK per table     Can have multiple FKs per table
No duplicates allowed              Duplicates allowed
No changes allowed                 Changes allowed
No nulls allowed                   Nulls allowed
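To see what "matching the primary key of one table to the foreign key of the other" means in practice, here is a minimal sketch using the sample rows from the two tables. The data layout (a dictionary keyed by CustomerID and a list of order records) and the function name `join_orders_to_customers` are assumptions chosen for illustration. A real database performs the join from SQL.

```python
# Sample rows from the two tables; the dict/list layout is an assumption.
customers = {  # keyed by CustomerID, the primary key (dict keys cannot repeat)
    1001: {"CustomerName": "JC Penney", "CityName": "Dallas"},
    1002: {"CustomerName": "Office Depot", "CityName": "Columbia"},
    1003: {"CustomerName": "Dillards", "CityName": "Atlanta"},
}
orders = [  # CustomerID here is the foreign key pointing back at customers
    {"OrderNumber": 105372, "ItemNo": 212, "Quantity": 20, "CustomerID": 1001},
    {"OrderNumber": 105799, "ItemNo": 296, "Quantity": 52, "CustomerID": 1002},
    {"OrderNumber": 106227, "ItemNo": 325, "Quantity": 17, "CustomerID": 1003},
]

def join_orders_to_customers(orders, customers):
    """Match each order's foreign key to the customer's primary key."""
    joined = []
    for order in orders:
        customer = customers.get(order["CustomerID"])  # FK -> PK lookup
        if customer is not None:
            joined.append((order["OrderNumber"],
                           customer["CustomerName"],
                           order["Quantity"]))
    return joined

for row in join_orders_to_customers(orders, customers):
    print(row)
```

The dictionary also mirrors two rows of the chart: a key may appear only once (no duplicates in a PK), while several orders could carry the same CustomerID (duplicates allowed in an FK).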

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind, the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP 1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In the same way, the rows of the Customer table and the rows of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.

There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may have only one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON; leaving out the word UNIQUE is what makes the index non-unique.

CREATE TABLE TomC.employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
UNIQUE PRIMARY INDEX(emp, dept);

Being related hardly insures relatability

Michael E Angier

All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once, a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that the AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
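The two diagrams follow one simple pattern, which the sketch below reproduces. It is only a model of the simulated maps shown here: the 48-bucket size matches the diagrams (the text says the actual map has 65,536 buckets), and the round-robin assignment is taken from the diagrams, not from any statement about how a production system fills its buckets. The names `build_hash_map` and `print_grid` are hypothetical.

```python
from collections import Counter

def build_hash_map(num_amps, num_buckets=48):
    """Fill buckets with AMP numbers 1..num_amps, cycling as in the diagrams."""
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]

def print_grid(hash_map, width=6):
    """Print the buckets six to a row, like the pictures."""
    for start in range(0, len(hash_map), width):
        print(" ".join(str(amp) for amp in hash_map[start:start + width]))

four_amp_map = build_hash_map(4)
eight_amp_map = build_hash_map(8)

print_grid(four_amp_map)
print()
print_grid(eight_amp_map)

# Every AMP owns the same number of buckets, which is what keeps data even.
print(Counter(four_amp_map))
print(Counter(eight_amp_map))
```

With 48 buckets, each AMP in the four-AMP map owns 12 buckets and each AMP in the eight-AMP map owns 6.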

How the Hash Map and Primary Index Work Together

Choice not chance determines destiny

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the "Wiz-Bang" Formula (a secret NCR formula). First we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula: take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we will get the Primary Index value of the row, apply the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num Friend_Name

2 Ben Hon

Since we designated Friend_Num as the Primary Index, we merely divide the value of Friend_Num (2) by the Coffing/Jones Wiz-Bang Formula (divide by 2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num Friend_Name

16 Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely divide the value of Friend_Num (16) by the Coffing/Jones Wiz-Bang Formula (divide by 2), and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply, you should understand that the real formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it is consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this value a million times, we would still get the same result.

If you always do what you always did, you'll always get what you always got

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.
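The whole walkthrough fits in a few lines of code. This sketch uses the authors' own teaching formula (divide the Primary Index value by 2), not the real Teradata hashing formula, and the 48-bucket four-AMP map from the diagrams. The function names `wiz_bang_bucket` and `amp_for` are hypothetical labels for the two steps.

```python
# Four-AMP hash map from the diagrams: 1 2 3 4 1 2 3 4 ...
HASH_MAP = [(i % 4) + 1 for i in range(48)]

BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]

def wiz_bang_bucket(primary_index_value):
    """Teaching formula from the text: divide the Primary Index value by 2."""
    return primary_index_value // 2

def amp_for(primary_index_value):
    """Look inside the bucket to find the destination AMP."""
    bucket = wiz_bang_bucket(primary_index_value)
    return HASH_MAP[bucket - 1]   # the text numbers buckets starting at 1

# Load the table: every row goes to the AMP its bucket names.
amps = {amp: [] for amp in (1, 2, 3, 4)}
for friend_num, friend_name in BEST_FRIENDS:
    amps[amp_for(friend_num)].append((friend_num, friend_name))

for amp, rows in amps.items():
    print("AMP", amp, rows)
```

Each AMP ends up with exactly two of the eight rows, and running `amp_for` again on the same value always names the same AMP. That repeatability is what lets the system find a row later.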

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
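A short sketch of that one-AMP retrieval follows. It again relies on the divide-by-2 teaching formula rather than the real hash, and the `AMP_DISKS` layout simply records where that formula places each row. The name `select_by_primary_index` is hypothetical.

```python
NUM_AMPS = 4

# Where the divide-by-2 teaching formula places each Best_Friends row.
AMP_DISKS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}

def amp_for(friend_num):
    """Teaching formula: bucket = value / 2, buckets cycle over AMPs 1..4."""
    bucket = friend_num // 2
    return (bucket - 1) % NUM_AMPS + 1

def select_by_primary_index(friend_num):
    """Return the row plus the list of AMPs that had to be contacted."""
    amp = amp_for(friend_num)
    amps_contacted = [amp]                     # only one AMP receives the plan
    name = AMP_DISKS[amp].get(friend_num)
    return (friend_num, name), amps_contacted

print(select_by_primary_index(8))   # ((8, 'John Davis'), [4])
```

The other three AMPs are never asked anything, which is why a Primary Index lookup is the fastest path to a row.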

The Full Table Scan

What matters is not the size of the dog in the fight, but the size of the fight in the dog

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond "NO!" After we complete training, they respond "Heck YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan (FTS) means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. Because ten AMPs read at once, the reading is about 10 times faster than on a system where a single processor must read all 100 rows. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" you are asking the system to read through an entire table. Full Table Scans are fundamental and an important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it owns.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num   Friend_Name
6            Mary Gray
14           Kyle Marx
8            John Davis
16           Lyn Jones
2            Ben Hon
10           Don Roy
4            Joe Davis
12           Sam Mills
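Notice that the rows above do not come back in Friend_Num order. A small sketch shows one way that can happen. It assumes the rows sit on the AMPs where the divide-by-2 teaching formula put them, and that the AMPs happened to answer in the order 3, 4, 1, 2. That arrival order is purely an assumption chosen because it reproduces the listing; the text does not say in what order the AMPs responded. The name `full_table_scan` is hypothetical.

```python
# Rows as placed by the divide-by-2 teaching formula on a four-AMP system.
AMP_DISKS = {
    1: [(2, "Ben Hon"), (10, "Don Roy")],
    2: [(4, "Joe Davis"), (12, "Sam Mills")],
    3: [(6, "Mary Gray"), (14, "Kyle Marx")],
    4: [(8, "John Davis"), (16, "Lyn Jones")],
}

def full_table_scan(amp_disks, arrival_order):
    """Every AMP returns all of its rows; the PE collects them as they arrive."""
    result = []
    for amp in arrival_order:
        result.extend(amp_disks[amp])
    return result

# Assumed arrival order; any other order is equally valid without ORDER BY.
rows = full_table_scan(AMP_DISKS, arrival_order=[3, 4, 1, 2])
for friend_num, friend_name in rows:
    print(friend_num, friend_name)
```

All four AMPs take part, each contributes only its own rows, and the answer set is an unordered set, just like the rows in the table itself.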

In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, and because secondary indexes use up space and add overhead, they should only be used on "KNOWN QUERIES," or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It tells the AMP to place the hash in a secondary index sub-table along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once that row is found, the real row-id (the smiley face in this example) is revealed, and Teradata can find the matching smiley face, and therefore the base row, in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of ongoing queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!
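The two-AMP lookup can be sketched as follows. This is a simplified model under several assumptions: the base rows are placed with the divide-by-2 teaching formula, the index value is hashed with CRC32 purely as a stand-in (the real Teradata hash is not public), and each sub-table entry stores a plain pointer (AMP number and Friend_Num) instead of the three-part row described above. All function names are hypothetical. Because the stand-in hash differs from the picture, the index AMP for 'Ben Hon' here need not be AMP 2.

```python
import zlib

NUM_AMPS = 4
BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]

def base_amp(friend_num):
    """AMP that owns the base row (divide-by-2 teaching formula)."""
    bucket = friend_num // 2
    return (bucket - 1) % NUM_AMPS + 1

def index_amp(friend_name):
    """AMP that owns the index row; CRC32 is only a stand-in hash."""
    return zlib.crc32(friend_name.encode("utf-8")) % NUM_AMPS + 1

# Build the base table and the USI sub-table on every AMP.
base_tables = {amp: {} for amp in range(1, NUM_AMPS + 1)}
usi_subtables = {amp: {} for amp in range(1, NUM_AMPS + 1)}
for num, name in BEST_FRIENDS:
    base_tables[base_amp(num)][num] = name
    usi_subtables[index_amp(name)][name] = (base_amp(num), num)  # pointer to base row

def select_by_name(friend_name):
    """USI lookup: one AMP for the index row, one AMP for the base row."""
    first = index_amp(friend_name)
    second, friend_num = usi_subtables[first][friend_name]
    row = (friend_num, base_tables[second][friend_num])
    return row, [first, second]

row, amps_contacted = select_by_name("Ben Hon")
print(row, amps_contacted)
```

However many AMPs the system has, the lookup contacts at most two of them: the one holding the index row and the one holding the base row.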

Join Indexes

A bend in the road is not the end of the road, unless you fail to make the turn

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index directly. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.


Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users or databases, they take ownership. Keep in mind that all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.
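The bookkeeping behind this picture can be sketched in a few lines of Python. This is only an illustration of the "all space is owned" rule, not how Teradata stores it. The 100 GB system and the 80/20 split come from the text above; the amounts given to MRKT, SALES, Morgan, and Tom are made-up numbers.

```python
class Owner:
    """A database or user that owns PERM space (sizes in gigabytes)."""

    def __init__(self, name, perm=0, parent=None):
        self.name, self.perm, self.parent = name, perm, parent
        self.children = []

    def give(self, child_name, amount):
        """Create a child and move `amount` of this owner's PERM to it."""
        if amount > self.perm:
            raise ValueError(f"{self.name} owns only {self.perm} GB")
        self.perm -= amount  # space given away is subtracted from the giver
        child = Owner(child_name, amount, parent=self)
        self.children.append(child)
        return child


def total_perm(owner):
    """PERM owned by this owner plus everything owned beneath it."""
    return owner.perm + sum(total_perm(c) for c in owner.children)


dbc = Owner("DBC", perm=100)
sysdba = dbc.give("SYSDBA", 80)
mrkt = sysdba.give("MRKT", 20)      # hypothetical amount
sales = sysdba.give("SALES", 20)    # hypothetical amount
morgan = sysdba.give("Morgan", 10)  # hypothetical amount
tom = morgan.give("Tom", 4)         # hypothetical amount

print(dbc.perm, sysdba.perm, morgan.perm, tom.perm)
print(total_perm(dbc))
```

No matter how the space is handed down, the total under DBC stays at the original 100 GB. Space changes owners, but it never goes unaccounted for.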

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from him.
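The "who owns Tom?" question is just a walk up the family tree. Here is a small Python sketch of that walk. The parent map mirrors the hierarchy described in this chapter; the function names are ours, not Teradata's.

```python
# Parent of each object in the hierarchy described above (DBC is the root).
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """Return every parent (owner) of `name`, nearest first."""
    result = []
    while name in PARENT:
        name = PARENT[name]
        result.append(name)
    return result


def can_grant_or_revoke(owner, target):
    """An owner may GRANT or REVOKE rights on anything below it."""
    return owner in owners(target)


print(owners("Tom"))  # ['Morgan', 'SYSDBA', 'DBC']
```

Note that MRKT is not an owner of Tom: it sits beside Morgan in the hierarchy, not above Tom.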


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA, likewise, gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data, such as tables, secondary index sub-tables, and permanent journals. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.

The following picture shows a logical view of a Customer Table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area, there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.
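The life cycle of spool (build the answer, deliver it, release the space, or abort if the limit is hit) can be sketched in Python. This is a simplification under stated assumptions: the limit is counted in rows rather than bytes, a single list stands in for all the AMPs, and the four rows are borrowed from the Employee table shown later in this chapter.

```python
class OutOfSpool(Exception):
    """Raised when a query needs more spool than the user's upper limit."""


# (Emp, Dept, Lname, Fname, Sal)
EMPLOYEE = [
    (1, 10, "Johnson", "Manny", 100000),
    (2, 20, "Carlsbad", "Jan", 100000),
    (22, 30, "Winter", "Steve", 77000),
    (25, 10, "Lester", "Bonnie", 56000),
]


def select_where_dept(table, dept, spool_limit_rows):
    """Build the answer set in spool; abort if the spool limit is exceeded."""
    spool = []
    for row in table:
        if row[1] == dept:
            if len(spool) >= spool_limit_rows:
                raise OutOfSpool("query aborted: spool limit exceeded")
            spool.append(row)  # qualifying row goes into spool
    answer = list(spool)       # deliver the answer to the user
    spool.clear()              # then release the spool
    return answer


print(select_where_dept(EMPLOYEE, 10, spool_limit_rows=10))
```

Running the same query with `spool_limit_rows=1` raises `OutOfSpool`, because two rows qualify for department 10 and only one fits under the limit.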

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
1    10    Johnson   Manny   100000
2    20    Carlsbad  Jan     100000
22   30    Winter    Steve    77000
25   10    Lester    Bonnie   56000
33   10    Samuels   Todd    120000
99   20    Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The definition of a view is stored in the Data Dictionary and is monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
1    10    Johnson   Manny
2    20    Carlsbad  Jan
22   30    Winter    Steve
25   10    Lester    Bonnie
33   10    Samuels   Todd
99   20    Walter    Misha
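If you don't have a Teradata system handy, the same idea can be tried with Python's built-in `sqlite3` module. This is only a stand-in: SQLite is not Teradata, it has no users or access rights, and its view syntax merely happens to be close enough for this example. What it does show is that the view stores no data of its own and simply leaves the Sal column out.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
con.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view is only a stored definition; Sal is simply not selected.
con.execute(
    "CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

cur = con.execute("SELECT * FROM Employ_V")
columns = [d[0] for d in cur.description]
view_rows = cur.fetchall()
print(columns)
print(len(view_rows), "rows returned")
```

Anyone who can only query `Employ_V` sees names and departments for all six employees, but no salaries.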


What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
  SELECT * FROM Employ_v WHERE dept = 10;
  SELECT * FROM Employ_v WHERE dept = 20;
  SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan


I taught in one place that was so rough ... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it ... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), and Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them."

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (a MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal". Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, "Transient Journal Cleanliness is next to Godliness." If it is clean, then things went well.
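The before-picture idea is easy to sketch in Python. What follows is a toy model of a single AMP, not Teradata's real journal format. The class and function names are ours, and a negative salary is used as a made-up reason for a statement to fail.

```python
class Amp:
    """Toy AMP holding rows plus a transient journal of before-images."""

    def __init__(self, rows):
        self.rows = dict(rows)  # row key -> value
        self.journal = []       # (key, before-image) pairs

    def update(self, key, value):
        # Take the BEFORE picture first (None means the row did not exist).
        self.journal.append((key, self.rows.get(key)))
        self.rows[key] = value

    def rollback(self):
        # Restore before-images in reverse order, then empty the journal.
        for key, before in reversed(self.journal):
            if before is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
        self.journal.clear()

    def commit(self):
        self.journal.clear()    # success: the journal is wiped clean


def run_transaction(amp, changes):
    """Apply all changes or none of them."""
    try:
        for key, value in changes:
            if value < 0:
                raise ValueError("invalid salary")
            amp.update(key, value)
    except ValueError:
        amp.rollback()
        return False
    amp.commit()
    return True


amp = Amp({"Manny": 100000, "Bonnie": 56000})
ok = run_transaction(amp, [("Manny", 110000), ("Bonnie", -1)])
print(ok, amp.rows)
```

The first update succeeds, the second fails, and the rollback puts Manny's salary back to 100000. Either everything works or nothing changes, and in both cases the journal ends up empty.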

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP or disk, what happens if we lose two? Losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2000 AMPs. The chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.
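A short Python sketch can confirm that claim for the eight-AMP, two-cluster layout just described. The placement rule here (rotate the copy across the other AMPs in the cluster) is our own assumption for illustration; Teradata uses its Fallback Hash Map to pick the AMP. The 16-row table is hypothetical.

```python
CLUSTERS = [[1, 2, 3, 4], [5, 6, 7, 8]]  # eight AMPs, two clusters of four


def cluster_of(amp):
    """Return the cluster (list of AMPs) that contains `amp`."""
    return next(c for c in CLUSTERS if amp in c)


def fallback_amp(primary_amp, row_number):
    """Pick a different AMP in the same cluster for the FALLBACK copy."""
    others = [a for a in cluster_of(primary_amp) if a != primary_amp]
    return others[row_number % len(others)]  # spread copies over the cluster


# Hypothetical table: 16 rows, two base rows per AMP.
placement = {}
for row in range(16):
    primary = row % 8 + 1
    placement[row] = (primary, fallback_amp(primary, row))


def all_rows_available(down_amps):
    """True if every row still has a base or FALLBACK copy on a live AMP."""
    return all(
        primary not in down_amps or fallback not in down_amps
        for primary, fallback in placement.values()
    )


print(all_rows_available({1, 5}))  # one AMP down in each cluster
print(all_rows_available({1, 2}))  # two AMPs down in the same cluster
```

With one AMP down in each cluster, every row is still reachable. With AMPs 1 and 2 both down, the row whose base copy is on AMP 1 and whose FALLBACK copy is on AMP 2 is lost, so the second check fails.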

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while one AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
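The trade-off can be put into numbers with a little Python arithmetic. It assumes the failed AMP's work is shared evenly by the survivors in its cluster, which is a simplification.

```python
def extra_load_per_survivor(cluster_size):
    """Fraction of extra work each surviving AMP takes on when one AMP fails."""
    return 1 / (cluster_size - 1)


for size in (2, 4, 16):
    print(f"cluster of {size}: +{extra_load_per_survivor(size):.0%} per surviving AMP")
```

A cluster of two doubles the survivor's workload (+100%), a cluster of four adds about a third, and a cluster of 16 adds roughly 7%. The smaller the cluster, the heavier the penalty when an AMP is down; the larger the cluster, the greater the odds that a second AMP in it fails.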

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are applied to their FALLBACK copies.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping, starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping".

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.
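Here is a toy Python model of that catch-up process. It is a sketch only: it keeps one journal per down AMP in a single object, whereas the text says the surviving AMPs in the cluster maintain the DARJ, and it does not model the FALLBACK copies being updated in the meantime. The class and method names are ours.

```python
class Cluster:
    """Toy cluster that journals changes aimed at a down AMP."""

    def __init__(self, amps):
        self.data = {amp: {} for amp in amps}  # AMP -> {key: value}
        self.down = set()
        self.darj = {}                         # down AMP -> missed changes

    def fail(self, amp):
        self.down.add(amp)
        self.darj[amp] = []                    # survivors open a DARJ

    def write(self, amp, key, value):
        if amp in self.down:
            self.darj[amp].append((key, value))  # remember it for later
        else:
            self.data[amp][key] = value

    def recover(self, amp):
        # Replay everything the AMP missed, then discard the DARJ.
        for key, value in self.darj.pop(amp):
            self.data[amp][key] = value
        self.down.discard(amp)


cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "Ben Hon", "555-1111")   # hypothetical phone number
cluster.fail(1)
cluster.write(1, "Ben Hon", "555-2222")   # change made while AMP 1 sleeps
stale = cluster.data[1]["Ben Hon"]
cluster.recover(1)
print(stale, "->", cluster.data[1]["Ben Hon"])
```

While AMP 1 is down, its copy still holds the old value. Once it recovers, the journaled change is replayed and the journal for AMP 1 disappears.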


In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.
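Mirroring is simple enough to model in a few lines of Python. This sketch treats each disk as a dictionary of blocks and is only meant to illustrate the dual write, the read from a surviving copy, and the rebuild onto a replacement disk. Real disk array controllers are far more involved, and the class and method names here are our own.

```python
class MirroredPair:
    """Toy RAID-1 pair: index 0 is the primary disk, index 1 is its mirror."""

    def __init__(self):
        self.disks = [{}, {}]           # block number -> data
        self.failed = [False, False]

    def write(self, block, data):
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block] = data      # dual write, invisible to the caller

    def read(self, block):
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block]      # either healthy copy will do
        raise IOError("both disks in the pair have failed")

    def fail(self, i):
        self.failed[i] = True
        self.disks[i] = {}              # contents of the failed disk are gone

    def replace(self, i):
        # Rebuild the replacement disk from the surviving copy.
        self.disks[i] = dict(self.disks[1 - i])
        self.failed[i] = False


pair = MirroredPair()
pair.write(0, "Ben Hon")
pair.fail(0)                  # the primary disk crashes
survivor_copy = pair.read(0)  # still readable from the mirror
pair.write(1, "Don Roy")      # operations continue on one disk
pair.replace(0)               # new primary is rebuilt from the mirror
print(survivor_copy, pair.disks[0] == pair.disks[1])
```

Nothing is lost when the primary crashes, and after the rebuild both disks are identical again.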

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10 to 16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disks via a different physical cable.
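
To make the migration concrete, here is a minimal Python sketch of the two-node, 32-AMP clique described above. It is only a toy model: the function name `migrate_on_failure`, the node names, and the round-robin placement of migrating AMPs are assumptions made for illustration, not Teradata internals. What it shows is that an AMP changes nodes but keeps its identity, and therefore its virtual disk.

```python
# Hypothetical model of a two-node clique; AMP numbering follows the text.
def migrate_on_failure(nodes, failed):
    """Return a new node -> AMP list mapping after `failed` goes down.

    The failed node's AMPs move to the surviving nodes in the clique,
    dealt out round-robin. Each AMP keeps its own number, which stands
    in for its unchanged virtual disk.
    """
    survivors = [name for name in nodes if name != failed]
    if not survivors:
        raise RuntimeError("no surviving node left in the clique")
    result = {name: list(nodes[name]) for name in survivors}
    for i, amp in enumerate(nodes[failed]):
        result[survivors[i % len(survivors)]].append(amp)
    return result


clique = {
    "node1": list(range(1, 17)),   # AMPs 1-16
    "node2": list(range(17, 33)),  # AMPs 17-32
}
after = migrate_on_failure(clique, "node1")
print(len(after["node2"]))  # 32: node two now hosts twice as many AMPs
```

With more than two nodes in the clique, the same sketch would spread the migrating AMPs across all survivors, so each one takes on a smaller share of the extra work.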

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.
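
The rollback in this story can be sketched in a few lines of Python. This is a toy model, not how Teradata stores its journal: the dictionary `table`, the list `before_journal`, and the helper names are all hypothetical. It assumes only what the text says, namely that a BEFORE image of each row is saved just before the row changes.

```python
# Hypothetical in-memory table keyed by representative number.
table = {1: "Western", 2: "Eastern", 3: "Northern"}
before_journal = []  # (key, row image before the change), oldest first


def update_region(key, new_region):
    """Change a row, saving its BEFORE image first."""
    before_journal.append((key, table[key]))
    table[key] = new_region


def rollback(journal_position):
    """Undo every change journaled at or after `journal_position`."""
    while len(before_journal) > journal_position:
        key, old_image = before_journal.pop()  # newest change first
        table[key] = old_image


checkpoint = len(before_journal)  # the point in time before the bad update
for rep in list(table):           # the faulty job moves every region
    update_region(rep, "Southeast")
rollback(checkpoint)
print(table)  # {1: 'Western', 2: 'Eastern', 3: 'Northern'}
```

Undoing the newest change first matters when the same row was changed more than once: the last image restored is then the oldest one, which is the state at the chosen point in time.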

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
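
The two recovery steps, reload the backup and then roll forward, can be sketched as follows. Again this is a simplified Python model under stated assumptions: the function `roll_forward`, the tuple layout of the journal, and the sample rows are hypothetical, and only added and modified rows are modeled because those are the cases this scenario describes.

```python
def roll_forward(backup, after_journal):
    """Rebuild a table from a full backup plus AFTER images.

    `backup` maps key -> row as of the backup. `after_journal` is a list
    of (day, key, after_image) tuples in the order the changes happened.
    """
    table = dict(backup)  # step 1: reload the full backup
    for _day, key, image in after_journal:
        table[key] = image  # step 2: reapply each captured AFTER image
    return table


backup_day_1 = {100: ("Jones", 40000), 101: ("Smith", 52000)}
journal = [
    (2, 102, ("Lee", 47000)),    # new row added on day 2
    (4, 100, ("Jones", 43000)),  # existing row changed on day 4
]
restored = roll_forward(backup_day_1, journal)
print(restored[100], restored[102])  # ('Jones', 43000) ('Lee', 47000)
```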

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
       BEFORE JOURNAL,
       DUAL AFTER JOURNAL
 (emp        INTEGER,
  dept       INTEGER,
  lname      CHAR(20),
  fname      VARCHAR(20),
  salary     DECIMAL(10,2),
  hire_date  DATE FORMAT '...')  /* format string is missing in the source */
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind, these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
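
The four descriptions above boil down to a small compatibility table: for each lock already held, which new requests may proceed alongside it. The Python sketch below is a simplified model built from those descriptions. The names `COMPATIBLE` and `can_grant` are hypothetical, and the queueing of blocked requests is left out.

```python
# For each lock already held: the requests that can be granted alongside it.
# Derived from the four descriptions above; a simplified model.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},
    "READ": {"ACCESS", "READ"},
    "WRITE": {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held):
    """True if `requested` can proceed alongside every lock in `held`."""
    return all(requested in COMPATIBLE[lock] for lock in held)


print(can_grant("READ", ["READ", "READ"]))  # True: many readers at once
print(can_grant("ACCESS", ["WRITE"]))       # True: the dirty read
print(can_grant("WRITE", ["READ"]))         # False: the writer must wait
print(can_grant("ACCESS", ["EXCLUSIVE"]))   # False: nothing gets past it
```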

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30!" Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
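
The employee and payroll story can be turned into a short Python sketch of the two RI checks. The dictionaries and function names here are hypothetical stand-ins for real tables, and a real database enforces these rules itself once the reference is declared. The sketch just spells out what gets rejected.

```python
# Hypothetical tables: payroll rows reference employee numbers.
employee = {1: "Tom", 2: "Morgan"}  # parent table, keyed by emp
payroll = {}                        # child table: payroll id -> emp


def insert_payroll(pay_id, emp):
    """Reject a child row whose emp value has no matching parent row."""
    if emp not in employee:
        raise ValueError(f"RI violation: emp {emp} not in employee table")
    payroll[pay_id] = emp


def delete_employee(emp):
    """Reject deleting a parent row that a payroll row still references."""
    if emp in payroll.values():
        raise ValueError(f"RI violation: emp {emp} still on payroll")
    del employee[emp]


insert_payroll(500, 1)      # accepted: employee 1 exists
try:
    delete_employee(1)      # rejected: payroll row 500 still points at 1
except ValueError as err:
    print(err)              # RI violation: emp 1 still on payroll
```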

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president!"

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
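
The three phases just described (hand out blocks, hash and redistribute, sort) can be sketched in Python. This is a rough model under loud assumptions: `zlib.crc32` is only a stand-in for Teradata's real hashing algorithm, `block_size` counts rows rather than 64K of bytes, blocks are dealt to the AMPs in simple rotation, and the function names are hypothetical.

```python
import zlib


def owning_amp(primary_index_value, amp_count):
    """Map a primary index value to an AMP (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(primary_index_value).encode()) % amp_count


def fastload(rows, amp_count, block_size):
    """rows: list of (primary_index_value, payload) tuples."""
    # Phase 1: deal blocks of rows to the AMPs in arrival order.
    received = [[] for _ in range(amp_count)]
    for start in range(0, len(rows), block_size):
        amp = (start // block_size) % amp_count
        received[amp].extend(rows[start:start + block_size])

    # Phase 2: every AMP hashes its rows and sends each to its owner.
    final = [[] for _ in range(amp_count)]
    for amp_rows in received:
        for row in amp_rows:
            final[owning_amp(row[0], amp_count)].append(row)

    # Phase 3: each AMP sorts its rows by hash (a stand-in for Row ID).
    for amp_rows in final:
        amp_rows.sort(key=lambda r: zlib.crc32(str(r[0]).encode()))
    return final


rows = [(emp, f"employee-{emp}") for emp in range(1, 101)]
amps = fastload(rows, amp_count=4, block_size=10)
print([len(a) for a in amps])  # row counts per AMP after redistribution
```

In the real system the AMPs do phases 2 and 3 in parallel. The loops here run one AMP at a time only because that keeps the sketch short.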

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.
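
The faucet idea can be pictured with a small Python sketch of a time-of-day throttle. Everything in it is hypothetical: the schedule, the rates, and the function names are invented for illustration and do not reflect Tpump's actual options. It only shows the principle of loading at different levels at different hours.

```python
# Hypothetical schedule: (start hour, end hour, rows allowed per minute).
RATE_SCHEDULE = [
    (0, 6, 100_000),   # off-peak: full throttle
    (6, 18, 500),      # data warehouse rush hour: a trickle
    (18, 24, 20_000),  # evening: moderate flow
]


def rows_per_minute(hour):
    """Return the load rate configured for the given hour (0-23)."""
    for start, end, rate in RATE_SCHEDULE:
        if start <= hour < end:
            return rate
    raise ValueError(f"hour out of range: {hour}")


def batches(rows, hour):
    """Split `rows` into one-minute batches at the rate for `hour`."""
    rate = rows_per_minute(hour)
    return [rows[i:i + rate] for i in range(0, len(rows), rate)]


pending = list(range(1200))
print(len(batches(pending, 9)))  # 3 batches at the rush-hour trickle
print(len(batches(pending, 2)))  # 1 batch at full throttle
```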

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance - Only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions; neither is any liability assumed for damages resulting from the use of information contained herein. For information, address:

Coffing Publishing

7810 Kiester Rd., Middletown, OH 45042

0-9704980-1-2

All terms mentioned in this book that are known to be trademarks or service marks have been stated. Coffing Publishing cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.

Acknowledgements and Special Thanks

This book is dedicated to Americans and friends of liberty and freedom.

We also want to thank our wives, Leona Coffing and Janie Jones.

Thanks to a great editor and friend, Cheryl N. Buford.

Introduction

Overview

A full 40% of Fortune's "U.S. Most Admired" companies use Teradata. What do they know that your company needs to know? I've been in the computer business for more than 27 years. I've witnessed so much since the early days of punch cards, assembler languages, and COBOL programming. With that in mind, the most magnificent, ingenious technology that I've ever seen is a database from the NCR Corporation called Teradata.

"The wave of the future is coming and there is no fighting it."

Anne Morrow Lindbergh

Teradata is absolutely the wave of the future in data warehousing. I introduced this technology to a great friend, Morgan Jones. He immediately recognized that Teradata is the gold standard for all data warehousing, and as a result, we've partnered to write this book. So sit back, relax, and enjoy. With our guidance, you will soon realize why Teradata is the greatest technology on the planet.

The Ten Rules of Data Warehousing

What weapon was deemed so powerful that experts claimed it would end all wars? Believe it or not, it was the crossbow. Throughout history, people have improved technology and advanced society through foresight and ingenuity. Just when we think something is impossible, it becomes a reality. Who would have dreamed we could send a person to the moon, or that someone could run a mile in under four minutes? Ingenuity and the desire to improve are attributes of the human race, and both are found in numerous avenues, from sports to business.

"Expect the unexpected or you won't find it."

Roger von Oech

When Frank Lloyd Wright began to design the Imperial Hotel in Tokyo, he discovered the unexpected: just eight feet below the surface of the ground lay a sixty-foot bed of soft mud. Since Japan is a land of frequent shakes and tremors, Wright was faced with what appeared to be an insurmountable obstacle. This gave him an idea. Why not float the Imperial Hotel building on the bed of mud and let it absorb the shock of any quake? Critics and cynics alike laughed at such an impossible idea. Frank Lloyd Wright built the hotel anyway. Shortly after the grand opening of the hotel, Japan suffered its worst earthquake in fifty-two years. All around Tokyo, buildings were destroyed, but the Imperial Hotel stood firm.

For a long time, the mainframe and OLTP industry laughed at those who recommended the data warehouse design principles set forth in this book. But those companies that build one based upon these rules will join the ranks of the elite. Consider this: ten of the top 13 global communications companies use Teradata, nine of the top 16 global retailers use Teradata, and eight of the top 20 global banks use Teradata.

The ability to continually improve is one of Teradata's greatest strengths. The database was designed in 1976 and has continually improved ever since. Teradata has averaged one data warehouse installation per week for the past decade. Through continual improvement based on customer feedback from many of the largest data warehouse sites, Teradata has been able to identify itself as the data warehouse of choice for award-winning data warehouses.

This book begins with the 10 cardinal rules to follow for data warehouse success. It illustrates how Teradata helps customers follow these rules. Then it explains the brilliance of how Teradata works. By the end, the reader will have a real grasp of essential Teradata concepts.

Rule 1 - Start Building Towards A Central Data Warehouse

Moments after midnight on July 30, 1945, the Navy cruiser USS Indianapolis suffered a fatal torpedo hit from a Japanese submarine. It had been traveling unescorted through the Philippine Sea. Within 12 minutes of the deadly hit, the ship sank. Over 300 men were killed and nearly 900 were stranded in shark-infested seas. Tragically, those who survived until daylight faced four torturous days in the water and battled continuous shark attacks before being stumbled upon by a passing ship. In the end, only 316 souls survived. With a crew of 1,199 people, this was one of the worst military disasters of World War II for the United States.

Most people assume that war is cruel, but the heart-wrenching story above becomes even more tragic when the following facts are revealed. First, the ship's captain did not have all of the facts, and second, the Navy did not provide the captain with a single version of the truth. The captain's request for a destroyer escort was denied, even though the regional Naval command knew another ship had been attacked just two days earlier, plus multiple enemy sightings had occurred within the previous five days. Not only were these crucially relevant facts withheld, but the captain of the Indianapolis was also told that his passage route was clear and there would be no need for a destroyer escort.

"To withhold news is to play God."

John Hess

Had everyone involved with the USS Indianapolis adhered to a single version of the truth, with detail data to back them up, this disaster may have never occurred. Likewise, if your company doesn't maintain detail data in a Centralized Data Warehouse, you will never know which version of the truth to believe. Each division of a business will have its own view of the truth. Summarized data, such as a data mart, does have its place in knowledge management, but it should always be built from the detail data within the central data warehouse.

Most companies don't have a Central Data Warehouse. Why? Because they don't have proper leadership or direction. Company leaders often let different branches of the company create data marts that are effective short-term solutions. These solutions are based on departmental leadership that is most interested in short-term solutions. Such leaders don't plan on being with a particular department forever, so they are only interested in keeping things simple, controlled, and beneficial to them.

"We're all in this alone."

Lily Tomlin

For example, imagine a company that made cars on an assembly line. Instead of using a giant plant with the latest and greatest technology, the company builds cars in 300 small garages. Each garage is owned by a different department and has different needs. In addition, every user has access restricted to his or her own garage. With this structure, leaders feel safe, but building cars logistically is a nightmare. In fact, just moving cars from one garage to the next would be a joke. This scenario may seem simple-minded, but that is how most data warehouses are built. Each part of some data warehouses operates alone.

Now imagine a giant car assembly plant where the assembly line was managed by the idea of "There is no 'I' in Team." This plant would continually improve processes, finding better ways to work together. Everyone has an idea what the others are doing, and new ideas are welcome. Management is able to run the entire plant with one team of dedicated professionals, and decisions are made cooperatively, concisely, and clearly.

This style of management is the idea behind a central data warehouse. From the top layer of management down through the entire company, they are one solid team. A team experienced in data warehousing saves valuable money and resources, plus users can manage the entire data warehouse. Executives may ask any question targeted to any part of the business. Decisions are made with long-term vision, and every employee is confident that when they need answers, the data warehouse will provide them.

"If I have seen further, it is by standing on the shoulders of giants."

Isaac Newton

When asked how he had discovered the Law of Gravity, Isaac Newton did not grab all of the glory for himself. He claimed that his work stood on the foundation of those early scientists who had gone before him. Likewise, a central data warehouse allows users to stand on the shoulders of another giant. This giant, built right, allows major corporations to make decisions and act on those decisions quickly.

In 1993, I was asked to train one of the world's largest retailers on its Teradata data warehouse. I flew to Bentonville, Arkansas, and an employee met me at the airport, then escorted me to the classroom. As we walked down the hallways, most employees seemed to be at a pace I had never seen before. They were practically running. I asked, "What's up? Why is everyone hurrying?" The employee replied, "It's work time." I was shocked. In all of the places I had previously worked, we strolled. This place had a leadership that I've never encountered... anywhere. H. Ross Perot described this kind of team when he said, "When building a team, I first look for people who love to win; if I can't find any of those, then I look for people who hate to lose." This was a concise team of employees so motivated and so empowered that they thought they could take over the world.

As I grew to know the team, I asked them how long it took top management to make a decision. And how long did it take to implement that decision at thousands of stores nationwide? They simply said, "About two hours." I was amazed. Today, this team continues to have one of the single greatest data warehouses ever built. They use it extensively, and it grows stronger every day.

While I was visiting with this team, management decided at one point that stores across the country should place Halloween displays and candy near the cash registers. In less than two hours, stores moved their Halloween candy from the normal candy aisles to end-caps near the cash register. Every store participated but one.

When asked why he didn't participate, the store manager said he had simply run out of time to create the displays, plus move the Halloween candy from his normal candy aisle to the end-caps. Management was ticked. Telling the manager they would get back to him, they then asked the DBA to query the data warehouse to see how much this snafu had cost the company. The DBA came back and reported that the store actually sold almost the same amount of Halloween candy as forecasted. Management was surprised and, honestly, a little disappointed with the answer. But then the DBA added somewhat sheepishly, "I found something else too." "Go ahead," replied members of the management team. He said, "I found out they actually sold about 40% more normal candy than we forecasted for this holiday." Management got on the phone immediately and told the other thousand stores, "Move those goblins and Halloween candy back to the normal candy aisles!"

What that DBA did was to use his instinct and the data warehouse to find out exactly what was going on with the business at that time. He was armed with a system that had cross-functional analysis. A central data warehouse gives good management great confidence, because they see the whole picture. When users can ask any question, at any time, and on any data, their knowledge is unlimited.

Most Teradata Central Data Warehouse sites will tell you most of their Return On Investment (ROI) came from areas they never suspected. Thomas Jefferson once said, "We don't know one millionth of a percent about anything." When we explained Teradata to Jefferson, he did not build another Monticello, but he did retract his statement. Companies with a centralized data warehouse know about a million percent more than companies that have invested in stovepipe applications and 300 different data marts.

Actually, any company planning on competing in this millennium must think long-term and begin building a centralized data warehouse. If not, that company will be on the short end of the stick when competing with a company that chose to build one. That thought should sound scarier than a goblin near the cash registers on Halloween.

If you think about it, every major decision in business makes someone happy. If you are armed with facts supported by a central data warehouse and you do your homework, your business decisions will make your shareholders happy. However, if you are making decisions with a data mart strategy, those decisions are more likely to make your competitors happy.

There are many companies that are fearful of such an undertaking. They want a central data warehouse but wonder, "What if it fails? Which database should we choose? What type of hardware do we need? Should we do an RFP?" Decisions, decisions. It would literally take me about 30 seconds to make a decision on Teradata. There would be no RFP. We used to wade in swimming pools of data; today we are swamped in oceans of data. Teradata is built for this type of environment. This book explains the fundamentals of Teradata. Anyone with any experience or knowledge about data warehouse environments will clearly see why Teradata is the best solution.

Rule 2 - Build for the User

"A learned person is not one who gives the right answers; it is the one who asks the right questions."

Claude Levi-Strauss

The user is the heart of the data warehouse and they get better with each day of experience The user makesdecisions that affect the companys bottom line Thats why the data warehouse is built around the business userBuilding a data warehouse is simple find out what data the business users need and what type of queries theywant to ask but are not able to ask today Then find out if the data is available and if the queries can be attainedWith those answers you will exceed usersrsquo expectations

An experienced data warehouse user is usually shocked when he or she first uses Teradata. Its sheer power and flexibility enable users to ask questions they have never been able to ask before. On a recent consulting trip of mine, a young DBA got antsy when a particular query took more than a minute or so with Teradata. So I asked, "Well, how long did that same query take with your OLTP-based data warehouse?" He retorted, "We couldn't even run this query on the old system." I said, "So what's wrong with two minutes?" He added, "You know, some of our business users are so used to how long our queries used to run that they will be sitting, staring at the screen, without realizing that Teradata has already brought back the answer." With Teradata, users can expand their thinking by using intuition and keen business sense without technology barriers.

The building of an enterprise data warehouse begins with top management, but then cascades down to a relationship between the IT department and the business user community.

The IT department must realize they have a supporting role. That role is to please the business user by making data available so the business user can easily ask questions and get answers. It's also the IT department's role to build a system that allows users to ask questions on their own without IT intervention. Forget about building a system where users ask IT to run the queries for them. When users need information, the IT department should eventually be able to say, "Ask the question yourself... it is all available to you."

The business users are actually the stars; however, the entire business community must take responsibility for the warehouse's success. These users must continually educate themselves and other users on the capabilities of the data warehouse, new tools, and new techniques that will enhance its potential. Those same users must help IT help them. If both understand their respective roles and work together to help the company, then the data warehouse will be a huge success.

Rule 3 - Let the IT Department Lead the Way to User Utopia

Few sports challenges are as grueling or demanding as the Tour de France. But victory at this event eluded Lance Armstrong, a powerful young cyclist from Austin, Texas. Lance excelled in individual competition, even winning the World Championships. But despite his hard work, Lance could not overcome the Europeans' strong and proud tradition at the Tour de France. A few years ago, Lance was thrown into the battle of his life, not against others but against himself. He discovered that he had cancer and was given virtually no chance of surviving. Suddenly he found out how little cycling really meant in life. With all his might, Lance battled his way back to health, beating the odds. Now he found out how very much cycling could mean in life. His bicycle became a tool to reclaim the future. He found a spot as a team member for the US Postal Service team. With a new perspective and a new depth of character, Lance led that team to victory in the next Tour de France. And he repeated this victory again for the next two years.

To win the premier event in the cycling world, Lance Armstrong had to totally rethink his role. In the same way, the key members of any company seeking success with its data warehouse must rethink their roles. The IT department plays a key role in a data warehouse. What do users know about technical issues? Not enough to build a data warehouse. So technical issues are the responsibility of the IT department. The danger with this train of thought is that while the IT department has years of experience with handling company transactions through production databases and applications, most are new at data warehousing. A data-warehousing environment can be extremely different from anything an IT department has ever built or used before. Therefore, it's a bad idea to build a data warehouse without the help of experienced people.

An OLTP environment gets more and more predictable each month. It is designed to be tweaked and tuned in order to maximize a company's environment. On the other hand, a data warehouse is an unpredictable environment where the only way to gain control is to actually give up control. In data warehousing, the user must be allowed the freedom to ask the questions, and they will blossom in an environment where flexibility is accepted and welcomed.

The only sure weapon against bad ideas is better ideas.

A. Whitney Griswold

If the IT department decides to build hundreds of data marts that will please each and every department, then they are missing the boat. Data warehouse experience is a hard teacher because it gives the test first and the lesson afterwards. Abraham Lincoln once said, "A house divided cannot stand." With that in mind, build the data warehouse so it will stand strong for a long time.

What's the formula? First and foremost, start by building your data warehouse around detail data. Bring transaction data, along with key details from the OLTP systems, into the data warehouse. Then, as known queries are identified, build data marts to enhance their performance, and also insist that data marts are created and maintained directly from the detail data. Doing so will build a foundation that will stand.

Next, the IT department needs to keep an open mind about creating an environment called "User Utopia." Have you ever been there? In User Utopia, the user confidently asks queries without fear of being charged by the minute. The user has meta-data, so he or she becomes intimate with the data, then makes informed decisions. The user should also be able to ask monster queries with the full backing of IT. Recently, on one such query, the IT department wanted to pull the plug. But the DBA held out, granting the user more time. When the query finished running, the information it brought back from the detail data saved the company millions of dollars. Overall, a user will get the majority of his or her answers back quickly from data marts, but he or she also needs the capability of going back to the detail data for more information. This is User Utopia.

Here is the message for IT: Don't follow the idea that "if you build it, they will come." Instead, become a leader... go to the users and build it together.

Rule 4 - Build the Foundation Around Detail Data

Business is always trying to predict the unpredictable. The US Air Force Reserve's 53rd Weather Reconnaissance Squadron is a special force that flies its planes directly into tropical storms and hurricanes. Using a WC-130 Hercules aircraft, they fly into storms at low altitudes, between 1,000 and 10,000 feet, taking weather readings that are relayed to the National Hurricane Center in Florida. They measure wind speeds, measure the pressure and structure of the storm, and, most importantly, locate the eye of the storm. The data collected by these "Hurricane Hunters" is used to determine when and where a storm might hit the coast and how strong it will be at that time. Teradata has no fear of detail data; its virtual processors will fly right into the thick of your data warehouse to bring back valuable information for decision support. You see, Teradata enables you to understand the storms in your business today while helping you predict when and where the next storm will hit tomorrow.

I estimate that 80% of today's data warehouses are built on summary (summarized) data. Therefore, 80% of all data warehouses will never come close to realizing their full potential. Your data warehouse does not have to be one of them.

A bird does not sing because it has the answers; it sings because it has a song.

A data warehouse built on detail data does not sing because it has a song; it sings because it has the answers. When you capture detail data, answers to an infinite number of questions are available. But if this is truly the case, then why doesn't everybody build around detail data? Well, there are two reasons. One is price. Like a bird, many companies decide to go "cheap, cheap." But watch out! The real expense is not the cost of the data warehouse; it is the money that you will not make without one. The second reason is power. Many companies don't have the wingspan to fly through the detail, so they soar with the summary. In addition, some companies don't want to pay for the disk space it actually takes to keep detail data, but believe me, that cost is a small price to pay for success.

Once you miss the first buttonhole, it becomes difficult to button your shirt.

Many companies use the same database for their data warehouse as they have done for their OLTP system. This is a critical mistake. In essence, they have missed the first buttonhole and most likely will lose their shirt on their data warehouse adventure.

At this point, companies no longer have a choice of using detail data. They must summarize for performance reasons. As one marine jokingly told his boot camp soldiers, "The beatings will continue until the morale improves." Similarly, a database designed for OLTP takes a continual beating when it tries to query large amounts of detail data.

Companies building true data warehouses don't compromise on price and will have a data warehouse that is built for decision support, not one that specializes in OLTP. With this decision, you have buttoned the first buttonhole and are well on your way to reaching the top.

Detail data is the foundation that data warehouses are built upon. Users can ask any question at any time; conduct data mining, OLAP, ROLAP, SQL, and SPL functions; build data marts directly from the detail data; and easily maintain and grow the environment on a daily basis. Now that's a tune well worth singing. Make a note of it.

Rule 5 - Build Data Marts from the Detail

You cannot teach a man anything; you can only help him find it within himself.

Galileo

Galileo was a smart man. How did he know so much about life and data marts? When we explained data marts to Galileo, he said, "You cannot build a data mart directly from the OLTP systems; you can only build a data mart directly from the detail within." He was right!

Many companies build data mart after data mart directly from the OLTP systems, and their universe begins to revolve around continual maintenance. Then, as things get worse, as Galileo predicted, their universe begins to revolve around the son: the son of a gun sent in to replace them.

Why does this happen? At first things work out great, but soon there are more and more requests for additional information. As a result, more and more data marts are created, and soon the system looks like a giant spider web. Different data marts start to yield different results on like data, and the actual maintenance of this complicated spider web takes up most of IT's time. Meanwhile, short-term dreams turn into long-term nightmares like this one: A man and his wife had had a big argument just before he went on a business trip. Feeling rather contrite about his harsh words, he arranged to send his wife some flowers and asked the florist to write on the card, "I'm sorry, I love you." The beautiful bouquet arrived at the door. But then his wife read the words the florist had actually written in haste: "I'm sorry I love you."

The top reasons to build data marts directly from detail data are:

• Users can get answers from the data mart and, when they need to validate their findings or check out additional information, go back to the detail that built it.
• There is only one consistent version of the truth.
• Maintenance is easy.

If a user comes up with a data mart answer that does not make sense, then he or she has the ability to drill down into the detail and investigate. Sometimes summary data can spark interest, and finding out the "why" can result in big bucks.

If users don't trust the data, they won't use the system. When a data warehouse is built on a foundation of detail data, and then data marts are erected from that foundation, you have a winning combination. The results will always be consistent and trustworthy. However, you should only build data marts when there is a credible business case, and you should be ready to drop them when they are no longer needed. The life span of a data mart is relatively short compared to that of its mother and father (better known as the detail data). If you build data marts from the detail, they are easy to manage, easy to drop, and easy to change.
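To make the idea concrete, here is a minimal sketch of a data mart that is built, dropped, and rebuilt straight from detail data. It uses Python's built-in sqlite3 module purely as a stand-in (it is not Teradata), and the table names `sales_detail` and `sales_by_store_mart`, the columns, and the sample rows are all hypothetical.

```python
import sqlite3

def build_mart(conn):
    """Drop and rebuild a summary data mart directly from the detail table."""
    conn.execute("DROP TABLE IF EXISTS sales_by_store_mart")
    conn.execute(
        """
        CREATE TABLE sales_by_store_mart AS
        SELECT store_id, SUM(amount) AS total_amount, COUNT(*) AS txn_count
        FROM sales_detail
        GROUP BY store_id
        """
    )

conn = sqlite3.connect(":memory:")

# Hypothetical detail table: one row per transaction.
conn.execute(
    "CREATE TABLE sales_detail ("
    "txn_id INTEGER PRIMARY KEY, store_id INTEGER, item TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_detail (store_id, item, amount) VALUES (?, ?, ?)",
    [(1, "shirt", 20.0), (1, "shoes", 55.0), (2, "shirt", 20.0)],
)

build_mart(conn)

# Quick answer from the mart...
mart_rows = conn.execute(
    "SELECT store_id, total_amount, txn_count "
    "FROM sales_by_store_mart ORDER BY store_id"
).fetchall()

# ...and a drill-down into the detail that built it.
drill_down = conn.execute(
    "SELECT item, amount FROM sales_detail WHERE store_id = 1 ORDER BY item"
).fetchall()

print(mart_rows)
print(drill_down)
```

Because the mart is derived entirely from the detail table, dropping it loses nothing, and rebuilding it always gives answers that agree with the detail.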

Rule 6 - Make Scalability Your Best Friend

Plan your life for a million tomorrows, and live your life as if tomorrow may be your last.

Morgan Jones

The roar of class-6 rapids on a river in Suriname can be almost deafening against the dense walls of the jungle. Especially when you are 9 years old. Our mission was to lower our canoe down the waterfall with ropes. The Trio Amer-Indian who anchored our 40-foot dugout canoe let go of the anchor rope too quickly. Without warning, the heavy boat began a freefall through the rocky water with my father hanging onto the side for dear life. He disappeared under the rocky waters, and I knew for sure we had lost him. My heart pounded against my chest. As I rallied myself to grasp this loss as only a nine-year-old can, the Indians abruptly began cheering wildly above the roar of the river. My dad had resurfaced a hundred yards downstream, battered and bruised, but he was alive! In just one short minute, I determined that I would love my family every day as if there were no tomorrow.

As I made my family my best friend, a data warehouse must make scalability its best friend. A data warehouse that does not scale will have no tomorrow. It is only a matter of time until the warehouse disappears in rocky waters, only to never come up for air. Don't let go of the anchor rope.

The data-warehousing environment will throw obstacles in your way every single day. A data warehouse must be planned to meet today's needs. But it must also be capable of meeting tomorrow's challenges. The future cannot be predicted, so plan for unlimited growth or linear scalability, both vertical and horizontal. There are so many data warehouses that start out with sizzling performance, but as they grow they eventually and inevitably hit the scalability wall. However, before they hit the wall, there is a pattern of diminishing performance.

A data warehouse designed without scalability in mind is doomed before it is begun. It can never reach its potential. Take the scalability question out of the equation by investing in a database that allows you to start small but grows linearly.

In today's fast-paced world, Gigabytes soon become Terabytes. It may not sound like much, but it weighs a ton on the shoulders of giants. Listen to these measurements and pick your data warehouse's life span. For example, if you lived for a million seconds (Megabyte), then you would live for about 11.5 days. In comparison, if you lived for a billion seconds (Gigabyte), then you would live for about 31.7 years. Plus, if you lived for a trillion seconds (Terabyte), then you would live for 31,688 years.

How nice it would be on your 31,688th birthday that people would say, "You sure look good for your age."
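The arithmetic behind those life spans is easy to check. The short sketch below simply converts seconds to days and years, assuming a 365.25-day year; the function names are ours, chosen for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds in a day
DAYS_PER_YEAR = 365.25           # assumption: average year length with leap years

def seconds_to_days(seconds):
    """Convert a number of seconds to days."""
    return seconds / SECONDS_PER_DAY

def seconds_to_years(seconds):
    """Convert a number of seconds to years."""
    return seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)

million_days = seconds_to_days(10**6)       # a "Megabyte" of seconds
billion_years = seconds_to_years(10**9)     # a "Gigabyte" of seconds
trillion_years = seconds_to_years(10**12)   # a "Terabyte" of seconds

print(f"Million seconds:  {million_days:.2f} days")
print(f"Billion seconds:  {billion_years:.2f} years")
print(f"Trillion seconds: {trillion_years:,.0f} years")
```

Each step up multiplies the life span by a thousand, which is the point of the comparison: a Terabyte is not just "a bit more" than a Gigabyte.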

Data warehouses hit the wall of scalability because they cannot grow at the same rate as the amount of data being gathered. Teradata allows for unlimited linear scalability. Linear Scalability is a building block approach to data warehousing that ensures that, as building blocks are added, the system continues at the same performance level.

This is why the largest data warehouses in the world use Teradata. I was lucky to be in the right place at the right time, and taught during the beginning stages at what are considered the two largest data warehouse sites in the world: Southwestern Bell (SBC) and Wal-Mart.

Wal-Mart's data warehouse started with less than 30 gigabytes, and SBC started with less than 200 gigabytes and 100 users. Both warehouses:

• Started small and simple
• Used Teradata from the beginning
• Have built the largest Enterprise Data Warehouse in their respective industries
• Continue to realize additional Return On Investment (ROI) on an annual basis
• Have grown to more than 10 Terabytes of data and are still growing
• Have thousands of users (some estimates are shocking)
• Have educated and experienced data warehouse staffs
• Have educated and experienced data warehouse users
• Experience continual growth without boundaries
• Have experienced linear performance by Teradata in every single upgrade (from gigabytes to terabytes and from terabytes to tens of terabytes)
• Both companies are impressed with Teradata's power and performance
• And both SBC and Wal-Mart are committed to the excellence of Teradata

A data warehouse is built in small building blocks. Linear Scalability is described in three ways:

First, building blocks are added until the performance requirements of your environment are met (Guaranteed Success).

Second, every time the data doubles, building blocks are doubled and the system maintains its performance level (Guaranteed Success); and

Third, any time the environment changes, building blocks are added until performance requirements are met (Guaranteed Success).

Scalability is not just about growing the data volume. It also means growing or increasing the number of users. Many systems work flawlessly until as few as 5 users are added; then they slow down to a crawl. Companies need a system where growth and performance are easily calculated and implemented. That means a system where the number of users, size and complexity of queries, volume of data, and number of applications being used can be calculated and compared to the current system's actual size. If more power, speed, or size is needed, then the company can simply add building blocks to the system until the requirements are met.
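If scaling really is linear, that calculation is simple enough to sketch. The function below is only a back-of-the-envelope illustration: the per-block capacities (`gb_per_block`, `users_per_block`) and the workload figures are made-up numbers, not Teradata specifications, and real sizing would also weigh query complexity and the number of applications.

```python
import math

def blocks_needed(data_gb, users, gb_per_block, users_per_block):
    """Estimate building blocks required, assuming perfectly linear scaling.

    Each block is assumed to add a fixed amount of data capacity and to
    support a fixed number of users; the larger requirement wins.
    """
    by_data = math.ceil(data_gb / gb_per_block)
    by_users = math.ceil(users / users_per_block)
    return max(by_data, by_users, 1)

# Hypothetical starting point: 200 GB of data and 100 users.
current = blocks_needed(200, 100, gb_per_block=100, users_per_block=50)

# The data doubles while the user count stays the same.
doubled = blocks_needed(400, 100, gb_per_block=100, users_per_block=50)

print(current, doubled)
```

Under the linear assumption, doubling the data doubles the blocks, which mirrors the second description of Linear Scalability above.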

Rule 7 - Model the Data Correctly

You will find only what you bring in.

Yoda, Jedi Master in Star Wars

We model a database for the same reasons that Boeing builds an aircraft model to test flight characteristics in a wind tunnel. It's simpler and cheaper to model than to reconstruct the plane by iterations until you get it right.

A proper data model should be designed to reflect the business components and possible relationships.

Here are three rules for modeling data in a data warehouse:

1. Model the data quickly
2. Normalize the detail data
3. Use a dimensional model for data marts

The 3rd Normal Form holds that each column in a table should be directly related to the primary key, the whole key, and nothing but the key. Data is placed into tables where it makes the most sense and has no repeating groups, derived data, or optional columns. This allows users to ask any question at any time on all data within the enterprise. Users do not have to strive for 3rd Normal Form, but should just normalize the data the best they can. There will be fewer columns in a table but a lot more tables overall. This model is easier to maintain, incredibly flexible, and allows a user to ask any question on any data at any time.

A Star-Schema model is composed of a fact table and a number of dimension tables. The fact table is a table with a multi-part key. Each element of the key is itself a foreign key to a single dimension table. The remaining fields in the fact table are known as facts and are numeric, continuously valued, and additive. Facts can be thought of as measurements taken at the intersection of all of the dimensions. Dimension attributes are mostly textual and are almost always the source of constraints and report breaks. This model enhances performance on known queries, or in other words, queries users run repeatedly day after day.
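A tiny example may help picture the shape of a Star-Schema. The sketch below uses Python's built-in sqlite3 module as a stand-in database (not Teradata); the table names `fact_sales`, `dim_store`, and `dim_product`, along with the sample rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: mostly textual attributes.
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city     TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);

    -- Fact table: multi-part key, each part a foreign key to one dimension.
    -- The remaining column (amount) is a numeric, additive fact.
    CREATE TABLE fact_sales (
        store_id   INTEGER REFERENCES dim_store(store_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL,
        PRIMARY KEY (store_id, product_id)
    );

    INSERT INTO dim_store   VALUES (1, 'Dayton'), (2, 'Austin');
    INSERT INTO dim_product VALUES (10, 'Apparel'), (20, 'Footwear');
    INSERT INTO fact_sales  VALUES (1, 10, 20.0), (1, 20, 55.0), (2, 10, 20.0);
    """
)

# A typical "known query": constrain on a dimension attribute (category)
# and break the report on another (city), summing the additive fact.
report = conn.execute(
    """
    SELECT s.city, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_store   s ON s.store_id   = f.store_id
    JOIN dim_product p ON p.product_id = f.product_id
    WHERE p.category = 'Apparel'
    GROUP BY s.city, p.category
    ORDER BY s.city
    """
).fetchall()

print(report)
```

Notice that the constraint (`category = 'Apparel'`) and the report break (`city`) both come from dimension attributes, while the number being added up comes from the fact table.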

Most database modelers prefer to create a logical model in 3rd Normal Form, but most database engines are overcome by physical limitations, so they must compromise the model. The four most difficult functions for a database to handle are:

• Join tables
• Aggregate data
• Sort data
• Scan large volumes of data

In order to get around these system limitations, vendors will suggest a model to avoid joins, use summarized data to avoid aggregation, store data in sorted order to avoid sorts, and overuse indexes to avoid large scans.

With these limitations, vendors are also going to avoid being able to compete. That is like placing a ball and chain around the runner's leg and saying, "I wish you all the best in the marathon." Come on! Whose side are these vendors really on?

Teradata is the only database engine I have seen that has the power and maturity to use a 3rd Normal Form physical model on databases exceeding a terabyte in size. Because of the physical limitations, other databases have had to use a Star-Schema model to enhance performance, but have given up on the ability to perform ad-hoc queries and data mining.

A normalized model is one that should be used for the central data warehouse. It allows users to ask any question at any time on information from any place within the enterprise. This is the central philosophy of a data warehouse. It leads to the power of ad-hoc queries and data mining, whereby advanced tools discover relationships that are not easily detected but do exist naturally in the business environment.

A Star-Schema model enhances performance on known queries because we build our assumptions into the model. While these assumptions may be correct for the first application, they may not be correct for others. Flexibility is a big issue, but data marts can be dropped and added with relative ease if each is built directly from the detail data.

Remember: build the data warehouse around detail data using a normalized model. Then, as query patterns emerge and performance for well-known queries becomes a priority, Star-Schema data marts can be created by extracting summarized or departmental data from the centralized data warehouse. The user will then have access to both the data marts for repetitive queries and the central warehouse for other queries.

Because data marts can be an administrative nightmare, Teradata enables Star-Schema access without requiring physical data marts. By setting up a join index as the intersection of your Star-Schema model, you can create a Star-Schema structure directly from your 3rd Normal Form data model. Best of all, once it is created, the data is automatically maintained as the underlying tables are updated.

Keep in mind, 80% of data warehouse queries are repetitive, but 80% of the Return On Investment (ROI) is actually provided by the other 20% of the queries that go against detailed data in an iterative environment. By using a normalized model for your central data warehouse and a Star-Schema model on data marts, you can enhance the possibility of realizing an 80% Return on Investment and still enhance the performance on 80% of your queries.

Rule 8 - Don't Let a Technical Issue Make Your Data Warehouse a Failure Statistic

Experience is a hard teacher because she gives the test first, the lesson afterwards.

Scottish Proverb

Did you know that 3/4ths of the people in the world hate fractions, and that 40% of the time a data warehouse fails it is because of a technical issue? There are many traps and pitfalls in every data warehouse venture. One winter day a hunter met a bear in the forest. The bear said, "I'm hungry. I want a full stomach." The man replied, "Well, I'm cold. I would like a fur coat." "Let's compromise," said the bear, and he quickly gobbled up the hunter. They both got what they asked for. The bear went away with a full belly, and the man left wrapped in a fur coat. With that in mind: good judgment comes from experience; experience comes from bad judgment. You have shown good judgment by reading this book, so let our experience keep your company from having a bad data warehouse experience.

Author Daniel Boorstin wrote in The Discoverers, "The greatest obstacle to discovering the shape of the earth, the continents, and the oceans was not ignorance, but rather the illusion of knowledge." There is a lot of "illusion of knowledge" being spread around in the data-warehousing environment. Before you decide on any data warehouse product, ask yourself and the vendor these questions:

• As my data demands increase, will the system be able to physically load the data? Our experience shows that many systems are not capable of handling very large volumes of data. Do the math.
• As the data grows in volume, can the system meet the performance requirements? Do the math.
• As the number of users grows, will the system be able to scale? Do the math.
• As my environment changes, will the system be flexible enough to allow changes quickly and easily? Do the math.
• Will the system need so many Database Administrators (DBAs) that my system's cost skyrockets? Do the math.
• If we suddenly merged with another company and needed to integrate with their mainframe or LAN environment, would the system be able to connect and include them? Do the math.
• Can I continue to meet my batch window timeframes? Do the math.
• Could I become the hero of the company one day, only to have some technical glitch blamed on me because of my poor foresight, and be thrown out of the company into a giant mud puddle? Do the bath.

Rule 9 - Take a Building Block Approach

Be not afraid of growing slowly; be afraid only of standing still.

Chinese Proverb

Ever since Vasco de Balboa discovered the Pacific coast of Panama in 1513, kings and businessmen alike dreamed of the impossible: to cut a waterway across the mountainous isthmus, creating a shortcut between the Atlantic and Pacific Oceans. Those dreams turned into reality during the Industrial Revolution. It took almost forty years of trial and error before the world's greatest engineering feat since the Pyramids was completed in 1914. Ships move through the locks of the canal, rising 85 feet above sea level before they descend to the opposite side. Since its grand opening in 1920, the Panama Canal has revolutionized trans-oceanic traffic, joining East and West. Its 50-mile stretch saves every vessel about 8,000 extra nautical miles of travel around the bottom tip of South America. Several modifications have been engineered through the years to accommodate the increasing size of ships.

Data warehouses, like the Panama Canal, must be built over time and changed over time to meet new demands. A data warehouse must grow with the environment, but the environment is unpredictable. All sailors know that they can't direct the wind, but that they can adjust their sails. In comparison, all data warehouse users know they can't direct the environment, but they can adjust their warehouse. Sometimes the data warehouse will grow quickly and sometimes it will grow slowly, but it should always be growing.

So take a building block approach to data warehousing. Teradata allows you to expand without boundaries, one building block at a time. Plus, adding on building blocks is easy.

There are two aspects to a building block approach. First, you need to add applications to your data warehouse in three- to six-month intervals. Once the first application works, then you are ready for more projects. As you become more experienced with this approach, you can add multiple projects in parallel by involving multiple organizations.

The second aspect of the building block approach is in the actual data warehouse architecture. It doesn't matter if yours is the smallest data warehouse in the world, the largest, or falls somewhere in between: power and scalability always fuel success.

Not long ago, a customer flew out to San Diego for a Teradata demonstration and benchmark. The benchmark ran late into the evening, but the numbers were more than 50% better than the competition. The customer was extremely impressed, but before buying, he demanded to see the system scalability that everyone had been talking about. Although it was already late, a Teradata employee was called in the middle of the night, arrived within 10 minutes (in pajamas), hooked up the building blocks, and ran a utility called config. She ran another called reconfig, and in less than two hours, the system size doubled.

As the environment changes in terms of users, data, complexity, capacity, batch windows, time changes, events, or opportunities, users should be able to continue building applications and architecture. The more a Teradata system grows, the more Teradata outshines the competition.

Rule 10 - Buy a Teradata Data Warehouse

Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing had happened.

Winston Churchill

Winston Churchill led Britain through World War II during what he called that country's "finest hour." When users see consistent data, the system, too, is in its finest hour. Teradata gives users the ability to ask questions they could never ask before. Users trust Teradata because of its industry performance and reputation, and because it never gives in. Constant use gives users optimal business experience, and no matter what a user asks, the system responds with a hearty "Yes, Sir!"

When we explained Teradata to Churchill, he said:

"A data WARe-house that consists of 250 data marts is like poison, and if I were the MIS department responsible for maintaining them, I'd take it."

Teradata guarantees an Enterprise Data Warehouse with no scalability issues. Data loads like lightning, and system administration is a breeze. You can pick the performance level that meets your requirements for today and forever. The database can be normalized around detail data, and because of Teradata's power, users have the flexibility to ask any question at any time on any data.

All other databases are suspect in data loading capabilities, scalability, reference sites, decades of data warehouse experience, flexibility, system administration difficulties, and inability to handle the complex queries of today's users. These users are good!

Teradata - The Shining Star

Overview

Teradata has always been at the top of the data warehouse game, even if the experts weren't bright enough to know it. The incredible vision that the original designers had was tremendous. It was so far to the left of genius that most thought the idea was impossible.

Only he who attempts the ridiculous may achieve the impossible.

Don Quixote

The Teradata database was originally designed in 1976, and many of the fundamental concepts still remain today. Nearly 25 years later, Teradata is still considered ahead of its time.

In 1976, IBM mainframes dominated the computer business. Everyone who was anyone had an IBM mainframe. However, the original founders of Teradata noticed that it took about 4½ years for IBM to produce a new mainframe. They also noticed a little company called Intel. Intel created a new PC chip every 2½ years. With mainframes moving forward every 4½ years and PC chips every 2½ years, Teradata recognized their vision: to network enough PC chips together that the mainframe would be overpowered, yet costs would be hundreds of times cheaper than a mainframe. The Teradata team estimated the power surge would come in 1990.

IBM laughed out loud. They said, "Let's get this straight... you are going to network a bunch of PC chips together and overpower our mainframes? That's like plowing a field with 1,000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as "just a bunch of PCs in a cabinet."

Teradata was convinced it could produce a product that would power large amounts of data and achieve the impossible: using PC technology in mainframe territory. Its founders agreed with Napoleon Bonaparte, who asserted, "The word 'impossible' is not in my dictionary." Sure enough, when we looked in his dictionary, that word was not there. And it is not in Teradata's Data Dictionary either. The Teradata team set two goals: build a database that could

• Perform parallel processing, and
• Accommodate a Terabyte of data

While Morgan was driving one evening, his eight-year-old daughter Kara piped up from the back seat: "Daddy, can you buy Teradata in the store? I mean, what does Teradata really do?" Morgan thought for a moment and then replied, "Do you remember when you went on the Easter egg hunt last spring? Well, imagine that we had fifty eggs and you were the only child there. If I asked you to find all the purple eggs, would you be able to do that?" Kara said, "Sure! But it might take me a long time." Morgan continued, "What if we now let fifty children go in and I asked them to show me all of the purple eggs? How long would that take?" His daughter responded, "It wouldn't take any time at all, because each child would only have to look at one egg." That is precisely how Teradata works. It divides up huge tasks among its processors and tackles each portion simultaneously with amazing speed. And it doesn't matter if you have a trillion eggs in your basket.

In 1984 the DBC/1012 was introduced. Since then Teradata has been the dominant force in data warehousing. Teradata got the chickens plowing and is considered outstanding. Meanwhile, IBM's plow is out rusting in its field.

Parallel Processing

An invasion of armies can be resisted but not an idea whose time has come

Victor Hugo

The idea of parallel processing gives Teradata the ability to have unlimited users, unlimited power, and unlimited scalability. This is an idea whose time has come. So what is parallel processing? Let us explain.

It was 10 p.m. on a Saturday night, and two friends were having dinner and drinks. One of the friends looked at his watch and said, "I have to get going." The other friend responded, "What's the hurry?" His friend went on to tell him that he had to leave to do his laundry at the Laundromat. The other friend could not believe his ears. He responded, "What? You're leaving to do your laundry on a Saturday night? Do it tomorrow!" His buddy went on to explain that there were only 10 washing machines at the laundry. "If I wait until tomorrow, it will be crowded and I will be lucky to get one washing machine. I have 10 loads of laundry, so I will be there all day. If I go now, there will be nobody there, and I can do all 10 loads at the same time. I'll be done in less than an hour and a half."

This story describes what we call Parallel Processing. Teradata is the only database in the world that loads data, backs up data, and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds... even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every database's clock in performance tests.

After enlightenment the laundry

Zen Proverb

After parallel processing the laundry, enlightenment

Teradata Zen Proverb

With the computer world seeing terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, this is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata has unlimited power, grows without boundaries, and was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

A ship in harbor is safe, but that's not why ships are built

John Shedd

In 1805, the pivotal Battle of Trafalgar matched Britain's flotilla of battleships against the combined fleets of France and Spain. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to sail circles around the combined fleet, attacking its ships at their most vulnerable point, the stern. That battle paralyzed the combined fleet and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating the mainframes on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive, and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed while managing terabytes of data.

A Personal Computer (PC) is made up of the following components:

Processor Chip – This is the brain of the computer. All tasks are done at the direction of the processor.

Memory – This is the hand of the computer. The memory allows data to be viewed, manipulated, changed, or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive – This is the spine of the computer. The hard drive stores data, applications, and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor applies your changes to the document while it is held in memory. Upon completion, you close the document, and the processor writes all the changes to the disk, where it is stored.

In the picture below we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight best friends.

Teradata Spreads Data over Multiple Processors

I don't mind starting the season with unknowns. I just don't like finishing the season with them

Coach Lou Holtz

With Teradata you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The Teradata example on the next page shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask the system for a list of best friends, both processors will retrieve their data in parallel and will return the combined results over the connecting network. This arrangement could easily double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four of them, and both processors read their four records simultaneously. So how could we double the speed of this system again?

Teradata has Linear Scalability

Every ceiling when reached becomes a floor upon which one walks and now can see a new ceiling

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called Linear Scalability. It allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck.

Notice in the system below that there are four processors and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since the data is spread evenly over four processors, each processor reads only its own two rows, and all four read simultaneously. Now the system is four times faster than the single-processor example.
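The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names are ours, rows are dealt out evenly, and every row takes the same time to read.

```python
def split_rows(rows, processors):
    """Deal the rows out evenly, one slice per processor."""
    return [rows[i::processors] for i in range(processors)]


def reads_per_processor(rows, processors):
    """Rows the busiest processor must read while all work at once."""
    return max(len(share) for share in split_rows(rows, processors))


best_friends = list(range(1, 9))  # eight rows, as in the Best_Friends example

for processors in (1, 2, 4):
    reads = reads_per_processor(best_friends, processors)
    print(f"{processors} processor(s): {reads} row reads each")
```

With one processor the busiest worker reads eight rows, with two it reads four, and with four it reads two, which is the doubling described above.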

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the AMPs and processed in parallel.

A Logical View of the Teradata Architecture

You are either making history or you are history

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem, and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when it actually never had a chance to succeed. In ancient days, Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies too will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and is then given a session with a Parsing Engine processor (PE). The user then submits a query using SQL.

The PE checks the SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain the information from their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.

Parsing Engine (PE)

Even a stopped clock is right twice a day

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards." Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and Southwestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions, some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions, and Teradata returns the answers.

Access Module Processor (AMP)

Wise men talk because they have something to say; fools talk because they have to say something

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above: "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of few words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best way to picture it is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it's the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk, and it shares this with no other AMP; hence, a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP reads only the rows on its own disk. Because this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.
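The "slowest AMP" rule is easy to see with a toy timing model. The sketch below is our own illustration, not Teradata code; it assumes every row takes the same time to read and that all AMPs start together.

```python
def query_time(rows_per_amp, seconds_per_row=1.0):
    """All AMPs read at once, so the AMP with the most rows sets the pace."""
    return max(rows_per_amp) * seconds_per_row


even = [2, 2, 2, 2]    # eight rows spread evenly over four AMPs
skewed = [5, 1, 1, 1]  # the same eight rows, badly distributed

print("even:  ", query_time(even))
print("skewed:", query_time(skewed))
```

Both layouts hold eight rows, yet the skewed one takes as long as its five-row AMP. That is why an even spread matters.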

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today some Teradata systems have just four AMPs, while others have more than 2,000.

The BYNET

Even if you're on the right track, you'll still get run over if you just sit there

Will Rogers

The BYNET ensures that communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax;
• The PE checks the user's security rights;
• The PE comes up with a plan for the AMPs to follow;
• The PE passes the plan along to the AMPs over the BYNET;
• The AMPs follow the plan and retrieve the data requested;
• The AMPs pass the data to the PE over the BYNET; and
• The PE then passes the final data to the user.
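The steps above can be mimicked with a small Python toy. Treat it as a sketch under loud assumptions: the class names, the rights table, and the use of a Python function as the "query" are all invented for illustration, and the AMPs run one after another here rather than in parallel over a real BYNET.

```python
class AMP:
    """One worker with its own private rows (shared-nothing)."""

    def __init__(self, tables):
        self.tables = tables  # {table name: rows on this AMP's own disk}

    def execute(self, table, keep):
        return [row for row in self.tables.get(table, []) if keep(row)]


class ParsingEngine:
    def __init__(self, amps, rights):
        self.amps = amps
        self.rights = rights  # hypothetical: {user: set of readable tables}

    def run(self, user, table, keep):
        # Step 1: a stand-in for the syntax check.
        if not callable(keep):
            raise ValueError("query is not valid")
        # Step 2: the security check.
        if table not in self.rights.get(user, set()):
            raise PermissionError(f"{user} may not read {table}")
        # Steps 3-5: the "plan" is simply the filter handed to every AMP.
        # A real system runs the AMPs in parallel; this loop is sequential.
        partial_results = [amp.execute(table, keep) for amp in self.amps]
        # Steps 6-7: gather each AMP's rows and hand them back to the user.
        return [row for part in partial_results for row in part]


amps = [
    AMP({"Best_Friends": [(2, "Ben Hon"), (10, "Don Roy")]}),
    AMP({"Best_Friends": [(4, "Joe Davis"), (12, "Sam Mills")]}),
]
pe = ParsingEngine(amps, rights={"morgan": {"Best_Friends"}})
print(pe.run("morgan", "Best_Friends", lambda row: row[0] > 3))
```

A user without rights to the table never reaches the AMPs at all; the PE rejects the request first, just as the guard dog would.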

Teradata Building Block Approach

Better a diamond with a flaw than a pebble without one

Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array, where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs

Teradata Tables

Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them

Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, Item No, Quantity, and Customer ID.

An example of each table follows.

CUSTOMER TABLE called CustomerTable

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillard's      Atlanta    106227              Smith

ORDER TABLE called OrderTable

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. Both tables have a column called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest value that can be inserted into a table. A column is the smallest value within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value hold a null, which is sometimes displayed as a question mark (?).

One column, or combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary," or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between them. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.

Here is a quick reference chart for Primary and Foreign Keys.

PRIMARY KEY                          FOREIGN KEY
Not optional                         Optional
Comprised of one or more columns     Comprised of one or more columns
Can only have one PK per table       Can have multiple FKs per table
No duplicates allowed                Duplicates allowed
No changes allowed                   Changes allowed
No nulls allowed                     Nulls allowed
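To see a PK-to-FK join in action, here is a small runnable sketch. It uses Python's built-in SQLite module as a stand-in, so it is not Teradata and says nothing about how Teradata executes a join. The column names are simplified from the sample tables, and the Order Number and Customer Rep columns of CustomerTable are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not Teradata
conn.executescript("""
    CREATE TABLE CustomerTable (
        CustomerID   INTEGER PRIMARY KEY,
        CustomerName TEXT,
        CityName     TEXT
    );
    CREATE TABLE OrderTable (
        OrderNumber INTEGER PRIMARY KEY,
        OrderDate   TEXT,
        ItemNo      INTEGER,
        Quantity    INTEGER,
        CustomerID  INTEGER REFERENCES CustomerTable (CustomerID)
    );
""")
conn.executemany(
    "INSERT INTO CustomerTable VALUES (?, ?, ?)",
    [(1001, "JC Penney", "Dallas"),
     (1002, "Office Depot", "Columbia"),
     (1003, "Dillard's", "Atlanta")],
)
conn.executemany(
    "INSERT INTO OrderTable VALUES (?, ?, ?, ?, ?)",
    [(105372, "2001-03-07", 212, 20, 1001),
     (105799, "2001-04-18", 296, 52, 1002),
     (106227, "2001-10-17", 325, 17, 1003)],
)

# Match the PK of CustomerTable to the FK in OrderTable.
joined = conn.execute("""
    SELECT c.CustomerName, o.OrderNumber, o.Quantity
    FROM CustomerTable AS c
    JOIN OrderTable AS o ON o.CustomerID = c.CustomerID
    ORDER BY o.OrderNumber
""").fetchall()

for row in joined:
    print(row)
```

Each output row combines a customer's name with that customer's order, which neither table could supply on its own.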

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way:

First, when each AMP has nearly the same quantity of table rows, no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind that the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice that each AMP holds a portion of the rows for every table. AMP1, for example, holds 1/4 of the Employee table rows, 1/4 of the Customer table rows, and 1/4 of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also, notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. Likewise, the rows of the Customer table are grouped together, as are the rows of the Order table. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single request. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.
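One way to picture this layout is as a list of AMPs, each holding its own rows grouped by table name. The Python sketch below is purely illustrative: the row values are made up, and a simple round-robin deal stands in for Teradata's real distribution step, which is based on the Primary Index.

```python
from collections import defaultdict


def distribute(tables, amp_count):
    """Return one {table name: rows} mapping per AMP."""
    amps = [defaultdict(list) for _ in range(amp_count)]
    for table, rows in tables.items():
        for position, row in enumerate(rows):
            # Round-robin stands in for the real hashing step (an assumption).
            amps[position % amp_count][table].append(row)
    return amps


tables = {
    "Employee": [f"emp-{n}" for n in range(8)],
    "Customer": [f"cust-{n}" for n in range(8)],
    "Order": [f"order-{n}" for n in range(8)],
}

amps = distribute(tables, amp_count=4)
for number, amp in enumerate(amps, start=1):
    print(f"AMP{number}:", {table: len(rows) for table, rows in amp.items()})
```

Every AMP ends up with two rows of each table, and a request for Customer rows goes straight to the "Customer" group on each AMP without touching the other tables.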

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him, saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.

There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column emp follows:

    CREATE TABLE employee
    (emp        INTEGER,
     dept       INTEGER,
     lname      CHAR(20),
     fname      VARCHAR(20),
     salary     DECIMAL(10,2),
     hire_date  DATE)
    UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice that you never see the prefix "NON"; leaving out the word UNIQUE is what makes the index non-unique.

    CREATE TABLE TomC.employee
    (emp        INTEGER,
     dept       INTEGER,
     lname      CHAR(20),
     fname      VARCHAR(20),
     salary     DECIMAL(10,2),
     hire_date  DATE)
    PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

    CREATE TABLE employee
    (emp        INTEGER,
     dept       INTEGER,
     lname      CHAR(20),
     fname      VARCHAR(20),
     salary     DECIMAL(10,2),
     hire_date  DATE)
    UNIQUE PRIMARY INDEX(emp, dept);

Being related hardly insures relatability

Michael E Angier

All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label the column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label the column(s) used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide each book's title, author, publisher, and Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number and that the AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8, and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
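Both simulated hash maps follow the same simple pattern, which a few lines of Python can reproduce. This is a sketch of the diagrams only: the function names are ours, the 48-bucket size just matches the picture (the text gives 65,536 buckets for the real map), and nothing here claims to show how Teradata fills its actual hash map.

```python
def build_hash_map(amp_count, bucket_count=48):
    """Fill the buckets with AMP numbers 1..amp_count, over and over."""
    return [(bucket % amp_count) + 1 for bucket in range(bucket_count)]


def show(hash_map, width=6):
    """Print the buckets in rows, the way the diagrams draw them."""
    for start in range(0, len(hash_map), width):
        print(" ".join(str(amp) for amp in hash_map[start:start + width]))


show(build_hash_map(4))
print()
show(build_hash_map(8))
```

Because the AMP numbers simply cycle, every AMP owns the same share of buckets whenever the bucket count is a multiple of the number of AMPs.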

How the Hash Map and Primary Index Work Together

Choice not chance determines destiny

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the Wiz-Bang Formula (a secret NCR formula). First we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table, titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we will take the Primary Index value of the row and run it through the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num  Friend_Name
2           Ben Hon

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (2) through the Coffing/Jones Wiz-Bang Formula (divide by 2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num  Friend_Name
16          Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (16) through the Coffing/Jones Wiz-Bang Formula (divide by 2), and the answer is:

16 divided by 2 = 8

Thus the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
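
To see the whole table laid out, the same walk-through can be run for every row. The following Python sketch uses the Coffing/Jones teaching formula (divide by 2) and a simulated four-AMP hash map whose buckets cycle 1, 2, 3, 4. The function names are made up for illustration, and none of this is the real (secret) Teradata hashing formula.

```python
from collections import defaultdict

NUM_AMPS = 4

best_friends = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]


def wiz_bang_bucket(friend_num):
    """Coffing/Jones teaching formula: divide the Primary Index value by 2."""
    return friend_num // 2


def amp_for_bucket(bucket):
    """Simulated four-AMP hash map: buckets 1, 2, 3, ... hold AMPs 1, 2, 3, 4, 1, ..."""
    return (bucket - 1) % NUM_AMPS + 1


# Send every row to the AMP named in its hash map bucket.
amps = defaultdict(list)
for friend_num, friend_name in best_friends:
    amp = amp_for_bucket(wiz_bang_bucket(friend_num))
    amps[amp].append((friend_num, friend_name))

for amp in sorted(amps):
    print(f"AMP {amp}: {amps[amp]}")
```

With this toy formula each of the four AMPs ends up holding two of the eight rows, which is the even spread the hash map is meant to produce.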

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply, you should understand that the real formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it is consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this value a million times, we would still get the same result.

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
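
The one-AMP retrieval described above can be sketched in Python. The dictionary amp_storage and the function names are hypothetical stand-ins for the AMPs and the PE's plan; the formula is the Coffing/Jones teaching formula (divide by 2), not the real Teradata hash.

```python
NUM_AMPS = 4

# Rows as they sit on each simulated AMP: AMP number -> {Friend_Num: Friend_Name}
amp_storage = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def find_amp(friend_num):
    """Rerun the teaching formula, then read the AMP number from the bucket."""
    bucket = friend_num // 2               # Coffing/Jones formula
    return (bucket - 1) % NUM_AMPS + 1     # simulated four-AMP hash map


def select_by_primary_index(friend_num):
    """Return the matching name and the list of AMPs that were asked to work."""
    amp = find_amp(friend_num)
    amps_touched = [amp]                   # only one AMP is ever involved
    return amp_storage[amp].get(friend_num), amps_touched


name, touched = select_by_primary_index(8)
print(name, touched)   # John Davis [4]
```

Because the same formula that placed the row is rerun to find it, no other AMP has to look at anything.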

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. On systems with hundreds or thousands of AMPs, doing so can speed up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PEP. This process is about 10 times faster than a system in which one processor must read all 100 rows. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!
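
The 100-row, 10-AMP example can be mimicked in Python. In this sketch the threads stand in for AMPs working at the same time and the gathering of results stands in for the trip over the BYNET. The row values and names are made up, and real AMPs are separate processing units rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 10
rows = list(range(1, 101))          # 100 made-up row values

# Give each simulated AMP an equal share of the rows.
amp_rows = {amp: rows[amp - 1::NUM_AMPS] for amp in range(1, NUM_AMPS + 1)}


def scan_amp(amp):
    """Each AMP reads only the rows it owns."""
    return list(amp_rows[amp])


# All AMPs scan at once; the results are then collected in one place.
with ThreadPoolExecutor(max_workers=NUM_AMPS) as pool:
    per_amp_results = list(pool.map(scan_amp, amp_rows))

answer_set = [row for result in per_amp_results for row in result]
print(len(per_amp_results[0]), len(answer_set))   # 10 100
```

No AMP ever reads more than its own 10 rows, which is why adding AMPs shortens the scan.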

Most FTS bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?", you are asking the system to read through an entire table. Full Table Scans are fundamental and an important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result.

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
6           Mary Gray
14          Kyle Marx
8           John Davis
16          Lyn Jones
2           Ben Hon
10          Don Roy
4           Joe Davis
12          Sam Mills

In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

There are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (This is the hashed version of the value)
• Primary Index Row-ID (This locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. The PE also tells each AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a unique secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (the smiley face in this example) is revealed, and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations: one AMP supplies the sub-table row and one AMP supplies the base row.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.
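
The two-AMP path of a USI lookup can be sketched in Python. Everything here is a simulation under stated assumptions: toy_hash_amp is a made-up stand-in for the hash formula plus the hash map, so it will not place 'Ben Hon' on the same AMP as the smiley-face picture, and the pointer is simplified to an (AMP, Friend_Num) pair instead of a real row-id.

```python
NUM_AMPS = 4

# Base table rows on each simulated AMP: AMP number -> {Friend_Num: Friend_Name}
base_table = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def toy_hash_amp(value):
    """Hypothetical hash: always sends the same value to the same AMP."""
    return sum(ord(ch) for ch in str(value)) % NUM_AMPS + 1


# Build one secondary index sub-table per AMP. Each entry goes to the AMP
# its Friend_Name hashes to and carries a pointer back to the base row.
usi_subtable = {amp: {} for amp in base_table}
for base_amp, rows in base_table.items():
    for friend_num, friend_name in rows.items():
        index_amp = toy_hash_amp(friend_name)
        usi_subtable[index_amp][friend_name] = (base_amp, friend_num)


def select_by_usi(friend_name):
    """Return the base row and the AMPs used (never more than two)."""
    index_amp = toy_hash_amp(friend_name)            # AMP holding the sub-table row
    pointer = usi_subtable[index_amp].get(friend_name)
    if pointer is None:
        return None, [index_amp]
    base_amp, friend_num = pointer                   # pointer reveals the base row
    return (friend_num, base_table[base_amp][friend_num]), [index_amp, base_amp]


row, amps_used = select_by_usi("Ben Hon")
print(row, amps_used)
```

A NUSI would differ in that every AMP checks its own sub-table, since several rows may share the same value.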

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. Users never query the join index directly. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.

Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan." The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space

The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called "MRKT" and to another one called "SALES". It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT or REVOKE rights from Tom.
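
The question "who is the owner of Tom?" amounts to walking up the hierarchy. Here is a small Python sketch of that walk. The parent_of dictionary is our own way of writing down the hierarchy described in the text; it is not how Teradata stores ownership.

```python
# child -> immediate parent, following the hierarchy described in the text
parent_of = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """Return every parent (owner) above the given name, nearest first."""
    result = []
    while name in parent_of:
        name = parent_of[name]
        result.append(name)
    return result


print(owners("Tom"))   # ['Morgan', 'SYSDBA', 'DBC']
```

DBC has no entry as a child, so the walk always stops there.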

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data, such as tables and their secondary index sub-tables. (The definitions of views and macros are kept in the Data Dictionary.) If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
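
The difference between giving away perm space and giving away spool space can be sketched in Python. The class name SpaceOwner and the gigabyte figures are made up for illustration. Capping a child's spool limit at the parent's limit is our reading of the speed-limit analogy, not a rule quoted from Teradata documentation.

```python
class SpaceOwner:
    """A user or database with a perm allotment and a spool limit (in GB)."""

    def __init__(self, name, perm_gb=0, spool_gb=0):
        self.name = name
        self.perm_gb = perm_gb
        self.spool_gb = spool_gb

    def create_child(self, name, perm_gb, spool_gb):
        if perm_gb > self.perm_gb:
            raise ValueError("not enough perm space to give away")
        if spool_gb > self.spool_gb:
            raise ValueError("spool limit may not exceed the parent's limit")
        self.perm_gb -= perm_gb          # perm given away is subtracted
        # spool is only a limit, so the parent's own limit is unchanged
        return SpaceOwner(name, perm_gb, spool_gb)


sysdba = SpaceOwner("SYSDBA", perm_gb=80, spool_gb=20)
morgan = sysdba.create_child("Morgan", perm_gb=10, spool_gb=20)
query_only_user = sysdba.create_child("Reporter", perm_gb=0, spool_gb=20)

print(sysdba.perm_gb, sysdba.spool_gb)   # 70 20
```

After handing out space twice, SYSDBA has less perm space but exactly the same spool limit as before.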

The following picture shows a logical view of a Customer Table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.

What is a View

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select a view of a table(s), the data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
1    10    Johnson   Manny   100000
2    20    Carlsbad  Jan     100000
22   30    Winter    Steve   77000
25   10    Lester    Bonnie  56000
33   10    Samuels   Todd    120000
99   20    Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user of the view can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
1    10    Johnson   Manny
2    20    Carlsbad  Jan
22   30    Winter    Steve
25   10    Lester    Bonnie
33   10    Samuels   Todd
99   20    Walter    Misha
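
If you do not have a Teradata system handy, the way a view hides the salary column can be tried out with Python's built-in sqlite3 module. This is a sketch on SQLite, not Teradata: the SQL dialect differs in places, and SQLite has no access rights, so only the column filtering is demonstrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no rows are copied.
conn.execute("CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee")

cursor = conn.execute("SELECT * FROM Employ_V")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)      # ['Emp', 'Dept', 'Lname', 'Fname']
print(len(rows))    # 6
```

All six employees come back, but the Sal column is nowhere to be seen.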

What is a Macro

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
SELECT * FROM Employ_v WHERE dept = 10;
SELECT * FROM Employ_v WHERE dept = 20;
SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough ... security actually checked me for weapons. When they found out that I had no weapons, they gave me some!

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it ... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupt. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the affected rows can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (a MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a Transient Journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data then reverts to its original state. When a transaction is successful, the PE and the AMPs shake hands on it, and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal cleanliness is next to godliness. If it is clean, then things went well.
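The before-image idea can be sketched in a few lines of Python. This is only a toy model, not Teradata's implementation: the `Amp` class, the `run_transaction` helper, and the use of the string "FAIL" to simulate a failing statement are all assumptions made for illustration.

```python
class Amp:
    """Toy AMP: holds rows plus a transient journal of before-images."""

    def __init__(self):
        self.rows = {}      # row id -> row value
        self.journal = []   # (row id, before-image); None means "row did not exist"

    def write(self, row_id, value):
        # Take the BEFORE picture first, then change the row.
        self.journal.append((row_id, self.rows.get(row_id)))
        if value is None:
            self.rows.pop(row_id, None)   # DELETE
        else:
            self.rows[row_id] = value     # INSERT or UPDATE

    def commit(self):
        # Success: the journal is wiped clean.
        self.journal.clear()

    def rollback(self):
        # Failure: restore before-images, newest first.
        while self.journal:
            row_id, before = self.journal.pop()
            if before is None:
                self.rows.pop(row_id, None)
            else:
                self.rows[row_id] = before


def run_transaction(amp, statements):
    """Apply all statements or none. A value of "FAIL" simulates an error."""
    try:
        for row_id, value in statements:
            if value == "FAIL":
                raise RuntimeError("statement failed")
            amp.write(row_id, value)
    except RuntimeError:
        amp.rollback()
        return False
    amp.commit()
    return True
```

Calling `run_transaction` with a list that contains a failing statement leaves the rows exactly as they were, while a fully successful list leaves the changes in place and the journal empty.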

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

If you're not TRUE to your teeth, they'll be FALSE to you.

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.

You can't step into the same river twice.

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why FALLBACK protection can be vital for mission-critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs, and the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
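The relationship between the two hash maps can be sketched as follows. This is a simplified illustration under stated assumptions: the bucket count of 16, the use of MD5 as a stand-in hashing algorithm, and the names `PRIMARY_MAP`, `FALLBACK_MAP`, and `place_row` are all hypothetical and are not Teradata's actual hashing algorithm or map layout. The only property taken from the text is that a FALLBACK row lands on a different AMP inside the same cluster as its base row.

```python
import hashlib

NUM_BUCKETS = 16                          # assumed; real hash maps are far larger
CLUSTERS = [[1, 2, 3, 4], [5, 6, 7, 8]]   # the eight-AMP, two-cluster example
ALL_AMPS = [amp for cluster in CLUSTERS for amp in cluster]


def bucket_for(primary_index_value):
    """Stand-in hashing algorithm: Primary Index value -> hash bucket."""
    digest = hashlib.md5(str(primary_index_value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


# Primary hash map: each bucket holds the AMP that owns the base row.
PRIMARY_MAP = {b: ALL_AMPS[b % len(ALL_AMPS)] for b in range(NUM_BUCKETS)}


def fallback_amp(bucket):
    """Pick a different AMP in the same cluster as the bucket's primary AMP."""
    primary = PRIMARY_MAP[bucket]
    cluster = next(c for c in CLUSTERS if primary in c)
    others = [amp for amp in cluster if amp != primary]
    return others[(bucket // len(ALL_AMPS)) % len(others)]


# Fallback hash map: each bucket holds the AMP that hosts the FALLBACK copy.
FALLBACK_MAP = {b: fallback_amp(b) for b in range(NUM_BUCKETS)}


def place_row(primary_index_value):
    """Return (base-row AMP, FALLBACK-row AMP) for a Primary Index value."""
    bucket = bucket_for(primary_index_value)
    return PRIMARY_MAP[bucket], FALLBACK_MAP[bucket]
```

Because the fallback AMP is always chosen from the primary AMP's own cluster, losing one AMP per cluster never removes both copies of a row.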

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while an AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
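The trade-off between cluster sizes comes down to simple arithmetic. The sketch below assumes the failed AMP's work is shared evenly among the survivors in its cluster; `survivor_workload` is a hypothetical helper name, and real workloads will not divide this neatly.

```python
def survivor_workload(cluster_size):
    """Relative workload per surviving AMP when one AMP in the cluster is down.

    Assumes the down AMP's work is split evenly among the remaining AMPs,
    so each survivor does cluster_size / (cluster_size - 1) times its usual work.
    """
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    return cluster_size / (cluster_size - 1)


for size in (2, 4, 16):
    print(size, round(survivor_workload(size), 3))
```

Under this assumption a two-AMP cluster doubles the survivor's work, a four-AMP cluster adds about a third, and a 16-AMP cluster adds under seven percent, at the price of a higher chance that a second AMP in the same cluster fails.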

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL: Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are tracked so that it can catch up when it returns.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ catches the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
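The catch-up behavior can be modeled in a few lines. This is a minimal sketch, assuming a single in-memory journal per down AMP; the `Cluster` class and its method names are hypothetical, and the real DARJ is kept by the surviving AMPs in the cluster rather than by one central object.

```python
class Cluster:
    """Toy cluster that keeps a Down AMP Recovery Journal for failed AMPs."""

    def __init__(self, amp_ids):
        self.up = {amp: True for amp in amp_ids}
        self.data = {amp: {} for amp in amp_ids}   # amp -> {row id: value}
        self.darj = {}                             # down amp -> missed changes

    def fail(self, amp):
        self.up[amp] = False
        self.darj[amp] = []        # the surviving AMPs open a recovery journal

    def write(self, amp, row_id, value):
        if self.up[amp]:
            self.data[amp][row_id] = value
        else:
            # The AMP is "sleeping": remember the change for later.
            self.darj[amp].append((row_id, value))

    def recover(self, amp):
        # Replay the missed changes in order, then discard the journal.
        for row_id, value in self.darj.pop(amp):
            self.data[amp][row_id] = value
        self.up[amp] = True
```

After `recover`, the returning AMP holds the same rows it would have held had it never gone down, and no journal remains for it.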

Redundant Array of Independent Disks (RAID)

I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant.

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it is still fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The downside of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.
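The mirroring behavior described above can be sketched as a pair of dictionaries standing in for disks. This is an illustration only: the `MirroredPair` class is hypothetical, and it simply reads from the first live disk rather than modeling how a real controller picks the closest read/write assembly.

```python
class MirroredPair:
    """Toy RAID-1 pair: index 0 is the primary disk, index 1 is its mirror."""

    def __init__(self):
        self.disks = [{}, {}]        # block number -> data
        self.alive = [True, True]

    def write(self, block, data):
        # Every write goes to both disks (when both are alive).
        for i in (0, 1):
            if self.alive[i]:
                self.disks[i][block] = data

    def read(self, block):
        # Either disk can serve the read; use the first one still alive.
        for i in (0, 1):
            if self.alive[i]:
                return self.disks[i][block]
        raise IOError("both disks in the pair are lost")

    def fail(self, i):
        self.alive[i] = False
        self.disks[i] = {}

    def replace(self, i):
        # Rebuild the replacement disk from its surviving twin.
        self.disks[i] = dict(self.disks[1 - i])
        self.alive[i] = True
```

Reads keep working after one disk fails, and `replace` restores an identical copy, which mirrors the rebuild step the Disk Array Controller performs.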

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. We can still keep the shared nothing environment and have four physical disks, because only the AMP that owns the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case a disk is lost. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disks via a different physical cable.
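The migration can be sketched as moving AMP numbers from one node's list to another's. This is a toy model under stated assumptions: the `migrate` function, the node names, and the round-robin placement across survivors are hypothetical, and the sketch ignores PEs and the disk cabling entirely.

```python
def migrate(clique, failed_node):
    """Return the clique layout after failed_node's AMPs move to the survivors.

    clique maps a node name to the list of AMP numbers resident in its memory.
    The input is left unchanged; a new layout is returned.
    """
    survivors = [node for node in clique if node != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    layout = {node: list(clique[node]) for node in survivors}
    # Spread the homeless AMPs across the surviving nodes in turn.
    for i, amp in enumerate(clique[failed_node]):
        layout[survivors[i % len(survivors)]].append(amp)
    return layout


two_node_clique = {"node1": list(range(1, 17)), "node2": list(range(17, 33))}
after = migrate(two_node_clique, "node1")
print(len(after["node2"]))   # the receiving node now hosts all 32 AMPs
```

In the two-node example the surviving node ends up with twice as many AMPs, which is why its performance slows down until the failed node is repaired.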

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed by an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

To explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the changes that added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, usually to restore data lost as a result of a hardware problem.
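Both journal directions can be sketched with two lists of row images. This is a minimal sketch, not Teradata's journal format: `JournaledTable`, `roll_back`, `roll_forward`, and the idea of a checkpoint as a position in the journal list are all assumptions made for illustration.

```python
class JournaledTable:
    """Toy table that records BEFORE and AFTER images of every change."""

    def __init__(self):
        self.rows = {}
        self.before = []   # (row id, image before the change); None = no row
        self.after = []    # (row id, image after the change); None = deleted

    def change(self, row_id, new_value):
        self.before.append((row_id, self.rows.get(row_id)))
        self.after.append((row_id, new_value))
        if new_value is None:
            self.rows.pop(row_id, None)
        else:
            self.rows[row_id] = new_value


def roll_back(table, checkpoint):
    """Undo every change journaled after position `checkpoint`, newest first."""
    for row_id, image in reversed(table.before[checkpoint:]):
        if image is None:
            table.rows.pop(row_id, None)
        else:
            table.rows[row_id] = image


def roll_forward(backup_rows, after_images):
    """Rebuild current data from a backup plus the AFTER images taken since."""
    rows = dict(backup_rows)
    for row_id, image in after_images:
        if image is None:
            rows.pop(row_id, None)
        else:
            rows[row_id] = image
    return rows
```

`roll_back` corresponds to the Customer Representative story, where a successful but wrong update is undone, while `roll_forward` corresponds to the restore-from-backup scenario, where the changes made since the backup are replayed on top of it.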

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
(
    emp        INTEGER,
    dept       INTEGER,
    lname      CHAR(20),
    fname      VARCHAR(20),
    salary     DECIMAL(10,2),
    hire_date  DATE    -- the FORMAT string is missing in the source text
)
UNIQUE PRIMARY INDEX (emp);

The example above creates the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

You just obey instructions; we'll take care of the obstructions.

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
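The four rules above amount to a small compatibility table. The sketch below encodes one reading of those rules; the `COMPATIBLE` dictionary and the `can_grant` helper are hypothetical names, and the sketch ignores queuing order and lock levels (database, table, row).

```python
# For each requested lock, the set of already-held locks it can coexist with,
# following the four descriptions above.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},   # blocked only by EXCLUSIVE
    "READ":      {"ACCESS", "READ"},            # many readers at once
    "WRITE":     {"ACCESS"},                    # only dirty readers may coexist
    "EXCLUSIVE": set(),                         # prevents any access, period
}


def can_grant(requested, held):
    """True if `requested` can be granted while the locks in `held` are in place."""
    return all(lock in COMPATIBLE[requested] for lock in held)


print(can_grant("READ", ["READ", "READ"]))   # a thousand readers are fine
print(can_grant("WRITE", ["READ"]))          # a writer waits behind readers
print(can_grant("ACCESS", ["WRITE"]))        # a dirty read is allowed
```

A request that cannot be granted would wait in line until the conflicting locks are released at query completion.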

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the former table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
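The two RI rules from the employee and payroll story can be sketched as a pair of checks. This is a toy illustration: the `RIError` exception, the in-memory `employees` and `payroll` structures, and the sample employee numbers are all hypothetical.

```python
class RIError(Exception):
    """Raised when a change would violate referential integrity."""


employees = {1001, 1002}   # employee numbers in the employee table
payroll = {}               # employee number -> salary; must reference employees


def insert_payroll(emp, salary):
    # A payroll row may only be inserted if its employee exists.
    if emp not in employees:
        raise RIError(f"employee {emp} does not exist")
    payroll[emp] = salary


def delete_employee(emp):
    # An employee may only be deleted once no payroll row refers to it.
    if emp in payroll:
        raise RIError(f"employee {emp} is still on the payroll")
    employees.discard(emp)
```

With these checks in place, the fired employee cannot be removed from the employee table while a payroll row still points at them, so the oversight in the story is caught at delete time.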

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on: ASCII stupid question and get a stupid ANSI.

Seriously, though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? Through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to proceed: where the flat file is located, what its file definition is, and what Teradata table the data should be loaded into.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
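The three steps just described (hand out blocks, hash and redistribute, sort) can be sketched sequentially in Python. This is a rough model under stated assumptions: the `fastload` and `row_hash` names are hypothetical, MD5 stands in for the real hashing algorithm, a small fixed row count stands in for a 64K block, and the first column of each row is assumed to be the Primary Index. The real utility does this work in parallel.

```python
import hashlib


def row_hash(primary_index_value):
    """Stand-in row hash (not Teradata's hashing algorithm)."""
    digest = hashlib.md5(str(primary_index_value).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def fastload(rows, num_amps, block_size=3):
    """Return {amp: [(row hash, row), ...]} after the three load steps."""
    # Step 1: deal blocks of rows out to the AMPs, regardless of ownership.
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    received = {amp: [] for amp in range(num_amps)}
    for i, block in enumerate(blocks):
        received[i % num_amps].extend(block)

    # Step 2: each AMP hashes the rows it received and sends each row
    # to the AMP that owns it.
    owned = {amp: [] for amp in range(num_amps)}
    for amp_rows in received.values():
        for row in amp_rows:
            h = row_hash(row[0])            # row[0] is the Primary Index value
            owned[h % num_amps].append((h, row))

    # Step 3: each AMP sorts its own rows by row hash.
    for amp_rows in owned.values():
        amp_rows.sort(key=lambda pair: pair[0])
    return owned
```

The point of the first step is speed: rows get onto the system as fast as possible, and only afterwards do the AMPs sort out among themselves where each row belongs.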

Fastload Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind that many systems have hundreds to thousands of AMPs. The load takes place continually, in parallel, as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

You don't drown by falling into the water; you drown by staying in the water.

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day and can be modified at any time.
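The faucet idea, a load rate that changes with the time of day, can be sketched as a simple lookup. Everything here is hypothetical: the `load_rate` function, the schedule format, and the rates themselves are made up for illustration and do not reflect Tpump's actual parameters or syntax.

```python
def load_rate(hour, schedule, default_rate):
    """Return the allowed load rate (rows per minute) for an hour of the day.

    schedule is a list of (start_hour, end_hour, rate) windows; the first
    window containing `hour` wins, otherwise default_rate applies.
    """
    for start, end, rate in schedule:
        if start <= hour < end:
            return rate
    return default_rate


# Hypothetical settings: trickle during business hours, full throttle overnight.
SCHEDULE = [
    (8, 18, 100),       # data warehouse rush hour
    (0, 6, 100_000),    # off-peak hours
]

print(load_rate(9, SCHEDULE, default_rate=5_000))
print(load_rate(2, SCHEDULE, default_rate=5_000))
```

A loader would consult such a table before sending each batch, which is one simple way to turn the faucet up or down without stopping the load.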

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion: A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance - only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

send a person to the moon, or that someone could run a mile in under four minutes. Ingenuity and the desire to improve are attributes of the human race, and both are found in numerous avenues, from sports to business.

"Expect the unexpected, or you won't find it."

Roger von Oech

When Frank Lloyd Wright began to design the Imperial Hotel in Tokyo, he discovered the unexpected: just eight feet below the surface of the ground lay a sixty-foot bed of soft mud. Since Japan is a land of frequent shakes and tremors, Wright was faced with what appeared to be an insurmountable obstacle. This gave him an idea: why not float the Imperial Hotel building on the bed of mud and let it absorb the shock of any quake? Critics and cynics alike laughed at such an impossible idea. Frank Lloyd Wright built the hotel anyway. Shortly after the grand opening of the hotel, Japan suffered its worst earthquake in fifty-two years. All around Tokyo buildings were destroyed, but the Imperial Hotel stood firm.

For a long time, the mainframe and OLTP industry laughed at those who recommended the data warehouse design principles set forth in this book. But those companies that build one based upon these rules will join the ranks of the elite. Consider this: ten of the top 13 global communications companies use Teradata, nine of the top 16 global retailers use Teradata, and eight of the top 20 global banks use Teradata.

The ability to continually improve is one of Teradata's greatest strengths. The database was designed in 1976 and has continually improved ever since. Teradata has averaged one data warehouse installation per week for the past decade. Through continual improvement based on customer feedback from many of the largest data warehouse sites, Teradata has been able to establish itself as the data warehouse of choice for award-winning data warehouses.

This book begins with the 10 cardinal rules to follow for data warehouse success. It illustrates how Teradata helps customers follow these rules. Then it explains the brilliance of how Teradata works. By the end, the reader will have a real grasp of essential Teradata concepts.

Rule 1 - Start Building Towards A Central Data Warehouse

Moments after midnight on July 30, 1945, the Navy cruiser USS Indianapolis suffered a fatal torpedo hit from a Japanese submarine. It had been traveling unescorted through the Philippine Sea. Within 12 minutes of the deadly hit, the ship sank. Over 300 men were killed and nearly 900 were stranded in shark-infested seas. Tragically, those who survived until daylight faced four torturous days in the water and battled continuous shark attacks before being stumbled upon by a passing ship. In the end, only 316 souls survived. With a crew of 1,199 people, this was one of the worst military disasters of World War II for the United States.

Most people assume that war is cruel, but the heart-wrenching story above becomes even more tragic when the following facts are revealed. First, the ship's captain did not have all of the facts, and second, the Navy did not provide the captain with a single version of the truth. The captain's request for a destroyer escort was denied even though the regional Naval command knew another ship had been attacked just two days earlier, plus multiple enemy sightings had occurred within the previous five days. Not only were these crucially relevant facts withheld, but the captain of the Indianapolis was also told that his passage route was clear and there would be no need for a destroyer escort.

"To withhold news is to play God."

John Hess

Had everyone involved with the USS Indianapolis adhered to a single version of the truth, with detail data to back it up, this disaster might never have occurred. Likewise, if your company doesn't maintain detail data in a Centralized Data Warehouse, you will never know which version of the truth to believe. Each division of a business will have its own view of the truth. Summarized data, such as a data mart, does have its place in knowledge management, but it should always be built from the detail data within the central data warehouse.

Most companies don't have a Central Data Warehouse. Why? Because they don't have proper leadership or direction. Company leaders often let different branches of the company create data marts that are effective short-term solutions. These solutions are based on departmental leadership that is most interested in short-term solutions. Such leaders don't plan on being with a particular department forever, so they are only interested in keeping things simple, controlled, and beneficial to them.

"We're all in this alone."

Lily Tomlin

For example, imagine a company that made cars on an assembly line. Instead of using a giant plant with the latest and greatest technology, the company builds cars in 300 small garages. Each garage is owned by a different department and has different needs. In addition, every user has his or her access restricted to his or her own garage. With this structure, leaders feel safe, but building cars is a logistical nightmare. In fact, just moving cars from one garage to the next would be a joke. This scenario may seem simple-minded, but that is how most data warehouses are built. Each part of some data warehouses operates alone.

Now imagine a giant car assembly plant where the assembly line was managed by the idea of "There is no 'I' in Team." This plant would continually improve processes, finding better ways to work together. Everyone has an idea what the others are doing, and new ideas are welcome. Management is able to run the entire plant with one team of dedicated professionals, and decisions are made cooperatively, concisely, and clearly.

This style of management is the idea behind a central data warehouse. From the top layer of management down through the entire company, they are one solid team. A team experienced in data warehousing saves valuable money and resources, plus users can manage the entire data warehouse. Executives may ask any question targeted to any part of the business. Decisions are made with long-term vision, and every employee is confident that when they need answers, the data warehouse will provide them.

"If I have seen further, it is by standing on the shoulders of giants."

Isaac Newton

When asked how he had discovered the Law of Gravity, Isaac Newton did not grab all of the glory for himself. He claimed that his work stood on the foundation of those early scientists who had gone before him. Likewise, a central data warehouse allows users to stand on the shoulders of another giant. This giant, built right, allows major corporations to make decisions and act on those decisions quickly.

In 1993, I was asked to train one of the world's largest retailers on its Teradata data warehouse. I flew to Bentonville, Arkansas, and an employee met me at the airport, then escorted me to the classroom. As we walked down the hallways, most employees seemed to be at a pace I had never seen before. They were practically running. I asked, "What's up? Why is everyone hurrying?" The employee replied, "It's work time." I was shocked. In all of the places I had previously worked, we strolled. This place had a leadership that I've never encountered... anywhere. H. Ross Perot described this kind of team when he said, "When building a team, I first look for people who love to win; if I can't find any of those, then I look for people who hate to lose." This was a concise team of employees so motivated and so empowered that they thought they could take over the world.

As I grew to know the team, I asked them how long it took top management to make a decision. And how long did it take to implement that decision at thousands of stores nationwide? They simply said, "About two hours." I was amazed. Today this team continues to have one of the single greatest data warehouses ever built. They use it extensively, and it grows stronger every day.

While I was visiting with this team, management decided at one point that stores across the country should place Halloween displays and candy near the cash registers. In less than two hours, stores moved their Halloween candy from the normal candy aisles to end-caps near the cash register. Every store participated but one.

When asked why he didn't participate, the store manager said he had simply run out of time to create the displays and move the Halloween candy from his normal candy aisle to the end-caps. Management was ticked. Telling the manager they would get back to him, they then asked the DBA to query the data warehouse to see how much this snafu had cost the company. The DBA came back and reported that the store actually sold almost the same amount of Halloween candy as forecasted. Management was surprised and, honestly, a little disappointed with the answer. But then the DBA added somewhat sheepishly, "I found something else, too." "Go ahead," replied members of the management team. He said, "I found out they actually sold about 40% more normal candy than we forecasted for this holiday." Management got on the phone immediately and told the other thousand stores, "Move those goblins and Halloween candy back to the normal candy aisles!"

What that DBA did was use his instinct and the data warehouse to find out exactly what was going on with the business at that time. He was armed with a system that had cross-functional analysis. A central data warehouse gives good management great confidence because they see the whole picture. When users can ask any question, at any time, on any data, their knowledge is unlimited.

Most Teradata Central Data Warehouse sites will tell you most of their Return On Investment (ROI) came from areas they never suspected. Thomas Jefferson once said, "We don't know one millionth of a percent about anything." When we explained Teradata to Jefferson, he did not build another Monticello, but he did retract his statement. Companies with a centralized data warehouse know about a million percent more than companies that have invested in stovepipe applications and 300 different data marts.

Actually, any company planning on competing in this millennium must think long-term and begin building a centralized data warehouse. If not, that company will be on the short end of the stick when competing with a company that chose to build one. That thought should sound scarier than a goblin near the cash registers on Halloween.

If you think about it, every major decision in business makes someone happy. If you are armed with facts supported by a central data warehouse and you do your homework, your business decisions will make your shareholders happy. However, if you are making decisions with a data mart strategy, those decisions are more likely to make your competitors happy.

There are many companies that are fearful of such an undertaking. They want a central data warehouse but wonder: What if it fails? Which database should we choose? What type of hardware do we need? Should we do an RFP? Decisions, decisions. It would literally take me about 30 seconds to make a decision on Teradata. There would be no RFP. We used to wade in swimming pools of data; today we are swamped in oceans of data. Teradata is built for this type of environment. This book explains the fundamentals of Teradata. Anyone with any experience or knowledge about data warehouse environments will clearly see why Teradata is the best solution.

Rule 2 - Build for the User

"A learned person is not one who gives the right answers; it is the one who asks the right questions."

Claude Levi-Strauss

The user is the heart of the data warehouse, and users get better with each day of experience. The user makes decisions that affect the company's bottom line. That's why the data warehouse is built around the business user. Building a data warehouse is simple: find out what data the business users need and what type of queries they want to ask but are not able to ask today. Then find out if the data is available and if the queries can be answered. With those answers, you will exceed users' expectations.

An experienced data warehouse user is usually shocked when he or she first uses Teradata. Its sheer power and flexibility enable users to ask questions they have never been able to ask before. On a recent consulting trip of mine, a young DBA got antsy when a particular query took more than a minute or so with Teradata. So I asked, "Well, how long did that same query take with your OLTP-based data warehouse?" He retorted, "We couldn't even run this query on the old system." I said, "So what's wrong with two minutes?" He added, "You know, some of our business users are so used to how long our queries used to run that they will be sitting staring at the screen without realizing that Teradata has already brought back the answer." With Teradata, users can expand their thinking by using intuition and keen business sense, without technology barriers.

The building of an enterprise data warehouse begins with top management, but then cascades down to a relationship between the IT department and the business user community.

The IT department must realize they have a supporting role. That role is to please the business user by making data available so the business user can easily ask questions and get answers. It's also the IT department's role to build a system that allows users to ask questions on their own, without IT intervention. Forget about building a system where users ask IT to run the queries for them. When users need information, the IT department should eventually be able to say, "Ask the question yourself... it is all available to you."

The business users are actually the stars; however, the entire business community must take responsibility for the warehouse's success. These users must continually educate themselves and other users on the capabilities of the data warehouse, new tools, and new techniques that will enhance its potential. Those same users must help IT help them. If both understand their respective roles and work together to help the company, then the data warehouse will be a huge success.

Rule 3 - Let the IT Department Lead the Way to User Utopia

Few sports challenges are as grueling or demanding as the Tour de France. But victory at this event eluded Lance Armstrong, a powerful young cyclist from Austin, Texas. Lance excelled in individual competition, even winning the World Championships. But despite his hard work, Lance could not overcome the Europeans' strong and proud tradition at the Tour de France. A few years ago, Lance was thrown into the battle of his life, not against others but against himself. He discovered that he had cancer and was given virtually no chance of surviving. Suddenly he found out how little cycling really meant in life. With all his might, Lance battled his way back to health, beating the odds. Now he found out how very much cycling could mean in life. His bicycle became a tool to reclaim the future. He found a spot as a team member for the US Postal Service team. With a new perspective and a new depth of character, Lance led that team to victory in the next Tour de France. And he repeated this victory again for the next two years.

To win the premier event in the cycling world, Lance Armstrong had to totally rethink his role. In the same way, the key members of any company seeking success with its data warehouse must rethink their roles. The IT department plays a key role in a data warehouse. What do users know about technical issues? Not enough to build a data warehouse. So technical issues are the responsibility of the IT department. The danger with this train of thought is that while the IT department has years of experience with handling company transactions through production databases and applications, most are new at data warehousing. A data-warehousing environment can be extremely different from anything an IT department has ever built or used before. Therefore, it's a bad idea to build a data warehouse without the help of experienced people.

An OLTP environment gets more and more predictable each month. It is designed to be tweaked and tuned in order to maximize a company's environment. On the other hand, a data warehouse is an unpredictable environment where the only way to gain control is to actually give up control. In data warehousing, users must be allowed the freedom to ask the questions, and they will blossom in an environment where flexibility is accepted and welcomed.

"The only sure weapon against bad ideas is better ideas."

A. Whitney Griswold

If the IT department decides to build hundreds of data marts that will please each and every department, then they are missing the boat. Data warehouse experience is a hard teacher because it gives the test first and the lesson afterwards. Abraham Lincoln once said, "A house divided cannot stand." With that in mind, build the data warehouse so it will stand strong for a long time.

What's the formula? First and foremost, start by building your data warehouse around detail data. Bring transaction data, along with key details from the OLTP systems, into the data warehouse. Then, as known queries are identified, build data marts to enhance their performance, and insist that data marts are created and maintained directly from the detail data. Doing so will build a foundation that will stand.

Next, the IT department needs to keep an open mind about creating an environment called User Utopia. Have you ever been there? In User Utopia, the user confidently asks queries without fear of being charged by the minute. The user has meta-data, so he or she becomes intimate with the data, then makes informed decisions. The user should also be able to ask monster queries with the full backing of IT. Recently, on one such query, the IT department wanted to pull the plug. But the DBA held out, granting the user more time. When the query finished running, the information it brought back from the detail data saved the company millions of dollars. Overall, a user will get the majority of his or her answers back quickly from data marts, but he or she also needs the capability of going back to the detail data for more information. This is User Utopia.

Here is the message for IT: Don't follow the idea that "if you build it, they will come." Instead, become a leader... go to the users and build it together.

Rule 4 - Build the Foundation Around Detail Data

Business is always trying to predict the unpredictable. The US Air Force Reserve's 53rd Weather Reconnaissance Squadron is a special force that flies its planes directly into tropical storms and hurricanes. Using a WC-130 Hercules aircraft, they fly into storms at low altitudes, between 1,000 and 10,000 feet, taking weather readings that are relayed to the National Hurricane Center in Florida. They measure wind speeds, measure the pressure and structure of the storm, and, most importantly, locate the eye of the storm. The data collected by these "Hurricane Hunters" is used to determine when and where a storm might hit the coast and how strong it will be at that time. Teradata has no fear of detail data; its virtual processors will fly right into the thick of your data warehouse to bring back valuable information for decision support. You see, Teradata enables you to understand the storms in your business today while helping you predict when and where the next storm will hit tomorrow.

I estimate that 80% of today's data warehouses are built on summary (summarized) data. Therefore, 80% of all data warehouses will never come close to realizing their full potential. Your data warehouse does not have to be one of them.

"A bird does not sing because it has the answers; it sings because it has a song."

A data warehouse built on detail data does not sing because it has a song; it sings because it has the answers. When you capture detail data, answers to an infinite number of questions are available. But if this is truly the case, then why doesn't everybody build around detail data? Well, there are two reasons. One is price. Like a bird, many companies decide to go "cheap, cheap." But watch out! The real expense is not the cost of the data warehouse; it is the money that you will not make without one. The second reason is power. Many companies don't have the wingspan to fly through the detail, so they soar with the summary. In addition, some companies don't want to pay for the disk space it actually takes to keep detail data, but believe me, that cost is a small price to pay for success.

"Once you miss the first buttonhole, it becomes difficult to button your shirt."

Many companies use the same database for their data warehouse as they use for their OLTP system. This is a critical mistake. In essence, they have missed the first buttonhole and most likely will lose their shirt on their data warehouse adventure.

At this point, companies no longer have a choice of using detail data. They must summarize for performance reasons. As one Marine jokingly told his boot camp recruits, "The beatings will continue until morale improves." Similarly, a database designed for OLTP takes a continual beating when it tries to query large amounts of detail data.

Companies building true data warehouses don't compromise on price and will have a data warehouse that is built for decision support, not one that specializes in OLTP. With this decision, you have buttoned the first buttonhole and are well on your way to reaching the top.

Detail data is the foundation that data warehouses are built upon. Users can ask any question at any time; conduct data mining, OLAP, ROLAP, SQL, and SPL functions; build data marts directly from the detail data; and easily maintain and grow the environment on a daily basis. Now that's a tune well worth singing. Make a note of it.

Rule 5 - Build Data Marts from the Detail

"You cannot teach a man anything; you can only help him find it within himself."

Galileo

Galileo was a smart man. How did he know so much about life and data marts? When we explained data marts to Galileo, he said, "You cannot build a data mart directly from the OLTP systems; you can only build a data mart directly from the detail within." He was right.

Many companies build data mart after data mart directly from the OLTP systems, and their universe begins to revolve around continual maintenance. Then, as things get worse, as Galileo predicted, their universe begins to revolve around the son: the son of a gun sent in to replace them.

Why does this happen? At first things work out great, but soon there are more and more requests for additional information. As a result, more and more data marts are created, and soon the system looks like a giant spider web. Different data marts start to yield different results on like data, and the actual maintenance of this complicated spider web takes up most of IT's time. Meanwhile, short-term dreams turn into long-term nightmares like this one. A man and his wife had had a big argument just before he went on a business trip. Feeling rather contrite about his harsh words, he arranged to send his wife some flowers and asked the florist to write on the card, "I'm sorry, I love you." The beautiful bouquet arrived at the door. But then his wife read the words the florist had actually written in haste: "I'm sorry I love you."

The top reasons to build data marts directly from detail data are:

• Users can get answers from the data mart, but must be able to validate their findings or check out additional information from the detail that built it
• There is only one consistent version of the truth
• Maintenance is easy

If a user comes up with a data mart answer that does not make sense, then he or she has the ability to drill down into the detail and investigate. Sometimes summary data can spark interest, and finding out the "why" can result in big bucks.

If users don't trust the data, they won't use the system. When a data warehouse is built on a foundation of detail data and data marts are then erected from that foundation, you have a winning combination. The results will always be consistent and trustworthy. However, you should only build data marts when there is a credible business case, and you should be ready to drop them when they are no longer needed. The life span of a data mart is relatively short compared to that of its mother and father (better known as the detail data). If you build data marts from the detail, they are easy to manage, easy to drop, and easy to change.
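
The idea of a data mart that is built from the detail, and that can be dropped and rebuilt at will, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard-library sqlite3 module as a stand-in database (not Teradata), and the table names, column names, and sample rows are invented for the example.

```python
import sqlite3

def build_sales_mart(conn):
    """Drop and rebuild the summary table (the data mart) from the detail rows."""
    conn.execute("DROP TABLE IF EXISTS mart_daily_sales")
    conn.execute(
        """
        CREATE TABLE mart_daily_sales AS
        SELECT store_id, sale_date, SUM(amount) AS total_amount
        FROM sales_detail
        GROUP BY store_id, sale_date
        """
    )

# Hypothetical detail table: one row per item sold (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_detail (store_id INTEGER, sale_date TEXT, item TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_detail VALUES (?, ?, ?, ?)",
    [
        (1, "2001-10-30", "candy", 4.0),
        (1, "2001-10-30", "mask", 6.0),
        (2, "2001-10-30", "candy", 3.0),
    ],
)

# The mart is derived entirely from the detail, so it is cheap to rebuild.
build_sales_mart(conn)
mart_rows = conn.execute(
    "SELECT store_id, total_amount FROM mart_daily_sales ORDER BY store_id"
).fetchall()

# When a summary figure looks odd, drill down to the detail that produced it.
detail_rows = conn.execute(
    "SELECT item, amount FROM sales_detail WHERE store_id = 1 ORDER BY item"
).fetchall()

print(mart_rows)
print(detail_rows)
```

Because the summary table holds nothing that cannot be regenerated from the detail table, dropping it loses no information, which is the property the text relies on.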

Rule 6 - Make Scalability Your Best Friend

"Plan your life for a million tomorrows, and live your life as if tomorrow may be your last."

Morgan Jones

The roar of class-6 rapids on a river in Suriname can be almost deafening against the dense walls of the jungle, especially when you are 9 years old. Our mission was to lower our canoe down the waterfall with ropes. The Trio Amer-Indian who anchored our 40-foot dugout canoe let go of the anchor rope too quickly. Without warning, the heavy boat began a freefall through the rocky water with my father hanging onto the side for dear life. He disappeared under the rocky waters, and I knew for sure we had lost him. My heart pounded against my chest. As I rallied myself to grasp this loss as only a nine-year-old can, the Indians abruptly began cheering wildly above the roar of the river. My dad had resurfaced a hundred yards downstream, battered and bruised, but he was alive. In just one short minute, I determined that I would love my family every day as if there were no tomorrow.

As I made my family my best friend, a data warehouse must make scalability its best friend. A data warehouse that does not scale will have no tomorrow. It is only a matter of time until the warehouse disappears in rocky waters, never to come up for air. Don't let go of the anchor rope.

The data-warehousing environment will throw obstacles in your way every single day. A data warehouse must be planned to meet today's needs. But it must also be capable of meeting tomorrow's challenges. The future cannot be predicted, so plan for unlimited growth, or linear scalability, both vertical and horizontal. There are so many data warehouses that start out with sizzling performance, but as they grow they eventually and inevitably hit the scalability wall. However, before they hit the wall, there is a pattern of diminishing performance.

A data warehouse designed without scalability in mind is doomed before it is begun. It can never reach its potential. Take the scalability question out of the equation by investing in a database that allows you to start small but grows linearly.

In today's fast-paced world, Gigabytes soon become Terabytes. It may not sound like much, but it weighs a ton on the shoulders of giants. Listen to these measurements and pick your data warehouse's life span. For example, if you lived for a million seconds (Megabyte), then you would live for about 11.6 days. In comparison, if you lived for a billion seconds (Gigabyte), then you would live for about 31.7 years. Plus, if you lived for a trillion seconds (Terabyte), then you would live for 31,688 years.

How nice it would be if, on your 31,688th birthday, people said, "You sure look good for your age."
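
The life-span figures are easy to check. The short Python sketch below redoes the arithmetic; the only assumption is the use of an average year of 365.25 days, which is a convention chosen for this example rather than something stated in the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # assumption: average year length, leap years included

def seconds_to_days(seconds):
    """Convert a count of seconds to days."""
    return seconds / SECONDS_PER_DAY

def seconds_to_years(seconds):
    """Convert a count of seconds to years of 365.25 days."""
    return seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)

million_days = seconds_to_days(10**6)      # "Megabyte" life span
billion_years = seconds_to_years(10**9)    # "Gigabyte" life span
trillion_years = seconds_to_years(10**12)  # "Terabyte" life span

print(f"{million_days:.1f} days")       # 11.6 days
print(f"{billion_years:.1f} years")     # 31.7 years
print(f"{trillion_years:,.0f} years")   # 31,688 years
```

Each step from Megabyte to Gigabyte to Terabyte multiplies the life span by a thousand, which is why the jump from days to millennia happens in only two steps.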

Data warehouses hit the wall of scalability because they cannot grow at the same rate as the amount of data being gathered. Teradata allows for unlimited linear scalability. Linear Scalability is a building-block approach to data warehousing that ensures that as building blocks are added, the system continues at the same performance level.

This is why the largest data warehouses in the world use Teradata. I was lucky to be in the right place at the right time and taught beginning stages at what are considered the two largest data warehouse sites in the world: Southwestern Bell (SBC) and Wal-Mart.

Wal-Mart's data warehouse started with less than 30 gigabytes, and SBC started with less than 200 gigabytes and 100 users. Both warehouses:

• Started small and simple
• Used Teradata from the beginning
• Have built the largest Enterprise Data Warehouse in their respective industries
• Continue to realize additional Return On Investment (ROI) on an annual basis
• Have grown to more than 10 Terabytes of data and are still growing
• Have thousands of users (some estimates are shocking)
• Have educated and experienced data warehouse staffs
• Have educated and experienced data warehouse users
• Experience continual growth without boundaries
• Have experienced linear performance by Teradata in every single upgrade (from gigabytes to terabytes and from terabytes to tens of terabytes)
• Both companies are impressed with Teradata's power and performance
• And both SBC and Wal-Mart are committed to the excellence of Teradata

A data warehouse is built in small building blocks. Linear Scalability is described in three ways:

First, building blocks are added until the performance requirements of your environment are met (Guaranteed Success).

Second, every time the data doubles, building blocks are doubled and the system maintains its performance level (Guaranteed Success).

Third, any time the environment changes, building blocks are added until performance requirements are met (Guaranteed Success).

Scalability is not just about growing the data volume. It also means growing or increasing the number of users. Many systems work flawlessly until as few as 5 users are added; then they slow down to a crawl. Companies need a system where growth and performance are easily calculated and implemented. That means the number of users, the size and complexity of queries, the volume of data, and the number of applications being used can be calculated and compared to the current system's actual size. If more power, speed, or size is needed, then the company can simply add building blocks to the system until the requirements are met.
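
Under the linear-scalability assumption described above, sizing becomes simple arithmetic. The Python sketch below is a toy model only: the function name and the per-block capacities are hypothetical values invented for illustration, and real sizing also depends on query complexity and the mix of applications.

```python
import math

def blocks_needed(data_gb, users, gb_per_block, users_per_block):
    """Return how many building blocks cover both the data and the user load.

    Assumes capacity grows linearly with the number of blocks (the premise
    of linear scalability); the per-block capacities are made-up inputs.
    """
    for_data = math.ceil(data_gb / gb_per_block)
    for_users = math.ceil(users / users_per_block)
    # The larger requirement wins; a system always has at least one block.
    return max(for_data, for_users, 1)

# Hypothetical starting point: 200 GB of data and 100 users.
current = blocks_needed(data_gb=200, users=100, gb_per_block=100, users_per_block=50)

# When the data doubles, the block count doubles to hold performance steady.
doubled = blocks_needed(data_gb=400, users=100, gb_per_block=100, users_per_block=50)

print(current, doubled)
```

The point of the model is that growth planning reduces to adding blocks until every requirement is covered, rather than redesigning the system.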

Rule 7 - Model the Data Correctly

"You will find only what you bring in."

Yoda, Jedi Master in Star Wars

We model a database for the same reasons that Boeing builds an aircraft model to test flight characteristics in a wind tunnel. It's simpler and cheaper to model than to reconstruct the plane by iterations until you get it right.

A proper data model should be designed to reflect the business components and possible relationships.

Here are three rules for modeling data in a data warehouse:

1. Model the data quickly
2. Normalize the detail data
3. Use a dimensional model for data marts

The 3rd Normal Form holds that each column in a table should be directly related to the primary key, the whole key, and nothing but the key. Data is placed into tables where it makes the most sense and has no repeating groups, derived data, or optional columns. This allows users to ask any question at any time on all data within the enterprise. Users do not have to strive for 3rd Normal Form, but just normalize the data the best they can. There will be fewer columns in a table, but a lot more tables overall. This model is easier to maintain, incredibly flexible, and allows a user to ask any question on any data at any time.
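
A small Python sketch can make the normalization step concrete. It is a simplified illustration, not a full 3rd Normal Form design: the sample rows, field names, and the normalize function are all invented for the example. It removes repeated customer data and a stored derived value, two of the things the paragraph above says normalized tables should not contain.

```python
# Hypothetical flat rows: customer details repeat on every row, and
# line_total is derived data (quantity * unit_price) stored redundantly.
flat_orders = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Acme", "item": "widget",
     "quantity": 2, "unit_price": 5.0, "line_total": 10.0},
    {"order_id": 2, "customer_id": 10, "customer_name": "Acme", "item": "gadget",
     "quantity": 1, "unit_price": 7.5, "line_total": 7.5},
]

def normalize(rows):
    """Split flat rows into a customer table and an order table keyed by ID."""
    customers = {}
    orders = {}
    for row in rows:
        # Customer attributes depend only on customer_id, so store them once.
        customers[row["customer_id"]] = {"customer_name": row["customer_name"]}
        # Order attributes depend on order_id; the derived line_total is dropped.
        orders[row["order_id"]] = {
            "customer_id": row["customer_id"],
            "item": row["item"],
            "quantity": row["quantity"],
            "unit_price": row["unit_price"],
        }
    return customers, orders

customers, orders = normalize(flat_orders)

# Derived values are computed when asked for instead of being stored.
order_totals = {oid: o["quantity"] * o["unit_price"] for oid, o in orders.items()}

print(customers)
print(order_totals)
```

The result has fewer columns per table and more tables overall, matching the trade-off described in the text.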

A Star-Schema model is comprised of a fact table and a number of dimension tables. The fact table is a table with a multi-part key. Each element of the key is itself a foreign key to a single dimension table. The remaining fields in the fact table are known as facts and are numeric, continuously valued, and additive. Facts can be thought of as measurements taken at the intersection of all of the dimensions. Dimension attributes are mostly textual and are almost always the source of constraints and report breaks. This model enhances performance on known queries, or in other words, queries users run repeatedly, day after day.
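
The structure just described can be sketched as a tiny star schema. The example below uses Python's standard-library sqlite3 module as a stand-in database (not Teradata), and every table name, column name, and data value is an assumption made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: mostly textual attributes used to constrain and group.
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT);

    -- Fact table: a multi-part key whose parts are foreign keys to the
    -- dimensions, plus numeric, additive facts.
    CREATE TABLE fact_sales (
        store_id INTEGER REFERENCES dim_store(store_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        units_sold INTEGER,
        revenue REAL,
        PRIMARY KEY (store_id, date_id)
    );

    INSERT INTO dim_store VALUES (1, 'Dayton'), (2, 'Austin');
    INSERT INTO dim_date VALUES (1, '2001-10'), (2, '2001-11');
    INSERT INTO fact_sales VALUES (1, 1, 5, 50.0), (1, 2, 3, 30.0), (2, 1, 4, 40.0);
    """
)

# A typical "known query": add up a fact, grouped by a dimension attribute.
revenue_by_city = conn.execute(
    """
    SELECT s.city, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.city
    ORDER BY s.city
    """
).fetchall()

print(revenue_by_city)
```

The query shape is fixed in advance by the choice of dimensions, which is why this layout suits repeated queries better than open-ended ones.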

Most database modelers prefer to create a logical model in 3rd Normal Form, but most database engines are overcome by physical limitations, so they must compromise the model. The four most difficult functions for a database to handle are:

• Join tables
• Aggregate data
• Sort data
• Scan large volumes of data

In order to get around these system limitations, vendors will suggest a model to avoid joins, use summarized data to avoid aggregation, store data in sorted order to avoid sorts, and overuse indexes to avoid large scans.

With these limitations, vendors are also going to avoid being able to compete. That is like placing a ball and chain around the runner's leg and saying, "I wish you all the best in the marathon." Come on! Whose side are these vendors really on?

Teradata is the only database engine I have seen that has the power and maturity to use a 3rd Normal Form physical model on databases exceeding a terabyte in size. Because of their physical limitations, other databases have had to use a Star-Schema model to enhance performance, but have given up the ability to perform ad-hoc queries and data mining.


A normalized model is one that should be used for the central data warehouse. It allows users to ask any question at any time on information from any place within the enterprise. This is the central philosophy of a data warehouse. It leads to the power of ad-hoc queries and data mining, whereby advanced tools discover relationships that are not easily detected but do exist naturally in the business environment.

A Star-Schema model enhances performance on known queries because we build our assumptions into the model. While these assumptions may be correct for the first application, they may not be correct for others. Flexibility is a big issue, but data marts can be dropped and added with relative ease if each is built directly from the detail data.

Remember: build the data warehouse around detail data using a normalized model. Then, as query patterns emerge and performance for well-known queries becomes a priority, Star-Schema data marts can be created by extracting summarized or departmental data from the centralized data warehouse. The user will then have access to both the data marts for repetitive queries and the central warehouse for other queries.

Because data marts can be an administrative nightmare, Teradata enables Star-Schema access without requiring physical data marts. By setting up a join index as the intersection of your Star-Schema model, you can create a Star-Schema structure directly from your 3rd Normal Form data model. Best of all, once it is created, the data is automatically maintained as the underlying tables are updated.

Keep in mind that 80% of data warehouse queries are repetitive, but 80% of the Return On Investment (ROI) is actually provided by the other 20% of the queries, which go against detailed data in an iterative environment. By using a normalized model for your central data warehouse and a Star-Schema model on data marts, you can enhance the possibility of realizing an 80% Return on Investment and still enhance the performance on 80% of your queries.

Rule 8 - Don't Let a Technical Issue Make Your Data Warehouse a Failure Statistic

"Experience is a hard teacher because she gives the test first, the lesson afterwards."

Scottish Proverb

Did you know that 3/4ths of the people in the world hate fractions, and that 40% of the time a data warehouse fails, it is because of a technical issue? There are many traps and pitfalls in every data warehouse venture. One winter day a hunter met a bear in the forest. The bear said, "I'm hungry. I want a full stomach." The man replied, "Well, I'm cold. I would like a fur coat." "Let's compromise," said the bear, and he quickly gobbled up the hunter. They both got what they asked for. The bear went away with a full belly, and the man left wrapped in a fur coat. With that in mind, good judgment comes from experience; experience comes from bad judgment. You have shown good judgment by reading this book, so let our experience keep your company from having a bad data warehouse experience.

Author Daniel Boorstin wrote in The Discoverers, "The greatest obstacle to discovering the shape of the earth, the continents, and the oceans was not ignorance, but rather the illusion of knowledge." There is a lot of "illusion of knowledge" being spread around in the data-warehousing environment. Before you decide on any data warehouse product, ask yourself and the vendor these questions:

• As my data demands increase, will the system be able to physically load the data? Our experience shows that many systems are not capable of handling very large volumes of data. Do the math.

• As the data grows in volume, can the system meet the performance requirements? Do the math.

• As the number of users grows, will the system be able to scale? Do the math.

• As my environment changes, will the system be flexible enough to allow changes quickly and easily? Do the math.

• Will the system need so many Database Administrators (DBAs) that my system's cost skyrockets? Do the math.

• If we suddenly merged with another company and needed to incorporate into their mainframe or LAN environment, would the system be able to connect and include them? Do the math.

• Can I continue to meet my batch window timeframes? Do the math.

• Could I become the hero of the company one day, only to have some technical glitch blamed on me because of my poor foresight, and be thrown out of the company into a giant mud puddle? Do the bath.

Rule 9 - Take a Building Block Approach

"Be not afraid of growing slowly; be afraid only of standing still."

Chinese Proverb

Ever since Vasco de Balboa discovered the Pacific coast of Panama in 1513, kings and businessmen alike dreamed of the impossible: to cut a waterway across the mountainous isthmus, creating a shortcut between the Atlantic and Pacific Oceans. Those dreams turned into reality during the Industrial Revolution. It took almost forty years of trial and error before the world's greatest engineering feat since the Pyramids was completed in 1914. Ships move through the locks of the canal, rising 85 feet above sea level before they descend to the opposite side. Since its grand opening in 1920, the Panama Canal has revolutionized trans-oceanic traffic, joining East and West. Its 50-mile stretch saves every vessel about 8000 extra nautical miles of travel around the bottom tip of South America. Several modifications have been engineered through the years to accommodate the increasing size of ships.

Data warehouses, like the Panama Canal, must be built over time and changed over time to meet new demands. A data warehouse must grow with the environment, but the environment is unpredictable. All sailors know that they can't direct the wind, but that they can adjust their sails. In comparison, all data warehouse users know they can't direct the environment, but they can adjust their warehouse. Sometimes the data warehouse will grow quickly and sometimes it will grow slowly, but it should always be growing.

So take a building block approach to data warehousing. Teradata allows you to expand without boundaries, one building block at a time. Plus, adding on building blocks is easy.

There are two aspects to a building block approach. First, you need to add applications to your data warehouse in three to six month intervals. Once the first application works, you are ready for more projects. As you become more experienced with this approach, you can add multiple projects in parallel by involving multiple organizations.

The second aspect of the building block approach is in the actual data warehouse architecture. It doesn't matter if yours is the smallest data warehouse in the world, the largest, or falls somewhere in between: power and scalability always fuel success.

Not long ago a customer flew out to San Diego for a Teradata demonstration and benchmark. The benchmark ran late into the evening, but the numbers were more than 50% better than the competition. The customer was extremely impressed, but before buying he demanded to see the system scalability that everyone had been talking about. Although it was already late, a Teradata employee was called in the middle of the night, arrived within 10 minutes (in pajamas), hooked up the building blocks, and ran a utility called config. She ran another called reconfig, and in less than two hours the system size doubled.


As the environment changes in terms of users, data, complexity, capacity, batch windows, time changes, events, or opportunities, users should be able to continue building applications and architecture. The more a Teradata system grows, the more Teradata outshines the competition.

Rule 10 - Buy a Teradata Data Warehouse

"Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing had happened."

Winston Churchill

Winston Churchill led Britain through World War II, during what he called that country's "finest hour." When users see consistent data, the system too is in its finest hour. Teradata gives users the ability to ask questions they could never ask before. Users trust Teradata because of its industry performance and reputation, and because it never gives in. Constant use gives users optimal business experience, and no matter what a user asks, the system responds with a hearty "Yes, Sir!"

When we explained Teradata to Churchill, he said:

"A data WARe-house that consists of 250 Data marts is like poison, and if I were the MIS department responsible for maintaining them, I'd take it."

Teradata guarantees an Enterprise Data Warehouse with no scalability issues. Data loads like lightning, and system administration is a breeze. You can pick the performance level that meets your requirements for today and forever. The database can be normalized around detail data, and because of Teradata's power, users have the flexibility to ask any question at any time on any data.

All other databases are suspect in data loading capabilities, scalability, reference sites, decades of data warehouse experience, flexibility, system administration difficulties, and inability to handle the complex queries of today's users. These users are good!

Teradata - The Shining Star

Overview

Teradata has always been at the top of the data warehouse game, even if the experts weren't bright enough to know it. The vision of the original designers was tremendous. It was so far to the left of genius that most thought the idea was impossible.

"Only he who attempts the ridiculous may achieve the impossible."

Don Quixote

The Teradata database was originally designed in 1976, and many of the fundamental concepts still remain today. Nearly 25 years later, Teradata is still considered ahead of its time.

In 1976, IBM mainframes dominated the computer business. Everyone who was anyone had an IBM mainframe. However, the original founders of Teradata noticed that it took about 4½ years for IBM to produce a new mainframe. They also noticed a little company called Intel. Intel created a new PC chip every 2½ years. With mainframes moving forward every 4½ years and PC chips every 2½ years, Teradata recognized their vision: to network enough PC chips together that the mainframe would be overpowered, yet costs would be hundreds of times cheaper than a mainframe. The Teradata team estimated the power surge would come in 1990.

IBM laughed out loud. They said, "Let's get this straight... you are going to network a bunch of PC chips together and overpower our mainframes? That's like plowing a field with 1000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as just a bunch of PCs in a cabinet.

Teradata was convinced it could produce a product that would power large amounts of data and achieve the impossible: using PC technology in mainframe territory. Its founders agreed with Napoleon Bonaparte, who asserted, "The word 'impossible' is not in my dictionary." Sure enough, when we looked in his dictionary, that word was not there. And it is not in Teradata's Data Dictionary either. The Teradata team set two goals: build a database that could

• Perform parallel processing, and
• Accommodate a Terabyte of data.

Driving in the car one evening, Morgan's eight-year-old daughter Kara piped up from the back seat, "Daddy, can you buy Teradata in the store? I mean, what does Teradata really do?" Morgan thought for a moment and then replied, "Do you remember when you went on the Easter egg hunt last spring? Well, imagine that we had fifty eggs and you were the only child there. If I asked you to find all the purple eggs, would you be able to do that?" Kara said, "Sure! But it might take me a long time." Morgan continued, "What if we now let fifty children go in and I asked them to show me all of the purple eggs? How long would that take?" His daughter responded, "It wouldn't take any time at all, because each child would only have to look at one egg." That is precisely how Teradata works. It divides up huge tasks among its processors and tackles each portion simultaneously, with amazing speed. And it doesn't matter if you have a trillion eggs in your basket.

In 1984, the DBC/1012 was introduced. Since then, Teradata has been the dominant force in data warehousing. Teradata got the chickens plowing and is considered outstanding. Meanwhile, IBM's plow is out rusting in its field.

Parallel Processing

"An invasion of armies can be resisted, but not an idea whose time has come."

Victor Hugo

Teradata has the ability to support unlimited users, unlimited power, and unlimited scalability. This is an idea whose time has come, and it all starts with something called parallel processing. So what is parallel processing? Let us explain.

It was 10 p.m. on a Saturday night, and two friends were having dinner and drinks. One of the friends looked at his watch and said, "I have to get going." The other friend responded, "What's the hurry?" His friend went on to tell him that he had to leave to do his laundry at the Laundromat. The other friend could not believe his ears. He responded, "What? You're leaving to do your laundry on a Saturday night? Do it tomorrow!" His buddy went on to explain that there were only 10 washing machines at the laundry. "If I wait until tomorrow, it will be crowded and I will be lucky to get one washing machine. I have 10 loads of laundry, so I will be there all day. If I go now, there will be nobody there and I can do all 10 loads at the same time. I'll be done in less than an hour and a half."


This story describes what we call "Parallel Processing." Teradata is the only database in the world that loads data, backs up data, and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds... even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every database's clock in performance tests.
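The arithmetic behind the laundry story can be written down as a small Python sketch. The function name and the figure of 80 minutes per load are assumptions made for illustration; the story only says that all ten loads finish in under an hour and a half when ten machines are free.

```python
import math

def elapsed_minutes(loads, machines, minutes_per_load):
    """Wall-clock time when equal-sized loads share a pool of machines.

    Each machine handles one load at a time, so the loads are worked
    through in rounds of `machines` loads each.
    """
    if loads < 0 or machines < 1:
        raise ValueError("need loads >= 0 and machines >= 1")
    rounds = math.ceil(loads / machines)
    return rounds * minutes_per_load

if __name__ == "__main__":
    # Hypothetical figure: 80 minutes to wash and dry one load.
    print("One machine: ", elapsed_minutes(10, 1, 80), "minutes")
    print("Ten machines:", elapsed_minutes(10, 10, 80), "minutes")
```

With one machine the ten loads run back to back; with ten machines they all run in a single round, which is the whole point of doing the work in parallel.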

"After enlightenment, the laundry."

Zen Proverb

"After parallel processing the laundry, enlightenment!"

Teradata Zen Proverb

With the computer world seeing Terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, this is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata has unlimited power, grows without boundaries, and was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

"A ship in harbor is safe, but that's not why ships are built."

John Shedd

In 1805 the pivotal Battle of Trafalgar matched Britain's flotilla of battleships against the almighty Spanish Armada. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to sail circles around the Armada, attacking them at their most vulnerable point: the stern. That battle paralyzed the Armada and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating them on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive, and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed managing terabytes of data.

A Personal Computer (PC) is made up of the following components:


Processor Chip – This is the brain of the computer. All tasks are done at the direction of the processor.

Memory – This is the hand of the computer. The memory allows data to be viewed, manipulated, changed, or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive – This is the spine of the computer. The hard drive stores data, applications, and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor applies your changes to the document while it is held in memory. Upon completion, you close the document and the processor writes all the changes to the disk where it is stored.

In the picture below we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight best friends.

Teradata Spreads Data over Multiple Processors

"I don't mind starting the season with unknowns. I just don't like finishing the season with them."

Coach Lou Holtz

With Teradata you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The next Teradata example shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask the system for a list of best friends, both processors retrieve their data in parallel and return the combined results over the connecting network. This example could easily run at double the speed of the previous one.

Even though we still need to read eight records, each processor is only responsible for reading four of them, and the other processor reads the remaining four at the same time. So how could we double the speed of this system again?

Teradata has Linear Scalability

"Every ceiling, when reached, becomes a floor upon which one walks and now can see a new ceiling."

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." This allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck of data.

Notice in the system below there are four processors, and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since data is spread evenly over four processors, each processor reads its two rows at the same time as the others. Now the system is four times faster than the single-processor example.
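The eight-row example can be sketched in Python to show how the work per processor shrinks as processors are added. The function names and the placeholder friend names are hypothetical, and dealing the rows out round-robin is simply an assumption for the sketch; this section does not say how Teradata chooses where each row goes.

```python
def spread_rows(rows, processor_count):
    """Deal rows out evenly, one processor at a time (round-robin)."""
    if processor_count < 1:
        raise ValueError("need at least one processor")
    slices = [[] for _ in range(processor_count)]
    for index, row in enumerate(rows):
        slices[index % processor_count].append(row)
    return slices

def rows_read_per_processor(rows, processor_count):
    """Rows the busiest processor must read; this bounds the elapsed time."""
    return max(len(s) for s in spread_rows(rows, processor_count))

# Eight placeholder rows standing in for the Best_Friends table.
BEST_FRIENDS = [f"friend_{n}" for n in range(1, 9)]

if __name__ == "__main__":
    for count in (1, 2, 4):
        reads = rows_read_per_processor(BEST_FRIENDS, count)
        print(f"{count} processor(s): {reads} rows each")
```

Doubling the processor count halves the rows each one reads, which is the sense in which the text calls the scaling linear.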

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between these processors, which Teradata calls Access Module Processors (AMPs), and processed in parallel.


A Logical View of the Teradata Architecture

"You are either making history or you are history."

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when they actually never had a chance to succeed. In ancient days Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies too will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and then is given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain information on their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.


Parsing Engine (PE)

"Even a stopped clock is right twice a day."

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes, it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards." Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and Southwestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions – some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions, and Teradata returns the answers.

Access Module Processor (AMP)

"Wise men talk because they have something to say; fools talk because they have to say something."

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above, "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of few words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee: it never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best example is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it's the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk and shares it with no other AMP; hence, a Shared-Nothing architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP will read only the rows on its own disk. Because this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.
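The "slowest AMP" remark can be made concrete with a short Python sketch. Everything here is a hypothetical stand-in: each inner list plays the role of one AMP's private disk, a thread pool plays the role of the AMPs working at once, and the timing model is simply rows read multiplied by an assumed cost per row.

```python
from concurrent.futures import ThreadPoolExecutor

def amp_scan(own_rows, predicate):
    """One AMP reads only the rows on its own disk (shared nothing)."""
    return [row for row in own_rows if predicate(row)]

def parallel_query(amp_disks, predicate):
    """Run the same scan on every AMP at once and combine the answers."""
    with ThreadPoolExecutor(max_workers=len(amp_disks)) as pool:
        partials = list(pool.map(lambda rows: amp_scan(rows, predicate), amp_disks))
    return [row for part in partials for row in part]

def query_time(amp_disks, seconds_per_row):
    """Elapsed time is set by the AMP that holds the most rows."""
    return max(len(rows) for rows in amp_disks) * seconds_per_row

if __name__ == "__main__":
    even = [[1, 2], [3, 4], [5, 6], [7, 8]]      # rows spread evenly
    skewed = [[1, 2, 3, 4, 5], [6], [7], [8]]    # one overloaded AMP
    print(parallel_query(even, lambda row: row % 2 == 0))
    print("even:  ", query_time(even, 1.0), "seconds")
    print("skewed:", query_time(skewed, 1.0), "seconds")
```

Both layouts hold the same eight rows, but the skewed one takes longer because three AMPs sit idle while the fourth finishes, which is why the text stresses spreading rows evenly.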

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today some Teradata systems have just four AMPs, while others have more than 2000.

The BYNET

"Even if you're on the right track, you'll still get run over if you just sit there."

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax;
• The PE checks the user's security rights;
• The PE comes up with a plan for the AMPs to follow;
• The PE passes the plan along to the AMPs over the BYNET;
• The AMPs follow the plan and retrieve the data requested;
• The AMPs pass the data to the PE over the BYNET; and
• The PE then passes the final data to the user.
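The steps above can be sketched as a toy Python model. It is not how Teradata is implemented: the function names, the QueryRejected exception, the one-statement "syntax check," and the sample users and rows are all hypothetical, and the BYNET is reduced to ordinary function calls.

```python
class QueryRejected(Exception):
    """Raised when the PE refuses a request."""

def amp_execute(disk, plan):
    """An AMP follows the plan and reads rows from its own disk only."""
    return list(disk.get(plan["table"], []))

def parsing_engine(user, sql, access_rights, amp_disks):
    # Step 1: syntax check (toy version: only SELECT * FROM <table>).
    words = sql.strip().rstrip(";").split()
    if len(words) != 4 or [w.upper() for w in words[:3]] != ["SELECT", "*", "FROM"]:
        raise QueryRejected("syntax error")
    table = words[3]
    # Step 2: security check against the user's access rights.
    if table not in access_rights.get(user, set()):
        raise QueryRejected("access denied")
    # Step 3: build a plan for the AMPs to follow.
    plan = {"table": table}
    # Steps 4-6: send the plan to every AMP and collect their rows
    # (these calls stand in for traffic over the BYNET).
    partials = [amp_execute(disk, plan) for disk in amp_disks]
    # Step 7: hand the combined rows back to the user.
    return [row for part in partials for row in part]

if __name__ == "__main__":
    disks = [{"Best_Friends": ["Ann", "Bob"]}, {"Best_Friends": ["Cy", "Di"]}]
    rights = {"kara": {"Best_Friends"}}
    print(parsing_engine("kara", "SELECT * FROM Best_Friends;", rights, disks))
```

Putting the syntax and security checks first means a bad or unauthorized request is turned away before any AMP does any work.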

Teradata Building Block Approach

"Better a diamond with a flaw than a pebble without one."

Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

"Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them."

Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, Item No, Quantity, and Customer ID.

An example of each table follows:

CUSTOMER TABLE called CustomerTable

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillards       Atlanta    106227              Smith

ORDER TABLE called OrderTable

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest value that can be inserted into a table. A column is the smallest value within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null, or are sometimes represented by a "?".

One column, or a combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary," or an unordered set. Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.

Here is a quick reference chart for Primary and Foreign Keys.

PRIMARY KEY                        FOREIGN KEY
Not optional                       Optional
Comprised of one or more columns   Comprised of one or more columns
Can only have one PK per table     Can have multiple FKs per table
No duplicates allowed              Duplicates allowed
No changes allowed                 Changes allowed
No nulls allowed                   Nulls allowed
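
The PK/FK join described above can be tried out in a small sketch. It uses Python's built-in sqlite3 module as a stand-in database, not Teradata, and the column names without spaces (OrderNumber, ItemNo) and the ISO date strings are assumptions made for the illustration. Only the Customer ID link from OrderTable back to CustomerTable is modeled.

```python
import sqlite3

# In-memory SQLite stands in for the database here; this is not Teradata.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerTable (
        CustomerID   INTEGER PRIMARY KEY,   -- PK: unique, never null
        CustomerName TEXT,
        CityName     TEXT
    );
    CREATE TABLE OrderTable (
        OrderNumber  INTEGER PRIMARY KEY,   -- PK of the order table
        OrderDate    TEXT,
        ItemNo       INTEGER,
        Quantity     INTEGER,
        CustomerID   INTEGER REFERENCES CustomerTable(CustomerID)  -- FK
    );
    INSERT INTO CustomerTable VALUES
        (1001, 'JC Penney', 'Dallas'),
        (1002, 'Office Depot', 'Columbia'),
        (1003, 'Dillards', 'Atlanta');
    INSERT INTO OrderTable VALUES
        (105372, '2001-03-07', 212, 20, 1001),
        (105799, '2001-04-18', 296, 52, 1002),
        (106227, '2001-10-17', 325, 17, 1003);
""")

# Join the two tables by matching the PK of one to the FK of the other.
rows = conn.execute("""
    SELECT c.CustomerName, o.OrderNumber, o.Quantity
    FROM CustomerTable AS c
    JOIN OrderTable AS o ON o.CustomerID = c.CustomerID
    ORDER BY o.OrderNumber
""").fetchall()

for row in rows:
    print(row)
```

Each printed row combines a customer name from one table with order details from the other, which is exactly what the matching key makes possible.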

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind, the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP 1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at nearly the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the rows of the Customer and Order tables are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.


There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column emp follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
UNIQUE PRIMARY INDEX (emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON:

CREATE TABLE TomC.employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
PRIMARY INDEX (dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
UNIQUE PRIMARY INDEX (emp, dept);

Being related hardly insures relatability

Michael E Angier


All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
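
The two diagrams can be reproduced with a few lines of Python. This is a minimal sketch of the simulated maps only: it assumes AMP numbers are assigned to buckets in simple rotation, as the diagrams show, and makes no claim about how a real Teradata system fills its hash map. The function name build_hash_map is made up for the illustration.

```python
NUM_BUCKETS = 65536  # bucket count stated in the text

def build_hash_map(num_amps, num_buckets=NUM_BUCKETS):
    """Return a list where entry i holds the AMP number for bucket i + 1.

    Assumption: AMP numbers simply rotate 1..num_amps, as in the diagrams.
    """
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]

four_amp_map = build_hash_map(4)
eight_amp_map = build_hash_map(8)

# The first twelve buckets match the first two rows of each diagram.
print(four_amp_map[:12])
print(eight_amp_map[:12])

# Every AMP appears in the same number of buckets, which is the point
# of the map: spread the data as equally as possible.
print(four_amp_map.count(1), four_amp_map.count(4))
```

Because 65,536 divides evenly by four and by eight, each AMP in these simulated maps is named in exactly the same number of buckets.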

How the Hash Map and Primary Index Work Together

Choice not chance determines destiny

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the Wiz-Bang Formula (a secret NCR formula). First we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we will get the Primary Index value of the row, apply the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num   Friend_Name
2            Ben Hon

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num   Friend_Name
16           Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (16), and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
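
The same walk-through can be run for all eight rows. The sketch below applies the teaching formula from the text (divide the Primary Index value by 2) to a 48-bucket, four-AMP map like the one pictured. The function names wiz_bang_bucket and amp_for are invented for this illustration, and nothing here reflects the real, secret Teradata hashing formula.

```python
from collections import defaultdict

NUM_AMPS = 4
# Simulated hash map: 48 buckets (8 rows of 6, as in the diagram).
# Bucket n, counted from 1, holds AMP number ((n - 1) % 4) + 1.
HASH_MAP = [(i % NUM_AMPS) + 1 for i in range(48)]

def wiz_bang_bucket(primary_index_value):
    """Teaching formula from the text: divide the Primary Index value by 2."""
    return primary_index_value // 2

def amp_for(primary_index_value):
    """Look inside the bucket to find the destination AMP."""
    bucket = wiz_bang_bucket(primary_index_value)
    return HASH_MAP[bucket - 1]  # buckets are numbered from 1

best_friends = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]

# Send every row to the AMP named in its hash map bucket.
amps = defaultdict(list)
for friend_num, friend_name in best_friends:
    amps[amp_for(friend_num)].append((friend_num, friend_name))

for amp in sorted(amps):
    print(f"AMP {amp}: {amps[amp]}")
```

With this toy formula each of the four AMPs ends up holding two of the eight rows, and rows 2 and 16 land on AMP 1 and AMP 4, matching the two worked examples.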

Remember, the Teradata hashing formula is a secret. The Coffing/Jones Wiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply, you should understand that the formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula), and it will be consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.

If you always do what you always did, you'll always get what you always got.

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns titled Friend_Num and Friend_Name returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used and merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
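
The one-AMP retrieval can be sketched the same way. The AMP_STORAGE dictionary below is a hypothetical stand-in for the rows each AMP would hold after distribution with the divide-by-2 teaching formula, and select_by_primary_index is a name invented for the illustration.

```python
NUM_AMPS = 4
# Simulated four-AMP hash map; bucket n (from 1) holds AMP ((n - 1) % 4) + 1.
HASH_MAP = [(i % NUM_AMPS) + 1 for i in range(48)]

# Rows as they would sit on each AMP after distribution by Friend_Num / 2.
AMP_STORAGE = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}

def select_by_primary_index(friend_num):
    """Rerun the formula, read the bucket, and ask only that one AMP."""
    bucket = friend_num // 2          # teaching formula, not the real hash
    amp = HASH_MAP[bucket - 1]        # buckets are numbered from 1
    name = AMP_STORAGE[amp].get(friend_num)
    return amp, name

amp, name = select_by_primary_index(8)
print(f"AMP {amp} returned: 8 {name}")
```

No other AMP's storage is consulted, which is why a Primary Index lookup is described as a one-AMP operation.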

The Full Table Scan

What matters is not the size of the dog in the fight, but the size of the fight in the dog.

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's abilities to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is about 10 times faster than having a single processor read all 100 rows. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" you are asking the system to read through an entire table. Full Table Scans are fundamental and an important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num   Friend_Name
6            Mary Gray
14           Kyle Marx
8            John Davis
16           Lyn Jones
2            Ben Hon
10           Don Roy
4            Joe Davis
12           Sam Mills
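
A Full Table Scan can be imitated with a thread pool standing in for the AMPs. This is only a sketch of the idea that each AMP reads its own rows and hands them to the PE: the AMP_STORAGE layout comes from the divide-by-2 teaching formula, the function names are made up, and Python threads are not a model of real AMP parallelism or of the BYNET.

```python
from concurrent.futures import ThreadPoolExecutor

# Rows held by each of the four simulated AMPs.
AMP_STORAGE = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}

def scan_amp(amp_number):
    """Each AMP reads only the rows it owns."""
    return list(AMP_STORAGE[amp_number].items())

def full_table_scan():
    """The 'PE' asks every AMP for its rows and gathers the answer set."""
    with ThreadPoolExecutor(max_workers=len(AMP_STORAGE)) as pool:
        per_amp_results = pool.map(scan_amp, AMP_STORAGE)
        return [row for amp_rows in per_amp_results for row in amp_rows]

rows = full_table_scan()
print(len(rows), "rows returned")
for friend_num, friend_name in rows:
    print(friend_num, friend_name)
```

Each simulated AMP touches only two rows, yet all eight come back. The order in which rows appear depends on how the per-AMP answers are gathered, which is why the result listed above is not sorted by Friend_Num.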

In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. The PE tells the AMP to place the hash in a secondary index sub-table along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of the AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking WHERE Friend_Name = 'Ben Hon'. The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (the smiley face in this example) is revealed, and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of ongoing queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!
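
The two-AMP nature of a USI lookup can be sketched as follows. Everything here is a simplification made for illustration: a CRC32 checksum stands in for the secret hash formula, the sub-table pointer is just the Friend_Num rather than a real Row-ID, and the names amp_for_value, usi_subtable, and select_by_usi are invented. Because the stand-in hash differs from Teradata's, the AMP numbers it produces will not match the smiley-face picture.

```python
import zlib

NUM_AMPS = 4

def amp_for_value(value):
    """Stand-in hash (an assumption): CRC32 of the value, folded onto the AMPs."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_AMPS + 1

BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}

# Base rows live on the AMP chosen by hashing the Primary Index value.
base_table = {amp: {} for amp in range(1, NUM_AMPS + 1)}
for friend_num, friend_name in BEST_FRIENDS.items():
    base_table[amp_for_value(friend_num)][friend_num] = friend_name

# USI sub-table rows live on the AMP chosen by hashing the indexed value,
# and each one carries a pointer back to its base row.
usi_subtable = {amp: {} for amp in range(1, NUM_AMPS + 1)}
for friend_num, friend_name in BEST_FRIENDS.items():
    usi_subtable[amp_for_value(friend_name)][friend_name] = friend_num

def select_by_usi(friend_name):
    """Step 1: read the sub-table row. Step 2: follow its pointer to the base row."""
    amps_touched = []
    index_amp = amp_for_value(friend_name)
    amps_touched.append(index_amp)
    friend_num = usi_subtable[index_amp].get(friend_name)
    if friend_num is None:
        return None, amps_touched
    base_amp = amp_for_value(friend_num)
    amps_touched.append(base_amp)
    return (friend_num, base_table[base_amp][friend_num]), amps_touched

row, amps_touched = select_by_usi("Ben Hon")
print(row, "found by visiting AMPs", amps_touched)
```

However many AMPs the system has, the lookup visits one AMP for the sub-table row and one for the base row, never the whole system.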

Join Indexes

A bend in the road is not the end of the road unless you fail to make the turn

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.


Teradata Databases, Users and Space

Overview

Choose a job you like, and you will never have to work a day of your life.

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The name is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than DBC.

DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan." The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result, many companies create a new user called SYSDBA. This user owns about 80% of the space, while DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to SYSDBA to distribute space.

Logical Picture of System Space


SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from Tom.
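
The ownership question can be answered mechanically by walking up the hierarchy. The sketch below encodes the users and databases named in the text as a simple child-to-parent dictionary; the PARENT structure and the owners function are assumptions made for the illustration, not a Teradata facility.

```python
# Each user or database points to its immediate parent; DBC sits at the top.
PARENT = {
    "DBC": None,
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}

def owners(name):
    """Return every parent (owner) above the given user or database."""
    chain = []
    parent = PARENT[name]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain

print(owners("Tom"))
print(owners("MRKT"))
```

For Tom the walk visits Morgan, then SYSDBA, then DBC, which is the answer given above.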


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space,
• Spool Space, and
• Temp Space.

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data, such as the rows of tables. (View and macro definitions are kept in the Data Dictionary.) If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
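The difference between giving away perm and giving away spool can be sketched as simple bookkeeping. The SpaceOwner class, its method names, and the numbers are invented for illustration. Capping a child's spool at the parent's own limit is an assumption taken from the speed-limit analogy, not a statement of how Teradata enforces it.

```python
# Hypothetical bookkeeping for perm and spool limits (illustration only).
class SpaceOwner:
    def __init__(self, name, perm=0, spool=0):
        self.name = name
        self.perm = perm    # perm space still owned by this object
        self.spool = spool  # upper limit for building answer sets

    def create_child(self, name, perm, spool):
        """Create a user or database below this one."""
        if perm > self.perm:
            raise ValueError("not enough perm space to give away")
        if spool > self.spool:
            # Assumption: a child's spool limit cannot exceed the parent's.
            raise ValueError("spool limit is higher than the parent's limit")
        self.perm -= perm                      # perm given away is subtracted
        return SpaceOwner(name, perm, spool)   # spool is a limit, not subtracted


sysdba = SpaceOwner("SYSDBA", perm=100, spool=50)
morgan = sysdba.create_child("Morgan", perm=40, spool=50)
query_only = sysdba.create_child("QueryOnly", perm=0, spool=50)

print(sysdba.perm, sysdba.spool)  # 60 50
```

After creating two children, SYSDBA owns less perm but still has its full spool limit, and the query-only user holds spool without any perm.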

The following picture shows a logical view of a Customer table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the view's data is not stored separately on the disks, so it does not duplicate data or take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname    Sal
1     10     Johnson    Manny    100000
2     20     Carlsbad   Jan      100000
22    30     Winter     Steve    77000
25    10     Lester     Bonnie   56000
33    10     Samuels    Todd     120000
99    20     Walter     Misha    104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha
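If you would like to try the view idea without a Teradata system handy, the following sketch uses Python's built-in sqlite3 module as a stand-in. SQLite is not Teradata: it has no GRANT or REVOKE, so only the column-hiding part of the example is shown. The table and view names are the ones used in the text.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Teradata (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; it selects every column except Sal.
conn.execute(
    "CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

cursor = conn.execute("SELECT * FROM Employ_V")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)    # ['Emp', 'Dept', 'Lname', 'Fname']
print(len(view_rows))  # 6
```

The salary column never appears in the answer set because the view definition does not select it.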

What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
    SELECT * FROM Employ_v WHERE dept = 10;
    SELECT * FROM Employ_v WHERE dept = 20;
    SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
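SQLite has no macros, but the idea of a named group of statements run as one unit can be imitated in Python. In this sketch the MACROS dictionary and the execute_macro function are hypothetical stand-ins, and the salary column is left out of the sample table to keep it short. It is a rough picture of the concept, not how Teradata stores or runs a macro.

```python
import sqlite3

# Minimal stand-in for the Employee table and Employ_v view from the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny"),
        (2, 20, "Carlsbad", "Jan"),
        (22, 30, "Winter", "Steve"),
        (25, 10, "Lester", "Bonnie"),
        (33, 10, "Samuels", "Todd"),
        (99, 20, "Walter", "Misha"),
    ],
)
conn.execute("CREATE VIEW Employ_v AS SELECT Emp, Dept, Lname, Fname FROM Employee")

# Hypothetical "dictionary" of macros: a name mapped to its SQL statements.
MACROS = {
    "Emp_mac": [
        "SELECT * FROM Employ_v WHERE dept = 10",
        "SELECT * FROM Employ_v WHERE dept = 20",
        "SELECT * FROM Employ_v ORDER BY lname",
    ]
}


def execute_macro(connection, name):
    """Run every statement of the named macro inside one transaction."""
    answers = []
    with connection:  # commits if all statements succeed, rolls back on error
        for statement in MACROS[name]:
            answers.append(connection.execute(statement).fetchall())
    return answers


dept_10, dept_20, by_lname = execute_macro(conn, "Emp_mac")
print(len(dept_10), len(dept_20), len(by_lname))  # 3 2 6
```

One call returns all three reports, which is the convenience a macro offers: the user remembers one name instead of three statements.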

Here is a handy reference chart that compares views with macros:

Views                                           Macros
• We select from views                          • We execute macros
• Uses the keyword AS                           • Uses the keyword AS
• Definition is stored in the Data Dictionary   • Definition is stored in the Data Dictionary
• Accesses certain portions of the data         • Accesses the real data itself
• Is changed using the keyword REPLACE          • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough … security actually checked me for weapons. When they found out that I had no weapons, they gave me some!

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it … nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the U.S. Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has Implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
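Here is a small sketch of how implicit and explicit rights fit together. It is a toy model with invented names (PARENT, explicit_rights, grant, revoke, pe_allows). Placing MRKT directly under SYSDBA is an assumption, since the text only says MRKT sits above Mary. Automatic rights are not modeled because the text does not list the 16 and 20 individual rights.

```python
# Toy model of implicit (ownership) and explicit (granted) rights.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",    # assumed position in the hierarchy
    "Morgan": "SYSDBA",
    "Mary": "MRKT",
    "Tom": "Morgan",
}
explicit_rights = set()  # entries are (grantee, privilege, object)


def is_owner(owner, obj):
    """True if `owner` sits anywhere above `obj` in the hierarchy."""
    while obj in PARENT:
        obj = PARENT[obj]
        if obj == owner:
            return True
    return False


def _check_grantor(grantor, obj):
    # Assumption: you may grant or revoke on yourself or on anything you own.
    if grantor != obj and not is_owner(grantor, obj):
        raise PermissionError(f"{grantor} has no rights over {obj}")


def grant(grantor, privilege, obj, grantee):
    _check_grantor(grantor, obj)
    explicit_rights.add((grantee, privilege, obj))


def revoke(grantor, privilege, obj, grantee):
    _check_grantor(grantor, obj)
    explicit_rights.discard((grantee, privilege, obj))


def pe_allows(user, privilege, obj):
    """The guard's check: does the user hold this right on this object?"""
    return (user, privilege, obj) in explicit_rights


grant("Tom", "CREATE TABLE", "Tom", "Mary")       # Tom grants on his own database
print(pe_allows("Mary", "CREATE TABLE", "Tom"))   # True
revoke("Morgan", "CREATE TABLE", "Tom", "Mary")   # Morgan owns Tom, so may revoke
print(pe_allows("Mary", "CREATE TABLE", "Tom"))   # False
```

Notice that Mary could not grant herself this right, because she is neither Tom nor one of Tom's owners.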

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or what if an AMP or disk goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be returned to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, then you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it, and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal cleanliness is next to godliness. If it is clean, then things went well.
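The before-picture idea can be sketched in Python. The MiniAmp class below is a made-up toy, not Teradata's internal design: it keeps rows in a dictionary, saves a BEFORE image ahead of every change, and either wipes the journal on COMMIT or uses it to put things back.

```python
# Toy model of one AMP with a transient journal (illustration only).
class MiniAmp:
    def __init__(self):
        self.rows = {}               # row id -> row value
        self.transient_journal = []  # list of (row id, BEFORE image)

    def write(self, row_id, value):
        """INSERT or UPDATE a row; pass value=None to DELETE it."""
        # Take the BEFORE picture first; None means the row did not exist.
        self.transient_journal.append((row_id, self.rows.get(row_id)))
        if value is None:
            self.rows.pop(row_id, None)
        else:
            self.rows[row_id] = value

    def commit(self):
        """Transaction succeeded: the journal is wiped clean."""
        self.transient_journal.clear()

    def rollback(self):
        """Transaction failed: restore BEFORE images, newest first."""
        while self.transient_journal:
            row_id, before = self.transient_journal.pop()
            if before is None:
                self.rows.pop(row_id, None)
            else:
                self.rows[row_id] = before


amp = MiniAmp()
amp.write(1, 100000)   # salary for employee 1
amp.commit()

amp.write(1, 200000)   # oops: a 100% raise instead of 10%
amp.write(2, 50000)    # a new row in the same transaction
amp.rollback()         # the transaction is aborted

print(amp.rows)        # {1: 100000}
```

After the rollback, the accidental raise and the new row are both gone, and the journal is empty again.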

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice, if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one, and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map that knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
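The one rule that matters for FALLBACK placement is that the copy lands on a different AMP in the same cluster. The sketch below shows that rule only. Everything in it is an assumption made for illustration: zlib.crc32 stands in for Teradata's hashing algorithm, simple arithmetic stands in for the two hash maps, and AMPs are numbered from 0.

```python
import zlib

AMP_COUNT = 8         # the eight-AMP system from the example
AMPS_PER_CLUSTER = 4  # two clusters of four AMPs


def row_hash(primary_index_value):
    """Stand-in hash; NOT Teradata's actual hashing algorithm."""
    return zlib.crc32(str(primary_index_value).encode())


def primary_amp(primary_index_value):
    """Stand-in for the Primary Hash Map: the AMP that owns the base row."""
    return row_hash(primary_index_value) % AMP_COUNT


def fallback_amp(primary_index_value):
    """Stand-in for the Fallback Hash Map: another AMP in the same cluster."""
    primary = primary_amp(primary_index_value)
    first = (primary // AMPS_PER_CLUSTER) * AMPS_PER_CLUSTER
    others = [amp for amp in range(first, first + AMPS_PER_CLUSTER)
              if amp != primary]
    return others[(row_hash(primary_index_value) // AMP_COUNT) % len(others)]


for name in ("Ben Hon", "Don Roy"):
    print(name, primary_amp(name), fallback_amp(name))
```

Whatever AMPs the two names happen to land on, the base row and its FALLBACK copy never share an AMP and never leave their cluster, which is why losing one AMP per cluster is survivable.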

In most systems, AMPs are clustered in a group of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while one AMP is down.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
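The workload side of that trade-off is simple arithmetic. Assuming every AMP in a cluster normally carries an equal share and the down AMP's share is split evenly among the survivors (a simplification for illustration), each survivor does n / (n - 1) times its normal work in a cluster of n AMPs. The function name survivor_workload is made up for this sketch.

```python
def survivor_workload(cluster_size):
    """Relative workload of each surviving AMP when one AMP in the cluster is down."""
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    return cluster_size / (cluster_size - 1)


for size in (2, 3, 4, 16):
    print(size, round(survivor_workload(size), 2))
# 2 2.0
# 3 1.5
# 4 1.33
# 16 1.07
```

A cluster of two doubles the survivor's work, a cluster of four adds about a third, and a cluster of 16 adds only about 7 percent, at the price of a higher chance that a second AMP in the same cluster fails.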

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are made to their FALLBACK copies on the other AMPs in the cluster.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.
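The catch-up behavior can be sketched as a list of missed changes that is replayed and then thrown away. The ClusterSketch class and its method names are invented for this illustration, and the FALLBACK copies that keep queries running in the meantime are left out to keep the sketch short.

```python
# Toy model of a Down AMP Recovery Journal inside one cluster (illustration only).
class ClusterSketch:
    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}  # rows held by each AMP
        self.down = set()                         # AMPs that are not working
        self.darj = {}                            # down AMP -> missed changes

    def fail(self, amp):
        self.down.add(amp)
        self.darj[amp] = []  # the surviving AMPs open a recovery journal

    def write(self, amp, row_id, value):
        if amp in self.down:
            self.darj[amp].append((row_id, value))  # remember it for later
        else:
            self.rows[amp][row_id] = value

    def recover(self, amp):
        # Replay everything that happened while the AMP was sleeping,
        # then discard the journal.
        for row_id, value in self.darj.pop(amp):
            self.rows[amp][row_id] = value
        self.down.discard(amp)


cluster = ClusterSketch(["AMP1", "AMP2", "AMP3", "AMP4"])
cluster.write("AMP1", "Ben Hon", "friend since 1990")
cluster.fail("AMP1")
cluster.write("AMP1", "Ben Hon", "friend since 1989")  # missed while down
cluster.recover("AMP1")

print(cluster.rows["AMP1"])  # {'Ben Hon': 'friend since 1989'}
print(cluster.darj)          # {}
```

The row values are made-up sample data. The point is that the recovered AMP ends up with the change it slept through, and nothing is left in the journal afterward.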

In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). Also, none of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.
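The migration can be pictured as moving names from one list to another while the disk assignments stay put. In this sketch the nodes and vdisk_of dictionaries and the fail_node function are invented for illustration, and spreading the migrating AMPs round-robin over the survivors is an assumption that only matters when a clique has more than two nodes.

```python
# Toy model of AMPs migrating within a two-node clique (illustration only).
nodes = {
    "node1": [f"AMP{n}" for n in range(1, 17)],   # AMPs 1-16
    "node2": [f"AMP{n}" for n in range(17, 33)],  # AMPs 17-32
}

# Every AMP keeps its own virtual disk, no matter which node hosts it.
vdisk_of = {amp: "vdisk" + amp[3:] for amps in nodes.values() for amp in amps}


def fail_node(clique, failed):
    """Move the failed node's AMPs to the surviving nodes in the clique."""
    survivors = [name for name in clique if name != failed]
    if not survivors:
        raise RuntimeError("no surviving node left in the clique")
    migrating = clique[failed]
    clique[failed] = []
    for position, amp in enumerate(migrating):
        # Assumption: spread the migrating AMPs round-robin over the survivors.
        clique[survivors[position % len(survivors)]].append(amp)


fail_node(nodes, "node1")

print(len(nodes["node1"]), len(nodes["node2"]))  # 0 32
print(vdisk_of["AMP16"])                         # vdisk16
```

Node two now carries twice as many AMPs, so it runs slower, but AMP 16 still points at the same virtual disk it had before the failure.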

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds." Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica – a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southwest Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward and is usually done to restore data lost as a result of a hardware problem.
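The roll-forward recovery can be sketched in a few lines: start from the backup, then apply each AFTER image in the order it was captured. The roll_forward function, the use of None to mark a deleted row, and the sample rows are all invented for this illustration; Teradata's own archive and recovery utilities are not shown.

```python
def roll_forward(full_backup, after_journal):
    """Rebuild a table from a backup plus the AFTER images captured since."""
    table = dict(full_backup)            # start from the restored backup
    for row_id, after_image in after_journal:
        if after_image is None:          # None marks a deleted row in this sketch
            table.pop(row_id, None)
        else:
            table[row_id] = after_image  # a new row or a changed row
    return table


# Hypothetical data: the backup from the 1st, then changes through the 4th.
backup_day_1 = {1: "Johnson", 2: "Carlsbad"}
after_journal = [
    (22, "Winter"),          # row added on the 2nd
    (2, "Carlsbad-Smith"),   # row changed on the 3rd
    (1, None),               # row deleted on the 4th
]

restored = roll_forward(backup_day_1, after_journal)
print(restored)  # {2: 'Carlsbad-Smith', 22: 'Winter'}
```

The backup alone would have brought back the table as it looked on the 1st; replaying the journal brings it forward to the moment before the hardware failed.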

The following example shows the use of FALLBACK and the PERMANENT JOURNAL

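CREATE TABLE TomC.employee, FALLBACK,
     BEFORE JOURNAL,
     DUAL AFTER JOURNAL
    (emp        INTEGER
    ,dept       INTEGER
    ,lname      CHAR(20)
    ,fname      VARCHAR(20)
    ,salary     DECIMAL(10,2)
    -- The FORMAT string was lost in this copy; 'YYYY-MM-DD' is an assumed placeholder.
    ,hire_date  DATE FORMAT 'YYYY-MM-DD')
UNIQUE PRIMARY INDEX(emp);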

The example above creates the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO – meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.
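As a rough sketch of the database-level defaults mentioned above, protection can be declared once when a database is created, and tables that name no options of their own then inherit it. The database name Sales_DB, the space figure, the journal table name, and both tables are assumptions made up for illustration; check the exact option spelling against the SQL reference for your release.

```sql
-- Hypothetical database: Sales_DB, its size, and the journal table name
-- Sales_DB.Sales_Jrnl are illustrative assumptions, not taken from the book.
CREATE DATABASE Sales_DB FROM DBC AS
    PERM = 1000000000,                            -- permanent space in bytes
    FALLBACK,                                     -- tables default to FALLBACK
    BEFORE JOURNAL,                               -- single BEFORE image by default
    DUAL AFTER JOURNAL,                           -- two AFTER images on different AMPs
    DEFAULT JOURNAL TABLE = Sales_DB.Sales_Jrnl;  -- where journal images are kept

-- No protection options here, so the table inherits the database defaults.
CREATE TABLE Sales_DB.department
    (dept       INTEGER
    ,dept_name  CHAR(20))
UNIQUE PRIMARY INDEX (dept);

-- An individual table can still opt out of the database defaults.
CREATE TABLE Sales_DB.scratch_dept, NO FALLBACK,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL
    (dept       INTEGER
    ,dept_name  CHAR(20))
UNIQUE PRIMARY INDEX (dept);
```

Setting the defaults at the database level keeps the individual CREATE TABLE statements short, while the table-level options remain available for exceptions.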

Locking Modes in Teradata

"You just obey instructions, we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions, we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK – one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
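The difference between a READ LOCK and an ACCESS LOCK can be sketched with the LOCKING request modifier. This is a minimal illustration that reuses the TomC.employee table from the journaling example; the employee number 1001 is made up.

```sql
-- A plain SELECT takes a READ lock and queues behind any WRITE lock on the table.
SELECT dept, COUNT(*) AS emp_count
FROM   TomC.employee
GROUP  BY dept;

-- Same query with an ACCESS lock: it does not wait for a WRITE lock,
-- so the counts may reflect rows that are in the middle of being changed.
LOCKING TABLE TomC.employee FOR ACCESS
SELECT dept, COUNT(*) AS emp_count
FROM   TomC.employee
GROUP  BY dept;

-- Row-level form, often used for single-row lookups by primary index.
LOCKING ROW FOR ACCESS
SELECT lname, fname
FROM   TomC.employee
WHERE  emp = 1001;   -- 1001 is a made-up employee number
```

The ACCESS form trades accuracy for availability, so it suits reporting queries where an approximate answer during a load is acceptable.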

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired … it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row whose value is still referenced by a row in another table may not be deleted until the referencing row is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is added to an existing table that already contains RI violations, the ALTER TABLE will still proceed. Plus, it will copy and store the table's RI violations for review and correction. The user will then need to locate that copy and make corrections to the original table.
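The employee and payroll scenario can be sketched as follows. Only TomC.employee comes from the earlier example; the payroll table, its columns, the employee numbers, and the error-table name payroll_0 are assumptions for illustration. The sketch also assumes that emp qualifies as a parent key because of its UNIQUE PRIMARY INDEX; some releases may additionally require the column to be declared NOT NULL.

```sql
-- Hypothetical child table; only TomC.employee comes from the earlier example.
CREATE TABLE TomC.payroll
    (emp        INTEGER NOT NULL
    ,pay_date   DATE
    ,gross_pay  DECIMAL(10,2))
PRIMARY INDEX (emp);

-- Add RI to the existing table: every payroll.emp must exist in employee.emp.
ALTER TABLE TomC.payroll
    ADD FOREIGN KEY (emp) REFERENCES TomC.employee (emp);

-- Rows that already violated the rule are copied to an error table for review.
-- The name payroll_0 (child table name plus a numeric suffix) is an assumption.
SELECT * FROM TomC.payroll_0;

-- With RI in place, this INSERT is rejected if employee 9999 does not exist.
INSERT INTO TomC.payroll (emp, pay_date, gross_pay)
VALUES (9999, DATE '2001-06-30', 2500.00);

-- And this DELETE is rejected while employee 1001 still has payroll rows.
DELETE FROM TomC.employee WHERE emp = 1001;

-- Remove the referencing rows first, then the parent row.
DELETE FROM TomC.payroll  WHERE emp = 1001;
DELETE FROM TomC.employee WHERE emp = 1001;
```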

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that – just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
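Once a load has finished, the result of that hashing step can be inspected from SQL. The query below is a small sketch that reuses the TomC.employee table and its primary index column emp from the journaling example; it counts how many rows landed on each AMP.

```sql
-- Count rows per AMP for the employee table, hashing on the primary index column.
SELECT HASHAMP(HASHBUCKET(HASHROW(emp))) AS amp_no
      ,COUNT(*)                          AS row_count
FROM   TomC.employee
GROUP  BY 1
ORDER  BY 1;
```

Roughly equal counts across AMPs suggest the primary index spreads the rows evenly, which is what lets every AMP do a similar share of the work in parallel.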

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance – Only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

Had everyone involved with the USS Indianapolis adhered to a single version of the truth, with detail data to back them up, this disaster may have never occurred. Likewise, if your company doesn't maintain detail data in a Centralized Data Warehouse, you will never know which version of the truth to believe. Each division of a business will have its own view of the truth. Summarized data, such as a data mart, does have its place in knowledge management, but it should always be built from the detail data within the central data warehouse.

Most companies don't have a Central Data Warehouse. Why? Because they don't have proper leadership or direction. Company leaders often let different branches of the company create data marts that are effective short-term solutions. These solutions are based on departmental leadership that is most interested in short-term solutions. Such leaders don't plan on being with a particular department forever, so they are only interested in keeping things simple, controlled, and beneficial to them.

"We're all in this alone."

Lily Tomlin

For example, imagine a company that made cars on an assembly line. Instead of using a giant plant with the latest and greatest technology, the company builds cars in 300 small garages. Each garage is owned by a different department and has different needs. In addition, every user has his or her access restricted to his or her garage. With this structure, leaders feel safe, but building cars logistically is a nightmare. In fact, just moving cars from one garage to the next would be a joke. This scenario may seem simple-minded, but that is how most data warehouses are built. Each part of some data warehouses operates alone.

Now imagine a giant car assembly plant where the assembly line was managed by the idea of "There is no 'I' in Team." This plant would continually improve processes, finding better ways to work together. Everyone has an idea what the others are doing, and new ideas are welcome. Management is able to run the entire plant with one team of dedicated professionals, and decisions are made cooperatively, concisely, and clearly.

This style of management is the idea behind a central data warehouse. From the top layer of management down through the entire company, they are one solid team. A team experienced with data warehousing saves valuable money and resources, plus users can manage the entire data warehouse. Executives may ask any question targeted to any part of the business. Decisions are made with long-term vision, and every employee is confident that when they need answers, the data warehouse will provide them.

"If I have seen further, it is by standing on the shoulders of giants."

Isaac Newton

When asked how he had discovered the Law of Gravity, Isaac Newton did not grab all of the glory for himself. He claimed that his work stood on the foundation of those early scientists who had gone before him. Likewise, a central data warehouse allows users to stand on the shoulders of another giant. This giant, built right, allows major corporations to make decisions and act on those decisions quickly.

In 1993, I was asked to train one of the world's largest retailers on its Teradata data warehouse. I flew to Bentonville, Arkansas, and an employee met me at the airport, then escorted me to the classroom. As we walked down the hallways, most employees seemed to be at a pace I had never seen before. They were practically running. I asked, "What's up? Why is everyone hurrying?" The employee replied, "It's work time!" I was shocked. In all of the places I had previously worked, we strolled. This place had a leadership that I've never encountered … anywhere. H. Ross Perot described this kind of team when he said, "When building a team, I first look for people who love to win; if I can't find any of those, then I look for people who hate to lose." This was a concise team of employees so motivated and so empowered that they thought they could take over the world.

As I grew to know the team, I asked them how long it took top management to make a decision. And how long did it take to implement that decision at thousands of stores nationwide? They simply said, "About two hours." I was amazed. Today, this team continues to have one of the single greatest data warehouses ever built. They use it extensively, and it grows stronger every day.

While I was visiting with this team, management decided at one point that stores across the country should place Halloween displays and candy near the cash registers. In less than two hours, stores moved their Halloween candy from the normal candy aisles to end-caps near the cash register. Every store participated but one.

When asked why he didn't participate, the store manager said he had simply run out of time to create the displays plus move the Halloween candy from his normal candy aisle to the end-caps. Management was ticked. Telling the manager they would get back to him, they then asked the DBA to query the data warehouse to see how much this snafu had cost the company. The DBA came back and reported that the store actually sold almost the same amount of Halloween candy as forecasted. Management was surprised and, honestly, a little disappointed with the answer. But then the DBA added somewhat sheepishly, "I found something else too." "Go ahead," replied members of the management team. He said, "I found out they actually sold about 40% more normal candy than we forecasted for this holiday." Management got on the phone immediately and told the other thousand stores, "Move those goblins and Halloween candy back to the normal candy aisles!"

What that DBA did was to use his instinct and the data warehouse to find out exactly what was going on with the business at that time. He was armed with a system that had cross-functional analysis. A central data warehouse gives good management great confidence because they see the whole picture. When users can ask any question, at any time, and on any data, their knowledge is unlimited.

Most Teradata Central Data Warehouse sites will tell you most of their Return On Investment (ROI) came from areas they never suspected. Thomas Jefferson once said, "We don't know one millionth of a percent about anything." When we explained Teradata to Jefferson, he did not build another Monticello, but he did retract his statement. Companies with a centralized data warehouse know about a million percent more than companies that have invested in stovepipe applications and 300 different data marts.

Actually, any company planning on competing in this millennium must think long-term and begin building a centralized data warehouse. If not, that company will be on the short end of the stick when competing with a company that chose to build one. That thought should sound scarier than a goblin near the cash registers on Halloween.

If you think about it, every major decision in business makes someone happy. If you are armed with facts supported by a central data warehouse, and you do your homework, your business decisions will make your shareholders happy. However, if you are making decisions with a data mart strategy, those decisions are more likely to make your competitors happy.

There are many companies that are fearful of such an undertaking. They want a central data warehouse but wonder, "What if it fails? Which database should we choose? What type of hardware do we need? Should we do an RFP?" Decisions, decisions. It would literally take me about 30 seconds to make a decision on Teradata. There would be no RFP. We used to wade in swimming pools of data; today we are swamped in oceans of data. Teradata is built for this type of environment. This book explains the fundamentals of Teradata. Anyone with any experience or knowledge about data warehouse environments will clearly see why Teradata is the best solution.

Rule 2 - Build for the User

"A learned person is not one who gives the right answers; it is the one who asks the right questions."

Claude Levi-Strauss

The user is the heart of the data warehouse, and users get better with each day of experience. The user makes decisions that affect the company's bottom line. That's why the data warehouse is built around the business user. Building a data warehouse is simple: find out what data the business users need and what type of queries they want to ask but are not able to ask today. Then find out if the data is available and if the queries can be answered. With those answers, you will exceed users' expectations.

An experienced data warehouse user is usually shocked when he or she first uses Teradata. Its sheer power and flexibility enable users to ask questions they have never been able to ask before. On a recent consulting trip of mine, a young DBA got antsy when a particular query took more than a minute or so with Teradata. So I asked, "Well, how long did that same query take with your OLTP-based data warehouse?" He retorted, "We couldn't even run this query on the old system." I said, "So what's wrong with two minutes?" He added, "You know, some of our business users are so used to how long our queries used to run that they will be sitting staring at the screen without realizing that Teradata has already brought back the answer." With Teradata, users can expand their thinking by using intuition and keen business sense without technology barriers.

The building of an enterprise data warehouse begins with top management, but then cascades down to a relationship between the IT department and the business user community.

The IT department must realize they have a supporting role. That role is to please the business user by making data available so the business user can easily ask questions and get answers. It's also the IT department's role to build a system that allows users to ask questions on their own without IT intervention. Forget about building a system where users ask IT to run the queries for them. When users need information, the IT department should eventually be able to say, "Ask the question yourself … it is all available to you."

The business users are actually the stars; however, the entire business community must take responsibility for the warehouse's success. These users must continually educate themselves and other users on the capabilities of the data warehouse, new tools, and new techniques that will enhance its potential. Those same users must help IT help them. If both understand their respective roles and work together to help the company, then the data warehouse will be a huge success.

Rule 3 - Let the IT Department Lead the Way to User Utopia

Few sports challenges are as grueling or demanding as the Tour de France. But victory at this event eluded Lance Armstrong, a powerful young cyclist from Austin, Texas. Lance excelled in individual competition, even winning the World Championships. But despite his hard work, Lance could not overcome the Europeans' strong and proud tradition at the Tour de France. A few years ago, Lance was thrown into the battle of his life, not against others, but against himself. He discovered that he had cancer and was given virtually no chance of surviving. Suddenly he found out how little cycling really meant in life. With all his might, Lance battled his way back to health, beating the odds. Now he found out how very much cycling could mean in life. His bicycle became a tool to reclaim the future. He found a spot as a team member for the US Postal Service team. With a new perspective and a new depth of character, Lance led that team to victory in the next Tour de France. And he repeated this victory again for the next two years.

To win the premier event in the cycling world, Lance Armstrong had to totally rethink his role. In the same way, the key members of any company seeking success with its data warehouse must rethink their roles. The IT department plays a key role in a data warehouse. What do users know about technical issues? Not enough to build a data warehouse. So technical issues are the responsibility of the IT department. The danger with this train of thought is that, while the IT department has years of experience with handling company transactions through production databases and applications, most are new at data warehousing. A data-warehousing environment can be extremely different from anything an IT department has ever built or used before. Therefore, it's a bad idea to build a data warehouse without the help of experienced people.

An OLTP environment gets more and more predictable each month. It is designed to be tweaked and tuned in order to maximize a company's environment. On the other hand, a data warehouse is an unpredictable environment where the only way to gain control is to actually give up control. In data warehousing, users must be allowed the freedom to ask the questions, and they will blossom in an environment where flexibility is accepted and welcomed.

"The only sure weapon against bad ideas is better ideas."

A. Whitney Griswold

If the IT department decides to build hundreds of data marts that will please each and every department, then they are missing the boat. Data warehouse experience is a hard teacher because it gives the test first and the lesson afterwards. Abraham Lincoln once said, "A house divided cannot stand." With that in mind, build the data warehouse so it will stand strong for a long time.

What's the formula? First and foremost, start by building your data warehouse around detail data. Bring transaction data, along with key details from the OLTP systems, into the data warehouse. Then, as known queries are identified, build data marts to enhance their performance, and also insist that data marts are created and maintained directly from the detail data. Doing so will build a foundation that will stand.

Next, the IT department needs to keep an open mind about creating an environment called "User Utopia." Have you ever been there? In User Utopia, the user confidently asks queries without fear of being charged by the minute. The user has meta-data, so he or she becomes intimate with the data, then makes informed decisions. The user should also be able to ask monster queries with the full backing of IT. Recently, on one such query, the IT department wanted to pull the plug. But the DBA held out, granting the user more time. When the query finished running, the information it brought back from the detail data saved the company millions of dollars. Overall, a user will get the majority of his or her answers back quickly from data marts, but he or she also needs the capability of going back to the detail data for more information. This is User Utopia.

Here is the message for IT: Don't follow the idea that "if you build it, they will come." Instead, become a leader … go to the users and build it together.

Rule 4 - Build the Foundation Around Detail Data

Business is always trying to predict the unpredictable. The US Air Force Reserve's 53rd Weather Reconnaissance Squadron is a special force that flies its planes directly into tropical storms and hurricanes. Using a WC-130 Hercules aircraft, they fly into storms at low altitudes, between 1,000 and 10,000 feet, taking weather readings that are relayed to the National Hurricane Center in Florida. They measure wind speeds, measure the pressure and structure of the storm, and, most importantly, locate the eye of the storm. The data collected by these "Hurricane Hunters" is used to determine when and where a storm might hit the coast and how strong it will be at that time. Teradata has no fear of detail data; its virtual processors will fly right into the thick of your data warehouse to bring back valuable information for decision support. You see, Teradata enables you to understand the storms in your business today, while helping you predict when and where the next storm will hit tomorrow.

I estimate that 80% of today's data warehouses are built on summary (summarized) data. Therefore, 80% of all data warehouses will never come close to realizing their full potential. Your data warehouse does not have to be one of them.

"A bird does not sing because it has the answers; it sings because it has a song."

A data warehouse built on detail data does not sing because it has a song; it sings because it has the answers. When you capture detail data, answers to an infinite number of questions are available. But if this is truly the case, then why doesn't everybody build around detail data? Well, there are two reasons. One is price. Like a bird, many companies decide to go "cheap, cheap." But watch out! The real expense is not the cost of the data warehouse; it is the money that you will not make without one. The second reason is power. Many companies don't have the wingspan to fly through the detail, so they soar with the summary. In addition, some companies don't want to pay for the disk space it actually takes to keep detail data, but believe me, that cost is a small price to pay for success.

"Once you miss the first buttonhole, it becomes difficult to button your shirt."

Many companies use the same database for their data warehouse as they have done for their OLTP system. This is a critical mistake. In essence, they have missed the first buttonhole and most likely will lose their shirt on their data warehouse adventure.

At this point, companies no longer have a choice of using detail data. They must summarize for performance reasons. As one marine told his boot camp soldiers jokingly, "The beatings will continue until the morale improves." Similarly, a database designed for OLTP takes a continual beating when it tries to query large amounts of detail data.

Companies building true data warehouses don't compromise on price and will have a data warehouse that is built for decision support, not one that specializes in OLTP. With this decision, you have buttoned the first buttonhole and are well on your way to reaching the top.

Detail data is the foundation that data warehouses are built upon. Users can ask any question, anytime, and conduct data mining, OLAP, ROLAP, SQL, and SPL functions; build data marts directly from the detail data; and easily maintain and grow the environment on a daily basis. Now that's a tune well worth singing. Make a note of it.

Rule 5 - Build Data Marts from the Detail

"You cannot teach a man anything; you can only help him find it within himself."

Galileo

Galileo was a smart man. How did he know so much about life and data marts? When we explained data marts to Galileo, he said, "You cannot build a data mart directly from the OLTP systems; you can only build a data mart directly from the detail within." He was right.

Many companies build data mart after data mart directly from the OLTP systems, and their universe begins to revolve around continual maintenance. Then, as things get worse, as Galileo predicted, their universe begins to revolve around the son. The son of a gun sent in to replace them.

Why does this happen? At first, things work out great, but soon there are more and more requests for additional information. As a result, more and more data marts are created, and soon the system looks like a giant spider web. Different data marts start to yield different results on like data, and the actual maintenance of this complicated spider web takes up most of IT's time. Meanwhile, short-term dreams turn into long-term nightmares like this one: A man and his wife had had a big argument just before he went on a business trip. Feeling rather contrite about his harsh words, he arranged to send his wife some flowers and asked the florist to write on the card, "I'm sorry, I love you." The beautiful bouquet arrived at the door. But then his wife read the words the florist had actually written in haste: "I'm sorry I love you."

The top reasons to build data marts directly from detail data are:

• Users can get answers from the data mart, but must validate their findings or check out additional information from the detail that built it.
• There is only one consistent version of the truth.
• Maintenance is easy.

If a user comes up with a data mart answer that does not make sense, then he or she has the ability to drill down into the detail and investigate. Sometimes summary data can spark interest, and finding out the "why" can result in big bucks.

If users don't trust the data, they won't use the system. When a data warehouse is built on a foundation of detail data and then data marts are erected from that foundation, you have a winning combination. The results will always be consistent and trustworthy. However, you should only build data marts when there is a credible business case, and you should be ready to drop them when they are no longer needed. The life span of a data mart is relatively short compared to that of its mother and father (better known as the detail data). If you build data marts from the detail, they are easy to manage, easy to drop, and easy to change.

Rule 6 - Make Scalability Your Best Friend

"Plan your life for a million tomorrows, and live your life as if tomorrow may be your last."

Morgan Jones

The roar of class-6 rapids on a river in Suriname can be almost deafening against the dense walls of the jungle, especially when you are 9 years old. Our mission was to lower our canoe down the waterfall with ropes. The Trio Amer-Indian who anchored our 40-foot dugout canoe let go of the anchor rope too quickly. Without warning, the heavy boat began a freefall through the rocky water with my father hanging onto the side for dear life. He disappeared under the rocky waters, and I knew for sure we had lost him. My heart pounded against my chest. As I rallied myself to grasp this loss as only a nine-year-old can, the Indians abruptly began cheering wildly above the roar of the river. My dad had resurfaced a hundred yards downstream, battered and bruised, but he was alive. In just one short minute, I determined that I would love my family every day as if there were no tomorrow.

As I made my family my best friend, a data warehouse must make scalability its best friend. A data warehouse that does not scale will have no tomorrow. It is only a matter of time until the warehouse disappears in rocky waters, never to come up for air. Don't let go of the anchor rope.

The data-warehousing environment will throw obstacles in your way every single day. A data warehouse must be planned to meet today's needs, but it must also be capable of meeting tomorrow's challenges. The future cannot be predicted, so plan for unlimited growth or linear scalability, both vertical and horizontal. There are so many data warehouses that start out with sizzling performance, but as they grow they eventually and inevitably hit the scalability wall. However, before they hit the wall, there is a pattern of diminishing performance.

A data warehouse designed without scalability in mind is doomed before it is begun. It can never reach its potential. Take the scalability question out of the equation by investing in a database that allows you to start small but grows linearly.


In today's fast-paced world, Gigabytes soon become Terabytes. It may not sound like much, but it weighs a ton on the shoulders of giants. Listen to these measurements and pick your data warehouse's life span. For example, if you lived for a million seconds (Megabyte), then you would live for about 11.6 days. In comparison, if you lived for a billion seconds (Gigabyte), then you would live for about 31.7 years. Plus, if you lived for a trillion seconds (Terabyte), then you would live for about 31,688 years.

How nice it would be on your 31,688th birthday if people would say, "You sure look good for your age."
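The arithmetic behind those life spans is easy to check. The short sketch below is only an illustration; the function names are made up for this example, and it assumes an average year of 365.25 days.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds in a day
DAYS_PER_YEAR = 365.25           # assumption: average year, leap years included


def seconds_to_days(seconds):
    """Convert a count of seconds to days."""
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds):
    """Convert a count of seconds to years of 365.25 days."""
    return seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)


if __name__ == "__main__":
    print(f"Million seconds:  {seconds_to_days(10**6):.1f} days")
    print(f"Billion seconds:  {seconds_to_years(10**9):.1f} years")
    print(f"Trillion seconds: {seconds_to_years(10**12):,.0f} years")
```

Each step from mega to giga to tera multiplies the span by a thousand, which is why a warehouse sized for gigabytes can be overwhelmed so quickly.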

Data warehouses hit the wall of scalability because they cannot grow at the same rate as the amount of data being gathered. Teradata allows for unlimited linear scalability. Linear Scalability is a building block approach to data warehousing that ensures that, as building blocks are added, the system continues at the same performance level.

This is why the largest data warehouses in the world use Teradata. I was lucky to be in the right place at the right time, and taught during the beginning stages at what are considered the two largest data warehouse sites in the world: Southwestern Bell (SBC) and Wal-Mart.

Wal-Mart's data warehouse started with less than 30 gigabytes, and SBC started with less than 200 gigabytes and 100 users. Both warehouses:

• Started small and simple
• Used Teradata from the beginning
• Have built the largest Enterprise Data Warehouse in their respective industries
• Continue to realize additional Return On Investment (ROI) on an annual basis
• Have grown to more than 10 Terabytes of data and are still growing
• Have thousands of users (some estimates are shocking)
• Have educated and experienced data warehouse staffs
• Have educated and experienced data warehouse users
• Experience continual growth without boundaries
• Have experienced linear performance by Teradata in every single upgrade (from gigabytes to terabytes and from terabytes to tens of terabytes)
• Both companies are impressed with Teradata's power and performance
• And both SBC and Wal-Mart are committed to the excellence of Teradata

A data warehouse is built in small building blocks. Linear Scalability is described in three ways:

First, building blocks are added until the performance requirements of your environment are met (Guaranteed Success).

Second, every time the data doubles, building blocks are doubled and the system maintains its performance level (Guaranteed Success).

Third, any time the environment changes, building blocks are added until performance requirements are met (Guaranteed Success).

Scalability is not just about growing the data volume. It also means growing or increasing the number of users. Many systems work flawlessly until as few as 5 users are added; then they slow down to a crawl. Companies need a system where growth and performance are easily calculated and implemented. That means a system where the number of users, the size and complexity of queries, the volume of data, and the number of applications being used can be calculated and compared to the current system's actual size. If more power, speed, or size is needed, then the company can simply add building blocks to the system until the requirements are met.
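To make "easily calculated" concrete, here is a minimal sizing sketch. It assumes, purely for illustration, that each building block supports a fixed data volume and a fixed number of users at the response time you require; the per-block capacities below are hypothetical numbers, not Teradata figures.

```python
import math


def blocks_needed(data_gb, users, gb_per_block, users_per_block):
    """Return how many building blocks cover both the data volume and the user count.

    gb_per_block and users_per_block are hypothetical per-block capacities
    measured at the required response time.
    """
    for_data = math.ceil(data_gb / gb_per_block)
    for_users = math.ceil(users / users_per_block)
    # The larger requirement decides the system size; never go below one block.
    return max(for_data, for_users, 1)


if __name__ == "__main__":
    today = blocks_needed(data_gb=200, users=100, gb_per_block=50, users_per_block=40)
    doubled = blocks_needed(data_gb=400, users=100, gb_per_block=50, users_per_block=40)
    print(today, doubled)
```

Under these assumptions, doubling the data doubles the number of blocks, which matches the second description of Linear Scalability given earlier.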


Rule 7 - Model the Data Correctly

"You will find only what you bring in."

Yoda, Jedi Master in Star Wars

We model a database for the same reasons that Boeing builds an aircraft model to test flight characteristics in a wind tunnel. It's simpler and cheaper to model than to reconstruct the plane by iterations until you get it right.

A proper data model should be designed to reflect the business components and possible relationships.

Here are three rules for modeling data in a data warehouse:

1. Model the data quickly
2. Normalize the detail data
3. Use a dimensional model for data marts

The 3rd Normal Form holds that each column in a table should be directly related to the primary key, the whole key, and nothing but the key. Data is placed into tables where it makes the most sense and has no repeating groups, derived data, or optional columns. This allows users to ask any question at any time on all data within the enterprise. Users do not have to strive for 3rd Normal Form, but should normalize the data the best they can. There will be fewer columns in a table, but a lot more tables overall. This model is easier to maintain, incredibly flexible, and allows a user to ask any question on any data at any time.

A Star-Schema model is comprised of a fact table and a number of dimension tables. The fact table is a table with a multi-part key. Each element of the key is itself a foreign key to a single dimension table. The remaining fields in the fact table are known as facts and are numeric, continuously valued, and additive. Facts can be thought of as measurements taken at the intersection of all of the dimensions. Dimension attributes are mostly textual and are almost always the source of constraints and report breaks. This model enhances performance on known queries or, in other words, queries users run repeatedly day after day.
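A small example may help picture that structure. The sketch below uses Python's built-in sqlite3 module (not Teradata) simply because it runs anywhere; the table and column names are hypothetical. It builds one fact table whose two-part key points at two dimension tables, then runs a typical repeated query that constrains and groups by a textual dimension attribute while summing an additive fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: mostly textual attributes.
CREATE TABLE store_dim   (store_id   INTEGER PRIMARY KEY, city     TEXT);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, category TEXT);

-- Fact table: multi-part key, each part a foreign key to one dimension.
CREATE TABLE sales_fact (
    store_id   INTEGER REFERENCES store_dim(store_id),
    product_id INTEGER REFERENCES product_dim(product_id),
    units_sold INTEGER,   -- additive fact
    revenue    REAL,      -- additive fact
    PRIMARY KEY (store_id, product_id)
);

INSERT INTO store_dim   VALUES (1, 'Dayton'), (2, 'San Diego');
INSERT INTO product_dim VALUES (10, 'Toys'), (20, 'Books');
INSERT INTO sales_fact  VALUES (1, 10, 3, 30.0), (1, 20, 1, 12.0), (2, 10, 5, 50.0);
""")

# A "known query": revenue by city, with the dimension supplying the report break.
rows = conn.execute("""
    SELECT s.city, SUM(f.revenue)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_id = f.store_id
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
print(rows)
```

Notice that the question being asked (revenue by city) is baked into how the tables are laid out, which is the source of both the speed and the inflexibility discussed in this chapter.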

Most database modelers prefer to create a logical model in 3rd Normal Form, but most database engines are overcome by physical limitations, so they must compromise the model. The four most difficult functions for a database to handle are:

• Join tables
• Aggregate data
• Sort data
• Scan large volumes of data

In order to get around these system limitations, vendors will suggest a model to avoid joins, use summarized data to avoid aggregation, store data in sorted order to avoid sorts, and overuse indexes to avoid large scans.

With these limitations, vendors are also going to avoid being able to compete. That is like placing a ball and chain around the runner's leg and saying, "I wish you all the best in the marathon." Come on! Whose side are these vendors really on?

Teradata is the only database engine I have seen that has the power and maturity to use a 3rd Normal Form physical model on databases exceeding a terabyte in size. Because of the physical limitations, other databases have had to use a Star-Schema model to enhance performance, but have given up on the ability to perform ad-hoc queries and data mining.


A normalized model is one that should be used for the central data warehouse. It allows users to ask any question at any time on information from any place within the enterprise. This is the central philosophy of a data warehouse. It leads to the power of ad-hoc queries and data mining, whereby advanced tools discover relationships that are not easily detected but do exist naturally in the business environment.

A Star-Schema model enhances performance on known queries because we build our assumptions into the model. While these assumptions may be correct for the first application, they may not be correct for others. Flexibility is a big issue, but data marts can be dropped and added with relative ease if each is built directly from the detail data.

Remember: build the data warehouse around detail data using a normalized model. Then, as query patterns emerge and performance for well-known queries becomes a priority, Star-Schema data marts can be created by extracting summarized or departmental data from the centralized data warehouse. The user will then have access to both the data marts for repetitive queries and the central warehouse for other queries.

Because data marts can be an administrative nightmare, Teradata enables Star-Schema access without requiring physical data marts. By setting up a join index as the intersection of your Star-Schema model, you can create a Star-Schema structure directly from your 3rd Normal Form data model. Best of all, once it is created, the data is automatically maintained as the underlying tables are updated.

Keep in mind: 80% of data warehouse queries are repetitive, but 80% of the Return On Investment (ROI) is actually provided by the other 20% of the queries that go against detailed data in an iterative environment. By using a normalized model for your central data warehouse and a Star-Schema model on data marts, you can enhance the possibility of realizing that 80% of the Return on Investment and still enhance the performance on 80% of your queries.

Rule 8 - Don't Let a Technical Issue Make Your Data Warehouse a Failure Statistic

"Experience is a hard teacher because she gives the test first, the lesson afterwards."

Scottish Proverb

Did you know that 3/4ths of the people in the world hate fractions, and that 40% of the time a data warehouse fails, it is because of a technical issue? There are many traps and pitfalls in every data warehouse venture. One winter day a hunter met a bear in the forest. The bear said, "I'm hungry. I want a full stomach." The man replied, "Well, I'm cold. I would like a fur coat." "Let's compromise," said the bear, and he quickly gobbled up the hunter. They both got what they asked for. The bear went away with a full belly, and the man left wrapped in a fur coat. With that in mind, good judgment comes from experience; experience comes from bad judgment. You have shown good judgment by reading this book, so let our experience keep your company from having a bad data warehouse experience.

Author Daniel Boorstin wrote in The Discoverers, "The greatest obstacle to discovering the shape of the earth, the continents, and the oceans was not ignorance, but rather the illusion of knowledge." There is a lot of illusion of knowledge being spread around in the data-warehousing environment. Before you decide on any data warehouse product, ask yourself and the vendor these questions:

• As my data demands increase, will the system be able to physically load the data? Our experience shows that many systems are not capable of handling very large volumes of data. Do the math!
• As the data grows in volume, can the system meet the performance requirements? Do the math!
• As the number of users grows, will the system be able to scale? Do the math!
• As my environment changes, will the system be flexible enough to allow changes quickly and easily? Do the math!
• Will the system need so many Database Administrators (DBAs) that my system's cost skyrockets? Do the math!
• If we suddenly merged with another company and needed to incorporate into their mainframe or LAN environment, would the system be able to connect and include them? Do the math!
• Can I continue to meet my batch window timeframes? Do the math!
• Could I become the hero of the company one day, only to have some technical glitch blamed on me because of my poor foresight, and be thrown out of the company into a giant mud puddle? Do the bath!

Rule 9 - Take a Building Block Approach

"Be not afraid of growing slowly; be afraid only of standing still."

Chinese Proverb

Ever since Vasco de Balboa discovered the Pacific coast of Panama in 1513, kings and businessmen alike dreamed of the impossible: to cut a waterway across the mountainous isthmus, creating a shortcut between the Atlantic and Pacific Oceans. Those dreams turned into reality during the Industrial Revolution. It took almost forty years of trial and error before the world's greatest engineering feat since the Pyramids was completed in 1914. Ships move through the locks of the canal, rising 85 feet above sea level before they descend to the opposite side. Since its grand opening in 1920, the Panama Canal has revolutionized trans-oceanic traffic, joining East and West. Its 50-mile stretch saves every vessel about 8,000 extra nautical miles of travel around the bottom tip of South America. Several modifications have been engineered through the years to accommodate the increasing size of ships.

Data warehouses, like the Panama Canal, must be built over time and changed over time to meet new demands. A data warehouse must grow with the environment, but the environment is unpredictable. All sailors know that they can't direct the wind, but that they can adjust their sails. In comparison, all data warehouse users know they can't direct the environment, but they can adjust their warehouse. Sometimes the data warehouse will grow quickly and sometimes it will grow slowly, but it should always be growing.

So take a building block approach to data warehousing. Teradata allows you to expand without boundaries, one building block at a time. Plus, adding on building blocks is easy.

There are two aspects to a building block approach. First, you need to add applications to your data warehouse in three- to six-month intervals. Once the first application works, then you are ready for more projects. As you become more experienced with this approach, you can add multiple projects in parallel by involving multiple organizations.

The second aspect of the building block approach is in the actual data warehouse architecture. It doesn't matter if yours is the smallest data warehouse in the world, the largest, or falls somewhere in between; power and scalability always fuel success.

Not long ago a customer flew out to San Diego for a Teradata demonstration and benchmark. The benchmark ran late into the evening, but the numbers were more than 50% better than the competition. The customer was extremely impressed, but before buying he demanded to see the system scalability that everyone had been talking about. Although it was already late, a Teradata employee was called in the middle of the night, arrived within 10 minutes (in pajamas), hooked up the building blocks, and ran a utility called config. She ran another called reconfig, and in less than two hours the system size doubled.


As the environment changes in terms of users, data, complexity, capacity, batch windows, time changes, events, or opportunities, users should be able to continue building applications and architecture. The more a Teradata system grows, the more Teradata outshines the competition.

Rule 10 - Buy a Teradata Data Warehouse

"Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing had happened."

Winston Churchill

Winston Churchill led Britain through World War II during what he called that country's "finest hour." When users see consistent data, the system, too, is in its finest hour. Teradata gives users the ability to ask questions they could never ask before. Users trust Teradata because of its industry performance and reputation, and because it never gives in. Constant use gives users optimal business experience, and no matter what a user asks, the system responds with a hearty "Yes, Sir!"

When we explained Teradata to Churchill, he said:

"A data WARe-house that consists of 250 data marts is like poison, and if I were the MIS department responsible for maintaining them, I'd take it."

Teradata guarantees an Enterprise Data Warehouse with no scalability issues. Data loads like lightning, and system administration is a breeze. You can pick the performance level that meets your requirements for today and forever. The database can be normalized around detail data, and because of Teradata's power, users have the flexibility to ask any question at any time on any data.

All other databases are suspect in data loading capabilities, scalability, reference sites, decades of data warehouse experience, flexibility, system administration difficulties, and inability to handle the complex queries of today's users. These users are good.

Teradata - The Shining Star

Overview

Teradata has always been at the top of the data warehouse game, even if the experts weren't bright enough to know it. The vision that the original designers had was tremendous. It was so far to the left of genius that most thought the idea was impossible.

"Only he who attempts the ridiculous may achieve the impossible."

Don Quixote

The Teradata database was originally designed in 1976, and many of the fundamental concepts still remain today. Nearly 25 years later, Teradata is still considered ahead of its time.

In 1976, IBM mainframes dominated the computer business. Everyone who was anyone had an IBM mainframe. However, the original founders of Teradata noticed that it took about 4½ years for IBM to produce a new mainframe. They also noticed a little company called Intel. Intel created a new PC chip every 2½ years. With mainframes moving forward every 4½ years and PC chips every 2½ years, Teradata formed its vision: network enough PC chips together that the mainframe would be overpowered, yet at a cost hundreds of times cheaper than a mainframe. The Teradata team estimated the power surge would come in 1990.

IBM laughed out loud. They said, "Let's get this straight... you are going to network a bunch of PC chips together and overpower our mainframes? That's like plowing a field with 1,000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as "just a bunch of PCs in a cabinet."

Teradata was convinced it could produce a product that would power large amounts of data and achieve the impossible: using PC technology in mainframe territory. Its founders agreed with Napoleon Bonaparte, who asserted, "The word 'impossible' is not in my dictionary." Sure enough, when we looked in his dictionary, that word was not there. And it is not in Teradata's Data Dictionary either. The Teradata team set two goals: build a database that could

• Perform parallel processing, and
• Accommodate a Terabyte of data

Driving in the car one evening, Morgan's eight-year-old daughter Kara piped up from the back seat, "Daddy, can you buy Teradata in the store? I mean, what does Teradata really do?" Morgan thought for a moment and then replied, "Do you remember when you went on the Easter egg hunt last spring? Well, imagine that we had fifty eggs and you were the only child there. If I asked you to find all the purple eggs, would you be able to do that?" Kara said, "Sure. But it might take me a long time." Morgan continued, "What if we now let fifty children go in, and I asked them to show me all of the purple eggs? How long would that take?" His daughter responded, "It wouldn't take any time at all, because each child would only have to look at one egg." That is precisely how Teradata works. It divides up huge tasks among its processors and tackles each portion simultaneously with amazing speed. And it doesn't matter if you have a trillion eggs in your basket.
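The egg hunt can be sketched in a few lines of code. This is only an illustration of the divide-and-combine idea, written with Python threads; it is not how Teradata is implemented, every name in it is hypothetical, and Python threads will not show a real speed-up on such a tiny task.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, workers):
    """Deal items out round-robin so each worker gets a near-equal share."""
    return [items[i::workers] for i in range(workers)]


def find_purple(basket):
    """One child searches only their own share of the eggs."""
    return [egg for egg in basket if egg[1] == "purple"]


def parallel_hunt(eggs, children):
    """Divide the eggs among the children, search the shares at once, combine results."""
    shares = split_evenly(eggs, children)
    with ThreadPoolExecutor(max_workers=children) as pool:
        partial_results = list(pool.map(find_purple, shares))
    # Merge every child's findings into one answer.
    return sorted(egg for found in partial_results for egg in found)


# Fifty numbered eggs in five repeating colors, so ten of them are purple.
colors = ["purple", "red", "blue", "yellow", "green"]
eggs = [(n, colors[n % len(colors)]) for n in range(50)]

purple = parallel_hunt(eggs, children=50)
print(len(purple))
```

With fifty children, each share holds exactly one egg, just as Kara described; with one child, the same function still returns the same answer, only with all the looking done by a single worker.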

In 1984, the DBC/1012 was introduced. Since then, Teradata has been the dominant force in data warehousing. Teradata got the chickens plowing and is considered outstanding. Meanwhile, IBM's plow is out rusting in its field.

Parallel Processing

"An invasion of armies can be resisted, but not an idea whose time has come."

Victor Hugo

The idea of parallel processing gives Teradata the ability to have unlimited users, unlimited power, and unlimited scalability. This is an idea whose time has come. And it all starts with something called parallel processing. So what is parallel processing? Let us explain.

It was 10 p.m. on a Saturday night, and two friends were having dinner and drinks. One of the friends looked at his watch and said, "I have to get going." The other friend responded, "What's the hurry?" His friend went on to tell him that he had to leave to do his laundry at the Laundromat. The other friend could not believe his ears. He responded, "What? You're leaving to do your laundry on a Saturday night? Do it tomorrow!" His buddy went on to explain that there were only 10 washing machines at the laundry. "If I wait until tomorrow, it will be crowded and I will be lucky to get one washing machine. I have 10 loads of laundry, so I will be there all day. If I go now, there will be nobody there and I can do all 10 loads at the same time. I'll be done in less than an hour and a half."


This story describes what we call Parallel Processing. Teradata is the only database in the world that loads data, backs up data, and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds... even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every database's clock in performance tests.
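The laundry math is worth spelling out. The sketch below assumes a hypothetical 90 minutes per load (wash and dry), which is consistent with the friend's "less than an hour and a half" but is not stated in the story; the function name is made up for this example.

```python
import math


def elapsed_minutes(loads, machines, minutes_per_load):
    """Wall-clock time when loads run in batches across the available machines."""
    batches = math.ceil(loads / machines)
    return batches * minutes_per_load


if __name__ == "__main__":
    sunday = elapsed_minutes(loads=10, machines=1, minutes_per_load=90)
    saturday_night = elapsed_minutes(loads=10, machines=10, minutes_per_load=90)
    print(f"One machine:  {sunday / 60:.0f} hours")
    print(f"Ten machines: {saturday_night / 60:.1f} hours")
```

The total amount of washing is identical in both cases. Only the elapsed time changes, because the work is done side by side instead of one load after another.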

"After enlightenment, the laundry."

Zen Proverb

"After parallel processing the laundry, enlightenment!"

Teradata Zen Proverb

With the computer world seeing Terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, this is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata has unlimited power, grows without boundaries, and was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

"A ship in harbor is safe, but that's not why ships are built."

John Shedd

In 1805, the pivotal Battle of Trafalgar matched Britain's flotilla of battleships against the almighty Spanish Armada. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to sail circles around the Armada, attacking them at their most vulnerable point: the stern. That battle paralyzed the Armada and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating them on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive, and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed managing terabytes of data.

A Personal Computer (PC) is made up of the following components:


Processor Chip - This is the brain of the computer. All tasks are done at the direction of the processor.

Memory - This is the hand of the computer. The memory allows data to be viewed, manipulated, changed, or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive - This is the spine of the computer. The hard drive stores data, applications, and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor applies your changes to the document while it is still being held in memory. Upon completion, you close the document and the processor writes all the changes to the disk, where it is stored.

In the picture below, we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight best friends.
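The same disk-to-memory-to-disk cycle can be sketched in code. The file name follows the Best_Friends example; the friend names and the helper function are placeholders invented for this illustration.

```python
import os
import tempfile


def add_friends(path, new_names):
    """Read the list from disk, change it in memory, then write it back to disk."""
    # Bring the document up from the hard drive into memory.
    with open(path, "r", encoding="utf-8") as f:
        friends = f.read().splitlines()

    # The processor works on the copy held in memory.
    friends.extend(new_names)

    # Closing the document: write all the changes back to the disk.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(friends) + "\n")
    return friends


# Start with five friends on disk (placeholder names), then add three new ones.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "Best_Friends.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("\n".join(f"Friend {n}" for n in range(1, 6)) + "\n")

updated = add_friends(path, ["Friend 6", "Friend 7", "Friend 8"])
print(len(updated))
```

Nothing is permanent until the final write: if the power failed after the names were typed but before the file was saved, the disk would still hold only the original list.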

Teradata Spreads Data over Multiple Processors

"I don't mind starting the season with unknowns. I just don't like finishing the season with them."

Coach Lou Holtz

With Teradata you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The Teradata example that follows shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask for a list of best friends, both processors will retrieve their data in parallel and return combined results over the connecting network. This could easily double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four records, and simultaneously the other processor reads the remaining four records. So how could we double the speed of this system again?

Teradata has Linear Scalability

"Every ceiling, when reached, becomes a floor upon which one walks and now can see a new ceiling."

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called Linear Scalability. This allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck of data.

Notice in the system below there are four processors, and each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since data is spread evenly over four processors, Teradata reads two rows simultaneously across four processors. Now the system is four times faster.
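The eight-row example can be worked through in code. The sketch below deals rows out round-robin, which is an assumption made only to get an even spread; the text here says the data is spread evenly but does not describe the method. The function names are hypothetical.

```python
def distribute(rows, processors):
    """Spread rows round-robin so every processor holds a near-equal share."""
    return [rows[i::processors] for i in range(processors)]


def rows_read_per_processor(rows, processors):
    """Each processor scans only its own disk, so elapsed work is the largest share."""
    return max(len(share) for share in distribute(rows, processors))


best_friends = [f"Friend {n}" for n in range(1, 9)]   # the eight-row table

for processors in (1, 2, 4):
    per_processor = rows_read_per_processor(best_friends, processors)
    speedup = len(best_friends) / per_processor
    print(f"{processors} processor(s): {per_processor} rows each, {speedup:.0f}x faster")
```

All eight rows are still read in every case. What shrinks is the number of rows any single processor has to read, from eight to four to two.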

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the Divide and Conquer theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the processors (called AMPs) and processed in parallel.


A Logical View of the Teradata Architecture

"You are either making history or you are history."

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when it actually never had a chance to succeed. In ancient days, Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies, too, will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and then is given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain information on their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.
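The sequence just described can be traced with a toy model. Everything in the sketch below is a simplification invented for illustration: the class, the one-statement "grammar," the user name, and the plain loop standing in for the BYNET are all assumptions, not Teradata internals.

```python
class ParsingEngine:
    """Toy stand-in for a PE: checks syntax and rights, plans, then gathers AMP results."""

    def __init__(self, amp_disks, access_rights):
        self.amp_disks = amp_disks          # one dict per AMP: {table_name: [rows]}
        self.access_rights = access_rights  # {user_name: {table_name, ...}}

    def run(self, user, sql):
        # Step 1: syntax check (this toy grammar accepts only "SELECT * FROM <table>").
        words = sql.strip().rstrip(";").split()
        if len(words) != 4 or [w.upper() for w in words[:3]] != ["SELECT", "*", "FROM"]:
            raise ValueError("syntax error")
        table = words[3]

        # Step 2: access-rights check.
        if table not in self.access_rights.get(user, set()):
            raise PermissionError(f"{user} may not read {table}")

        # Step 3: build a plan with one scan step per AMP.
        plan = [(amp_id, table) for amp_id in range(len(self.amp_disks))]

        # Step 4: each AMP reads its own disk; a plain loop stands in for the BYNET.
        rows = []
        for amp_id, table_name in plan:
            rows.extend(self.amp_disks[amp_id].get(table_name, []))

        # Step 5: hand the combined answer back to the user.
        return rows


amp_disks = [
    {"Best_Friends": ["Friend 1", "Friend 2", "Friend 3", "Friend 4"]},
    {"Best_Friends": ["Friend 5", "Friend 6", "Friend 7", "Friend 8"]},
]
pe = ParsingEngine(amp_disks, access_rights={"kara": {"Best_Friends"}})
answer = pe.run("kara", "SELECT * FROM Best_Friends;")
print(len(answer))
```

The order of the steps is the point: a query that fails the syntax check or the rights check never reaches the AMPs at all.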


Parsing Engine (PE)

"Even a stopped clock is right twice a day."

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards!" Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and Southwestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions, some of which have never been asked before in their respective industries. This experience allows users to ask any question regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions, and Teradata returns the answers.

Access Module Processor (AMP)

Wise men talk because they have something to say; fools talk because they have to say something.

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above: "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of few words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee: it never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best way to picture this is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it is the only AMP allowed to read or write data on that disk. This design is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk and shares it with no other AMP, hence a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP reads only the rows on its own disk. Because this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.
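The "slowest AMP" remark can be put into a few lines of Python. This is only a sketch: the row counts and the one-second-per-row cost are made-up numbers, and `query_time` is a hypothetical helper, not anything Teradata provides.

```python
def query_time(rows_per_amp, seconds_per_row=1.0):
    """Elapsed time when every AMP scans its own rows at the same moment.

    The query finishes only when the busiest AMP finishes, so the
    elapsed time is driven by the largest share of rows.
    """
    return max(rows_per_amp) * seconds_per_row

even = [25, 25, 25, 25]      # 100 rows spread evenly over four AMPs
skewed = [70, 10, 10, 10]    # the same 100 rows, badly skewed

print(query_time(even))      # 25.0
print(query_time(skewed))    # 70.0
```

With the same 100 rows, the evenly spread table finishes in 25 time units and the skewed one in 70, which is why an even spread matters so much.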

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today some Teradata systems have just four AMPs, while others have more than 2000.

The BYNET

Even if you're on the right track, you'll still get run over if you just sit there.

Will Rogers

The BYNET ensures that communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of the two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax.
• The PE checks the user's security rights.
• The PE comes up with a plan for the AMPs to follow.
• The PE passes the plan along to the AMPs over the BYNET.
• The AMPs follow the plan and retrieve the data requested.
• The AMPs pass the data to the PE over the BYNET.
• The PE then passes the final data to the user.
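The seven steps above can be mimicked with a toy Python model. Everything in it is hypothetical and exists only for illustration: the `ACCESS_RIGHTS` table, the `AMP_DISKS` contents, and the `run_query` function are not Teradata interfaces, syntax checking is assumed to have passed, and the BYNET is reduced to an ordinary function call.

```python
# Toy model of the query flow. All names and data are hypothetical.
ACCESS_RIGHTS = {"tom": {"Customer"}}        # made-up security table
AMP_DISKS = {                                # each AMP owns its own rows
    1: [("Customer", 1001)],
    2: [("Customer", 1002)],
    3: [("Customer", 1003)],
    4: [],
}

def run_query(user, table):
    # Steps 1-2: syntax is assumed valid; check the user's access rights.
    if table not in ACCESS_RIGHTS.get(user, set()):
        raise PermissionError(f"{user} has no rights on {table}")

    # Step 3: the plan is "return every row you own for this table".
    def plan(rows):
        return [row for row in rows if row[0] == table]

    # Steps 4-6: each AMP follows the plan and hands back its rows.
    answer_set = []
    for amp_number in sorted(AMP_DISKS):
        answer_set.extend(plan(AMP_DISKS[amp_number]))

    # Step 7: the PE returns the final answer set to the user.
    return answer_set

print(run_query("tom", "Customer"))
```

A user without rights on the table never reaches the AMPs at all: the function raises before any plan is built, which mirrors the guard-dog behavior of the PE.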

Teradata Building Block Approach

Better a diamond with a flaw than a pebble without one

Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them.

Arthur Schopenhauer

Do you have one of those notoriously messy "junk drawers" in your kitchen? You know the one we're talking about ... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database stores data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, CustomerName, Item No, and Quantity.

An example of each table follows:

CUSTOMER TABLE called CustomerTable

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillards       Atlanta    106227              Smith

ORDER TABLE called OrderTable

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an "extended family" joined by the "marriage" of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest unit that can be inserted into a table. A column is the smallest unit within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null, or are sometimes represented by a question mark (?).

One column, or a combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (that is, the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                          FOREIGN KEY
Not optional                         Optional
Comprised of one or more columns     Comprised of one or more columns
Can only have one PK per table       Can have multiple FKs per table
No duplicates allowed                Duplicates allowed
No changes allowed                   Changes allowed
No nulls allowed                     Nulls allowed
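To see a PK-to-FK join in action, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for Teradata, the column names are simplified from the sample tables (spaces removed), and only the columns needed for the join are kept.

```python
import sqlite3

# SQLite stands in for Teradata here purely for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CustomerTable (
    CustomerID   INTEGER PRIMARY KEY,
    CustomerName TEXT,
    CityName     TEXT
);
CREATE TABLE OrderTable (
    OrderNumber INTEGER PRIMARY KEY,
    ItemNo      INTEGER,
    Quantity    INTEGER,
    CustomerID  INTEGER REFERENCES CustomerTable(CustomerID)
);
INSERT INTO CustomerTable VALUES
    (1001, 'JC Penney', 'Dallas'),
    (1002, 'Office Depot', 'Columbia'),
    (1003, 'Dillards', 'Atlanta');
INSERT INTO OrderTable VALUES
    (105372, 212, 20, 1001),
    (105799, 296, 52, 1002),
    (106227, 325, 17, 1003);
""")

# Match the foreign key in OrderTable to the primary key in CustomerTable.
rows = con.execute("""
    SELECT c.CustomerName, o.OrderNumber, o.Quantity
    FROM CustomerTable AS c
    JOIN OrderTable AS o ON o.CustomerID = c.CustomerID
    ORDER BY o.OrderNumber
""").fetchall()

for row in rows:
    print(row)
```

Each output row combines a customer name from one table with the order details from the other, which is exactly what the shared key makes possible.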

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way:

First, when each AMP has nearly the same quantity of table rows, no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that looks like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind that the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice that each AMP holds a portion of the rows for every table. AMP 1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at nearly the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. Likewise, the rows of the Customer table and the rows of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single request. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.

There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may have only one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column emp follows:

CREATE TABLE employee
(
  emp        INTEGER,
  dept       INTEGER,
  lname      CHAR(20),
  fname      VARCHAR(20),
  salary     DECIMAL(10,2),
  hire_date  DATE
)
UNIQUE PRIMARY INDEX (emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON:

CREATE TABLE TomC.employee
(
  emp        INTEGER,
  dept       INTEGER,
  lname      CHAR(20),
  fname      VARCHAR(20),
  salary     DECIMAL(10,2),
  hire_date  DATE
)
PRIMARY INDEX (dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(
  emp        INTEGER,
  dept       INTEGER,
  lname      CHAR(20),
  fname      VARCHAR(20),
  salary     DECIMAL(10,2),
  hire_date  DATE
)
UNIQUE PRIMARY INDEX (emp, dept);

Being related hardly insures relatability

Michael E Angier

All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their "relatability" in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label the column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label the column(s) used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that the AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
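Both simulated hash maps follow the same pattern: the AMP numbers simply cycle from 1 up to the number of AMPs. The Python sketch below reproduces the two diagrams under that assumption. The function names and the 48-bucket size (8 rows of 6, as drawn) are ours; whether a real Teradata hash map assigns AMPs in such a plain cycle is not something the text states.

```python
def build_hash_map(num_amps, num_buckets=48):
    """Return a list in which entry i holds the AMP number for bucket i + 1.

    AMP numbers cycle 1..num_amps, as in the simulated diagrams.
    """
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]

def show(hash_map, width=6):
    """Print the buckets six per row, the way the diagrams are laid out."""
    for start in range(0, len(hash_map), width):
        print(" ".join(str(amp) for amp in hash_map[start:start + width]))

show(build_hash_map(4))
print()
show(build_hash_map(8))
```

Passing `num_buckets=65536` gives a map with as many buckets as the text says the actual hash map has.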

How the Hash Map and Primary Index Work Together

Choice, not chance, determines destiny.

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the "Whiz-Bang Formula" (a secret NCR formula). First we'll explain the theory, and then we will invent our own Whiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

For this example, Teradata will attempt to spread the table rows among the AMPs of the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Whiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Whiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we take the Primary Index value of the row, apply the Coffing/Jones Whiz-Bang formula (divide by 2), and the answer points to a bucket in the hash map. Inside that bucket is the number of the AMP on which the row will reside.

Friend_Num   Friend_Name
2            Ben Hon

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Whiz-Bang Formula (divide by 2) to the value of Friend_Num (2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another row:

Friend_Num   Friend_Name
16           Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Whiz-Bang Formula (divide by 2) to the value of Friend_Num (16), and the answer is:

16 divided by 2 = 8

Thus the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP 4.
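The two worked rows generalize to the whole table. The sketch below applies the book's teaching formula (divide by 2) to every Friend_Num and looks up the bucket in a four-AMP map whose AMP numbers cycle 1, 2, 3, 4. The function names are hypothetical, and this is the simplified classroom formula, not the real Teradata hashing algorithm.

```python
def whiz_bang_bucket(primary_index_value):
    """Teaching formula from the text: divide the Primary Index value by 2."""
    return primary_index_value // 2

def amp_for_bucket(bucket, num_amps=4):
    """Bucket numbers start at 1 and the AMP numbers cycle 1..num_amps."""
    return (bucket - 1) % num_amps + 1

best_friends = {2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
                10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones"}

# Map every Friend_Num to the AMP that will own its row.
placement = {num: amp_for_bucket(whiz_bang_bucket(num)) for num in best_friends}

for num, amp in placement.items():
    print(f"Friend_Num {num:>2} -> bucket {whiz_bang_bucket(num)} -> AMP {amp}")
```

Friend_Num 2 lands on AMP 1 and Friend_Num 16 on AMP 4, matching the walkthrough, and each of the four AMPs ends up with exactly two rows.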

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Whiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the real formula is mathematical (similar to the Coffing/Jones Whiz-Bang Formula) and that it is consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this value a million times, we would still get the same result.

If you always do what you always did, you'll always get what you always got.

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns, Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees AMP number four. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
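The one-AMP retrieval can be sketched in Python by reusing the same formula for both loading and lookup. As before, the divide-by-2 formula is the book's teaching device, and the `amps` dictionary and function names are hypothetical stand-ins for the AMPs' disks.

```python
NUM_AMPS = 4

def amp_for(primary_index_value):
    bucket = primary_index_value // 2          # teaching formula: divide by 2
    return (bucket - 1) % NUM_AMPS + 1         # bucket 1 holds AMP 1, and so on

# Load: each row lands on the AMP its Primary Index value hashes to.
amps = {amp: {} for amp in range(1, NUM_AMPS + 1)}
for num, name in [(2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"),
                  (8, "John Davis"), (10, "Don Roy"), (12, "Sam Mills"),
                  (14, "Kyle Marx"), (16, "Lyn Jones")]:
    amps[amp_for(num)][num] = name

def select_by_primary_index(friend_num):
    """Rerun the same formula and ask only the owning AMP for the row."""
    amp = amp_for(friend_num)
    return amp, amps[amp].get(friend_num)

print(select_by_primary_index(8))   # (4, 'John Davis')
```

Only AMP 4's dictionary is touched for the lookup; the other three AMPs are never asked, which is what makes Primary Index access a one-AMP operation.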

The Full Table Scan

What matters is not the size of the dog in the fight, but the size of the fight in the dog.

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that it is not the size of the opponent that is to be feared, but the size of his will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so can speed up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is about 10 times faster than having a single processor read all 100 rows. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!
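The 100-row, 10-AMP example can be acted out in Python. In this sketch a thread pool plays the part of the AMPs working side by side; the row values and names are made up, and threads only mimic the idea of parallel AMPs, not their actual performance.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 10
table = list(range(100))                                        # 100 made-up rows
amp_rows = [table[amp::NUM_AMPS] for amp in range(NUM_AMPS)]    # 10 rows per AMP

def amp_scan(rows):
    """Each AMP reads only the rows it owns."""
    return list(rows)

# All AMPs scan at once; the pool gathers their partial results.
with ThreadPoolExecutor(max_workers=NUM_AMPS) as pool:
    partial_results = list(pool.map(amp_scan, amp_rows))

# The PE assembles the partial results into one answer set.
result = [row for part in partial_results for row in part]
print(len(amp_rows[0]), len(result))    # 10 100
```

Each worker touches 10 rows instead of 100, which is where the "10 times faster" figure in the example comes from.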

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?", you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num   Friend_Name
6            Mary Gray
14           Kyle Marx
8            John Davis
16           Lyn Jones
2            Ben Hon
10           Don Roy
4            Joe Davis
12           Sam Mills

In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option: a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and add overhead, they should only be used on "KNOWN QUERIES," or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore there are two types of secondary indexes They are Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI) respectively referred to as USI and NUSI A table may have up to 32secondary indexes

The good news about secondary indexes is that they speed up queries The bad news is that every timesomeone creates a secondary index on a table Teradata creates and maintains a separate secondary index sub-table This action not only takes up space but also adds overhead

A classical secondary index is itself a table made up of rows having two main parts The first is the datacolumn inside the secondary index table and the second part is a pointer showing the location of the row in the

base table Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables

There are three values stored in every secondary index sub-table row

Secondary Index data value Secondary Index Row-ID (This is the hashed version of the value) Primary IndexRow-ID (This locates the AMP and the base row)

When a secondary index is created the Teradata PE tells each AMP to hash the secondary index column valuefor each of its rows It tells the PE to place the hash in a secondary index sub-table along with the ROW-ID that points to the base row where the desired value resides

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Number 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query asks "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once that is complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (the smiley face in this example) is revealed, and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.
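The two-AMP lookup described above can be sketched in a few lines of Python. This is only a toy model: `zlib.crc32` stands in for Teradata's hashing algorithm and hash map, Friend_Num is assumed to be the Primary Index, and Don Roy's Friend_Num of 4 is made up for illustration.

```python
import zlib

NUM_AMPS = 4  # the example system in the text has four AMPs

def amp_for(value):
    """Stand-in for 'hash the value, then look up the AMP in the hash map'."""
    return zlib.crc32(str(value).encode()) % NUM_AMPS

# Illustrative rows; Friend_Num is assumed to be the Primary Index.
best_friends = {2: "Ben Hon", 4: "Don Roy"}

base = [dict() for _ in range(NUM_AMPS)]  # base-table rows held by each AMP
usi = [dict() for _ in range(NUM_AMPS)]   # USI sub-table rows held by each AMP

for num, name in best_friends.items():
    base_amp = amp_for(num)
    base[base_amp][num] = name
    # The index row goes to the AMP its own value hashes to and
    # carries a pointer back to the base row.
    usi[amp_for(name)][name] = (base_amp, num)

def usi_lookup(name):
    """Return (row, AMPs touched) for WHERE Friend_Name = name."""
    index_amp = amp_for(name)            # first AMP: holds the sub-table row
    pointer = usi[index_amp].get(name)
    if pointer is None:
        return None, [index_amp]
    base_amp, num = pointer              # second AMP: holds the base row
    return (num, base[base_amp][num]), [index_amp, base_amp]
```

Calling `usi_lookup("Ben Hon")` touches at most two AMPs no matter how many AMPs the system has, which is the whole point of a USI.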

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.
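For contrast, here is a rough sketch of a NUSI lookup. It assumes that each AMP keeps a sub-table covering only its own rows; the row placement and the last names are invented for illustration.

```python
# Rows already distributed to four AMPs by Primary Index (placement is illustrative).
amps = [
    [(1, "Smith"), (5, "Jones")],
    [(2, "Jones")],
    [(3, "Lee")],
    [(4, "Jones"), (6, "Smith")],
]

# Each AMP builds a NUSI sub-table for its own rows only:
# indexed value -> positions of the matching local rows.
nusi = []
for rows in amps:
    sub = {}
    for pos, (_, last_name) in enumerate(rows):
        sub.setdefault(last_name, []).append(pos)
    nusi.append(sub)

def nusi_lookup(last_name):
    """Return (matching rows, number of AMPs asked)."""
    hits, amps_asked = [], 0
    for amp_rows, sub in zip(amps, nusi):
        amps_asked += 1                      # every AMP is asked...
        for pos in sub.get(last_name, []):   # ...but none scans its base rows
            hits.append(amp_rows[pos])
    return hits, amps_asked
```

All four AMPs take part, yet each one answers from its small sub-table instead of reading every base row.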

Create secondary indexes only on columns used repeatedly in the WHERE clause of ongoing queries. Secondary indexes take up space and add overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index directly. They run their normal joins, and the PE checks to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.

Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The name is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than DBC.

DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to SYSDBA to distribute space.

Logical Picture of System Space

SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to, or REVOKE rights from, Tom.
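Finding every owner of Tom is simply a walk up the hierarchy. The small sketch below uses the users and databases named in this chapter; the `PARENT` dictionary and the `owners` function are illustrative names, not anything Teradata exposes.

```python
# child -> immediate parent, following the hierarchy described in the text
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Mary": "MRKT",
    "Tom": "Morgan",
}

def owners(name):
    """Every user or database above `name`, nearest parent first."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain
```

`owners("Tom")` gives Morgan, SYSDBA and DBC, while `owners("DBC")` is empty because nothing sits above DBC.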

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space,
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see the protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, the AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts. The user is then out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful. If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. SYSDBA, likewise, gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is used to store real data, such as tables and their secondary index sub-tables. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
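The difference between giving away perm and giving away spool can be made concrete with a small model. It is a sketch only: sizes are in gigabytes, the 20 GB spool limits are assumed values, and the user "Analyst" is hypothetical.

```python
class Owner:
    """A user or database; sizes are in gigabytes for readability."""

    def __init__(self, name, perm, spool):
        self.name = name
        self.perm = perm      # space actually owned
        self.spool = spool    # an upper limit only, never "spent" by giving
        self.children = []

    def create_child(self, name, perm, spool):
        if perm > self.perm:
            raise ValueError(f"{self.name} does not own {perm} GB of perm space")
        self.perm -= perm                  # perm given away is subtracted
        child = Owner(name, perm, spool)   # spool is a limit: nothing is subtracted
        self.children.append(child)
        return child

def total_perm(owner):
    """Perm owned by this object plus everything below it."""
    return owner.perm + sum(total_perm(c) for c in owner.children)

dbc = Owner("DBC", perm=100, spool=20)
sysdba = dbc.create_child("SYSDBA", perm=80, spool=20)
morgan = sysdba.create_child("Morgan", perm=10, spool=20)
analyst = sysdba.create_child("Analyst", perm=0, spool=20)  # runs queries only
```

After these calls DBC owns 20 GB and SYSDBA owns 70 GB, the total across the hierarchy is still 100 GB, and SYSDBA's spool limit is unchanged even though it handed the same limit to two children.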

The following picture shows a logical view of a Customer table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture asks to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.
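A toy version of that query flow is sketched below. It makes two simplifications that should be read as assumptions: the spool limit is counted in rows rather than bytes, and the four (Emp, Dept) rows are invented.

```python
class OutOfSpool(Exception):
    pass

def run_query(amps, predicate, spool_limit):
    """Scan each AMP's rows, collect qualifying rows in spool, return the answer."""
    spool = []
    try:
        for rows in amps:
            for row in rows:
                if predicate(row):
                    if len(spool) == spool_limit:
                        raise OutOfSpool("query aborted: spool limit exceeded")
                    spool.append(row)
        return list(spool)   # the answer set delivered to the user
    finally:
        spool.clear()        # spool is released whether the query ends or aborts

# Hypothetical (Emp, Dept) rows spread over two AMPs.
amps = [[(1, 10), (2, 20)], [(3, 10), (4, 30)]]
answer = run_query(amps, lambda row: row[1] == 10, spool_limit=10)
```

With a generous limit the query returns the two department-10 rows; with `spool_limit=1` the same query raises `OutOfSpool`.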

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of one or more tables, the view's data is not stored separately on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname    Sal
1     10     Johnson    Manny    100000
2     20     Carlsbad   Jan      100000
22    30     Winter     Steve    77000
25    10     Lester     Bonnie   56000
33    10     Samuels    Todd     120000
99    20     Walter     Misha    104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit the columns and rows that users can see.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, security is enhanced. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The definition of a view is stored in the Data Dictionary and is monitored by DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha
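If you would like to try the view example without a Teradata system, the sketch below reproduces it with Python's built-in `sqlite3` module. SQLite is only a stand-in here: it stores data very differently from Teradata, and the column types are assumptions, but the view hides the Sal column in the same way.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE EMPLOYEE (Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
con.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no employee rows are copied.
con.execute("CREATE VIEW EMPLOY_V AS SELECT Emp, Dept, Lname, Fname FROM EMPLOYEE")

cur = con.execute("SELECT * FROM EMPLOY_V")
columns = [d[0] for d in cur.description]   # column names exposed by the view
rows = cur.fetchall()
```

The view returns all six employees but only four columns, so anyone limited to EMPLOY_V never sees a salary.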

What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• Which employees are in department 10,
• Which employees are in department 20, and
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
  SELECT * FROM Employ_v WHERE dept = 10;
  SELECT * FROM Employ_v WHERE dept = 20;
  SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                           Macros
• We select from views                          • We execute macros
• Uses the keyword AS                           • Uses the keyword AS
• Definition is stored in the Data Dictionary   • Definition is stored in the Data Dictionary
• Accesses certain portions of the data         • Accesses the real data itself
• Is changed using the keyword REPLACE          • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform the given operation on the specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, DBC has Implicit rights on all databases and users. SYSDBA has Implicit rights on everyone listed below it. MRKT has implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), and Tom was given 16 access rights on himself.
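The interplay of implicit and explicit rights can be modeled roughly as follows. This is a deliberately simplified sketch: the real Data Dictionary tracks far more detail, the 16 and 20 automatic rights are not enumerated, and the names `grant`, `revoke` and `has_explicit` are illustrative Python functions rather than Teradata calls.

```python
# child -> immediate parent, following the hierarchy described in the text
PARENT = {"SYSDBA": "DBC", "MRKT": "SYSDBA", "SALES": "SYSDBA",
          "Morgan": "SYSDBA", "Mary": "MRKT", "Tom": "Morgan"}

def owners(name):
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain

explicit = set()   # (grantee, privilege, object) triples that were granted

def _may_administer(grantor, obj):
    # Simplification: you may grant or revoke on yourself or on anything you own.
    return grantor == obj or grantor in owners(obj)

def grant(grantor, grantee, privilege, obj):
    if not _may_administer(grantor, obj):
        raise PermissionError(f"{grantor} has no rights over {obj}")
    explicit.add((grantee, privilege, obj))

def revoke(grantor, grantee, privilege, obj):
    if not _may_administer(grantor, obj):
        raise PermissionError(f"{grantor} has no rights over {obj}")
    explicit.discard((grantee, privilege, obj))

def has_explicit(user, privilege, obj):
    return (user, privilege, obj) in explicit
```

In this model Tom can grant Mary CREATE TABLE on his own space, and Morgan or SYSDBA can later revoke it because they sit above Tom. Mary cannot grant anything on Morgan, since she is neither Morgan nor one of his owners.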

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract: it is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails at just the wrong time? Or an AMP or disk goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't become corrupt. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that, if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (as in a MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal". Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data then reverts to its original state. When a transaction is successful, the PE and the AMPs shake hands on it, and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, "Transient Journal cleanliness is next to Godliness." If it is clean, then things went well.
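The before-image idea is easy to sketch. The class below is a minimal illustration for a single AMP and makes no claim about how Teradata lays out its journal internally; the names `Amp`, `change`, `commit` and `rollback` are chosen for this example.

```python
_ABSENT = object()   # marks "this row did not exist before the change"

class Amp:
    def __init__(self):
        self.rows = {}
        self.journal = []            # before-images for the open transaction

    def change(self, row_id, new_value=_ABSENT):
        """INSERT or UPDATE when new_value is given, DELETE when it is omitted."""
        self.journal.append((row_id, self.rows.get(row_id, _ABSENT)))
        if new_value is _ABSENT:
            self.rows.pop(row_id, None)
        else:
            self.rows[row_id] = new_value

    def commit(self):
        self.journal.clear()         # success: the journal is wiped clean

    def rollback(self):
        while self.journal:          # undo the newest change first
            row_id, before = self.journal.pop()
            if before is _ABSENT:
                self.rows.pop(row_id, None)
            else:
                self.rows[row_id] = before
```

A short usage example: store a salary and commit, then apply a mistaken update and roll back. The row returns to the committed value and the journal ends up empty.

```python
amp = Amp()
amp.change(25, 56000)
amp.commit()
amp.change(25, 112000)   # the accidental 100% raise
amp.rollback()           # amp.rows is {25: 56000} again
```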

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.
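Here is a small model of the four-AMP picture: eight base rows, eight FALLBACK rows, and a check that every row stays reachable when one AMP is lost. The rule used to choose each FALLBACK AMP is invented for illustration; the only property that matters is that a row's copy never shares an AMP with the row itself.

```python
NUM_AMPS = 4
ROWS = range(1, 9)                       # eight base rows, as in the picture

# AMP holding each base row (AMPs are numbered 0-3 here).
primary = {r: (r - 1) % NUM_AMPS for r in ROWS}
# AMP holding each FALLBACK copy: always different from the base row's AMP.
fallback = {r: (primary[r] + 1 + (r - 1) // NUM_AMPS) % NUM_AMPS for r in ROWS}

def readable(row, down_amps):
    """A row can be read if either of its two copies is on a working AMP."""
    return primary[row] not in down_amps or fallback[row] not in down_amps

def all_readable(down_amps):
    return all(readable(r, down_amps) for r in ROWS)
```

`all_readable({a})` is true for any single AMP `a`, at the price of storing 16 rows instead of 8. Losing two AMPs is another matter: `all_readable({0, 1})` is false, which is the problem clustering is meant to contain.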

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. The chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. The output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.
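The path from Primary Index value to hash to bucket to AMP can be sketched as below. Everything numeric here is an assumption made for the example: `zlib.crc32` replaces Teradata's hashing algorithm, and the 64-bucket map is far smaller than a real hash map.

```python
import zlib

NUM_AMPS = 8        # the eight-AMP system in the picture
NUM_BUCKETS = 64    # illustrative size only

# The hash map: each bucket holds the number of the AMP that owns it.
hash_map = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]

def amp_for(primary_index_value):
    row_hash = zlib.crc32(str(primary_index_value).encode())  # stand-in hash
    bucket = row_hash % NUM_BUCKETS      # the hash points to a bucket...
    return hash_map[bucket]              # ...and the bucket names the AMP

# Count how many of 1,000 sample rows land on each AMP.
counts = [0] * NUM_AMPS
for value in range(1000):
    counts[amp_for(value)] += 1
```

Two properties are worth noticing: the same Primary Index value always maps to the same AMP, and a reasonable hash spreads many different values across all of the AMPs.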

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two and one AMP down, every complex query will take twice as long to process.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is that there is an increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share the extra work.
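The trade-off between cluster sizes comes down to simple arithmetic. The sketch below assumes the failed AMP's work is shared evenly by the survivors in its cluster; the function names are made up for this example.

```python
from fractions import Fraction

def load_on_survivors(cluster_size):
    """Work per surviving AMP, relative to normal, with one AMP down."""
    return Fraction(cluster_size, cluster_size - 1)

def amps_that_must_not_fail(cluster_size):
    """With one AMP already down, a second failure among these stops the cluster."""
    return cluster_size - 1

for size in (2, 4, 16):
    print(size, load_on_survivors(size), amps_that_must_not_fail(size))
```

A cluster of two doubles the load on the survivor but only one other AMP can hurt you. A cluster of 16 adds just 1/15 to each survivor but leaves 15 AMPs whose failure would stop the cluster. A cluster of four sits in between, with a 4/3 load and three AMPs at risk.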

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are applied to their FALLBACK copies.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be used on any, all or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping", starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping".

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
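The "while you were sleeping" behavior can be sketched for a single cluster. The model is intentionally small: it records only the changes aimed at the down AMP and leaves out the FALLBACK copies that the surviving AMPs keep updating as usual. The class and method names are illustrative.

```python
class Cluster:
    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}  # rows held by each AMP
        self.down = None                           # the failed AMP, if any
        self.darj = []                             # changes the down AMP missed

    def fail(self, amp):
        self.down = amp
        self.darj = []                             # survivors open the journal

    def write(self, amp, row_id, value):
        if amp == self.down:
            self.darj.append((row_id, value))      # log it instead of losing it
        else:
            self.rows[amp][row_id] = value

    def recover(self):
        for row_id, value in self.darj:            # replay in original order
            self.rows[self.down][row_id] = value
        self.down = None
        self.darj = []                             # the journal is discarded
```

While the AMP is down, writes meant for it pile up in the journal and its own rows stay untouched. After `recover()`, its rows reflect every missed change and the journal is empty again.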

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it is still fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.
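A mirrored pair can be sketched in a few lines. This models only the behavior described above (dual writes, reads from either copy, rebuilding a replacement from the survivor); the `MirroredPair` class is an illustration, not a description of the Disk Array Controller's internals.

```python
class MirroredPair:
    def __init__(self):
        self.disks = [{}, {}]          # index 0 = primary, 1 = mirror
        self.alive = [True, True]

    def write(self, block, data):
        for i in (0, 1):               # every write goes to both disks
            if self.alive[i]:
                self.disks[i][block] = data

    def read(self, block):
        for i in (0, 1):               # either copy can serve a read
            if self.alive[i]:
                return self.disks[i][block]
        raise IOError("both disks in the pair are lost")

    def fail(self, i):
        self.alive[i] = False
        self.disks[i] = {}

    def replace(self, i):
        self.disks[i] = dict(self.disks[1 - i])   # rebuild from the survivor
        self.alive[i] = True
```

If the primary fails, reads and writes carry on against the mirror, and `replace(0)` brings a new primary back in step with it.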

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.
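The dual write and the rebuild from the surviving disk can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name `MirroredPair` and its methods are hypothetical, the disks are plain dictionaries, and nothing here reflects how a real Disk Array Controller is implemented.

```python
class MirroredPair:
    """Toy RAID-1 pair: a primary disk and its mirror (hypothetical model)."""

    def __init__(self):
        # Each disk is modeled as a dictionary of block id -> data.
        self.disks = {"primary": {}, "mirror": {}}
        self.failed = set()

    def write(self, block_id, data):
        # Dual write: every healthy disk in the pair receives the same block.
        for name, disk in self.disks.items():
            if name not in self.failed:
                disk[block_id] = data

    def read(self, block_id):
        # Read from the first disk that is still healthy.
        for name, disk in self.disks.items():
            if name not in self.failed:
                return disk[block_id]
        raise IOError("both disks in the pair have failed")

    def fail(self, name):
        # A crashed disk loses its contents.
        self.failed.add(name)
        self.disks[name] = {}

    def replace(self, name):
        # Rebuild the replacement disk by copying from the surviving disk.
        survivor = "mirror" if name == "primary" else "primary"
        if survivor in self.failed:
            raise IOError("no surviving disk to copy from")
        self.disks[name] = dict(self.disks[survivor])
        self.failed.discard(name)
```

In this sketch, a read keeps working after `fail("primary")`, and `replace("primary")` brings the pair back to two identical copies.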

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disks via a different physical cable.
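The migration idea can be sketched as a small Python model. The function names `build_clique` and `fail_node`, the node labels, and the round-robin placement are illustrative assumptions, not Teradata's actual migration logic. The point it shows is that an AMP changes nodes while its virtual disk assignment stays the same.

```python
def build_clique(amps_per_node=16, nodes=("node1", "node2")):
    """Place AMPs on nodes; each AMP owns exactly one virtual disk (hypothetical model)."""
    placement = {}
    amp = 1
    for node in nodes:
        for _ in range(amps_per_node):
            placement[amp] = {"node": node, "vdisk": f"vdisk{amp}"}
            amp += 1
    return placement


def fail_node(placement, failed, survivors):
    """Migrate every AMP on the failed node to a surviving node in the clique.

    Only the hosting node changes; the AMP keeps its own virtual disk.
    Returns the number of AMPs that migrated.
    """
    if not survivors:
        raise ValueError("a clique needs at least one surviving node")
    moved = 0
    for amp in sorted(placement):
        if placement[amp]["node"] == failed:
            # Spread migrating AMPs round-robin over the survivors (an assumption).
            placement[amp]["node"] = survivors[moved % len(survivors)]
            moved += 1
    return moved
```

With the two-node, 32-AMP configuration described above, failing node one moves 16 AMPs to node two, which then hosts all 32 while each AMP still points at its original virtual disk.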

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
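The two recovery directions can be sketched in Python. This is a minimal model under stated assumptions: the table is a dictionary, the journal is a list of before and after images, and the helper names `apply_change`, `roll_back`, and `roll_forward` are hypothetical. A deleted or not-yet-existing row is represented by `None`.

```python
def apply_change(table, journal, key, new_row):
    """Apply an insert/update (new_row is a value) or a delete (new_row is None).

    Both the before image and the after image are journaled.
    """
    journal.append({"key": key, "before": table.get(key), "after": new_row})
    if new_row is None:
        table.pop(key, None)
    else:
        table[key] = new_row


def roll_back(table, journal, to_length):
    """Undo changes newest-first, using before images, until to_length entries remain."""
    while len(journal) > to_length:
        entry = journal.pop()
        if entry["before"] is None:
            table.pop(entry["key"], None)   # the row did not exist before the change
        else:
            table[entry["key"]] = entry["before"]


def roll_forward(backup, journal):
    """Rebuild the current state from a backup copy plus after images, oldest-first."""
    table = dict(backup)
    for entry in journal:
        if entry["after"] is None:
            table.pop(entry["key"], None)   # the change was a delete
        else:
            table[entry["key"]] = entry["after"]
    return table
```

In the region example above, `roll_back` plays the part of the BEFORE Journal undoing the bad update, while `roll_forward` plays the part of the AFTER JOURNAL being applied on top of the restored monthly backup.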

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
(
    emp        INTEGER,
    dept       INTEGER,
    lname      CHAR(20),
    fname      VARCHAR(20),
    salary     DECIMAL(10,2),
    hire_date  DATE FORMAT   -- format string missing in the source
)
UNIQUE PRIMARY INDEX (emp);

The example above created the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy, and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line, only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind, these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
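The four rules above can be condensed into a compatibility table. The Python sketch below is an illustration built only from those rules; the names `COMPATIBLE` and `can_grant` are hypothetical, and a real lock manager also handles queuing and lock levels (database, table, row), which this sketch leaves out.

```python
# For each lock already held, the set of lock modes that may be granted alongside it.
COMPATIBLE = {
    "EXCLUSIVE": set(),                       # prevents any access, period
    "WRITE": {"ACCESS"},                      # only a dirty read may proceed
    "READ": {"READ", "ACCESS"},               # many readers may share
    "ACCESS": {"ACCESS", "READ", "WRITE"},    # blocks only EXCLUSIVE
}


def can_grant(requested, held):
    """Return True if `requested` can be granted alongside every lock in `held`."""
    return all(requested in COMPATIBLE[mode] for mode in held)
```

For example, `can_grant("READ", ["READ", "READ"])` is true, while `can_grant("WRITE", ["READ"])` is false, which matches the rule that a READ LOCK keeps a WRITE LOCK waiting.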

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the former table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
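The insert and delete rules of RI can be sketched with the employee and payroll tables from the story above. This is a toy Python check, not database code: the function names, the `emp` column, and the exception class are all hypothetical.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would leave a row pointing at a missing value."""


def insert_payroll_row(employee_ids, payroll_rows, row):
    # A payroll row may only be inserted if its emp value exists in the employee table.
    if row["emp"] not in employee_ids:
        raise ReferentialIntegrityError(f"employee {row['emp']} does not exist")
    payroll_rows.append(row)


def delete_employee(employee_ids, payroll_rows, emp):
    # An employee may not be deleted while payroll rows still reference that employee.
    if any(row["emp"] == emp for row in payroll_rows):
        raise ReferentialIntegrityError(f"employee {emp} still has payroll rows")
    employee_ids.discard(emp)
```

Under this sketch, the fired employee cannot be removed from the employee table until the matching payroll rows are gone, so the Bahamas vacation never gets funded.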

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
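The three phases described above (deal out blocks, hash and redistribute, sort) can be sketched in Python. Everything here is an illustrative assumption: SHA-256 stands in for Teradata's own hashing algorithm, a block is a handful of rows rather than 64K, and sorting by the key stands in for sorting by Row ID. The names `amp_for` and `fastload` are hypothetical.

```python
import hashlib


def amp_for(primary_index_value, amp_count):
    """Map a primary index value to an AMP number (stand-in hash, not Teradata's)."""
    digest = hashlib.sha256(str(primary_index_value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % amp_count


def fastload(rows, amp_count, block_size=3):
    """Simulate a load; each row is a tuple whose first field is the primary index."""
    # Phase 1: deal blocks of rows out to the AMPs in turn, ignoring content.
    received = [[] for _ in range(amp_count)]
    for start in range(0, len(rows), block_size):
        target = (start // block_size) % amp_count
        received[target].extend(rows[start:start + block_size])

    # Phase 2: each AMP hashes the rows it received and sends each to its owner.
    owned = [[] for _ in range(amp_count)]
    for amp_rows in received:
        for row in amp_rows:
            owned[amp_for(row[0], amp_count)].append(row)

    # Phase 3: each AMP sorts the rows it owns.
    for amp_rows in owned:
        amp_rows.sort(key=lambda row: row[0])
    return owned
```

The design point the sketch tries to capture is that the first phase is cheap and content-blind, so the expensive hashing work is spread across all AMPs instead of being done by a single loader.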

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level
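The three "Basics" lists can be read as a simple decision rule. The Python sketch below is a deliberate simplification of those lists, not vendor guidance: the function name `pick_utility` and its parameters are hypothetical, and it treats any secondary index as a blocker for Multiload even though the list above only rules out Unique secondary indexes.

```python
def pick_utility(table_is_empty, has_indexes_ri_or_triggers, needs_row_level_locking):
    """Suggest a load utility from the constraints listed in the Basics sections."""
    # Only Tpump tolerates secondary indexes, RI, and triggers, and locks at the row level.
    if has_indexes_ri_or_triggers or needs_row_level_locking:
        return "Tpump"
    # Fastload requires an empty target table.
    if table_is_empty:
        return "Fastload"
    # Populated tables without those constraints are Multiload territory.
    return "Multiload"
```

So an initial load into an empty, unindexed table points to Fastload, a nightly batch against populated tables points to Multiload, and a load that must leave the table available to users points to Tpump.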

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

As I grew to know the team, I asked them how long it took top management to make a decision. And how long did it take to implement that decision at thousands of stores nationwide? They simply said, "About two hours." I was amazed. Today, this team continues to have one of the single greatest data warehouses ever built. They use it extensively, and it grows stronger every day.

While I was visiting with this team, management decided at one point that stores across the country should place Halloween displays and candy near the cash registers. In less than two hours, stores moved their Halloween candy from the normal candy aisles to end-caps near the cash register. Every store participated but one.

When asked why he didn't participate, the store manager said he had simply run out of time to create the displays, plus move the Halloween candy from his normal candy aisle to the end-caps. Management was ticked. Telling the manager they would get back to him, they then asked the DBA to query the data warehouse to see how much this snafu had cost the company. The DBA came back and reported that the store actually sold almost the same amount of Halloween candy as forecasted. Management was surprised and, honestly, a little disappointed with the answer. But then the DBA added somewhat sheepishly, "I found something else too." "Go ahead," replied members of the management team. He said, "I found out they actually sold about 40% more normal candy than we forecasted for this holiday." Management got on the phone immediately and told the other thousand stores, "Move those goblins and Halloween candy back to the normal candy aisles!"

What that DBA did was to use his instinct and the data warehouse to find out exactly what was going on with the business at that time. He was armed with a system that had cross-functional analysis. A central data warehouse gives good management great confidence because they see the whole picture. When users can ask any question, at any time, and on any data, their knowledge is unlimited.

Most Teradata Central Data Warehouse sites will tell you most of their Return On Investment (ROI) came from areas they never suspected. Thomas Jefferson once said, "We don't know one millionth of a percent about anything." When we explained Teradata to Jefferson, he did not build another Monticello, but he did retract his statement. Companies with a centralized data warehouse know about a million percent more than companies that have invested in stovepipe applications and 300 different data marts.

Actually, any company planning on competing in this millennium must think long-term and begin building a centralized data warehouse. If not, that company will be on the short end of the stick when competing with a company that chose to build one. That thought should sound scarier than a goblin near the cash registers on Halloween.

If you think about it, every major decision in business makes someone happy. If you are armed with facts supported by a central data warehouse, and you do your homework, your business decisions will make your shareholders happy. However, if you are making decisions with a data mart strategy, those decisions are more likely to make your competitors happy.

There are many companies that are fearful of such an undertaking. They want a central data warehouse but wonder, "What if it fails? Which database should we choose? What type of hardware do we need? Should we do an RFP?" Decisions, decisions. It would literally take me about 30 seconds to make a decision on Teradata. There would be no RFP. We used to wade in swimming pools of data; today we are swamped in oceans of data. Teradata is built for this type of environment. This book explains the fundamentals of Teradata. Anyone with any experience or knowledge about data warehouse environments will clearly see why Teradata is the best solution.

Rule 2 - Build for the User

"A learned person is not one who gives the right answers; it is the one who asks the right questions."

Claude Levi-Strauss

The user is the heart of the data warehouse, and they get better with each day of experience. The user makes decisions that affect the company's bottom line. That's why the data warehouse is built around the business user. Building a data warehouse is simple: find out what data the business users need and what type of queries they want to ask but are not able to ask today. Then find out if the data is available and if the queries can be attained. With those answers, you will exceed users' expectations.

An experienced data warehouse user is usually shocked when he or she first uses Teradata. Its sheer power and flexibility enable users to ask questions they have never been able to ask before. On a recent consulting trip of mine, a young DBA got antsy when a particular query took more than a minute or so with Teradata. So I asked, "Well, how long did that same query take with your OLTP-based data warehouse?" He retorted, "We couldn't even run this query on the old system." I said, "So what's wrong with two minutes?" He added, "You know, some of our business users are so used to how long our queries used to run that they will be sitting, staring at the screen, without realizing that Teradata has already brought back the answer." With Teradata, users can expand their thinking by using intuition and keen business sense without technology barriers.

The building of an enterprise data warehouse begins with top management, but then cascades down to a relationship between the IT department and the business user community.

The IT department must realize they have a supporting role. That role is to please the business user by making data available so the business user can easily ask questions and get answers. It's also the IT department's role to build a system that allows users to ask questions on their own without IT intervention. Forget about building a system where users ask IT to run the queries for them. When users need information, the IT department should eventually be able to say, "Ask the question yourself... it is all available to you."

The business users are actually the stars; however, the entire business community must take responsibility for the warehouse's success. These users must continually educate themselves and other users on the capabilities of the data warehouse, new tools, and new techniques that will enhance its potential. Those same users must help IT help them. If both understand their respective roles and work together to help the company, then the data warehouse will be a huge success.

Rule 3 - Let the IT Department Lead the Way to User Utopia

Few sports challenges are as grueling or demanding as the Tour de France. But victory at this event eluded Lance Armstrong, a powerful young cyclist from Austin, Texas. Lance excelled in individual competition, even winning the World Championships. But despite his hard work, Lance could not overcome the Europeans' strong and proud tradition at the Tour de France. A few years ago, Lance was thrown into the battle of his life, not against others, but against himself. He discovered that he had cancer and was given virtually no chance of surviving. Suddenly, he found out how little cycling really meant in life. With all his might, Lance battled his way back to health, beating the odds. Now he found out how very much cycling could mean in life. His bicycle became a tool to reclaim the future. He found a spot as a team member for the US Postal Service team. With a new perspective and a new depth of character, Lance led that team to victory in the next Tour de France. And he repeated this victory again for the next two years.

To win the premier event in the cycling world, Lance Armstrong had to totally rethink his role. In the same way, the key members of any company seeking success with its data warehouse must rethink their roles. The IT department plays a key role in a data warehouse. What do users know about technical issues? Not enough to build a data warehouse. So technical issues are the responsibility of the IT department. The danger with this train of thought is that while the IT department has years of experience with handling company transactions through production databases and applications, most are new at data warehousing. A data-warehousing environment can be extremely different from anything an IT department has ever built or used before. Therefore, it's a bad idea to build a data warehouse without the help of experienced people.

An OLTP environment gets more and more predictable each month. It is designed to be tweaked and tuned in order to maximize a company's environment. On the other hand, a data warehouse is an unpredictable environment where the only way to gain control is to actually give up control. In data warehousing, the user must be allowed the freedom to ask the questions, and they will blossom in an environment where flexibility is accepted and welcomed.

"The only sure weapon against bad ideas is better ideas."

A. Whitney Griswold

If the IT department decides to build hundreds of data marts that will please each and every department, then they are missing the boat. Data warehouse experience is a hard teacher because it gives the test first and the lesson afterwards. Abraham Lincoln once said, "A house divided cannot stand." With that in mind, build the data warehouse so it will stand strong for a long time.

What's the formula? First and foremost, start by building your data warehouse around detail data. Bring transaction data, along with key details from the OLTP systems, into the data warehouse. Then, as known queries are identified, build data marts to enhance their performance, and also insist that data marts are created and maintained directly from the detail data. Doing so will build a foundation that will stand.

Next, the IT department needs to keep an open mind about creating an environment called User Utopia. Have you ever been there? In User Utopia, the user confidently asks queries without fear of being charged by the minute. The user has meta-data, so he or she becomes intimate with the data and then makes informed decisions. The user should also be able to ask monster queries with the full backing of IT. Recently, on one such query, the IT department wanted to pull the plug. But the DBA held out, granting the user more time. When the query finished running, the information it brought back from the detail data saved the company millions of dollars. Overall, a user will get the majority of his or her answers back quickly from data marts, but he or she also needs the capability of going back to the detail data for more information. This is User Utopia.

Here is the message for IT: Don't follow the idea that "if you build it, they will come." Instead, become a leader... go to the users and build it together.

Rule 4 - Build the Foundation Around Detail Data

Business is always trying to predict the unpredictable. The US Air Force Reserve's 53rd Weather Reconnaissance Squadron is a special force that flies its planes directly into tropical storms and hurricanes. Using a WC-130 Hercules aircraft, they fly into storms at low altitudes between 1,000 and 10,000 feet, taking weather readings that are relayed to the National Hurricane Center in Florida. They measure wind speeds, measure the pressure and structure of the storm, and, most importantly, locate the eye of the storm. The data collected by these "Hurricane Hunters" is used to determine when and where a storm might hit the coast, and how strong it will be at that time. Teradata has no fear of detail data; its virtual processors will fly right into the thick of your data warehouse to bring back valuable information for decision support. You see, Teradata enables you to understand the storms in your business today while helping you predict when and where the next storm will hit tomorrow.

I estimate that 80% of today's data warehouses are built on summary (summarized) data. Therefore, 80% of all data warehouses will never come close to realizing their full potential. Your data warehouse does not have to be one of them.

A bird does not sing because it has the answers; it sings because it has a song.

A data warehouse built on detail data does not sing because it has a song; it sings because it has the answers. When you capture detail data, answers to an infinite number of questions are available. But if this is truly the case, then why doesn't everybody build around detail data? Well, there are two reasons. One is price. Like a bird, many companies decide to go "cheap, cheap." But watch out! The real expense is not the cost of the data warehouse; it is the money that you will not make without one. The second reason is power. Many companies don't have the wingspan to fly through the detail, so they soar with the summary. In addition, some companies don't want to pay for the disk space it actually takes to keep detail data, but believe me, that cost is a small price to pay for success.

Once you miss the first buttonhole, it becomes difficult to button your shirt.

Many companies use the same database for their data warehouse as they have done for their OLTP system. This is a critical mistake. In essence, they have missed the first buttonhole and most likely will lose their shirt on their data warehouse adventure.

At this point, companies no longer have a choice of using detail data. They must summarize for performance reasons. As one marine jokingly told his boot camp soldiers, "The beatings will continue until morale improves." Similarly, a database designed for OLTP takes a continual beating when it tries to query large amounts of detail data.

Companies building true data warehouses don't compromise on price and will have a data warehouse that is built for decision support, not one that specializes in OLTP. With this decision, you have buttoned the first buttonhole and are well on your way to reaching the top.

Detail data is the foundation that data warehouses are built upon. Users can ask any question at any time; conduct data mining, OLAP, ROLAP, SQL, and SPL functions; build data marts directly from the detail data; and easily maintain and grow the environment on a daily basis. Now that's a tune well worth singing. Make a note of it.

Rule 5 - Build Data Marts from the Detail

You cannot teach a man anything; you can only help him find it within himself.

Galileo

Galileo was a smart man. How did he know so much about life and data marts? When we explained data marts to Galileo, he said, "You cannot build a data mart directly from the OLTP systems; you can only build a data mart directly from the detail within." He was right.

Many companies build data mart after data mart directly from the OLTP systems, and their universe begins to revolve around continual maintenance. Then, as things get worse, as Galileo predicted, their universe begins to revolve around the son: the son of a gun sent in to replace them.

Why does this happen? At first things work out great, but soon there are more and more requests for additional information. As a result, more and more data marts are created, and soon the system looks like a giant spider web. Different data marts start to yield different results on like data, and the actual maintenance of this complicated spider web takes up most of IT's time. Meanwhile, short-term dreams turn into long-term nightmares like this one: A man and his wife had had a big argument just before he went on a business trip. Feeling rather contrite about his harsh words, he arranged to send his wife some flowers and asked the florist to write on the card, "I'm sorry, I love you." The beautiful bouquet arrived at the door. But then his wife read the words the florist had actually written in haste: "I'm sorry I love you."

The top reasons to build data marts directly from detail data are:

• Users can get answers from the data mart but must validate their findings or check out additional information from the detail that built it.
• There is only one consistent version of the truth.
• Maintenance is easy.

If a user comes up with a data mart answer that does not make sense, then he or she has the ability to drill down into the detail and investigate. Sometimes summary data can spark interest, and finding out the "why" can result in big bucks.

If users don't trust the data, they won't use the system. When a data warehouse is built on a foundation of detail data, and data marts are then erected from that foundation, you have a winning combination. The results will always be consistent and trustworthy. However, you should only build data marts when there is a credible business case, and you should be ready to drop them when they are no longer needed. The life span of a data mart is relatively short compared to that of its mother and father (better known as the detail data). If you build data marts from the detail, they are easy to manage, easy to drop, and easy to change.
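To make the idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module rather than Teradata itself. The table names, columns, and values (sales_detail, sales_by_store_mart, and so on) are hypothetical and chosen only for illustration. The sketch builds a small summary data mart directly from a detail table, then drills down from a summary number to the detail rows behind it.

```python
import sqlite3

# Hypothetical detail table: one row per sales transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_detail (store TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_detail VALUES (?, ?, ?)",
    [
        ("North", "shirt", 20.0),
        ("North", "shoes", 55.0),
        ("South", "shirt", 20.0),
        ("South", "shirt", 22.0),
    ],
)


def build_mart(conn):
    """(Re)build the summary data mart directly from the detail data."""
    conn.execute("DROP TABLE IF EXISTS sales_by_store_mart")
    conn.execute(
        "CREATE TABLE sales_by_store_mart AS "
        "SELECT store, SUM(amount) AS total FROM sales_detail GROUP BY store"
    )


build_mart(conn)
mart = dict(conn.execute("SELECT store, total FROM sales_by_store_mart"))

# Drill down: fetch the detail rows behind one summary number.
south_detail = conn.execute(
    "SELECT item, amount FROM sales_detail WHERE store = 'South' ORDER BY amount"
).fetchall()
print(mart, south_detail)
```

Because the mart is derived entirely from the detail, dropping it loses nothing: calling build_mart again recreates it, and the summary always agrees with the detail it came from.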

Rule 6 - Make Scalability Your Best Friend

Plan your life for a million tomorrows, and live your life as if tomorrow may be your last.

Morgan Jones

The roar of class-6 rapids on a river in Suriname can be almost deafening against the dense walls of the jungle, especially when you are 9 years old. Our mission was to lower our canoe down the waterfall with ropes. The Trio Amer-Indian who anchored our 40-foot dugout canoe let go of the anchor rope too quickly. Without warning, the heavy boat began a freefall through the rocky water with my father hanging onto the side for dear life. He disappeared under the rocky waters, and I knew for sure we had lost him. My heart pounded against my chest. As I rallied myself to grasp this loss as only a nine-year-old can, the Indians abruptly began cheering wildly above the roar of the river. My dad had resurfaced a hundred yards downstream, battered and bruised, but he was alive. In just one short minute, I determined that I would love my family every day as if there were no tomorrow.

As I made my family my best friend, a data warehouse must make scalability its best friend. A data warehouse that does not scale will have no tomorrow. It is only a matter of time until the warehouse disappears in rocky waters, never to come up for air. Don't let go of the anchor rope!

The data-warehousing environment will throw obstacles in your way every single day. A data warehouse must be planned to meet today's needs, but it must also be capable of meeting tomorrow's challenges. The future cannot be predicted, so plan for unlimited growth, or linear scalability, both vertical and horizontal. There are so many data warehouses that start out with sizzling performance, but as they grow they eventually and inevitably hit the scalability wall. However, before they hit the wall, there is a pattern of diminishing performance.

A data warehouse designed without scalability in mind is doomed before it is begun. It can never reach its potential. Take the scalability question out of the equation by investing in a database that allows you to start small but grows linearly.

In today's fast-paced world, Gigabytes soon become Terabytes. It may not sound like much, but it weighs a ton on the shoulders of giants. Listen to these measurements and pick your data warehouse's life span. For example, if you lived for a million seconds (Megabyte), then you would live for about 11.5 days. In comparison, if you lived for a billion seconds (Gigabyte), then you would live for about 31.7 years. Plus, if you lived for a trillion seconds (Terabyte), then you would live for 31,688 years.

How nice it would be on your 31,688th birthday that people would say, "You sure look good for your age!"
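The life-span arithmetic can be checked with a few lines of Python. This is only a sketch: the 365.25-day average year is an assumption, and the function names are made up for illustration. The results land within rounding of the figures quoted above.

```python
SECONDS_PER_DAY = 60 * 60 * 24
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365.25  # assumption: average year length


def seconds_to_days(seconds):
    """Convert a number of seconds to days."""
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds):
    """Convert a number of seconds to years of 365.25 days."""
    return seconds / SECONDS_PER_YEAR


million_days = seconds_to_days(10**6)      # a "Megabyte" of seconds
billion_years = seconds_to_years(10**9)    # a "Gigabyte" of seconds
trillion_years = seconds_to_years(10**12)  # a "Terabyte" of seconds

print(f"{million_days:.2f} days")
print(f"{billion_years:.2f} years")
print(f"{trillion_years:,.0f} years")
```

Each step from Megabyte to Gigabyte to Terabyte multiplies the life span by a thousand, which is the point of the comparison.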

Data warehouses hit the wall of scalability because they cannot grow at the same rate as the amount of data being gathered. Teradata allows for unlimited linear scalability. Linear Scalability is a building block approach to data warehousing that ensures that, as building blocks are added, the system continues at the same performance level.

This is why the largest data warehouses in the world use Teradata. I was lucky to be in the right place at the right time and taught during the beginning stages at what are considered the two largest data warehouse sites in the world: Southwestern Bell (SBC) and Wal-Mart.

Wal-Mart's data warehouse started with less than 30 gigabytes, and SBC started with less than 200 gigabytes and 100 users. Both warehouses:

• Started small and simple
• Used Teradata from the beginning
• Have built the largest Enterprise Data Warehouse in their respective industries
• Continue to realize additional Return On Investment (ROI) on an annual basis
• Have grown to more than 10 Terabytes of data and are still growing
• Have thousands of users (some estimates are shocking)
• Have educated and experienced data warehouse staffs
• Have educated and experienced data warehouse users
• Experience continual growth without boundaries
• Have experienced linear performance by Teradata in every single upgrade (from gigabytes to terabytes and from terabytes to tens of terabytes)
• Both companies are impressed with Teradata's power and performance
• And both SBC and Wal-Mart are committed to the excellence of Teradata

A data warehouse is built in small building blocks. Linear Scalability is described in three ways:

First, building blocks are added until the performance requirements of your environment are met (Guaranteed Success);

Second, every time the data doubles, building blocks are doubled and the system maintains its performance level (Guaranteed Success); and

Third, any time the environment changes, building blocks are added until performance requirements are met (Guaranteed Success).

Scalability is not just about growing the data volume. It also means growing or increasing the number of users. Many systems work flawlessly until as few as 5 users are added; then they slow down to a crawl. Companies need a system where growth and performance are easily calculated and implemented. That means a system where the number of users, size and complexity of queries, volume of data, and number of applications being used can be calculated and compared to the current system's actual size. If more power, speed, or size is needed, then the company can simply add building blocks to the system until the requirements are met.
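Under the assumption of perfectly linear scaling, the sizing calculation described above reduces to a simple division. The sketch below is illustrative only: the function name, the "work units per hour" measure, and the numbers are hypothetical, and a real sizing exercise would weigh users, query complexity, and data volume separately.

```python
import math


def blocks_needed(required_work_per_hour, work_per_block_per_hour):
    """Smallest number of building blocks that meets the requirement,
    assuming capacity grows in direct proportion to the block count."""
    return math.ceil(required_work_per_hour / work_per_block_per_hour)


# Hypothetical numbers: each block handles 50 work units per hour.
current = blocks_needed(200, 50)       # today's workload
after_growth = blocks_needed(400, 50)  # the workload doubles

print(current, after_growth)
```

When the workload doubles, the block count doubles with it, which is the second of the three descriptions of Linear Scalability given earlier.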

Rule 7 - Model the Data Correctly

You will find only what you bring in.

Yoda, Jedi Master in Star Wars

We model a database for the same reasons that Boeing builds an aircraft model to test flight characteristics in a wind tunnel. It's simpler and cheaper to model than to reconstruct the plane by iterations until you get it right.

A proper data model should be designed to reflect the business components and possible relationships.

Here are three rules for modeling data in a data warehouse:

1. Model the data quickly
2. Normalize the detail data
3. Use a dimensional model for data marts

3rd Normal Form holds that each column in a table should be directly related to the primary key: the whole key and nothing but the key. Data is placed into tables where it makes the most sense and has no repeating groups, derived data, or optional columns. This allows users to ask any question at any time on all data within the enterprise. Users do not have to strive for 3rd Normal Form, but should normalize the data the best they can. There will be fewer columns in a table but a lot more tables overall. This model is easier to maintain, incredibly flexible, and allows a user to ask any question on any data at any time.

A Star-Schema model is comprised of a fact table and a number of dimension tables. The fact table is a table with a multi-part key. Each element of the key is itself a foreign key to a single dimension table. The remaining fields in the fact table are known as facts and are numeric, continuously valued, and additive. Facts can be thought of as measurements taken at the intersection of all of the dimensions. Dimension attributes are mostly textual and are almost always the source of constraints and report breaks. This model enhances performance on known queries or, in other words, queries users run repeatedly day after day.
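As an illustration of that structure, here is a small Star-Schema sketch in Python using the standard-library sqlite3 module, not Teradata. All table names, column names, and values (sales_fact, store_dim, date_dim) are hypothetical. The fact table has a two-part key, each part is a foreign key to one dimension table, and the remaining column is an additive numeric fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: mostly textual attributes.
    CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE date_dim  (date_id  INTEGER PRIMARY KEY, month  TEXT);

    -- Fact table: multi-part key of foreign keys plus an additive fact.
    CREATE TABLE sales_fact (
        store_id INTEGER REFERENCES store_dim(store_id),
        date_id  INTEGER REFERENCES date_dim(date_id),
        amount   REAL,
        PRIMARY KEY (store_id, date_id)
    );

    INSERT INTO store_dim VALUES (1, 'East'), (2, 'West');
    INSERT INTO date_dim  VALUES (10, 'Jan'), (11, 'Feb');
    INSERT INTO sales_fact VALUES (1, 10, 100.0), (1, 11, 150.0), (2, 10, 80.0);
""")

# Constrain on one dimension attribute, break the report on another,
# and sum the additive fact.
report = conn.execute("""
    SELECT s.region, d.month, SUM(f.amount)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_id = f.store_id
    JOIN date_dim  AS d ON d.date_id  = f.date_id
    WHERE s.region = 'East'
    GROUP BY s.region, d.date_id, d.month
    ORDER BY d.date_id
""").fetchall()
print(report)
```

The query shape is fixed by the model: the constraint and the report break come from dimension attributes, and the measurement comes from the fact table. That is why this layout suits known, repeated queries.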

Most database modelers prefer to create a logical model in 3rd Normal Form, but most database engines are overcome by physical limitations, so they must compromise the model. The four most difficult functions for a database to handle are:

• Join tables
• Aggregate data
• Sort data
• Scan large volumes of data

In order to get around these system limitations, vendors will suggest a model to avoid joins, use summarized data to avoid aggregation, store data in sorted order to avoid sorts, and overuse indexes to avoid large scans.

With these limitations, vendors are also going to avoid being able to compete. That is like placing a ball and chain around the runner's leg and saying, "I wish you all the best in the marathon." Come on! Whose side are these vendors really on?

Teradata is the only database engine I have seen that has the power and maturity to use a 3rd Normal Form physical model on databases exceeding a terabyte in size. Because of the physical limitations, other databases have had to use a Star-Schema model to enhance performance but have given up on the ability to perform ad-hoc queries and data mining.

A normalized model is one that should be used for the central data warehouse. It allows users to ask any question at any time on information from any place within the enterprise. This is the central philosophy of a data warehouse. It leads to the power of ad-hoc queries and data mining, whereby advanced tools discover relationships that are not easily detected but do exist naturally in the business environment.

A Star-Schema model enhances performance on known queries because we build our assumptions into the model. While these assumptions may be correct for the first application, they may not be correct for others. Flexibility is a big issue, but data marts can be dropped and added with relative ease if each is built directly from the detail data.

Remember: build the data warehouse around detail data using a normalized model. Then, as query patterns emerge and performance for well-known queries becomes a priority, Star-Schema data marts can be created by extracting summarized or departmental data from the centralized data warehouse. The user will then have access to both the data marts for repetitive queries and the central warehouse for other queries.

Because data marts can be an administrative nightmare, Teradata enables Star-Schema access without requiring physical data marts. By setting up a join index as the intersection of your Star-Schema model, you can create a Star-Schema structure directly from your 3rd Normal Form data model. Best of all, once it is created, the data is automatically maintained as the underlying tables are updated.

Keep in mind, 80% of data warehouse queries are repetitive, but 80% of the Return On Investment (ROI) is actually provided by the other 20% of the queries that go against detailed data in an iterative environment. By using a normalized model for your central data warehouse and a Star-Schema model on data marts, you can enhance the possibility of realizing that 80% of the Return On Investment and still enhance the performance on 80% of your queries.

Rule 8 - Don't Let a Technical Issue Make Your Data Warehouse a Failure Statistic

Experience is a hard teacher because she gives the test first, the lesson afterwards.

Scottish Proverb

Did you know that 3/4ths of the people in the world hate fractions, and that 40% of the time a data warehouse fails, it is because of a technical issue? There are many traps and pitfalls in every data warehouse venture. One winter day, a hunter met a bear in the forest. The bear said, "I'm hungry. I want a full stomach." The man replied, "Well, I'm cold. I would like a fur coat." "Let's compromise," said the bear, and he quickly gobbled up the hunter. They both got what they asked for. The bear went away with a full belly, and the man left wrapped in a fur coat. With that in mind, good judgment comes from experience; experience comes from bad judgment. You have shown good judgment by reading this book, so let our experience keep your company from having a bad data warehouse experience.

Author Daniel Boorstin wrote in The Discoverers, "The greatest obstacle to discovering the shape of the earth, the continents, and the oceans was not ignorance, but rather the illusion of knowledge." There is a lot of "illusion of knowledge" being spread around in the data-warehousing environment. Before you decide on any data warehouse product, ask yourself and the vendor these questions:

• As my data demands increase, will the system be able to physically load the data? Our experience shows that many systems are not capable of handling very large volumes of data. Do the math.
• As the data grows in volume, can the system meet the performance requirements? Do the math.
• As the number of users grows, will the system be able to scale? Do the math.
• As my environment changes, will the system be flexible enough to allow changes quickly and easily? Do the math.
• Will the system need so many Database Administrators (DBAs) that my system's cost skyrockets? Do the math.
• If we suddenly merged with another company and needed to incorporate their mainframe or LAN environment, would the system be able to connect and include them? Do the math.
• Can I continue to meet my batch window timeframes? Do the math.
• Could I become the hero of the company one day, only to have some technical glitch blamed on me because of my poor foresight and be thrown out of the company into a giant mud puddle? Do the bath.

Rule 9 - Take a Building Block Approach

Be not afraid of growing slowly; be afraid only of standing still.

Chinese Proverb

Ever since Vasco de Balboa discovered the Pacific coast of Panama in 1513, kings and businessmen alike dreamed of the impossible: to cut a waterway across the mountainous isthmus, creating a shortcut between the Atlantic and Pacific Oceans. Those dreams turned into reality during the Industrial Revolution. It took almost forty years of trial and error before the world's greatest engineering feat since the Pyramids was completed in 1914. Ships move through the locks of the canal, rising 85 feet above sea level before they descend to the opposite side. Since its grand opening in 1920, the Panama Canal has revolutionized trans-oceanic traffic, joining East and West. Its 50-mile stretch saves every vessel about 8,000 extra nautical miles of travel around the bottom tip of South America. Several modifications have been engineered through the years to accommodate the increasing size of ships.

Data warehouses, like the Panama Canal, must be built over time and changed over time to meet new demands. A data warehouse must grow with the environment, but the environment is unpredictable. All sailors know that they can't direct the wind, but that they can adjust their sails. In comparison, all data warehouse users know they can't direct the environment, but they can adjust their warehouse. Sometimes the data warehouse will grow quickly and sometimes it will grow slowly, but it should always be growing.

So take a building block approach to data warehousing. Teradata allows you to expand without boundaries, one building block at a time. Plus, adding on building blocks is easy.

There are two aspects to a building block approach. First, you need to add applications to your data warehouse in three- to six-month intervals. Once the first application works, you are ready for more projects. As you become more experienced with this approach, you can add multiple projects in parallel by involving multiple organizations.

The second aspect of the building block approach is in the actual data warehouse architecture. It doesn't matter if yours is the smallest data warehouse in the world, the largest, or falls somewhere in between: power and scalability always fuel success.

Not long ago, a customer flew out to San Diego for a Teradata demonstration and benchmark. The benchmark ran late into the evening, but the numbers were more than 50% better than the competition. The customer was extremely impressed, but before buying he demanded to see the system scalability that everyone had been talking about. Although it was already late, a Teradata employee was called in the middle of the night, arrived within 10 minutes (in pajamas), hooked up the building blocks, and ran a utility called config. She ran another called reconfig, and in less than two hours the system size doubled.

As the environment changes in terms of users, data complexity, capacity, batch windows, time changes, events, or opportunities, users should be able to continue building applications and architecture. The more a Teradata system grows, the more Teradata outshines the competition.

Rule 10 - Buy a Teradata Data Warehouse

Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing had happened.

Winston Churchill

Winston Churchill led Britain through World War II during what he called that country's "finest hour." When users see consistent data, the system too is in its finest hour. Teradata gives users the ability to ask questions they could never ask before. Users trust Teradata because of its industry performance and reputation, and because it never gives in. Constant use gives users optimal business experience, and no matter what a user asks, the system responds with a hearty "Yes, Sir!"

When we explained Teradata to Churchill, he said:

"A data WARe-house that consists of 250 data marts is like poison, and if I were the MIS department responsible for maintaining them, I'd take it."

Teradata guarantees an Enterprise Data Warehouse with no scalability issues. Data loads like lightning, and system administration is a breeze. You can pick the performance level that meets your requirements for today and forever. The database can be normalized around detail data, and because of Teradata's power, users have the flexibility to ask any question at any time on any data.

All other databases are suspect in data loading capabilities, scalability, reference sites, decades of data warehouse experience, flexibility, ease of system administration, and the ability to handle the complex queries of today's users. These users are good.

Teradata - The Shining Star

Overview

Teradata has always been at the top of the data warehouse game, even if the experts weren't bright enough to know it. The vision that the original designers had was tremendous. It was so far to the left of genius that most thought the idea was impossible.

Only he who attempts the ridiculous may achieve the impossible

Don Quixote

The Teradata database was originally designed in 1976, and many of the fundamental concepts still remain today. Nearly 25 years later, Teradata is still considered ahead of its time.

In 1976, IBM mainframes dominated the computer business. Everyone who was anyone had an IBM mainframe. However, the original founders of Teradata noticed that it took about 4½ years for IBM to produce a new mainframe. They also noticed a little company called Intel. Intel created a new PC chip every 2½ years. With mainframes moving forward every 4½ years and PC chips every 2½ years, Teradata recognized their vision: to network enough PC chips together that the mainframe would be overpowered, yet costs would be hundreds of times cheaper than a mainframe. The Teradata team estimated the power surge would come in 1990.

IBM laughed out loud. They said, "Let's get this straight … you are going to network a bunch of PC chips together and overpower our mainframes? That's like plowing a field with 1,000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as "just a bunch of PCs in a cabinet."

Teradata was convinced it could produce a product that would power large amounts of data and achieve the impossible: using PC technology in mainframe territory. Its founders agreed with Napoleon Bonaparte, who asserted, "The word 'impossible' is not in my dictionary." Sure enough, when we looked in his dictionary, that word was not there. And it is not in Teradata's Data Dictionary either. The Teradata team set two goals: build a database that could

• Perform parallel processing, and
• Accommodate a Terabyte of data

Driving in the car one evening, Morgan's eight-year-old daughter Kara piped up from the back seat, "Daddy, can you buy Teradata in the store? I mean, what does Teradata really do?" Morgan thought for a moment and then replied, "Do you remember when you went on the Easter egg hunt last spring? Well, imagine that we had fifty eggs and you were the only child there. If I asked you to find all the purple eggs, would you be able to do that?" Kara said, "Sure! But it might take me a long time." Morgan continued, "What if we now let fifty children go in, and I asked them to show me all of the purple eggs? How long would that take?" His daughter responded, "It wouldn't take any time at all, because each child would only have to look at one egg." That is precisely how Teradata works. It divides up huge tasks among its processors and tackles each portion simultaneously with amazing speed. And it doesn't matter if you have a trillion eggs in your basket.
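The egg hunt can be sketched in a few lines of Python. This is an analogy, not Teradata's implementation: the basket contents and function names are made up, and Python threads are used here only to show how the work is divided, not to demonstrate a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical basket: fifty eggs, every fifth one purple.
eggs = ["purple" if i % 5 == 0 else "yellow" for i in range(50)]


def find_purple(portion):
    """One 'child' checks only its own portion of the eggs."""
    return [egg for egg in portion if egg == "purple"]


def parallel_hunt(eggs, children):
    """Split the eggs among the children and combine what they find."""
    # Deal the eggs out so each child gets an (almost) equal share.
    portions = [eggs[i::children] for i in range(children)]
    with ThreadPoolExecutor(max_workers=children) as pool:
        found_per_child = list(pool.map(find_purple, portions))
    return [egg for found in found_per_child for egg in found]


one_child = parallel_hunt(eggs, 1)
fifty_children = parallel_hunt(eggs, 50)
print(len(one_child), len(fifty_children))
```

With one child, a single worker inspects all fifty eggs. With fifty children, each worker inspects exactly one egg, and the answer is the same either way.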

In 1984, the DBC/1012 was introduced. Since then, Teradata has been the dominant force in data warehousing. Teradata got the chickens plowing and is considered outstanding. Meanwhile, IBM's plow is out rusting in its field.

Parallel Processing

An invasion of armies can be resisted, but not an idea whose time has come.

Victor Hugo

Teradata's ability to have unlimited users, unlimited power, and unlimited scalability rests on an idea whose time has come. It all starts with something called parallel processing. So what is parallel processing? Let us explain.

It was 10 p.m. on a Saturday night, and two friends were having dinner and drinks. One of the friends looked at his watch and said, "I have to get going." The other friend responded, "What's the hurry?" His friend went on to tell him that he had to leave to do his laundry at the Laundromat. The other friend could not believe his ears. He responded, "What? You're leaving to do your laundry on a Saturday night? Do it tomorrow!" His buddy went on to explain that there were only 10 washing machines at the laundry. "If I wait until tomorrow, it will be crowded and I will be lucky to get one washing machine. I have 10 loads of laundry, so I will be there all day. If I go now, there will be nobody there, and I can do all 10 loads at the same time. I'll be done in less than an hour and a half."

This story describes what we call "Parallel Processing." Teradata is the only database in the world that loads data, backs up data, and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds … even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every database's clock in performance tests.

After enlightenment the laundry

Zen Proverb

After parallel processing the laundry enlightenment

Teradata Zen Proverb

With the computer world seeing Terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, this is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata has unlimited power and grows without boundaries, and was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

A ship in harbor is safe, but that's not why ships are built.

John Shedd

In 1805, the pivotal Battle of Trafalgar matched Britain's flotilla of battleships against the combined French and Spanish fleet. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to sail circles around the enemy fleet, attacking its ships at their most vulnerable point: the stern. That battle paralyzed the enemy fleet and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating the mainframes on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive, and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed managing terabytes of data.

A Personal Computer (PC) is made up of the following components:

Processor Chip – This is the brain of the computer. All tasks are done at the direction of the processor.

Memory – This is the hand of the computer. The memory allows data to be viewed, manipulated, changed, or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive – This is the spine of the computer. The hard drive stores data, applications, and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor executes your request onto the document while it is still being displayed in memory. Upon completion, you close the document, and the processor writes all the changes to the disk where it is stored.
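The same read, modify, and write-back cycle can be sketched in Python. The file name, the temporary folder, and the friends' names below are hypothetical, chosen only to mirror the example.

```python
import os
import tempfile


def add_friends(path, new_names):
    """Read a list from disk, change it in memory, and write it back."""
    # Read: bring the document from the hard drive into memory.
    with open(path) as f:
        friends = f.read().splitlines()
    # Modify: the changes happen on the in-memory copy only.
    friends.extend(new_names)
    # Write: save the updated copy back to the disk.
    with open(path, "w") as f:
        f.write("\n".join(friends) + "\n")
    return friends


# Hypothetical starting document with two friends already listed.
path = os.path.join(tempfile.mkdtemp(), "friends.txt")
with open(path, "w") as f:
    f.write("Ann\nBob\n")

updated = add_friends(path, ["Cal", "Dee", "Eve"])
print(updated)
```

Until the final write, the three new names exist only in memory; the copy on disk changes when the processor writes the document back.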

In the picture below, we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight best friends.

Teradata Spreads Data over Multiple Processors

I don't mind starting the season with unknowns. I just don't like finishing the season with them.

Coach Lou Holtz

With Teradata, you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The Teradata example on the next page shows two processors each having direct access to its own physicaldisk The Best_Friends table has been spread out evenly across both processors When we ask for a list of bestfriends the system both processors will receive data in parallel and will return combined results over theconnecting network Returns for this example could easily double the speed of the previous example

Even though we still need to read eight records each processor is only responsible for reading four records andsimultaneously the other processor reads the remaining four records So how could we double the speed of thissystem again

Teradata has Linear Scalability

Every ceiling when reached becomes a floor upon which one walks and now can see a new ceiling

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." It allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck of data.

Notice in the system below there are four processors and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since data is spread evenly over four processors, each of the four processors reads its two rows simultaneously. Now the system is four times faster.

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the AMPs and processed in parallel.
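The arithmetic behind the one-, two-, and four-processor examples can be sketched in a few lines of Python. This is only an illustration, assuming the rows are spread perfectly evenly; the function name rows_per_processor is invented for this sketch and is not part of Teradata.

```python
import math


def rows_per_processor(total_rows, processors):
    """Rows the busiest processor must read when rows are spread evenly."""
    return math.ceil(total_rows / processors)


# The Best_Friends example: eight rows on 1, 2, and 4 processors.
for processors in (1, 2, 4):
    share = rows_per_processor(8, processors)
    print(processors, "processor(s):", share, "rows each")
```

Each doubling of the processor count halves the rows any one processor has to read, which is the idea the text calls Linear Scalability.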


A Logical View of the Teradata Architecture

You are either making history or you are history

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem, and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when it actually never had a chance to succeed. In ancient days Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies, too, will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and then is given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain information on their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.


Parsing Engine (PE)

Even a stopped clock is right twice a day

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards." Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and Southwestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions – some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions, and Teradata returns the answers.

Access Module Processor (AMP)

Wise men talk because they have something to say; fools talk because they have to say something

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above, "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of few words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best way to picture it is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it's the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk, and it shares this with no other AMP; hence a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP will read only the rows on its particular disk. Because this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs, while others have more than 2000.

The BYNET

Even if you're on the right track, you'll still get run over if you just sit there

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax;
• The PE checks the user's security rights;
• The PE comes up with a plan for the AMPs to follow;
• The PE passes the plan along to the AMPs over the BYNET;
• The AMPs follow the plan and retrieve the data requested;
• The AMPs pass the data to the PE over the BYNET; and
• The PE then passes the final data to the user.
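The steps above can be walked through with a toy model in Python. This is a rough sketch only: every name in it (AMP_DISKS, ACCESS_RIGHTS, amp_execute, parsing_engine), the two-AMP layout, the placement of rows, and the one-pattern "syntax check" are assumptions made for illustration, not Teradata interfaces or behavior.

```python
# Toy model of the logon-to-answer steps. Every name here is invented for
# illustration; none of it is a Teradata API.

# Each AMP owns the rows on its own disk. The row placement is arbitrary.
AMP_DISKS = {
    1: {"Best_Friends": [(2, "Ben Hon"), (10, "Don Roy")]},
    2: {"Best_Friends": [(4, "Joe Davis"), (12, "Sam Mills")]},
}
ACCESS_RIGHTS = {"tom": {"Best_Friends"}}


def amp_execute(amp_number, table):
    """An AMP follows the plan: read the requested table from its own disk."""
    return AMP_DISKS[amp_number].get(table, [])


def parsing_engine(user, sql):
    """Walk a request through the PE steps and return the final rows."""
    # Check syntax (deliberately simplistic: SELECT * FROM <table>).
    words = sql.split()
    if len(words) != 4 or [w.upper() for w in words[:3]] != ["SELECT", "*", "FROM"]:
        raise ValueError("syntax error")
    table = words[3]
    # Check the user's security rights.
    if table not in ACCESS_RIGHTS.get(user, set()):
        raise PermissionError("query rejected: no access rights")
    # Come up with a plan: here, every AMP reads its portion of the table.
    plan = sorted(AMP_DISKS)
    # Pass the plan to the AMPs and collect their rows (stand-in for the BYNET).
    rows = []
    for amp_number in plan:
        rows.extend(amp_execute(amp_number, table))
    # Pass the final data back to the user.
    return rows


print(parsing_engine("tom", "SELECT * FROM Best_Friends"))
```

A user without an entry in ACCESS_RIGHTS gets a PermissionError before any AMP is contacted, mirroring the order of the checks in the list.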

Teradata Building Block Approach

Better a diamond with a flaw than a pebble without one

Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them.

Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about ... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, CustomerName, Item No, and Quantity.

An example of each table follows:

CUSTOMER TABLE called CustomerTable

CustomerID (PK)  CustomerName  CityName  Order Number (FK)  Customer Rep
1001             JC Penney     Dallas    105372             Dreyer
1002             Office Depot  Columbia  105799             Crocker
1003             Dillards      Atlanta   106227             Smith

ORDER TABLE called OrderTable

Order Number (PK)  Order Date  Item No  Quantity  Customer ID (FK)
105372             03/07/2001  212      20        1001
105799             04/18/2001  296      52        1002
106227             10/17/2001  325      17        1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an "extended family," joined by the "marriage" of the columns named Order Number in each table.

Earlier programming languages referred to "files," "records," and "fields." Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest unit that can be inserted into a table. A column is the smallest unit within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null, or are sometimes represented by a

One column, or a combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                        FOREIGN KEY
Not optional                       Optional
Comprised of one or more columns   Comprised of one or more columns
Can only have one PK per table     Can have multiple FKs per table
No duplicates allowed              Duplicates allowed
No changes allowed                 Changes allowed
No nulls allowed                   Nulls allowed
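To make the PK/FK matching concrete, here is a small Python sketch of a join between the two example tables. It is an illustration under assumptions, not how Teradata performs a join: the rows are copied from the examples (with only some columns kept), and the dictionary names and key spellings are invented for the sketch.

```python
# Rows copied from the CustomerTable and OrderTable examples (some columns
# omitted). The variable and key names are hypothetical.
customer_table = [
    {"CustomerID": 1001, "CustomerName": "JC Penney", "CityName": "Dallas"},
    {"CustomerID": 1002, "CustomerName": "Office Depot", "CityName": "Columbia"},
    {"CustomerID": 1003, "CustomerName": "Dillards", "CityName": "Atlanta"},
]
order_table = [
    {"OrderNumber": 105372, "ItemNo": 212, "Quantity": 20, "CustomerID": 1001},
    {"OrderNumber": 105799, "ItemNo": 296, "Quantity": 52, "CustomerID": 1002},
    {"OrderNumber": 106227, "ItemNo": 325, "Quantity": 17, "CustomerID": 1003},
]

# Primary key values are unique and never null, so they can index a dict.
customers_by_id = {row["CustomerID"]: row for row in customer_table}

# Join: match each order's foreign key (CustomerID) to the customer's PK.
joined = [
    (
        order["OrderNumber"],
        customers_by_id[order["CustomerID"]]["CustomerName"],
        order["Quantity"],
    )
    for order in order_table
]

for order_number, customer_name, quantity in joined:
    print(order_number, customer_name, quantity)
```

The dictionary lookup works only because the primary key has no duplicates; the foreign key side is free to repeat, which is why one customer could own many orders.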

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind, the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP 1, for example, holds one-fourth of the Employee table rows, one-fourth of the Customer table rows, and one-fourth of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the rows of the Customer table and the rows of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him, saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.


There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

CREATE Table employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE
)
UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON:

CREATE Table TomC.employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE
)
PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE Table employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE
)
UNIQUE PRIMARY INDEX(emp, dept);
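The practical difference between the three definitions above can be shown with a toy Python class. This is a hedged sketch, not Teradata code: the ToyTable class and its insert method are invented for illustration, and the only behavior it models is that a unique index refuses a repeated value while a non-unique index accepts it.

```python
class ToyTable:
    """Toy table keyed by a Primary Index value; not a Teradata API."""

    def __init__(self, unique_primary_index):
        self.unique_primary_index = unique_primary_index
        self.rows = []  # list of (primary_index_value, row) pairs

    def insert(self, pi_value, row):
        # A Unique Primary Index (UPI) refuses a second row with the same value.
        if self.unique_primary_index and any(v == pi_value for v, _ in self.rows):
            raise ValueError(f"duplicate primary index value {pi_value!r}")
        self.rows.append((pi_value, row))


# Like UNIQUE PRIMARY INDEX(emp): each emp value may appear only once.
by_emp = ToyTable(unique_primary_index=True)
by_emp.insert(1, {"emp": 1, "dept": 10})
by_emp.insert(2, {"emp": 2, "dept": 10})

# Like PRIMARY INDEX(dept): many employees may share one dept value.
by_dept = ToyTable(unique_primary_index=False)
by_dept.insert(10, {"emp": 1, "dept": 10})
by_dept.insert(10, {"emp": 2, "dept": 10})

# Like UNIQUE PRIMARY INDEX(emp, dept): the (emp, dept) pair is the value.
by_emp_dept = ToyTable(unique_primary_index=True)
by_emp_dept.insert((1, 10), {"emp": 1, "dept": 10})
by_emp_dept.insert((1, 20), {"emp": 1, "dept": 20})  # allowed: the pair differs

print(len(by_emp.rows), len(by_dept.rows), len(by_emp_dept.rows))
```

Inserting emp 1 into by_emp a second time would raise ValueError, whereas by_dept happily stores two rows for dept 10.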

Being related hardly insures relatability

Michael E. Angier

All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their "relatability" in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked! Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP, in combination with the PRIMARY INDEX, to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing – the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number and that the AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
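Both simulated diagrams follow the same repeating pattern, which a short Python sketch can reproduce. The function names build_hash_map and print_hash_map are invented here, and the 48-bucket size simply matches the 8-by-6 diagrams; it is not the size of a real hash map.

```python
def build_hash_map(num_amps, num_buckets=48):
    """Return a list whose entry i holds the AMP number for bucket i + 1.

    AMP numbers cycle 1, 2, ..., num_amps and then start over, as in the
    simulated diagrams. 48 buckets is an assumption that matches them.
    """
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]


def print_hash_map(hash_map, width=6):
    """Print the buckets in rows of `width`, like the diagrams."""
    for start in range(0, len(hash_map), width):
        print(" ".join(str(amp) for amp in hash_map[start:start + width]))


print_hash_map(build_hash_map(4))
print()
print_hash_map(build_hash_map(8))
```

Changing the number of AMPs changes only the cycle length; every bucket still holds exactly one AMP number.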

How the Hash Map and Primary Index Work Together

Choice not chance determines destiny

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the "Wiz-Bang Formula" (a secret NCR formula). First we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we will take the Primary Index value of the row, apply the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num  Friend_Name
2           Ben Hon

Since we designated Friend_Num as the Primary Index, we merely divide the value of Friend_Num (2) by 2, per the Coffing/Jones Wiz-Bang Formula:

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row:

Friend_Num  Friend_Name
16          Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely divide the value of Friend_Num (16) by 2, per the Coffing/Jones Wiz-Bang Formula, and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply, you should understand that the real formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it will be consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this value a million times, we would still get the same result.
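The whole walk-through, from Primary Index value to bucket to AMP, fits in a few lines of Python. As in the text, this uses the made-up divide-by-2 formula rather than the real (secret) hashing formula, and the names wiz_bang_bucket, destination_amp, and HASH_MAP, along with the 48-bucket map size, are assumptions for this sketch.

```python
# Rows from the Best_Friends table; Friend_Num is the Primary Index.
FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]

NUM_AMPS = 4
# Simulated four-AMP hash map: buckets cycle through AMPs 1, 2, 3, 4.
# 48 buckets is an assumption that matches the diagrams.
HASH_MAP = [(bucket % NUM_AMPS) + 1 for bucket in range(48)]


def wiz_bang_bucket(primary_index_value):
    """The illustrative formula: divide the value by 2."""
    return primary_index_value // 2


def destination_amp(primary_index_value):
    """Look up the AMP number stored in the bucket the formula points to."""
    bucket = wiz_bang_bucket(primary_index_value)
    return HASH_MAP[bucket - 1]  # buckets are numbered from 1 in the text


# Load the table: every row lands on the AMP its bucket names.
amps = {amp: [] for amp in range(1, NUM_AMPS + 1)}
for friend_num, friend_name in FRIENDS:
    amps[destination_amp(friend_num)].append((friend_num, friend_name))

for amp, rows in amps.items():
    print("AMP", amp, rows)
```

Because the function is deterministic, calling destination_amp with the same value always names the same AMP, which is the consistency the text relies on for both loading and retrieval.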

If you always do what you always did, you'll always get what you always got

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
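A one-AMP retrieval can be sketched the same way in Python. Again, this uses the illustrative divide-by-2 formula, and the names AMP_ROWS and select_by_primary_index are invented; the row placement is the one that formula produces on the simulated four-AMP hash map.

```python
NUM_AMPS = 4
# Simulated four-AMP hash map (48 buckets is an assumption).
HASH_MAP = [(bucket % NUM_AMPS) + 1 for bucket in range(48)]

# Rows as laid out by the divide-by-2 formula: AMP number -> {Friend_Num: name}.
AMP_ROWS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def select_by_primary_index(friend_num):
    """Rerun the formula, find the bucket, and ask only the AMP it names."""
    bucket = friend_num // 2      # the illustrative formula
    amp = HASH_MAP[bucket - 1]    # buckets are numbered from 1 in the text
    amps_contacted = [amp]        # a one-AMP operation
    return amps_contacted, AMP_ROWS[amp].get(friend_num)


print(select_by_primary_index(8))
```

The other three AMPs are never touched, which is why a Primary Index lookup costs the same no matter how many AMPs the system has.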

The Full Table Scan

What matters is not the size of the dog in the fight but the size of the fight in the dog

Coach Bear Bryant

When we travel the globe teaching Teradata classes we often ask students Are Full Table Scans acceptable ina data warehouse

About 80 of the time students respond NO After we complete training they respond Heck YES

Tom told me that he wrestled his way through high school and college I said Really I didnt think the classeswere that difficult myself Actually Tom earned a wrestling scholarship to college and achieved the All-American level His wrestling coach drilled into the wrestlersrsquo minds that the size of the opponent is not to be

feared but the size of their will The truth is that most databases do not have the FIGHT in them to handle aFull Table Scan Thats why so many students are surprised at Teradatas abilities to actually handle Full TableScans

A Full Table Scan (FTS) is a query that reads every row of a table The table may be small or have billions of rows With Teradata a Full Table Scan (FTS) means every AMP reads only the rows it owns in parallel with allother AMPs in the system Doing so speeds up a Full Table Scan hundreds to thousands of times

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. Because the ten reads happen at the same time, this process is about 10 times faster than on a system where a single unit must read all 100 rows. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!
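The 100-row, 10-AMP example can be modelled roughly as follows. This is a sketch under stated assumptions: rows are spread evenly by a simple modulo rule, Python threads stand in for AMPs, and the names amp_scan and full_table_scan are hypothetical.

```python
# Toy model of a Full Table Scan: 100 rows spread over 10 AMPs.
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 10
NUM_ROWS = 100

# Assumed even distribution: row i lives on AMP (i mod 10).
amps = {amp: [] for amp in range(NUM_AMPS)}
for row_id in range(NUM_ROWS):
    amps[row_id % NUM_AMPS].append(row_id)


def amp_scan(amp_number: int) -> list:
    """Each AMP reads only the rows it owns."""
    return list(amps[amp_number])


def full_table_scan() -> list:
    """All AMPs scan at the same time; the 'PE' collects their rows."""
    with ThreadPoolExecutor(max_workers=NUM_AMPS) as pool:
        per_amp_results = list(pool.map(amp_scan, range(NUM_AMPS)))
    rows = []
    for result in per_amp_results:
        rows.extend(result)
    return rows


rows = full_table_scan()
rows_per_amp = NUM_ROWS // NUM_AMPS
# Total rows returned, rows read by each AMP, and the ideal speed-up factor.
print(len(rows), rows_per_amp, NUM_ROWS // rows_per_amp)  # 100 10 10
```

Each worker touches only its own 10 rows, so the longest any single AMP works is one tenth of the table. That is where the factor of 10 in the example comes from.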

Most FTS bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?", you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to ask literally any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows, and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num   Friend_Name
6            Mary Gray
14           Kyle Marx
8            John Davis
16           Lyn Jones
2            Ben Hon
10           Don Roy
4            Joe Davis
12           Sam Mills

In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option: a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, and since secondary indexes use up space and add overhead, they should only be used on KNOWN QUERIES, that is, queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

There are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It also tells the AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash Ben Hon and then use the hash map to find the AMP that holds Ben Hon in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the Ben Hon Secondary Index Sub-table row. Once that is complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (the smiley face in this example) is revealed and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations
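A small Python model may help show why a USI lookup touches at most two AMPs. Everything here is illustrative: CRC32 is a stand-in hash (not Teradata's), the sub-table layout is simplified to a dictionary, and the function names are hypothetical.

```python
# Toy model of a Unique Secondary Index (USI) on a four-AMP system.
import zlib

NUM_AMPS = 4


def amp_for_value(value) -> int:
    """Stand-in hash (CRC32, not Teradata's): map any value to AMP 1..4."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_AMPS + 1


# Base table: AMP -> {Friend_Num: Friend_Name}
base_table = {amp: {} for amp in range(1, NUM_AMPS + 1)}
# USI sub-table: AMP -> {Friend_Name: (base AMP, Friend_Num)}
usi_subtable = {amp: {} for amp in range(1, NUM_AMPS + 1)}


def insert(friend_num: int, friend_name: str) -> None:
    """Store the base row by Primary Index hash and the index row by name hash."""
    base_amp = amp_for_value(friend_num)
    base_table[base_amp][friend_num] = friend_name
    index_amp = amp_for_value(friend_name)
    usi_subtable[index_amp][friend_name] = (base_amp, friend_num)


def select_by_usi(friend_name: str):
    """Return the matching row and the list of AMPs that were asked."""
    touched = []
    index_amp = amp_for_value(friend_name)      # first AMP: sub-table row
    touched.append(index_amp)
    pointer = usi_subtable[index_amp].get(friend_name)
    if pointer is None:
        return None, touched
    base_amp, friend_num = pointer              # second AMP: base row
    touched.append(base_amp)
    return (friend_num, base_table[base_amp][friend_num]), touched


for num, name in [(2, "Ben Hon"), (8, "John Davis"), (10, "Don Roy")]:
    insert(num, name)

row, touched = select_by_usi("Ben Hon")
print(row, len(touched))
```

The first AMP is found by hashing the index value, and the pointer stored there names the second AMP. No other AMP is involved, however many AMPs the system has.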

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of ongoing queries. Secondary indexes take up space and add overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.

Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a terabyte. There is no user with greater privileges than DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 gigabytes of permanent disk space, then the DBC owns 100 gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space

The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your "parent" or "owner". Any object below you is a "child". Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us: who is the owner of Tom? The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to or REVOKE rights from Tom.
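The ownership question can be answered mechanically by walking up the hierarchy. The sketch below is an assumption-laden illustration: the PARENT dictionary encodes only the relationships named in this chapter (DBC, SYSDBA, MRKT, SALES, Morgan, Tom), and the function names are hypothetical.

```python
# Toy model of the ownership hierarchy: each object maps to its immediate parent.
PARENT = {
    "DBC": None,
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners_of(name: str) -> list:
    """Every object above `name` in the hierarchy, nearest parent first."""
    owners = []
    parent = PARENT[name]
    while parent is not None:
        owners.append(parent)
        parent = PARENT[parent]
    return owners


def can_grant_or_revoke(owner: str, target: str) -> bool:
    """An owner (parent) may GRANT or REVOKE rights on anything below it."""
    return owner in owners_of(target)


print(owners_of("Tom"))                    # ['Morgan', 'SYSDBA', 'DBC']
print(can_grant_or_revoke("MRKT", "Tom"))  # False
```

MRKT sits beside Morgan rather than above Tom, so it is not one of Tom's owners.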

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA, likewise, gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, views, and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
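The different bookkeeping for perm and spool can be sketched as follows. The class SpaceOwner, its method create_child, and the spool figures are all hypothetical. Only the 100 GB system and the roughly 80/20 split come from the text, and the rule that a child's spool limit may not exceed its parent's is modelled on the speed-limit analogy.

```python
# Toy model of how perm and spool space are handed down the hierarchy.
class SpaceOwner:
    """A database or user with a perm limit and a spool limit, in gigabytes."""

    def __init__(self, name, perm_gb, spool_gb):
        self.name = name
        self.perm_gb = perm_gb
        self.spool_gb = spool_gb

    def create_child(self, name, perm_gb, spool_gb):
        """Give space to a new child database or user."""
        if perm_gb > self.perm_gb:
            raise ValueError(f"{self.name} does not own {perm_gb} GB of perm space")
        if spool_gb > self.spool_gb:
            raise ValueError(f"{name} cannot exceed {self.name}'s spool limit")
        # Perm space is handed over, so the parent owns less afterwards.
        self.perm_gb -= perm_gb
        # Spool is only a limit (the "speed limit"), so the parent keeps its own.
        return SpaceOwner(name, perm_gb, spool_gb)


dbc = SpaceOwner("DBC", perm_gb=100, spool_gb=50)
sysdba = dbc.create_child("SYSDBA", perm_gb=80, spool_gb=50)
morgan = sysdba.create_child("Morgan", perm_gb=10, spool_gb=50)

print(dbc.perm_gb, sysdba.perm_gb, morgan.perm_gb)     # 20 70 10
print(dbc.spool_gb, sysdba.spool_gb, morgan.spool_gb)  # 50 50 50
```

Perm totals always add back up to the original 100 GB, while every owner keeps its full spool limit no matter how many children it creates.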

The following picture shows a logical view of a Customer table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area, there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.

What is a View

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored on the disks a second time, so the view does not duplicate data or take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname    Sal
1     10     Johnson    Manny    100000
2     20     Carlsbad   Jan      100000
22    30     Winter     Steve    77000
25    10     Lester     Bonnie   56000
33    10     Samuels    Todd     120000
99    20     Walter     Misha    104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha
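The effect of the view can be tried out with Python's built-in sqlite3 module. SQLite is only a stand-in here, chosen because it runs anywhere. It is not Teradata, and it has no GRANT/REVOKE mechanism, so the access-rights half of the story is not modelled. The table and view names follow the example above.

```python
# Demonstration of a salary-hiding view, using SQLite as a stand-in database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no rows are copied.
conn.execute("CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee")

cursor = conn.execute("SELECT * FROM Employ_V")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)    # ['Emp', 'Dept', 'Lname', 'Fname']
print(len(rows))  # 6
```

The Sal column never appears in the result, even though the query asks for every column the view offers.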

What is a Macro

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10?
• What employees are in department 20?
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
  SELECT * FROM Employ_v WHERE dept = 10;
  SELECT * FROM Employ_v WHERE dept = 20;
  SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
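The "either they all work or none of them work" behaviour of a multi-statement macro can be imitated with Python's sqlite3 module. This is a loose analogy, not Teradata: SQLite has no CREATE MACRO, so the helper run_macro (a hypothetical name) simply runs a list of statements inside one transaction.

```python
# Imitating all-or-nothing macro execution with one SQLite transaction.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp INTEGER PRIMARY KEY, Dept INTEGER)")
conn.commit()


def run_macro(connection, statements) -> bool:
    """Run every statement as a single transaction; undo all of them on failure."""
    try:
        with connection:  # commits on success, rolls back on an exception
            for statement in statements:
                connection.execute(statement)
        return True
    except sqlite3.Error:
        return False


# The second INSERT reuses the primary key 1, so it fails ...
ok = run_macro(conn, [
    "INSERT INTO Employee VALUES (1, 10)",
    "INSERT INTO Employee VALUES (1, 20)",
])
# ... and the first INSERT is undone along with it.
count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(ok, count)  # False 0
```

Had both statements succeeded, both rows would have been committed together.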

Here is a handy reference chart that compares views with macros:

Views                                           Macros
• We select from views                          • We execute macros
• Uses the keyword AS                           • Uses the keyword AS
• Definition is stored in the Data Dictionary   • Definition is stored in the Data Dictionary
• Accesses certain portions of the data         • Accesses the real data itself
• Is changed using the keyword REPLACE          • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. SYSDBA has Implicit rights on everyone listed below it. MRKT has Implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), and Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon knows what the morning never suspected." Likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (a MACRO). If the SQL statements cannot all be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message.
• The entire transaction is rolled back, and any changes made to the database are reversed.
• Locks are released.
• Spool files are discarded.

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal". Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness." If it is clean, then things went well.
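The before-image idea can be sketched in Python as below. This is a conceptual toy, not Teradata's implementation: the Amp class, its methods, and the salary figures used in the demonstration are illustrative assumptions.

```python
# Toy model of a Transient Journal: save a BEFORE image, restore it on failure.
_MISSING = object()  # marks "this row did not exist before the transaction"


class Amp:
    def __init__(self, rows):
        self.rows = dict(rows)
        self.transient_journal = []  # list of (key, before_image)

    def write(self, key, new_value):
        """INSERT or UPDATE: journal the BEFORE image, then change the row."""
        self.transient_journal.append((key, self.rows.get(key, _MISSING)))
        self.rows[key] = new_value

    def delete(self, key):
        """DELETE: journal the BEFORE image, then remove the row."""
        self.transient_journal.append((key, self.rows.get(key, _MISSING)))
        self.rows.pop(key, None)

    def commit(self):
        """Transaction succeeded: the journal is wiped clean."""
        self.transient_journal.clear()

    def rollback(self):
        """Transaction failed: restore before-images, newest first."""
        for key, before in reversed(self.transient_journal):
            if before is _MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
        self.transient_journal.clear()


# Salaries by employee number; someone applies a 100% raise by mistake.
amp = Amp({1: 100000, 25: 56000})
amp.write(1, 200000)
amp.write(25, 112000)
amp.rollback()
print(amp.rows)  # {1: 100000, 25: 56000}
```

After either a commit or a rollback the journal is empty again, which is the "clean" state the text describes.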

FALLBACK Protection

"I asked my dentist if I had to floss all my teeth, and he responded, 'No, just the ones you want to keep.'"

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.
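A toy four-AMP model shows why a single AMP failure leaves every row reachable. Both hash maps below are assumptions: they were chosen so that Ben Hon and Don Roy sit on AMP 1 with fallback copies on AMPs 2 and 4, as in the text, and the remaining placements are made up. The divide-by-two hash is the teaching formula used earlier.

```python
# Toy model of FALLBACK on a four-AMP system.
PRIMARY_HASH_MAP = {1: 1, 2: 2, 3: 3, 4: 4, 5: 1, 6: 2, 7: 3, 8: 4}
# Assumed fallback map: every bucket's copy lives on a different AMP.
FALLBACK_HASH_MAP = {1: 2, 2: 3, 3: 4, 4: 1, 5: 4, 6: 1, 7: 2, 8: 3}

BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}


def bucket_for(friend_num: int) -> int:
    """Teaching hash: divide the Primary Index value by two."""
    return friend_num // 2


base_rows = {amp: {} for amp in (1, 2, 3, 4)}
fallback_rows = {amp: {} for amp in (1, 2, 3, 4)}
for num, name in BEST_FRIENDS.items():
    bucket = bucket_for(num)
    base_rows[PRIMARY_HASH_MAP[bucket]][num] = name
    fallback_rows[FALLBACK_HASH_MAP[bucket]][num] = name


def read_row(friend_num: int, down_amp=None):
    """Read from the primary AMP, or from the fallback copy if that AMP is down."""
    bucket = bucket_for(friend_num)
    primary_amp = PRIMARY_HASH_MAP[bucket]
    if primary_amp != down_amp:
        return primary_amp, base_rows[primary_amp][friend_num]
    fallback_amp = FALLBACK_HASH_MAP[bucket]
    return fallback_amp, fallback_rows[fallback_amp][friend_num]


print(read_row(2, down_amp=1))   # (2, 'Ben Hon')
print(read_row(10, down_amp=1))  # (4, 'Don Roy')
```

The price is visible in the data structures: sixteen stored rows for an eight-row table.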

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice, if necessary.

If we can lose any one AMP/disk, what happens if we lose two? Losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2000 AMPs. The chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. This allocation is completed by taking the Primary Index and running it through the hashing algorithm. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at both extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs in the same pair would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in groups of two, every complex query will take twice as long to process.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
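The trade-off can be put into numbers with a little arithmetic. The sketch assumes, as the chapter describes, that a down AMP's work is shared evenly by the other AMPs in its cluster. The function name survivor_workload is hypothetical.

```python
# Extra load on each surviving AMP when one AMP in a cluster is down.
from fractions import Fraction


def survivor_workload(cluster_size: int) -> Fraction:
    """Relative work per surviving AMP (1 = normal) with one AMP down."""
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    # Each survivor does its own share plus 1/(n - 1) of the down AMP's share.
    return Fraction(cluster_size, cluster_size - 1)


for size in (2, 3, 4, 16):
    print(size, survivor_workload(size))
# 2 2
# 3 3/2
# 4 4/3
# 16 16/15
```

A two-AMP cluster doubles the survivor's work, which matches the "twice as long" figure above, while a four-AMP cluster adds only a third to each survivor.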

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, including all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping, starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.
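The life cycle of the DARJ (open on failure, record missed changes, replay on recovery, discard) can be sketched in Python. This is a toy model, not Teradata's implementation: the class name, the dictionary storage, and the method names are all hypothetical.

```python
class Cluster:
    """Toy cluster in which surviving AMPs journal changes for a down AMP."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}   # rows stored on each AMP
        self.down = set()                          # AMPs currently offline
        self.darj = {}                             # down AMP -> missed changes

    def fail(self, amp):
        self.down.add(amp)
        self.darj[amp] = []            # the rest of the cluster opens a journal

    def write(self, amp, key, value):
        if amp in self.down:
            self.darj[amp].append((key, value))    # remember what it missed
        else:
            self.rows[amp][key] = value

    def recover(self, amp):
        for key, value in self.darj.pop(amp):      # replay, then discard journal
            self.rows[amp][key] = value
        self.down.discard(amp)
```

After `recover` returns, the AMP's rows reflect every change made while it was down, and its journal entry no longer exists.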

In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.
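A mirrored pair can be modelled in a few lines of Python. This is a conceptual sketch only: the class and method names are hypothetical, dictionaries stand in for disks, and a real controller would also choose which copy to read for performance.

```python
class MirroredPair:
    """Toy RAID-1 pair: every write goes to both disks, reads use any healthy disk."""

    def __init__(self):
        self.disks = [{}, {}]            # [primary, mirror]
        self.failed = [False, False]

    def write(self, block, data):
        for i, disk in enumerate(self.disks):    # dual write, invisible to the caller
            if not self.failed[i]:
                disk[block] = data

    def read(self, block):
        for i, disk in enumerate(self.disks):    # either healthy copy will do
            if not self.failed[i]:
                return disk[block]
        raise IOError("both disks in the pair have failed")

    def fail(self, i):
        self.failed[i] = True
        self.disks[i] = {}

    def replace(self, i):
        self.disks[i] = dict(self.disks[1 - i])  # rebuild from the surviving copy
        self.failed[i] = False
```

Reads and writes keep working after `fail(0)`, and `replace(0)` rebuilds the new disk from the mirror, which mirrors the sequence described above.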

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. We can still keep the shared-nothing environment and have four physical disks: only the AMP that owns the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, and they are accessing their own virtual disks via a different physical cable.
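The migration can be sketched in Python using the two-node, 32-AMP configuration from the text. The function name and the round-robin placement across surviving nodes are assumptions made for illustration; with only two nodes in the clique, every AMP simply moves to the other node.

```python
def migrate_on_failure(clique, failed_node):
    """Move every vproc from the failed node onto the surviving nodes in its clique."""
    survivors = [n for n in clique if n != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    placement = {n: list(vprocs) for n, vprocs in clique.items() if n != failed_node}
    for i, vproc in enumerate(clique[failed_node]):
        placement[survivors[i % len(survivors)]].append(vproc)
    return placement


# Two nodes with 16 AMPs each, as in the illustration.
clique = {
    "node1": [f"AMP{i}" for i in range(1, 17)],
    "node2": [f"AMP{i}" for i in range(17, 33)],
}
after = migrate_on_failure(clique, "node1")
```

Here `after["node2"]` holds all 32 AMPs, which is why the receiving node slows down while the failed node is being repaired.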

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures; plus, it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
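Both recovery directions can be sketched in Python. The model is deliberately simplified and hypothetical: rows are dictionary entries, only inserts and updates are journaled, and the checkpoint is just the journal length at the chosen point in time. It is not how Teradata stores journal images.

```python
class JournaledTable:
    """Toy table that keeps BEFORE and AFTER images of every change."""

    def __init__(self):
        self.rows = {}
        self.before = []   # (key, image before the change; None if the row was new)
        self.after = []    # (key, image after the change)

    def update(self, key, new_row):
        self.before.append((key, self.rows.get(key)))
        self.after.append((key, new_row))
        self.rows[key] = new_row

    def rollback(self, checkpoint):
        """Undo changes made after `checkpoint` using BEFORE images, newest first."""
        while len(self.before) > checkpoint:
            key, image = self.before.pop()
            self.after.pop()
            if image is None:
                self.rows.pop(key, None)   # the row did not exist before
            else:
                self.rows[key] = image


def roll_forward(backup_rows, after_images):
    """Rebuild current data from a backup plus AFTER images, oldest first."""
    rows = dict(backup_rows)
    for key, image in after_images:
        rows[key] = image
    return rows
```

In the Customer Representative scenario, a checkpoint taken before the bad update followed by `rollback(checkpoint)` restores the original regions; in the hardware-failure scenario, `roll_forward` applied to the backup from the 1st and the AFTER images captured since then rebuilds the data as of the 5th.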

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
     BEFORE JOURNAL,
     DUAL AFTER JOURNAL
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE FORMAT /* format string missing in the source */)
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE, or READ locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
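The four locking modes above amount to a compatibility table, which a short Python sketch can make explicit. The table is derived only from the rules stated in the list; the names `COMPATIBLE` and `can_grant` are hypothetical, and queueing order is not modelled.

```python
# For each requested lock, the set of already-held locks it can coexist with.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held):
    """True if `requested` can be granted while the locks in `held` are in place."""
    return all(h in COMPATIBLE[requested] for h in held)
```

For example, `can_grant("READ", ["READ", "READ"])` is true because many readers can share a table, while `can_grant("READ", ["WRITE"])` is false; only an ACCESS request gets past a WRITE lock, and nothing gets past an EXCLUSIVE lock.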

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30!" Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the former table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
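The employee and payroll example can be turned into a small Python sketch of the two RI checks. Everything here is hypothetical scaffolding (class names, sets and dictionaries standing in for tables); it illustrates the rule, not how Teradata enforces it.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would leave a payroll row without an employee."""


class CompanyTables:
    def __init__(self):
        self.employee = set()   # parent table: employee numbers
        self.payroll = {}       # child table: employee number -> salary

    def insert_payroll(self, emp, salary):
        # A child row may only reference an employee that exists.
        if emp not in self.employee:
            raise ReferentialIntegrityError(f"employee {emp} does not exist")
        self.payroll[emp] = salary

    def delete_employee(self, emp):
        # A parent row may not be deleted while a child row still refers to it.
        if emp in self.payroll:
            raise ReferentialIntegrityError(f"employee {emp} is still on the payroll")
        self.employee.discard(emp)
```

With these checks in place, the fired employee cannot be removed from the employee table until the payroll row is deleted first, so the Bahamas vacation never happens.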

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously, though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
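The three steps just described (hand out blocks, redistribute by hash, sort) can be sketched in Python. The sketch is an illustration under stated assumptions: `block_size` counts rows rather than 64K bytes, CRC32 stands in for the real hashing algorithm, and everything runs sequentially instead of in parallel.

```python
import zlib


def fastload(records, num_amps, block_size=3):
    """records: list of (primary_index_value, payload) tuples."""
    # Phase 1: deal fixed-size blocks to the AMPs in turn, ignoring row content.
    received = [[] for _ in range(num_amps)]
    blocks = [records[i:i + block_size] for i in range(0, len(records), block_size)]
    for n, block in enumerate(blocks):
        received[n % num_amps].extend(block)

    # Phase 2: each AMP hashes its rows and sends them to the owning AMP.
    owned = [[] for _ in range(num_amps)]
    for amp_rows in received:
        for pi_value, payload in amp_rows:
            row_hash = zlib.crc32(str(pi_value).encode())   # stand-in hash
            owned[row_hash % num_amps].append((row_hash, pi_value, payload))

    # Phase 3: each AMP sorts its rows by row hash.
    return [sorted(rows) for rows in owned]
```

The point of the design is that the first phase is cheap because no AMP inspects the rows it receives; the hashing work is deferred until every AMP can do its share at once.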

Fastload Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind that many systems have hundreds to thousands of AMPs. The load takes place continually, in parallel, as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

Claude Levi-Strauss

The user is the heart of the data warehouse, and users get better with each day of experience. The user makes decisions that affect the company's bottom line. That's why the data warehouse is built around the business user. Building a data warehouse is simple: find out what data the business users need and what type of queries they want to ask but are not able to ask today. Then find out if the data is available and if the queries can be answered. With those answers, you will exceed users' expectations.

An experienced data warehouse user is usually shocked when he or she first uses Teradata. Its sheer power and flexibility enable users to ask questions they have never been able to ask before. On a recent consulting trip of mine, a young DBA got antsy when a particular query took more than a minute or so with Teradata. So I asked, "Well, how long did that same query take with your OLTP-based data warehouse?" He retorted, "We couldn't even run this query on the old system." I said, "So what's wrong with two minutes?" He added, "You know, some of our business users are so used to how long our queries used to run that they will be sitting staring at the screen without realizing that Teradata has already brought back the answer." With Teradata, users can expand their thinking by using intuition and keen business sense without technology barriers.

The building of an enterprise data warehouse begins with top management, but then cascades down to a relationship between the IT department and the business user community.

The IT department must realize it has a supporting role. That role is to please the business user by making data available so the business user can easily ask questions and get answers. It's also the IT department's role to build a system that allows users to ask questions on their own without IT intervention. Forget about building a system where users ask IT to run the queries for them. When users need information, the IT department should eventually be able to say, "Ask the question yourself... it is all available to you."

The business users are actually the stars; however, the entire business community must take responsibility for the warehouse's success. These users must continually educate themselves and other users on the capabilities of the data warehouse, new tools, and new techniques that will enhance its potential. Those same users must help IT help them. If both understand their respective roles and work together to help the company, then the data warehouse will be a huge success.

Rule 3 - Let the IT Department Lead the Way to User Utopia

Few sports challenges are as grueling or demanding as the Tour de France. But victory at this event eluded Lance Armstrong, a powerful young cyclist from Austin, Texas. Lance excelled in individual competition, even winning the World Championships. But despite his hard work, Lance could not overcome the Europeans' strong and proud tradition at the Tour de France. A few years ago, Lance was thrown into the battle of his life, not against others, but against himself. He discovered that he had cancer and was given virtually no chance of surviving. Suddenly he found out how little cycling really meant in life. With all his might, Lance battled his way back to health, beating the odds. Now he found out how very much cycling could mean in life. His bicycle became a tool to reclaim the future. He found a spot as a team member for the U.S. Postal Service team. With a new perspective and a new depth of character, Lance led that team to victory in the next Tour de France. And he repeated this victory for the next two years.

To win the premier event in the cycling world, Lance Armstrong had to totally rethink his role. In the same way, the key members of any company seeking success with its data warehouse must rethink their roles. The IT department plays a key role in a data warehouse. What do users know about technical issues? Not enough to build a data warehouse. So technical issues are the responsibility of the IT department. The danger with this train of thought is that while the IT department has years of experience with handling company transactions through production databases and applications, most are new at data warehousing. A data-warehousing environment can be extremely different from anything an IT department has ever built or used before. Therefore, it's a bad idea to build a data warehouse without the help of experienced people.

An OLTP environment gets more and more predictable each month. It is designed to be tweaked and tuned in order to maximize a company's environment. On the other hand, a data warehouse is an unpredictable environment where the only way to gain control is to actually give up control. In data warehousing, users must be allowed the freedom to ask the questions, and they will blossom in an environment where flexibility is accepted and welcomed.

"The only sure weapon against bad ideas is better ideas."

A. Whitney Griswold

If the IT department decides to build hundreds of data marts that will please each and every department, then it is missing the boat. Data warehouse experience is a hard teacher because it gives the test first and the lesson afterwards. Abraham Lincoln once said, "A house divided cannot stand." With that in mind, build the data warehouse so it will stand strong for a long time.

What's the formula? First and foremost, start by building your data warehouse around detail data. Bring transaction data, along with key details from the OLTP systems, into the data warehouse. Then, as known queries are identified, build data marts to enhance their performance, and also insist that data marts are created and maintained directly from the detail data. Doing so will build a foundation that will stand.

Next, the IT department needs to keep an open mind about creating an environment called User Utopia. Have you ever been there? In User Utopia, the user confidently asks queries without fear of being charged by the minute. The user has meta-data, so he or she becomes intimate with the data and then makes informed decisions. The user should also be able to ask monster queries with the full backing of IT. Recently, on one such query, the IT department wanted to pull the plug. But the DBA held out, granting the user more time. When the query finished running, the information it brought back from the detail data saved the company millions of dollars. Overall, a user will get the majority of his or her answers back quickly from data marts, but he or she also needs the capability of going back to the detail data for more information. This is User Utopia.

Here is the message for IT: don't follow the idea that "if you build it, they will come." Instead, become a leader... go to the users and build it together.

Rule 4 - Build the Foundation Around Detail Data

Business is always trying to predict the unpredictable. The US Air Force Reserve's 53rd Weather Reconnaissance Squadron is a special force that flies its planes directly into tropical storms and hurricanes. Using a WC-130 Hercules aircraft, they fly into storms at low altitudes, between 1,000 and 10,000 feet, taking weather readings that are relayed to the National Hurricane Center in Florida. They measure wind speeds, measure the pressure and structure of the storm, and, most importantly, locate the eye of the storm. The data collected by these "Hurricane Hunters" is used to determine when and where a storm might hit the coast and how strong it will be at that time. Teradata has no fear of detail data; its virtual processors will fly right into the thick of your data warehouse to bring back valuable information for decision support. You see, Teradata enables you to understand the storms in your business today while helping you predict when and where the next storm will hit tomorrow.

I estimate that 80% of today's data warehouses are built on summary (summarized) data. Therefore, 80% of all data warehouses will never come close to realizing their full potential. Your data warehouse does not have to be one of them.


"A bird does not sing because it has the answers; it sings because it has a song."

A data warehouse built on detail data does not sing because it has a song; it sings because it has the answers. When you capture detail data, answers to an infinite number of questions are available. But if this is truly the case, then why doesn't everybody build around detail data? Well, there are two reasons. One is price. Like a bird, many companies decide to go "cheap, cheap." But watch out! The real expense is not the cost of the data warehouse; it is the money that you will not make without one. The second reason is power. Many companies don't have the wingspan to fly through the detail, so they soar with the summary. In addition, some companies don't want to pay for the disk space it actually takes to keep detail data, but believe me, that cost is a small price to pay for success.

"Once you miss the first buttonhole, it becomes difficult to button your shirt."

Many companies use the same database for their data warehouse as they have used for their OLTP system. This is a critical mistake. In essence, they have missed the first buttonhole and most likely will lose their shirt on their data warehouse adventure.

At this point, companies no longer have the choice of using detail data. They must summarize for performance reasons. As one Marine jokingly told his boot camp recruits, "The beatings will continue until the morale improves." Similarly, a database designed for OLTP takes a continual beating when it tries to query large amounts of detail data.

Companies building true data warehouses don't compromise on price, and they end up with a data warehouse that is built for decision support, not one that specializes in OLTP. With this decision, you have buttoned the first buttonhole and are well on your way to reaching the top.

Detail data is the foundation that data warehouses are built upon. Users can ask any question, anytime; conduct data mining, OLAP, ROLAP, SQL, and SPL functions; build data marts directly from the detail data; and easily maintain and grow the environment on a daily basis. Now that's a tune well worth singing. Make a note of it!

Rule 5 - Build Data Marts from the Detail

"You cannot teach a man anything; you can only help him find it within himself."

Galileo

Galileo was a smart man. How did he know so much about life and data marts? When we explained data marts to Galileo, he said, "You cannot build a data mart directly from the OLTP systems; you can only build a data mart directly from the detail within." He was right.

Many companies build data mart after data mart directly from the OLTP systems, and their universe begins to revolve around continual maintenance. Then, as things get worse, as Galileo predicted, their universe begins to revolve around the son. The son of a gun sent in to replace them!

Why does this happen? At first things work out great, but soon there are more and more requests for additional information. As a result, more and more data marts are created, and soon the system looks like a giant spider web. Different data marts start to yield different results on like data, and the actual maintenance of this complicated spider web takes up most of IT's time. Meanwhile, short-term dreams turn into long-term nightmares like this one: A man and his wife had had a big argument just before he went on a business trip. Feeling rather contrite about his harsh words, he arranged to send his wife some flowers and asked the florist to write on the card, "I'm sorry, I love you." The beautiful bouquet arrived at the door. But then his wife read the words the florist had actually written in haste: "I'm sorry I love you."

The top reasons to build data marts directly from detail data are:

• Users can get answers from the data mart, and they can validate their findings or check out additional information from the detail that built it.
• There is only one consistent version of the truth.
• Maintenance is easy.

If a user comes up with a data mart answer that does not make sense, then he or she has the ability to drill down into the detail and investigate. Sometimes summary data can spark interest, and finding out the "why" can result in big bucks.

If users don't trust the data, they won't use the system. When a data warehouse is built on a foundation of detail data, and then data marts are erected from that foundation, you have a winning combination. The results will always be consistent and trustworthy. However, you should only build data marts when there is a credible business case, and you should be ready to drop them when they are no longer needed. The life span of a data mart is relatively short compared to that of its mother and father (better known as the detail data). If you build data marts from the detail, they are easy to manage, easy to drop, and easy to change.

Rule 6 - Make Scalability Your Best Friend

"Plan your life for a million tomorrows, and live your life as if tomorrow may be your last."

Morgan Jones

The roar of class-6 rapids on a river in Suriname can be almost deafening against the dense walls of the jungle, especially when you are 9 years old. Our mission was to lower our canoe down the waterfall with ropes. The Trio Amer-Indian who anchored our 40-foot dugout canoe let go of the anchor rope too quickly. Without warning, the heavy boat began a freefall through the rocky water with my father hanging onto the side for dear life. He disappeared under the rocky waters, and I knew for sure we had lost him. My heart pounded against my chest. As I rallied myself to grasp this loss as only a nine-year-old can, the Indians abruptly began cheering wildly above the roar of the river. My dad had resurfaced a hundred yards downstream, battered and bruised, but he was alive. In just one short minute, I determined that I would love my family every day as if there were no tomorrow.

As I made my family my best friend, a data warehouse must make scalability its best friend. A data warehouse that does not scale will have no tomorrow. It is only a matter of time until the warehouse disappears in rocky waters, never to come up for air. Don't let go of the anchor rope!

The data-warehousing environment will throw obstacles in your way every single day. A data warehouse must be planned to meet today's needs, but it must also be capable of meeting tomorrow's challenges. The future cannot be predicted, so plan for unlimited growth or linear scalability, both vertical and horizontal. There are so many data warehouses that start out with sizzling performance, but as they grow they eventually and inevitably hit the scalability wall. However, before they hit the wall, there is a pattern of diminishing performance.

A data warehouse designed without scalability in mind is doomed before it is begun. It can never reach its potential. Take the scalability question out of the equation by investing in a database that allows you to start small but grows linearly.


In today's fast-paced world, Gigabytes soon become Terabytes. It may not sound like much, but it weighs a ton on the shoulders of giants. Listen to these measurements and pick your data warehouse's life span. For example, if you lived for a million seconds (Megabyte), then you would live for about 11.6 days. In comparison, if you lived for a billion seconds (Gigabyte), then you would live for about 31.7 years. And if you lived for a trillion seconds (Terabyte), then you would live for 31,688 years.

How nice it would be if, on your 31,688th birthday, people said, "You sure look good for your age."
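The life-span comparison above is plain unit conversion, and it can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name is made up, and it assumes a 365.25-day year, which is the assumption that reproduces the 31,688-year figure.

```python
# Assumption: a year is 365.25 days long.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def lifespan(seconds):
    """Return (days, years) for a span of time given in seconds."""
    return seconds / SECONDS_PER_DAY, seconds / SECONDS_PER_YEAR


# A million, a billion, and a trillion seconds.
for label, seconds in [("million", 10**6), ("billion", 10**9), ("trillion", 10**12)]:
    days, years = lifespan(seconds)
    print(f"{label:>8} seconds: {days:,.1f} days, or {years:,.1f} years")
```

Each step up is a factor of one thousand, which is why the jump from Gigabytes to Terabytes deserves more respect than the names suggest.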

Data warehouses hit the wall of scalability because they cannot grow at the same rate as the amount of data being gathered. Teradata allows for unlimited linear scalability. Linear Scalability is a building block approach to data warehousing that ensures that, as building blocks are added, the system continues at the same performance level.

This is why the largest data warehouses in the world use Teradata. I was lucky to be in the right place at the right time and taught during the beginning stages at what are considered the two largest data warehouse sites in the world: Southwestern Bell (SBC) and Wal-Mart.

Wal-Mart's data warehouse started with less than 30 gigabytes, and SBC started with less than 200 gigabytes and 100 users. Both warehouses:

• Started small and simple
• Used Teradata from the beginning
• Have built the largest Enterprise Data Warehouse in their respective industries
• Continue to realize additional Return On Investment (ROI) on an annual basis
• Have grown to more than 10 Terabytes of data and are still growing
• Have thousands of users (some estimates are shocking)
• Have educated and experienced data warehouse staffs
• Have educated and experienced data warehouse users
• Experience continual growth without boundaries
• Have experienced linear performance by Teradata in every single upgrade (from gigabytes to terabytes and from terabytes to tens of terabytes)
• Both companies are impressed with Teradata's power and performance
• And both SBC and Wal-Mart are committed to the excellence of Teradata

A data warehouse is built in small building blocks. Linear Scalability is described in three ways:

First, building blocks are added until the performance requirements of your environment are met (Guaranteed Success);

Second, every time the data doubles, building blocks are doubled and the system maintains its performance level (Guaranteed Success); and

Third, any time the environment changes, building blocks are added until performance requirements are met (Guaranteed Success).
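The three descriptions above can be sketched with a deliberately idealized model. Everything in it is an assumption made for illustration: the function names, the per-block scan rate, and the premise that work divides perfectly across blocks with no coordination overhead. It is not a sizing tool for a real system.

```python
import math

# Hypothetical throughput of one building block, in rows scanned per second.
ROWS_PER_BLOCK_PER_SEC = 1_000_000


def response_time(rows, blocks):
    """Seconds to scan `rows` when the work is split evenly over `blocks`."""
    return rows / (blocks * ROWS_PER_BLOCK_PER_SEC)


def blocks_needed(rows, target_seconds):
    """Smallest number of blocks that meets the target response time."""
    return math.ceil(rows / (target_seconds * ROWS_PER_BLOCK_PER_SEC))


# First: add blocks until the performance requirement is met.
print(blocks_needed(rows=10**9, target_seconds=100))

# Second: double the data, double the blocks, and the time stays the same.
print(response_time(10**9, 4), response_time(2 * 10**9, 8))
```

Under these assumptions, response time depends only on the ratio of data to building blocks, which is what "linear" means here.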

Scalability is not just about growing the data volume. It also means growing or increasing the number of users. Many systems work flawlessly until as few as 5 users are added; then they slow down to a crawl. Companies need a system where growth and performance are easily calculated and implemented. That means a system where the number of users, size and complexity of queries, volume of data, and number of applications being used can be calculated and compared to the current system's actual size. If more power, speed, or size is needed, then the company can simply add building blocks to the system until the requirements are met.


Rule 7 - Model the Data Correctly

"You will find only what you bring in."

Yoda, Jedi Master in Star Wars

We model a database for the same reasons that Boeing builds an aircraft model to test flight characteristics in a wind tunnel. It's simpler and cheaper to model than to reconstruct the plane by iterations until you get it right.

A proper data model should be designed to reflect the business components and possible relationships.

Here are three rules for modeling data in a data warehouse:

1. Model the data quickly
2. Normalize the detail data
3. Use a dimensional model for data marts

The 3rd Normal Form holds that each column in a table should be directly related to the primary key, the whole key, and nothing but the key. Data is placed into tables where it makes the most sense and has no repeating groups, derived data, or optional columns. This allows users to ask any question at any time on all data within the enterprise. Users do not have to strive for 3rd Normal Form, but should normalize the data the best they can. There will be fewer columns in a table but a lot more tables overall. This model is easier to maintain, incredibly flexible, and allows a user to ask any question on any data at any time.

A Star-Schema model consists of a fact table and a number of dimension tables. The fact table is a table with a multi-part key. Each element of the key is itself a foreign key to a single dimension table. The remaining fields in the fact table are known as facts and are numeric, continuously valued, and additive. Facts can be thought of as measurements taken at the intersection of all of the dimensions. Dimension attributes are mostly textual and are almost always the source of constraints and report breaks. This model enhances performance on known queries, or in other words, queries users run repeatedly, day after day.
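To make the fact-table and dimension-table vocabulary concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Teradata. All table names, column names, and rows are hypothetical, invented only to show a two-part key, an additive fact, and dimension attributes used as a constraint and a report break.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: mostly textual attributes.
CREATE TABLE store_dim   (store_id   INTEGER PRIMARY KEY, region   TEXT);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, category TEXT);

-- Fact table: a multi-part key in which every part is a foreign key
-- to one dimension, plus a numeric, additive fact.
CREATE TABLE sales_fact (
    store_id   INTEGER REFERENCES store_dim(store_id),
    product_id INTEGER REFERENCES product_dim(product_id),
    amount     REAL,
    PRIMARY KEY (store_id, product_id)
);
""")
conn.executemany("INSERT INTO store_dim VALUES (?, ?)",
                 [(1, "East"), (2, "East"), (3, "West")])
conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                 [(10, "Toys"), (20, "Food")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                 [(1, 10, 100.0), (2, 10, 40.0), (3, 10, 25.0), (1, 20, 50.0)])


def sales_by_region(category):
    """Constrain on one dimension (category), break the report on another (region)."""
    query = """
        SELECT s.region, SUM(f.amount)
        FROM sales_fact f
        JOIN store_dim   s ON s.store_id   = f.store_id
        JOIN product_dim p ON p.product_id = f.product_id
        WHERE p.category = ?
        GROUP BY s.region
        ORDER BY s.region
    """
    return conn.execute(query, (category,)).fetchall()


print(sales_by_region("Toys"))
```

The query shape is fixed in advance by the choice of dimensions, which is why this layout suits repeated, known questions better than open-ended ones.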

Most database modelers prefer to create a logical model in 3rd Normal Form, but most database engines are overcome by physical limitations, so they must compromise the model. The four most difficult functions for a database to handle are:

• Join tables
• Aggregate data
• Sort data
• Scan large volumes of data

In order to get around these system limitations, vendors will suggest a model to avoid joins, use summarized data to avoid aggregation, store data in sorted order to avoid sorts, and overuse indexes to avoid large scans.

With these limitations, vendors also give up the ability to compete. That is like placing a ball and chain around a runner's leg and saying, "I wish you all the best in the marathon." Come on! Whose side are these vendors really on?

Teradata is the only database engine I have seen that has the power and maturity to use a 3rd Normal Form physical model on databases exceeding a terabyte in size. Because of their physical limitations, other databases have had to use a Star-Schema model to enhance performance, but they have given up the ability to perform ad-hoc queries and data mining.


A normalized model is the one that should be used for the central data warehouse. It allows users to ask any question at any time on information from any place within the enterprise. This is the central philosophy of a data warehouse. It leads to the power of ad-hoc queries and data mining, whereby advanced tools discover relationships that are not easily detected but do exist naturally in the business environment.

A Star-Schema model enhances performance on known queries because we build our assumptions into the model. While these assumptions may be correct for the first application, they may not be correct for others. Flexibility is a big issue, but data marts can be dropped and added with relative ease if each is built directly from the detail data.

Remember: build the data warehouse around detail data using a normalized model. Then, as query patterns emerge and performance for well-known queries becomes a priority, Star-Schema data marts can be created by extracting summarized or departmental data from the centralized data warehouse. The user will then have access to both the data marts for repetitive queries and the central warehouse for other queries.

Because data marts can be an administrative nightmare, Teradata enables Star-Schema access without requiring physical data marts. By setting up a join index as the intersection of your Star-Schema model, you can create a Star-Schema structure directly from your 3rd Normal Form data model. Best of all, once it is created, the data is automatically maintained as the underlying tables are updated.

Keep in mind that 80% of data warehouse queries are repetitive, but 80% of the Return On Investment (ROI) is actually provided by the other 20% of the queries that go against detailed data in an iterative environment. By using a normalized model for your central data warehouse and a Star-Schema model on data marts, you can enhance the possibility of realizing an 80% Return on Investment and still enhance the performance on 80% of your queries.

Rule 8 - Don't Let a Technical Issue Make Your Data Warehouse a Failure Statistic

"Experience is a hard teacher because she gives the test first, the lesson afterwards."

Scottish Proverb

Did you know that 3/4ths of the people in the world hate fractions, and that 40% of the time a data warehouse fails, it is because of a technical issue? There are many traps and pitfalls in every data warehouse venture. One winter day a hunter met a bear in the forest. The bear said, "I'm hungry. I want a full stomach." The man replied, "Well, I'm cold. I would like a fur coat." "Let's compromise," said the bear, and he quickly gobbled up the hunter. They both got what they asked for. The bear went away with a full belly, and the man left wrapped in a fur coat. With that in mind: good judgment comes from experience, and experience comes from bad judgment. You have shown good judgment by reading this book, so let our experience keep your company from having a bad data warehouse experience.

Author Daniel Boorstin wrote in The Discoverers, "The greatest obstacle to discovering the shape of the earth, the continents, and the oceans was not ignorance but rather the illusion of knowledge." There is a lot of illusion of knowledge being spread around in the data-warehousing environment. Before you decide on any data warehouse product, ask yourself and the vendor these questions:

• As my data demands increase, will the system be able to physically load the data? Our experience shows that many systems are not capable of handling very large volumes of data. Do the math.
• As the data grows in volume, can the system meet the performance requirements? Do the math.
• As the number of users grows, will the system be able to scale? Do the math.
• As my environment changes, will the system be flexible enough to allow changes quickly and easily? Do the math.
• Will the system need so many Database Administrators (DBAs) that my system's cost skyrockets? Do the math.
• If we suddenly merged with another company and needed to incorporate into their mainframe or LAN environment, would the system be able to connect and include them? Do the math.
• Can I continue to meet my batch window timeframes? Do the math.
• Could I become the hero of the company one day, only to have some technical glitch blamed on me because of my poor foresight and be thrown out of the company into a giant mud puddle? Do the bath.

Rule 9 - Take a Building Block Approach

"Be not afraid of growing slowly; be afraid only of standing still."

Chinese Proverb

Ever since Vasco Núñez de Balboa discovered the Pacific coast of Panama in 1513, kings and businessmen alike dreamed of the impossible: to cut a waterway across the mountainous isthmus, creating a shortcut between the Atlantic and Pacific Oceans. Those dreams turned into reality during the Industrial Revolution. It took almost forty years of trial and error before the world's greatest engineering feat since the Pyramids was completed in 1914. Ships move through the locks of the canal, rising 85 feet above sea level before they descend to the opposite side. Since its grand opening in 1920, the Panama Canal has revolutionized trans-oceanic traffic, joining East and West. Its 50-mile stretch saves every vessel about 8,000 extra nautical miles of travel around the bottom tip of South America. Several modifications have been engineered through the years to accommodate the increasing size of ships.

Data warehouses, like the Panama Canal, must be built over time and changed over time to meet new demands. A data warehouse must grow with the environment, but the environment is unpredictable. All sailors know that they can't direct the wind but that they can adjust their sails. In comparison, all data warehouse users know they can't direct the environment, but they can adjust their warehouse. Sometimes the data warehouse will grow quickly and sometimes it will grow slowly, but it should always be growing.

So take a building block approach to data warehousing. Teradata allows you to expand without boundaries, one building block at a time. Plus, adding on building blocks is easy.

There are two aspects to a building block approach. First, you need to add applications to your data warehouse in three- to six-month intervals. Once the first application works, you are ready for more projects. As you become more experienced with this approach, you can add multiple projects in parallel by involving multiple organizations.

The second aspect of the building block approach is in the actual data warehouse architecture. It doesn't matter if yours is the smallest data warehouse in the world, the largest, or falls somewhere in between: power and scalability always fuel success.

Not long ago, a customer flew out to San Diego for a Teradata demonstration and benchmark. The benchmark ran late into the evening, but the numbers were more than 50% better than the competition. The customer was extremely impressed, but before buying he demanded to see the system scalability that everyone had been talking about. Although it was already late, a Teradata employee was called in the middle of the night, arrived within 10 minutes (in pajamas), hooked up the building blocks, and ran a utility called config. She ran another called reconfig, and in less than two hours the system size doubled.


As the environment changes in terms of users, data, complexity, capacity, batch windows, time changes, events, or opportunities, users should be able to continue building applications and architecture. The more a Teradata system grows, the more Teradata outshines the competition.

Rule 10 - Buy a Teradata Data Warehouse

"Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing had happened."

Winston Churchill

Winston Churchill led Britain through World War II during what he called that country's "finest hour." When users see consistent data, the system too is in its finest hour. Teradata gives users the ability to ask questions they could never ask before. Users trust Teradata because of its industry performance and reputation, and because it never gives in. Constant use gives users optimal business experience, and no matter what a user asks, the system responds with a hearty "Yes, Sir!"

When we explained Teradata to Churchill, he said:

"A data WARe-house that consists of 250 data marts is like poison, and if I were the MIS department responsible for maintaining them, I'd take it."

Teradata guarantees an Enterprise Data Warehouse with no scalability issues. Data loads like lightning, and system administration is a breeze. You can pick the performance level that meets your requirements for today and forever. The database can be normalized around detail data, and because of Teradata's power, users have the flexibility to ask any question, at any time, on any data.

All other databases are suspect in data loading capabilities, scalability, reference sites, decades of data warehouse experience, flexibility, system administration difficulties, and inability to handle the complex queries of today's users. These users are good!

Teradata - The Shining Star

Overview

Teradata has always been at the top of the data warehouse game, even if the experts weren't bright enough to know it. The vision of the original designers was tremendous. It was so far to the left of genius that most thought the idea was impossible.

"Only he who attempts the ridiculous may achieve the impossible."

Don Quixote

The Teradata database was originally designed in 1976, and many of the fundamental concepts still remain today. Nearly 25 years later, Teradata is still considered ahead of its time.

In 1976, IBM mainframes dominated the computer business. Everyone who was anyone had an IBM mainframe. However, the original founders of Teradata noticed that it took about 4½ years for IBM to produce a new mainframe. They also noticed a little company called Intel. Intel created a new PC chip every 2½ years. With mainframes moving forward every 4½ years and PC chips every 2½ years, Teradata formed its vision: to network enough PC chips together that the mainframe would be overpowered, yet at a cost hundreds of times lower than that of a mainframe. The Teradata team estimated the power surge would come in 1990.

IBM laughed out loud. They said, "Let's get this straight... you are going to network a bunch of PC chips together and overpower our mainframes? That's like plowing a field with 1,000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as "just a bunch of PCs in a cabinet."

Teradata was convinced it could produce a product that would power large amounts of data and achieve the impossible: using PC technology in mainframe territory. Its founders agreed with Napoleon Bonaparte, who asserted, "The word 'impossible' is not in my dictionary." Sure enough, when we looked in his dictionary, that word was not there. And it is not in Teradata's Data Dictionary either. The Teradata team set two goals: build a database that could

• Perform parallel processing, and
• Accommodate a Terabyte of data

Driving in the car one evening, Morgan's eight-year-old daughter Kara piped up from the back seat: "Daddy, can you buy Teradata in the store? I mean, what does Teradata really do?" Morgan thought for a moment and then replied, "Do you remember when you went on the Easter egg hunt last spring? Well, imagine that we had fifty eggs and you were the only child there. If I asked you to find all the purple eggs, would you be able to do that?" Kara said, "Sure! But it might take me a long time." Morgan continued, "What if we now let fifty children go in, and I asked them to show me all of the purple eggs? How long would that take?" His daughter responded, "It wouldn't take any time at all, because each child would only have to look at one egg." That is precisely how Teradata works. It divides up huge tasks among its processors and tackles each portion simultaneously, with amazing speed. And it doesn't matter if you have a trillion eggs in your basket.

In 1984, the DBC/1012 was introduced. Since then, Teradata has been the dominant force in data warehousing. Teradata got the chickens plowing and is considered outstanding. Meanwhile, IBM's plow is out rusting in its field.

Parallel Processing

"An invasion of armies can be resisted, but not an idea whose time has come."

Victor Hugo

Teradata's ability to support unlimited users, unlimited power, and unlimited scalability is an idea whose time has come, and it all starts with something called parallel processing. So what is parallel processing? Let us explain.

It was 10 p.m. on a Saturday night, and two friends were having dinner and drinks. One of the friends looked at his watch and said, "I have to get going." The other friend responded, "What's the hurry?" His friend went on to tell him that he had to leave to do his laundry at the Laundromat. The other friend could not believe his ears. He responded, "What? You're leaving to do your laundry on a Saturday night? Do it tomorrow!" His buddy went on to explain that there were only 10 washing machines at the laundry. "If I wait until tomorrow, it will be crowded and I will be lucky to get one washing machine. I have 10 loads of laundry, so I will be there all day. If I go now, there will be nobody there and I can do all 10 loads at the same time. I'll be done in less than an hour and a half."


This story describes what we call "Parallel Processing." Teradata is the only database in the world that loads data, backs up data, and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds... even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every database's clock in performance tests.

"After enlightenment, the laundry."

Zen Proverb

"After parallel processing the laundry, enlightenment!"

Teradata Zen Proverb

With the computer world now seeing Terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, parallel processing is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata, with its unlimited power and growth without boundaries, was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

"A ship in harbor is safe, but that's not why ships are built."

John Shedd

In 1805, the pivotal Battle of Trafalgar matched Britain's fleet of battleships against the combined French and Spanish fleet. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to sail circles around the enemy fleet, attacking it at its most vulnerable point, the stern. That battle paralyzed the opposing fleet and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating the mainframes on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive, and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed while managing terabytes of data.

A Personal Computer (PC) is made up of the following components:


Processor Chip - This is the brain of the computer. All tasks are done at the direction of the processor.

Memory - This is the hand of the computer. The memory allows data to be viewed, manipulated, changed, or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive - This is the spine of the computer. The hard drive stores data, applications, and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor applies your changes to the document while it is still held in memory. Upon completion, you close the document and the processor writes all the changes to the disk, where it is stored.

In the picture below, we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight "best friends."

Teradata Spreads Data over Multiple Processors

"I don't mind starting the season with unknowns. I just don't like finishing the season with them."

Coach Lou Holtz

With Teradata, you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, all eight rows would be read by a single processor.

The Teradata example on the next page shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask for a list of best friends, both processors will retrieve their data in parallel and will return combined results over the connecting network. This arrangement could easily double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four records, and simultaneously the other processor reads the remaining four records. So how could we double the speed of this system again?

Teradata has Linear Scalability

"Every ceiling, when reached, becomes a floor upon which one walks and now can see a new ceiling."

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." It allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck.

Notice in the system below that there are four processors and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since data is spread evenly over four processors, Teradata reads two rows on each of the four processors simultaneously. Now the system is four times faster than the single-processor example.
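The arithmetic behind the one-, two-, and four-processor examples can be sketched in a few lines of Python. This is only an illustration of the divide-and-conquer idea, not Teradata code: the function names and the round-robin placement are assumptions made for the sketch, and the eight rows are placeholders.

```python
# Illustrative sketch only: models the "divide and conquer" idea, not Teradata internals.

def spread_rows(rows, num_processors):
    """Deal rows out round-robin so each processor holds an equal share."""
    shares = [[] for _ in range(num_processors)]
    for position, row in enumerate(rows):
        shares[position % num_processors].append(row)
    return shares


def rows_read_per_processor(rows, num_processors):
    """Largest share any one processor must read; all shares are read in parallel."""
    return max(len(share) for share in spread_rows(rows, num_processors))


# Eight placeholder rows standing in for the Best_Friends table.
best_friends = [f"friend_{n}" for n in range(1, 9)]

for processors in (1, 2, 4):
    per_processor = rows_read_per_processor(best_friends, processors)
    print(f"{processors} processor(s): {per_processor} rows each")
```

With eight rows, one processor reads eight rows, two processors read four each, and four processors read two each, which is the doubling described in the text.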

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the AMPs and processed in parallel.

A Logical View of the Teradata Architecture

"You are either making history, or you are history."

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem, and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when it actually never had a chance to succeed. In ancient days, Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies, too, will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and then is given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain the information on their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.

Parsing Engine (PE)

"Even a stopped clock is right twice a day."

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father, in front, said, "Boy, that sure was a hard climb." His son, in the back, responded, "Yes, it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards." Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and Southwestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions, some of which had never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions, and Teradata returns the answers.

Access Module Processor (AMP)

"Wise men talk because they have something to say; fools talk because they have to say something."

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above: "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of few words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best way to picture it is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it is the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk, and it shares this with no other AMP; hence, a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP will read only the rows on its particular disk. If this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.
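The "slowest AMP" remark can be made concrete with a small Python sketch. It assumes, purely for illustration, that every row costs the same time to read and that AMPs run fully in parallel; the function name and the sample row counts are hypothetical.

```python
# Toy model: elapsed time of a parallel query under a constant per-row cost (an assumption).

def query_elapsed(rows_per_amp, seconds_per_row=1.0):
    """AMPs work in parallel, so the query ends when the busiest AMP ends."""
    return max(rows_per_amp) * seconds_per_row


even_spread = [25, 25, 25, 25]    # 100 rows spread evenly over four AMPs
skewed_spread = [10, 10, 10, 70]  # the same 100 rows, badly skewed

print(query_elapsed(even_spread))    # 25.0
print(query_elapsed(skewed_spread))  # 70.0
```

Both layouts hold the same 100 rows, but in this model the skewed one takes almost three times as long, which is why an even spread matters.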

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs, while others have more than 2000.

The BYNET

"Even if you're on the right track, you'll still get run over if you just sit there."

Will Rogers

The BYNET ensures that communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son, David, about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax;
• The PE checks the user's security rights;
• The PE comes up with a plan for the AMPs to follow;
• The PE passes the plan along to the AMPs over the BYNET;
• The AMPs follow the plan and retrieve the data requested;
• The AMPs pass the data to the PE over the BYNET; and
• The PE then passes the final data to the user.
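The steps above can be sketched as a toy Python model. Everything here is hypothetical and simplified for illustration: the dictionaries, the function names, and the one-statement "syntax check" are assumptions, and plain function calls stand in for the BYNET.

```python
# Toy model of the request flow; all names and data structures are hypothetical.

# Each AMP owns its own disk, keyed by table name.
AMP_DISKS = {
    1: {"Best_Friends": [{"Friend_Num": 2, "Friend_Name": "Ben Hon"}]},
    2: {"Best_Friends": [{"Friend_Num": 4, "Friend_Name": "Joe Davis"}]},
}

# Which tables each user may read.
ACCESS_RIGHTS = {"tom": {"Best_Friends"}}


def amp_execute(amp_number, plan):
    """An AMP follows the plan and reads only the rows on its own disk."""
    return list(AMP_DISKS[amp_number].get(plan["table"], []))


def parsing_engine(user, sql):
    # Step 1: check syntax (toy rule: only "SELECT * FROM <table>" is accepted).
    words = sql.strip().rstrip(";").split()
    if len(words) != 4 or [w.upper() for w in words[:3]] != ["SELECT", "*", "FROM"]:
        raise ValueError("syntax error")
    table = words[3]
    # Step 2: check the user's security rights.
    if table not in ACCESS_RIGHTS.get(user, set()):
        raise PermissionError(f"{user} may not read {table}")
    # Step 3: build a plan for the AMPs.
    plan = {"table": table, "action": "read_all_rows"}
    # Steps 4 to 6: send the plan to every AMP and collect what each returns.
    rows = []
    for amp_number in sorted(AMP_DISKS):
        rows.extend(amp_execute(amp_number, plan))
    # Step 7: hand the combined answer back to the user.
    return rows


print(parsing_engine("tom", "SELECT * FROM Best_Friends"))
```

A user without rights, for example `parsing_engine("stranger", "SELECT * FROM Best_Friends")`, is rejected with a `PermissionError` before any AMP is contacted, mirroring the guard-dog behavior described earlier.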

Teradata Building Block Approach

"Better a diamond with a flaw than a pebble without one."

Anonymous

Teradata builds its data warehouses in building blocks called "nodes." Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array, where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

"Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them."

Arthur Schopenhauer

Do you have one of those notoriously messy "junk drawers" in your kitchen? You know the one we're talking about ... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, customer name, city, and order number. Another table, the OrderTable, might hold data like order number, order date, item number, quantity, and customer ID.

An example of each table follows:

CUSTOMER TABLE, called CustomerTable

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillards       Atlanta    106227              Smith

ORDER TABLE, called OrderTable

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an "extended family," joined by the "marriage" of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields, each identified by a column name. A Row is the smallest unit that can be inserted into a table. A column is the smallest unit within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display as null, which is sometimes represented by a placeholder symbol.

One column, or a combination of columns, in each table is chosen to be the "Primary Key" (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                         FOREIGN KEY
Not optional                        Optional
Comprised of one or more columns    Comprised of one or more columns
Can only have one PK per table      Can have multiple FKs per table
No duplicates allowed               Duplicates allowed
No changes allowed                  Changes allowed
No nulls allowed                    Nulls allowed
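To make the PK/FK relationship concrete, here is a minimal sketch that joins the two example tables. It uses Python's built-in sqlite3 module as a stand-in database, so the syntax is generic SQL rather than Teradata SQL. The column names are simplified versions of the ones in the tables above, CustomerTable is trimmed to three columns, and the dates are written in ISO form on the assumption that the originals are month/day/year.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in; this is not Teradata.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerTable (
    CustomerID   INTEGER PRIMARY KEY,   -- PK: unique, never null
    CustomerName TEXT,
    CityName     TEXT
);
CREATE TABLE OrderTable (
    OrderNumber  INTEGER PRIMARY KEY,   -- PK of the order table
    OrderDate    TEXT,                  -- ISO dates; month/day/year order is an assumption
    ItemNo       INTEGER,
    Quantity     INTEGER,
    CustomerID   INTEGER REFERENCES CustomerTable (CustomerID)  -- FK
);
INSERT INTO CustomerTable VALUES
    (1001, 'JC Penney', 'Dallas'),
    (1002, 'Office Depot', 'Columbia'),
    (1003, 'Dillards', 'Atlanta');
INSERT INTO OrderTable VALUES
    (105372, '2001-03-07', 212, 20, 1001),
    (105799, '2001-04-18', 296, 52, 1002),
    (106227, '2001-10-17', 325, 17, 1003);
""")

# JOIN by matching the customer table's PK to the order table's FK.
joined = conn.execute("""
    SELECT c.CustomerName, o.OrderNumber, o.Quantity
    FROM CustomerTable AS c
    JOIN OrderTable AS o ON o.CustomerID = c.CustomerID
    ORDER BY o.OrderNumber
""").fetchall()

for row in joined:
    print(row)
```

Each output row combines a customer name from one table with order details from the other, matched through the shared CustomerID value.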

Teradata Spreads the Data Evenly Across the AMPs

"A chain is only as strong as its weakest link."

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way:

First, when each AMP has nearly the same quantity of table rows, then no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind that the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice that each AMP holds a portion of the rows for every table. AMP 1, for example, holds one-fourth of the Employee table rows, one-fourth of the Customer table rows, and one-fourth of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also, notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the rows of the Customer table and of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.
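The layout just described, in which every AMP holds a slice of every table and keeps each table's rows grouped together, can be modeled with a small Python sketch. The structure and names are assumptions for illustration only, and simple round-robin placement stands in for the real distribution mechanism, which is covered in the hash map discussion.

```python
from collections import defaultdict

NUM_AMPS = 4

# amp_storage[amp_number][table_name] -> that AMP's rows for that table, kept together.
amp_storage = {amp: defaultdict(list) for amp in range(1, NUM_AMPS + 1)}


def store(table_name, rows):
    """Place rows round-robin across AMPs (a stand-in for the real distribution)."""
    for position, row in enumerate(rows):
        amp = position % NUM_AMPS + 1
        amp_storage[amp][table_name].append(row)


def amp_read(amp, table_name):
    """An AMP jumps straight to its group of rows for the requested table."""
    return amp_storage[amp][table_name]


# Placeholder row identifiers for three tables.
store("Employee", range(100, 108))   # 8 rows
store("Customer", range(200, 212))   # 12 rows
store("Order", range(300, 304))      # 4 rows

for amp in sorted(amp_storage):
    counts = {table: len(rows) for table, rows in amp_storage[amp].items()}
    print(f"AMP {amp}: {counts}")
```

Every AMP ends up with one-fourth of each table, and a request for Customer rows never has to step over Employee or Order rows.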

Primary Indexes

"Every road has two directions."

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him, saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs; and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.

There are two types of Primary Indexes

"A man who chases two rabbits catches none."

Roman Proverb

"A man who chases two rabbits misses both by a HARE! A person who chases two Primary Indexes misses both by an ERR!"

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column emp follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE
)
UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice that you never see the prefix "NON":

CREATE TABLE TomC.employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE
)
PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE
)
UNIQUE PRIMARY INDEX(emp, dept);

"Being related hardly insures relatability."

Michael E. Angier

All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label the column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label the column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide each book's title, author, publisher, and Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

"The map is not the territory."

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610–ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that the AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8, and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
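The two simulated hash maps follow one simple pattern, which can be reproduced with a short Python sketch. The function names are hypothetical, buckets are numbered from 1 to match the walkthrough in this chapter, and 48 buckets are used only because the diagrams show eight rows of six; as noted above, the actual hash map has 65,536 buckets.

```python
# Sketch of the simulated hash maps shown in the diagrams; not Teradata's real map.

def build_hash_map(num_amps, num_buckets=48):
    """Map bucket numbers (starting at 1) to AMP numbers that cycle 1..num_amps."""
    return {bucket: (bucket - 1) % num_amps + 1
            for bucket in range(1, num_buckets + 1)}


def show(hash_map, per_row=6):
    """Print the map as a grid, six buckets per row like the diagrams."""
    buckets = sorted(hash_map)
    for start in range(0, len(buckets), per_row):
        print(" ".join(str(hash_map[b]) for b in buckets[start:start + per_row]))


four_amp_map = build_hash_map(4)
eight_amp_map = build_hash_map(8)

show(four_amp_map)
print()
show(eight_amp_map)
```

The printed grids match the four-AMP and eight-AMP diagrams: only the cycle length changes when AMPs are added.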

How the Hash Map and Primary Index Work Together

"Choice, not chance, determines destiny."

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the "Wiz-Bang Formula" (a secret NCR formula). First we'll explain the theory, and then we will invent our own "Wiz-Bang Formula" to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even-tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

For this example, Teradata will attempt to spread the table rows among the AMPs of the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we will get the Primary Index value of the row, apply the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num   Friend_Name
2            Ben Hon

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to find bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another row:

Friend_Num   Friend_Name
16           Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (16), and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to find bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
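The whole walkthrough can be reproduced with a short Python sketch. It implements only the teaching formula from this chapter (divide the Primary Index value by 2), not the real, secret hashing formula, and the names `wiz_bang`, `destination_amp`, and `layout` are invented for the illustration.

```python
# The Coffing/Jones teaching formula applied to the Best_Friends table.
# This is NOT Teradata's real hashing formula, which the text describes as secret.

BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}

NUM_AMPS = 4
# Simulated four-AMP hash map: bucket 1 -> AMP 1, bucket 2 -> AMP 2, ... repeating.
HASH_MAP = {bucket: (bucket - 1) % NUM_AMPS + 1 for bucket in range(1, 49)}


def wiz_bang(friend_num):
    """Teaching formula: divide the Primary Index value by 2 to get a bucket number."""
    return friend_num // 2


def destination_amp(friend_num):
    """Look inside the bucket to find which AMP owns the row."""
    return HASH_MAP[wiz_bang(friend_num)]


# Lay out every row on its destination AMP.
layout = {amp: [] for amp in range(1, NUM_AMPS + 1)}
for friend_num, friend_name in BEST_FRIENDS.items():
    layout[destination_amp(friend_num)].append((friend_num, friend_name))

for amp, rows in layout.items():
    print(f"AMP {amp}: {rows}")
```

Friend_Num 2 lands in bucket 1 on AMP 1, and Friend_Num 16 lands in bucket 8 on AMP 4, as in the two worked rows above; each AMP ends up with exactly two of the eight rows.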

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the real formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it is consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this value a million times, we would still get the same result.

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index value. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL that retrieves data via the Primary Index follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants two columns, Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI value by two. When the PE divides the value eight by two, it gets an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
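The one-AMP retrieval described above can be sketched in Python. As before, this uses the teaching formula (divide by 2) and a simulated four-AMP hash map; the dictionaries and the function name `select_by_primary_index` are hypothetical, and the row placement is the one produced by the walkthrough.

```python
# Sketch of a Primary Index lookup that touches exactly one AMP.
# Uses the chapter's teaching formula, not Teradata's real hashing formula.

NUM_AMPS = 4
HASH_MAP = {bucket: (bucket - 1) % NUM_AMPS + 1 for bucket in range(1, 49)}

# Each AMP's disk, holding the rows the teaching formula assigned to it.
AMP_DISKS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def select_by_primary_index(friend_num):
    """Rerun the formula, find the bucket, and ask only the owning AMP for the row."""
    bucket = friend_num // 2          # teaching formula: divide the PI value by 2
    amp = HASH_MAP[bucket]            # the bucket names the owning AMP
    amps_contacted = [amp]            # no other AMP is involved
    friend_name = AMP_DISKS[amp].get(friend_num)
    return friend_name, amps_contacted


name, amps = select_by_primary_index(8)
print(name, "found by contacting AMP(s)", amps)
```

For Friend_Num 8, the lookup goes to bucket four, contacts only AMP four, and returns John Davis, while the other three AMPs do no work at all.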

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that it is not the size of the opponent that is to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. On systems with hundreds or thousands of AMPs, doing so can speed up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. Because each AMP reads only a tenth of the table, this process is about 10 times faster than a single process reading all 100 rows. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most FTS bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?", you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it owns.
• Each AMP passes its rows to the PE over the BYNET.
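A rough Python sketch of those two steps follows. Threads stand in for AMPs, and gathering the results stands in for the BYNET. The placement of rows on AMPs 2, 3 and 4 is an assumption made for illustration; only the rows on AMP 1 (Ben Hon and Don Roy) are stated in this book.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical row placement for the four-AMP Best_Friends example.
AMP_ROWS = {
    1: [(2, "Ben Hon"), (10, "Don Roy")],
    2: [(4, "Joe Davis"), (12, "Sam Mills")],
    3: [(6, "Mary Gray"), (14, "Kyle Marx")],
    4: [(8, "John Davis"), (16, "Lyn Jones")],
}

def amp_scan(amp_number: int) -> list:
    """Each AMP reads only the rows it owns."""
    return list(AMP_ROWS[amp_number])

def full_table_scan() -> list:
    """The PE hands every AMP the same plan, then gathers all rows."""
    with ThreadPoolExecutor(max_workers=len(AMP_ROWS)) as pool:
        per_amp = list(pool.map(amp_scan, AMP_ROWS))  # one task per AMP
    return [row for rows in per_amp for row in rows]

rows = full_table_scan()
print(len(rows), "rows returned")
```

No AMP ever looks at another AMP's rows, so adding AMPs shrinks the amount of work each one has to do.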

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
6           Mary Gray
14          Kyle Marx
8           John Davis
16          Lyn Jones
2           Ben Hon
10          Don Roy
4           Joe Davis
12          Sam Mills

In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option: a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, and since secondary indexes use up space and add overhead, they should only be used on KNOWN QUERIES, that is, queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space, but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. The hash is then placed in a secondary index sub-table, along with the Row-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking WHERE Friend_Name = 'Ben Hon'. The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (smiley face in this example) is revealed and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations
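Here is a minimal Python sketch of why a USI lookup touches two AMPs. It is not Teradata's real hashing: the CRC32-based amp_for_value function and the simplified pointer (base AMP plus Primary Index value, in place of a real row-id) are assumptions made for illustration, so the AMP numbers it prints will not match the picture.

```python
import zlib

NUM_AMPS = 4

def amp_for_value(value) -> int:
    """Stand-in hash (an assumption): CRC32 of the value mapped to AMPs 1..4."""
    return zlib.crc32(str(value).encode()) % NUM_AMPS + 1

friends = {2: "Ben Hon", 8: "John Davis", 10: "Don Roy", 16: "Lyn Jones"}

# Base table: each AMP stores the rows whose Primary Index hashes to it.
base_table = {amp: {} for amp in range(1, NUM_AMPS + 1)}
# USI sub-table: each AMP stores the index rows whose NAME hashes to it.
usi_subtable = {amp: {} for amp in range(1, NUM_AMPS + 1)}

for friend_num, friend_name in friends.items():
    base_amp = amp_for_value(friend_num)
    base_table[base_amp][friend_num] = friend_name
    # The index row carries a pointer back to the base row.
    usi_subtable[amp_for_value(friend_name)][friend_name] = (base_amp, friend_num)

def select_by_name(friend_name: str):
    """Return (Friend_Num, Friend_Name, AMPs touched) using the USI."""
    index_amp = amp_for_value(friend_name)                 # first AMP: index row
    base_amp, friend_num = usi_subtable[index_amp][friend_name]
    name = base_table[base_amp][friend_num]                # second AMP: base row
    return friend_num, name, [index_amp, base_amp]

print(select_by_name("Ben Hon"))
```

However many AMPs the system has, the lookup is one read of the index row followed by one read of the base row.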

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and add overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.

Teradata Databases, Users and Space

Overview

"Choose a job you like, and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than DBC.

DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security access rights, table columns, indexes, macros, views, etc.

So, if your system comes with 100 Gigabytes of permanent disk space, then DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind that all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to SYSDBA to distribute space.

Logical Picture of System Space

SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from Tom.
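The same question can be answered by walking up the hierarchy. The short Python sketch below encodes the parent of each user or database as described in this chapter; the PARENT dictionary and the owners_of function are names invented for illustration.

```python
# Parent of each user or database, taken from the hierarchy described above.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}

def owners_of(name: str) -> list:
    """Walk up the hierarchy and collect every parent (owner) of `name`."""
    owners = []
    while name in PARENT:
        name = PARENT[name]
        owners.append(name)
    return owners

print(owners_of("Tom"))   # ['Morgan', 'SYSDBA', 'DBC']
```

DBC has no entry in PARENT because nothing sits above it, so every walk ends there.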

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data, such as tables, views and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.
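The bookkeeping for perm space is simple subtraction, as this small Python sketch suggests. The SpaceOwner class and its give_perm method are invented for illustration, and the 10 gigabytes handed to MRKT is a made-up amount; the 100-gigabyte system and the 80/20 split come from the text.

```python
class SpaceOwner:
    """A user or database that owns PERM space (sizes in gigabytes)."""

    def __init__(self, name: str, perm_gb: float = 0.0):
        self.name = name
        self.perm_gb = perm_gb

    def give_perm(self, child: "SpaceOwner", amount_gb: float) -> None:
        """Give PERM space to a child; the giver's total drops by that amount."""
        if amount_gb > self.perm_gb:
            raise ValueError(f"{self.name} does not own {amount_gb} GB")
        self.perm_gb -= amount_gb
        child.perm_gb += amount_gb

dbc = SpaceOwner("DBC", 100)     # the 100-gigabyte system from the text
sysdba = SpaceOwner("SYSDBA")
mrkt = SpaceOwner("MRKT")

dbc.give_perm(sysdba, 80)        # DBC keeps about 20%
sysdba.give_perm(mrkt, 10)       # hypothetical amount

total = dbc.perm_gb + sysdba.perm_gb + mrkt.perm_gb
print(dbc.perm_gb, sysdba.perm_gb, mrkt.perm_gb, total)
```

The total never changes. Space only moves between owners, which is why no perm space ever goes unaccounted for.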

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.

The following picture shows a logical view of a Customer table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. Then, when the answer is returned, the spool is released.
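The life cycle of spool during such a query can be sketched in Python. This is a simplification under stated assumptions: spool is counted in rows rather than bytes, the limit values are made up, and the rows are the six from the Employee table listed in this chapter rather than the four in the picture.

```python
class SpoolLimitExceeded(Exception):
    """Raised when a query needs more spool than the user's upper limit."""

# (Emp, Dept, Lname, Fname, Sal), copied from the Employee table in this chapter.
EMPLOYEE = [
    (1, 10, "Johnson", "Manny", 100000),
    (2, 20, "Carlsbad", "Jan", 100000),
    (22, 30, "Winter", "Steve", 77000),
    (25, 10, "Lester", "Bonnie", 56000),
    (33, 10, "Samuels", "Todd", 120000),
    (99, 20, "Walter", "Misha", 104000),
]

def run_query(dept: int, spool_limit_rows: int) -> list:
    """Model of: SELECT * FROM Employee WHERE Dept = dept."""
    spool = []
    for row in EMPLOYEE:                       # the AMPs read the table rows
        if row[1] == dept:
            if len(spool) >= spool_limit_rows:
                raise SpoolLimitExceeded("query aborted: out of spool space")
            spool.append(row)                  # qualifying row goes to spool
    answer = list(spool)                       # answer is delivered to the user
    spool.clear()                              # then the spool is released
    return answer

print(len(run_query(dept=10, spool_limit_rows=5)), "rows returned")
```

Calling run_query(10, 2) instead raises SpoolLimitExceeded, which mirrors a query that aborts because the user is out of spool space.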

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table(s), the data is not stored on the disks a second time, so the view does not duplicate data or take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
1    10    Johnson   Manny   100000
2    20    Carlsbad  Jan     100000
22   30    Winter    Steve   77000
25   10    Lester    Bonnie  56000
33   10    Samuels   Todd    120000
99   20    Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
1    10    Johnson   Manny
2    20    Carlsbad  Jan
22   30    Winter    Steve
25   10    Lester    Bonnie
33   10    Samuels   Todd
99   20    Walter    Misha
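If you have no Teradata system at hand, the idea can be tried out with Python's built-in sqlite3 module. SQLite is only a stand-in here, which is an assumption of this sketch: it has no Data Dictionary, Perm space or access rights, but its CREATE VIEW behaves the same way for this purpose, storing a definition and no second copy of the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; the salary column is left out.
conn.execute(
    "CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

cursor = conn.execute("SELECT * FROM Employ_V")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)
print(len(view_rows), "rows returned")
```

Every row comes back, but Sal is nowhere in the column list, so a user limited to the view cannot see salaries.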

What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10?
• What employees are in department 20? and
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
SELECT * FROM Employ_v WHERE dept = 10;
SELECT * FROM Employ_v WHERE dept = 20;
SELECT * FROM Employ_v ORDER BY lname; );

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
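The "either they all work or none of them work" behavior can be imitated in Python with the built-in sqlite3 module. This is a sketch under assumptions, not how Teradata stores or runs macros: the MACROS dictionary and the create_macro and execute_macro functions are invented for illustration, and the deliberately broken Bad_mac macro exists only to show the rollback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp INTEGER, Dept INTEGER)")
conn.execute("INSERT INTO Employee VALUES (1, 10)")
conn.commit()

MACROS = {}   # hypothetical stand-in for macro definitions in the Data Dictionary

def create_macro(name: str, statements: list) -> None:
    """Store a named group of SQL statements."""
    MACROS[name] = list(statements)

def execute_macro(name: str) -> bool:
    """Run every statement of the macro as one single transaction."""
    try:
        with conn:   # commits if all succeed, rolls back if any statement fails
            for sql in MACROS[name]:
                conn.execute(sql)
        return True
    except sqlite3.Error:
        return False

create_macro("Bad_mac", [
    "UPDATE Employee SET Dept = 20 WHERE Emp = 1",   # this statement works...
    "UPDATE No_Such_Table SET Dept = 20",            # ...but this one fails
])

succeeded = execute_macro("Bad_mac")
dept_after = conn.execute("SELECT Dept FROM Employee WHERE Emp = 1").fetchone()[0]
print(succeeded, dept_after)
```

Even though the first UPDATE ran, the failure of the second one undoes it, so employee 1 is still in department 10.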

Here is a handy reference chart that compares views with macros:

Views                                         | Macros
• We select from views                        | • We execute macros
• Uses the keyword AS                         | • Uses the keyword AS
• Definition is stored in the Data Dictionary | • Definition is stored in the Data Dictionary
• Accesses certain portions of the data       | • Accesses the real data itself
• Is changed using the keyword REPLACE        | • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some!

Actually, on a recent consulting trip I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has Implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message
• The entire transaction is rolled back and any changes made to the database are reversed
• Locks are released, and
• Spool files are discarded

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness." If it is clean, then things went well.
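A before-image journal is easy to model. The Python sketch below is a minimal illustration, not Teradata's implementation: the Amp class and its methods are invented names, and the salary figures simply replay the accidental 100% raise mentioned earlier in this chapter.

```python
class Amp:
    """One AMP with its rows and a transient journal of BEFORE images."""

    def __init__(self, rows: dict):
        self.rows = dict(rows)
        self.transient_journal = {}   # key -> before image (None = no row yet)

    def update(self, key, new_value) -> None:
        """Take a BEFORE picture the first time a row changes, then change it."""
        if key not in self.transient_journal:
            self.transient_journal[key] = self.rows.get(key)
        self.rows[key] = new_value

    def rollback(self) -> None:
        """Transaction failed: restore every row to its before image."""
        for key, before in self.transient_journal.items():
            if before is None:
                self.rows.pop(key, None)   # the row did not exist before
            else:
                self.rows[key] = before
        self.transient_journal.clear()

    def commit(self) -> None:
        """Transaction succeeded: the journal is wiped clean."""
        self.transient_journal.clear()

amp = Amp({1: 100000, 25: 56000})   # Emp -> Sal
amp.update(1, 200000)               # the accidental 100% raise
amp.update(25, 112000)
amp.rollback()                      # the transaction is aborted
print(amp.rows)
```

After the rollback the salaries are back to 100000 and 56000 and the journal is empty, just as it would be after a COMMIT.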

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice, if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. The chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. This allocation is done by taking the Primary Index and running it through the hashing algorithm. The output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.
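The following Python sketch shows why one lost AMP per cluster is survivable and two in the same cluster are not. The cluster layout matches the eight-AMP example, but the rule used by fallback_amp to choose the backup AMP is an assumption made for illustration; the real choice comes from the Fallback Hash Map.

```python
# Eight AMPs in two clusters of four, as in the example above.
CLUSTERS = {1: [1, 2, 3, 4], 2: [5, 6, 7, 8]}

def cluster_of(amp: int) -> list:
    """Return the list of AMPs in the same cluster as `amp`."""
    for amps in CLUSTERS.values():
        if amp in amps:
            return amps
    raise ValueError(f"unknown AMP {amp}")

def fallback_amp(primary_amp: int, row_number: int) -> int:
    """Pick a DIFFERENT AMP in the same cluster (hypothetical placement rule)."""
    others = [amp for amp in cluster_of(primary_amp) if amp != primary_amp]
    return others[row_number % len(others)]

def row_reachable(primary_amp: int, row_number: int, down_amps: set) -> bool:
    """A row is lost only if both its base copy and its FALLBACK copy are down."""
    return (primary_amp not in down_amps
            or fallback_amp(primary_amp, row_number) not in down_amps)

# One AMP down in each cluster: the row on AMP 1 is still reachable.
print(row_reachable(1, 0, {1, 5}))
# Two AMPs down in the same cluster: the row's base and FALLBACK copies are gone.
print(row_reachable(1, 0, {1, 2}))
```

Since a FALLBACK copy never leaves its cluster, a failure in cluster 2 can never take away the backup of a row in cluster 1.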

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in groups of two, every complex query will take twice as long to process while an AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular The chances of losing two AMPs out of four are quite low However if one AMP is lost the other three will share in the extra work
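The trade-off between cluster sizes can be put into rough numbers. The Python sketch below is only a back-of-the-envelope model: it assumes the down AMP's work is shared equally by the survivors in its cluster, and that a second failure is equally likely to hit any remaining AMP. The function names and the 32-AMP system size are chosen for illustration.

```python
def relative_load_with_one_amp_down(cluster_size):
    """Work per surviving AMP relative to normal (1.0 means unchanged)."""
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    return cluster_size / (cluster_size - 1)


def chance_second_failure_hits_same_cluster(cluster_size, total_amps):
    """Chance that a second, random AMP failure lands in the same cluster."""
    if total_amps <= cluster_size - 1 or total_amps < 2:
        raise ValueError("total_amps must cover at least one full cluster")
    return (cluster_size - 1) / (total_amps - 1)


if __name__ == "__main__":
    for size in (2, 4, 16):
        load = relative_load_with_one_amp_down(size)
        risk = chance_second_failure_hits_same_cluster(size, total_amps=32)
        print(f"cluster of {size:2d}: load x{load:.2f}, second-failure risk {risk:.1%}")
```

Under these assumptions a cluster of two doubles the survivor's load, while a cluster of 16 barely changes the load but makes it far more likely that a second failure lands in the same cluster. A cluster of four sits between the two.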

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are still made to the copies on the other AMPs in its cluster.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping, starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ catches the AMP up on everything that occurred while it was "sleeping." Then the DARJ is discarded.
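The life cycle of the recovery journal (open on failure, log changes, replay on recovery, discard) can be sketched in Python. This is a conceptual model only: the `Cluster` class and its method names are hypothetical, and the sketch ignores where the FALLBACK copies themselves are stored.

```python
class Cluster:
    """Toy model of one cluster and its Down AMP Recovery Journal."""

    def __init__(self, amps):
        self.rows = {amp: {} for amp in amps}   # amp -> {row_id: value}
        self.down = set()                       # AMPs currently offline
        self.darj = {}                          # down amp -> logged changes

    def fail(self, amp):
        """An AMP goes down; the surviving AMPs open a journal for it."""
        self.down.add(amp)
        self.darj[amp] = []

    def write(self, amp, row_id, value):
        """Apply a change, or log it if the target AMP is down."""
        if amp in self.down:
            self.darj[amp].append((row_id, value))
        else:
            self.rows[amp][row_id] = value

    def recover(self, amp):
        """Replay the logged changes in order, then discard the journal."""
        for row_id, value in self.darj.pop(amp):
            self.rows[amp][row_id] = value
        self.down.discard(amp)


if __name__ == "__main__":
    cluster = Cluster(amps=[1, 2, 3, 4])
    cluster.fail(1)
    cluster.write(1, "row-a", "changed while AMP 1 was down")
    cluster.recover(1)
    print(cluster.rows[1])
```

Only the cluster that contains the down AMP keeps a journal, because that is the only place the down AMP's rows have copies.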


In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to the hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written to the primary disk, it is also written to the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it is still fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The downside of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. We can still keep the Shared Nothing environment and have four physical disks, because only the AMP that actually owns the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of that information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
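The rule that two disks may be lost as long as they are not a data disk and its own mirror is easy to state as a check. The Python sketch below assumes, purely for illustration, a rank made of two mirrored pairs; the disk names are hypothetical.

```python
def data_still_available(mirror_pairs, failed_disks):
    """True if every mirrored pair still has at least one working disk."""
    failed = set(failed_disks)
    return all(
        not (primary in failed and mirror in failed)
        for primary, mirror in mirror_pairs
    )


# Hypothetical rank: two data disks, each with its own mirror.
RANK = [("data1", "mirror1"), ("data2", "mirror2")]

if __name__ == "__main__":
    print(data_still_available(RANK, ["data1"]))              # one disk lost
    print(data_still_available(RANK, ["data1", "mirror2"]))   # two, different pairs
    print(data_still_available(RANK, ["data1", "mirror1"]))   # a disk and its mirror
```

The first two calls report that the data is still reachable. The third reports a loss, because both copies of the same data are gone.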

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students who hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE like birds flying south in winter. In a two-node clique, the receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10 to 16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, if node one goes down, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in memory on the surviving node. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, and they access their own virtual disks via a different physical cable.
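The migration can be modeled in a few lines of Python. This is a simplified sketch, not Teradata's real reassignment logic: the function name, the node labels, and the round-robin choice of receiving node are assumptions made for the example.

```python
def migrate_on_failure(nodes, failed_node):
    """Return a new node -> vprocs placement after one node fails.

    nodes maps a node name to the list of vprocs (AMPs and PEs) it hosts.
    The failed node's vprocs are dealt out to the surviving clique members.
    """
    survivors = [name for name in nodes if name != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    placement = {name: list(nodes[name]) for name in survivors}
    for i, vproc in enumerate(nodes[failed_node]):
        placement[survivors[i % len(survivors)]].append(vproc)
    return placement


if __name__ == "__main__":
    clique = {
        "node1": [f"AMP{n}" for n in range(1, 17)],
        "node2": [f"AMP{n}" for n in range(17, 33)],
    }
    # Disk ownership is tied to the AMP, not to the node, so it never changes.
    disk_owner = {f"vdisk{n}": f"AMP{n}" for n in range(1, 33)}

    after = migrate_on_failure(clique, "node1")
    print(len(after["node2"]), "AMPs now run on node2")
    print("vdisk16 is still owned by", disk_owner["vdisk16"])
```

The node placement changes while the disk ownership table does not, which is the whole point of the clique: AMP 16 runs in node two's memory but still reads and writes only its own virtual disk.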

People who come from colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. This would be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed by an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
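Both scenarios can be traced with a small Python sketch. It models a table as a dictionary and the journals as simple lists; the function names are hypothetical, and `None` is used as a marker for "row does not exist", so this illustrates the idea rather than how Teradata stores journal images.

```python
def apply_changes(table, changes, before_journal, after_journal):
    """Apply (key, new_value) changes, journaling both images of each row."""
    for key, new_value in changes:
        before_journal.append((key, table.get(key)))  # image before the change
        if new_value is None:
            table.pop(key, None)                      # None means DELETE
        else:
            table[key] = new_value                    # INSERT or UPDATE
        after_journal.append((key, new_value))        # image after the change


def roll_back(table, before_journal):
    """Undo journaled changes, newest first, using the BEFORE images."""
    for key, old_value in reversed(before_journal):
        if old_value is None:
            table.pop(key, None)
        else:
            table[key] = old_value


def roll_forward(backup, after_journal):
    """Rebuild the current table from a backup plus the AFTER images."""
    table = dict(backup)
    for key, new_value in after_journal:
        if new_value is None:
            table.pop(key, None)
        else:
            table[key] = new_value
    return table


if __name__ == "__main__":
    reps = {"rep1": "Western", "rep2": "Eastern"}
    backup = dict(reps)                     # the backup taken on the 1st
    before, after = [], []
    apply_changes(reps, [("rep1", "Southeast"), ("rep2", "Southeast")], before, after)
    print(roll_forward(backup, after))      # recovers the state at failure time
    roll_back(reps, before)
    print(reps)                             # back to the state before the update
```

Roll forward starts from the backup and replays AFTER images in order. Rollback starts from the current table and replays BEFORE images in reverse.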

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

    CREATE TABLE TomC.employee, FALLBACK,
        BEFORE JOURNAL,
        DUAL AFTER JOURNAL
        (emp        INTEGER,
         dept       INTEGER,
         lname      CHAR(20),
         fname      VARCHAR(20),
         salary     DECIMAL(10,2),
         hire_date  DATE FORMAT)
    UNIQUE PRIMARY INDEX(emp);

The example above creates the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables, and it restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps into action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE, or READ lock can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
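The four rules above amount to a compatibility table: for each requested lock, which locks may already be held on the same object. The Python sketch below encodes that table as read from the descriptions in this section; the names `COMPATIBLE` and `can_grant` are made up for the example.

```python
# For each requested lock: the set of already-held locks it can coexist with.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},   # blocked only by EXCLUSIVE
    "READ":      {"ACCESS", "READ"},            # many readers may share
    "WRITE":     {"ACCESS"},                    # only dirty readers may coexist
    "EXCLUSIVE": set(),                         # no other access, period
}


def can_grant(requested, held_locks):
    """True if the requested lock can be granted alongside every held lock."""
    return all(held in COMPATIBLE[requested] for held in held_locks)


if __name__ == "__main__":
    print(can_grant("READ", ["READ", "READ"]))   # readers share a table
    print(can_grant("WRITE", ["READ"]))          # a writer must wait for readers
    print(can_grant("ACCESS", ["WRITE"]))        # a dirty read slips past a writer
    print(can_grant("ACCESS", ["EXCLUSIVE"]))    # nothing gets past EXCLUSIVE
```

A request that cannot be granted waits in the queue until the conflicting locks are released.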

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood." "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table unless the value in its referencing column also exists in another table within the database. Conversely, a row whose value is referenced from another table may not be deleted unless the referencing value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is added to an existing table that already has RI violations, the ALTER TABLE will still proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
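The employee and payroll example can be sketched in Python to show both halves of the rule. The class and method names here are hypothetical, and the sketch only mimics what the database would enforce through a declared reference between the two tables.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would leave a payroll row without an employee."""


class Company:
    def __init__(self):
        self.employee = set()   # parent table: employee numbers
        self.payroll = {}       # child table: employee number -> salary

    def insert_employee(self, emp):
        self.employee.add(emp)

    def insert_payroll(self, emp, salary):
        # A payroll row may only reference an employee that exists.
        if emp not in self.employee:
            raise ReferentialIntegrityError(f"no employee {emp}")
        self.payroll[emp] = salary

    def delete_payroll(self, emp):
        self.payroll.pop(emp, None)

    def delete_employee(self, emp):
        # An employee still referenced from payroll may not be deleted.
        if emp in self.payroll:
            raise ReferentialIntegrityError(f"employee {emp} is still on payroll")
        self.employee.discard(emp)


if __name__ == "__main__":
    company = Company()
    company.insert_employee(1001)
    company.insert_payroll(1001, 50000)
    try:
        company.delete_employee(1001)
    except ReferentialIntegrityError as error:
        print("blocked:", error)
    company.delete_payroll(1001)
    company.delete_employee(1001)   # allowed once the payroll row is gone
```

With this check in place, the "fired but still paid" oversight cannot happen: the employee row cannot disappear while a payroll row still points at it.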


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that, just loading data. Some people actually ask, "Are data loads that critical?" Come on: ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced bodybuilder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? Through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to proceed: where the flat file is located, what its file definition is, and which Teradata table the data should be loaded into.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sandbags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
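The three steps just described (deal out blocks, hash and redistribute, sort) can be sketched in Python. The sketch runs sequentially and makes several simplifying assumptions: blocks are measured in rows rather than 64K of data, `zlib.crc32` stands in for the hashing algorithm, the first column is treated as the Primary Index, and the row hash stands in for the Row ID. The function name is hypothetical.

```python
import zlib


def fastload(rows, num_amps, rows_per_block):
    """Simulate the three Fastload steps; returns one sorted list per AMP."""
    # Step 1: deal blocks of rows to the AMPs in turn, with no hashing yet.
    received = [[] for _ in range(num_amps)]
    blocks = [rows[i:i + rows_per_block] for i in range(0, len(rows), rows_per_block)]
    for block_number, block in enumerate(blocks):
        received[block_number % num_amps].extend(block)

    # Step 2: each AMP hashes the rows it received and sends each row
    # to the AMP that owns it (the BYNET's job in the real system).
    owned = [[] for _ in range(num_amps)]
    for amp_rows in received:
        for row in amp_rows:
            row_hash = zlib.crc32(str(row[0]).encode("utf-8"))
            owned[row_hash % num_amps].append((row_hash, row))

    # Step 3: each AMP sorts its own rows by row hash.
    return [sorted(amp_rows, key=lambda pair: pair[0]) for amp_rows in owned]


if __name__ == "__main__":
    flat_file = [(n, f"friend{n}") for n in range(20)]
    for amp_number, amp_rows in enumerate(fastload(flat_file, num_amps=4, rows_per_block=5)):
        print(amp_number, [row for _, row in amp_rows])
```

Step 1 is fast because no AMP has to think about where a row belongs. The hashing work in step 2 is then spread across all AMPs at once.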

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind that many systems have hundreds to thousands of AMPs. The load takes place continually, in parallel, as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to load immediately into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be preset to automatically load at different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance - only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


environment can be extremely different than anything an IT department has ever built or used before. Therefore, it's a bad idea to build a data warehouse without the help of experienced people.

An OLTP environment gets more and more predictable each month. It is designed to be tweaked and tuned in order to maximize a company's environment. On the other hand, a data warehouse is an unpredictable environment where the only way to gain control is to actually give up control. In data warehousing, users must be allowed the freedom to ask the questions, and they will blossom in an environment where flexibility is accepted and welcomed.

"The only sure weapon against bad ideas is better ideas."

A. Whitney Griswold

If the IT department decides to build hundreds of data marts that will please each and every department, then it is missing the boat. Data warehouse experience is a hard teacher because it gives the test first and the lesson afterwards. Abraham Lincoln once said, "A house divided cannot stand." With that in mind, build the data warehouse so it will stand strong for a long time.

What's the formula? First and foremost, start by building your data warehouse around detail data. Bring transaction data, along with key details from the OLTP systems, into the data warehouse. Then, as known queries are identified, build data marts to enhance their performance, and insist that data marts are created and maintained directly from the detail data. Doing so will build a foundation that will stand.

Next, the IT department needs to keep an open mind about creating an environment called "User Utopia." Have you ever been there? In User Utopia, the user confidently asks queries without fear of being charged by the minute. The user has meta-data, so he or she becomes intimate with the data and then makes informed decisions. The user should also be able to ask monster queries with the full backing of IT. Recently, on one such query, the IT department wanted to pull the plug. But the DBA held out, granting the user more time. When the query finished running, the information it brought back from the detail data saved the company millions of dollars. Overall, a user will get the majority of his or her answers back quickly from data marts, but he or she also needs the capability of going back to the detail data for more information. This is User Utopia.

Here is the message for IT: Don't follow the idea that "if you build it, they will come." Instead, become a leader ... go to the users and build it together.

Rule 4 - Build the Foundation Around Detail Data

Business is always trying to predict the unpredictable. The U.S. Air Force Reserve's 53rd Weather Reconnaissance Squadron is a special force that flies its planes directly into tropical storms and hurricanes. Using WC-130 Hercules aircraft, they fly into storms at low altitudes, between 1,000 and 10,000 feet, taking weather readings that are relayed to the National Hurricane Center in Florida. They measure wind speeds, measure the pressure and structure of the storm, and, most importantly, locate the eye of the storm. The data collected by these "Hurricane Hunters" is used to determine when and where a storm might hit the coast and how strong it will be at that time. Teradata has no fear of detail data; its virtual processors will fly right into the thick of your data warehouse to bring back valuable information for decision support. You see, Teradata enables you to understand the storms in your business today while helping you predict when and where the next storm will hit tomorrow.

I estimate that 80% of today's data warehouses are built on summary (summarized) data. Therefore, 80% of all data warehouses will never come close to realizing their full potential. Your data warehouse does not have to be one of them.


"A bird does not sing because it has the answers; it sings because it has a song."

A data warehouse built on detail data does not sing because it has a song; it sings because it has the answers. When you capture detail data, answers to an infinite number of questions are available. But if this is truly the case, then why doesn't everybody build around detail data? Well, there are two reasons. One is price. Like a bird, many companies decide to go "cheap, cheap." But watch out! The real expense is not the cost of the data warehouse; it is the money that you will not make without one. The second reason is power. Many companies don't have the wingspan to fly through the detail, so they soar with the summary. In addition, some companies don't want to pay for the disk space it actually takes to keep detail data, but believe me, that cost is a small price to pay for success.

"Once you miss the first buttonhole, it becomes difficult to button your shirt."

Many companies use the same database for their data warehouse as they have for their OLTP system. This is a critical mistake. In essence, they have missed the first buttonhole and most likely will lose their shirt on their data warehouse adventure.

At this point, companies no longer have a choice of using detail data. They must summarize for performance reasons. As one marine jokingly told his boot camp soldiers, "The beatings will continue until the morale improves." Similarly, a database designed for OLTP takes a continual beating when it tries to query large amounts of detail data.

Companies building true data warehouses don't compromise on price and will have a data warehouse that is built for decision support, not one that specializes in OLTP. With this decision, you have buttoned the first buttonhole and are well on your way to reaching the top.

Detail data is the foundation that data warehouses are built upon. Users can ask any question at any time; conduct data mining, OLAP, ROLAP, SQL, and SPL functions; build data marts directly from the detail data; and easily maintain and grow the environment on a daily basis. Now that's a tune well worth singing. Make a note of it.

Rule 5 - Build Data Marts from the Detail

You cannot teach a man anything; you can only help him find it within himself.

Galileo

Galileo was a smart man. How did he know so much about life and data marts? When we explained data marts to Galileo, he said, "You cannot build a data mart directly from the OLTP systems; you can only build a data mart directly from the detail within." He was right.

Many companies build data mart after data mart directly from the OLTP systems, and their universe begins to revolve around continual maintenance. Then, as things get worse, as Galileo predicted, their universe begins to revolve around the son: the son of a gun sent in to replace them.

Why does this happen? At first, things work out great, but soon there are more and more requests for additional information. As a result, more and more data marts are created, and soon the system looks like a giant spider web. Different data marts start to yield different results on like data, and the actual maintenance of this complicated spider web takes up most of IT's time. Meanwhile, short-term dreams turn into long-term nightmares like this one: A man and his wife had had a big argument just before he went on a business trip. Feeling rather contrite about his harsh words, he arranged to send his wife some flowers and asked the florist to write on the card, "I'm sorry, I love you." The beautiful bouquet arrived at the door. But then his wife read the words the florist had actually written in haste: "I'm sorry I love you."

The top reasons to build data marts directly from detail data are:

• Users can get answers from the data mart and can validate their findings or check out additional information from the detail that built it.
• There is only one consistent version of the truth.
• Maintenance is easy.

If a user comes up with a data mart answer that does not make sense, then he or she has the ability to drill down into the detail and investigate. Sometimes summary data can spark interest, and finding out the "why" can result in big bucks.

If users don't trust the data, they won't use the system.

When a data warehouse is built on a foundation of detail data, and then data marts are erected from that foundation, you have a winning combination. The results will always be consistent and trustworthy. However, you should only build data marts when there is a credible business case, and you should be ready to drop them when they are no longer needed. The life span of a data mart is relatively short compared to that of its mother and father (better known as the detail data). If you build data marts from the detail, they are easy to manage, easy to drop, and easy to change.

Rule 6 - Make Scalability Your Best Friend

Plan your life for a million tomorrows, and live your life as if tomorrow may be your last.

Morgan Jones

The roar of class-6 rapids on a river in Suriname can be almost deafening against the dense walls of the jungle. Especially when you are 9 years old. Our mission was to lower our canoe down the waterfall with ropes. The Trio Amer-Indian who anchored our 40-foot dugout canoe let go of the anchor rope too quickly. Without warning, the heavy boat began a freefall through the rocky water with my father hanging onto the side for dear life. He disappeared under the rocky waters, and I knew for sure we had lost him. My heart pounded against my chest. As I rallied myself to grasp this loss as only a nine-year-old can, the Indians abruptly began cheering wildly above the roar of the river. My dad had resurfaced a hundred yards downstream, battered and bruised, but he was alive. In just one short minute, I determined that I would love my family every day as if there were no tomorrow.

As I made my family my best friend, a data warehouse must make scalability its best friend. A data warehouse that does not scale will have no tomorrow. It is only a matter of time until the warehouse disappears in rocky waters, never to come up for air. Don't let go of the anchor rope!

The data-warehousing environment will throw obstacles in your way every single day. A data warehouse must be planned to meet today's needs. But it must also be capable of meeting tomorrow's challenges. The future cannot be predicted, so plan for unlimited growth or linear scalability, both vertical and horizontal. There are so many data warehouses that start out with sizzling performance, but as they grow they eventually and inevitably hit the scalability wall. However, before they hit the wall, there is a pattern of diminishing performance.

A data warehouse designed without scalability in mind is doomed before it is begun. It can never reach its potential. Take the scalability question out of the equation by investing in a database that allows you to start small but grows linearly.

In today's fast-paced world, Gigabytes soon become Terabytes. It may not sound like much, but it weighs a ton on the shoulders of giants. Listen to these measurements and pick your data warehouse's life span. For example, if you lived for a million seconds (Megabyte), then you would live for about 11.6 days. In comparison, if you lived for a billion seconds (Gigabyte), then you would live for about 31.7 years. Plus, if you lived for a trillion seconds (Terabyte), then you would live for 31,688 years.

How nice it would be if, on your 31,688th birthday, people would say, "You sure look good for your age!"
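The life-span arithmetic is easy to check for yourself. The short Python sketch below is only an illustration: it assumes an average year of 365.25 days, and the function names are our own. Under that assumption, a million seconds comes to roughly 11.6 days, a billion seconds to roughly 31.7 years, and a trillion seconds to a little over 31,688 years.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # assumption: average year length, leap years included


def seconds_to_days(seconds):
    """Convert a number of seconds into days."""
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds):
    """Convert a number of seconds into years of 365.25 days."""
    return seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)


# Each step up (mega, giga, tera) multiplies the life span by a thousand.
for label, count in [("million", 10**6), ("billion", 10**9), ("trillion", 10**12)]:
    days = seconds_to_days(count)
    years = seconds_to_years(count)
    print(f"a {label} seconds = {days:,.1f} days = {years:,.2f} years")
```

The point of the comparison survives the rounding: each step from megabyte to gigabyte to terabyte is a thousandfold jump, not a small increment.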

Data warehouses hit the wall of scalability because they cannot grow at the same rate as the amount of data being gathered. Teradata allows for unlimited linear scalability. Linear Scalability is a building block approach to data warehousing that ensures that, as building blocks are added, the system continues at the same performance level.

This is why the largest data warehouses in the world use Teradata. I was lucky to be in the right place at the right time and taught in the beginning stages at what are considered the two largest data warehouse sites in the world: Southwestern Bell (SBC) and Wal-Mart.

Wal-Mart's data warehouse started with less than 30 gigabytes, and SBC started with less than 200 gigabytes and 100 users. Both warehouses:

• Started small and simple
• Used Teradata from the beginning
• Have built the largest Enterprise Data Warehouse in their respective industries
• Continue to realize additional Return On Investment (ROI) on an annual basis
• Have grown to more than 10 Terabytes of data and are still growing
• Have thousands of users (some estimates are shocking)
• Have educated and experienced data warehouse staffs
• Have educated and experienced data warehouse users
• Experience continual growth without boundaries
• Have experienced linear performance by Teradata in every single upgrade (from gigabytes to terabytes and from terabytes to tens of terabytes)
• Both companies are impressed with Teradata's power and performance
• And both SBC and Wal-Mart are committed to the excellence of Teradata

A data warehouse is built in small building blocks. Linear Scalability is described in three ways:

First, building blocks are added until the performance requirements of your environment are met (Guaranteed Success);

Second, every time the data doubles, building blocks are doubled and the system maintains its performance level (Guaranteed Success); and

Third, any time the environment changes, building blocks are added until performance requirements are met (Guaranteed Success).

Scalability is not just about growing the data volume. It also means growing or increasing the number of users. Many systems work flawlessly until as few as 5 users are added; then they slow down to a crawl. Companies need a system where growth and performance are easily calculated and implemented. That means a system where the number of users, size and complexity of queries, volume of data, and number of applications being used can be calculated and compared to the current system's actual size. If more power, speed, or size is needed, then the company can simply add building blocks to the system until the requirements are met.
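To make the "calculate and compare" idea concrete, here is a minimal Python sketch of the sizing arithmetic. It assumes perfectly linear scaling, which is the premise of this rule, and every name and number in it is hypothetical: a real sizing exercise would measure the workload a single building block can carry.

```python
import math


def blocks_needed(required_workload, workload_per_block):
    """Number of identical building blocks that meet the requirement,
    assuming capacity grows linearly with the number of blocks."""
    if workload_per_block <= 0:
        raise ValueError("workload_per_block must be positive")
    return max(1, math.ceil(required_workload / workload_per_block))


def blocks_to_add(current_blocks, required_workload, workload_per_block):
    """How many blocks to add to the current system (never negative)."""
    needed = blocks_needed(required_workload, workload_per_block)
    return max(0, needed - current_blocks)


# Hypothetical system: 4 blocks, each able to carry 100 units of workload.
print(blocks_to_add(4, 350, 100))  # requirement fits within the current system
print(blocks_to_add(4, 800, 100))  # workload has doubled from 400 to 800
```

Under the linear assumption, doubling the workload from 400 to 800 units calls for doubling the blocks from 4 to 8, which is the second of the three descriptions above.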

Rule 7 - Model the Data Correctly

You will find only what you bring in.

Yoda, Jedi Master in Star Wars

We model a database for the same reasons that Boeing builds an aircraft model to test flight characteristics in a wind tunnel. It's simpler and cheaper to model than to reconstruct the plane by iterations until you get it right.

A proper data model should be designed to reflect the business components and possible relationships.

Here are three rules for modeling data in a data warehouse:

1. Model the data quickly
2. Normalize the detail data
3. Use a dimensional model for data marts

The 3rd Normal Form holds that each column in a table should be directly related to the primary key, the whole key, and nothing but the key. Data is placed into tables where it makes the most sense and has no repeating groups, derived data, or optional columns. This allows users to ask any question at any time on all data within the enterprise. Users do not have to strive for 3rd Normal Form, but should just normalize the data the best they can. There will be fewer columns in a table, but a lot more tables overall. This model is easier to maintain, incredibly flexible, and allows a user to ask any question on any data at any time.

A Star-Schema model consists of a fact table and a number of dimension tables. The fact table is a table with a multi-part key. Each element of the key is itself a foreign key to a single dimension table. The remaining fields in the fact table are known as facts and are numeric, continuously valued, and additive. Facts can be thought of as measurements taken at the intersection of all of the dimensions. Dimension attributes are mostly textual and are almost always the source of constraints and report breaks. This model enhances performance on known queries or, in other words, queries users run repeatedly, day after day.
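A tiny example may help picture the shape of a Star-Schema. The sketch below uses Python's built-in sqlite3 module purely as a stand-in database; it is not Teradata, and the table names, column names, and sample rows are all invented for illustration. The fact table's key is made of three foreign keys, one per dimension, and its one fact (amount) is numeric and additive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_id   INTEGER PRIMARY KEY, city     TEXT);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE date_dim    (date_id    INTEGER PRIMARY KEY, day      TEXT);
-- Fact table: a multi-part key in which every part points at one dimension.
CREATE TABLE sales_fact (
    store_id   INTEGER REFERENCES store_dim(store_id),
    product_id INTEGER REFERENCES product_dim(product_id),
    date_id    INTEGER REFERENCES date_dim(date_id),
    amount     REAL,  -- the additive fact
    PRIMARY KEY (store_id, product_id, date_id)
);
""")
conn.executemany("INSERT INTO store_dim VALUES (?, ?)",
                 [(1, "Dayton"), (2, "San Diego")])
conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                 [(10, "Books"), (20, "Music")])
conn.executemany("INSERT INTO date_dim VALUES (?, ?)",
                 [(100, "2001-01-01"), (101, "2001-01-02")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    (1, 10, 100, 25.0),
    (1, 20, 100, 10.0),
    (2, 10, 100, 40.0),
    (2, 10, 101, 15.0),
])

# A typical "known query": constrain on a dimension attribute (category),
# break the report on another (city), and add up the facts.
rows = conn.execute("""
    SELECT s.city, p.category, SUM(f.amount)
    FROM sales_fact f
    JOIN store_dim   s ON s.store_id   = f.store_id
    JOIN product_dim p ON p.product_id = f.product_id
    WHERE p.category = 'Books'
    GROUP BY s.city, p.category
    ORDER BY s.city
""").fetchall()
print(rows)
```

Notice that the textual dimension attributes supply the constraint and the report break, while the fact table supplies only the numbers being summed.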

Most database modelers prefer to create a logical model in 3rd Normal Form, but most database engines are overcome by physical limitations, so they must compromise the model. The four most difficult functions for a database to handle are:

• Join tables
• Aggregate data
• Sort data
• Scan large volumes of data

In order to get around these system limitations, vendors will suggest a model to avoid joins, use summarized data to avoid aggregation, store data in sorted order to avoid sorts, and overuse indexes to avoid large scans.

With these limitations, vendors are also going to avoid being able to compete. That is like placing a ball and chain around the runner's leg and saying, "I wish you all the best in the marathon." Come on! Whose side are these vendors really on?

Teradata is the only database engine I have seen that has the power and maturity to use a 3rd Normal Form physical model on databases exceeding a terabyte in size. Because of the physical limitations, other databases have had to use a Star-Schema model to enhance performance, but have given up on the ability to perform ad-hoc queries and data mining.

A normalized model is one that should be used for the central data warehouse. It allows users to ask any question at any time on information from any place within the enterprise. This is the central philosophy of a data warehouse. It leads to the power of ad-hoc queries and data mining, whereby advanced tools discover relationships that are not easily detected but do exist naturally in the business environment.

A Star-Schema model enhances performance on known queries because we build our assumptions into the model. While these assumptions may be correct for the first application, they may not be correct for others. Flexibility is a big issue, but data marts can be dropped and added with relative ease if each is built directly from the detail data.

Remember: build the data warehouse around detail data using a normalized model. Then, as query patterns emerge and performance for well-known queries becomes a priority, Star-Schema data marts can be created by extracting summarized or departmental data from the centralized data warehouse. The user will then have access to both the data marts for repetitive queries and the central warehouse for other queries.

Because data marts can be an administrative nightmare, Teradata enables Star-Schema access without requiring physical data marts. By setting up a join index as the intersection of your Star-Schema model, you can create a Star-Schema structure directly from your 3rd Normal Form data model. Best of all, once it is created, the data is automatically maintained as the underlying tables are updated.

Keep in mind: 80% of data warehouse queries are repetitive, but 80% of the Return On Investment (ROI) is actually provided by the other 20% of the queries that go against detailed data in an iterative environment. By using a normalized model for your central data warehouse and a Star-Schema model on data marts, you can enhance the possibility of realizing that 80% of the Return On Investment and still enhance the performance on 80% of your queries.

Rule 8 - Don't Let a Technical Issue Make Your Data Warehouse a Failure Statistic

Experience is a hard teacher because she gives the test first, the lesson afterwards.

Scottish Proverb

Did you know that 3/4ths of the people in the world hate fractions, and that 40% of the time a data warehouse fails, it is because of a technical issue? There are many traps and pitfalls in every data warehouse venture. One winter day, a hunter met a bear in the forest. The bear said, "I'm hungry. I want a full stomach." The man replied, "Well, I'm cold. I would like a fur coat." "Let's compromise," said the bear, and he quickly gobbled up the hunter. They both got what they asked for. The bear went away with a full belly, and the man left wrapped in a fur coat. With that in mind: good judgment comes from experience; experience comes from bad judgment. You have shown good judgment by reading this book, so let our experience keep your company from having a bad data warehouse experience.

Author Daniel Boorstin wrote in The Discoverers, "The greatest obstacle to discovering the shape of the earth, the continents, and the oceans was not ignorance, but rather the illusion of knowledge." There is a lot of "illusion of knowledge" being spread around in the data-warehousing environment. Before you decide on any data warehouse product, ask yourself and the vendor these questions:

• As my data demands increase, will the system be able to physically load the data? Our experience shows that many systems are not capable of handling very large volumes of data. Do the math.
• As the data grows in volume, can the system meet the performance requirements? Do the math.
• As the number of users grows, will the system be able to scale? Do the math.
• As my environment changes, will the system be flexible enough to allow changes quickly and easily? Do the math.
• Will the system need so many Database Administrators (DBAs) that my system's cost skyrockets? Do the math.
• If we suddenly merged with another company and needed to incorporate into their mainframe or LAN environment, would the system be able to connect and include them? Do the math.
• Can I continue to meet my batch window timeframes? Do the math.
• Could I become the hero of the company one day, only to have some technical glitch blamed on me because of my poor foresight and be thrown out of the company into a giant mud puddle? Do the bath.

Rule 9 - Take a Building Block Approach

Be not afraid of growing slowly; be afraid only of standing still.

Chinese Proverb

Ever since Vasco de Balboa discovered the Pacific coast of Panama in 1513, kings and businessmen alike dreamed of the impossible: to cut a waterway across the mountainous isthmus, creating a shortcut between the Atlantic and Pacific Oceans. Those dreams turned into reality during the Industrial Revolution. It took almost forty years of trial and error before the world's greatest engineering feat since the Pyramids was completed in 1914. Ships move through the locks of the canal, rising 85 feet above sea level before they descend to the opposite side. Since its grand opening in 1920, the Panama Canal has revolutionized trans-oceanic traffic, joining East and West. Its 50-mile stretch saves every vessel about 8,000 extra nautical miles of travel around the bottom tip of South America. Several modifications have been engineered through the years to accommodate the increasing size of ships.

Data warehouses, like the Panama Canal, must be built over time and changed over time to meet new demands. A data warehouse must grow with the environment, but the environment is unpredictable. All sailors know that they can't direct the wind, but that they can adjust their sails. In comparison, all data warehouse users know they can't direct the environment, but they can adjust their warehouse. Sometimes the data warehouse will grow quickly and sometimes it will grow slowly, but it should always be growing.

So take a building block approach to data warehousing. Teradata allows you to expand without boundaries, one building block at a time. Plus, adding on building blocks is easy.

There are two aspects to a building block approach. First, you need to add applications to your data warehouse in three- to six-month intervals. Once the first application works, then you are ready for more projects. As you become more experienced with this approach, you can add multiple projects in parallel by involving multiple organizations.

The second aspect of the building block approach is in the actual data warehouse architecture. It doesn't matter if yours is the smallest data warehouse in the world, the largest, or falls somewhere in between: power and scalability always fuel success.

Not long ago, a customer flew out to San Diego for a Teradata demonstration and benchmark. The benchmark ran late into the evening, but the numbers were more than 50% better than the competition. The customer was extremely impressed, but before buying he demanded to see the system scalability that everyone had been talking about. Although it was already late, a Teradata employee was called in the middle of the night, arrived within 10 minutes (in pajamas), hooked up the building blocks, and ran a utility called config. She ran another called reconfig, and in less than two hours the system size doubled.

As the environment changes in terms of users, data, complexity, capacity, batch windows, time changes, events, or opportunities, users should be able to continue building applications and architecture. The more a Teradata system grows, the more Teradata outshines the competition.

Rule 10 - Buy a Teradata Data Warehouse

Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing had happened.

Winston Churchill

Winston Churchill led Britain through World War II during what he called that country's "finest hour." When users see consistent data, the system too is in its finest hour. Teradata gives users the ability to ask questions they could never ask before. Users trust Teradata because of its industry performance and reputation, and because it never gives in. Constant use gives users optimal business experience, and no matter what a user asks, the system responds with a hearty "Yes, Sir!"

When we explained Teradata to Churchill, he said:

"A data WARe-house that consists of 250 Data marts is like poison, and if I were the MIS department responsible for maintaining them, I'd take it."

Teradata guarantees an Enterprise Data Warehouse with no scalability issues. Data loads like lightning, and system administration is a breeze. You can pick the performance level that meets your requirements for today and forever. The database can be normalized around detail data, and because of Teradata's power, users have the flexibility to ask any question at any time on any data.

All other databases are suspect in data loading capabilities, scalability, reference sites, decades of data warehouse experience, flexibility, system administration difficulties, and inability to handle the complex queries of today's users. These users are good!

Teradata - The Shining Star

Overview

Teradata has always been at the top of the data warehouse game, even if the experts weren't bright enough to know it. The vision that the original designers had was tremendous. It was so far to the left of genius that most thought the idea was impossible.

Only he who attempts the ridiculous may achieve the impossible.

Don Quixote

The Teradata database was originally designed in 1976, and many of the fundamental concepts still remain today. Nearly 25 years later, Teradata is still considered ahead of its time.

In 1976, IBM mainframes dominated the computer business. Everyone who was anyone had an IBM mainframe. However, the original founders of Teradata noticed that it took about 4½ years for IBM to produce a new mainframe. They also noticed a little company called Intel. Intel created a new PC chip every 2½ years. With mainframes moving forward every 4½ years and PC chips every 2½ years, Teradata recognized its vision: to network enough PC chips together that the mainframe would be overpowered, yet costs would be hundreds of times cheaper than a mainframe. The Teradata team estimated the power surge would come in 1990.

IBM laughed out loud. They said, "Let's get this straight... you are going to network a bunch of PC chips together and overpower our mainframes? That's like plowing a field with 1,000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as just a bunch of PCs in a cabinet.

Teradata was convinced it could produce a product that would power large amounts of data and achieve the impossible: using PC technology in mainframe territory. Its founders agreed with Napoleon Bonaparte, who asserted, "The word 'impossible' is not in my dictionary." Sure enough, when we looked in his dictionary, that word was not there. And it is not in Teradata's Data Dictionary either. The Teradata team set two goals: build a database that could

• Perform parallel processing, and
• Accommodate a Terabyte of data

Driving in the car one evening, Morgan's eight-year-old daughter Kara piped up from the back seat, "Daddy, can you buy Teradata in the store? I mean, what does Teradata really do?" Morgan thought for a moment and then replied, "Do you remember when you went on the Easter egg hunt last spring? Well, imagine that we had fifty eggs and you were the only child there. If I asked you to find all the purple eggs, would you be able to do that?" Kara said, "Sure! But it might take me a long time." Morgan continued, "What if we now let fifty children go in, and I asked them to show me all of the purple eggs? How long would that take?" His daughter responded, "It wouldn't take any time at all, because each child would only have to look at one egg." That is precisely how Teradata works. It divides up huge tasks among its processors and tackles each portion simultaneously, with amazing speed. And it doesn't matter if you have a trillion eggs in your basket.
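For readers who like to see an idea in code, the egg hunt can be sketched in a few lines of Python. This is only a toy model of dividing one task among many workers, not Teradata code: the egg colors, the function names, and the use of a thread pool are all our own assumptions, and Python threads here illustrate the division of labor rather than real hardware parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

COLORS = ["purple", "yellow", "blue", "green", "pink"]
# Fifty eggs, numbered 0-49, colored in a repeating pattern (made-up data).
eggs = [(egg_id, COLORS[egg_id % len(COLORS)]) for egg_id in range(50)]


def split_among(items, workers):
    """Deal the items out in turn so every worker gets an equal share."""
    return [items[i::workers] for i in range(workers)]


def find_purple(basket):
    """One child's job: look only at the eggs in his or her own basket."""
    return [egg_id for egg_id, color in basket if color == "purple"]


def egg_hunt(items, children):
    """Split the eggs among the children, search every basket, merge results."""
    baskets = split_among(items, children)
    with ThreadPoolExecutor(max_workers=children) as pool:
        partial_results = list(pool.map(find_purple, baskets))
    return sorted(egg_id for found in partial_results for egg_id in found)


# One child looks at all fifty eggs; fifty children look at one egg each.
print(len(split_among(eggs, 1)[0]), len(split_among(eggs, 50)[0]))
print(egg_hunt(eggs, 1) == egg_hunt(eggs, 50))
```

The answer is the same either way. What changes is how many eggs any one child has to examine, which is what determines how long the hunt takes.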

In 1984, the DBC/1012 was introduced. Since then, Teradata has been the dominant force in data warehousing. Teradata got the chickens plowing and is considered outstanding. Meanwhile, IBM's plow is out rusting in its field.

Parallel Processing

An invasion of armies can be resisted, but not an idea whose time has come.

Victor Hugo

The idea of parallel processing gives Teradata the ability to have unlimited users, unlimited power, and unlimited scalability. This is an idea whose time has come. And it all starts with something called parallel processing. So what is parallel processing? Let us explain.

It was 10 p.m. on a Saturday night, and two friends were having dinner and drinks. One of the friends looked at his watch and said, "I have to get going." The other friend responded, "What's the hurry?" His friend went on to tell him that he had to leave to do his laundry at the Laundromat. The other friend could not believe his ears. He responded, "What? You're leaving to do your laundry on a Saturday night? Do it tomorrow!" His buddy went on to explain that there were only 10 washing machines at the laundry. "If I wait until tomorrow, it will be crowded, and I will be lucky to get one washing machine. I have 10 loads of laundry, so I will be there all day. If I go now, there will be nobody there, and I can do all 10 loads at the same time. I'll be done in less than an hour and a half."

This story describes what we call "Parallel Processing." Teradata is the only database in the world that loads data, backs up data, and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds... even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every database's clock in performance tests.

After enlightenment, the laundry.

Zen Proverb

After parallel processing the laundry, enlightenment!

Teradata Zen Proverb

With the computer world seeing Terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, this is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata has unlimited power, grows without boundaries, and was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

A ship in harbor is safe, but that's not why ships are built.

John Shedd

In 1805, the pivotal Battle of Trafalgar matched Britain's flotilla of battleships against the almighty Spanish Armada. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to sail circles around the Armada, attacking them at their most vulnerable point: the stern. That battle paralyzed the Armada and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating them on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive, and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed managing terabytes of data.

A Personal Computer (PC) is made up of the following components:

Processor Chip - This is the brain of the computer. All tasks are done at the direction of the processor.

Memory - This is the hand of the computer. The memory allows data to be viewed, manipulated, changed, or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive - This is the spine of the computer. The hard drive stores data, applications, and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor executes your request onto the document while it is still being displayed in memory. Upon completion, you close the document, and the processor writes all the changes to the disk where it is stored.

In the picture below, we see the basic components of a Personal Computer. Note that it also holds a file called "Best_Friends" that lists eight best friends.

Teradata Spreads Data over Multiple Processors

I don't mind starting the season with unknowns; I just don't like finishing the season with them.

Coach Lou Holtz

With Teradata, you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The next Teradata example shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask for a list of best friends, both processors will retrieve data in parallel and will return combined results over the connecting network. This example could easily return results at double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four records, and simultaneously the other processor reads the remaining four records. So how could we double the speed of this system again?

Teradata has Linear Scalability

Every ceiling, when reached, becomes a floor, upon which one walks and now can see a new ceiling.

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." This allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck of data.

Notice in the system below there are four processors, and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since data is spread evenly over four processors, Teradata reads two rows simultaneously across four processors. Now the system is four times faster.
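The eight-row example can be worked through in a short Python sketch. It is a simplified model under stated assumptions, not a description of Teradata internals: rows are simply dealt out in turn so that every processor holds an equal share, the time to answer is taken to be the number of rows the busiest processor must read, and the row values and function names are placeholders.

```python
def distribute(rows, processor_count):
    """Spread rows evenly by dealing them out in turn (a simplification)."""
    return [rows[i::processor_count] for i in range(processor_count)]


def rows_per_processor(rows, processor_count):
    """Rows the busiest processor must read; all processors work at once."""
    return max(len(share) for share in distribute(rows, processor_count))


best_friends = [f"friend_{n}" for n in range(1, 9)]  # eight placeholder rows

for processors in (1, 2, 4):
    busiest = rows_per_processor(best_friends, processors)
    speedup = len(best_friends) / busiest
    print(f"{processors} processor(s): {busiest} rows each, {speedup:.0f}x the speed")
```

With one processor, eight rows are read by a single worker. With two, each reads four. With four, each reads two. In this idealized model, every doubling of processors halves the work per processor.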

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the processors, which Teradata calls AMPs, and processed in parallel.

A Logical View of the Teradata Architecture

You are either making history or you are history

Leonard Sweet

A frustrated choral director was preparing for a concert then suddenly stopped and said Ive got to tell you

eight years ago I was directing another choir in this anthem and they made the same mistake youre makingHe continued Do any of you have a clue as to what the mistake is Just then a voice from the choir called outSame director

Many data warehouse environments have an architecture that is not designed for Decision Support yetcompany officials wonder why their data warehouse failed when they actually never had a chance to succeedIn ancient days Solomon wrote Where there is no vision the people perish It is no different today Companyleaders must cast a new vision that enables Decision Support with technology that can handle it or their companies too will be history

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and is then given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks the SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain the information from their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.


Parsing Engine (PE)

Even a stopped clock is right twice a day

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards." Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and SouthWestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions, some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions, and Teradata returns the answers.

Access Module Processor (AMP)

Wise men talk because they have something to say; fools talk because they have to say something.

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above: "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of few words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best way to picture it is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it's the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk and shares it with no other AMP; hence, a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP will read only the rows on its own disk. Because this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.
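The "slowest AMP" remark can be written as one line of arithmetic. The Python sketch below is our own illustration: the function name query_time and the one-second-per-row cost are assumptions chosen only to make the numbers easy to read.

```python
def query_time(rows_on_each_amp, seconds_per_row=1.0):
    """All AMPs read in parallel, so the query finishes when the busiest AMP does."""
    return max(rows_on_each_amp) * seconds_per_row


even = [2, 2, 2, 2]    # eight rows spread evenly over four AMPs
skewed = [5, 1, 1, 1]  # the same eight rows, badly distributed

print(query_time(even))    # every AMP finishes together
print(query_time(skewed))  # three AMPs wait for the overloaded one
```

Both layouts hold eight rows, but the skewed one takes as long as its busiest AMP. This is why the even spread of rows matters so much in the pages that follow.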

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs, while others have more than 2,000.

The BYNET

Even if you're on the right track, you'll still get run over if you just sit there.

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of the two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax;
• The PE checks the user's security rights;
• The PE comes up with a plan for the AMPs to follow;
• The PE passes the plan along to the AMPs over the BYNET;
• The AMPs follow the plan and retrieve the data requested;
• The AMPs pass the data to the PE over the BYNET; and
• The PE then passes the final data to the user.
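The steps above can be sketched as a small Python program. This is a toy model under stated assumptions, not Teradata's implementation: the names pe_handle_query and amp_execute are hypothetical, the "syntax check" accepts anything that starts with SELECT, access rights are a plain dictionary, and ordinary function calls stand in for the BYNET.

```python
def amp_execute(plan, disk):
    """One AMP follows the plan, reading only the rows on its own disk."""
    return list(disk.get(plan["table"], []))


def pe_handle_query(user, sql, table, access_rights, amp_disks):
    """Toy Parsing Engine: check syntax, check rights, plan, gather, return."""
    # Step 1: syntax check (toy rule: only SELECT statements are accepted).
    if not sql.strip().upper().startswith("SELECT"):
        raise ValueError("SQL syntax error")
    # Step 2: security check; the query is rejected without proper rights.
    if table not in access_rights.get(user, set()):
        raise PermissionError(f"{user} has no rights on {table}")
    # Step 3: the PE builds a plan for the AMPs to follow.
    plan = {"action": "read_all_rows", "table": table}
    # Steps 4-6: the plan goes to every AMP, and each AMP sends its rows back.
    answers = [amp_execute(plan, disk) for disk in amp_disks]
    # Step 7: the PE hands the combined rows to the user.
    return [row for amp_rows in answers for row in amp_rows]


disks = [{"Best_Friends": [(2, "Ben Hon")]}, {"Best_Friends": [(4, "Joe Davis")]}]
rights = {"tom": {"Best_Friends"}}
print(pe_handle_query("tom", "SELECT * FROM Best_Friends", "Best_Friends", rights, disks))
```

A user who is missing from the rights dictionary gets a PermissionError before any AMP is asked to do work, which mirrors the guard-dog behavior described earlier.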

Teradata Building Block Approach

Better a diamond with a flaw than a pebble without one


Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array, where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them.


Arthur Schopenhauer

Do you have one of those notoriously messy "junk drawers" in your kitchen? You know the one we're talking about ... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, CustomerName, Item No, and Quantity.

An example of each table follows:

CUSTOMER TABLE called CustomerTable

CustomerID (PK)  CustomerName  CityName  Order Number (FK)  Customer Rep
1001             JC Penney     Dallas    105372             Dreyer
1002             Office Depot  Columbia  105799             Crocker
1003             Dillards      Atlanta   106227             Smith

ORDER TABLE called OrderTable

Order Number (PK)  Order Date  Item No  Quantity  Customer ID (FK)
105372             03/07/2001  212      20        1001
105799             04/18/2001  296      52        1002
106227             10/17/2001  325      17        1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. Both tables have a column called Order Number. These tables make up an "extended family," joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest unit that can be inserted into a table. A column is the smallest unit within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null or are sometimes represented by a

One column, or a combination of columns, in each table is chosen to be the "Primary Key" (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary," or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.
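To see a JOIN in action, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration only: SQLite stands in for Teradata, the column names are written without spaces (OrderNumber, ItemNo) as our own adaptation, and the order dates are assumed to be month/day/year values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE CustomerTable (
    CustomerID   INTEGER PRIMARY KEY,  -- PK
    CustomerName TEXT,
    CityName     TEXT,
    OrderNumber  INTEGER,              -- FK: the PK of OrderTable
    CustomerRep  TEXT
);
CREATE TABLE OrderTable (
    OrderNumber INTEGER PRIMARY KEY,   -- PK
    OrderDate   TEXT,                  -- date format is an assumption
    ItemNo      INTEGER,
    Quantity    INTEGER,
    CustomerID  INTEGER                -- FK: the PK of CustomerTable
);
INSERT INTO CustomerTable VALUES
    (1001, 'JC Penney',    'Dallas',   105372, 'Dreyer'),
    (1002, 'Office Depot', 'Columbia', 105799, 'Crocker'),
    (1003, 'Dillards',     'Atlanta',  106227, 'Smith');
INSERT INTO OrderTable VALUES
    (105372, '03/07/2001', 212, 20, 1001),
    (105799, '04/18/2001', 296, 52, 1002),
    (106227, '10/17/2001', 325, 17, 1003);
""")

# Match the foreign key in CustomerTable to the primary key of OrderTable.
rows = conn.execute("""
    SELECT c.CustomerName, o.OrderDate, o.Quantity
    FROM CustomerTable AS c
    JOIN OrderTable AS o ON o.OrderNumber = c.OrderNumber
    ORDER BY c.CustomerID
""").fetchall()

for row in rows:
    print(row)
```

Each output row combines a customer's name from one table with the date and quantity from the other, which is exactly what matching a PK to an FK buys you.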

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                        FOREIGN KEY
Not optional                       Optional
Comprised of one or more columns   Comprised of one or more columns
Can only have one PK per table     Can have multiple FKs per table
No duplicates allowed              Duplicates allowed
No changes allowed                 Changes allowed
No nulls allowed                   Nulls allowed

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind, the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP 1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the rows of the Customer table and the rows of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.


There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may have only one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI, in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column emp follows:

CREATE TABLE employee
(emp INTEGER, dept INTEGER, lname CHAR(20), fname VARCHAR(20),
 salary DECIMAL(10,2), hire_date DATE)
UNIQUE PRIMARY INDEX (emp);

An example of creating a Non-Unique Primary Index is listed below. Notice that you never see the prefix NON; a PRIMARY INDEX declared without the word UNIQUE is non-unique.

CREATE TABLE TomC.employee
(emp INTEGER, dept INTEGER, lname CHAR(20), fname VARCHAR(20),
 salary DECIMAL(10,2), hire_date DATE)
PRIMARY INDEX (dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp INTEGER, dept INTEGER, lname CHAR(20), fname VARCHAR(20),
 salary DECIMAL(10,2), hire_date DATE)
UNIQUE PRIMARY INDEX (emp, dept);

Being related hardly insures relatability

Michael E Angier


All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their "relatability" in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that the AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
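Both simulated hash maps follow the same pattern, so they can be generated by one small function. The Python sketch below is our own illustration: build_hash_map is a hypothetical name, the 48-bucket default simply matches the size of the diagrams, and the repeating 1, 2, 3, ... assignment is the simplification used in this chapter rather than a statement about how a production hash map is laid out.

```python
def build_hash_map(amp_count, bucket_count=48):
    """Return a list in which position 0 is bucket 1, position 1 is bucket 2, ...

    Each bucket holds only an AMP number, cycling 1..amp_count.
    """
    return [(bucket % amp_count) + 1 for bucket in range(bucket_count)]


four_amp_map = build_hash_map(4)
eight_amp_map = build_hash_map(8)

# Print the maps six buckets per line, like the diagrams.
for hash_map in (four_amp_map, eight_amp_map):
    for start in range(0, len(hash_map), 6):
        print(*hash_map[start:start + 6])
    print()
```

Passing bucket_count=65536 produces a map with the number of buckets the text gives for the actual hash map, still filled with nothing but AMP numbers.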

How the Hash Map and Primary Index Work Together

Choice not chance determines destiny

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the "Wiz-Bang Formula" (a secret NCR formula). First, we'll explain the theory, and then we will invent our own "Wiz-Bang Formula" to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so "even tempered." We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

For this example, Teradata will attempt to spread the table rows among the AMPs of the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.


Let's take our first row and determine on which AMP it will reside. Remember, we will get the Primary Index value of the row and run it through the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside. Here is the first row:

Friend_Num  Friend_Name
2           Ben Hon

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (2) through the Coffing/Jones Wiz-Bang Formula (divide by 2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another row:

Friend_Num  Friend_Name
16          Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (16) through the Coffing/Jones Wiz-Bang Formula (divide by 2), and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret. The Coffing/Jones Wiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the real formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it is consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on that value a million times, we would still get the same result.
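The whole walk-through fits in a few lines of Python. Keep in mind that this is only the Coffing/Jones teaching formula (divide by 2), not the real hashing formula; the names toy_bucket and destination_amp are hypothetical, and the 48-bucket map with its repeating 1, 2, 3, 4 pattern mirrors the simulated diagram.

```python
# Simulated four-AMP hash map: position 0 is bucket 1, position 1 is bucket 2, ...
HASH_MAP = [(bucket % 4) + 1 for bucket in range(48)]

BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}


def toy_bucket(primary_index_value):
    """Coffing/Jones teaching formula: divide the Primary Index value by 2."""
    return primary_index_value // 2


def destination_amp(primary_index_value):
    """Look inside the bucket to find the AMP that owns the row."""
    return HASH_MAP[toy_bucket(primary_index_value) - 1]


# Lay out every row on its destination AMP.
layout = {}
for friend_num in BEST_FRIENDS:
    layout.setdefault(destination_amp(friend_num), []).append(friend_num)

for amp in sorted(layout):
    print("AMP", amp, "holds Friend_Num", layout[amp])
```

Each AMP ends up with two of the eight rows, and running destination_amp on the same value always gives the same AMP. That repeatability is what lets the system find a row again later.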

If you always do what you always did, you'll always get what you always got.

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index value. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;


The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
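A Primary Index retrieval can be sketched the same way. In this toy Python model (our own, using the divide-by-2 teaching formula and a simulated 48-bucket hash map), the hypothetical function select_by_primary_index asks exactly one AMP for the row and never touches the other three.

```python
AMP_COUNT = 4
HASH_MAP = [(bucket % AMP_COUNT) + 1 for bucket in range(48)]  # bucket 1 is index 0

FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}


def amp_for(primary_index_value):
    """Teaching formula: divide by 2 to get the bucket, then read the AMP number."""
    return HASH_MAP[primary_index_value // 2 - 1]


# Load the table: each row lands on the disk of the AMP it hashes to.
amp_disks = {amp: {} for amp in range(1, AMP_COUNT + 1)}
for friend_num, friend_name in FRIENDS.items():
    amp_disks[amp_for(friend_num)][friend_num] = friend_name


def select_by_primary_index(friend_num):
    """Rerun the formula, go to that one AMP, and fetch the row (a one-AMP operation)."""
    amp = amp_for(friend_num)
    return amp, amp_disks[amp].get(friend_num)


print(select_by_primary_index(8))
```

The lookup reruns the same formula that was used when the row was loaded, so it lands on the same AMP without searching anywhere else.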

The Full Table Scan

What matters is not the size of the dog in the fight, but the size of the fight in the dog.

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that it is not the size of the opponent that is to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is 10 times faster than most systems. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?", you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to ask literally any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns.
• Each AMP passes its rows to the PE over the BYNET.
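Here is a hedged Python sketch of the same Full Table Scan. Threads from the standard library stand in for AMPs working in parallel, and a plain list stands in for the PE's answer set; the names amp_scan and full_table_scan are hypothetical, and the row placement is the one produced by the divide-by-2 teaching formula.

```python
from concurrent.futures import ThreadPoolExecutor

# Rows as laid out by the teaching formula: each AMP owns two rows.
AMP_DISKS = {
    1: [(2, "Ben Hon"), (10, "Don Roy")],
    2: [(4, "Joe Davis"), (12, "Sam Mills")],
    3: [(6, "Mary Gray"), (14, "Kyle Marx")],
    4: [(8, "John Davis"), (16, "Lyn Jones")],
}


def amp_scan(amp_number):
    """One AMP reads every Best_Friends row on its own disk and nothing else."""
    return list(AMP_DISKS[amp_number])


def full_table_scan():
    """The PE's plan: every AMP scans at the same time, then rows are gathered."""
    with ThreadPoolExecutor(max_workers=len(AMP_DISKS)) as pool:
        per_amp_rows = pool.map(amp_scan, AMP_DISKS)  # one task per AMP number
        return [row for rows in per_amp_rows for row in rows]


answer_set = full_table_scan()
print(len(answer_set), "rows returned")
```

The sketch gathers rows AMP by AMP, so they do not come back sorted by Friend_Num. The answer set in the book is likewise not in Friend_Num order; add an ORDER BY if you need a particular sequence.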

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
6           Mary Gray
14          Kyle Marx
8           John Davis
16          Lyn Jones
2           Ben Hon
10          Don Roy
4           Joe Davis
12          Sam Mills


In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and add overhead, they should only be used on "KNOWN QUERIES," or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. The PE also tells the AMPs to place the hash in a secondary index sub-table, along with the Row-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (the smiley face in this example) is revealed, and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!
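For comparison with the USI created earlier, here is a minimal sketch of the non-unique form. Leaving out the keyword UNIQUE is what makes the index a NUSI. Rebuilding the Friend_Name index this way is only an illustration, not something the example system requires.

```sql
-- Remove the unique index created earlier, then rebuild it as non-unique.
DROP INDEX (Friend_Name) ON Best_Friends;
CREATE INDEX (Friend_Name) ON Best_Friends;

-- With a NUSI, every AMP checks its own sub-table for qualifying rows.
SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';
```

The query text is identical in both cases. What changes is the plan: a two-AMP operation for the USI versus an all-AMP operation for the NUSI.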

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index directly. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.
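As a rough sketch of what creating a join index can look like, the statement below pre-joins an Employee table to a Department table. Both tables, the column Dept_Name, and the index name Emp_Dept_JI are assumptions made up for this illustration.

```sql
-- Hypothetical tables: Employee(Emp, Dept, Lname, ...) and Department(Dept, Dept_Name).
CREATE JOIN INDEX Emp_Dept_JI AS
SELECT e.Emp, e.Lname, e.Dept, d.Dept_Name
FROM Employee e
INNER JOIN Department d
    ON e.Dept = d.Dept
PRIMARY INDEX (Emp);

-- Users keep writing the ordinary join; the PE decides whether the
-- join index can satisfy it.
SELECT e.Lname, d.Dept_Name
FROM Employee e
INNER JOIN Department d
    ON e.Dept = d.Dept;
```

Nobody names Emp_Dept_JI in a query. It exists only to give the PE a pre-joined shortcut.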


Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result, many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.
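The hand-out of space described so far could be written roughly as follows. This is a sketch only: the sizes and passwords are placeholders rather than values from the book, and the statements assume they are submitted by a user with the right to create objects under SYSDBA and Morgan.

```sql
-- Sizes and passwords are placeholders for illustration.
-- A database owns space but has no password and cannot log on.
CREATE DATABASE MRKT FROM SYSDBA AS PERM = 20e9;
CREATE DATABASE SALES FROM SYSDBA AS PERM = 20e9;

-- A user owns space too, and also gets a password.
CREATE USER Morgan FROM SYSDBA AS
    PERM = 10e9,
    SPOOL = 10e9,
    PASSWORD = ChangeMe1;

-- Morgan hands part of his space to a child user.
CREATE USER Tom FROM Morgan AS
    PERM = 2e9,
    SPOOL = 10e9,
    PASSWORD = ChangeMe2;
```

The FROM clause names the owner that gives up the perm space, which is how the hierarchy of parents and children is built.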

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from Tom.


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts, and the user is told that he or she is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful. If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data, such as table rows. Views and macros hold no data of their own; their definitions live in the Data Dictionary. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
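A query-only user of the kind just described might be set up as sketched below. The user name Report_User, the sizes, and the password are all hypothetical.

```sql
-- Hypothetical query-only user: no PERM, but a SPOOL ceiling for answer sets.
CREATE USER Report_User FROM SYSDBA AS
    PERM = 0,
    SPOOL = 5e9,
    PASSWORD = ChangeMe1;

-- Raise the ceiling later if this user's queries run out of spool.
MODIFY USER Report_User AS SPOOL = 10e9;
```

Because spool is a limit rather than an allocation, giving Report_User a 5 GB ceiling takes nothing away from SYSDBA.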

The following picture shows a logical view of a Customer table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.
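Based on that description, the statement in the picture would look something like the following. The exact wording in the figure may differ.

```sql
-- All columns for employees in department 10.
-- Each qualifying row is copied into spool before the answer is returned.
SELECT *
FROM Employee
WHERE Dept = 10;
```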

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored again on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
  1    10  Johnson   Manny   100000
  2    20  Carlsbad  Jan     100000
 22    30  Winter    Steve    77000
 25    10  Lester    Bonnie   56000
 33    10  Samuels   Todd    120000
 99    20  Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp, Dept, Lname, Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.
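The restriction just described might be put in place roughly as follows. Mary is a hypothetical user here, and the second view, Dept10_V, is an invented example of limiting rows as well as columns.

```sql
-- Mary (hypothetical user) may read the view...
GRANT SELECT ON Employ_V TO Mary;

-- ...but any earlier right to read the base table is taken away.
REVOKE SELECT ON Employee FROM Mary;

-- A view can also limit rows: only department 10, and still no salary.
CREATE VIEW Dept10_V AS
SELECT Emp, Dept, Lname, Fname
FROM Employee
WHERE Dept = 10;
```

The REVOKE only matters if Mary had been given SELECT on the Employee table at some point; a user who never received that right cannot read the table in the first place.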

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
  1    10  Johnson   Manny
  2    20  Carlsbad  Jan
 22    30  Winter    Steve
 25    10  Lester    Bonnie
 33    10  Samuels   Todd
 99    20  Walter    Misha


What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10
• What employees are in department 20
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
SELECT * FROM Employ_v WHERE dept = 10;
SELECT * FROM Employ_v WHERE dept = 20;
SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                         Macros
• We select from views                        • We execute macros
• Uses the keyword AS                         • Uses the keyword AS
• Definition is stored in the Data Dictionary • Definition is stored in the Data Dictionary
• Accesses certain portions of the data       • Accesses the real data itself
• Is changed using the keyword REPLACE        • Is changed using the keyword REPLACE
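The last row of the chart can be illustrated with a short sketch. The new definitions below are invented for this example: the view drops the Fname column and the macro drops its third report.

```sql
-- Change the view in place; its name and granted rights stay the same.
REPLACE VIEW Employ_V AS
SELECT Emp, Dept, Lname
FROM Employee;

-- Change the macro in place to run only the two department reports.
REPLACE MACRO Emp_mac AS (
    SELECT * FROM Employ_V WHERE Dept = 10;
    SELECT * FROM Employ_V WHERE Dept = 20;
);
```

Using REPLACE instead of dropping and re-creating the object means users who already had rights on the view or macro do not need to have them granted again.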

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan


I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some!

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.
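An explicit grant of this kind could be sketched as follows. The particular privileges chosen are assumptions made for illustration; the statements would be submitted by Tom or by one of his owners.

```sql
-- Let Mary create tables in Tom's space and read what is stored there.
GRANT CREATE TABLE ON Tom TO Mary;
GRANT SELECT ON Tom TO Mary;

-- What was explicitly given can be explicitly taken back.
REVOKE CREATE TABLE ON Tom FROM Mary;
```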


In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgans Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupt. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted back to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message.
• The entire transaction is rolled back, and any changes made to the database are reversed.
• Locks are released.
• Spool files are discarded.
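Besides a macro, several statements can be tied into one transaction by hand. The sketch below assumes a session running in Teradata mode, where BT and ET mark the beginning and end of a transaction. The 10% raise is borrowed from the earlier example, and the table is the Employee table shown before.

```sql
BT;   -- begin transaction

-- Both updates succeed together or are rolled back together.
UPDATE Employee SET Sal = Sal * 1.10 WHERE Dept = 10;
UPDATE Employee SET Sal = Sal * 1.10 WHERE Dept = 20;

ET;   -- end transaction: the changes become permanent
```

If the second UPDATE fails, the before-images taken for the first UPDATE are used to put the department 10 salaries back the way they were.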

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, then you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores it to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it, and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal cleanliness is next to Godliness. If it is clean, then things went well.

FALLBACK Protection

"I asked my dentist if I had to floss all my teeth, and he responded, 'No, just the ones you want to keep.'"

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgans Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2000 AMPs. The chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in a group of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. With clustering in a group of two and one AMP down, every complex query will take twice as long to process.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL, Unleash the Power by Mike Larkins and Tom Coffing.)
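Both ways of requesting FALLBACK can be sketched in a few lines. The column types below, and the choice of Friend_Num as a unique primary index, are assumptions about how Best_Friends might have been defined.

```sql
-- Request FALLBACK when the table is first created.
-- Column types and the primary index are assumed for illustration.
CREATE TABLE Best_Friends, FALLBACK
(
    Friend_Num   INTEGER,
    Friend_Name  VARCHAR(30)
)
UNIQUE PRIMARY INDEX (Friend_Num);

-- Drop the protection later to reclaim the extra disk space...
ALTER TABLE Best_Friends, NO FALLBACK;

-- ...or add it back when the table becomes mission critical.
ALTER TABLE Best_Friends, FALLBACK;
```

Adding FALLBACK to an existing table means writing a second copy of every row, so on a large table it is not a quick operation.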


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, with all subsequent changes to that AMP's rows recorded for later recovery (see the Down AMP Recovery Journal below).

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping", starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping".

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.


In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
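To make the idea concrete, here is a minimal Python sketch of a cluster that opens a recovery journal when an AMP goes down and replays it when the AMP returns. It is a conceptual model only: the class names, the method names, and the dictionary used to hold rows are all our own assumptions for illustration, not Teradata internals.

```python
# Conceptual model only: names and structure are illustrative, not Teradata internals.
class Amp:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.rows = {}  # rows owned by this AMP, keyed by a row id


class Cluster:
    def __init__(self, amps):
        self.amps = {amp.name: amp for amp in amps}
        self.darj = {}  # down-AMP name -> list of (key, value) changes it missed

    def fail(self, name):
        """Mark an AMP as down; the surviving AMPs open a recovery journal for it."""
        self.amps[name].online = False
        self.darj[name] = []

    def write(self, name, key, value):
        """Write to an AMP, or journal the change if that AMP is down."""
        amp = self.amps[name]
        if amp.online:
            amp.rows[key] = value
        else:
            self.darj[name].append((key, value))

    def recover(self, name):
        """Replay the journal into the returning AMP, then discard the journal."""
        amp = self.amps[name]
        for key, value in self.darj.pop(name):
            amp.rows[key] = value
        amp.online = True


cluster = Cluster([Amp("AMP1"), Amp("AMP2"), Amp("AMP3"), Amp("AMP4")])
cluster.write("AMP1", 10, "Tom")
cluster.fail("AMP1")
cluster.write("AMP1", 11, "Morgan")   # AMP1 is asleep, so this goes to the journal
cluster.recover("AMP1")               # AMP1 wakes up and is caught up
```

After `recover`, AMP1 holds both rows and the journal for it no longer exists, which mirrors the "catch up, then discard" behavior described above.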

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.
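The mirroring behavior described above can be sketched in a few lines of Python. This is a toy model under our own assumptions (the class name `MirroredPair` and its methods are invented for illustration); a real Disk Array Controller does this in hardware and also chooses the closer read/write assembly, which the sketch does not attempt.

```python
# Toy model of RAID-1 mirroring; all names here are illustrative assumptions.
class MirroredPair:
    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.primary_ok = True
        self.mirror_ok = True

    def write(self, block, data):
        """Every write goes to both disks; the caller never sees the second write."""
        if self.primary_ok:
            self.primary[block] = data
        if self.mirror_ok:
            self.mirror[block] = data

    def read(self, block):
        """Read from whichever disk is still healthy."""
        if self.primary_ok:
            return self.primary[block]
        if self.mirror_ok:
            return self.mirror[block]
        raise IOError("both the data disk and its mirror are lost")

    def fail_primary(self):
        self.primary_ok = False
        self.primary = {}

    def replace_primary(self):
        """Copy the mirror onto a replacement primary disk."""
        self.primary = dict(self.mirror)
        self.primary_ok = True


pair = MirroredPair()
pair.write(1, "Best_Friends row")
pair.fail_primary()
survivor = pair.read(1)        # still readable from the mirror
pair.write(2, "another row")   # operations continue on the mirror alone
pair.replace_primary()         # the replacement disk is rebuilt from the mirror
```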

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.
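The migration just described can be sketched as a small Python function. This is only an illustration under our own assumptions: the function name `migrate`, the dictionary layout, and the round-robin placement across surviving nodes are invented here and say nothing about how Teradata actually assigns migrating VPROCs.

```python
# Illustrative sketch of AMP migration inside a clique; names are assumptions.
def migrate(nodes, failed):
    """Return a new node -> AMP list mapping after `failed` goes down.

    `nodes` maps each node in one clique to the AMPs running in its memory.
    The failed node's AMPs are spread over the surviving nodes; each AMP
    keeps its number, standing in for "it still owns its own virtual disk".
    """
    survivors = [name for name in nodes if name != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    result = {name: list(nodes[name]) for name in survivors}
    for i, amp in enumerate(nodes[failed]):
        result[survivors[i % len(survivors)]].append(amp)
    return result


clique = {"node1": list(range(1, 17)), "node2": list(range(17, 33))}
after_failure = migrate(clique, "node1")
```

With the two-node, 32-AMP system from the picture, node two ends up hosting all 32 AMPs, which is why its performance slows down while business continues.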

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
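The two scenarios above, rolling back a bad update with BEFORE images and rolling forward from a backup with AFTER images, can be sketched in Python. This is a simplified model under our own assumptions: the function names and the use of a plain dictionary as the "table" are invented for illustration, and a real Permanent Journal is managed with Teradata's own archive and recovery tools.

```python
# Simplified model of BEFORE/AFTER journaling; all names are illustrative.
def apply_change(table, key, new_row, before_journal, after_journal):
    """Change one row (new_row=None means delete) and journal both images."""
    before_journal.append((key, table.get(key)))  # image before the change
    if new_row is None:
        table.pop(key, None)
    else:
        table[key] = new_row
    after_journal.append((key, new_row))          # image after the change


def roll_back(table, before_journal):
    """Undo changes newest-first using the BEFORE images."""
    for key, old_row in reversed(before_journal):
        if old_row is None:
            table.pop(key, None)
        else:
            table[key] = old_row


def roll_forward(backup, after_journal):
    """Rebuild a table from a backup by reapplying the AFTER images in order."""
    table = dict(backup)
    for key, new_row in after_journal:
        if new_row is None:
            table.pop(key, None)
        else:
            table[key] = new_row
    return table


reps = {1: "Western", 2: "Eastern"}
backup = dict(reps)                      # the backup taken on the 1st
before, after = [], []
apply_change(reps, 1, "Southeast", before, after)
apply_change(reps, 2, "Southeast", before, after)

restored = roll_forward(backup, after)   # hardware failure: backup + AFTER images
roll_back(reps, before)                  # programming error: undo with BEFORE images
```

Note the design difference: roll forward starts from an old copy and moves toward the present, while roll back starts from the present and moves toward the past.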

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK, BEFORE JOURNAL, DUAL AFTER JOURNAL
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 -- The FORMAT string was lost in the source text; 'YYYY-MM-DD' is an assumed placeholder.
 hire_date DATE FORMAT 'YYYY-MM-DD')
UNIQUE PRIMARY INDEX (emp);

The example above created the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy, and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
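The four rules above boil down to a small compatibility table, which can be sketched in Python. The table is derived from the descriptions in this section; the names `COMPATIBLE` and `can_grant` are our own, and the sketch ignores queuing order and the level (database, table, or row) at which a lock is held.

```python
# Lock compatibility derived from the four rules above; names are illustrative.
# For each requested lock: the set of already-held locks it can coexist with.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},  # dirty reads slip past READ and WRITE
    "READ": {"ACCESS", "READ"},             # many readers may share a table
    "WRITE": {"ACCESS"},                    # only ACCESS readers may coexist
    "EXCLUSIVE": set(),                     # no access, period
}


def can_grant(requested, held):
    """True if `requested` can be granted alongside every lock in `held`."""
    return all(lock in COMPATIBLE[requested] for lock in held)


readers_share = can_grant("READ", ["READ", "READ"])
write_waits_for_read = can_grant("WRITE", ["READ"])
access_past_write = can_grant("ACCESS", ["WRITE"])
access_past_exclusive = can_grant("ACCESS", ["EXCLUSIVE"])
```

Here `readers_share` and `access_past_write` come out true, while `write_waits_for_read` and `access_past_exclusive` come out false, matching the rules in the list.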

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
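The employee and payroll story can be sketched in Python to show both halves of the RI rule. This is a toy model under our own assumptions: the exception class, the function names, and the in-memory set and dictionary are invented for illustration, and a real system declares the relationship in the table definition rather than in application code.

```python
# Toy referential-integrity check; every name here is an illustrative assumption.
class ReferentialIntegrityError(Exception):
    pass


employee = {101, 102}   # parent table: employee numbers
payroll = {}            # child table: payroll id -> employee number


def insert_payroll(payroll_id, emp):
    """Reject a payroll row whose employee does not exist."""
    if emp not in employee:
        raise ReferentialIntegrityError(f"employee {emp} does not exist")
    payroll[payroll_id] = emp


def delete_employee(emp):
    """Reject deleting an employee who is still referenced by payroll."""
    if emp in payroll.values():
        raise ReferentialIntegrityError(f"employee {emp} is still on the payroll")
    employee.discard(emp)


insert_payroll(1, 101)
try:
    delete_employee(101)            # fired, but still on the payroll
    blocked = False
except ReferentialIntegrityError:
    blocked = True                  # RI stops the oversight

del payroll[1]                      # remove the payroll row first...
delete_employee(101)                # ...and now the delete is allowed
```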


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that, just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
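The three phases just described (hand out blocks, hash and redistribute, sort) can be sketched in Python. This is a conceptual model under stated assumptions: `zlib.crc32` is only a stand-in for Teradata's real hashing algorithm, the tiny `block_size` stands in for the 64K blocks, and the function name `fastload` and the row layout are invented for illustration. The loops run one after another here, whereas the AMPs do this work in parallel.

```python
import zlib


# Conceptual sketch of the Fastload phases; names and hash are assumptions.
def fastload(records, num_amps, block_size=3):
    """Distribute (key, payload) records to AMPs and sort each AMP's rows."""
    # Phase 1: deal out blocks of raw records to the AMPs in turn.
    received = [[] for _ in range(num_amps)]
    for start in range(0, len(records), block_size):
        amp = (start // block_size) % num_amps
        received[amp].extend(records[start:start + block_size])

    # Phase 2: each AMP hashes the key of every row it received and
    # sends the row to the AMP that owns that hash value.
    owned = [[] for _ in range(num_amps)]
    for amp_rows in received:
        for key, payload in amp_rows:
            row_hash = zlib.crc32(str(key).encode())  # stand-in hash function
            owned[row_hash % num_amps].append((row_hash, key, payload))

    # Phase 3: each AMP sorts its own rows by the row hash.
    for rows in owned:
        rows.sort()
    return owned


records = [(emp, f"name{emp}") for emp in range(20)]
amps = fastload(records, num_amps=4)
```

The point of the second phase is that a row's final home depends only on its hashed key, not on which AMP happened to receive the block it arrived in.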

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.
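The water faucet idea, a load rate that is preset differently for different hours of the day, can be sketched in Python. This is purely illustrative: the function name `tpump_schedule`, the hourly rate table, and the row counts are hypothetical values of our own, not Tpump parameters.

```python
# Hypothetical throttle schedule; the names and numbers are illustrative only.
def tpump_schedule(pending, rate_by_hour, start_hour=0):
    """Return [(hour, rows_loaded), ...] until all pending rows are loaded.

    `rate_by_hour` holds 24 entries: the maximum rows to load in each hour.
    """
    if max(rate_by_hour) <= 0:
        raise ValueError("the faucet is shut off for the whole day")
    loaded = []
    hour = start_hour
    while pending > 0:
        batch = min(rate_by_hour[hour % 24], pending)
        loaded.append((hour % 24, batch))
        pending -= batch
        hour += 1
    return loaded


# Full throttle overnight, a trickle during the daytime rush hours.
rates = [1000] * 6 + [50] * 12 + [1000] * 6
schedule = tpump_schedule(2500, rates, start_hour=4)
```

Starting at 4 a.m., the sketch loads 1,000 rows in each of the two remaining off-peak hours and then trickles the last 500 rows in at 50 per hour.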

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables
• It locks at the row level

Conclusion - A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


A bird does not sing because it has the answers it sings because it has a song

A data warehouse built on detail data does not sing because it has a song; it sings because it has the answers. When you capture detail data, answers to an infinite number of questions are available. But if this is truly the case, then why doesn't everybody build around detail data? Well, there are two reasons. One is price. Like a bird, many companies decide to go cheap, cheap. But watch out! The real expense is not the cost of the data warehouse; it is the money that you will not make without one. The second reason is power. Many companies don't have the wingspan to fly through the detail, so they soar with the summary. In addition, some companies don't want to pay for the disk space it actually takes to keep detail data, but believe me, that cost is a small price to pay for success.

Once you miss the first buttonhole it becomes difficult to button your shirt

Many companies use the same database for their data warehouse as they have done for their OLTP system. This is a critical mistake. In essence, they have missed the first buttonhole and most likely will lose their shirt on their data warehouse adventure.

At this point, companies no longer have a choice of using detail data. They must summarize for performance reasons. As one marine jokingly told his boot camp soldiers, "The beatings will continue until the morale improves." Similarly, a database designed for OLTP takes a continual beating when it tries to query large amounts of detail data.

Companies building true data warehouses don't compromise on price and will have a data warehouse that is built for decision support, not one that specializes in OLTP. With this decision, you have buttoned the first buttonhole and are well on your way to reaching the top.

Detail data is the foundation that data warehouses are built upon. Users can ask any question at any time, conduct data mining, OLAP, ROLAP, SQL, and SPL functions, build data marts directly from the detail data, and easily maintain and grow the environment on a daily basis. Now that's a tune well worth singing. Make a note of it.

Rule 5 - Build Data Marts from the Detail

You cannot teach a man anything you can only help him find it within himself

Galileo

Galileo was a smart man. How did he know so much about life and data marts? When we explained data marts to Galileo, he said, "You cannot build a data mart directly from the OLTP systems; you can only build a data mart directly from the detail within." He was right.

Many companies build data mart after data mart directly from the OLTP systems, and their universe begins to revolve around continual maintenance. Then, as things get worse, as Galileo predicted, their universe begins to revolve around the son: the son of a gun sent in to replace them.

Why does this happen? At first things work out great, but soon there are more and more requests for additional information. As a result, more and more data marts are created, and soon the system looks like a giant spider web. Different data marts start to yield different results on like data, and the actual maintenance of this complicated spider web takes up most of IT's time. Meanwhile, short-term dreams turn into long-term nightmares like this one. A man and his wife had had a big argument just before he went on a business trip. Feeling rather contrite about his harsh words, he arranged to send his wife some flowers and asked the florist to write on the card, "I'm sorry, I love you." The beautiful bouquet arrived at the door. But then his wife read the words the florist had actually written in haste: "I'm sorry I love you."

The top reasons to build data marts directly from detail data are:

• Users can get answers from the data mart but must validate their findings or check out additional information from the detail that built it
• There is only one consistent version of the truth
• Maintenance is easy

If a user comes up with a data mart answer that does not make sense, then he or she has the ability to drill down into the detail and investigate. Sometimes summary data can spark interest, and finding out the "why" can result in big bucks.

If users don't trust the data, they won't use the system. When a data warehouse is built on a foundation of detail data, and then data marts are erected from that foundation, you have a winning combination. The results will always be consistent and trustworthy. However, you should only build data marts when there is a credible business case, and you should be ready to drop them when they are no longer needed. The life span of a data mart is relatively short compared to that of its mother and father (better known as the detail data). If you build data marts from the detail, they are easy to manage, easy to drop, and easy to change.

Rule 6 - Make Scalability Your Best Friend

Plan your life for a million tomorrows and live your life as if tomorrow may be your last

Morgan Jones

The roar of class-6 rapids on a river in Suriname can be almost deafening against the dense walls of the jungle, especially when you are 9 years old. Our mission was to lower our canoe down the waterfall with ropes. The Trio Amer-Indian who anchored our 40-foot dugout canoe let go of the anchor rope too quickly. Without warning, the heavy boat began a freefall through the rocky water with my father hanging onto the side for dear life. He disappeared under the rocky waters, and I knew for sure we had lost him. My heart pounded against my chest. As I rallied myself to grasp this loss as only a nine year old can, the Indians abruptly began cheering wildly above the roar of the river. My dad had resurfaced a hundred yards downstream, battered and bruised, but he was alive. In just one short minute, I determined that I would love my family every day as if there were no tomorrow.

As I made my family my best friend, a data warehouse must make scalability its best friend. A data warehouse that does not scale will have no tomorrow. It is only a matter of time until the warehouse disappears in rocky waters, never to come up for air. Don't let go of the anchor rope.

The data-warehousing environment will throw obstacles in your way every single day. A data warehouse must be planned to meet today's needs, but it must also be capable of meeting tomorrow's challenges. The future cannot be predicted, so plan for unlimited growth, or linear scalability, both vertical and horizontal. There are so many data warehouses that start out with sizzling performance, but as they grow they eventually and inevitably hit the scalability wall. However, before they hit the wall, there is a pattern of diminishing performance.

A data warehouse designed without scalability in mind is doomed before it is begun. It can never reach its potential. Take the scalability question out of the equation by investing in a database that allows you to start small but grows linearly.


In today's fast-paced world, Gigabytes soon become Terabytes. It may not sound like much, but it weighs a ton on the shoulders of giants. Listen to these measurements and pick your data warehouse's life span. For example, if you lived for a million seconds (Megabyte), then you would live for 11.5 days. In comparison, if you lived for a billion seconds (Gigabyte), then you would live for 31.5 years. Plus, if you lived for a trillion seconds (Terabyte), then you would live for 31,688 years.

How nice it would be on your 31,688th birthday that people would say, "You sure look good for your age."
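The arithmetic behind those life spans is easy to check. The short Python sketch below is our own; it assumes 86,400 seconds per day and a 365.25-day year, so the results land slightly above the rounded figures quoted in the text.

```python
# Checking the life-span figures; assumes a 365.25-day year.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

million_seconds_in_days = 10**6 / SECONDS_PER_DAY       # about 11.57 days
billion_seconds_in_years = 10**9 / SECONDS_PER_YEAR     # about 31.69 years
trillion_seconds_in_years = 10**12 / SECONDS_PER_YEAR   # about 31,688 years

print(f"{million_seconds_in_days:.2f} days")
print(f"{billion_seconds_in_years:.2f} years")
print(f"{trillion_seconds_in_years:,.0f} years")
```

Each step up multiplies the life span by a thousand, which is exactly the jump from Megabytes to Gigabytes to Terabytes.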

Data warehouses hit the wall of scalability because they cannot grow at the same rate that the amount of data being gathered grows. Teradata allows for unlimited linear scalability. Linear Scalability is a building block approach to data warehousing that ensures that as building blocks are added, the system continues at the same performance level.

This is why the largest data warehouses in the world use Teradata. I was lucky to be in the right place at the right time and taught during the beginning stages at what are considered the two largest data warehouse sites in the world: Southwestern Bell (SBC) and Wal-Mart.

Wal-Mart's data warehouse started with less than 30 gigabytes, and SBC started with less than 200 gigabytes and 100 users. Both warehouses:

• Started small and simple
• Used Teradata from the beginning
• Have built the largest Enterprise Data Warehouse in their respective industries
• Continue to realize additional Return On Investment (ROI) on an annual basis
• Have grown to more than 10 Terabytes of data and are still growing
• Have thousands of users (some estimates are shocking)
• Have educated and experienced data warehouse staffs
• Have educated and experienced data warehouse users
• Experience continual growth without boundaries
• Have experienced linear performance by Teradata in every single upgrade (from gigabytes to terabytes and from terabytes to tens of terabytes)
• Both companies are impressed with Teradata's power and performance
• And both SBC and Wal-Mart are committed to the excellence of Teradata

A data warehouse is built in small building blocks. Linear Scalability is described in three ways:

First, building blocks are added until the performance requirements of your environment are met (Guaranteed Success).

Second, every time the data doubles, building blocks are doubled and the system maintains its performance level (Guaranteed Success); and

Third, any time the environment changes, building blocks are added until performance requirements are met (Guaranteed Success).

Scalability is not just about growing the data volume. It also means growing or increasing the number of users. Many systems work flawlessly until as few as 5 users are added; then they slow down to a crawl. Companies need a system where growth and performance are easily calculated and implemented. That means a system where the number of users, size and complexity of queries, volume of data, and number of applications being used can be calculated and compared to the current system's actual size. If more power, speed, or size is needed, then the company can simply add building blocks to the system until the requirements are met.


Rule 7 - Model the Data Correctly

You will find only what you bring in

Yoda, Jedi Master in Star Wars

We model a database for the same reasons that Boeing builds an aircraft model to test flight characteristics in a wind tunnel. It's simpler and cheaper to model than to reconstruct the plane by iterations until you get it right.

A proper data model should be designed to reflect the business components and possible relationships.

Here are three rules for modeling data in a data warehouse:

1. Model the data quickly
2. Normalize the detail data
3. Use a dimensional model for data marts

The 3rd Normal Form holds that each column in a table should be directly related to the primary key, the whole key, and nothing but the key. Data is placed into tables where it makes the most sense and has no repeating groups, derived data, or optional columns. This allows users to ask any question at any time on all data within the enterprise. Users do not have to strive for 3rd Normal Form, but should just normalize the data the best they can. There will be fewer columns in a table, but a lot more tables overall. This model is easier to maintain, incredibly flexible, and allows a user to ask any question on any data at any time.

A Star-Schema model consists of a fact table and a number of dimension tables. The fact table is a table with a multi-part key. Each element of the key is itself a foreign key to a single dimension table. The remaining fields in the fact table are known as facts and are numeric, continuously valued, and additive. Facts can be thought of as measurements taken at the intersection of all of the dimensions. Dimension attributes are mostly textual and are almost always the source of constraints and report breaks. This model enhances performance on known queries or, in other words, queries users run repeatedly, day after day.
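To make the shape of a Star-Schema concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The table names, column names, and sample values (store_dim, product_dim, sales_fact) are hypothetical and are not taken from any Teradata system; the sketch only shows a fact table whose multi-part key is built from foreign keys to the dimension tables.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: mostly textual attributes (hypothetical examples).
CREATE TABLE store_dim   (store_id   INTEGER PRIMARY KEY, city     TEXT);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, category TEXT);

-- Fact table: multi-part key of foreign keys, plus numeric, additive facts.
CREATE TABLE sales_fact (
    store_id   INTEGER REFERENCES store_dim(store_id),
    product_id INTEGER REFERENCES product_dim(product_id),
    quantity   INTEGER,
    amount     REAL,
    PRIMARY KEY (store_id, product_id)
);

INSERT INTO store_dim   VALUES (1, 'Dallas'), (2, 'Atlanta');
INSERT INTO product_dim VALUES (10, 'Shirts'), (11, 'Socks');
INSERT INTO sales_fact  VALUES (1, 10, 20, 400.0), (1, 11, 52, 260.0), (2, 10, 17, 340.0);
""")

# A typical "known query": constrain and group by a dimension attribute,
# then add up a fact.
totals = conn.execute("""
    SELECT s.city, SUM(f.amount)
    FROM sales_fact f
    JOIN store_dim s ON s.store_id = f.store_id
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
print(totals)
```

The dimension attribute (city) supplies the constraint and the report break, while the fact (amount) is what gets summed.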

Most database modelers prefer to create a logical model in 3rd Normal Form, but most database engines are overcome by physical limitations, so they must compromise the model. The four most difficult functions for a database to handle are:

• Join tables
• Aggregate data
• Sort data
• Scan large volumes of data

In order to get around these system limitations, vendors will suggest a model to avoid joins, use summarized data to avoid aggregation, store data in sorted order to avoid sorts, and overuse indexes to avoid large scans.

With these limitations, vendors are also going to avoid being able to compete. That is like placing a ball and chain around the runner's leg and saying, "I wish you all the best in the marathon." Come on! Whose side are these vendors really on?

Teradata is the only database engine I have seen that has the power and maturity to use a 3rd Normal Form physical model on databases exceeding a terabyte in size. Because of the physical limitations, other databases have had to use a Star-Schema model to enhance performance, but have given up on the ability to perform ad-hoc queries and data mining.


A normalized model is one that should be used for the central data warehouse. It allows users to ask any question at any time on information from any place within the enterprise. This is the central philosophy of a data warehouse. It leads to the power of ad-hoc queries and data mining, whereby advanced tools discover relationships that are not easily detected, but do exist naturally in the business environment.

A Star-Schema model enhances performance on known queries because we build our assumptions into the model. While these assumptions may be correct for the first application, they may not be correct for others. Flexibility is a big issue, but data marts can be dropped and added with relative ease if each is built directly from the detail data.

Remember: build the data warehouse around detail data using a normalized model. Then, as query patterns emerge and performance for well-known queries becomes a priority, Star-Schema data marts can be created by extracting summarized or departmental data from the centralized data warehouse. The user will then have access to both the data marts for repetitive queries and the central warehouse for other queries.

Because data marts can be an administrative nightmare, Teradata enables Star-Schema access without requiring physical data marts. By setting up a join index as the intersection of your Star-Schema model, you can create a Star-Schema structure directly from your 3rd Normal Form data model. Best of all, once it is created, the data is automatically maintained as the underlying tables are updated.

Keep in mind: 80% of data warehouse queries are repetitive, but 80% of the Return On Investment (ROI) is actually provided by the other 20% of the queries that go against detailed data in an iterative environment. By using a normalized model for your central data warehouse and a Star-Schema model on data marts, you can enhance the possibility of realizing that 80% of the Return On Investment and still enhance the performance on 80% of your queries.

Rule 8 - Don't Let a Technical Issue Make Your Data Warehouse a Failure Statistic

Experience is a hard teacher because she gives the test first, the lesson afterwards.

Scottish Proverb

Did you know that 3/4ths of the people in the world hate fractions, and that 40% of the time a data warehouse fails, it is because of a technical issue? There are many traps and pitfalls in every data warehouse venture. One winter day, a hunter met a bear in the forest. The bear said, "I'm hungry. I want a full stomach." The man replied, "Well, I'm cold. I would like a fur coat." "Let's compromise," said the bear, and he quickly gobbled up the hunter. They both got what they asked for: the bear went away with a full belly, and the man left wrapped in a fur coat. With that in mind, good judgment comes from experience; experience comes from bad judgment. You have shown good judgment by reading this book, so let our experience keep your company from having a bad data warehouse experience.

Author Daniel Boorstin wrote in The Discoverers, "The greatest obstacle to discovering the shape of the earth, the continents, and the oceans was not ignorance, but rather the illusion of knowledge." There is a lot of illusion of knowledge being spread around in the data-warehousing environment. Before you decide on any data warehouse product, ask yourself and the vendor these questions:

• As my data demands increase, will the system be able to physically load the data? Our experience shows that many systems are not capable of handling very large volumes of data. Do the math.
• As the data grows in volume, can the system meet the performance requirements? Do the math.
• As the number of users grows, will the system be able to scale? Do the math.
• As my environment changes, will the system be flexible enough to allow changes quickly and easily? Do the math.
• Will the system need so many Database Administrators (DBAs) that my system's cost skyrockets? Do the math.
• If we suddenly merged with another company and needed to incorporate into their mainframe or LAN environment, would the system be able to connect and include them? Do the math.
• Can I continue to meet my batch window timeframes? Do the math.
• Could I become the hero of the company one day, only to have some technical glitch blamed on me because of my poor foresight, and be thrown out of the company into a giant mud puddle? Do the bath.

Rule 9 - Take a Building Block Approach

Be not afraid of growing slowly; be afraid only of standing still.

Chinese Proverb

Ever since Vasco de Balboa discovered the Pacific coast of Panama in 1513, kings and businessmen alike dreamed of the impossible: to cut a waterway across the mountainous isthmus, creating a shortcut between the Atlantic and Pacific Oceans. Those dreams turned into reality during the Industrial Revolution. It took almost forty years of trial and error before the world's greatest engineering feat since the Pyramids was completed in 1914. Ships move through the locks of the canal, rising 85 feet above sea level before they descend to the opposite side. Since its grand opening in 1920, the Panama Canal has revolutionized trans-oceanic traffic, joining East and West. Its 50-mile stretch saves every vessel about 8,000 extra nautical miles of travel around the bottom tip of South America. Several modifications have been engineered through the years to accommodate the increasing size of ships.

Data warehouses, like the Panama Canal, must be built over time and changed over time to meet new demands. A data warehouse must grow with the environment, but the environment is unpredictable. All sailors know that they can't direct the wind, but that they can adjust their sails. In comparison, all data warehouse users know they can't direct the environment, but they can adjust their warehouse. Sometimes the data warehouse will grow quickly and sometimes it will grow slowly, but it should always be growing.

So take a building block approach to data warehousing. Teradata allows you to expand without boundaries, one building block at a time. Plus, adding on building blocks is easy.

There are two aspects to a building block approach. First, you need to add applications to your data warehouse in three- to six-month intervals. Once the first application works, then you are ready for more projects. As you become more experienced with this approach, you can add multiple projects in parallel by involving multiple organizations.

The second aspect of the building block approach is in the actual data warehouse architecture. It doesn't matter if yours is the smallest data warehouse in the world, the largest, or falls somewhere in between: power and scalability always fuel success.

Not long ago, a customer flew out to San Diego for a Teradata demonstration and benchmark. The benchmark ran late into the evening, but the numbers were more than 50% better than the competition. The customer was extremely impressed, but before buying, he demanded to see the system scalability that everyone had been talking about. Although it was already late, a Teradata employee was called in the middle of the night, arrived within 10 minutes (in pajamas), hooked up the building blocks, and ran a utility called config. She ran another called reconfig, and in less than two hours the system size doubled.


As the environment changes in terms of users, data, complexity, capacity, batch windows, time changes, events, or opportunities, users should be able to continue building applications and architecture. The more a Teradata system grows, the more Teradata outshines the competition.

Rule 10 - Buy a Teradata Data Warehouse

Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing had happened.

Winston Churchill

Winston Churchill led Britain through World War II during what he called that country's "finest hour." When users see consistent data, the system, too, is in its finest hour. Teradata gives users the ability to ask questions they could never ask before. Users trust Teradata because of its industry performance and reputation, and because it never gives in. Constant use gives users optimal business experience, and no matter what a user asks, the system responds with a hearty "Yes, Sir!"

When we explained Teradata to Churchill, he said:

"A data WARe-house that consists of 250 data marts is like poison, and if I were the MIS department responsible for maintaining them, I'd take it."

Teradata guarantees an Enterprise Data Warehouse with no scalability issues. Data loads like lightning, and system administration is a breeze. You can pick the performance level that meets your requirements for today and forever. The database can be normalized around detail data, and because of Teradata's power, users have the flexibility to ask any question at any time on any data.

All other databases are suspect in data loading capabilities, scalability, reference sites, decades of data warehouse experience, flexibility, system administration difficulties, and inability to handle the complex queries of today's users. These users are good!

Teradata - The Shining Star

Overview

Teradata has always been at the top of the data warehouse game, even if the experts weren't bright enough to know it. The vision that the original designers had was tremendous. It was so far to the left of genius that most thought the idea was impossible.

Only he who attempts the ridiculous may achieve the impossible.

Don Quixote

The Teradata database was originally designed in 1976, and many of the fundamental concepts still remain today. Nearly 25 years later, Teradata is still considered ahead of its time.

In 1976, IBM mainframes dominated the computer business. Everyone who was anyone had an IBM mainframe. However, the original founders of Teradata noticed that it took about 4½ years for IBM to produce a new mainframe. They also noticed a little company called Intel. Intel created a new PC chip every 2½ years. With mainframes moving forward every 4½ years and PC chips every 2½ years, Teradata recognized their vision: to network enough PC chips together that the mainframe would be overpowered, yet costs would be hundreds of times cheaper than a mainframe. The Teradata team estimated the power surge would come in 1990.

IBM laughed out loud. They said, "Let's get this straight... you are going to network a bunch of PC chips together and overpower our mainframes? That's like plowing a field with 1,000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as just a bunch of PCs in a cabinet.

Teradata was convinced it could produce a product that would power large amounts of data and achieve the impossible: using PC technology in mainframe territory. Its founders agreed with Napoleon Bonaparte, who asserted, "The word 'impossible' is not in my dictionary." Sure enough, when we looked in his dictionary, that word was not there. And it is not in Teradata's Data Dictionary, either. The Teradata team set two goals: build a database that could

• Perform parallel processing, and
• Accommodate a Terabyte of data

Driving in the car one evening, Morgan's eight-year-old daughter Kara piped up from the back seat, "Daddy, can you buy Teradata in the store? I mean, what does Teradata really do?" Morgan thought for a moment and then replied, "Do you remember when you went on the Easter egg hunt last spring? Well, imagine that we had fifty eggs and you were the only child there. If I asked you to find all the purple eggs, would you be able to do that?" Kara said, "Sure. But it might take me a long time." Morgan continued, "What if we now let fifty children go in and I asked them to show me all of the purple eggs? How long would that take?" His daughter responded, "It wouldn't take any time at all, because each child would only have to look at one egg." That is precisely how Teradata works. It divides up huge tasks among its processors and tackles each portion simultaneously, with amazing speed. And it doesn't matter if you have a trillion eggs in your basket.

In 1984, the DBC/1012 was introduced. Since then, Teradata has been the dominant force in data warehousing. Teradata got the chickens plowing and is considered outstanding. Meanwhile, IBM's plow is out rusting in its field.

Parallel Processing

An invasion of armies can be resisted, but not an idea whose time has come.

Victor Hugo

The idea of parallel processing gives Teradata the ability to have unlimited users, unlimited power, and unlimited scalability. This is an idea whose time has come. And it all starts with something called parallel processing. So what is parallel processing? Let us explain.

It was 10 p.m. on a Saturday night, and two friends were having dinner and drinks. One of the friends looked at his watch and said, "I have to get going." The other friend responded, "What's the hurry?" His friend went on to tell him that he had to leave to do his laundry at the Laundromat. The other friend could not believe his ears. He responded, "What? You're leaving to do your laundry on a Saturday night? Do it tomorrow!" His buddy went on to explain that there were only 10 washing machines at the laundry. "If I wait until tomorrow, it will be crowded and I will be lucky to get one washing machine. I have 10 loads of laundry, so I will be there all day. If I go now, there will be nobody there, and I can do all 10 loads at the same time. I'll be done in less than an hour and a half."


This story describes what we call "Parallel Processing." Teradata is the only database in the world that loads data, backs up data, and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds... even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every database's clock in performance tests.

After enlightenment, the laundry.

Zen Proverb

After parallel processing the laundry, enlightenment!

Teradata Zen Proverb

With the computer world seeing Terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, this is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata has unlimited power and grows without boundaries, and was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

A ship in harbor is safe, but that's not why ships are built.

John Shedd

In 1805, the pivotal Battle of Trafalgar matched Britain's flotilla of battleships against the combined French and Spanish fleet. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to sail circles around the enemy fleet, attacking its ships at their most vulnerable point, the stern. That battle paralyzed the enemy fleet and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating them on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive, and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed managing terabytes of data.

A Personal Computer (PC) is made up of the following components:


Processor Chip - This is the brain of the computer. All tasks are done at the direction of the processor.

Memory - This is the hand of the computer. The memory allows data to be viewed, manipulated, changed, or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive - This is the spine of the computer. The hard drive stores data, applications, and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor applies your changes to the document while it is held in memory. Upon completion, you close the document and the processor writes all the changes to the disk, where it is stored.

In the picture below, we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight best friends.

Teradata Spreads Data over Multiple Processors

I don't mind starting the season with unknowns. I just don't like finishing the season with them.

Coach Lou Holtz

With Teradata, you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The next Teradata example shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask for a list of best friends, both processors will retrieve their data in parallel and will return the combined results over the connecting network. This example could easily double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four records, and simultaneously the other processor reads the remaining four records. So how could we double the speed of this system again?
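The two-processor example can be sketched in a few lines of Python. This is only an illustration of how the work is divided, not how Teradata is implemented: the friend names are made up, the rows are dealt out round-robin (an assumption made here to get an even spread), and Python threads stand in for the processors.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical contents of the Best_Friends table (eight rows).
best_friends = ["Ann", "Bob", "Cat", "Dan", "Eve", "Fay", "Gus", "Hal"]

def spread(rows, n_processors):
    """Deal the rows out round-robin so each processor gets an even share."""
    return [rows[i::n_processors] for i in range(n_processors)]

def scan(partition):
    """Each processor reads only the rows held on its own disk."""
    return list(partition)

# Two processors, four rows each.
partitions = spread(best_friends, 2)

# Both processors scan at the same time; the results are then combined.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(scan, partitions))

combined = [row for part in results for row in part]
print([len(p) for p in partitions], combined)
```

All eight rows still come back, but no single processor had to read more than four of them.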

Teradata has Linear Scalability

Every ceiling, when reached, becomes a floor upon which one walks and now can see a new ceiling.

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." This allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck of data.

Notice in the system below there are four processors, and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since data is spread evenly over four processors, each of the four processors reads its two rows at the same time. Now the system is four times faster than the single-processor system.
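The Best_Friends numbers can be put into a small worked model. The sketch below is an idealized calculation, assuming that every row takes the same time to read and that the rows are spread as evenly as possible; it ignores the cost of combining results. The function names are ours.

```python
import math

def rows_on_busiest_processor(total_rows, n_processors):
    """With an even spread, no processor holds more than this many rows."""
    return math.ceil(total_rows / n_processors)

def ideal_speedup(total_rows, n_processors):
    """Speedup over a single processor, if every row takes equal time to read."""
    return total_rows / rows_on_busiest_processor(total_rows, n_processors)

# Eight Best_Friends rows on one, two, and four processors.
for n in (1, 2, 4):
    print(n, rows_on_busiest_processor(8, n), ideal_speedup(8, n))
```

Under these assumptions, doubling the processors halves the rows each one reads, which is what "linear" means here.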

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the AMPs and processed in parallel.


A Logical View of the Teradata Architecture

You are either making history or you are history

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem, and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when they actually never had a chance to succeed. In ancient days, Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies, too, will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and then is given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain information on their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.


Parsing Engine (PE)

Even a stopped clock is right twice a day

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes, it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards!" Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, then the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and Southwestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions, some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions, and Teradata returns the answers.

Access Module Processor (AMP)

Wise men talk because they have something to say; fools talk because they have to say something.

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above: "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of few words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best approach is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it's the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk, and it shares this with no other AMP; hence, a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP will read only the rows on its own disk. If this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.
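The remark that a query is only as fast as the slowest AMP can be shown with a toy calculation. In this sketch, the per-row read time (seconds_per_row) and the row counts are invented for illustration; the only idea taken from the text is that all AMPs work at the same time, so the elapsed time follows the AMP with the most rows to read.

```python
def query_time(rows_per_amp, seconds_per_row=1.0):
    """All AMPs read at once, so elapsed time follows the busiest AMP.

    seconds_per_row is a made-up constant used only for illustration.
    """
    return max(rows_per_amp) * seconds_per_row

# Eight rows spread evenly over four AMPs, versus the same eight rows
# piled mostly onto one AMP.
even_spread = [2, 2, 2, 2]
uneven_spread = [5, 1, 1, 1]

print(query_time(even_spread))    # every AMP finishes together
print(query_time(uneven_spread))  # three AMPs wait on the busiest one
```

This is why an even spread of rows matters: with the same eight rows, the uneven layout leaves three AMPs idle while one does most of the reading.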

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs, while others have more than 2,000.

The BYNET

Even if you're on the right track, you'll still get run over if you just sit there.

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track, and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET, or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET!"

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax;
• The PE checks the user's security rights;
• The PE comes up with a plan for the AMPs to follow;
• The PE passes the plan along to the AMPs over the BYNET;
• The AMPs follow the plan and retrieve the data requested;
• The AMPs pass the data to the PE over the BYNET; and
• The PE then passes the final data to the user.
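The steps above can be sketched as a toy program. Everything in it is hypothetical: the user names, the access-rights table, the rows on each AMP's disk, and the one-statement "parser" are invented for illustration, and an ordinary function call stands in for the BYNET. It is meant only to show the order of the steps, not how Teradata is built.

```python
# Hypothetical data: two AMPs, each with its own disk holding part of a table.
AMP_DISKS = [
    [{"name": "Ann"}, {"name": "Bob"}],
    [{"name": "Cat"}, {"name": "Dan"}],
]
# Hypothetical access rights: which user may read which tables.
ACCESS_RIGHTS = {"tom": {"Best_Friends"}}

def amp_scan(disk, plan):
    """An AMP follows the plan and reads only the rows on its own disk."""
    return list(disk)

def parsing_engine(user, sql):
    # Step 1: check the syntax (this toy parser knows a single statement form).
    words = sql.strip().rstrip(";").split()
    if len(words) != 4 or [w.upper() for w in words[:3]] != ["SELECT", "*", "FROM"]:
        raise SyntaxError("this sketch only understands: SELECT * FROM <table>")
    table = words[3]

    # Step 2: check the user's security rights.
    if table not in ACCESS_RIGHTS.get(user, set()):
        raise PermissionError(f"{user} may not read {table}")

    # Step 3: come up with a plan for the AMPs to follow.
    plan = {"table": table, "step": "full scan"}

    # Steps 4-6: pass the plan to every AMP and collect what each one returns
    # (a plain function call plays the part of the BYNET here).
    partial_results = [amp_scan(disk, plan) for disk in AMP_DISKS]

    # Step 7: pass the final, combined data back to the user.
    return [row for rows in partial_results for row in rows]

print(parsing_engine("tom", "SELECT * FROM Best_Friends"))
```

A user without rights to the table is turned away at step 2, before any AMP is asked to do work.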

Teradata Building Block Approach

Better a diamond with a flaw than a pebble without one


Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs

Teradata Tables

"Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them."

Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, Customer ID, Item No, and Quantity.

An example of each table follows:

CUSTOMER TABLE (called CustomerTable)

CustomerID (PK)  CustomerName  CityName  Order Number (FK)  Customer Rep
1001             JC Penney     Dallas    105372             Dreyer
1002             Office Depot  Columbia  105799             Crocker
1003             Dillard's     Atlanta   106227             Smith

ORDER TABLE (called OrderTable)

Order Number (PK)  Order Date  Item No  Quantity  Customer ID (FK)
105372             03/07/2001  212      20        1001
105799             04/18/2001  296      52        1002
106227             10/17/2001  325      17        1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables," "Rows," and "Columns." Each row of a table is comprised of one or more fields identified by a column name. A row is the smallest value that can be inserted into a table. A column is the smallest value within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a "null" or are sometimes represented by a "?".

One column or combination of columns in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                        FOREIGN KEY
Not optional                       Optional
Comprised of one or more columns   Comprised of one or more columns
Can only have one PK per table     Can have multiple FKs per table
No duplicates allowed              Duplicates allowed
No changes allowed                 Changes allowed
No nulls allowed                   Nulls allowed
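To make the PK/FK match concrete, here is a minimal Python sketch that joins the two example tables on Order Number. The dictionary layout and variable names are assumptions made for illustration; a real join would be written in SQL and executed by Teradata.

```python
# Rows copied from the example tables; the dict layout is an assumption.
customer_table = [
    {"CustomerID": 1001, "CustomerName": "JC Penney", "OrderNumber": 105372},
    {"CustomerID": 1002, "CustomerName": "Office Depot", "OrderNumber": 105799},
    {"CustomerID": 1003, "CustomerName": "Dillard's", "OrderNumber": 106227},
]
order_table = [
    {"OrderNumber": 105372, "ItemNo": 212, "Quantity": 20, "CustomerID": 1001},
    {"OrderNumber": 105799, "ItemNo": 296, "Quantity": 52, "CustomerID": 1002},
    {"OrderNumber": 106227, "ItemNo": 325, "Quantity": 17, "CustomerID": 1003},
]

# Index the orders by their primary key; a PK value identifies exactly one row.
orders_by_pk = {order["OrderNumber"]: order for order in order_table}

joined = []
for customer in customer_table:
    # The customer's FK (Order Number) points at the order's PK.
    order = orders_by_pk.get(customer["OrderNumber"])
    if order is not None:  # an FK may be null or have no match
        joined.append((customer["CustomerName"], order["ItemNo"], order["Quantity"]))
```

Because the primary key is unique, each foreign key value can match at most one order row, which is why a simple dictionary lookup is enough here.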

Teradata Spreads the Data Evenly Across the AMPs

"A chain is only as strong as its weakest link."

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, then no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind, the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP 1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at nearly the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the rows of the Customer table and the rows of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.

Primary Indexes

"Every road has two directions."

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.

There are two types of Primary Indexes

"A man who chases two rabbits catches none."

Roman Proverb

"A man who chases two rabbits misses both by a HARE! A person who chases two Primary Indexes misses both by an ERR!"

Tera-Tom Coffing

Each table may have only one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
UNIQUE PRIMARY INDEX (emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON.

CREATE TABLE TomC.employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
PRIMARY INDEX (dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
UNIQUE PRIMARY INDEX (emp, dept);

"Being related hardly insures relatability."

Michael E Angier

All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked! Let's look at the HASH MAP next.

The Hash Map

"The map is not the territory."

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once, a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8, and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
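The simulated hash maps in these diagrams can be reproduced with a few lines of Python. This is a sketch of the diagrams only: the function name build_hash_map is hypothetical, and the simple repeating 1, 2, 3... pattern is the teaching simplification used here, not a statement about how a production hash map assigns its buckets.

```python
def build_hash_map(num_amps, num_buckets=65536):
    """Return a list where bucket i holds an AMP number, cycling 1..num_amps.

    Python lists are 0-based, so bucket number 1 in the text is index 0 here.
    """
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]

# 48 buckets matches the 8 x 6 grids drawn in the diagrams.
four_amp_map = build_hash_map(4, num_buckets=48)
eight_amp_map = build_hash_map(8, num_buckets=48)

# Print the four-AMP map six buckets per line, like the diagram.
for start in range(0, len(four_amp_map), 6):
    print(" ".join(str(amp) for amp in four_amp_map[start:start + 6]))
```

Every bucket holds nothing but an AMP number, which is the point of the honeycomb comparison: the map stores destinations, never data rows.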

How the Hash Map and Primary Index Work Together

"Choice, not chance, determines destiny."

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the "Wiz-Bang Formula" (a secret NCR formula). First we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even-tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we will take the Primary Index value of the row and run it through the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num  Friend_Name
2           Ben Hon

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (2) through the Coffing/Jones Wiz-Bang Formula (divide by 2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num  Friend_Name
16          Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (16) through the Coffing/Jones Wiz-Bang Formula (divide by 2), and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
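The full layout can also be worked out in a short Python sketch. It applies the teaching formula (divide by 2) to every Best_Friends row against the simulated four-AMP hash map. The function names are hypothetical, and nothing here reflects the real (secret) hashing formula.

```python
BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]
NUM_AMPS = 4
# Simulated hash map with 48 buckets, as in the diagram: 1 2 3 4 1 2 ...
HASH_MAP = [(i % NUM_AMPS) + 1 for i in range(48)]

def wiz_bang_bucket(primary_index_value):
    """Teaching formula only: divide the Primary Index value by 2."""
    return primary_index_value // 2

def destination_amp(primary_index_value):
    """Look up the AMP number stored in the bucket the formula points to."""
    bucket = wiz_bang_bucket(primary_index_value)
    return HASH_MAP[bucket - 1]  # buckets are numbered from 1, lists from 0

amps = {amp: [] for amp in range(1, NUM_AMPS + 1)}
for friend_num, friend_name in BEST_FRIENDS:
    amps[destination_amp(friend_num)].append((friend_num, friend_name))

for amp, rows in amps.items():
    print("AMP", amp, rows)
```

Under these assumptions each of the four AMPs ends up with two rows, which is the even spread the hash map is meant to achieve.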

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. The purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it will be consistent. When we divided Friend_Num 2 by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM   Best_Friends
WHERE  Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
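The one-AMP retrieval just described can be sketched in Python. The layout of AMP_DISKS follows the divide-by-2 teaching formula and the simulated four-AMP hash map; the names used (AMP_DISKS, select_by_primary_index) are hypothetical and exist only for this illustration.

```python
NUM_AMPS = 4
HASH_MAP = [(i % NUM_AMPS) + 1 for i in range(48)]  # simulated four-AMP map

# Where each Best_Friends row lands under the divide-by-2 teaching formula.
AMP_DISKS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}

def select_by_primary_index(friend_num):
    """Rerun the formula, read the bucket, and ask only that one AMP."""
    bucket = friend_num // 2          # teaching formula, not the real hash
    amp = HASH_MAP[bucket - 1]        # bucket numbers start at 1
    amps_contacted = [amp]            # no other AMP is involved
    name = AMP_DISKS[amp].get(friend_num)
    return amp, amps_contacted, name

amp, amps_contacted, name = select_by_primary_index(8)
print(amp, amps_contacted, name)
```

The same formula that placed the row is the one that finds it again, so the lookup never has to search the other three AMPs.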

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that it is not the size of the opponent that is to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is 10 times faster than most systems. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1,900 AMPs in its system helped return results very rapidly. Talk about efficiency!
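The 100-row, 10-AMP example can be mimicked in Python. In this sketch, threads stand in for AMPs working at the same time; the row values, the round-robin dealing of rows, and the names amp_rows and amp_scan are assumptions for illustration, not a model of real AMP internals.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 10
ROWS = list(range(1, 101))  # a 100-row table; the values are placeholders

# Deal the rows out evenly so that each AMP owns 10 of them.
amp_rows = {amp: ROWS[amp::NUM_AMPS] for amp in range(NUM_AMPS)}

def amp_scan(amp):
    """One AMP reads only the rows it owns."""
    return list(amp_rows[amp])

# Each thread plays the part of one AMP scanning its own portion.
with ThreadPoolExecutor(max_workers=NUM_AMPS) as pool:
    per_amp_results = list(pool.map(amp_scan, range(NUM_AMPS)))

# The PE's job: gather what the AMPs send back.
all_rows = [row for rows in per_amp_results for row in rows]
```

Each worker touches 10 rows instead of 100, which is where the "10 times faster" figure in the example comes from when the rows are spread evenly.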

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like, "Who are my best and worst customers?" you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM   Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result.

SELECT Friend_Num, Friend_Name
FROM   Best_Friends;

8 rows returned

Friend_Num  Friend_Name
6           Mary Gray
14          Kyle Marx
8           John Davis
16          Lyn Jones
2           Ben Hon
10          Don Roy
4           Joe Davis
12          Sam Mills

In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on "KNOWN QUERIES," or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. Each AMP is also told to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM   Best_Friends
WHERE  Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query asks for "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' secondary index sub-table row. Once that row is found, the real row-id (the smiley face in this example) is revealed, and Teradata can find the matching smiley face, and with it the base row, in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.
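A rough Python sketch of the two-AMP USI lookup follows. It rests on several assumptions: zlib.crc32 is only a stand-in hash for the index value, the base rows are placed with the divide-by-2 teaching formula, the Primary Index value plays the role of the row-id pointer, and all function names are hypothetical. The AMP numbers it produces will therefore not necessarily match the picture described in the text.

```python
import zlib

NUM_AMPS = 4
BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}

def pi_amp(friend_num):
    """Teaching formula: divide by 2, then cycle over the four AMPs."""
    return (friend_num // 2 - 1) % NUM_AMPS + 1

def usi_amp(name):
    """Stand-in hash for the index value (crc32 is an assumption)."""
    return zlib.crc32(name.encode("utf-8")) % NUM_AMPS + 1

base_table = {amp: {} for amp in range(1, NUM_AMPS + 1)}
usi_subtable = {amp: {} for amp in range(1, NUM_AMPS + 1)}
for num, name in BEST_FRIENDS.items():
    base_table[pi_amp(num)][num] = name       # base row, placed by its PI
    usi_subtable[usi_amp(name)][name] = num   # index row with pointer to base row

def select_by_name(name):
    """First AMP: find the index row. Second AMP: fetch the base row."""
    index_amp = usi_amp(name)
    friend_num = usi_subtable[index_amp].get(name)
    if friend_num is None:
        return None, [index_amp]
    base_amp = pi_amp(friend_num)
    return (friend_num, base_table[base_amp][friend_num]), [index_amp, base_amp]

row, amps_touched = select_by_name("Ben Hon")
```

However the values hash, the lookup never contacts more than two AMPs: one for the index sub-table row and one for the base row. A NUSI has no such guarantee, because several rows on different AMPs may share the same value.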

Create secondary indexes only on columns used repeatedly in the WHERE clause of ongoing queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.

Teradata Databases, Users and Space

Overview

"Choose a job you like, and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan." The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So, if your system comes with 100 gigabytes of permanent disk space, then the DBC owns 100 gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for; if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space

The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to, or REVOKE rights from, Tom.
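
The ownership chain can be sketched in a few lines of Python. This is only a toy model, not Teradata code: the PARENT dictionary and the owners function are hypothetical names, and the hierarchy is the one described in the text (DBC above SYSDBA; SYSDBA above MRKT, SALES and Morgan; Morgan above Tom).

```python
# Toy model of the space-ownership hierarchy described in the text.
# PARENT maps each user or database to its immediate parent (hypothetical name).
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}

def owners(name):
    """Return every parent (owner) above `name`, nearest first."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain

print(owners("Tom"))  # ['Morgan', 'SYSDBA', 'DBC']
```

Walking upward from Tom reaches Morgan, then SYSDBA, then DBC, which is the same answer the diagram gives. DBC has no entry in PARENT, so it has no owners at all.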

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. (Volatile temporary tables, by contrast, are materialized in spool space.) These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data, such as tables. (The definitions of views and macros are kept in the Data Dictionary and need no perm space of their own.) If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
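
The difference between giving away perm and giving away spool can be sketched as a small Python model. It is an illustration only: the Owner class, the user named Analyst and the 80 GB and 20 GB figures are assumptions made for the example, and the rule that a child's spool limit may not exceed its parent's is taken from the speed-limit analogy rather than from any Teradata reference.

```python
# Toy model: perm space is handed down and subtracted; spool is only a ceiling.
class Owner:
    def __init__(self, name, perm, spool):
        self.name = name
        self.perm = perm      # bytes of perm space currently owned
        self.spool = spool    # upper limit on spool; giving it away costs nothing

    def create_child(self, name, perm, spool):
        """Create a child user or database out of this owner's space."""
        if perm > self.perm:
            raise ValueError("not enough perm space to give away")
        if spool > self.spool:
            raise ValueError("child spool limit cannot exceed the parent's")
        self.perm -= perm     # perm given to a child is subtracted from the parent
        return Owner(name, perm, spool)  # the parent's spool limit is unchanged

GB = 10**9
sysdba = Owner("SYSDBA", perm=80 * GB, spool=20 * GB)
morgan = sysdba.create_child("Morgan", perm=10 * GB, spool=20 * GB)
analyst = sysdba.create_child("Analyst", perm=0, spool=20 * GB)  # query-only user

print(sysdba.perm // GB, sysdba.spool // GB)  # 70 20
```

After creating both children, SYSDBA has 10 GB less perm but exactly the same spool limit, and the query-only user holds spool without any perm at all.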

The following picture shows a logical view of a Customer Table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table(s), the data is not stored on the disks a second time, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname    Sal
1     10     Johnson    Manny    100000
2     20     Carlsbad   Jan      100000
22    30     Winter     Steve    77000
25    10     Lester     Bonnie   56000
33    10     Samuels    Todd     120000
99    20     Walter     Misha    104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha

What is a Macro?

The axe soon forgets, but the tree always remembers.

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
  SELECT * FROM Employ_v WHERE dept = 10;
  SELECT * FROM Employ_v WHERE dept = 20;
  SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                            Macros
• We select from views                           • We execute macros
• Uses the keyword AS                            • Uses the keyword AS
• Definition is stored in the Data Dictionary    • Definition is stored in the Data Dictionary
• Accesses certain portions of the data          • Accesses the real data itself
• Is changed using the keyword REPLACE           • Is changed using the keyword REPLACE

Access Rights for Teradata Users

Never insult seven men when all you're packing is a six gun.

Wild West Slogan

I taught in one place that was so rough ... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it ... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

Please sleep on it tonight, and if you wake up in the morning, let me know what you think.

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

The afternoon knows what the morning never suspected

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal cleanliness is next to Godliness. If it is clean, then things went well.
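
The before-image idea can be sketched as a toy Python function. This is not how Teradata is implemented: the table is a plain dictionary, the function name run_transaction is hypothetical, and a failure is simulated by updating a row that does not exist.

```python
# Toy model of the Transient Journal: before-images allow an all-or-nothing rollback.
def run_transaction(table, updates):
    """Apply (key, new_value) updates; undo all of them if any one fails."""
    journal = []                      # before-images: (key, old_value)
    try:
        for key, new_value in updates:
            if key not in table:
                raise KeyError(f"no row with key {key!r}")
            journal.append((key, table[key]))   # take the BEFORE picture
            table[key] = new_value
    except KeyError:
        for key, old_value in reversed(journal):
            table[key] = old_value    # restore rows to their original state
        journal.clear()
        return False                  # the transaction failed and was rolled back
    journal.clear()                   # COMMIT: the journal is wiped clean
    return True

salaries = {1: 100000, 25: 56000}
ok = run_transaction(salaries, [(1, 110000), (999, 1)])  # second update fails
print(ok, salaries)  # False {1: 100000, 25: 56000}
```

The first update succeeds and the second fails, so the first is undone as well. Either every statement in the transaction takes effect or none of them does.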

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

If you're not TRUE to your teeth, they'll be FALSE to you.

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.
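
A small Python sketch shows why losing any single AMP leaves every row reachable. The placement rule below is an assumption chosen only so that a row's FALLBACK copy never lands on the same AMP as its base copy; the real placement is decided by Teradata's Fallback Hash Map, and the AMPs here are numbered 0 to 3 for convenience.

```python
# Toy model of FALLBACK on a four-AMP system (the placement rule is an assumption).
NUM_AMPS = 4

def place(row_id):
    """Return (primary_amp, fallback_amp) for a row; the two always differ."""
    primary = row_id % NUM_AMPS
    offset = 1 + (row_id // NUM_AMPS) % (NUM_AMPS - 1)   # 1..3, never 0
    return primary, (primary + offset) % NUM_AMPS

def readable_rows(row_ids, down_amp):
    """Rows still reachable, via base or FALLBACK copy, when one AMP is down."""
    return [r for r in row_ids if any(a != down_amp for a in place(r))]

rows = list(range(8))   # eight base rows, as in the picture
copies = 2 * len(rows)  # FALLBACK doubles the number of stored rows
print(all(readable_rows(rows, amp) == rows for amp in range(NUM_AMPS)))  # True
```

The price is visible in the copies variable: eight base rows become sixteen stored rows, which is the doubling of disk space the text warns about.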

You can't step into the same river twice.

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.
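
The path from Primary Index value to AMP number can be sketched in Python. Everything here is a stand-in: CRC-32 from the standard zlib module replaces Teradata's own hashing algorithm, and the 16-bucket hash map and the name amp_for are assumptions made for the illustration.

```python
import zlib

NUM_BUCKETS = 16   # toy size, chosen only for the illustration
NUM_AMPS = 8       # the eight-AMP system from the picture, numbered 0 to 7

# Hash map: each bucket holds the number of the AMP that owns it.
HASH_MAP = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]

def amp_for(primary_index_value):
    """Primary Index value -> row hash -> bucket -> AMP number."""
    row_hash = zlib.crc32(str(primary_index_value).encode("utf-8"))
    bucket = row_hash % NUM_BUCKETS
    return HASH_MAP[bucket]

for emp in (1, 2, 22, 25, 33, 99):
    print(emp, "-> AMP", amp_for(emp))
```

The same Primary Index value always hashes to the same bucket, so a row is always found on the AMP where it was first stored.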

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while an AMP is down.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is that there is an increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
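
The trade-off between cluster sizes can be put into rough numbers. The sketch below rests on two simplifying assumptions that are not from the text: work is spread evenly over the surviving AMPs of a cluster, and a second failure is equally likely to hit any of the remaining AMPs in the system. Both function names are hypothetical.

```python
from fractions import Fraction

def survivor_workload(cluster_size):
    """Workload factor on each surviving AMP when one AMP in the cluster fails."""
    return Fraction(cluster_size, cluster_size - 1)

def second_failure_in_same_cluster(cluster_size, total_amps):
    """Chance that a second, randomly chosen failed AMP shares the first one's cluster."""
    return Fraction(cluster_size - 1, total_amps - 1)

for size in (2, 4, 16):
    print(size, float(survivor_workload(size)),
          float(second_failure_in_same_cluster(size, 2000)))
```

Under these assumptions the survivors of a two-AMP cluster carry 2 times their normal load, those of a four-AMP cluster 4/3 and those of a 16-AMP cluster 16/15. In a 2000-AMP system the chance that a second failure lands in the same cluster grows from 1/1999 to 3/1999 to 15/1999. Groups of four sit between the two extremes on both measures.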

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, including all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping, starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.
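
The open, log, replay and discard cycle of the DARJ can be sketched as a toy Python class. The Cluster class and its method names are hypothetical, and the model leaves out the FALLBACK copies themselves so that only the journal's bookkeeping is visible.

```python
# Toy model of the Down AMP Recovery Journal (DARJ); all names are hypothetical.
class Cluster:
    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}   # per-AMP row storage
        self.down = None                           # the AMP that is offline, if any
        self.darj = []                             # changes the down AMP missed

    def fail(self, amp):
        self.down = amp
        self.darj = []                             # the survivors open a new DARJ

    def write(self, amp, key, value):
        if amp == self.down:
            self.darj.append((key, value))         # log it for later catch-up
        else:
            self.rows[amp][key] = value

    def recover(self):
        for key, value in self.darj:               # replay what happened "while it slept"
            self.rows[self.down][key] = value
        self.down, self.darj = None, []            # the DARJ is then discarded

cluster = Cluster([1, 2, 3, 4])
cluster.fail(1)
cluster.write(1, "Ben Hon", "updated")
cluster.recover()
print(cluster.rows[1])  # {'Ben Hon': 'updated'}
```

While AMP 1 is down its change is only journaled. Once it recovers, the change is applied and the journal is empty again.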

In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant.

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. We can still keep the shared nothing environment and have four physical disks, because only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
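
The rule about which two disks may be lost can be checked with a toy Python model. The disk names and the pairing of the four disks into two mirrored pairs are assumptions made for the illustration, following the example in which the first disk is mirrored on the second.

```python
# Toy model of a RAID-1 rank: four disks arranged as two mirrored pairs (an assumption).
MIRROR_PAIRS = [("disk1", "disk2"), ("disk3", "disk4")]

def data_available(failed_disks):
    """Data survives as long as no pair has lost both its disk and its mirror."""
    failed = set(failed_disks)
    return all(not (a in failed and b in failed) for a, b in MIRROR_PAIRS)

print(data_available(["disk1"]))            # True
print(data_available(["disk1", "disk3"]))   # True: one disk lost from each pair
print(data_available(["disk1", "disk2"]))   # False: a disk and its own mirror
```

Two failures are survivable only when they fall in different pairs, because each pair then still holds one good copy.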

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk and, similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.
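
The migration can be sketched as a toy Python model of the two-node, 32-AMP clique. The function fail_node, the node names and the vdisk labels are hypothetical. The point of the sketch is that an AMP changes its host node but never its virtual disk.

```python
# Toy model of a two-node clique; AMP numbering follows the text (1-16 and 17-32).
nodes = {
    "node1": list(range(1, 17)),
    "node2": list(range(17, 33)),
}
# Each AMP keeps its own virtual disk no matter which node hosts it.
virtual_disk = {amp: f"vdisk{amp}" for amp in range(1, 33)}

def fail_node(nodes, failed):
    """Migrate the failed node's AMPs to the surviving node(s) in the clique."""
    survivors = [n for n in nodes if n != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    migrating = nodes[failed]
    result = {n: list(nodes[n]) for n in survivors}
    for i, amp in enumerate(migrating):
        result[survivors[i % len(survivors)]].append(amp)
    return result

after = fail_node(nodes, "node1")
print(len(after["node2"]))  # 32
```

With node one down, node two hosts all 32 AMPs, twice its normal load, while the virtual_disk mapping is untouched.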

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection then it would have five millionFALLBACK rows However this would be quite costly because FALLBACK actually stores a duplicate copyof all the rows on other AMPs within the same cluster FALLBACK is used either because the system ismission critical or the system is not backed up regularly For customers who backup data regularly another option for data restoration is the Permanent Journal When a company is not severely impacted by a couple ofhours for a restoration to be completed this is a very good option The Permanent Journal works in conjunctionwith backup procedures plus its a lot more cost effective than FALLBACK

The Permanent Journal stores only images of rows that have been changed due to an INSERT UPDATE or DELETE command It keeps track of all new deleted or modified data since the last Permanent Journal backupThis option is usually less expensive than storing the additional five million FALLBACK rows

Like FALLBACK the Permanent Journal is optional It may be used on specific tables of your choosing or onno tables at all It provides the flexibility to customize a Journal to meet specific needs The Permanent Journalmust be manually purged from time to time

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
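The two journal scenarios can be sketched in a few lines of Python. This is only a toy model under stated assumptions: a table is a plain dictionary, the function names are made up for illustration, and nothing here reflects how Teradata actually stores journal images.

```python
# Toy model of Permanent Journal images; not Teradata's actual storage format.
# A table is a dict of row_id -> value. A value of None means "row does not exist".

def apply_change(table, before_journal, after_journal, row_id, new_value):
    """Apply one INSERT, UPDATE, or DELETE and record both images."""
    before_journal.append((row_id, table.get(row_id)))  # image before the change
    if new_value is None:
        table.pop(row_id, None)                          # DELETE
    else:
        table[row_id] = new_value                        # INSERT or UPDATE
    after_journal.append((row_id, new_value))            # image after the change


def roll_back(table, before_journal):
    """Undo changes by restoring BEFORE images, newest first."""
    for row_id, old_value in reversed(before_journal):
        if old_value is None:
            table.pop(row_id, None)
        else:
            table[row_id] = old_value


def roll_forward(backup, after_journal):
    """Rebuild current data from a backup by replaying AFTER images, oldest first."""
    table = dict(backup)
    for row_id, new_value in after_journal:
        if new_value is None:
            table.pop(row_id, None)
        else:
            table[row_id] = new_value
    return table


# Full backup taken on the 1st of the month.
table = {1: "Western", 2: "Eastern"}
backup = dict(table)
before_journal, after_journal = [], []

# Changes made between the 1st and the 5th.
apply_change(table, before_journal, after_journal, 1, "Southwest")  # UPDATE
apply_change(table, before_journal, after_journal, 3, "Northern")   # INSERT
apply_change(table, before_journal, after_journal, 2, None)         # DELETE

expected = dict(table)
recovered = roll_forward(backup, after_journal)  # hardware failure: restore, then replay
roll_back(table, before_journal)                 # programming error: undo the changes
print(recovered == expected, table == backup)
```

Rolling back walks the BEFORE images from newest to oldest, while rolling forward replays the AFTER images from oldest to newest on top of the backup.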

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
   (emp        INTEGER,
    dept       INTEGER,
    lname      CHAR(20),
    fname      VARCHAR(20),
    salary     DECIMAL(10,2),
    hire_date  DATE)
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
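The four rules above can be written down as a small compatibility table. The Python below is a minimal sketch of those rules only, not of Teradata's lock manager. The names HELD_BLOCKS and can_proceed are made up for illustration, and the ACCESS row is an assumption made by symmetry, since the text only says that an ACCESS LOCK cannot get past an EXCLUSIVE LOCK.

```python
# For each lock already held, the set of requested locks that must wait.
HELD_BLOCKS = {
    "EXCLUSIVE": {"EXCLUSIVE", "WRITE", "READ", "ACCESS"},  # prevents any access, period
    "WRITE": {"EXCLUSIVE", "WRITE", "READ"},                # only ACCESS gets through
    "READ": {"EXCLUSIVE", "WRITE"},                         # other readers are welcome
    "ACCESS": {"EXCLUSIVE"},                                # assumed by symmetry
}


def can_proceed(requested, held_locks):
    """Return True if the requested lock conflicts with none of the held locks."""
    return all(requested not in HELD_BLOCKS[held] for held in held_locks)


print(can_proceed("READ", ["READ", "READ"]))  # many users SELECT at once
print(can_proceed("ACCESS", ["WRITE"]))       # dirty read is allowed
print(can_proceed("READ", ["WRITE"]))         # reader waits for the writer
print(can_proceed("ACCESS", ["EXCLUSIVE"]))   # nothing gets past EXCLUSIVE
```

Keeping the rules in one table makes it easy to see that each mode blocks strictly fewer requests than the one before it.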

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row whose value is referenced by another table may not be deleted unless the matching value is first removed from the referencing table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
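To make the two RI rules concrete, here is a small Python sketch built on the employee and payroll story above. The tables are plain dictionaries, and every name in it (ReferentialIntegrityError, insert_payroll, delete_employee, attempt) is hypothetical. It illustrates the concept, not Teradata's implementation.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would leave a row pointing at nothing."""


# Hypothetical tables: payroll rows reference employee rows by emp number.
employee = {101: "Jones", 102: "Coffing"}  # referenced (parent) table
payroll = {101: 50000}                     # referencing (child) table


def insert_payroll(emp, salary):
    """Rule 1: the emp value must already exist in the employee table."""
    if emp not in employee:
        raise ReferentialIntegrityError("no employee %d" % emp)
    payroll[emp] = salary


def delete_employee(emp):
    """Rule 2: an employee still referenced by payroll cannot be deleted."""
    if emp in payroll:
        raise ReferentialIntegrityError("employee %d is still on payroll" % emp)
    del employee[emp]


def attempt(action, *args):
    """Run one change and report whether RI accepted it."""
    try:
        action(*args)
        return "ok"
    except ReferentialIntegrityError as err:
        return "rejected: %s" % err


results = [
    attempt(insert_payroll, 999, 40000),  # unknown employee
    attempt(delete_employee, 101),        # still on the payroll
]
del payroll[101]                          # remove the referencing row first
results.append(attempt(delete_employee, 101))
print(results)
```

The fired employee can only leave the employee table once the payroll row is gone, which is exactly the oversight RI is meant to catch.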

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that, just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
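The three steps just described (hand out blocks, hash and redistribute, sort) can be imitated in Python. Treat this as a rough sketch under several assumptions: the system has four AMPs, blocks hold three rows instead of 64K of data, zlib.crc32 stands in for Teradata's real hashing algorithm, and the steps run one after another rather than in parallel. The function names are invented for illustration.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size


def row_hash(primary_index_value):
    """Stand-in hash; Teradata's real hashing algorithm is not reproduced here."""
    return zlib.crc32(str(primary_index_value).encode("utf-8"))


def amp_for(primary_index_value):
    """Map a row to the AMP that owns it, based on the hash of its primary index."""
    return row_hash(primary_index_value) % NUM_AMPS


def fastload(rows, rows_per_block=3):
    # Step 1: blocks of rows are handed to the AMPs in turn, regardless of ownership.
    received = [[] for _ in range(NUM_AMPS)]
    for block_number, start in enumerate(range(0, len(rows), rows_per_block)):
        received[block_number % NUM_AMPS].extend(rows[start:start + rows_per_block])

    # Step 2: every AMP hashes the rows it received and sends each to its proper AMP.
    owned = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for row in amp_rows:
            owned[amp_for(row[0])].append(row)

    # Step 3: each AMP sorts its own rows by hash value (standing in for Row ID).
    for amp_rows in owned:
        amp_rows.sort(key=lambda row: (row_hash(row[0]), row[0]))
    return owned


rows = [(emp, "name%d" % emp) for emp in range(1, 21)]  # first column is the primary index
amps = fastload(rows)
for amp_number, amp_rows in enumerate(amps):
    print("AMP", amp_number, "holds emp values", [row[0] for row in amp_rows])
```

Because ownership depends only on the hash of the primary index value, any AMP can work out where a row belongs without asking anyone else, which is what lets the redistribution run in parallel.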

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.
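The faucet idea can be pictured with a short Python sketch. Everything in it is hypothetical: the peak hours, the two rates, and the function names are made up to show how a time-of-day throttle changes the time needed to drain a backlog. It is not Tpump's actual configuration or scripting syntax.

```python
PEAK_HOURS = range(8, 18)   # assumed rush hour: 8:00 through 17:59
PEAK_RATE = 600             # assumed statements per minute while users are busy
OFF_PEAK_RATE = 60000       # assumed statements per minute at full throttle


def rate_for_hour(hour):
    """Return the allowed load rate (statements per minute) for an hour of the day."""
    return PEAK_RATE if hour in PEAK_HOURS else OFF_PEAK_RATE


def hours_to_load(pending, start_hour):
    """Count the hours needed to drain `pending` transactions from `start_hour`."""
    hours = 0
    hour = start_hour
    while pending > 0:
        pending -= rate_for_hour(hour) * 60  # transactions loaded during this hour
        hour = (hour + 1) % 24
        hours += 1
    return hours


print(hours_to_load(5000000, start_hour=22))  # started at night, full throttle
print(hours_to_load(5000000, start_hour=9))   # started in the rush hour, trickling
```

With these made-up rates, the same five million transactions finish in a couple of hours overnight but take most of the day when the load has to trickle alongside busy users.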

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance - Only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

write on the card, "I'm sorry, I love you." The beautiful bouquet arrived at the door. But then his wife read the words the florist had actually written in haste: "I'm sorry I love you."

The top reasons to build data marts directly from detail data are:

• Users can get answers from the data mart but must validate their findings or check out additional information from the detail that built it
• There is only one consistent version of the truth
• Maintenance is easy

If a user comes up with a data mart answer that does not make sense, then he or she has the ability to drill down into the detail and investigate. Sometimes summary data can spark interest, and finding out the "why" can result in big bucks.

If users don't trust the data, they won't use the system. When a data warehouse is built on a foundation of detail data and then data marts are erected from that foundation, you have a winning combination. The results will always be consistent and trustworthy. However, you should only build data marts when there is a credible business case, and you should be ready to drop them when they are no longer needed. The life span of a data mart is relatively short compared to that of its mother and father (better known as the detail data). If you build the data marts from the detail, it makes them easy to manage, easy to drop, and easy to change.

Rule 6 - Make Scalability Your Best Friend

Plan your life for a million tomorrows and live your life as if tomorrow may be your last

Morgan Jones

The roar of class-6 rapids on a river in Suriname can be almost deafening against the dense walls of the jungle. Especially when you are 9 years old. Our mission was to lower our canoe down the waterfall with ropes. The Trio Amer-Indian who anchored our 40-foot dugout canoe let go of the anchor rope too quickly. Without warning, the heavy boat began a freefall through the rocky water with my father hanging onto the side for dear life. He disappeared under the rocky waters, and I knew for sure we had lost him. My heart pounded against my chest. As I rallied myself to grasp this loss as only a nine-year-old can, the Indians abruptly began cheering wildly above the roar of the river. My dad had resurfaced a hundred yards downstream, battered and bruised, but he was alive. In just one short minute, I determined that I would love my family every day as if there were no tomorrow.

As I made my family my best friend, a data warehouse must make scalability its best friend. A data warehouse that does not scale will have no tomorrow. It is only a matter of time until the warehouse disappears in rocky waters, never to come up for air. Don't let go of the anchor rope.

The data-warehousing environment will throw obstacles in your way every single day. A data warehouse must be planned to meet today's needs. But it must also be capable of meeting tomorrow's challenges. The future cannot be predicted, so plan for unlimited growth or linear scalability, both vertical and horizontal. There are so many data warehouses that start out with sizzling performance, but as they grow, they eventually and inevitably hit the scalability wall. However, before they hit the wall, there is a pattern of diminishing performance.

A data warehouse designed without scalability in mind is doomed before it is begun. It can never reach its potential. Take the scalability question out of the equation by investing in a database that allows you to start small but grows linearly.

In today's fast-paced world, Gigabytes soon become Terabytes. It may not sound like much, but it weighs a ton on the shoulders of giants. Listen to these measurements and pick your data warehouse's life span. For example, if you lived for a million seconds (Megabyte), then you would live for about 11.6 days. In comparison, if you lived for a billion seconds (Gigabyte), then you would live for about 31.7 years. Plus, if you lived for a trillion seconds (Terabyte), then you would live for about 31,688 years.

How nice it would be on your 31,688th birthday that people would say, "You sure look good for your age."
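For readers who like to do the math, the life spans can be checked with a few lines of Python. The only assumption is an average year of 365.25 days; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumes an average 365.25-day year


def lifetime(seconds):
    """Convert a number of seconds into (days, years)."""
    return seconds / SECONDS_PER_DAY, seconds / SECONDS_PER_YEAR


for label, seconds in [("million", 10**6), ("billion", 10**9), ("trillion", 10**12)]:
    days, years = lifetime(seconds)
    print("%8s seconds: %16.1f days = %12.2f years" % (label, days, years))
```

Each step up multiplies the life span by a thousand, which is the same jump a warehouse makes when it grows from Gigabytes to Terabytes.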

Data warehouses hit the wall of scalability because they cannot grow at the same rate that the amount of data being gathered grows. Teradata allows for unlimited linear scalability. Linear Scalability is a building block approach to data warehousing that ensures that as building blocks are added, the system continues at the same performance level.

This is why the largest data warehouses in the world use Teradata. I was lucky to be in the right place at the right time and taught beginning stages at what are considered the two largest data warehouse sites in the world: Southwestern Bell (SBC) and Wal-Mart.

Wal-Mart's data warehouse started with less than 30 gigabytes, and SBC started with less than 200 gigabytes and 100 users. Both warehouses:

• Started small and simple
• Used Teradata from the beginning
• Have built the largest Enterprise Data Warehouse in their respective industries
• Continue to realize additional Return On Investment (ROI) on an annual basis
• Have grown to more than 10 Terabytes of data and are still growing
• Have thousands of users (some estimates are shocking)
• Have educated and experienced data warehouse staffs
• Have educated and experienced data warehouse users
• Experience continual growth without boundaries
• Have experienced linear performance by Teradata in every single upgrade (from gigabytes to terabytes and from terabytes to tens of terabytes)
• Both companies are impressed with Teradata's power and performance
• And both SBC and Wal-Mart are committed to the excellence of Teradata

A data warehouse is built in small building blocks. Linear Scalability is described in three ways:

First, building blocks are added until the performance requirements of your environment are met (Guaranteed Success).

Second, every time the data doubles, building blocks are doubled and the system maintains its performance level (Guaranteed Success), and

Third, any time the environment changes, building blocks are added until performance requirements are met (Guaranteed Success).

Scalability is not just about growing the data volume. It also means growing or increasing the number of users. Many systems work flawlessly until as few as 5 users are added; then they slow down to a crawl. Companies need a system where growth and performance are easily calculated and implemented. That means a system where the number of users, size and complexity of queries, volume of data, and number of applications being used can be calculated and compared to the current system's actual size. If more power, speed, or size is needed, then the company can simply add building blocks to the system until the requirements are met.
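The kind of calculation described above might look like the following Python sketch. It rests on an idealized assumption that throughput grows in exact proportion to the number of building blocks, and the throughput figure of 1 GB per block per second is invented purely for illustration. Real sizing would use measured numbers.

```python
import math

GB_PER_BLOCK_PER_SECOND = 1.0  # hypothetical throughput of one building block


def response_time(data_gb, blocks):
    """Idealized linear model: time to scan the data with all blocks working in parallel."""
    return data_gb / (blocks * GB_PER_BLOCK_PER_SECOND)


def blocks_needed(data_gb, target_seconds):
    """Smallest number of building blocks that meets the target response time."""
    return math.ceil(data_gb / (target_seconds * GB_PER_BLOCK_PER_SECOND))


print(response_time(200, blocks=4))   # today's system
print(response_time(400, blocks=8))   # data doubles, blocks double, time holds
print(blocks_needed(1000, target_seconds=50))
```

In this model, doubling the data and doubling the blocks leaves the response time unchanged, which is the second description of Linear Scalability given above.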

Rule 7 - Model the Data Correctly

You will find only what you bring in

Yoda, Jedi Master in Star Wars

We model a database for the same reasons that Boeing builds an aircraft model to test flight characteristics in a wind tunnel. It's simpler and cheaper to model than to reconstruct the plane by iterations until you get it right.

A proper data model should be designed to reflect the business components and possible relationships.

Here are three rules for modeling data in a data warehouse:

1. Model the data quickly
2. Normalize the detail data
3. Use a dimensional model for data marts

The 3rd Normal Form holds that each column in a table should be directly related to the primary key, the whole key, and nothing but the key. Data is placed into tables where it makes the most sense and has no repeating groups, derived data, or optional columns. This allows users to ask any question at any time on all data within the enterprise. Users do not have to strive for 3rd Normal Form, but just normalize the data the best they can. There will be fewer columns in a table but a lot more tables overall. This model is easier to maintain, incredibly flexible, and allows a user to ask any question on any data at any time.

A Star-Schema model is comprised of a fact table and a number of dimension tables. The fact table is a table with a multi-part key. Each element of the key is itself a foreign key to a single dimension table. The remaining fields in the fact table are known as facts and are numeric, continuously valued, and additive. Facts can be thought of as measurements taken at the intersection of all of the dimensions. Dimension attributes are mostly textual and are almost always the source of constraints and report breaks. This model enhances performance on known queries, or in other words, queries users run repeatedly day after day.
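A tiny Star-Schema can be mocked up in Python to show how the pieces fit together. This is only an illustration: the store, product, and day dimensions, the sample rows, and the function name are all made up, and dictionaries stand in for real tables.

```python
from collections import defaultdict

# Dimension tables: key -> mostly textual attributes (hypothetical sample data).
store_dim = {1: {"city": "Dayton"}, 2: {"city": "San Diego"}}
product_dim = {10: {"category": "Books"}, 11: {"category": "Music"}}
day_dim = {20010101: {"month": "January"}, 20010201: {"month": "February"}}

# Fact table: the key has three parts (store, product, day), each a foreign key
# to exactly one dimension table. The last field is a numeric, additive fact.
sales_fact = [
    (1, 10, 20010101, 120.0),
    (1, 11, 20010101, 80.0),
    (2, 10, 20010101, 200.0),
    (2, 10, 20010201, 50.0),
]

# Every part of every fact key must resolve to a row in its dimension table.
assert all(s in store_dim and p in product_dim and d in day_dim
           for s, p, d, _ in sales_fact)


def sales_by_city(category):
    """Total the sales fact per city, constrained to one product category."""
    totals = defaultdict(float)
    for store_id, product_id, day_id, amount in sales_fact:
        if product_dim[product_id]["category"] == category:  # constraint on a dimension
            totals[store_dim[store_id]["city"]] += amount     # report break on a dimension
    return dict(totals)


print(sales_by_city("Books"))
```

The constraint (category) and the report break (city) both come from dimension attributes, while the only thing being added up is the fact itself.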

Most database modelers prefer to create a logical model in 3rd Normal Form, but most database engines are overcome by physical limitations, so they must compromise the model. The four most difficult functions for a database to handle are:

• Join tables
• Aggregate data
• Sort data
• Scan large volumes of data

In order to get around these system limitations, vendors will suggest a model to avoid joins, use summarized data to avoid aggregation, store data in sorted order to avoid sorts, and overuse indexes to avoid large scans.

With these limitations, vendors are also going to avoid being able to compete. That is like placing a ball and chain around the runner's leg and saying, "I wish you all the best in the marathon." Come on! Whose side are these vendors really on?

Teradata is the only database engine I have seen that has the power and maturity to use a 3rd Normal Form physical model on databases exceeding a terabyte in size. Because of the physical limitations, other databases have had to use a Star-Schema model to enhance performance but have given up on the ability to perform ad-hoc queries and data mining.

A normalized model is one that should be used for the central data warehouse. It allows users to ask any question at any time on information from any place within the enterprise. This is the central philosophy of a data warehouse. It leads to the power of ad-hoc queries and data mining, whereby advanced tools discover relationships that are not easily detected but do exist naturally in the business environment.

A Star-Schema model enhances performance on known queries because we build our assumptions into the model. While these assumptions may be correct for the first application, they may not be correct for others. Flexibility is a big issue, but data marts can be dropped and added with relative ease if each is built directly from the detail data.

Remember, build the data warehouse around detail data using a normalized model. Then, as query patterns emerge and performance for well-known queries becomes a priority, Star-Schema data marts can be created by extracting summarized or departmental data from the centralized data warehouse. The user will then have access to both the data marts for repetitive queries and the central warehouse for other queries.

Because data marts can be an administrative nightmare, Teradata enables Star-Schema access without requiring physical data marts. By setting up a join index as the intersection of your Star-Schema model, you can create a Star-Schema structure directly from your 3rd Normal Form data model. Best of all, once it is created, the data is automatically maintained as the underlying tables are updated.

Keep in mind, 80% of data warehouse queries are repetitive, but 80% of the Return On Investment (ROI) is actually provided by the other 20% of the queries that go against detailed data in an iterative environment. By using a normalized model for your central data warehouse and a Star-Schema model on data marts, you can enhance the possibility of realizing an 80% Return on Investment and still enhance the performance on 80% of your queries.

Rule 8 - Don't Let a Technical Issue Make Your Data Warehouse a Failure Statistic

Experience is a hard teacher because she gives the test first, the lesson afterwards.

Scottish Proverb

Did you know that 3/4ths of the people in the world hate fractions and that 40% of the time a data warehouse fails, it is because of a technical issue? There are many traps and pitfalls in every data warehouse venture. One winter day, a hunter met a bear in the forest. The bear said, "I'm hungry. I want a full stomach." The man replied, "Well, I'm cold. I would like a fur coat." "Let's compromise," said the bear, and he quickly gobbled up the hunter. They both got what they asked for. The bear went away with a full belly, and the man left wrapped in a fur coat. With that in mind, good judgment comes from experience; experience comes from bad judgment. You have shown good judgment by reading this book, so let our experience keep your company from having a bad data warehouse experience.

Author Daniel Boorstin wrote in The Discoverers, "The greatest obstacle to discovering the shape of the earth, the continents, and the oceans was not ignorance but rather the illusion of knowledge." There is a lot of illusion of knowledge being spread around in the data-warehousing environment. Before you decide on any data warehouse product, ask yourself and the vendor these questions:

• As my data demands increase, will the system be able to physically load the data? Our experience shows that many systems are not capable of handling very large volumes of data. Do the math.
• As the data grows in volume, can the system meet the performance requirements? Do the math.
• As the number of users grows, will the system be able to scale? Do the math.
• As my environment changes, will the system be flexible enough to allow changes quickly and easily? Do the math.
• Will the system need so many Database Administrators (DBAs) that my system's cost skyrockets? Do the math.
• If we suddenly merged with another company and needed to incorporate into their mainframe or LAN environment, would the system be able to connect and include them? Do the math.
• Can I continue to meet my batch window timeframes? Do the math.
• Could I become the hero of the company one day, only to have some technical glitch blamed on me because of my poor foresight and be thrown out of the company into a giant mud puddle? Do the bath.

Rule 9 - Take a Building Block Approach

Be not afraid of growing slowly be afraid only of standing still

Chinese Proverb

Ever since Vasco de Balboa discovered the Pacific coast of Panama in 1513, kings and businessmen alike dreamed of the impossible: to cut a waterway across the mountainous isthmus, creating a shortcut between the Atlantic and Pacific Oceans. Those dreams turned into reality during the Industrial Revolution. It took almost forty years of trial and error before the world's greatest engineering feat since the Pyramids was completed in 1914. Ships move through the locks of the canal, rising 85 feet above sea level before they descend to the opposite side. Since its grand opening in 1920, the Panama Canal has revolutionized trans-oceanic traffic, joining East and West. Its 50-mile stretch saves every vessel about 8,000 extra nautical miles of travel around the bottom tip of South America. Several modifications have been engineered through the years to accommodate the increasing size of ships.

Data warehouses, like the Panama Canal, must be built over time and changed over time to meet new demands. A data warehouse must grow with the environment, but the environment is unpredictable. All sailors know that they can't direct the wind, but that they can adjust their sails. In comparison, all data warehouse users know they can't direct the environment, but they can adjust their warehouse. Sometimes the data warehouse will grow quickly, and sometimes it will grow slowly, but it should always be growing.

So take a building block approach to data warehousing. Teradata allows you to expand without boundaries, one building block at a time. Plus, adding on building blocks is easy.

There are two aspects to a building block approach. First, you need to add applications to your data warehouse in three- to six-month intervals. Once the first application works, then you are ready for more projects. As you become more experienced with this approach, you can add multiple projects in parallel by involving multiple organizations.

The second aspect of the building block approach is in the actual data warehouse architecture. It doesn't matter if yours is the smallest data warehouse in the world, the largest, or falls somewhere in between; power and scalability always fuel success.

Not long ago, a customer flew out to San Diego for a Teradata demonstration and benchmark. The benchmark ran late into the evening, but the numbers were more than 50% better than the competition. The customer was extremely impressed, but before buying, he demanded to see the system scalability that everyone had been talking about. Although it was already late, a Teradata employee was called in the middle of the night, arrived within 10 minutes (in pajamas), hooked up the building blocks, and ran a utility called config. She ran another called reconfig, and in less than two hours the system size doubled.


As the environment changes in terms of users, data, complexity, capacity, batch windows, time changes, events, or opportunities, users should be able to continue building applications and architecture. The more a Teradata system grows, the more Teradata outshines the competition.

Rule 10 - Buy a Teradata Data Warehouse

Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing had happened.

Winston Churchill

Winston Churchill led Britain through World War II during what he called that country's "finest hour." When users see consistent data, the system, too, is in its finest hour. Teradata gives users the ability to ask questions they could never ask before. Users trust Teradata because of its industry performance and reputation, and because it never gives in. Constant use gives users optimal business experience, and no matter what a user asks, the system responds with a hearty "Yes, Sir!"

When we explained Teradata to Churchill, he said:

"A data WARe-house that consists of 250 data marts is like poison, and if I were the MIS department responsible for maintaining them, I'd take it."

Teradata guarantees an Enterprise Data Warehouse with no scalability issues. Data loads like lightning, and system administration is a breeze. You can pick the performance level that meets your requirements for today and forever. The database can be normalized around detail data, and because of Teradata's power, users have the flexibility to ask any question, at any time, on any data.

All other databases are suspect in data loading capabilities, scalability, reference sites, decades of data warehouse experience, flexibility, system administration difficulties, and inability to handle the complex queries of today's users. These users are good.

Teradata - The Shining Star

Overview

Teradata has always been at the top of the data warehouse game, even if the experts weren't bright enough to know it. The incredible vision that the original designers had was tremendous. It was so far to the left of genius that most thought the idea was impossible.

Only he who attempts the ridiculous may achieve the impossible

Don Quixote

The Teradata database was originally designed in 1976, and many of the fundamental concepts still remain today. Nearly 25 years later, Teradata is still considered ahead of its time.

In 1976, IBM mainframes dominated the computer business. Everyone who was anyone had an IBM mainframe. However, the original founders of Teradata noticed that it took about 4½ years for IBM to produce a new mainframe. They also noticed a little company called Intel. Intel created a new PC chip every 2½ years. With mainframes moving forward every 4½ years and PC chips every 2½ years, Teradata recognized their vision: to network enough PC chips together that the mainframe would be overpowered, yet costs would be hundreds of times cheaper than a mainframe. The Teradata team estimated the power surge would come in 1990.

IBM laughed out loud. They said, "Let's get this straight... you are going to network a bunch of PC chips together and overpower our mainframes? That's like plowing a field with 1,000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as "just a bunch of PCs in a cabinet."

Teradata was convinced it could produce a product that would power large amounts of data and achieve the impossible: using PC technology in mainframe territory. Its founders agreed with Napoleon Bonaparte, who asserted, "The word 'impossible' is not in my dictionary." Sure enough, when we looked in his dictionary, that word was not there. And it is not in Teradata's Data Dictionary either. The Teradata team set two goals: build a database that could

• Perform parallel processing, and
• Accommodate a Terabyte of data

Driving in the car one evening, Morgan's eight-year-old daughter Kara piped up from the back seat, "Daddy, can you buy Teradata in the store? I mean, what does Teradata really do?" Morgan thought for a moment and then replied, "Do you remember when you went on the Easter egg hunt last spring? Well, imagine that we had fifty eggs and you were the only child there. If I asked you to find all the purple eggs, would you be able to do that?" Kara said, "Sure. But it might take me a long time." Morgan continued, "What if we now let fifty children go in and I asked them to show me all of the purple eggs? How long would that take?" His daughter responded, "It wouldn't take any time at all, because each child would only have to look at one egg." That is precisely how Teradata works. It divides up huge tasks among its processors and tackles each portion simultaneously, with amazing speed. And it doesn't matter if you have a trillion eggs in your basket.

In 1984 the DBC/1012 was introduced. Since then, Teradata has been the dominant force in data warehousing. Teradata got the chickens plowing and is considered outstanding. Meanwhile, IBM's plow is out rusting in its field.

Parallel Processing

An invasion of armies can be resisted, but not an idea whose time has come.

Victor Hugo

Parallel processing gives Teradata the ability to have unlimited users, unlimited power, and unlimited scalability. This is an idea whose time has come. So what is parallel processing? Let us explain.

It was 10 p.m. on a Saturday night and two friends were having dinner and drinks. One of the friends looked at his watch and said, "I have to get going." The other friend responded, "What's the hurry?" His friend went on to tell him that he had to leave to do his laundry at the Laundromat. The other friend could not believe his ears. He responded, "What? You're leaving to do your laundry on a Saturday night? Do it tomorrow!" His buddy went on to explain that there were only 10 washing machines at the laundry. "If I wait until tomorrow, it will be crowded and I will be lucky to get one washing machine. I have 10 loads of laundry, so I will be there all day. If I go now, there will be nobody there and I can do all 10 loads at the same time. I'll be done in less than an hour and a half."


This story describes what we call "Parallel Processing." Teradata is the only database in the world that loads data, backs up data, and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds... even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every database's clock in performance tests.

After enlightenment, the laundry.

Zen Proverb

After parallel processing the laundry, enlightenment.

Teradata Zen Proverb

With the computer world seeing Terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, this is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata has unlimited power and grows without boundaries, and was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

A ship in harbor is safe, but that's not why ships are built.

John Shedd

In 1805 the pivotal Battle of Trafalgar matched Britain's flotilla of battleships against the combined fleets of France and Spain. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to cut through the enemy line, attacking the ships at their most vulnerable point, the stern. That battle crippled the combined fleet and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating the mainframes on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive, and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed managing terabytes of data.

A Personal Computer (PC) is made up of the following components:


Processor Chip – This is the brain of the computer. All tasks are done at the direction of the processor.

Memory – This is the hand of the computer. The memory allows data to be viewed, manipulated, changed, or altered. Data is brought in from the hard drive and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive – This is the spine of the computer. The hard drive stores data, applications, and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor applies your changes to the document while it is held in memory. Upon completion, you close the document and the processor writes all the changes to the disk, where it is stored.

In the picture below we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight best friends.

Teradata Spreads Data over Multiple Processors

I don't mind starting the season with unknowns. I just don't like finishing the season with them.

Coach Lou Holtz

With Teradata you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The next Teradata example shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask for a list of best friends, both processors will retrieve their data in parallel and will return combined results over the connecting network. This arrangement could easily double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four records, and simultaneously the other processor reads the remaining four records. So how could we double the speed of this system again?

Teradata has Linear Scalability

Every ceiling, when reached, becomes a floor, upon which one walks and now can see a new ceiling.

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." This allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow and there are no worries about the disk becoming the bottleneck of data.

Notice in the system below there are four processors, and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since data is spread evenly over four processors, each processor reads its two rows at the same time as the other three. Now the system is four times faster than the single-processor example.

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the AMPs and processed in parallel.
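The arithmetic behind the Best_Friends examples can be sketched in a few lines of Python. This is an idealized model, not a measurement: it assumes rows spread as evenly as possible and ignores any coordination cost, and the function names and the one-second-per-row figure are made up for illustration.

```python
def rows_per_processor(total_rows: int, processors: int) -> list[int]:
    """Spread rows as evenly as possible; return the row count held by each processor."""
    base, extra = divmod(total_rows, processors)
    return [base + (1 if i < extra else 0) for i in range(processors)]


def scan_time(total_rows: int, processors: int, seconds_per_row: int = 1) -> int:
    """Processors scan in parallel, so elapsed time follows the busiest processor."""
    return max(rows_per_processor(total_rows, processors)) * seconds_per_row


# The eight-row Best_Friends table on one, two, and four processors.
for n in (1, 2, 4):
    print(n, rows_per_processor(8, n), scan_time(8, n))
```

In this model, doubling the processor count halves the scan time for as long as the rows divide evenly, which is the behavior the text calls Linear Scalability.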


A Logical View of the Teradata Architecture

You are either making history or you are history

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when they actually never had a chance to succeed. In ancient days, Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies, too, will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and then is given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain information on their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.


Parsing Engine (PE)

Even a stopped clock is right twice a day

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards!" Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and Southwestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions – some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions and Teradata returns the answers.

Access Module Processor (AMP)

Wise men talk because they have something to say; fools talk because they have to say something.

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above: "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of little words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best example is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it's the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk and it shares this with no other AMP; hence, a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP will read only the rows on its own disk. Because this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.
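The "slowest AMP" remark can be made concrete with a small, simplified sketch. It assumes every AMP reads rows at the same hypothetical rate of one millisecond per row and that nothing else contributes to elapsed time; the row counts are invented for the comparison.

```python
def query_time_ms(rows_on_each_amp: list[int], ms_per_row: int = 1) -> int:
    """AMPs read in parallel, so the query finishes when the busiest AMP finishes."""
    return max(rows_on_each_amp) * ms_per_row


# One million rows on four AMPs, spread evenly versus unevenly.
even = [250_000, 250_000, 250_000, 250_000]
skewed = [700_000, 100_000, 100_000, 100_000]

print(query_time_ms(even))    # time with an even spread
print(query_time_ms(skewed))  # same total rows, but one AMP holds most of them
```

Both layouts hold the same number of rows, yet in this model the skewed layout takes nearly three times as long, which is why an even spread matters so much.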

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs, while others have more than 2,000.

The BYNET

Even if you're on the right track, you'll still get run over if you just sit there.

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET, or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax;
• The PE checks the user's security rights;
• The PE comes up with a plan for the AMPs to follow;
• The PE passes the plan along to the AMPs over the BYNET;
• The AMPs follow the plan and retrieve the data requested;
• The AMPs pass the data to the PE over the BYNET; and
• The PE then passes the final data to the user.
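The steps above can be mimicked with a toy model in Python. Everything here is a hypothetical stand-in: the Amp and ParsingEngine classes, the user name, and the use of a thread pool in place of the BYNET are illustration only and say nothing about how Teradata is implemented. The "plan" is reduced to a simple row filter, and syntax checking is skipped.

```python
from concurrent.futures import ThreadPoolExecutor


class Amp:
    """Toy AMP: owns its rows and is the only one that reads them (shared nothing)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def run(self, plan):
        # The AMP follows the plan against its own rows only.
        return [row for row in self.rows if plan(row)]


class ParsingEngine:
    """Toy PE: checks rights, sends the plan to every AMP, merges the answers."""

    def __init__(self, amps, access_rights):
        self.amps = amps
        self.access_rights = access_rights  # maps user -> set of readable tables

    def query(self, user, table, plan):
        # Security check (the syntax check is assumed to have passed).
        if table not in self.access_rights.get(user, set()):
            raise PermissionError(f"{user} has no rights on {table}")
        # The thread pool stands in for the BYNET: all AMPs get the plan at once.
        with ThreadPoolExecutor(max_workers=len(self.amps)) as bynet:
            partial_results = list(bynet.map(lambda amp: amp.run(plan), self.amps))
        # Merge the AMPs' answers and hand the final data back to the user.
        return [row for part in partial_results for row in part]


amps = [Amp(["Ann", "Bob"]), Amp(["Cal", "Dee"])]
pe = ParsingEngine(amps, {"tom": {"Best_Friends"}})
friends = pe.query("tom", "Best_Friends", lambda name: True)
print(friends)
```

A user without rights on the table is rejected before any AMP does work, which mirrors the guard-dog behavior described for the PE.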

Teradata Building Block Approach

Better a diamond with a flaw than a pebble without one


Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array, where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them.


Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer ID, customer name, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, Item No, Quantity, and Customer ID.

An example of each table follows:

CUSTOMER TABLE, called CustomerTable

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillards       Atlanta    106227              Smith

ORDER TABLE, called OrderTable

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest unit that can be inserted into a table. A column is the smallest unit within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null, or are sometimes represented by a "?".

One column, or combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (e.g., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.
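A PK-to-FK join over the sample CustomerTable and OrderTable can be tried out with the sketch below. It runs on Python's built-in sqlite3 module rather than on Teradata, so the DDL is SQLite's, not Teradata's. The column names without spaces, the ISO date format, and the omission of the Order Number and Customer Rep columns from CustomerTable are simplifications assumed for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CustomerTable (
        CustomerID   INTEGER PRIMARY KEY,   -- PK: unique, never null
        CustomerName TEXT,
        CityName     TEXT
    );
    CREATE TABLE OrderTable (
        OrderNumber  INTEGER PRIMARY KEY,   -- PK of the order table
        OrderDate    TEXT,
        ItemNo       INTEGER,
        Quantity     INTEGER,
        CustomerID   INTEGER REFERENCES CustomerTable(CustomerID)  -- FK
    );
    INSERT INTO CustomerTable VALUES
        (1001, 'JC Penney', 'Dallas'),
        (1002, 'Office Depot', 'Columbia'),
        (1003, 'Dillards', 'Atlanta');
    INSERT INTO OrderTable VALUES
        (105372, '2001-03-07', 212, 20, 1001),
        (105799, '2001-04-18', 296, 52, 1002),
        (106227, '2001-10-17', 325, 17, 1003);
    """
)

# JOIN by matching the customer table's PK to the order table's FK.
joined = conn.execute(
    """
    SELECT c.CustomerName, o.OrderNumber, o.Quantity
    FROM CustomerTable AS c
    JOIN OrderTable AS o ON o.CustomerID = c.CustomerID
    ORDER BY o.OrderNumber
    """
).fetchall()

for row in joined:
    print(row)
```

Each result row combines a customer name from one table with order details from the other, matched through the shared CustomerID value.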

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                         FOREIGN KEY
Not optional                        Optional
Comprised of one or more columns    Comprised of one or more columns
Can only have one PK per table      Can have multiple FKs per table
No duplicates allowed               Duplicates allowed
No changes allowed                  Changes allowed
No nulls allowed                    Nulls allowed

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly, so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way:

First, when each AMP has nearly the same quantity of table rows, no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In the same way, the rows of the Customer table and the rows of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.
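The four-AMP, three-table layout just described can be modeled roughly as follows. The round-robin placement is a deliberately simple stand-in for Teradata's real distribution mechanism, and the row counts and row labels are invented for the example.

```python
from collections import defaultdict


def load(tables: dict[str, list[str]], amp_count: int) -> list[dict[str, list[str]]]:
    """Return one dict per AMP, mapping table name -> the rows that AMP holds."""
    amps = [defaultdict(list) for _ in range(amp_count)]
    for table, rows in tables.items():
        for i, row in enumerate(rows):
            # Round-robin placement: a simple stand-in for the real distribution.
            amps[i % amp_count][table].append(row)
    return amps


tables = {
    "Employee": [f"emp-{i}" for i in range(8)],
    "Customer": [f"cust-{i}" for i in range(12)],
    "Order": [f"ord-{i}" for i in range(16)],
}
amps = load(tables, amp_count=4)

# Each AMP keeps its rows grouped by table, so a Customer query on an AMP
# goes straight to that AMP's Customer group and never touches the others.
customer_counts = [len(amp["Customer"]) for amp in amps]
print(customer_counts)
```

Every AMP ends up with a quarter of each table, and reading one table means reading only that table's group on each AMP.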

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him, saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.
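The two directions of the Primary Index can be illustrated with a toy model. The routing rule below (a CRC32 checksum taken modulo the AMP count) is an assumption chosen only because it is deterministic and easy to read; it is not Teradata's hashing algorithm, and the table contents are invented.

```python
import zlib

AMP_COUNT = 4  # hypothetical system size

# Each toy AMP stores its rows keyed by Primary Index value.
amps = [dict() for _ in range(AMP_COUNT)]


def amp_for(pi_value) -> int:
    """Direction 1: the PI value alone decides which AMP owns the row."""
    return zlib.crc32(str(pi_value).encode()) % AMP_COUNT


def insert(row: dict, pi_column: str = "emp") -> None:
    amps[amp_for(row[pi_column])][row[pi_column]] = row


def get_by_pi(pi_value):
    """Direction 2: with the PI value in hand, only the owning AMP is asked."""
    return amps[amp_for(pi_value)].get(pi_value)


def get_by_other_column(column: str, value) -> list[dict]:
    """Without the PI value, every AMP has to look through its rows."""
    return [row for amp in amps for row in amp.values() if row[column] == value]


for emp in range(1, 101):
    insert({"emp": emp, "dept": emp % 5})

print(get_by_pi(42))
print(len(get_by_other_column("dept", 2)))
```

The same function that places a row is reused to find it again, which is why a lookup by Primary Index value touches a single AMP while any other lookup in this model touches all of them.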


There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column emp follows:

    CREATE TABLE employee
    (emp        INTEGER,
     dept       INTEGER,
     lname      CHAR(20),
     fname      VARCHAR(20),
     salary     DECIMAL(10,2),
     hire_date  DATE)
    UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON:

    CREATE TABLE TomC.employee
    (emp        INTEGER,
     dept       INTEGER,
     lname      CHAR(20),
     fname      VARCHAR(20),
     salary     DECIMAL(10,2),
     hire_date  DATE)
    PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

    CREATE TABLE employee
    (emp        INTEGER,
     dept       INTEGER,
     lname      CHAR(20),
     fname      VARCHAR(20),
     salary     DECIMAL(10,2),
     hire_date  DATE)
    UNIQUE PRIMARY INDEX(emp, dept);
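One way to picture the difference between a UPI and a NUPI is the toy model below. It assumes that rows are routed by a checksum of their Primary Index value (an illustrative rule, not Teradata's hashing), so rows sharing a NUPI value land on the same AMP, while a UPI refuses a second row with the same value. The Table class and the sample rows are hypothetical.

```python
import zlib

AMP_COUNT = 4  # hypothetical system size


def amp_for(key: tuple) -> int:
    """Toy routing rule: checksum of the (possibly multi-column) PI value."""
    text = "|".join(str(part) for part in key)
    return zlib.crc32(text.encode()) % AMP_COUNT


class Table:
    def __init__(self, pi_columns: tuple, unique: bool):
        self.pi_columns = pi_columns
        self.unique = unique
        self.keys = set()
        self.amps = [[] for _ in range(AMP_COUNT)]

    def insert(self, row: dict) -> None:
        key = tuple(row[column] for column in self.pi_columns)
        if self.unique and key in self.keys:
            raise ValueError(f"duplicate value {key} for unique primary index")
        self.keys.add(key)
        self.amps[amp_for(key)].append(row)


# Eight employees: six in department 10, two in department 20.
rows = [{"emp": emp, "dept": 10 if emp <= 6 else 20} for emp in range(1, 9)]

upi = Table(("emp",), unique=True)     # like UNIQUE PRIMARY INDEX(emp)
nupi = Table(("dept",), unique=False)  # like PRIMARY INDEX(dept)
for row in rows:
    upi.insert(row)
    nupi.insert(row)

print([len(amp) for amp in upi.amps])   # rows per AMP under the UPI
print([len(amp) for amp in nupi.amps])  # rows per AMP under the NUPI
```

In this model the NUPI on dept can occupy at most two AMPs, because there are only two distinct department values, so the choice of Primary Index column directly shapes how evenly the rows spread.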

Being related hardly insures relatability

Michael E Angier


All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEPs use the very same HASH MAP.

The following picture shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
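
The repeating pattern in the two diagrams can be reproduced with a few lines of Python. This is only a sketch for illustration: build_hash_map and print_hash_map are hypothetical helper names, the 48-bucket size simply matches the diagrams, and the layout of the real 65,536-bucket map is internal to Teradata.

```python
# Illustrative only: mimics the simplified diagrams, not Teradata internals.

def build_hash_map(num_amps, num_buckets=48):
    """Return a list where entry i holds the AMP number for bucket i + 1."""
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]


def print_hash_map(hash_map, width=6):
    """Print the map in rows of `width` buckets, like the diagrams."""
    for start in range(0, len(hash_map), width):
        print(" ".join(str(amp) for amp in hash_map[start:start + width]))


four_amp_map = build_hash_map(4)    # AMP numbers cycle 1, 2, 3, 4
eight_amp_map = build_hash_map(8)   # AMP numbers cycle 1 through 8

print_hash_map(four_amp_map)
print()
print_hash_map(eight_amp_map)
```

Because the AMP numbers simply cycle, every AMP ends up with the same number of buckets, which is the "spread the data as equally as possible" idea in miniature.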

How the Hash Map and Primary Index Work Together

Choice not chance determines destiny

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the Whiz-Bang Formula (a secret NCR formula). First we'll explain the theory, and then we will invent our own Whiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

For this example, Teradata will attempt to spread the table rows among the AMPs of the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Whiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Whiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we will get the Primary Index value of the row, run it through the Coffing/Jones Whiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num Friend_Name

2 Ben Hon

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (2) through the Coffing/Jones Whiz-Bang Formula (divide by 2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num Friend_Name

16 Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (16) through the Coffing/Jones Whiz-Bang Formula (divide by 2), and the answer is:

16 divided by 2 = 8

Thus the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
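
The whole layout can be worked out with a short Python sketch of the Coffing/Jones divide-by-2 formula. It is a teaching aid under the same assumptions as the text: the real hashing formula is secret, and the names wiz_bang_bucket, amp_for and distribute are invented here for illustration.

```python
from collections import defaultdict

# The Best_Friends rows and the simplified four-AMP hash map from the text.
BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]
HASH_MAP = [(bucket % 4) + 1 for bucket in range(48)]  # 1 2 3 4 1 2 ...


def wiz_bang_bucket(friend_num):
    """Coffing/Jones stand-in formula: divide the Primary Index value by 2."""
    return friend_num // 2  # bucket numbers start at 1


def amp_for(friend_num):
    """Look inside the bucket to find the destination AMP."""
    return HASH_MAP[wiz_bang_bucket(friend_num) - 1]


def distribute(rows):
    """Place every row on the AMP that its Primary Index value points to."""
    amps = defaultdict(list)
    for row in rows:
        amps[amp_for(row[0])].append(row)
    return dict(amps)


layout = distribute(BEST_FRIENDS)
for amp in sorted(layout):
    print("AMP", amp, layout[amp])
```

With this toy formula each of the four AMPs receives two of the eight rows, and rerunning the script always gives the same layout because the formula is deterministic.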

Remember, the Teradata hashing formula is a secret. The Coffing/Jones Whiz-Bang Formula did not crack the code; its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the real formula is mathematical (similar to the Coffing/Jones Whiz-Bang Formula) and that it will be consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.

If you always do what you always did, you'll always get what you always got.

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, and it merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
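
The one-AMP retrieval just described can be sketched in Python, again using the toy divide-by-2 formula instead of the real, secret hash. The AMPS dictionary and the function select_by_primary_index are hypothetical names used only to show the idea.

```python
# Rows as they were laid out by the toy divide-by-2 formula on four AMPs.
AMPS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}
HASH_MAP = [(bucket % 4) + 1 for bucket in range(48)]  # 1 2 3 4 1 2 ...


def select_by_primary_index(friend_num):
    """Rerun the formula, read the bucket, and ask only that one AMP."""
    bucket = friend_num // 2          # Coffing/Jones stand-in formula
    amp = HASH_MAP[bucket - 1]        # the bucket holds an AMP number
    name = AMPS[amp].get(friend_num)  # only this AMP is contacted
    return amp, name


amp, name = select_by_primary_index(8)
print("AMP", amp, "returned", name)
```

No other AMP is consulted, which is why a Primary Index lookup is the cheapest way to fetch a row.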

The Full Table Scan

What matters is not the size of the dog in the fight but the size of the fight in the dog

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's abilities to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PEP. This process is 10 times faster than most systems. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!
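
The 100-row, 10-AMP arithmetic can be checked with a small Python sketch. It is a simplification: the loop visits the AMPs one after another, whereas real AMPs work at the same time, and full_table_scan is an invented name.

```python
def full_table_scan(amps):
    """Each AMP reads only the rows it owns; the answers are then gathered."""
    answer = []
    rows_read = {}
    for amp_number, rows in amps.items():
        rows_read[amp_number] = len(rows)  # work done by this AMP alone
        answer.extend(rows)                # rows passed back for the answer set
    return answer, rows_read


# 100 rows spread evenly over 10 AMPs, as in the example above.
amps = {amp: [] for amp in range(1, 11)}
for row_id in range(100):
    amps[row_id % 10 + 1].append(row_id)

answer, rows_read = full_table_scan(amps)
longest_amp = max(rows_read.values())

print("rows in answer set:", len(answer))
print("rows read by the busiest AMP:", longest_amp)
```

Since the AMPs run in parallel, the elapsed time is governed by the busiest AMP's 10 rows rather than by all 100, which is where the "10 times faster" figure comes from.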

Most FTS bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows that it individually owns.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result.

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
6           Mary Gray
14          Kyle Marx
8           John Davis
16          Lyn Jones
2           Ben Hon
10          Don Roy
4           Joe Davis
12          Sam Mills

In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It tells the AMP to place the hash in a secondary index sub-table along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking WHERE Friend_Name = 'Ben Hon'. The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (the smiley face in this example) is revealed and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.
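
The two-AMP path of a USI lookup can be sketched in Python. Several things here are assumptions made for illustration: zlib.crc32 stands in for the secret hash formula, a simple (AMP, Friend_Num) pair stands in for the real row-id, and the names amp_for, usi_subtable and select_by_usi are invented. Because the stand-in hash differs from Teradata's, the AMP numbers it produces will not match the picture.

```python
import zlib

NUM_AMPS = 4
BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]


def amp_for(value):
    """Stand-in hash: CRC32 of the value's text form, folded onto AMPs 1-4."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_AMPS + 1


base_table = {amp: {} for amp in range(1, NUM_AMPS + 1)}
usi_subtable = {amp: {} for amp in range(1, NUM_AMPS + 1)}

for friend_num, friend_name in BEST_FRIENDS:
    base_amp = amp_for(friend_num)               # base row placed by Primary Index
    base_table[base_amp][friend_num] = friend_name
    index_amp = amp_for(friend_name)             # index row placed by index value
    usi_subtable[index_amp][friend_name] = (base_amp, friend_num)


def select_by_usi(friend_name):
    """Return the base row and the list of AMPs contacted along the way."""
    touched = [amp_for(friend_name)]             # AMP 1 of 2: the sub-table row
    pointer = usi_subtable[touched[0]].get(friend_name)
    if pointer is None:
        return None, touched
    base_amp, friend_num = pointer
    touched.append(base_amp)                     # AMP 2 of 2: the base row
    return (friend_num, base_table[base_amp][friend_num]), touched


row, touched = select_by_usi("Ben Hon")
print(row, "found by contacting AMPs", touched)
```

However many AMPs the system has, the lookup never contacts more than two of them, one for the sub-table row and one for the base row.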

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!

Join Indexes

A bend in the road is not the end of the road unless you fail to make the turn

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index directly. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.

Teradata Databases, Users and Space

Overview

Choose a job you like and you will never have to work a day of your life

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating The Plan. The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind that all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space

The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT or REVOKE rights from Tom.
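
Finding every owner of an object is just a walk up the hierarchy. The Python sketch below encodes the hierarchy described in the text (SYSDBA under DBC; MRKT, SALES and Morgan under SYSDBA; Tom under Morgan); the PARENT dictionary and the owners_of function are hypothetical names used only to illustrate the idea.

```python
# Each entry maps a user or database to its immediate parent.
PARENT = {
    "DBC": None,
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners_of(name):
    """Return every parent (owner) above `name`, nearest first."""
    owners = []
    parent = PARENT[name]
    while parent is not None:
        owners.append(parent)
        parent = PARENT[parent]
    return owners


print(owners_of("Tom"))
```

For Tom the walk yields Morgan, then SYSDBA, then DBC, matching the answer given above.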

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space,
• Spool Space, and
• Temp Space.

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts: the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, secondary index sub-tables, and permanent journals. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
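
The different bookkeeping for perm and spool can be sketched in Python. The SpaceOwner class and its create_child method are invented for this illustration, the 100/80/20 figures come from the text, and the amounts given to Morgan and Tom are made-up numbers. Following the speed-limit analogy, the sketch assumes a child's spool limit may not exceed its parent's.

```python
class SpaceOwner:
    """A user or database with a perm allotment and a spool limit (in GB)."""

    def __init__(self, name, perm, spool, parent=None):
        self.name = name
        self.perm = perm
        self.spool = spool
        self.parent = parent

    def create_child(self, name, perm, spool):
        """Give perm to a child (subtracted here) and set its spool limit."""
        if perm > self.perm:
            raise ValueError(f"{self.name} does not own {perm} GB of perm")
        if spool > self.spool:
            raise ValueError("spool limit cannot exceed the parent's limit")
        self.perm -= perm      # perm given away is no longer owned by the parent
        # spool is only a limit, so the parent's own spool is left unchanged
        return SpaceOwner(name, perm, spool, parent=self)


dbc = SpaceOwner("DBC", perm=100, spool=100)
sysdba = dbc.create_child("SYSDBA", perm=80, spool=100)
morgan = sysdba.create_child("Morgan", perm=10, spool=100)  # hypothetical amounts
tom = morgan.create_child("Tom", perm=4, spool=100)         # hypothetical amounts

for owner in (dbc, sysdba, morgan, tom):
    print(owner.name, "perm:", owner.perm, "spool limit:", owner.spool)
```

Adding up the perm column always gives the original 100 GB, because all space is owned by someone, while every user in the chain keeps the full spool limit.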

The following picture shows a logical view of a Customer Table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.

What is a View

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table(s), the data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
1    10    Johnson   Manny   100000
2    20    Carlsbad  Jan     100000
22   30    Winter    Steve   77000
25   10    Lester    Bonnie  56000
33   10    Samuels   Todd    120000
99   20    Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW Employ_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM Employee;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the Employ_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
1    10    Johnson   Manny
2    20    Carlsbad  Jan
22   30    Winter    Steve
25   10    Lester    Bonnie
33   10    Samuels   Todd
99   20    Walter    Misha
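
The column-hiding effect of the view can be tried out with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in and is not Teradata: it has no access rights to grant or revoke, so the sketch shows just the filtering, not the security side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no employee data is copied.
conn.execute(
    "CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

cursor = conn.execute("SELECT * FROM Employ_V")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)
for row in view_rows:
    print(row)
```

All six employees come back, but the Sal column is simply not part of what the view exposes.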

What is a Macro

The axe soon forgets but the tree always remembers

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS
(SELECT * FROM Employ_v WHERE dept = 10;
 SELECT * FROM Employ_v WHERE dept = 20;
 SELECT * FROM Employ_v ORDER BY lname;);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
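
The "either they all work or none of them work" behavior can be imitated with Python's built-in sqlite3 module. This is a loose analogy rather than Teradata's macro facility: the MACROS dictionary, create_macro, execute_macro and the Audit_Log table are all invented for the sketch, and INSERT statements are used instead of SELECTs so that a rollback is visible.

```python
import sqlite3

MACROS = {}  # macro name -> list of SQL statements (a stand-in for the DD)


def create_macro(name, statements):
    """Store a named group of SQL statements."""
    MACROS[name] = list(statements)


def execute_macro(conn, name):
    """Run every statement as one transaction: commit all or roll back all."""
    try:
        with conn:  # commits on success, rolls back if a statement fails
            for statement in MACROS[name]:
                conn.execute(statement)
    except sqlite3.Error:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Audit_Log (note TEXT)")

create_macro("Good_mac", [
    "INSERT INTO Audit_Log VALUES ('step 1')",
    "INSERT INTO Audit_Log VALUES ('step 2')",
])
create_macro("Bad_mac", [
    "INSERT INTO Audit_Log VALUES ('step 3')",
    "INSERT INTO No_Such_Table VALUES ('oops')",  # fails: table does not exist
])

good_ok = execute_macro(conn, "Good_mac")
bad_ok = execute_macro(conn, "Bad_mac")
row_count = conn.execute("SELECT COUNT(*) FROM Audit_Log").fetchone()[0]

print(good_ok, bad_ok, row_count)
```

The second macro's first insert is undone when its second statement fails, so only the two rows from the first macro remain.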

Here is a handy reference chart that compares views with macros:

Views                                         Macros
• We select from views                        • We execute macros
• Uses the keyword AS                         • Uses the keyword AS
• Definition is stored in the Data Dictionary • Definition is stored in the Data Dictionary
• Accesses certain portions of the data       • Accesses the real data itself
• Is changed using the keyword REPLACE        • Is changed using the keyword REPLACE

Access Rights for Teradata Users

Never insult seven men when all youre packing is a six gun

Wild West Slogan

I taught in one place that was so rough ... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it ... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted from someone else For example Tom might grant Mary permissionto create a table in his database even though Mary works in the marketing (MRKT) department

Automatic Rights are system assigned privileges When a new user or database is created it receives 16different access rights The creator of the new object gets 20 rights Similarly when a baby is born in the UnitedStates he or she is granted some basic rights by the US Constitution

In the picture above the DBC has Implicit rights on all databases and users Plus SYSDBA has Implicit rightson every person listed below him MRKT has explicit rights over Mary and Morgan has the same rights over Tom Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) canGRANT or REVOKE privileges on you

For example if Tom or Morgan decides to give certain privileges to Mary either person could EXPLICITLYgive her those permissions


In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract: it is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't become corrupt. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (as in a MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message;
• The entire transaction is rolled back and any changes made to the database are reversed;
• Locks are released; and
• Spool files are discarded.

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data then reverts to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness." If it is clean, then things went well.
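For readers who like to see the haircut picture in code, here is a minimal Python sketch of the before-image idea. It is only a toy model, not Teradata's implementation: the class name TransientJournalDemo and its methods are invented for illustration, and a single dictionary stands in for one AMP's rows.

```python
class TransientJournalDemo:
    """Toy model of one AMP's rows plus a before-image journal (hypothetical)."""

    def __init__(self, rows):
        self.rows = dict(rows)   # primary key -> row value
        self.journal = []        # list of (key, before_image)

    def _log(self, key):
        # Take the BEFORE picture; None means the row did not exist yet.
        self.journal.append((key, self.rows.get(key)))

    def insert(self, key, value):
        self._log(key)
        self.rows[key] = value

    def update(self, key, value):
        self._log(key)
        self.rows[key] = value

    def delete(self, key):
        self._log(key)
        del self.rows[key]

    def commit(self):
        # Success: the before-images are no longer needed, so wipe the journal.
        self.journal.clear()

    def rollback(self):
        # Failure: undo the changes in reverse order using the before-images.
        while self.journal:
            key, before = self.journal.pop()
            if before is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before


amp = TransientJournalDemo({1: "Ben Hon", 2: "Don Roy"})
amp.update(1, "Ben H.")
amp.delete(2)
amp.insert(3, "New Friend")
amp.rollback()               # pretend the transaction failed
restored = dict(amp.rows)    # back to the two original rows
```

Calling commit() instead of rollback() would simply empty the journal and leave the three changes in place.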

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? Losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2,000 AMPs. Therefore, the chance of losing two AMPs in a 2,000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm accomplishes this allocation. The output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
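The two-hash-map idea can be sketched in a few lines of Python. This is a simplified illustration under loud assumptions: CRC32 stands in for Teradata's hashing algorithm, simple arithmetic stands in for the real hash maps, and the function names are invented. What it does show faithfully is the rule from the text: the FALLBACK copy always lands on a different AMP inside the same cluster.

```python
import zlib

NUM_AMPS = 8        # eight-AMP system, as in the picture
CLUSTER_SIZE = 4    # two clusters of four AMPs


def row_hash(primary_index_value):
    # Stand-in for Teradata's hashing algorithm (assumption: CRC32).
    return zlib.crc32(str(primary_index_value).encode("utf-8"))


def primary_amp(pi_value):
    # Plays the role of the Primary Hash Map.
    return row_hash(pi_value) % NUM_AMPS


def fallback_amp(pi_value):
    # Plays the role of the Fallback Hash Map:
    # pick a different AMP inside the primary AMP's cluster.
    h = row_hash(pi_value)
    primary = h % NUM_AMPS
    cluster_start = (primary // CLUSTER_SIZE) * CLUSTER_SIZE
    offset = 1 + (h // NUM_AMPS) % (CLUSTER_SIZE - 1)   # 1 .. CLUSTER_SIZE-1
    return cluster_start + (primary - cluster_start + offset) % CLUSTER_SIZE


def readable(pi_value, down_amps):
    # A row is reachable if its base copy or its FALLBACK copy is on a live AMP.
    return (primary_amp(pi_value) not in down_amps
            or fallback_amp(pi_value) not in down_amps)


keys = range(1000)
one_down_per_cluster = {0, 5}    # AMP 0 is in cluster 1, AMP 5 is in cluster 2
all_ok = all(readable(k, one_down_per_cluster) for k in keys)
```

With one AMP down in each cluster, every row is still reachable. Lose two AMPs in the same cluster and any row whose base and FALLBACK copies sat on that pair is out of reach.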

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while one AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
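The trade-off is easy to put into numbers. The short Python sketch below assumes the down AMP's work is shared evenly among the survivors in its cluster, which is a simplification; the function name workload_factor is invented for illustration.

```python
def workload_factor(cluster_size):
    """Relative work per surviving AMP when one AMP in the cluster is down.

    Assumes the failed AMP's work is split evenly among the survivors.
    """
    survivors = cluster_size - 1
    return 1 + 1 / survivors


# Cluster sizes discussed in the text: two, three, four and 16.
factors = {n: round(workload_factor(n), 3) for n in (2, 3, 4, 16)}
# factors == {2: 2.0, 3: 1.5, 4: 1.333, 16: 1.067}
```

A cluster of two doubles the survivor's load, a cluster of four adds about a third, and a cluster of 16 adds under seven percent, at the price of more AMPs that could fail together.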

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are tracked so that the AMP can be brought up to date when it returns.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be used on any, all or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.


In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
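Here is a small Python sketch of the "while you were sleeping" bookkeeping. It is a toy model with invented names (Cluster, fail, recover): a single list plays the DARJ for the whole cluster, and the FALLBACK copies themselves are not modeled.

```python
class Cluster:
    """Toy cluster: base rows per AMP plus a down-AMP recovery journal."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}   # amp -> {key: value}
        self.down = set()
        self.darj = []   # changes missed by down AMPs: (amp, key, value)

    def write(self, amp, key, value):
        if amp in self.down:
            # The live AMPs note the change for the sleeping AMP.
            self.darj.append((amp, key, value))
        else:
            self.rows[amp][key] = value

    def fail(self, amp):
        self.down.add(amp)

    def recover(self, amp):
        # Replay, in order, everything that happened while the AMP was down,
        # then discard those journal entries.
        remaining = []
        for target, key, value in self.darj:
            if target == amp:
                self.rows[amp][key] = value
            else:
                remaining.append((target, key, value))
        self.darj = remaining
        self.down.discard(amp)


cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "a", "v1")
cluster.fail(1)
cluster.write(1, "a", "v2")     # journaled, AMP 1 is asleep
cluster.write(1, "b", "v3")     # journaled as well
journaled = len(cluster.darj)   # two entries are waiting
cluster.recover(1)
```

After recover(1), AMP 1 holds the latest values for both rows and the journal is empty again.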

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written to the primary disk, it is also written to the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's still fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. We can still keep the shared nothing environment and have four physical disks, because only the AMP that owns the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, as long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk and, similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, if node one is lost, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disks via a different physical cable.
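The migration can be pictured with a few lines of Python. This is only a sketch: the function migrate_on_failure and the node names are made up, and the round-robin placement is an assumption for cliques with more than two nodes, since the text only describes the two-node case.

```python
def migrate_on_failure(nodes, failed):
    """Move the failed node's AMPs to the surviving nodes of its clique.

    nodes maps a node name to the list of AMP numbers resident in its memory.
    """
    survivors = [name for name in nodes if name != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    result = {name: list(nodes[name]) for name in survivors}
    # Assumption: spread the migrating AMPs round-robin over the survivors.
    for i, amp in enumerate(nodes[failed]):
        result[survivors[i % len(survivors)]].append(amp)
    return result


# The two-node, 32-AMP configuration from the picture.
clique = {"node1": list(range(1, 17)), "node2": list(range(17, 33))}
after = migrate_on_failure(clique, "node1")
amps_on_node2 = len(after["node2"])   # all 32 AMPs now live on node two
```

Node two ends up hosting twice as many AMPs as before, which is exactly why performance slows until node one is repaired.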

People who come from colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. This can be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed by an INSERT, UPDATE or DELETE command. It keeps track of all new, deleted or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


To explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
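The roll-forward recipe above (restore the backup, then reapply the AFTER images) can be sketched in Python. The function roll_forward, the sample employee numbers and the salaries are all invented for illustration; a real recovery is done with Teradata's own archive and journal utilities, not with code like this.

```python
def roll_forward(backup, after_journal, from_day, to_day):
    """Rebuild a table from a full backup plus AFTER images.

    after_journal is a list of (day, key, after_image) tuples,
    in the order the changes occurred.
    """
    table = dict(backup)                 # start from the restored backup
    for day, key, image in after_journal:
        if from_day <= day <= to_day:
            table[key] = image           # reapply the captured AFTER image
    return table


# Hypothetical data: full backup taken on the 1st of the month.
full_backup_day1 = {101: ("Jones", 50000), 102: ("Smith", 42000)}
journal = [
    (2, 103, ("Lee", 39000)),      # new row added on the 2nd
    (4, 101, ("Jones", 55000)),    # existing row changed on the 4th
]
recovered = roll_forward(full_backup_day1, journal, 1, 5)
```

The recovered table contains the new row from the 2nd and the changed row from the 4th, which is the state the system was in just before the failure on the 5th.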

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
BEFORE JOURNAL,
DUAL AFTER JOURNAL
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE FORMAT)
UNIQUE PRIMARY INDEX(emp);

The example above creates the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables, and it restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE or READ lock can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
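The four rules above boil down to a small compatibility table, sketched here in Python. The table is read straight from the descriptions in the list; the names COMPATIBLE and can_grant are invented, and real lock queuing has more to it than this yes-or-no check.

```python
# For each requested lock mode: the held modes it can coexist with.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},   # blocked only by EXCLUSIVE
    "READ":      {"ACCESS", "READ"},            # many readers at once
    "WRITE":     {"ACCESS"},                    # only dirty readers may join
    "EXCLUSIVE": set(),                         # no access, period
}


def can_grant(requested, held):
    """True if `requested` can proceed while every lock in `held` is in place."""
    return all(h in COMPATIBLE[requested] for h in held)


readers_share = can_grant("READ", ["READ", "READ"])     # True
writer_waits = not can_grant("WRITE", ["READ"])         # True: the write must wait
dirty_read = can_grant("ACCESS", ["WRITE"])             # True: stale read allowed
locked_out = not can_grant("ACCESS", ["EXCLUSIVE"])     # True: even ACCESS waits
```

Notice that the table is symmetric: if one mode cannot join another, the reverse is also true.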

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table unless its referencing column value also exists in another table within the database. Conversely, a row whose value is referenced by rows in another table may not be deleted unless the referencing rows are first removed from that other table.
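The employee and payroll story can be acted out in a few lines of Python. This is a toy illustration, not how Teradata enforces RI: the two "tables" are plain Python collections, and the names RIError, insert_payroll and delete_employee are invented.

```python
class RIError(Exception):
    """Raised when a change would break referential integrity."""


employee = {1001, 1002}     # parent table: employee numbers
payroll = {1001: 52000}     # child table: employee number -> salary


def insert_payroll(emp, salary):
    # A payroll row must point at an employee who exists.
    if emp not in employee:
        raise RIError(f"employee {emp} does not exist")
    payroll[emp] = salary


def delete_employee(emp):
    # An employee cannot be deleted while payroll still refers to him or her.
    if emp in payroll:
        raise RIError(f"employee {emp} is still referenced by payroll")
    employee.discard(emp)


insert_payroll(1002, 47000)   # fine: employee 1002 exists
del payroll[1001]             # remove the payroll row first ...
delete_employee(1001)         # ... then the employee may be deleted
```

Trying insert_payroll(9999, 1) or deleting employee 1002 while the payroll row is still there raises RIError, so nobody collects a paycheck after being fired.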

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is added to an existing table that has RI violations, the ALTER TABLE will still proceed. Plus, it will store a copy of the table with any related RI violations for review and correction. The user then needs to locate the table copy and make corrections to the original table.


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? Through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what Teradata table the data should be loaded into.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
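The sandbag line can be sketched in Python as well. Treat this as a rough picture under stated assumptions: CRC32 stands in for Teradata's hashing algorithm, blocks hold three rows instead of 64K of data, the rows are sorted by row hash rather than a full Row ID, and the function names are invented.

```python
import zlib

NUM_AMPS = 4


def row_hash(pi_value):
    # Stand-in for Teradata's hashing algorithm (assumption: CRC32).
    return zlib.crc32(str(pi_value).encode("utf-8"))


def fastload(rows, block_size=3):
    """rows is a list of (primary_index_value, payload). Returns rows per AMP."""
    # Step 1: hand blocks of rows to the AMPs in turn, without hashing anything.
    received = [[] for _ in range(NUM_AMPS)]
    for i in range(0, len(rows), block_size):
        received[(i // block_size) % NUM_AMPS].extend(rows[i:i + block_size])

    # Step 2: each AMP hashes the rows it received and sends each one
    # to the AMP that owns it.
    owned = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for pi, payload in amp_rows:
            h = row_hash(pi)
            owned[h % NUM_AMPS].append((h, pi, payload))

    # Step 3: each AMP sorts its own rows (here, by row hash).
    for amp_rows in owned:
        amp_rows.sort()
    return owned


data = [(emp, f"employee-{emp}") for emp in range(1, 21)]
amps = fastload(data)
total_rows = sum(len(amp_rows) for amp_rows in amps)   # all 20 rows arrived
```

The point of the two steps is speed: getting the data onto Teradata does not wait for hashing, and the hashing is then done by all AMPs at once.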

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind that many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Up to 20 INSERTs, UPDATEs, or DELETEs may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to load immediately into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). An ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load at different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Processes INSERTs, UPDATEs, or DELETEs
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison averaged only 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance - only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

In today's fast-paced world, Gigabytes soon become Terabytes. It may not sound like much, but it weighs a ton on the shoulders of giants. Listen to these measurements and pick your data warehouse's life span. For example, if you lived for a million seconds (Megabyte), then you would live for about 11.6 days. In comparison, if you lived for a billion seconds (Gigabyte), then you would live for about 31.7 years. Plus, if you lived for a trillion seconds (Terabyte), then you would live for 31,688 years.

How nice it would be if, on your 31,688th birthday, people would say, "You sure look good for your age!"
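The life-span comparison is plain unit conversion, and it can be checked with a few lines of Python. This is a minimal sketch; the choice of 365.25 days per year is an assumption (it is the value that reproduces the 31,688-year figure), and the function names are made up for the illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years


def seconds_to_days(seconds):
    """Convert a number of seconds to days."""
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds):
    """Convert a number of seconds to years of DAYS_PER_YEAR days."""
    return seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)


if __name__ == "__main__":
    print(f"million seconds:  {seconds_to_days(10**6):.1f} days")
    print(f"billion seconds:  {seconds_to_years(10**9):.1f} years")
    print(f"trillion seconds: {seconds_to_years(10**12):,.0f} years")
```

Each step up (mega to giga to tera) multiplies the span by a thousand, which is the whole point of the analogy.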

Data warehouses hit the wall of scalability because they cannot grow at the same rate as the amount of data being gathered. Teradata allows for unlimited linear scalability. Linear Scalability is a building block approach to data warehousing that ensures that, as building blocks are added, the system continues at the same performance level.

This is why the largest data warehouses in the world use Teradata. I was lucky to be in the right place at the right time and taught beginning stages at what are considered the two largest data warehouse sites in the world: Southwestern Bell (SBC) and Wal-Mart.

Wal-Mart's data warehouse started with less than 30 gigabytes, and SBC started with less than 200 gigabytes and 100 users. Both warehouses:

• Started small and simple
• Used Teradata from the beginning
• Have built the largest Enterprise Data Warehouse in their respective industries
• Continue to realize additional Return On Investment (ROI) on an annual basis
• Have grown to more than 10 Terabytes of data and are still growing
• Have thousands of users (some estimates are shocking)
• Have educated and experienced data warehouse staffs
• Have educated and experienced data warehouse users
• Experience continual growth without boundaries
• Have experienced linear performance by Teradata in every single upgrade (from gigabytes to terabytes and from terabytes to tens of terabytes)
• Both companies are impressed with Teradata's power and performance
• And both SBC and Wal-Mart are committed to the excellence of Teradata

A data warehouse is built in small building blocks. Linear Scalability is described in three ways:

First, building blocks are added until the performance requirements of your environment are met (Guaranteed Success);

Second, every time the data doubles, building blocks are doubled and the system maintains its performance level (Guaranteed Success); and

Third, any time the environment changes, building blocks are added until performance requirements are met (Guaranteed Success).

Scalability is not just about growing the data volume. It also means growing the number of users. Many systems work flawlessly until as few as 5 users are added; then they slow down to a crawl. Companies need a system where growth and performance are easily calculated and implemented. That means a system where the number of users, the size and complexity of queries, the volume of data, and the number of applications being used can be calculated and compared to the current system's actual size. If more power, speed, or size is needed, then the company can simply add building blocks to the system until the requirements are met.

Rule 7 - Model the Data Correctly

"You will find only what you bring in."

Yoda, Jedi Master in Star Wars

We model a database for the same reasons that Boeing builds an aircraft model to test flight characteristics in a wind tunnel. It's simpler and cheaper to model than to reconstruct the plane by iterations until you get it right.

A proper data model should be designed to reflect the business components and possible relationships.

Here are three rules for modeling data in a data warehouse:

1. Model the data quickly
2. Normalize the detail data
3. Use a dimensional model for data marts

The 3rd Normal Form holds that each column in a table should be directly related to the primary key, the whole key, and nothing but the key. Data is placed into tables where it makes the most sense and has no repeating groups, derived data, or optional columns. This allows users to ask any question at any time on all data within the enterprise. Users do not have to strive for 3rd Normal Form, but should just normalize the data the best they can. There will be fewer columns in a table but a lot more tables overall. This model is easier to maintain, incredibly flexible, and allows a user to ask any question on any data at any time.

A Star-Schema model consists of a fact table and a number of dimension tables. The fact table is a table with a multi-part key. Each element of the key is itself a foreign key to a single dimension table. The remaining fields in the fact table are known as facts and are numeric, continuously valued, and additive. Facts can be thought of as measurements taken at the intersection of all of the dimensions. Dimension attributes are mostly textual and are almost always the source of constraints and report breaks. This model enhances performance on known queries or, in other words, queries users run repeatedly day after day.
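To make the fact/dimension split concrete, here is a toy Star-Schema held in Python dictionaries. Everything in it is made up for illustration (the tables `store_dim`, `product_dim`, and `sales_fact`, the sample values, and the helper `sales_by`); a real warehouse would express the same thing as tables and SQL joins.

```python
from collections import defaultdict

# Dimension tables: key -> mostly textual attributes (made-up sample data).
store_dim = {1: {"city": "Dayton"}, 2: {"city": "San Diego"}}
product_dim = {10: {"category": "Toys"}, 11: {"category": "Books"}}

# Fact table: multi-part key (store_id, product_id, day) -> additive fact.
# Each key element is a foreign key into exactly one dimension.
sales_fact = {
    (1, 10, "2001-01-01"): 120.0,
    (1, 11, "2001-01-01"): 80.0,
    (2, 10, "2001-01-01"): 200.0,
    (2, 10, "2001-01-02"): 50.0,
}


def sales_by(attribute, dim, key_position):
    """Sum the sales fact, grouped by one textual dimension attribute."""
    totals = defaultdict(float)
    for key, amount in sales_fact.items():
        dim_row = dim[key[key_position]]  # follow the foreign key
        totals[dim_row[attribute]] += amount
    return dict(totals)


if __name__ == "__main__":
    print(sales_by("city", store_dim, 0))
    print(sales_by("category", product_dim, 1))
```

Because the fact is additive, the same fact rows can be rolled up along any dimension; the dimension attribute (city, category) supplies the grouping and the report break.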

Most database modelers prefer to create a logical model in 3rd Normal Form, but most database engines are overcome by physical limitations, so they must compromise the model. The four most difficult functions for a database to handle are:

• Join tables
• Aggregate data
• Sort data
• Scan large volumes of data

In order to get around these system limitations, vendors will suggest a model to avoid joins, use summarized data to avoid aggregation, store data in sorted order to avoid sorts, and overuse indexes to avoid large scans.

With these limitations, vendors are also going to avoid being able to compete. That is like placing a ball and chain around the runner's leg and saying, "I wish you all the best in the marathon." Come on! Whose side are these vendors really on?

Teradata is the only database engine I have seen that has the power and maturity to use a 3rd Normal Form physical model on databases exceeding a terabyte in size. Because of their physical limitations, other databases have had to use a Star-Schema model to enhance performance, but have given up the ability to perform ad-hoc queries and data mining.

A normalized model is one that should be used for the central data warehouse. It allows users to ask any question at any time on information from any place within the enterprise. This is the central philosophy of a data warehouse. It leads to the power of ad-hoc queries and data mining, whereby advanced tools discover relationships that are not easily detected but do exist naturally in the business environment.

A Star-Schema model enhances performance on known queries because we build our assumptions into the model. While these assumptions may be correct for the first application, they may not be correct for others. Flexibility is a big issue, but data marts can be dropped and added with relative ease if each is built directly from the detail data.

Remember: build the data warehouse around detail data using a normalized model. Then, as query patterns emerge and performance for well-known queries becomes a priority, Star-Schema data marts can be created by extracting summarized or departmental data from the centralized data warehouse. The user will then have access to both the data marts for repetitive queries and the central warehouse for other queries.

Because data marts can be an administrative nightmare, Teradata enables Star-Schema access without requiring physical data marts. By setting up a join index as the intersection of your Star-Schema model, you can create a Star-Schema structure directly from your 3rd Normal Form data model. Best of all, once it is created, the data is automatically maintained as the underlying tables are updated.

Keep in mind that 80% of data warehouse queries are repetitive, but 80% of the Return On Investment (ROI) is actually provided by the other 20% of the queries that go against detailed data in an iterative environment. By using a normalized model for your central data warehouse and a Star-Schema model on data marts, you can enhance the possibility of realizing an 80% Return On Investment and still enhance the performance on 80% of your queries.

Rule 8 - Don't Let a Technical Issue Make Your Data Warehouse a Failure Statistic

"Experience is a hard teacher because she gives the test first, the lesson afterwards."

Scottish Proverb

Did you know that 3/4 of the people in the world hate fractions, and that 40% of the time a data warehouse fails, it is because of a technical issue? There are many traps and pitfalls in every data warehouse venture. One winter day a hunter met a bear in the forest. The bear said, "I'm hungry. I want a full stomach." The man replied, "Well, I'm cold. I would like a fur coat." "Let's compromise," said the bear, and he quickly gobbled up the hunter. They both got what they asked for. The bear went away with a full belly, and the man left wrapped in a fur coat. With that in mind, good judgment comes from experience; experience comes from bad judgment. You have shown good judgment by reading this book, so let our experience keep your company from having a bad data warehouse experience.

Author Daniel Boorstin wrote in The Discoverers, "The greatest obstacle to discovering the shape of the earth, the continents, and the oceans was not ignorance but rather the illusion of knowledge." There is a lot of illusion of knowledge being spread around in the data-warehousing environment. Before you decide on any data warehouse product, ask yourself and the vendor these questions:

• As my data demands increase, will the system be able to physically load the data? Our experience shows that many systems are not capable of handling very large volumes of data. Do the math.
• As the data grows in volume, can the system meet the performance requirements? Do the math.
• As the number of users grows, will the system be able to scale? Do the math.
• As my environment changes, will the system be flexible enough to allow changes quickly and easily? Do the math.
• Will the system need so many Database Administrators (DBAs) that my system's cost skyrockets? Do the math.
• If we suddenly merged with another company and needed to integrate with their mainframe or LAN environment, would the system be able to connect and include them? Do the math.
• Can I continue to meet my batch window timeframes? Do the math.
• Could I become the hero of the company one day, only to have some technical glitch blamed on me because of my poor foresight and be thrown out of the company into a giant mud puddle? Do the bath.

Rule 9 - Take a Building Block Approach

"Be not afraid of growing slowly; be afraid only of standing still."

Chinese Proverb

Ever since Vasco de Balboa discovered the Pacific coast of Panama in 1513, kings and businessmen alike dreamed of the impossible: to cut a waterway across the mountainous isthmus, creating a shortcut between the Atlantic and Pacific Oceans. Those dreams turned into reality during the Industrial Revolution. It took almost forty years of trial and error before the world's greatest engineering feat since the Pyramids was completed in 1914. Ships move through the locks of the canal, rising 85 feet above sea level before they descend to the opposite side. Since its grand opening in 1920, the Panama Canal has revolutionized trans-oceanic traffic, joining East and West. Its 50-mile stretch saves every vessel about 8,000 extra nautical miles of travel around the bottom tip of South America. Several modifications have been engineered through the years to accommodate the increasing size of ships.

Data warehouses, like the Panama Canal, must be built over time and changed over time to meet new demands. A data warehouse must grow with the environment, but the environment is unpredictable. All sailors know that they can't direct the wind, but that they can adjust their sails. In comparison, all data warehouse users know they can't direct the environment, but they can adjust their warehouse. Sometimes the data warehouse will grow quickly and sometimes it will grow slowly, but it should always be growing.

So take a building block approach to data warehousing. Teradata allows you to expand without boundaries, one building block at a time. Plus, adding on building blocks is easy.

There are two aspects to a building block approach. First, you need to add applications to your data warehouse in three- to six-month intervals. Once the first application works, you are ready for more projects. As you become more experienced with this approach, you can add multiple projects in parallel by involving multiple organizations.

The second aspect of the building block approach is in the actual data warehouse architecture. It doesn't matter if yours is the smallest data warehouse in the world, the largest, or falls somewhere in between; power and scalability always fuel success.

Not long ago, a customer flew out to San Diego for a Teradata demonstration and benchmark. The benchmark ran late into the evening, but the numbers were more than 50% better than the competition. The customer was extremely impressed, but before buying he demanded to see the system scalability that everyone had been talking about. Although it was already late, a Teradata employee was called in the middle of the night, arrived within 10 minutes (in pajamas), hooked up the building blocks, and ran a utility called config. She ran another called reconfig, and in less than two hours the system size doubled.

As the environment changes in terms of users, data, complexity, capacity, batch windows, time changes, events, or opportunities, users should be able to continue building applications and architecture. The more a Teradata system grows, the more Teradata outshines the competition.

Rule 10 - Buy a Teradata Data Warehouse

"Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing had happened."

Winston Churchill

Winston Churchill led Britain through World War II during what he called that country's "finest hour." When users see consistent data, the system too is in its finest hour. Teradata gives users the ability to ask questions they could never ask before. Users trust Teradata because of its industry performance and reputation, and because it never gives in. Constant use gives users optimal business experience, and no matter what a user asks, the system responds with a hearty "Yes, Sir!"

When we explained Teradata to Churchill, he said:

"A data WARe-house that consists of 250 data marts is like poison, and if I were the MIS department responsible for maintaining them, I'd take it."

Teradata guarantees an Enterprise Data Warehouse with no scalability issues. Data loads like lightning, and system administration is a breeze. You can pick the performance level that meets your requirements for today and forever. The database can be normalized around detail data, and because of Teradata's power, users have the flexibility to ask any question at any time on any data.

All other databases are suspect in data loading capabilities, scalability, reference sites, decades of data warehouse experience, flexibility, system administration difficulties, and inability to handle the complex queries of today's users. These users are good.

Teradata - The Shining Star

Overview

Teradata has always been at the top of the data warehouse game, even if the experts weren't bright enough to know it. The vision of the original designers was tremendous. It was so far to the left of genius that most thought the idea was impossible.

"Only he who attempts the ridiculous may achieve the impossible."

Don Quixote

The Teradata database was originally designed in 1976, and many of the fundamental concepts still remain today. Nearly 25 years later, Teradata is still considered ahead of its time.

In 1976, IBM mainframes dominated the computer business. Everyone who was anyone had an IBM mainframe. However, the original founders of Teradata noticed that it took about 4½ years for IBM to produce a new mainframe. They also noticed a little company called Intel. Intel created a new PC chip every 2½ years. With mainframes moving forward every 4½ years and PC chips every 2½ years, Teradata recognized its vision: to network enough PC chips together that the mainframe would be overpowered, yet costs would be hundreds of times cheaper than a mainframe. The Teradata team estimated the power surge would come in 1990.

IBM laughed out loud. They said, "Let's get this straight... you are going to network a bunch of PC chips together and overpower our mainframes? That's like plowing a field with 1,000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as "just a bunch of PCs in a cabinet."

Teradata was convinced it could produce a product that would power large amounts of data and achieve the impossible: using PC technology in mainframe territory. Its founders agreed with Napoleon Bonaparte, who asserted, "The word 'impossible' is not in my dictionary." Sure enough, when we looked in his dictionary, that word was not there. And it is not in Teradata's Data Dictionary either. The Teradata team set two goals: build a database that could

• Perform parallel processing, and
• Accommodate a Terabyte of data

Driving in the car one evening, Morgan's eight-year-old daughter Kara piped up from the back seat, "Daddy, can you buy Teradata in the store? I mean, what does Teradata really do?" Morgan thought for a moment and then replied, "Do you remember when you went on the Easter egg hunt last spring? Well, imagine that we had fifty eggs and you were the only child there. If I asked you to find all the purple eggs, would you be able to do that?" Kara said, "Sure! But it might take me a long time." Morgan continued, "What if we now let fifty children go in and I asked them to show me all of the purple eggs? How long would that take?" His daughter responded, "It wouldn't take any time at all, because each child would only have to look at one egg." That is precisely how Teradata works. It divides up huge tasks among its processors and tackles each portion simultaneously with amazing speed. And it doesn't matter if you have a trillion eggs in your basket.

In 1984, the DBC/1012 was introduced. Since then, Teradata has been the dominant force in data warehousing. Teradata got the chickens plowing and is considered outstanding. Meanwhile, IBM's plow is out rusting in its field.

Parallel Processing

"An invasion of armies can be resisted, but not an idea whose time has come."

Victor Hugo

Teradata's ability to have unlimited users, unlimited power, and unlimited scalability is an idea whose time has come. And it all starts with something called parallel processing. So what is parallel processing? Let us explain.

It was 10 p.m. on a Saturday night, and two friends were having dinner and drinks. One of the friends looked at his watch and said, "I have to get going." The other friend responded, "What's the hurry?" His friend went on to tell him that he had to leave to do his laundry at the Laundromat. The other friend could not believe his ears. He responded, "What? You're leaving to do your laundry on a Saturday night? Do it tomorrow!" His buddy went on to explain that there were only 10 washing machines at the laundry. "If I wait until tomorrow, it will be crowded and I will be lucky to get one washing machine. I have 10 loads of laundry, so I will be there all day. If I go now, there will be nobody there and I can do all 10 loads at the same time. I'll be done in less than an hour and a half."

This story describes what we call "Parallel Processing." Teradata is the only database in the world that loads data, backs up data, and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds... even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every database's clock in performance tests.

"After enlightenment, the laundry."

Zen Proverb

"After parallel processing the laundry, enlightenment."

Teradata Zen Proverb

With the computer world seeing Terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, this is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata has unlimited power, grows without boundaries, and was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

"A ship in harbor is safe, but that's not why ships are built."

John Shedd

In 1805, the pivotal Battle of Trafalgar matched Britain's fleet of battleships against the combined French and Spanish fleet. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to sail circles around the enemy fleet, attacking its ships at their most vulnerable point: the stern. That battle paralyzed the opposing fleet and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating the mainframes on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive, and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed managing terabytes of data.

A Personal Computer (PC) is made up of the following components:

Processor Chip - This is the brain of the computer. All tasks are done at the direction of the processor.

Memory - This is the hand of the computer. The memory allows data to be viewed, manipulated, changed, or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive - This is the spine of the computer. The hard drive stores data, applications, and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor applies your changes to the document while it is still held in memory. Upon completion, you close the document and the processor writes all the changes to the disk, where it is stored.

In the picture below, we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight best friends.

Teradata Spreads Data over Multiple Processors

"I don't mind starting the season with unknowns. I just don't like finishing the season with them."

Coach Lou Holtz

With Teradata, you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The Teradata example on the next page shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask for a list of best friends, both processors retrieve their data in parallel and return the combined results over the connecting network. This arrangement could easily double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four records, and simultaneously the other processor reads the remaining four records. So how could we double the speed of this system again?

Teradata has Linear Scalability

"Every ceiling, when reached, becomes a floor upon which one walks and now can see a new ceiling."

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." This allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck of data.

Notice in the system below that there are four processors and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since data is spread evenly over four processors, each of the four processors reads its two rows at the same time. Now the system is four times faster than the single-processor example.

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the AMPs and processed in parallel.
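The arithmetic behind the Best_Friends examples can be sketched as follows. This is an idealized model under stated assumptions: rows are spread as evenly as possible, every processor reads at the same rate, and the cost of combining results is ignored. The function names are hypothetical.

```python
def rows_per_processor(total_rows, processors):
    """Spread rows as evenly as possible and return each processor's share."""
    base, extra = divmod(total_rows, processors)
    return [base + (1 if i < extra else 0) for i in range(processors)]


def speedup(total_rows, processors):
    """Idealized speedup: one processor reading everything, divided by the
    largest share any single processor must read when working in parallel."""
    return total_rows / max(rows_per_processor(total_rows, processors))


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, rows_per_processor(8, n), speedup(8, n))
```

With eight rows, one processor reads all eight, two processors read four each, and four processors read two each, so the idealized speedup tracks the processor count. The model also shows why even distribution matters: the slowest (most heavily loaded) processor sets the pace.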

A Logical View of the Teradata Architecture

"You are either making history or you are history."

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when it actually never had a chance to succeed. In ancient days, Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies too will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and is then given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain information from their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.
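The request flow just described can be sketched as a small Python simulation. Treat it as a toy under stated assumptions: the names `AMP_DISKS`, `ACCESS_RIGHTS`, `amp_execute`, and `parsing_engine` are hypothetical, the sample rows and user are made up, the "syntax check" is a trivial stand-in for real SQL parsing, and a thread pool plays the part of AMPs working in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy system: two AMPs, each holding its share of Best_Friends.
AMP_DISKS = [
    {"Best_Friends": ["Ann", "Bob", "Cal", "Dee"]},
    {"Best_Friends": ["Eve", "Fay", "Gus", "Hal"]},
]
ACCESS_RIGHTS = {"kara": {"Best_Friends"}}  # user -> tables the user may read


def amp_execute(disk, plan):
    """An AMP follows the plan against the rows on its own disk."""
    return [row for row in disk.get(plan["table"], []) if plan["filter"](row)]


def parsing_engine(user, table, row_filter=lambda row: True):
    """Check the request, plan it, fan it out to the AMPs, merge the answer."""
    # Step 1: stand-in syntax check (a real PE parses SQL text).
    if not isinstance(table, str) or not table:
        raise ValueError("syntax error: no table named")
    # Step 2: check the user's access rights before any AMP does work.
    if table not in ACCESS_RIGHTS.get(user, set()):
        raise PermissionError(f"{user} may not access {table}")
    # Step 3: build one plan that every AMP will follow.
    plan = {"table": table, "filter": row_filter}
    # Step 4: hand the plan to all AMPs at once (the BYNET's role).
    with ThreadPoolExecutor(max_workers=len(AMP_DISKS)) as pool:
        partials = list(pool.map(lambda disk: amp_execute(disk, plan), AMP_DISKS))
    # Step 5: combine the AMPs' partial results and return them to the user.
    return [row for part in partials for row in part]


if __name__ == "__main__":
    print(parsing_engine("kara", "Best_Friends"))
    print(parsing_engine("kara", "Best_Friends", lambda name: name < "D"))
```

Note the ordering: the rights check happens before the plan is sent, so a rejected query never reaches the AMPs.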

Parsing Engine (PE)

"Even a stopped clock is right twice a day."

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill It took a great deal of struggle for them to complete what proved to be a very steep climb When they got to the top the father in front

said Boy that sure was a hard climb His son in the back responded Yes it was Dad And if I hadnt keptthe brakes on all the way we would have rolled down backwards Teradata has an ingenious way to keep thistype of situation from happening inside the data warehouse Most databases make educated guesses about the best way to retrieve data The Teradata PE or Optimizer has both the experience and design to KNOW the best way to retrieve data

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and SouthWestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions, some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions, and Teradata returns the answers.

Access Module Processor (AMP)

Wise men talk because they have something to say fools talk because they have to say something

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above: "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of few words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best way to picture this is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it is the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk and shares it with no other AMP, hence a Shared-Nothing architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP reads only the rows on its own disk. Because this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs, while others have more than 2000.

The BYNET

Even if youre on the right track youll still get run over if you just sit there

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of the two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son, David, about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax;
• The PE checks the user's security rights;
• The PE comes up with a plan for the AMPs to follow;
• The PE passes the plan along to the AMPs over the BYNET;
• The AMPs follow the plan and retrieve the data requested;
• The AMPs pass the data to the PE over the BYNET; and
• The PE then passes the final data to the user.
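
The steps above can be sketched as a small toy model. This is only an illustration under stated assumptions, not Teradata's implementation: the class names Amp and ParsingEngine, the access_rights dictionary, and the idea of a "plan" being a simple row filter are all hypothetical, and an ordinary loop stands in for the BYNET.

```python
from dataclasses import dataclass, field


@dataclass
class Amp:
    """One AMP with its own private disk (a plain list of rows here)."""
    number: int
    disk: list = field(default_factory=list)

    def follow_plan(self, plan):
        # An AMP only ever reads rows from its own disk.
        return [row for row in self.disk if plan(row)]


class ParsingEngine:
    def __init__(self, amps, access_rights):
        self.amps = amps
        self.access_rights = access_rights  # hypothetical: {user: {table, ...}}

    def run_query(self, user, table, predicate):
        # Steps 1-2: reject malformed requests and users without rights.
        if not callable(predicate):
            raise ValueError("syntax check failed")
        if table not in self.access_rights.get(user, set()):
            raise PermissionError(f"{user} has no rights on {table}")
        # Step 3: the "plan" is simply the row filter the AMPs should apply.
        plan = predicate
        # Steps 4-6: send the plan to every AMP and collect the answers
        # (this loop stands in for the BYNET).
        answer = []
        for amp in self.amps:
            answer.extend(amp.follow_plan(plan))
        # Step 7: hand the final data back to the user.
        return answer


amps = [Amp(1, [("Ben Hon",)]), Amp(2, [("Joe Davis",)])]
pe = ParsingEngine(amps, {"tom": {"Best_Friends"}})
print(pe.run_query("tom", "Best_Friends", lambda row: True))
```

A user who is missing from access_rights gets a PermissionError before any AMP is contacted, which mirrors the point that the security check happens in the PE.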

Teradata Building Block Approach

Better a diamond with a flaw than a pebble without one


Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array, where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

Nearly everyone takes the limits of his own vision for the limits of the world A few do not Join them


Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about ... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, CustomerName, Item No, and Quantity.

An example of each table follows

CUSTOMER TABLE called CustomerTable

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillards       Atlanta    106227              Smith

ORDER TABLE called OrderTable

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an "extended family" joined by the "marriage" of the columns named Order Number in each table.

Earlier programming languages referred to "files," "records," and "fields." Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields, each identified by a column name. A Row is the smallest unit that can be inserted into a table. A column is the smallest unit within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null, which is sometimes represented by a "?".

One column, or combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.
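
As a small, hedged illustration of joining on a PK/FK pair, the sketch below loads the sample rows into an in-memory SQLite database through Python's standard sqlite3 module and matches OrderTable.CustomerID (the FK) to CustomerTable.CustomerID (the PK). SQLite is used only because it runs anywhere; it is not Teradata, and its storage has nothing in common with AMPs. The column spellings without spaces, the omission of the Order Number and Customer Rep columns from CustomerTable, and the reading of the order dates as month/day/year are assumptions made for the example.

```python
import sqlite3


def customer_orders():
    """Join the two sample tables on CustomerID and return the matches."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE CustomerTable (
            CustomerID   INTEGER PRIMARY KEY,
            CustomerName TEXT,
            CityName     TEXT
        );
        CREATE TABLE OrderTable (
            OrderNumber INTEGER PRIMARY KEY,
            OrderDate   TEXT,
            ItemNo      INTEGER,
            Quantity    INTEGER,
            CustomerID  INTEGER REFERENCES CustomerTable(CustomerID)
        );
        INSERT INTO CustomerTable VALUES
            (1001, 'JC Penney',    'Dallas'),
            (1002, 'Office Depot', 'Columbia'),
            (1003, 'Dillards',     'Atlanta');
        INSERT INTO OrderTable VALUES
            (105372, '2001-03-07', 212, 20, 1001),
            (105799, '2001-04-18', 296, 52, 1002),
            (106227, '2001-10-17', 325, 17, 1003);
    """)
    rows = con.execute("""
        SELECT c.CustomerName, o.OrderNumber, o.Quantity
        FROM CustomerTable AS c
        JOIN OrderTable AS o ON o.CustomerID = c.CustomerID
        ORDER BY o.OrderNumber
    """).fetchall()
    con.close()
    return rows


for row in customer_orders():
    print(row)
```

Each order row finds exactly one customer because CustomerID is unique in CustomerTable, while nothing stops several orders from carrying the same CustomerID.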

Here is a quick reference chart for Primary and Foreign Keys

PRIMARY KEY                          FOREIGN KEY
Not optional                         Optional
Comprised of one or more columns     Comprised of one or more columns
Can only have one PK per table       Can have multiple FKs per table
No duplicates allowed                Duplicates allowed
No changes allowed                   Changes allowed
No nulls allowed                     Nulls allowed

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way

First, when each AMP has nearly the same quantity of table rows, no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that looks like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind, the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice that each AMP holds a portion of the rows for every table. AMP 1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the Customer and Order tables are grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him, saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.


There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may have only one Primary Index, but every table must have a Primary Index defined. It is either a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column emp follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
UNIQUE PRIMARY INDEX (emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON.

CREATE TABLE TomC.employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
PRIMARY INDEX (dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
UNIQUE PRIMARY INDEX (emp, dept);

Being related hardly insures relatability

Michael E Angier


All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label the column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label the column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once, a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that the AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
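
The two simulated diagrams can be regenerated with a few lines of Python. This is a sketch of the simplified picture only: the function names build_hash_map and show are hypothetical, the 48-bucket size matches the diagrams rather than the real 65,536 buckets, and the strict 1, 2, 3, ... repetition is how the diagrams are drawn, not a claim about how a production system assigns buckets.

```python
def build_hash_map(num_amps, num_buckets=48):
    """Return a list in which entry i holds the AMP number for bucket i + 1."""
    return [bucket % num_amps + 1 for bucket in range(num_buckets)]


def show(hash_map, per_row=6):
    """Format the buckets six to a row, like the diagrams in the text."""
    lines = []
    for start in range(0, len(hash_map), per_row):
        lines.append(" ".join(str(amp) for amp in hash_map[start:start + per_row]))
    return "\n".join(lines)


print(show(build_hash_map(4)))
print()
print(show(build_hash_map(8)))
```

Note that the map stores nothing but AMP numbers, which is the point of the honeycomb comparison: a bucket never holds a data row.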

How the Hash Map and Primary Index Work Together

Choice not chance determines destiny

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

The destination is determined using the "Wiz-Bang Formula" (a secret NCR formula). First we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even-tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

For this example, Teradata will attempt to spread the table rows among the AMPs of the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.


Let's take our first row and determine on which AMP it will reside. Remember, we take the Primary Index value of the row and apply the Coffing/Jones Wiz-Bang formula (divide by 2); the answer points to a bucket in the hash map. Inside that bucket is the number of the AMP on which the row will reside.

Friend_Num   Friend_Name
2            Ben Hon

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num Friend_Name

16 Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (16), and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.


If we continue the process until all data is laid out, the system would look like this:


Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
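
The whole walk-through can be replayed in a short Python sketch. It uses the illustrative Coffing/Jones "divide by 2" formula from the text, not Teradata's real (secret) hashing formula, and the helper names build_hash_map, destination_amp, and distribute are hypothetical. Buckets are numbered from 1, as in the diagrams.

```python
BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}


def build_hash_map(num_amps, num_buckets=48):
    """Simulated hash map: bucket i + 1 holds AMP number (i mod num_amps) + 1."""
    return [bucket % num_amps + 1 for bucket in range(num_buckets)]


def destination_amp(friend_num, hash_map):
    bucket = friend_num // 2        # Coffing/Jones formula: divide the PI by 2
    return hash_map[bucket - 1]     # the book numbers buckets starting at 1


def distribute(rows, num_amps=4):
    """Place every row on the AMP its Primary Index value points to."""
    hash_map = build_hash_map(num_amps)
    amps = {amp: [] for amp in range(1, num_amps + 1)}
    for friend_num, name in rows.items():
        amps[destination_amp(friend_num, hash_map)].append((friend_num, name))
    return amps


for amp, rows in distribute(BEST_FRIENDS).items():
    print("AMP", amp, rows)
```

Under these assumptions each of the four AMPs ends up with two rows: Friend_Num 2 and 10 on AMP 1, 4 and 12 on AMP 2, 6 and 14 on AMP 3, and 8 and 16 on AMP 4.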

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. The purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it is consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this value a million times, we would still get the same result.

If you always do what you always did, you'll always get what you always got

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;


The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, and it merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
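
A hedged sketch of that one-AMP retrieval follows. It again relies on the illustrative "divide by 2" formula rather than the real hashing formula; HASH_MAP, AMP_DISKS, and select_by_primary_index are hypothetical names, and the row placement is the one the same formula produces for the Best_Friends table.

```python
# Simulated four-AMP hash map: bucket i + 1 holds AMP number (i mod 4) + 1.
HASH_MAP = [bucket % 4 + 1 for bucket in range(48)]

# Each AMP's private disk, keyed by the Primary Index value (Friend_Num).
AMP_DISKS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def select_by_primary_index(friend_num):
    """Return (amp_number, row); only the one AMP that owns the row is read."""
    bucket = friend_num // 2            # illustrative formula: divide by 2
    amp = HASH_MAP[bucket - 1]          # buckets are numbered from 1
    name = AMP_DISKS[amp].get(friend_num)
    row = (friend_num, name) if name is not None else None
    return amp, row


print(select_by_primary_index(8))
```

The other three AMPs are never consulted, which is why a Primary Index lookup is described as a one-AMP operation.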

The Full Table Scan

What matters is not the size of the dog in the fight but the size of the fight in the dog

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is 10 times faster than most systems. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (the PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns.
• Each AMP passes its rows to the PE over the BYNET.
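
The 100-row, 10-AMP example can be sketched in Python as below. It shows the structure of the scan only, under loud assumptions: the names amp_disks, amp_scan, and full_table_scan are hypothetical, rows are plain integers dealt out evenly by row number, and a thread pool stands in for the AMPs working at the same time. Python threads here do not demonstrate a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 10

# Deal 100 rows out evenly: every AMP ends up owning exactly 10 of them.
amp_disks = {amp: [] for amp in range(NUM_AMPS)}
for row_id in range(100):
    amp_disks[row_id % NUM_AMPS].append(row_id)


def amp_scan(amp):
    """An AMP reads only the rows on its own disk."""
    return list(amp_disks[amp])


def full_table_scan():
    """The PE hands the same plan to every AMP, then gathers the results."""
    with ThreadPoolExecutor(max_workers=NUM_AMPS) as pool:
        per_amp_results = list(pool.map(amp_scan, range(NUM_AMPS)))
    # Concatenate AMP by AMP; no ordering across the table is implied.
    return [row for rows in per_amp_results for row in rows]


rows = full_table_scan()
print(len(rows), "rows returned")
```

Because the rows come back grouped by AMP, the result is not sorted by row number, which matches the unordered result sets a Full Table Scan returns unless an ORDER BY is requested.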

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num   Friend_Name
6            Mary Gray
14           Kyle Marx
8            John Davis
16           Lyn Jones
2            Ben Hon
10           Don Roy
4            Joe Davis
12           Sam Mills


In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way, and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on "KNOWN QUERIES," or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It also tells the AMPs to place the hash in a secondary index sub-table, along with the Row-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a unique secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Number 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query asks for "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' secondary index sub-table row. Once that row is found, the real row-id (the smiley face in this example) is revealed, and Teradata can find the base row with the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations: one AMP reads the secondary index sub-table row, and one AMP reads the base row.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and add overhead, but boy can they speed up queries.
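A NUSI is created with the same statement minus the word UNIQUE. The following is only a sketch: the column Friend_State is hypothetical (it is not part of our Best_Friends example), and we assume that many friends share the same state.

```sql
-- Hypothetical column Friend_State: its values repeat, so the index is non-unique.
CREATE INDEX (Friend_State) ON Best_Friends;

-- Ask Teradata how it would run a known, repeated query.
-- Whether the NUSI is actually chosen depends on statistics and table size.
EXPLAIN
SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_State = 'OH';

-- If the query it served goes away, drop the index to give back the space.
DROP INDEX (Friend_State) ON Best_Friends;
```

Running EXPLAIN before and after creating an index is a cheap way to "measure a thousand times" before committing the space.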

Join Indexes

A bend in the road is not the end of the road unless you fail to make the turn

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.
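Here is a rough sketch of what creating a join index can look like. The Employee table and its columns come from the example used later in this book. The Department table, its Dept_Name column and the index name Emp_Dept_JI are assumptions made up for this illustration.

```sql
-- Pre-join employees to their (hypothetical) department rows.
CREATE JOIN INDEX Emp_Dept_JI AS
SELECT e.Emp
      ,e.Lname
      ,e.Fname
      ,d.Dept
      ,d.Dept_Name
FROM Employee e
INNER JOIN Department d
  ON e.Dept = d.Dept
PRIMARY INDEX (Emp);

-- Users keep writing their normal join against the base tables.
-- The PE decides whether the join index can answer it.
SELECT e.Lname, e.Fname, d.Dept_Name
FROM Employee e
INNER JOIN Department d
  ON e.Dept = d.Dept;
```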


Teradata Databases, Users and Space

Overview

Choose a job you like and you will never have to work a day of your life

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The name is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a terabyte. There is no user with greater privileges than DBC.

DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind that all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

DBC is now ready to distribute space, but because DBC is so powerful this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to SYSDBA to distribute space.

Logical Picture of System Space


SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.
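The hierarchy in the picture could be built with a handful of statements like the ones sketched below. The byte amounts and the passwords are placeholders invented for this illustration (assuming the 100 Gigabyte system mentioned earlier), so treat them as assumptions and not as recommended values.

```sql
-- Logged on as DBC: hand about 80% of a 100 GB system to SYSDBA.
CREATE USER SYSDBA FROM DBC AS
   PERM = 80E9
  ,PASSWORD = pick_a_password;

-- Logged on as SYSDBA: create two databases and one user.
CREATE DATABASE MRKT  FROM SYSDBA AS PERM = 20E9;
CREATE DATABASE SALES FROM SYSDBA AS PERM = 20E9;
CREATE USER Morgan FROM SYSDBA AS
   PERM = 10E9
  ,PASSWORD = pick_a_password;

-- Logged on as Morgan: give part of Morgan's space to Tom.
CREATE USER Tom FROM Morgan AS
   PERM = 2E9
  ,PASSWORD = pick_a_password;
```

Each CREATE subtracts the PERM amount from the owner named in the FROM clause, which is why space is never unaccounted for.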

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from Tom.


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space,
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, the AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.
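To see the three limits side by side, you can query the Data Dictionary. The sketch below assumes you have SELECT access to the DBC.DiskSpace view. On newer releases the same view may appear under a name such as DBC.DiskSpaceV, so check what your system provides. The column aliases are made up for readability.

```sql
-- DBC.DiskSpace holds one row per AMP for each database or user,
-- so the figures are summed to get a system-wide total.
SELECT DatabaseName
      ,SUM(MaxPerm)      AS Perm_Limit
      ,SUM(CurrentPerm)  AS Perm_In_Use
      ,SUM(MaxSpool)     AS Spool_Limit
      ,SUM(PeakSpool)    AS Spool_Peak
      ,SUM(MaxTemp)      AS Temp_Limit
FROM DBC.DiskSpace
GROUP BY DatabaseName
ORDER BY DatabaseName;
```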

SYSDBA knows that tenaciously holding on to its space will not provide any value to your company. A bank that holds on to all of its capital will not be successful. If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data, such as tables, secondary index sub-tables and permanent journals. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries and not to create tables. These users will just receive spool.
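A query-only user of the kind just described might be set up as sketched below. The user name Report_User, the byte amounts and the password are hypothetical, chosen only to show where the limits go.

```sql
-- A user who only runs queries: no perm space, just a spool limit.
CREATE USER Report_User FROM SYSDBA AS
   PERM = 0
  ,SPOOL = 5E9
  ,PASSWORD = pick_a_password;

-- If queries keep running out of spool, the owner can raise the limit.
MODIFY USER Report_User AS SPOOL = 10E9;
```

Because spool is a limit and not an allotment, raising Report_User's spool takes nothing away from SYSDBA.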

The following picture shows a logical view of a Customer Table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored on the disks a second time, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname    Sal
1     10     Johnson    Manny    100000
2     20     Carlsbad   Jan      100000
22    30     Winter     Steve    77000
25    10     Lester     Bonnie   56000
33    10     Samuels    Todd     120000
99    20     Walter     Misha    104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.
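A view can limit rows as well as columns. The sketch below adds a WHERE clause so that only department 10 shows through the window. The view name Employ_Dept10_V is made up for this illustration.

```sql
-- Hide the salary column AND every department except 10.
CREATE VIEW Employ_Dept10_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM Employee
WHERE Dept = 10;

-- Against the Employee table shown above, this returns only the
-- three department 10 employees: Johnson, Lester and Samuels.
SELECT *
FROM Employ_Dept10_V
ORDER BY Lname;
```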

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha


What is a Macro?

The axe soon forgets but the tree always remembers

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10?
• What employees are in department 20? and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
  SELECT * FROM Employ_v WHERE Dept = 10;
  SELECT * FROM Employ_v WHERE Dept = 20;
  SELECT * FROM Employ_v ORDER BY Lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
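Macros can also accept parameters, which would let one macro cover any department instead of hard-coding 10 and 20. The following is a minimal sketch. The macro name Emp_dept_mac and the parameter name dept_no are hypothetical.

```sql
-- One macro, any department: the parameter is referenced with a colon.
CREATE MACRO Emp_dept_mac (dept_no INTEGER) AS (
  SELECT *
  FROM Employ_v
  WHERE Dept = :dept_no
  ORDER BY Lname;
);

-- Run it for department 10, then for department 20.
EXECUTE Emp_dept_mac (10);
EXECUTE Emp_dept_mac (20);
```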

Here is a handy reference chart that compares views with macros

Views                                           Macros
• We select from views                          • We execute macros
• Uses the keyword AS                           • Uses the keyword AS
• Definition is stored in the Data Dictionary   • Definition is stored in the Data Dictionary
• Accesses certain portions of the data         • Accesses the real data itself
• Is changed using the keyword REPLACE          • Is changed using the keyword REPLACE
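As the last row of the chart says, both objects are changed with REPLACE, which swaps in a new definition under the same name. The changes sketched here (hiding department 30 and sorting by department) are invented purely to show the syntax.

```sql
-- Redefine the view: same name, new definition.
REPLACE VIEW Employ_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM Employee
WHERE Dept <> 30;

-- Redefine the macro the same way.
REPLACE MACRO Emp_mac AS (
  SELECT * FROM Employ_v ORDER BY Dept, Lname;
);
```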

Access Rights for Teradata Users

Never insult seven men when all you're packing is a six gun

Wild West Slogan


I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, DBC has Implicit rights on all databases and users. SYSDBA has Implicit rights on everyone listed below him. MRKT has implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
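Explicit rights are handed out and taken back with GRANT and REVOKE. The statements below are a sketch that uses the users from our hierarchy. It assumes the person submitting them holds the privilege being granted, and that the DBC.AllRights dictionary view is available to you.

```sql
-- Tom lets Mary create tables in his space.
GRANT CREATE TABLE ON Tom TO Mary;

-- Mary may read the salary-free view, but is given nothing on the base table.
GRANT SELECT ON Employ_V TO Mary;

-- Later, the privilege on Tom's space is taken back.
REVOKE CREATE TABLE ON Tom FROM Mary;

-- Look up the rights recorded for Mary in the Data Dictionary.
SELECT DatabaseName
      ,TableName
      ,AccessRight
      ,GrantorName
FROM DBC.AllRights
WHERE UserName = 'Mary';
```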

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's Law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

Please sleep on it tonight and if you wake up in the morning let me know what you think

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

The afternoon knows what the morning never suspected

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected." Likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (as in a MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal". Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, "Transient Journal Cleanliness is next to Godliness." If it is clean, then things went well.
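Besides single statements and macros, several statements can be tied together by hand. The sketch below assumes a session running in Teradata (BTET) mode and uses the Employee table from the previous chapter. The raise percentages are invented for the example.

```sql
BT;   -- BEGIN TRANSACTION: before-images start going to the Transient Journal

UPDATE Employee
SET Sal = Sal * 1.10
WHERE Dept = 10;

UPDATE Employee
SET Sal = Sal * 1.05
WHERE Dept = 20;

ET;   -- END TRANSACTION: the commit, after which the before-images are discarded
```

If either UPDATE fails before the ET, both are rolled back and every salary returns to its BEFORE picture. A user who spots a mistake (say, a 100% raise typed instead of 10%) before the ET could also submit ROLLBACK; to undo the work deliberately.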

FALLBACK Protection

I asked my dentist if I had to floss all my teeth and he responded, "No, just the ones you want to keep."

If you're not TRUE to your teeth, they'll be FALSE to you

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK we can lose any one AMP and still get to the data.

You can't step into the same river twice

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. The chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in a group of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while an AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)
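As a sketch, requesting FALLBACK when a table is created, and changing your mind later, could look like this. The data types and the choice of Friend_Num as a unique primary index are assumptions made for the example.

```sql
-- Ask for FALLBACK protection at creation time.
CREATE TABLE Best_Friends, FALLBACK
( Friend_Num   INTEGER
 ,Friend_Name  VARCHAR(30)
)
UNIQUE PRIMARY INDEX (Friend_Num);

-- Drop the protection later to give back the extra disk space...
ALTER TABLE Best_Friends, NO FALLBACK;

-- ...or add it again once the table becomes mission critical.
ALTER TABLE Best_Friends, FALLBACK;
```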


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues using the FALLBACK copies of that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping", starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie it turns out that all along the woman had been telling the man everything that happened "While you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.


In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

I know that you believe that you understand what you think I said but I am not sure you realize that what youheard is not what I meant

Sign on Pentagon office wall

RAID never gets confused It always knows exactly what the disk said and it mirrors it exactly The disks in theDisk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer No doubtyou have heard people complain that their hard drive crashed Well disk drives crash inside modules that storemultiple disks too Redundant Array of Independent Disks (RAID) protects against a disk failure There aremany levels of RAID in the data storage industry The most common level and one that is used by Teradata isRAID-1 also called MIRRORING With RAID-1 each primary disk has a mirror image or an exact copy of all

its data on another disk The contents of both disks are identical

When data is written on the primary disk it is also written on the mirror disk However the dual-write processis invisible to the user This is the reason RAID-1 is also called transparent mirroring Mirrored disks providea high degree of reliability because when a disk fails no data is lost its actually fully accessible on the mirror disk Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk The down side of RAID-1 like FALLBACK is that it requires a 50 overhead of disk space

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
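
The rule about losing two disks in a rank can be checked with a small Python sketch. The layout here is an assumption made for illustration (a rank of four disks arranged as two mirrored pairs, with made-up disk names); it is not taken from Teradata documentation.

```python
from itertools import combinations

# Assumed layout: a rank of four disks arranged as two mirrored pairs.
# The disk names are hypothetical.
MIRROR_PAIRS = [("disk1", "disk2"), ("disk3", "disk4")]

def data_survives(failed):
    """Data is lost only when both disks of the same mirrored pair have failed."""
    failed = set(failed)
    return all(not (a in failed and b in failed) for a, b in MIRROR_PAIRS)

# Try every possible two-disk failure in the rank.
disks = [d for pair in MIRROR_PAIRS for d in pair]
for lost in combinations(disks, 2):
    print(lost, "still running" if data_survives(lost) else "DATA LOST")
```

Under this assumed layout, only the two combinations that take out a disk together with its own mirror lose data; every other double failure is survivable.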

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced clicks). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced cleeks) in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk and, similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in the surviving node's memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.
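
The migration just described can be sketched in Python. This is only an illustration under stated assumptions: the node names, AMP names, and the `fail_node` helper are hypothetical, and the real migration is handled by the Teradata software, not by user code.

```python
# Hypothetical two-node clique: 16 AMPs per node, each AMP owning one virtual disk.
nodes = {
    "node1": [f"AMP{i}" for i in range(1, 17)],
    "node2": [f"AMP{i}" for i in range(17, 33)],
}
vdisk_of = {amp: f"vdisk_{amp}" for amps in nodes.values() for amp in amps}

def fail_node(nodes, failed):
    """Return a new layout with the failed node's AMPs spread over the survivors."""
    survivors = [n for n in nodes if n != failed]
    if not survivors:
        raise RuntimeError("no surviving node left in the clique")
    layout = {n: list(nodes[n]) for n in survivors}
    for i, amp in enumerate(nodes[failed]):
        layout[survivors[i % len(survivors)]].append(amp)
    return layout

after = fail_node(nodes, "node1")
print(len(after["node2"]))   # number of AMPs now running in node two's memory
print(vdisk_of["AMP16"])     # AMP 16 still owns the same virtual disk
```

The point of the sketch is that only the AMP-to-node mapping changes; the AMP-to-disk mapping is never touched, which is why the work continues (more slowly) on the surviving node.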

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by a restoration that takes a couple of hours to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
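
The two scenarios above can be imitated with a toy journal in Python. This is a conceptual sketch only: the table, the `update`, `roll_back`, and `roll_forward` helpers, and the sample rows are all hypothetical, only UPDATEs are handled, and nothing here reflects how Teradata stores or applies journal images internally.

```python
# A toy table keyed by primary key, plus a journal of row images.
table = {1: {"rep": "Ann", "region": "Western"},
         2: {"rep": "Bob", "region": "Eastern"}}
journal = []  # entries: (sequence number, key, BEFORE image, AFTER image)

def update(key, **changes):
    """Change a row and record its BEFORE and AFTER images."""
    before = dict(table[key])
    table[key].update(changes)
    journal.append((len(journal) + 1, key, before, dict(table[key])))

def roll_back(to_seq):
    """Undo changes newer than to_seq by restoring BEFORE images, newest first."""
    while journal and journal[-1][0] > to_seq:
        _, key, before, _ = journal.pop()
        table[key] = before

def roll_forward(backup, entries):
    """Rebuild data from a backup copy by reapplying AFTER images in order."""
    restored = {k: dict(v) for k, v in backup.items()}
    for _, key, _, after in entries:
        restored[key] = dict(after)
    return restored

backup = {k: dict(v) for k, v in table.items()}  # stands in for the full backup
update(1, region="Southwest")                    # the intended change
checkpoint = len(journal)                        # the point in time to return to
update(2, region="Southwest")                    # the accidental change
saved_entries = list(journal)

roll_back(checkpoint)                            # BEFORE images undo the mistake
print(table[2]["region"])

recovered = roll_forward(backup, saved_entries[:checkpoint])
print(recovered == table)                        # backup + AFTER images rebuild the data
```

The sketch shows why the two journal types point in opposite directions: a rollback walks the BEFORE images backward from the present, while a roll forward starts from an old backup and walks the AFTER images forward.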

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

    CREATE TABLE TomC.employee, FALLBACK,
        BEFORE JOURNAL,
        DUAL AFTER JOURNAL
    (
        emp        INTEGER,
        dept       INTEGER,
        lname      CHAR(20),
        fname      VARCHAR(20),
        salary     DECIMAL(10,2),
        hire_date  DATE   -- FORMAT string not legible in the source
    )
    UNIQUE PRIMARY INDEX (emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
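
The four modes can be summarized as a compatibility table. The Python sketch below encodes the rules as described in the list; the `COMPATIBLE` mapping and `can_grant` function are hypothetical names, and the row for a held ACCESS lock (which the list does not spell out) is an assumption that an ACCESS lock blocks only an EXCLUSIVE request.

```python
# COMPATIBLE[held] is the set of lock modes that may be granted while `held` is in place.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},  # assumption: only EXCLUSIVE must wait
    "READ":      {"ACCESS", "READ"},           # many readers; writers must wait
    "WRITE":     {"ACCESS"},                   # only a dirty read may proceed
    "EXCLUSIVE": set(),                        # no access, period
}

def can_grant(requested, held_locks):
    """A request is granted only if it is compatible with every lock already held."""
    return all(requested in COMPATIBLE[held] for held in held_locks)

print(can_grant("READ", ["READ", "READ"]))   # a thousand SELECTs can coexist
print(can_grant("WRITE", ["READ"]))          # an UPDATE waits behind readers
print(can_grant("ACCESS", ["WRITE"]))        # a dirty read slips past a WRITE lock
print(can_grant("ACCESS", ["EXCLUSIVE"]))    # nothing gets past an EXCLUSIVE lock
```

A request that cannot be granted simply waits in the queue until the conflicting locks are released at query completion.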

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
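
The employee and payroll story can be turned into a minimal Python sketch of the two RI checks. Everything here is illustrative: the tables are plain dictionaries, and `ReferentialIntegrityError`, `insert_payroll`, and `delete_employee` are hypothetical names rather than anything Teradata provides.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would leave a payroll row without an employee."""

employee = {101: "Jeff", 102: "Maria"}  # parent table: emp -> name (made-up data)
payroll = {}                            # child table: emp -> salary

def insert_payroll(emp, salary):
    # Insert rule: the value must already exist in the other (parent) table.
    if emp not in employee:
        raise ReferentialIntegrityError(f"emp {emp} is not in the employee table")
    payroll[emp] = salary

def delete_employee(emp):
    # Delete rule: the row cannot go while another table still refers to it.
    if emp in payroll:
        raise ReferentialIntegrityError(f"emp {emp} is still referenced by payroll")
    del employee[emp]

insert_payroll(101, 50000)
try:
    delete_employee(101)        # blocked: Jeff is still on the payroll
except ReferentialIntegrityError as err:
    print("rejected:", err)

del payroll[101]                # remove the payroll row first ...
delete_employee(101)            # ... and now the delete is allowed
```

With both checks in place, the fired employee can never linger in the payroll table by accident.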


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
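
The three steps (deal out blocks, hash and redistribute, sort) can be imitated in a few lines of Python. This is a sketch under loud assumptions: `NUM_AMPS`, `row_hash`, and `fastload` are hypothetical, MD5 merely stands in for Teradata's own hashing algorithm, the tiny block size replaces the 64K blocks, and everything runs sequentially rather than in parallel.

```python
import hashlib

NUM_AMPS = 4  # hypothetical small system

def row_hash(primary_index_value):
    """Stand-in hash function; Teradata's real hashing algorithm is different."""
    digest = hashlib.md5(str(primary_index_value).encode()).digest()
    return int.from_bytes(digest[:4], "big")

def fastload(rows, block_size=3):
    # Step 1: hand blocks of rows to the AMPs in turn, regardless of content.
    received = [[] for _ in range(NUM_AMPS)]
    for i in range(0, len(rows), block_size):
        received[(i // block_size) % NUM_AMPS].extend(rows[i:i + block_size])
    # Step 2: each AMP hashes its rows and sends each row to the AMP that owns it.
    owned = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for row in amp_rows:
            h = row_hash(row["emp"])
            owned[h % NUM_AMPS].append((h, row))
    # Step 3: each AMP sorts the rows it owns by their hash value.
    for amp in owned:
        amp.sort(key=lambda pair: pair[0])
    return owned

rows = [{"emp": n, "lname": f"name{n}"} for n in range(1, 21)]
amps = fastload(rows)
print([len(a) for a in amps])   # how many rows each AMP ended up owning
```

The first step is fast because no AMP has to think about where a row belongs; the hashing work in the second step is then shared by all the AMPs at once, which is where the parallelism pays off.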

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance - Only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


Rule 7 - Model the Data Correctly

"You will find only what you bring in."

Yoda Jedi Master in Star Wars

We model a database for the same reasons that Boeing builds an aircraft model to test flight characteristics in a wind tunnel. It's simpler and cheaper to model than to reconstruct the plane by iterations until you get it right.

A proper data model should be designed to reflect the business components and possible relationships.

Here are three rules for modeling data in a data warehouse:

1. Model the data quickly
2. Normalize the detail data
3. Use a dimensional model for data marts

Third Normal Form holds that each column in a table should be directly related to the primary key, the whole key, and nothing but the key. Data is placed into tables where it makes the most sense and has no repeating groups, derived data, or optional columns. This allows users to ask any question at any time on all data within the enterprise. Users do not have to strive for 3rd Normal Form, but should just normalize the data the best they can. There will be fewer columns in a table but a lot more tables overall. This model is easier to maintain, incredibly flexible, and allows a user to ask any question on any data at any time.

A Star-Schema model is comprised of a fact table and a number of dimension tables. The fact table is a table with a multi-part key. Each element of the key is itself a foreign key to a single dimension table. The remaining fields in the fact table are known as facts and are numeric, continuously valued, and additive. Facts can be thought of as measurements taken at the intersection of all of the dimensions. Dimension attributes are mostly textual and are almost always the source of constraints and report breaks. This model enhances performance on known queries or, in other words, queries users run repeatedly day after day.
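
A tiny Python sketch may make the fact and dimension vocabulary concrete. The tables, column names, and the `revenue_by` helper are invented for illustration; a real star schema would of course live in database tables and be queried with SQL joins.

```python
from collections import defaultdict

# Hypothetical dimension tables, keyed by their primary keys; attributes are textual.
store_dim = {1: {"city": "Dayton"}, 2: {"city": "San Diego"}}
product_dim = {10: {"category": "Tools"}, 11: {"category": "Toys"}}

# Fact table: a multi-part key (store_id, product_id, day) plus numeric, additive facts.
sales_fact = [
    {"store_id": 1, "product_id": 10, "day": "2001-03-01", "units": 3, "revenue": 30.0},
    {"store_id": 1, "product_id": 11, "day": "2001-03-01", "units": 1, "revenue": 8.0},
    {"store_id": 2, "product_id": 10, "day": "2001-03-02", "units": 2, "revenue": 20.0},
]

def revenue_by(dim, foreign_key, attribute):
    """Follow a foreign key into one dimension and total revenue per attribute value."""
    totals = defaultdict(float)
    for fact in sales_fact:
        totals[dim[fact[foreign_key]][attribute]] += fact["revenue"]
    return dict(totals)

print(revenue_by(store_dim, "store_id", "city"))
print(revenue_by(product_dim, "product_id", "category"))
```

Each report break (city, category) comes from a dimension attribute, while the numbers being added up come from the fact table, which is the division of labor the paragraph above describes.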

Most database modelers prefer to create a logical model in 3rd Normal Form, but most database engines are overcome by physical limitations, so they must compromise the model. The four most difficult functions for a database to handle are:

• Join tables
• Aggregate data
• Sort data
• Scan large volumes of data

In order to get around these system limitations, vendors will suggest a model to avoid joins, use summarized data to avoid aggregation, store data in sorted order to avoid sorts, and overuse indexes to avoid large scans.

By working around these limitations, vendors also give up the ability to compete. That is like placing a ball and chain around the runner's leg and saying, "I wish you all the best in the marathon." Come on! Whose side are these vendors really on?

Teradata is the only database engine I have seen that has the power and maturity to use a 3rd Normal Form physical model on databases exceeding a terabyte in size. Because of the physical limitations, other databases have had to use a Star-Schema model to enhance performance but have given up on the ability to perform ad-hoc queries and data mining.


A normalized model is one that should be used for the central data warehouse. It allows users to ask any question at any time on information from any place within the enterprise. This is the central philosophy of a data warehouse. It leads to the power of ad-hoc queries and data mining, whereby advanced tools discover relationships that are not easily detected but do exist naturally in the business environment.

A Star-Schema model enhances performance on known queries because we build our assumptions into the model. While these assumptions may be correct for the first application, they may not be correct for others. Flexibility is a big issue, but data marts can be dropped and added with relative ease if each is built directly from the detail data.

Remember, build the data warehouse around detail data using a normalized model. Then, as query patterns emerge and performance for well-known queries becomes a priority, Star-Schema data marts can be created by extracting summarized or departmental data from the centralized data warehouse. The user will then have access to both the data marts for repetitive queries and the central warehouse for other queries.

Because data marts can be an administrative nightmare, Teradata enables Star-Schema access without requiring physical data marts. By setting up a join index as the intersection of your Star-Schema model, you can create a Star-Schema structure directly from your 3rd Normal Form data model. Best of all, once it is created, the data is automatically maintained as the underlying tables are updated.

Keep in mind that 80% of data warehouse queries are repetitive, but 80% of the Return On Investment (ROI) is actually provided by the other 20% of the queries that go against detailed data in an iterative environment. By using a normalized model for your central data warehouse and a Star-Schema model on data marts, you can enhance the possibility of realizing that 80% Return on Investment and still enhance the performance on 80% of your queries.

Rule 8 - Don't Let a Technical Issue Make Your Data Warehouse a Failure Statistic

"Experience is a hard teacher because she gives the test first, the lesson afterwards."

Scottish Proverb

Did you know that 3/4ths of the people in the world hate fractions, and that 40% of the time a data warehouse fails it is because of a technical issue? There are many traps and pitfalls in every data warehouse venture. One winter day a hunter met a bear in the forest. The bear said, "I'm hungry. I want a full stomach." The man replied, "Well, I'm cold. I would like a fur coat." "Let's compromise," said the bear, and he quickly gobbled up the hunter. They both got what they asked for. The bear went away with a full belly, and the man left wrapped in a fur coat. With that in mind, good judgment comes from experience, and experience comes from bad judgment. You have shown good judgment by reading this book, so let our experience keep your company from having a bad data warehouse experience.

Author Daniel Boorstin wrote in The Discoverers, "The greatest obstacle to discovering the shape of the earth, the continents, and the oceans was not ignorance but rather the illusion of knowledge." There is a lot of illusion of knowledge being spread around in the data-warehousing environment. Before you decide on any data warehouse product, ask yourself and the vendor these questions:

• As my data demands increase, will the system be able to physically load the data? Our experience shows that many systems are not capable of handling very large volumes of data. Do the math.
• As the data grows in volume, can the system meet the performance requirements? Do the math.
• As the number of users grows, will the system be able to scale? Do the math.
• As my environment changes, will the system be flexible enough to allow changes quickly and easily? Do the math.
• Will the system need so many Database Administrators (DBAs) that my system's cost skyrockets? Do the math.
• If we suddenly merged with another company and needed to incorporate their mainframe or LAN environment, would the system be able to connect and include them? Do the math.
• Can I continue to meet my batch window timeframes? Do the math.
• Could I become the hero of the company one day, only to have some technical glitch blamed on me because of my poor foresight, and be thrown out of the company into a giant mud puddle? Do the bath.

Rule 9 - Take a Building Block Approach

"Be not afraid of growing slowly; be afraid only of standing still."

Chinese Proverb

Ever since Vasco de Balboa discovered the Pacific coast of Panama in 1513, kings and businessmen alike dreamed of the impossible: to cut a waterway across the mountainous isthmus, creating a shortcut between the Atlantic and Pacific Oceans. Those dreams turned into reality during the Industrial Revolution. It took almost forty years of trial and error before the world's greatest engineering feat since the Pyramids was completed in 1914. Ships move through the locks of the canal, rising 85 feet above sea level before they descend to the opposite side. Since its grand opening in 1920, the Panama Canal has revolutionized trans-oceanic traffic, joining East and West. Its 50-mile stretch saves every vessel about 8,000 extra nautical miles of travel around the bottom tip of South America. Several modifications have been engineered through the years to accommodate the increasing size of ships.

Data warehouses, like the Panama Canal, must be built over time and changed over time to meet new demands. A data warehouse must grow with the environment, but the environment is unpredictable. All sailors know that they can't direct the wind, but that they can adjust their sails. In comparison, all data warehouse users know they can't direct the environment, but they can adjust their warehouse. Sometimes the data warehouse will grow quickly and sometimes it will grow slowly, but it should always be growing.

So take a building block approach to data warehousing. Teradata allows you to expand without boundaries, one building block at a time. Plus, adding on building blocks is easy.

There are two aspects to a building block approach. First, you need to add applications to your data warehouse in three to six month intervals. Once the first application works, then you are ready for more projects. As you become more experienced with this approach, you can add multiple projects in parallel by involving multiple organizations.

The second aspect of the building block approach is in the actual data warehouse architecture. It doesn't matter if yours is the smallest data warehouse in the world, the largest, or falls somewhere in between; power and scalability always fuel success.

Not long ago, a customer flew out to San Diego for a Teradata demonstration and benchmark. The benchmark ran late into the evening, but the numbers were more than 50% better than the competition. The customer was extremely impressed, but before buying he demanded to see the system scalability that everyone had been talking about. Although it was already late, a Teradata employee was called in the middle of the night, arrived within 10 minutes (in pajamas), hooked up the building blocks, and ran a utility called config. She ran another called reconfig, and in less than two hours the system size doubled.


As the environment changes in terms of users, data, complexity, capacity, batch windows, time changes, events, or opportunities, users should be able to continue building applications and architecture. The more a Teradata system grows, the more Teradata outshines the competition.

Rule 10 - Buy a Teradata Data Warehouse

"Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing had happened."

Winston Churchill

Winston Churchill led Britain through World War II, during what he called that country's "finest hour." When users see consistent data, the system too is in its finest hour. Teradata gives users the ability to ask questions they could never ask before. Users trust Teradata because of its industry performance and reputation, and because it never gives in. Constant use gives users optimal business experience, and no matter what a user asks, the system responds with a hearty "Yes, Sir!"

When we explained Teradata to Churchill, he said:

"A data WARe-house that consists of 250 data marts is like poison, and if I were the MIS department responsible for maintaining them, I'd take it."

Teradata guarantees an Enterprise Data Warehouse with no scalability issues. Data loads like lightning, and system administration is a breeze. You can pick the performance level that meets your requirements for today and forever. The database can be normalized around detail data, and because of Teradata's power, users have the flexibility to ask any question at any time on any data.

All other databases are suspect in data loading capabilities, scalability, reference sites, decades of data warehouse experience, flexibility, system administration difficulties, and inability to handle the complex queries of today's users. These users are good!

Teradata - The Shining Star

Overview

Teradata has always been at the top of the data warehouse game, even if the experts weren't bright enough to know it. The vision of the original designers was tremendous. It was so far to the left of genius that most thought the idea was impossible.

Only he who attempts the ridiculous may achieve the impossible

Don Quixote

The Teradata database was originally designed in 1976, and many of the fundamental concepts still remain today. Nearly 25 years later, Teradata is still considered ahead of its time.

In 1976, IBM mainframes dominated the computer business. Everyone who was anyone had an IBM mainframe. However, the original founders of Teradata noticed that it took about 4½ years for IBM to produce a new mainframe. They also noticed a little company called Intel. Intel created a new PC chip every 2½ years. With mainframes moving forward every 4½ years and PC chips every 2½ years, Teradata recognized their vision: to network enough PC chips together that the mainframe would be overpowered, yet costs would be hundreds of times cheaper than a mainframe. The Teradata team estimated the power surge would come in 1990.

IBM laughed out loud. They said, "Let's get this straight… you are going to network a bunch of PC chips together and overpower our mainframes? That's like plowing a field with 1,000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as "just a bunch of PCs in a cabinet."

Teradata was convinced it could produce a product that would power large amounts of data and achieve the impossible: using PC technology in mainframe territory. Its founders agreed with Napoleon Bonaparte, who asserted, "The word 'impossible' is not in my dictionary." Sure enough, when we looked in his dictionary, that word was not there. And it is not in Teradata's Data Dictionary either. The Teradata team set two goals: build a database that could

• Perform parallel processing, and
• Accommodate a terabyte of data.

Driving in the car one evening, Morgan's eight-year-old daughter Kara piped up from the back seat: "Daddy, can you buy Teradata in the store? I mean, what does Teradata really do?" Morgan thought for a moment and then replied, "Do you remember when you went on the Easter egg hunt last spring? Well, imagine that we had fifty eggs and you were the only child there. If I asked you to find all the purple eggs, would you be able to do that?" Kara said, "Sure. But it might take me a long time." Morgan continued, "What if we now let fifty children go in, and I asked them to show me all of the purple eggs? How long would that take?" His daughter responded, "It wouldn't take any time at all, because each child would only have to look at one egg." That is precisely how Teradata works. It divides up huge tasks among its processors and tackles each portion simultaneously with amazing speed. And it doesn't matter if you have a trillion eggs in your basket.

In 1984, the DBC/1012 was introduced. Since then, Teradata has been the dominant force in data warehousing. Teradata got the chickens plowing and is considered outstanding. Meanwhile, IBM's plow is out rusting in its field.

Parallel Processing

An invasion of armies can be resisted, but not an idea whose time has come.

Victor Hugo

The idea of parallel processing gives Teradata the ability to have unlimited users, unlimited power, and unlimited scalability. This is an idea whose time has come. So what is parallel processing? Let us explain.

It was 10 p.m. on a Saturday night, and two friends were having dinner and drinks. One of the friends looked at his watch and said, "I have to get going." The other friend responded, "What's the hurry?" His friend went on to tell him that he had to leave to do his laundry at the Laundromat. The other friend could not believe his ears. He responded, "What? You're leaving to do your laundry on a Saturday night? Do it tomorrow!" His buddy went on to explain that there were only 10 washing machines at the laundry. "If I wait until tomorrow, it will be crowded and I will be lucky to get one washing machine. I have 10 loads of laundry, so I will be there all day. If I go now, there will be nobody there and I can do all 10 loads at the same time. I'll be done in less than an hour and a half."


This story describes what we call "Parallel Processing." Teradata is the only database in the world that loads data, backs up data, and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds… even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every database's clock in performance tests.

After enlightenment, the laundry.

Zen Proverb

After parallel processing the laundry, enlightenment.

Teradata Zen Proverb

With the computer world seeing terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, this is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata has unlimited power, grows without boundaries, and was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

A ship in harbor is safe, but that's not why ships are built.

John Shedd

In 1805, the pivotal Battle of Trafalgar matched Britain's fleet of battleships against the combined French and Spanish fleet. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to sail circles around the enemy fleet, attacking the ships at their most vulnerable point: the stern. That battle paralyzed the combined fleet and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating the mainframes on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive, and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed while managing terabytes of data.

A Personal Computer (PC) is made up of the following components:


Processor Chip – This is the brain of the computer. All tasks are done at the direction of the processor.

Memory – This is the hand of the computer. The memory allows data to be viewed, manipulated, changed, or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive – This is the spine of the computer. The hard drive stores data, applications, and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor applies your changes to the document while it is held in memory. Upon completion, you close the document and the processor writes all the changes to the disk, where it is stored.

In the picture below, we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight best friends.

Teradata Spreads Data over Multiple Processors

I don't mind starting the season with unknowns. I just don't like finishing the season with them.

Coach Lou Holtz

With Teradata, you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The Teradata example on the next page shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask for a list of best friends, both processors will retrieve data in parallel and will return combined results over the connecting network. This arrangement could easily double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four records, and simultaneously the other processor reads the remaining four records. So how could we double the speed of this system again?
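The two-processor example can be sketched in a few lines of Python. This is only an illustration, not Teradata code: the friend names, the round-robin split, and the helper names `split_rows`, `read_disk`, and `parallel_read` are all assumptions made for the sketch, and Python threads merely stand in for independent processors.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical contents for the eight-row Best_Friends table.
BEST_FRIENDS = ["Ann", "Bob", "Cal", "Dee", "Eve", "Fay", "Gus", "Hal"]


def split_rows(rows, processor_count):
    """Deal rows out round-robin so each processor holds an equal share."""
    return [rows[i::processor_count] for i in range(processor_count)]


def read_disk(rows):
    """Stand-in for one processor reading the rows on its own disk."""
    return list(rows)


def parallel_read(rows, processor_count):
    """Have every processor read its own portion at once, then combine."""
    portions = split_rows(rows, processor_count)
    with ThreadPoolExecutor(max_workers=processor_count) as pool:
        results = list(pool.map(read_disk, portions))
    combined = [name for part in results for name in part]
    return portions, combined


portions, combined = parallel_read(BEST_FRIENDS, 2)
print([len(p) for p in portions])                # rows held by each processor
print(sorted(combined) == sorted(BEST_FRIENDS))  # all eight rows come back
```

Each of the two workers reads four rows, and the combined result still contains all eight friends.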

Teradata has Linear Scalability

Every ceiling when reached becomes a floor upon which one walks and now can see a new ceiling

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." This allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck of data.

Notice in the system below there are four processors, and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since data is spread evenly over four processors, each of the four processors reads its two rows at the same time. Now the system is four times faster.

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the AMPs (Teradata's worker processors, described later) and processed in parallel.
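The arithmetic behind this claim is simple enough to write down. The sketch below assumes the idealized case the text describes, where rows are spread perfectly evenly and reading is the only cost; the function names are made up for illustration.

```python
def rows_per_processor(total_rows, processor_count):
    """Rows each processor must read when data is spread evenly."""
    return total_rows / processor_count


def relative_speedup(processor_count, baseline_processors=1):
    """Idealized linear speedup compared with the baseline system."""
    return processor_count / baseline_processors


for n in (1, 2, 4):
    print(n, rows_per_processor(8, n), relative_speedup(n))
```

With eight rows, one processor reads 8 rows, two read 4 each, and four read 2 each, which is where the "four times faster" figure comes from under these assumptions.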


A Logical View of the Teradata Architecture

You are either making history or you are history

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when it never had a chance to succeed. In ancient days, Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies too will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and then is given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain information on their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.


Parsing Engine (PE)

Even a stopped clock is right twice a day

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes, it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards." Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and SouthWestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions, some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions, and Teradata returns the answers.

Access Module Processor (AMP)

Wise men talk because they have something to say; fools talk because they have to say something.

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above: "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of few words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best way to picture it is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it's the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk, and it shares this with no other AMP, hence a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP will read only the rows on its particular disk. If this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs, while others have more than 2,000.
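The "slowest AMP" remark can be made concrete with a small sketch. It assumes, purely for illustration, that every AMP reads rows at the same fixed rate and that nothing else contributes to elapsed time; the row counts and the `query_time` helper are invented for the example.

```python
def query_time(rows_per_amp, seconds_per_row=1.0):
    """Elapsed time when AMPs work simultaneously: the slowest AMP decides."""
    return max(rows_per_amp) * seconds_per_row


even = [250, 250, 250, 250]    # 1,000 rows spread evenly over four AMPs
skewed = [700, 100, 100, 100]  # the same 1,000 rows, badly skewed

print(query_time(even))
print(query_time(skewed))
```

Both layouts hold the same 1,000 rows, but in this model the skewed layout takes as long as its busiest AMP, which is why even distribution matters.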

The BYNET

Even if you're on the right track, you'll still get run over if you just sit there.

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax
• The PE checks the user's security rights
• The PE comes up with a plan for the AMPs to follow
• The PE passes the plan along to the AMPs over the BYNET
• The AMPs follow the plan and retrieve the data requested
• The AMPs pass the data to the PE over the BYNET, and
• The PE then passes the final data to the user
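The steps above can be mimicked with a toy model. This is a sketch under heavy assumptions, not a description of Teradata internals: the classes `Amp` and `ParsingEngine`, the idea of a "plan" being a simple filter function, and the access-rights dictionary are all hypothetical, and the BYNET is reduced to ordinary method calls.

```python
class Amp:
    """Toy AMP: owns its rows and only ever reads its own 'disk'."""

    def __init__(self, rows):
        self.rows = rows

    def execute(self, plan):
        # Follow the plan: return the rows on this AMP that satisfy it.
        return [row for row in self.rows if plan(row)]


class ParsingEngine:
    """Toy PE: validates the request, plans it, and collects AMP results."""

    def __init__(self, amps, access_rights):
        self.amps = amps
        self.access_rights = access_rights  # {user name: set of table names}

    def run(self, user, table, predicate):
        if not callable(predicate):  # stand-in for the SQL syntax check
            raise ValueError("malformed request")
        if table not in self.access_rights.get(user, set()):  # security check
            raise PermissionError(f"{user} may not read {table}")
        plan = predicate  # here the 'plan' is simply the row filter
        partials = [amp.execute(plan) for amp in self.amps]  # plan goes to every AMP
        return [row for part in partials for row in part]    # answer for the user


amps = [Amp([1, 2]), Amp([3, 4])]
pe = ParsingEngine(amps, {"kara": {"Best_Friends"}})
result = pe.run("kara", "Best_Friends", lambda row: row % 2 == 0)
print(result)
```

A user without rights on the table gets a `PermissionError` before any AMP is asked to do work, mirroring the order of the checks in the list.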

Teradata Building Block Approach

Better a diamond with a flaw than a pebble without one


Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array, where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs

Teradata Tables

Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them.


Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about… the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, CustomerName, Item No, and Quantity.

An example of each table follows:

CUSTOMER TABLE called CustomerTable

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillards       Atlanta    106227              Smith

ORDER TABLE called OrderTable

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is composed of one or more fields identified by a column name. A Row is the smallest unit that can be inserted into a table. A column is the smallest unit within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null, which is sometimes represented by a "?".

One column, or combination of columns, in each table is chosen to be the "Primary Key" (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (e.g., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.

Here is a quick reference chart for Primary and Foreign Keys:
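To see a primary-key-to-foreign-key JOIN in action, the sketch below loads the sample rows into an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is used only because it is easy to run anywhere; it is not Teradata, the column names are simplified (for instance `OrderNumber` without a space), the dates are kept as plain text, and only the OrderTable-to-CustomerTable foreign key is modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerTable (
    CustomerID   INTEGER PRIMARY KEY,
    CustomerName TEXT,
    CityName     TEXT
);
CREATE TABLE OrderTable (
    OrderNumber INTEGER PRIMARY KEY,
    OrderDate   TEXT,
    ItemNo      INTEGER,
    Quantity    INTEGER,
    CustomerID  INTEGER REFERENCES CustomerTable (CustomerID)
);
INSERT INTO CustomerTable VALUES
    (1001, 'JC Penney', 'Dallas'),
    (1002, 'Office Depot', 'Columbia'),
    (1003, 'Dillards', 'Atlanta');
INSERT INTO OrderTable VALUES
    (105372, '03/07/2001', 212, 20, 1001),
    (105799, '04/18/2001', 296, 52, 1002),
    (106227, '10/17/2001', 325, 17, 1003);
""")

# Match the PK of CustomerTable to the FK column in OrderTable.
rows = conn.execute("""
    SELECT c.CustomerName, o.OrderNumber, o.Quantity
    FROM CustomerTable AS c
    JOIN OrderTable AS o ON o.CustomerID = c.CustomerID
    ORDER BY o.OrderNumber
""").fetchall()

for row in rows:
    print(row)
```

Each output row pairs a customer name from one table with order details from the other, which is exactly what the shared key makes possible.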

PRIMARY KEY                         FOREIGN KEY
Not optional                        Optional
Comprised of one or more columns    Comprised of one or more columns
Can only have one PK per table      Can have multiple FKs per table
No duplicates allowed               Duplicates allowed
No changes allowed                  Changes allowed
No nulls allowed                    Nulls allowed

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind, the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP1, for example, holds one-fourth of the Employee table rows, one-fourth of the Customer table rows, and one-fourth of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the rows of the Customer table and of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.
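A small sketch can show what "every AMP holds a portion of every table, grouped by table" looks like as a data structure. The row counts are invented, the `distribute` helper is hypothetical, and the round-robin placement is a deliberate simplification: it stands in for Teradata's actual placement, which is driven by the Primary Index.

```python
from collections import defaultdict


def distribute(tables, amp_count):
    """Spread each table's rows over the AMPs, grouping rows by table per AMP."""
    amps = [defaultdict(list) for _ in range(amp_count)]
    for table_name, rows in tables.items():
        for position, row in enumerate(rows):
            amps[position % amp_count][table_name].append(row)
    return amps


# Hypothetical row counts for the three tables in the example.
tables = {
    "Employee": list(range(8)),
    "Customer": list(range(12)),
    "Order": list(range(16)),
}

amps = distribute(tables, 4)
for number, amp in enumerate(amps, start=1):
    print(f"AMP{number}", {name: len(rows) for name, rows in amp.items()})
```

Every AMP ends up with one-fourth of each table, and a request for Customer rows only has to touch the Customer group on each AMP.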

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him, saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.


There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE
)
UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON:

CREATE TABLE TomC.employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE
)
PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE
)
UNIQUE PRIMARY INDEX(emp, dept);

Being related hardly insures relatability

Michael E Angier


All of the tables in a Teradata database are related to each other But the Primary Key and Primary Index ensuretheir relatability in day-to-day use What is the difference between a PRIMARY KEY and a PRIMARYINDEX A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in atable PKs determine relationships among tables A Primary Index is a physical term used to label column(s)that is used to store and locate rows of data

To illustrate imagine a library The Primary Key the logical is like the actual construction of the library Doyou know what part of the library is reserved for fiction What about for non-fiction Plus where will the card

catalog reside Once the library is logically correct it is ready to receive books A Primary Key on a table helpsto logically determine what data to track in the table

The Primary Index is much like a card catalog in the library Inside the card catalog drawers are thousands of index cards that provide the books title author publisher and the Dewey Decimal number By taking thatindex card you can immediately find where that book is shelved within the library The Primary Index columnvalue for a Teradata table tells where the row should reside Its also the fastest mechanism to retrieve data

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

"The map is not the territory."

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that the AMP number goes 1, 2, 3, 4 and then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
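If you would like to see the pattern in the two diagrams for yourself, here is a minimal sketch in Python. It assumes the simple round-robin layout the pictures show and 48 buckets per diagram; the function names are ours, and nothing here claims to be how Teradata really fills its 65,536 buckets.

```python
# Simplified model of the simulated hash map: each bucket holds one AMP number.
# The round-robin assignment copies the diagrams; it is an assumption, not
# Teradata's actual bucket assignment.

def build_hash_map(num_amps, num_buckets=48):
    """Return a list where entry i is the AMP number stored in bucket i + 1."""
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]

def print_hash_map(hash_map, per_row=6):
    """Print the buckets six to a row, the way the diagrams lay them out."""
    for start in range(0, len(hash_map), per_row):
        print(" ".join(str(amp) for amp in hash_map[start:start + per_row]))

four_amp_map = build_hash_map(4)
eight_amp_map = build_hash_map(8)
print_hash_map(four_amp_map)
```

Calling print_hash_map(eight_amp_map) prints the second diagram. Passing num_buckets=65536 gives a map of the real size, still with every AMP appearing an equal number of times.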

How the Hash Map and Primary Index Work Together

"Choice, not chance, determines destiny."

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the Wiz-Bang Formula (a secret NCR formula). First we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table, titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even-tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

For this example, Teradata will attempt to spread the table rows among the AMPs of the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula. Take a table's Primary Index column value and divide it by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we take the Primary Index value of the row and apply the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer points to a bucket in the hash map. Inside that bucket is the number of the AMP on which the row will reside.

Friend_Num Friend_Name

2 Ben Hon

Since we designated Friend_Num as the Primary Index, we merely divide the value of Friend_Num (2) using the Coffing/Jones Wiz-Bang Formula (divide by 2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num Friend_Name

16 Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely divide the value of Friend_Num (16) using the Coffing/Jones Wiz-Bang Formula (divide by 2), and the answer is:

16 divided by 2 = 8

Thus the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
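To make the walk-through concrete, here is a small Python sketch that applies the Coffing/Jones Wiz-Bang formula (divide by 2) to every row of Best_Friends. It is a teaching toy only: the function names are made up for this example, the 48-bucket map simply copies the diagram, and the real Teradata hashing formula remains a secret.

```python
# Toy version of the Coffing/Jones Wiz-Bang formula (divide by 2).
# Teaching formula only; the real Teradata hashing formula is not public.

BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]

NUM_AMPS = 4
# Simulated four-AMP hash map: bucket 1 -> AMP 1, bucket 2 -> AMP 2, and so on.
HASH_MAP = [(bucket % NUM_AMPS) + 1 for bucket in range(48)]

def wiz_bang_bucket(primary_index_value):
    """Toy hash: divide the Primary Index value by 2 to get a bucket number."""
    return primary_index_value // 2

def amp_for(primary_index_value):
    """Look in the bucket (numbered from 1) and return the AMP stored there."""
    return HASH_MAP[wiz_bang_bucket(primary_index_value) - 1]

def distribute(rows):
    """Place each row on the AMP its Primary Index value points to."""
    amps = {amp: [] for amp in range(1, NUM_AMPS + 1)}
    for friend_num, friend_name in rows:
        amps[amp_for(friend_num)].append((friend_num, friend_name))
    return amps

layout = distribute(BEST_FRIENDS)
for amp, rows in layout.items():
    print(f"AMP {amp}: {rows}")
```

With this toy formula each AMP ends up holding two of the eight friends, which is the even spread the hash map is designed to give.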

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. The purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it is consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on that value a million times, we would still get the same result.

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
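The same steps can be sketched in a few lines of Python. This is an illustration under stated assumptions: AMP_ROWS is a made-up in-memory stand-in for the rows on each AMP, the function name is ours, and the toy divide-by-2 formula again stands in for the real hash.

```python
# Toy one-AMP retrieval using the teaching formula (divide by 2).
# AMP_ROWS is a hypothetical in-memory stand-in for the rows stored on each AMP.

AMP_ROWS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}
HASH_MAP = [1, 2, 3, 4, 1, 2, 3, 4]  # first eight buckets of the four-AMP map

def select_by_primary_index(friend_num):
    """Return (amps_contacted, row) for WHERE Friend_Num = friend_num."""
    bucket = friend_num // 2        # rerun the same toy hash formula
    amp = HASH_MAP[bucket - 1]      # the bucket names exactly one AMP
    name = AMP_ROWS[amp].get(friend_num)
    row = (friend_num, name) if name is not None else None
    return [amp], row

amps_contacted, row = select_by_primary_index(8)
print(amps_contacted, row)
```

Only one AMP is ever contacted, no matter how many AMPs the system has. That is why Primary Index access is the fastest path to a row.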

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. On a system with hundreds or thousands of AMPs, doing so can speed up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. Because all 10 AMPs read at the same time, this process is about 10 times faster than reading the 100 rows one after another. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?", you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to ask literally any question about any data at any time. Teradata has the experience, power and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP individually reads the Best_Friends rows it owns.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num Friend_Name

6 Mary Gray

14 Kyle Marx

8 John Davis

16 Lyn Jones

2 Ben Hon

10 Don Roy

4 Joe Davis

12 Sam Mills
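Here is a rough Python sketch of the same Full Table Scan. Threads play the part of AMPs and a plain list plays the part of the PE's answer set; both are stand-ins we chose for illustration, as is the AMP_ROWS layout, which follows the toy divide-by-2 distribution.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout of Best_Friends on a four-AMP system (toy formula).
AMP_ROWS = {
    1: [(2, "Ben Hon"), (10, "Don Roy")],
    2: [(4, "Joe Davis"), (12, "Sam Mills")],
    3: [(6, "Mary Gray"), (14, "Kyle Marx")],
    4: [(8, "John Davis"), (16, "Lyn Jones")],
}

def amp_scan(amp):
    """Each AMP reads only the rows it owns."""
    return list(AMP_ROWS[amp])

def full_table_scan():
    """All AMPs scan at the same time; the PE collects what they send back."""
    with ThreadPoolExecutor(max_workers=len(AMP_ROWS)) as pool:
        per_amp = list(pool.map(amp_scan, AMP_ROWS))
    return [row for rows in per_amp for row in rows]

result = full_table_scan()
print(len(result), "rows returned")
```

Notice that the rows come back grouped by AMP rather than sorted by Friend_Num, just as in the answer set above. Without an ORDER BY, the order of the rows is simply the order in which the AMPs deliver them.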

In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It also tells the AMP to place the hash in a secondary index sub-table along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index sub-table row. Once that row is found, the real row-id (the smiley face in this example) is revealed, and Teradata can find the base row with the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.
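The two-AMP idea can be sketched in Python as follows. Please treat it as a toy: toy_hash_amp is a hash we invented for this example, so the AMP it picks for 'Ben Hon' will not necessarily be AMP 2 as in the picture, and a tuple of (base AMP, Friend_Num) stands in for the real row-id.

```python
NUM_AMPS = 4

# Base table rows per AMP, as laid out by the Primary Index (toy formula).
BASE_ROWS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}

def toy_hash_amp(value):
    """Hypothetical stand-in hash: map any value to an AMP number 1..NUM_AMPS."""
    return sum(ord(ch) for ch in str(value)) % NUM_AMPS + 1

def build_usi(base_rows):
    """Each AMP hashes Friend_Name for its rows and ships the value, with a
    pointer to the base row, to the AMP the hashed value points to."""
    subtables = {amp: {} for amp in range(1, NUM_AMPS + 1)}
    for base_amp, rows in base_rows.items():
        for friend_num, friend_name in rows.items():
            index_amp = toy_hash_amp(friend_name)
            subtables[index_amp][friend_name] = (base_amp, friend_num)
    return subtables

USI = build_usi(BASE_ROWS)

def select_by_usi(friend_name):
    """Return (amps_contacted, row) for WHERE Friend_Name = friend_name."""
    index_amp = toy_hash_amp(friend_name)     # AMP holding the sub-table row
    pointer = USI[index_amp].get(friend_name)
    if pointer is None:
        return [index_amp], None
    base_amp, friend_num = pointer            # AMP holding the base row
    return [index_amp, base_amp], (friend_num, BASE_ROWS[base_amp][friend_num])

print(select_by_usi("Ben Hon"))
```

However many AMPs the system has, the lookup touches at most two of them: the one that holds the sub-table row and the one that holds the base row.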

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of ongoing queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. Users never query the join index directly. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.

Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating The Plan. The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind that all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space

The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from Tom.
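For readers who like to see the family tree spelled out, here is a small Python sketch of the hierarchy described above. The PARENT dictionary and the function name are ours, built from the users and databases named in the text; Teradata itself keeps this information in the Data Dictionary.

```python
# Hierarchy from the example: each child maps to its immediate parent.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}

def owners_of(name):
    """Walk up the hierarchy and return every parent (owner), nearest first."""
    owners = []
    while name in PARENT:
        name = PARENT[name]
        owners.append(name)
    return owners

print(owners_of("Tom"))
```

DBC has no entry in PARENT because nothing sits above it, so the walk always stops there.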

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space,
• Spool Space, and
• Temp Space.

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see the Data Protection chapter).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, secondary index sub-tables and permanent journals. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.
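The bookkeeping can be sketched in a few lines of Python. The SpaceOwner class and the 10-gigabyte gift to Morgan are made up for illustration; the 100-gigabyte system and the roughly 80% share for SYSDBA come from the example above.

```python
class SpaceOwner:
    """A user or database that owns perm space (sizes in gigabytes)."""

    def __init__(self, name, perm_gb=0):
        self.name = name
        self.perm_gb = perm_gb

    def give_perm(self, child, amount_gb):
        """Move perm space from this owner to a child; the total never changes."""
        if amount_gb > self.perm_gb:
            raise ValueError(f"{self.name} owns only {self.perm_gb} GB")
        self.perm_gb -= amount_gb
        child.perm_gb += amount_gb

dbc = SpaceOwner("DBC", perm_gb=100)
sysdba = SpaceOwner("SYSDBA")
morgan = SpaceOwner("Morgan")

dbc.give_perm(sysdba, 80)     # SYSDBA takes about 80% of the system
sysdba.give_perm(morgan, 10)  # the 10 GB figure is invented for this example
total = dbc.perm_gb + sysdba.perm_gb + morgan.perm_gb
print(dbc.perm_gb, sysdba.perm_gb, morgan.perm_gb, total)
```

Whatever is given away is subtracted from the giver, so the three balances always add back up to the 100 gigabytes the system arrived with. All space stays owned by someone.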

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.

The following picture shows a logical view of a Customer Table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. Then, when the answer is returned, the spool is released.
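Here is a minimal Python sketch of that query building its answer in spool. Several things are assumptions made to keep the toy small: the four sample rows are borrowed from the Employee table shown later in this chapter, the spool limit is counted in rows rather than bytes, and the names run_query and SpoolLimitExceeded are invented for the example.

```python
class SpoolLimitExceeded(Exception):
    """Raised when an answer set would pass the user's spool limit."""

EMPLOYEE = [  # (Emp, Dept, Lname, Fname, Sal) sample rows kept in "perm"
    (1, 10, "Johnson", "Manny", 100000),
    (2, 20, "Carlsbad", "Jan", 100000),
    (22, 30, "Winter", "Steve", 77000),
    (25, 10, "Lester", "Bonnie", 56000),
]

def run_query(rows, predicate, spool_limit_rows):
    """Read every row, add qualifying rows to spool, abort past the limit."""
    spool = []
    for row in rows:
        if predicate(row):
            if len(spool) >= spool_limit_rows:
                raise SpoolLimitExceeded("query aborted: out of spool space")
            spool.append(row)
    answer = list(spool)
    spool.clear()  # spool is released once the answer is delivered
    return answer

dept_10 = run_query(EMPLOYEE, lambda row: row[1] == 10, spool_limit_rows=5)
print(dept_10)
```

Run the same query with spool_limit_rows=1 and it aborts on the second qualifying row, which is the toy equivalent of a user running out of spool.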

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the view's data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp Dept Lname Fname Sal

1 10 Johnson Manny 100000

2 20 Carlsbad Jan 100000

22 30 Winter Steve 77000

25 10 Lester Bonnie 56000

33 10 Samuels Todd 120000

99 20 Walter Misha 104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, security is enhanced. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp Dept Lname Fname

1 10 Johnson Manny

2 20 Carlsbad Jan

22 30 Winter Steve

25 10 Lester Bonnie

33 10 Samuels Todd

99 20 Walter Misha
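If you do not have a Teradata system handy, you can try the same idea with the SQLite engine that ships with Python. This is only an analogy: SQLite is not Teradata, it has no Data Dictionary or access rights, and the column types here are our choice. What carries over is the key point that a view is a stored definition, not a second copy of the data.

```python
import sqlite3

# SQLite stands in for Teradata here; only the view concept carries over.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)
# The view stores only its definition; no employee rows are copied.
conn.execute(
    "CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

cursor = conn.execute("SELECT * FROM Employ_V")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(len(view_rows), "rows returned")
```

The salary column never appears in the view's column list, so a user who can only reach Employ_V has no way to ask for it.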

What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS
(
SELECT * FROM Employ_v WHERE dept = 10;
SELECT * FROM Employ_v WHERE dept = 20;
SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
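The "either they all work or none of them work" rule is easy to try out with Python's built-in SQLite engine. Keep in mind this is an analogy and not Teradata: SQLite has no macros, so the hypothetical execute_macro function below simply runs a list of statements inside one transaction, and BROKEN_MAC is a deliberately faulty example we invented to show the rollback.

```python
import sqlite3

def execute_macro(conn, statements):
    """Run all statements as one transaction: either all work or none do."""
    try:
        with conn:  # commits if every statement works, rolls back otherwise
            return [conn.execute(sql).fetchall() for sql in statements]
    except sqlite3.Error:
        return None

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, 10, "Johnson", "Manny"), (2, 20, "Carlsbad", "Jan"),
     (25, 10, "Lester", "Bonnie")],
)
conn.execute("CREATE VIEW Employ_v AS SELECT Emp, Dept, Lname, Fname FROM Employee")
conn.commit()

EMP_MAC = [
    "SELECT * FROM Employ_v WHERE Dept = 10",
    "SELECT * FROM Employ_v WHERE Dept = 20",
    "SELECT * FROM Employ_v ORDER BY Lname",
]
reports = execute_macro(conn, EMP_MAC)

# A hypothetical macro whose second statement fails: the UPDATE is undone.
BROKEN_MAC = [
    "UPDATE Employee SET Dept = 99 WHERE Emp = 1",
    "SELECT * FROM No_Such_Table",
]
broken_result = execute_macro(conn, BROKEN_MAC)
dept_after = conn.execute("SELECT Dept FROM Employee WHERE Emp = 1").fetchone()[0]
print(len(reports), broken_result, dept_after)
```

The good macro returns three answer sets, one per report. The broken one returns nothing at all, and employee 1 is still in department 10 afterwards, because the failed statement took the earlier UPDATE down with it.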

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data without the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on everyone listed below him. MRKT has explicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example if Tom or Morgan decides to give certain privileges to Mary either person could EXPLICITLYgive her those permissions

In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
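The ownership rules above can be sketched in a few lines of Python. This is only an illustration: the hierarchy below is an assumption pieced together from the names in the text (the picture itself is not reproduced here), and the `grant` helper is hypothetical and much simpler than Teradata's real GRANT handling.

```python
# Assumed hierarchy, child -> parent, using the names from the chapter.
PARENT = {"SYSDBA": "DBC", "MRKT": "SYSDBA", "Mary": "MRKT",
          "Morgan": "SYSDBA", "Tom": "Morgan"}

def owners(name):
    """Return every parent above `name`, nearest first."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain

def has_implicit_rights(owner, target):
    """An owner holds implicit rights on anything below it in the hierarchy."""
    return owner in owners(target)

explicit = set()  # (grantee, privilege, object) triples granted explicitly

def grant(grantor, grantee, privilege, obj):
    """Record an explicit right; the grantor must be the object or own it."""
    if grantor != obj and not has_implicit_rights(grantor, obj):
        raise PermissionError(f"{grantor} cannot grant rights on {obj}")
    explicit.add((grantee, privilege, obj))

grant("Tom", "Mary", "CREATE TABLE", "Tom")     # Tom grants on his own database
grant("Morgan", "Mary", "CREATE TABLE", "Tom")  # Morgan owns Tom, so this works too
print(owners("Tom"))
```

Walking up the `PARENT` chain is what makes the rights "implicit": nothing has to be recorded for an owner, while every explicit grant is stored as its own entry.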

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract: it is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or disk goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupt. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message;
• The entire transaction is rolled back and any changes made to the database are reversed;
• Locks are released; and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it, and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness." If it is clean, then things went well.
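The before-image idea can be sketched as a toy model. The `Amp` class below is hypothetical and ignores locks, spool, and the PE; it only shows how saving a BEFORE picture on each change makes rollback possible, and how COMMIT simply wipes the journal clean.

```python
class Amp:
    """Toy AMP: a dict of rows plus a transient journal of before-images."""

    _MISSING = object()  # marks "row did not exist before the change"

    def __init__(self):
        self.rows = {}
        self.journal = []  # list of (key, before_image)

    def write(self, key, value):
        # Take the BEFORE picture, then change the row (INSERT or UPDATE).
        self.journal.append((key, self.rows.get(key, self._MISSING)))
        self.rows[key] = value

    def delete(self, key):
        self.journal.append((key, self.rows.get(key, self._MISSING)))
        self.rows.pop(key, None)

    def rollback(self):
        # Undo newest-first so each row ends at its oldest before-image.
        while self.journal:
            key, before = self.journal.pop()
            if before is self._MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before

    def commit(self):
        # Success: the journal is wiped clean.
        self.journal.clear()


amp = Amp()
amp.write("emp1", 50000)
amp.commit()
amp.write("emp1", 100000)  # the accidental raise
amp.write("emp2", 40000)
amp.rollback()             # transaction fails, rows revert
print(amp.rows)            # {'emp1': 50000}
```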

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2,000 AMPs. Therefore, the chance of losing two AMPs in a 2,000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
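A rough sketch of the two hash maps, under loud assumptions: `zlib.crc32` stands in for Teradata's hashing algorithm, 64 buckets stand in for the real hash map, and the rule for picking the fallback AMP is invented for illustration. What the sketch does preserve from the text is that the fallback copy always lands on a different AMP inside the same cluster.

```python
import zlib

NUM_AMPS = 8
CLUSTER_SIZE = 4
NUM_BUCKETS = 64  # assumption; chosen small for readability

# Primary hash map: bucket -> AMP that hosts the base row.
primary_map = [b % NUM_AMPS for b in range(NUM_BUCKETS)]

def fallback_amp(amp, bucket):
    """Pick a different AMP inside the same cluster for the fallback copy."""
    start = (amp // CLUSTER_SIZE) * CLUSTER_SIZE    # first AMP of the cluster
    offset = 1 + bucket % (CLUSTER_SIZE - 1)        # 1..CLUSTER_SIZE-1, never 0
    return start + (amp - start + offset) % CLUSTER_SIZE

# Fallback hash map: bucket -> AMP that hosts the FALLBACK row.
fallback_map = [fallback_amp(primary_map[b], b) for b in range(NUM_BUCKETS)]

def place(primary_index_value):
    """Hash the Primary Index value to a bucket, then look up both AMPs."""
    bucket = zlib.crc32(str(primary_index_value).encode()) % NUM_BUCKETS
    return primary_map[bucket], fallback_map[bucket]

def readable(primary_index_value, down_amps):
    """A row stays reachable while its base AMP or its fallback AMP is up."""
    base, fallback = place(primary_index_value)
    return base not in down_amps or fallback not in down_amps

# One AMP down in each cluster (AMP 0 and AMP 5): every row is still reachable.
print(all(readable(k, {0, 5}) for k in range(1000)))
```

Because both copies of a row live in one cluster, two failures only hurt when they hit the same cluster, which is exactly the case clustering is meant to make unlikely.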

In most systems, AMPs are clustered in a group of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while an AMP is down.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
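The trade-off can be put into numbers with a small back-of-the-envelope calculation. It assumes, as a simplification, that the down AMP's work is spread evenly over the surviving AMPs in its cluster; `workload_factor` is a hypothetical helper name.

```python
def workload_factor(cluster_size):
    """Relative work per surviving AMP when one AMP in the cluster is down."""
    return cluster_size / (cluster_size - 1)

for size in (2, 4, 16):
    # size - 1 is how many other AMPs could fail and take the cluster down.
    print(size, round(workload_factor(size), 2), size - 1)
```

With a cluster of two, the survivor does 2.0 times its normal work; with four, each survivor does about 1.33 times; with 16, about 1.07 times. The last column grows in the other direction: the more AMPs in a cluster, the more candidates there are for a second, fatal failure.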

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues with all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.
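The catch-up behavior can be modeled in a few lines. The `Cluster` class is a hypothetical toy: it keeps one shared journal for the cluster rather than one per surviving AMP, and it leaves out the FALLBACK copies themselves, so it shows only the record-and-replay idea.

```python
class Cluster:
    """Toy cluster that journals changes aimed at a down AMP."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}  # amp -> {key: value}
        self.down = None
        self.darj = []  # changes the down AMP has missed

    def fail(self, amp):
        self.down = amp
        self.darj = []  # the journal opens when the AMP goes down

    def write(self, amp, key, value):
        if amp == self.down:
            self.darj.append((key, value))  # remember it for later
        else:
            self.rows[amp][key] = value

    def recover(self):
        for key, value in self.darj:  # replay in the original order
            self.rows[self.down][key] = value
        self.down, self.darj = None, []  # then the journal is discarded


cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "Ben Hon", "v1")
cluster.fail(1)
cluster.write(1, "Ben Hon", "v2")  # happens while AMP 1 is "sleeping"
cluster.write(1, "Don Roy", "v1")
cluster.recover()
print(cluster.rows[1])
```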

In the previous picture, there are two clusters, but notice that AMP one has failed. After failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). Also, none of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within the cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel Processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
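The roll-forward scenario can be sketched as follows. The table contents, day numbers, and the `roll_forward` helper are all invented for illustration; the point is only that a restore starts from the full backup and then reapplies each AFTER image in the order it was captured.

```python
def roll_forward(backup, after_journal):
    """Rebuild a table from a full backup plus AFTER images taken since then."""
    table = dict(backup)  # start from the restored backup copy
    for day, key, after_image in sorted(after_journal, key=lambda e: e[0]):
        table[key] = after_image  # reapply each change in day order
    return table

# Full system backup taken on day 1 (hypothetical rows).
backup_day1 = {101: ("Smith", "West"), 102: ("Jones", "East")}

# AFTER images captured between the backup and the failure on day 5.
journal = [
    (4, 101, ("Smith", "Southwest")),  # existing row updated on day 4
    (2, 103, ("Lee", "South")),        # new row inserted on day 2
]

restored = roll_forward(backup_day1, journal)
print(restored)
```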

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
    (emp INTEGER,
     dept INTEGER,
     lname CHAR(20),
     fname VARCHAR(20),
     salary DECIMAL(10,2),
     hire_date DATE FORMAT)
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line, only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE, or READ locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
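The four modes above boil down to a small compatibility table. The sketch below encodes the rules as the list describes them; the `COMPATIBLE` dictionary and `can_grant` helper are hypothetical names, and queueing order is left out.

```python
# For each requested lock: the set of already-held locks it can coexist with.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},  # blocked only by EXCLUSIVE
    "READ":      {"ACCESS", "READ"},           # many readers at once
    "WRITE":     {"ACCESS"},                   # only dirty readers may coexist
    "EXCLUSIVE": set(),                        # nothing else allowed
}

def can_grant(requested, held):
    """True if `requested` is compatible with every lock currently held."""
    return all(h in COMPATIBLE[requested] for h in held)

print(can_grant("READ", ["READ"] * 1000))   # a thousand readers are fine
print(can_grant("WRITE", ["READ"]))         # a writer must wait for readers
print(can_grant("ACCESS", ["WRITE"]))       # a dirty read slips past a writer
print(can_grant("ACCESS", ["EXCLUSIVE"]))   # but not past EXCLUSIVE
```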

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the table that refers to it.
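Both halves of that rule can be seen in a small runnable sketch. It uses SQLite through Python's standard `sqlite3` module purely as a stand-in, since the rule itself is relational rather than Teradata-specific; the table layout follows the employee and payroll story above, and the column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces RI only when asked
conn.execute("CREATE TABLE employee (emp INTEGER PRIMARY KEY, lname TEXT)")
conn.execute("""CREATE TABLE payroll (
                    emp INTEGER REFERENCES employee(emp),
                    salary INTEGER)""")
conn.execute("INSERT INTO employee VALUES (1, 'Jones')")
conn.execute("INSERT INTO payroll VALUES (1, 50000)")

def attempt(sql):
    """Run one statement and report whether RI allowed it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    attempt("INSERT INTO payroll VALUES (2, 40000)"),  # there is no employee 2
    attempt("DELETE FROM employee WHERE emp = 1"),     # payroll row still exists
    attempt("DELETE FROM payroll WHERE emp = 1"),      # remove the reference first
    attempt("DELETE FROM employee WHERE emp = 1"),     # now the delete is allowed
]
print(results)  # ['rejected', 'rejected', 'ok', 'ok']
```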

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
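The three steps in that paragraph can be sketched as a toy simulation. Everything here is an assumption made for illustration: `zlib.crc32` stands in for Teradata's hashing algorithm, a block holds three rows instead of 64K, the AMPs run one after another rather than in parallel, and the sort uses the row hash as a simplified Row ID.

```python
import zlib

NUM_AMPS = 4
BLOCK_ROWS = 3  # stand-in for a 64K block

def row_hash(primary_index_value):
    # crc32 is only a stand-in for the real hashing algorithm.
    return zlib.crc32(str(primary_index_value).encode())

def fastload(rows):
    """rows: list of (primary_index_value, payload) tuples from a flat file."""
    # Step 1: hand blocks of rows to the AMPs in turn, without hashing yet.
    received = [[] for _ in range(NUM_AMPS)]
    for i in range(0, len(rows), BLOCK_ROWS):
        received[(i // BLOCK_ROWS) % NUM_AMPS].extend(rows[i:i + BLOCK_ROWS])

    # Step 2: each AMP hashes its rows and sends each one to the owning AMP.
    owned = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for pi, payload in amp_rows:
            h = row_hash(pi)
            owned[h % NUM_AMPS].append((h, pi, payload))

    # Step 3: each AMP sorts the rows it now owns by row hash.
    return [sorted(amp_rows) for amp_rows in owned]

flat_file = [(i, f"row{i}") for i in range(20)]
for amp, amp_rows in enumerate(fastload(flat_file)):
    print(amp, [pi for _, pi, _ in amp_rows])
```

Splitting the work this way means the initial transfer does not wait on hashing: blocks go to whichever AMP is next, and the redistribution to the correct AMP happens afterward.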

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind that many systems have hundreds to thousands of AMPs. The load takes place continually, in parallel, as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance – Only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


A normalized model is one that should be used for the central data warehouse. It allows users to ask any question, at any time, on information from any place within the enterprise. This is the central philosophy of a data warehouse. It leads to the power of ad-hoc queries and data mining, whereby advanced tools discover relationships that are not easily detected but do exist naturally in the business environment.

A Star-Schema model enhances performance on known queries because we build our assumptions into the model. While these assumptions may be correct for the first application, they may not be correct for others. Flexibility is a big issue, but data marts can be dropped and added with relative ease if each is built directly from the detail data.

Remember: build the data warehouse around detail data using a normalized model. Then, as query patterns emerge and performance for well-known queries becomes a priority, Star-Schema data marts can be created by extracting summarized or departmental data from the centralized data warehouse. The user will then have access to both the data marts for repetitive queries and the central warehouse for other queries.

Because data marts can be an administrative nightmare, Teradata enables Star-Schema access without requiring physical data marts. By setting up a join index as the intersection of your Star-Schema model, you can create a Star-Schema structure directly from your 3rd Normal Form data model. Best of all, once it is created, the data is automatically maintained as the underlying tables are updated.

Keep in mind, 80% of data warehouse queries are repetitive, but 80% of the Return On Investment (ROI) is actually provided by the other 20% of the queries that go against detailed data in an iterative environment. By using a normalized model for your central data warehouse and a Star-Schema model on data marts, you can enhance the possibility of realizing an 80% Return on Investment and still enhance the performance on 80% of your queries.

Rule 8 - Don't Let a Technical Issue Make Your Data Warehouse a Failure Statistic

"Experience is a hard teacher because she gives the test first, the lesson afterwards."

Scottish Proverb

Did you know that 3/4ths of the people in the world hate fractions, and that 40% of the time a data warehouse fails, it is because of a technical issue? There are many traps and pitfalls in every data warehouse venture. One winter day, a hunter met a bear in the forest. The bear said, "I'm hungry. I want a full stomach." The man replied, "Well, I'm cold. I would like a fur coat." "Let's compromise," said the bear, and he quickly gobbled up the hunter. They both got what they asked for. The bear went away with a full belly, and the man left wrapped in a fur coat. With that in mind: good judgment comes from experience; experience comes from bad judgment. You have shown good judgment by reading this book, so let our experience keep your company from having a bad data warehouse experience.

Author Daniel Boorstin wrote in The Discoverers, "The greatest obstacle to discovering the shape of the earth, the continents, and the oceans was not ignorance, but rather the illusion of knowledge." There is a lot of illusion of knowledge being spread around in the data-warehousing environment. Before you decide on any data warehouse product, ask yourself and the vendor these questions:

• As my data demands increase, will the system be able to physically load the data? Our experience shows that many systems are not capable of handling very large volumes of data. Do the math.
• As the data grows in volume, can the system meet the performance requirements? Do the math.
• As the number of users grows, will the system be able to scale? Do the math.
• As my environment changes, will the system be flexible enough to allow changes quickly and easily? Do the math.
• Will the system need so many Database Administrators (DBAs) that my system's cost skyrockets? Do the math.
• If we suddenly merged with another company and needed to incorporate into their mainframe or LAN environment, would the system be able to connect and include them? Do the math.
• Can I continue to meet my batch window timeframes? Do the math.
• Could I become the hero of the company one day, only to have some technical glitch blamed on me because of my poor foresight, and be thrown out of the company into a giant mud puddle? Do the bath.

Rule 9 - Take a Building Block Approach

"Be not afraid of growing slowly; be afraid only of standing still."

Chinese Proverb

Ever since Vasco de Balboa discovered the Pacific coast of Panama in 1513, kings and businessmen alike dreamed of the impossible: to cut a waterway across the mountainous isthmus, creating a shortcut between the Atlantic and Pacific Oceans. Those dreams turned into reality during the Industrial Revolution. It took almost forty years of trial and error before the world's greatest engineering feat since the Pyramids was completed in 1914. Ships move through the locks of the canal, rising 85 feet above sea level before they descend to the opposite side. Since its grand opening in 1920, the Panama Canal has revolutionized trans-oceanic traffic, joining East and West. Its 50-mile stretch saves every vessel about 8,000 extra nautical miles of travel around the bottom tip of South America. Several modifications have been engineered through the years to accommodate the increasing size of ships.

Data warehouses, like the Panama Canal, must be built over time and changed over time to meet new demands. A data warehouse must grow with the environment, but the environment is unpredictable. All sailors know that they can't direct the wind, but that they can adjust their sails. In comparison, all data warehouse users know they can't direct the environment, but they can adjust their warehouse. Sometimes the data warehouse will grow quickly and sometimes it will grow slowly, but it should always be growing.

So take a building block approach to data warehousing. Teradata allows you to expand without boundaries, one building block at a time. Plus, adding on building blocks is easy.

There are two aspects to a building block approach. First, you need to add applications to your data warehouse in three- to six-month intervals. Once the first application works, then you are ready for more projects. As you become more experienced with this approach, you can add multiple projects in parallel by involving multiple organizations.

The second aspect of the building block approach is in the actual data warehouse architecture. It doesn't matter if yours is the smallest data warehouse in the world, the largest, or falls somewhere in between: power and scalability always fuel success.

Not long ago, a customer flew out to San Diego for a Teradata demonstration and benchmark. The benchmark ran late into the evening, but the numbers were more than 50% better than the competition. The customer was extremely impressed, but before buying, he demanded to see the system scalability that everyone had been talking about. Although it was already late, a Teradata employee was called in the middle of the night, arrived within 10 minutes (in pajamas), hooked up the building blocks, and ran a utility called config. She ran another called reconfig, and in less than two hours the system size doubled.


As the environment changes in terms of users, data, complexity, capacity, batch windows, time changes, events, or opportunities, users should be able to continue building applications and architecture. The more a Teradata system grows, the more Teradata outshines the competition.

Rule 10 - Buy a Teradata Data Warehouse

"Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing had happened."

Winston Churchill

Winston Churchill led Britain through World War II, during what he called that country's "finest hour." When users see consistent data, the system, too, is in its finest hour. Teradata gives users the ability to ask questions they could never ask before. Users trust Teradata because of its industry performance and reputation, and because it never gives in. Constant use gives users optimal business experience, and no matter what a user asks, the system responds with a hearty "Yes, Sir!"

When we explained Teradata to Churchill, he said:

"A data WARe-house that consists of 250 Data marts is like poison, and if I were the MIS department responsible for maintaining them, I'd take it."

Teradata guarantees an Enterprise Data Warehouse with no scalability issues. Data loads like lightning, and system administration is a breeze. You can pick the performance level that meets your requirements for today and forever. The database can be normalized around detail data, and because of Teradata's power, users have the flexibility to ask any question, at any time, on any data.

All other databases are suspect in data loading capabilities, scalability, reference sites, decades of data warehouse experience, flexibility, system administration difficulties, and inability to handle the complex queries of today's users. These users are good.

Teradata: The Shining Star

Overview

Teradata has always been at the top of the data warehouse game, even if the experts weren't bright enough to know it. The incredible vision that the original designers had was tremendous. It was so far to the left of genius that most thought the idea was impossible.

"Only he who attempts the ridiculous may achieve the impossible."

Don Quixote

The Teradata database was originally designed in 1976, and many of the fundamental concepts still remain today. Nearly 25 years later, Teradata is still considered ahead of its time.

In 1976, IBM mainframes dominated the computer business. Everyone who was anyone had an IBM Mainframe. However, the original founders of Teradata noticed that it took about 4½ years for IBM to produce a new mainframe. They also noticed a little company called Intel. Intel created a new PC chip every 2½ years. With mainframes moving forward every 4½ years and PC chips every 2½ years, Teradata recognized their vision: to network enough PC chips together that the mainframe would be overpowered, yet costs would be hundreds of times cheaper than a mainframe. The Teradata team estimated the power surge would come in 1990.

IBM laughed out loud. They said, "Let's get this straight... you are going to network a bunch of PC chips together and overpower our mainframes? That's like plowing a field with 1,000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as just a bunch of PCs in a cabinet.

Teradata was convinced it could produce a product that would power large amounts of data and achieve the impossible: using PC technology in mainframe territory. Its founders agreed with Napoleon Bonaparte, who asserted, "The word 'impossible' is not in my dictionary." Sure enough, when we looked in his dictionary, that word was not there. And it is not in Teradata's Data Dictionary either. The Teradata team set two goals: build a database that could

• Perform parallel processing, and
• Accommodate a Terabyte of data

Driving in the car one evening, Morgan's eight-year-old daughter Kara piped up from the back seat: "Daddy, can you buy Teradata in the store? I mean, what does Teradata really do?" Morgan thought for a moment and then replied, "Do you remember when you went on the Easter egg hunt last spring? Well, imagine that we had fifty eggs and you were the only child there. If I asked you to find all the purple eggs, would you be able to do that?" Kara said, "Sure! But it might take me a long time." Morgan continued, "What if we now let fifty children go in, and I asked them to show me all of the purple eggs? How long would that take?" His daughter responded, "It wouldn't take any time at all, because each child would only have to look at one egg." That is precisely how Teradata works. It divides up huge tasks among its processors and tackles each portion simultaneously, with amazing speed. And it doesn't matter if you have a trillion eggs in your basket.

In 1984, the DBC/1012 was introduced. Since then, Teradata has been the dominant force in data warehousing. Teradata got the chickens plowing and is considered outstanding. Meanwhile, IBM's plow is out rusting in its field.

Parallel Processing

"An invasion of armies can be resisted, but not an idea whose time has come."

Victor Hugo

The idea of parallel processing gives Teradata the ability to have unlimited users, unlimited power, and unlimited scalability. This is an idea whose time has come. And it all starts with something called parallel processing. So what is parallel processing? Let us explain.

It was 10 p.m. on a Saturday night, and two friends were having dinner and drinks. One of the friends looked at his watch and said, "I have to get going." The other friend responded, "What's the hurry?" His friend went on to tell him that he had to leave to do his laundry at the Laundromat. The other friend could not believe his ears. He responded, "What? You're leaving to do your laundry on a Saturday night? Do it tomorrow!" His buddy went on to explain that there were only 10 washing machines at the laundry. "If I wait until tomorrow, it will be crowded, and I will be lucky to get one washing machine. I have 10 loads of laundry, so I will be there all day. If I go now, there will be nobody there, and I can do all 10 loads at the same time. I'll be done in less than an hour and a half."


This story describes what we call "Parallel Processing." Teradata is the only database in the world that loads data, backs up data, and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds... even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every database's clock in performance tests.

"After enlightenment, the laundry."

Zen Proverb

"After parallel processing the laundry, enlightenment!"

Teradata Zen Proverb

With the computer world seeing Terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, this is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata has unlimited power and grows without boundaries, and was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

"A ship in harbor is safe, but that's not why ships are built."

John Shedd

In 1805, the pivotal Battle of Trafalgar matched Britain's flotilla of battle ships against the almighty Spanish Armada. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to sail circles around the Armada, attacking them at their most vulnerable point: the stern. That battle paralyzed the Armada and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating them on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive, and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed, managing terabytes of data.

A Personal Computer (PC) is made up of the following components:


Processor Chip – This is the brain of the computer. All tasks are done at the direction of the processor.

Memory – This is the hand of the computer. The memory allows data to be viewed, manipulated, changed, or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive – This is the spine of the computer. The hard drive stores data, applications, and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor executes your request onto the document while it is still being displayed in memory. Upon completion, you close the document, and the processor writes all the changes to the disk where it is stored.

In the picture below, we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight best friends.

Teradata Spreads Data over Multiple Processors

"I don't mind starting the season with unknowns. I just don't like finishing the season with them."

Coach Lou Holtz

With Teradata, you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The Teradata example on the next page shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask the system for a list of best friends, both processors will retrieve their data in parallel and will return combined results over the connecting network. Returns for this example could easily double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four records, and simultaneously the other processor reads the remaining four records. So how could we double the speed of this system again?

Teradata has Linear Scalability

"Every ceiling, when reached, becomes a floor upon which one walks and now can see a new ceiling."

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." This allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck of data.

Notice in the system below there are four processors, and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since data is spread evenly over four processors, Teradata reads two rows simultaneously across four processors. Now the system is four times faster.
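The arithmetic behind the eight-row example can be written out as a short sketch. It assumes an idealized model in which read time is proportional to the number of rows a processor holds and there is no coordination overhead; the function names are made up for illustration.

```python
import math

def rows_per_processor(total_rows, num_processors):
    """Rows each processor must read when the table is spread evenly."""
    return math.ceil(total_rows / num_processors)

def speedup(total_rows, num_processors):
    """Speedup versus one processor reading every row (idealized model)."""
    return total_rows / rows_per_processor(total_rows, num_processors)

if __name__ == "__main__":
    # The Best_Friends table from the text has eight rows.
    for processors in (1, 2, 4):
        print(processors, "processor(s):",
              rows_per_processor(8, processors), "rows each,",
              speedup(8, processors), "times as fast")
```

With one, two, and four processors, each processor reads eight, four, and two rows respectively, which is where the "twice as fast" and "four times faster" figures in the text come from.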

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the AMPs and processed in parallel.


A Logical View of the Teradata Architecture

"You are either making history or you are history."

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem, and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when they actually never had a chance to succeed. In ancient days, Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies, too, will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and then is given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain information on their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.


Parsing Engine (PE)

"Even a stopped clock is right twice a day."

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes, it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards!" Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check users' security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and SouthWestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions, some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions, and Teradata returns the answers.

Access Module Processor (AMP)

"Wise men talk because they have something to say; fools talk because they have to say something."

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above: "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of little words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best example is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it's the only AMP allowed to read or write data to that disk. This is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk, and it shares this with no other AMP; hence, a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP reads only the rows on its own disk. If this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.
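The "slowest AMP" remark can be made concrete with a tiny model. This sketch assumes that every AMP reads its own rows at the same fixed rate and that all AMPs start together, so the query finishes when the busiest AMP finishes; the function name and the one-second-per-row rate are illustrative only.

```python
def query_time(rows_per_amp, seconds_per_row=1.0):
    """Elapsed time when all AMPs read their own rows at the same moment.

    The query is done only when the AMP holding the most rows is done.
    """
    return max(rows_per_amp) * seconds_per_row

if __name__ == "__main__":
    even = [2, 2, 2, 2]      # eight rows spread evenly over four AMPs
    skewed = [5, 1, 1, 1]    # the same eight rows, unevenly spread
    print("even spread:  ", query_time(even), "seconds")
    print("skewed spread:", query_time(skewed), "seconds")
```

Both layouts hold eight rows, yet the uneven one takes two and a half times as long in this model, which is why an even spread of rows matters.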

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs, while others have more than 2,000.

The BYNET

"Even if you're on the right track, you'll still get run over if you just sit there."

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET, or over both.
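As a rough picture of the two-telephone-line idea, the sketch below alternates messages across both BYNETs while both are healthy and falls back to the survivor if one fails. The DualBynet class, its round-robin policy, and its method names are purely hypothetical; the text does not describe how traffic is actually balanced.

```python
from itertools import count

class DualBynet:
    """Toy model of BYNET 0 and BYNET 1 (hypothetical routing policy)."""

    def __init__(self):
        self.up = {0: True, 1: True}
        self._sent = count()

    def fail(self, bynet_id):
        """Mark one BYNET as unavailable."""
        self.up[bynet_id] = False

    def route(self):
        """Pick the BYNET that carries the next message."""
        healthy = [b for b in (0, 1) if self.up[b]]
        if not healthy:
            raise RuntimeError("no BYNET available")
        # Alternate between healthy BYNETs; with one left, always use it.
        return healthy[next(self._sent) % len(healthy)]

if __name__ == "__main__":
    net = DualBynet()
    print([net.route() for _ in range(4)])  # both lines share the traffic
    net.fail(0)
    print([net.route() for _ in range(3)])  # only BYNET 1 remains
```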

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax
• The PE checks the user's security rights
• The PE comes up with a plan for the AMPs to follow
• The PE passes the plan along to the AMPs over the BYNET
• The AMPs follow the plan and retrieve the data requested
• The AMPs pass the data to the PE over the BYNET, and
• The PE then passes the final data to the user
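The steps above can be sketched as a toy program. Everything here is a simplification under stated assumptions: the Amp and ParsingEngine classes are hypothetical, the "syntax check" is only a stand-in, the BYNET is reduced to ordinary method calls, and the AMPs are visited one after another rather than truly in parallel.

```python
class Amp:
    """An AMP that reads only from its own disk (shared-nothing)."""

    def __init__(self, disk):
        self.disk = disk  # {table_name: [rows]} held by this AMP alone

    def execute(self, plan):
        rows = self.disk.get(plan["table"], [])
        return [row for row in rows if plan["predicate"](row)]

class ParsingEngine:
    def __init__(self, amps, access_rights):
        self.amps = amps
        self.access_rights = access_rights  # {user: set of table names}

    def run(self, user, table, predicate):
        # Step 1: stand-in for the SQL syntax check.
        if not callable(predicate):
            raise ValueError("request is not well formed")
        # Step 2: security rights check.
        if table not in self.access_rights.get(user, set()):
            raise PermissionError("%s may not access %s" % (user, table))
        # Step 3: build a plan for the AMPs to follow.
        plan = {"table": table, "predicate": predicate}
        # Steps 4-6: hand the plan to every AMP and collect what they return.
        partial_results = [amp.execute(plan) for amp in self.amps]
        # Step 7: return the combined rows to the user.
        return [row for part in partial_results for row in part]

if __name__ == "__main__":
    amps = [Amp({"Best_Friends": ["Ann", "Bob"]}),
            Amp({"Best_Friends": ["Cy", "Dee"]})]
    pe = ParsingEngine(amps, {"kara": {"Best_Friends"}})
    print(pe.run("kara", "Best_Friends", lambda name: True))
```

A user without rights on Best_Friends gets a PermissionError before any AMP is asked to do work, which mirrors the order of the first two steps.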

Teradata Building Block Approach

"Better a diamond with a flaw than a pebble without one."

Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

Nearly everyone takes the limits of his own vision for the limits of the world A few do not Join them

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 2358

Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen You know the one were talkingabout hellip the one next to the silverware drawer This drawer may often contain old washer and dryer warrantiesmatches half-used flashlight batteries straws odd nuts bolts and washers corncob holders etc Fortunatelythe dresser drawers in your bedroom are typically much more organized In fact you probably store your clothing in those drawers much more neatly so you can get to what you need quickly

Relational databases store data much like we organize our dresser drawers Just as you might put all of your t-shirts in one drawer and your socks in another the database will store data about one topic in one table and datathat pertains to another topic is kept in another table For example a database might contain a CustomerTablecontaining items to track such as customer number CustomerName city and order number Another table theOrderTable might hold data like Order Number Order Date CustomerName Item No and Quantity

An example of each table follows.

CUSTOMER TABLE called CustomerTable

CustomerID (PK)  CustomerName  CityName  Order Number (FK)  Customer Rep
1001             JC Penney     Dallas    105372             Dreyer
1002             Office Depot  Columbia  105799             Crocker
1003             Dillards      Atlanta   106227             Smith

ORDER TABLE called OrderTable

Order Number (PK)  Order Date  Item No  Quantity  Customer ID (FK)
105372             03/07/2001  212      20        1001
105799             04/18/2001  296      52        1002
106227             10/17/2001  325      17        1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest value that can be inserted into a table. A column is the smallest value within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null or are sometimes represented by a "?".

One column, or combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.
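As a rough sketch of such a join (it is not part of the original examples), the query below matches the primary key of the CustomerTable to the foreign key in the OrderTable. The underscore column names Customer_ID, Order_Number, and Order_Date are assumptions, since the sample tables show those headings with spaces.

```sql
-- Join the two sample tables on their common key.
-- Assumed column names: Customer_ID, Order_Number, Order_Date.
SELECT CustomerTable.CustomerID
      ,CustomerTable.CustomerName
      ,OrderTable.Order_Number
      ,OrderTable.Order_Date
      ,OrderTable.Quantity
FROM CustomerTable
INNER JOIN OrderTable
   ON CustomerTable.CustomerID = OrderTable.Customer_ID  -- PK matched to FK
ORDER BY CustomerTable.CustomerID;
```

With the three sample customers and three sample orders, each customer would be expected to pair with exactly one order.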

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                        FOREIGN KEY
Not optional                       Optional
Comprised of one or more columns   Comprised of one or more columns
Can only have one PK per table     Can have multiple FKs per table
No duplicates allowed              Duplicates allowed
No changes allowed                 Changes allowed
No nulls allowed                   Nulls allowed

Teradata Spreads the Data Evenly Across the AMPs

"A chain is only as strong as its weakest link."

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, then no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind, the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP 1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the rows of the Customer table and the rows of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.
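If you would like to see how evenly a table is spread, one common approach is to count rows per AMP. The sketch below uses the Teradata hash functions HASHROW, HASHBUCKET, and HASHAMP against the sample CustomerTable. It assumes that CustomerID is the column the rows are distributed on (the Primary Index, covered in the next section).

```sql
-- Count how many CustomerTable rows land on each AMP.
-- Assumption: CustomerID is the column used to distribute the rows.
SELECT HASHAMP(HASHBUCKET(HASHROW(CustomerID))) AS amp_no
      ,COUNT(*)                                 AS row_count
FROM CustomerTable
GROUP BY 1
ORDER BY 1;
```

Row counts that are roughly equal across the AMPs suggest an even spread. One AMP with far more rows than the others would be the weakest link.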

Primary Indexes

"Every road has two directions."

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.

There are two types of Primary Indexes

"A man who chases two rabbits catches none."

Roman Proverb

"A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR."

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

CREATE TABLE employee
(emp       INTEGER
,dept      INTEGER
,lname     CHAR(20)
,fname     VARCHAR(20)
,salary    DECIMAL(10,2)
,hire_date DATE
)
UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON.

CREATE TABLE TomC.employee
(emp       INTEGER
,dept      INTEGER
,lname     CHAR(20)
,fname     VARCHAR(20)
,salary    DECIMAL(10,2)
,hire_date DATE
)
PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp       INTEGER
,dept      INTEGER
,lname     CHAR(20)
,fname     VARCHAR(20)
,salary    DECIMAL(10,2)
,hire_date DATE
)
UNIQUE PRIMARY INDEX(emp, dept);

"Being related hardly insures relatability."

Michael E. Angier

All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

"The map is not the territory."

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once, a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad "buckets." But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing - the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture below shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8, and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8

How the Hash Map and Primary Index Work Together

"Choice, not chance, determines destiny."

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the Wiz-Bang Formula (a secret NCR formula). First, we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even-tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones
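For readers who want to follow along on a system of their own, here is one way the Best_Friends table might be defined and loaded. This is only a sketch: the data types are assumptions, since the text names the columns and the Unique Primary Index but not the types.

```sql
-- Hypothetical definition of the sample table; the data types are assumed.
CREATE TABLE Best_Friends
(Friend_Num  INTEGER NOT NULL
,Friend_Name VARCHAR(20)
)
UNIQUE PRIMARY INDEX (Friend_Num);

-- Load the eight even-tempered friends.
INSERT INTO Best_Friends VALUES (2,  'Ben Hon');
INSERT INTO Best_Friends VALUES (4,  'Joe Davis');
INSERT INTO Best_Friends VALUES (6,  'Mary Gray');
INSERT INTO Best_Friends VALUES (8,  'John Davis');
INSERT INTO Best_Friends VALUES (10, 'Don Roy');
INSERT INTO Best_Friends VALUES (12, 'Sam Mills');
INSERT INTO Best_Friends VALUES (14, 'Kyle Marx');
INSERT INTO Best_Friends VALUES (16, 'Lyn Jones');
```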

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we will get the Primary Index value of the row, run it through the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the AMP number in which the row will reside. Here is our first row:

Friend_Num  Friend_Name
2           Ben Hon

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (2) through the Coffing/Jones Wiz-Bang Formula (divide by 2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num  Friend_Name
16          Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (16) through the Coffing/Jones Wiz-Bang Formula (divide by 2), and the answer is:

16 divided by 2 = 8

Thus the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret. The Coffing/Jones Wiz-Bang Formula did not crack the code. The purpose is to show you how the hash map works in theory to distribute and locate rows. Simply, you should understand that the formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and it will be consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.
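To see the whole layout at once, the Coffing/Jones Wiz-Bang Formula can be sketched as a query. This is purely illustrative and is not how Teradata hashes. The bucket_no and amp_no column names are made up for the example, and the MOD arithmetic simply imitates a four-AMP hash map whose buckets cycle 1, 2, 3, 4.

```sql
-- Coffing/Jones Wiz-Bang Formula, for illustration only (NOT Teradata's real hash).
-- bucket_no = Friend_Num divided by 2
-- amp_no    = the AMP number sitting in that bucket of the four-AMP hash map,
--             where the buckets cycle 1, 2, 3, 4, 1, 2, ...
SELECT Friend_Num
      ,Friend_Name
      ,Friend_Num / 2                   AS bucket_no
      ,((Friend_Num / 2 - 1) MOD 4) + 1 AS amp_no
FROM Best_Friends
ORDER BY Friend_Num;
```

Working it by hand, Friend_Num 2 gives bucket 1 and AMP 1, and Friend_Num 16 gives bucket 8 and AMP 4, which matches the two rows walked through earlier.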

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.
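The real formula stays a secret, but Teradata does provide hash functions that report its results. As a hedged sketch, the query below asks where the row for Friend_Num 8 hashes. The alias names are made up for the example, and the numbers returned depend on the system and its number of AMPs, so none are shown here.

```sql
-- Ask Teradata for the row hash, hash map bucket, and AMP of one row.
-- The values returned depend on the system, so none are listed here.
SELECT Friend_Num
      ,HASHROW(Friend_Num)                      AS row_hash
      ,HASHBUCKET(HASHROW(Friend_Num))          AS bucket_no
      ,HASHAMP(HASHBUCKET(HASHROW(Friend_Num))) AS amp_no
FROM Best_Friends
WHERE Friend_Num = 8;
```

Run it twice and the answers should be identical, which is the consistency described above.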

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number, which is also four. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
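One way to check what the PE plans to do, without actually running the query, is to place the EXPLAIN keyword in front of it. This is offered as a sketch. The wording of the plan text varies by release, so no output is reproduced here.

```sql
-- Ask the Parsing Engine to describe its plan instead of executing the query.
EXPLAIN
SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;
```

When Friend_Num is the Unique Primary Index, the plan would be expected to describe a retrieve that involves a single AMP.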

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO." After we complete training, they respond, "Heck YES."

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's abilities to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan (FTS) means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is 10 times faster than most systems. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency.

Most FTS bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" then you are asking the system to read through an entire table. Full Table Scans are fundamental and an important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.
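As a hypothetical illustration of such a decision support question, the query below ranks customers in the sample OrderTable by total quantity ordered. Customer_ID is an assumed column name, and "best" is taken here to mean highest total quantity, which is only one possible definition.

```sql
-- "Who are my best and worst customers?" ranked by total quantity ordered.
-- No Primary Index value is supplied, so every OrderTable row must be read.
-- Customer_ID is an assumed column name.
SELECT Customer_ID
      ,SUM(Quantity) AS total_quantity
FROM OrderTable
GROUP BY Customer_ID
ORDER BY total_quantity DESC;
```

The best customers appear at the top of the answer set and the worst at the bottom, and every AMP reads its own share of the rows to get there.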

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result.

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
6           Mary Gray
14          Kyle Marx
8           John Davis
16          Lyn Jones
2           Ben Hon
10          Don Roy
4           Joe Davis
12          Sam Mills

In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on "KNOWN QUERIES" or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

There are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It tells the AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX(Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query asks "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (the smiley face in this example) is revealed, and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.
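For comparison with the USI, here is a sketch of a NUSI. It uses the CityName column of the sample CustomerTable, chosen only because more than one customer could share a city. Just as with Primary Indexes, you never see the prefix NON.

```sql
-- A Non-Unique Secondary Index: simply leave out the word UNIQUE.
CREATE INDEX (CityName) ON CustomerTable;

-- A query that could use the NUSI. Every AMP checks its own sub-table
-- to see whether it holds any qualifying rows.
SELECT CustomerID, CustomerName
FROM CustomerTable
WHERE CityName = 'Dallas';
```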

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries.

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.
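A join index over the two sample tables might look something like the sketch below. The index name Cust_Order_JI and the underscore column names (Customer_ID, Order_Number, Order_Date) are assumptions made for the illustration, and the exact options available differ between Teradata releases.

```sql
-- Pre-join the customer and order tables (illustrative names only).
CREATE JOIN INDEX Cust_Order_JI AS
SELECT CustomerTable.CustomerID
      ,CustomerTable.CustomerName
      ,OrderTable.Order_Number
      ,OrderTable.Order_Date
      ,OrderTable.Quantity
FROM CustomerTable
INNER JOIN OrderTable
   ON CustomerTable.CustomerID = OrderTable.Customer_ID
PRIMARY INDEX (CustomerID);
```

Users would keep writing their usual join of CustomerTable and OrderTable. Whether the join index is used is left to the PE.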

Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a "Dictionary" to users who want to look up system information and as a "Directory" to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan." The Dictionary directs the PE on topics such as security access rights, table columns, indexes, macros, views, etc.

So, if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for - if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result, many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space

The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.
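The space hand-offs described so far could be sketched with statements like the ones below. The names MRKT, SALES, Morgan, and Tom come from the text, but every size (in bytes) and every password is a placeholder invented for the example.

```sql
-- Logged on as SYSDBA: carve space out of SYSDBA's PERM space.
-- All sizes (in bytes) and passwords below are placeholders.
CREATE DATABASE MRKT FROM SYSDBA
AS PERM = 10000000000;

CREATE DATABASE SALES FROM SYSDBA
AS PERM = 10000000000;

-- A user looks just like a database, plus a password.
CREATE USER Morgan FROM SYSDBA
AS PERM = 5000000000
  ,SPOOL = 2000000000
  ,PASSWORD = placeholder_pw1;

-- Morgan gives some of his space to Tom.
CREATE USER Tom FROM Morgan
AS PERM = 1000000000
  ,SPOOL = 1000000000
  ,PASSWORD = placeholder_pw2;
```

Notice that the only thing separating the CREATE USER statements from the CREATE DATABASE statements is the password (and here a spool limit).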

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from Tom.

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space,
• Spool Space, and
• Temp Space.

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data, such as tables. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
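The difference between the two kinds of space shows up when users are created. The following is a minimal sketch with hypothetical user names, table name and sizes:

```sql
-- Hypothetical names and sizes, for illustration only.
-- A query-only user: no perm space, but a spool ceiling for answer sets.
CREATE USER Report_User FROM SYSDBA AS
    PERM = 0,
    SPOOL = 5e9,
    PASSWORD = report_pw1;

-- A user who will own tables: the perm amount is subtracted from SYSDBA's
-- total, while the spool figure is only a limit and costs SYSDBA nothing.
CREATE USER Table_Owner FROM SYSDBA AS
    PERM = 20e9,
    SPOOL = 5e9,
    PASSWORD = owner_pw1;

-- When the perm limit is reached, the owner can raise it ...
MODIFY USER Table_Owner AS PERM = 30e9;

-- ... or the user can free space by dropping a table that is no longer needed.
DROP TABLE Table_Owner.Old_Sales_2000;
```

Both users receive the same 5e9 spool limit here, in keeping with the speed-limit analogy: granting it to one user takes nothing away from the other.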

The following picture shows a logical view of a Customer table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. Plus, when the answer is returned, the spool is released.
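The statement in the picture is not reproduced in the text. Going by the description, it would look something like this, assuming the table and column names listed above:

```sql
-- All columns for employees in department 10.
-- Each AMP scans its own rows and copies the matches into spool.
SELECT *
FROM Employee
WHERE Dept = 10;
```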

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored on the disks a second time, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname    Sal
1     10     Johnson    Manny    100000
2     20     Carlsbad   Jan      100000
22    30     Winter     Steve     77000
25    10     Lester     Bonnie    56000
33    10     Samuels    Todd     120000
99    20     Walter     Misha    104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.
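One way to put that restriction in place is sketched below. It assumes a user named Mary who currently holds SELECT on the base table, and that the statements are run by someone with the necessary rights on both objects.

```sql
-- Take away direct access to the base table (salaries included) ...
REVOKE SELECT ON Employee FROM Mary;

-- ... and allow access only through the view, which omits salary.
GRANT SELECT ON EMPLOY_V TO Mary;
```

If the view lives in a different database from the table, the database that owns the view typically also needs SELECT on the table WITH GRANT OPTION before other users can read through it.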

Perm space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha
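The view above limits columns only. Since a view can also limit rows, a variation might look like the following sketch; the view name Dept10_V is hypothetical.

```sql
-- Hides the salary column AND every row outside department 10.
CREATE VIEW Dept10_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE
WHERE Dept = 10;

-- Against the six-row Employee table shown earlier, this returns
-- only the rows for Johnson, Lester and Samuels.
SELECT * FROM Dept10_V;
```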

What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS
(
SELECT * FROM Employ_v WHERE dept = 10;
SELECT * FROM Employ_v WHERE dept = 20;
SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
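Because a macro runs as a single transaction, it is also a natural home for changes that must succeed or fail together. Here is a sketch with a hypothetical macro name and raise amount, using the Employee table from the view section:

```sql
-- Both updates run as one transaction: if the second statement fails,
-- the first is rolled back as well.
CREATE MACRO Raise_mac AS
(
    UPDATE Employee SET Sal = Sal * 1.10 WHERE Dept = 10;
    UPDATE Employee SET Sal = Sal * 1.10 WHERE Dept = 20;
);

EXECUTE Raise_mac;
```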

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE
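As the last row of the chart says, an existing view or macro is redefined with REPLACE rather than dropped and re-created. A short sketch follows; the particular changes are hypothetical.

```sql
-- Redefine the view to show one fewer column (a hypothetical change).
REPLACE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
FROM EMPLOYEE;

-- Redefine the macro to run a single report instead of three.
REPLACE MACRO Emp_mac AS
(
    SELECT * FROM Employ_v ORDER BY lname;
);
```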

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough ... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it ... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has Implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.
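Explicit rights are handed out and taken back with GRANT and REVOKE. The statements below are a minimal sketch of the Tom and Mary example; the table name Sales_Plan is hypothetical.

```sql
-- Explicit right: Tom lets Mary create tables in his space.
GRANT CREATE TABLE ON Tom TO Mary;

-- A privilege on a single object (the table name is hypothetical).
GRANT SELECT ON Tom.Sales_Plan TO Mary;

-- Tom, or one of Tom's owners in the hierarchy, can later take a right back.
REVOKE CREATE TABLE ON Tom FROM Mary;
```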

In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or disk goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (a MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it? If you didn't like it, then you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores it to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal". Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness". If it is clean, then things went well.
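An explicit multi-statement transaction can be sketched as follows. This assumes a session running in Teradata (BTET) mode, where BT and ET mark the beginning and end of a transaction; those keywords are not covered in this chapter. The raise amounts echo the accidental-raise example in the chapter overview.

```sql
-- Oops: a 100% raise instead of a 10% raise.
BT;                                   -- begin transaction
UPDATE Employee SET Sal = Sal * 2.00;
-- The mistake is spotted before the transaction ends, so undo it.
-- The BEFORE images in the Transient Journal restore every changed row.
ROLLBACK;

-- The intended change.
BT;
UPDATE Employee SET Sal = Sal * 1.10;
ET;                                   -- end transaction: the COMMIT
```

Once ET completes, the before-images are no longer needed and a ROLLBACK can no longer undo the update.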

FALLBACK Protection

I asked my dentist if I had to floss all my teeth and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why FALLBACK protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
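Teradata SQL exposes this chain through built-in hash functions, so it can be inspected with a query. The following is a minimal sketch, assuming an Employee table whose Primary Index is the Emp column (an assumption made only for illustration). HASHAMP consults the primary hash map and HASHBAKAMP the fallback hash map.

```sql
SELECT Emp
      ,HASHROW(Emp)                         AS row_hash      -- output of the hashing algorithm
      ,HASHBUCKET(HASHROW(Emp))             AS bucket_no     -- bucket in the hash map
      ,HASHAMP(HASHBUCKET(HASHROW(Emp)))    AS primary_amp   -- AMP holding the base row
      ,HASHBAKAMP(HASHBUCKET(HASHROW(Emp))) AS fallback_amp  -- AMP holding the FALLBACK copy
FROM Employee
ORDER BY Emp;
```

On a clustered system, the two AMP numbers returned for a row should differ but belong to the same cluster.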

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. With clustering in a group of two and one AMP down, every complex query will take twice as long to process.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)
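Both routes can be sketched briefly. The table name and columns below are hypothetical and serve only to show where the FALLBACK option goes.

```sql
-- Request FALLBACK when the table is created (hypothetical table) ...
CREATE TABLE TomC.Dept_Budget, FALLBACK
(
    dept    INTEGER
   ,budget  DECIMAL(12,2)
)
UNIQUE PRIMARY INDEX (dept);

-- ... drop it later to reclaim the duplicate space ...
ALTER TABLE TomC.Dept_Budget, NO FALLBACK;

-- ... or add it back once the table becomes mission critical.
ALTER TABLE TomC.Dept_Budget, FALLBACK;
```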

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, with all subsequent changes to that AMP's rows made to their FALLBACK copies.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping", starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping".

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). Also, none of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds". Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE or DELETE command. It keeps track of all new, deleted or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling lets say that the Customer Representative table is created with a BEFOREJournal After its created a programmer is told to move every Customer Representative from the WesternRegion to the newly designated Southwest Region However every representative from every region isaccidentally transferred to the Southeast Region Because there is a BEFORE Journal a programmer has theability to manually rollback the data to the specific point in time BEFORE this update occurred Note that thiswas not a transaction failure The update was successful but it was not accurate The BEFORE Journal saves theday

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
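The two journal scenarios above can be mimicked in a few lines of Python. This is only a toy sketch under loud assumptions: the class `JournaledTable`, the function `roll_forward`, and the sequence numbers are all invented for illustration and say nothing about how Teradata stores or replays journal images internally.

```python
import copy


class JournaledTable:
    """Toy table that keeps a BEFORE and an AFTER image of every changed row."""

    def __init__(self):
        self.rows = {}            # primary key -> row (a dict)
        self.before_journal = []  # (sequence, key, image before the change or None)
        self.after_journal = []   # (sequence, key, image after the change or None)
        self.seq = 0

    def write(self, key, new_row):
        """INSERT or UPDATE when new_row is a dict, DELETE when it is None."""
        self.seq += 1
        self.before_journal.append((self.seq, key, copy.deepcopy(self.rows.get(key))))
        self.after_journal.append((self.seq, key, copy.deepcopy(new_row)))
        if new_row is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = new_row
        return self.seq

    def rollback_to(self, seq):
        """Undo every change made after `seq`, newest first, using BEFORE images."""
        for s, key, image in reversed(self.before_journal):
            if s <= seq:
                break
            if image is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = copy.deepcopy(image)


def roll_forward(backup_rows, after_journal, since_seq):
    """Rebuild the table from a backup plus the AFTER images taken after since_seq."""
    rows = copy.deepcopy(backup_rows)
    for s, key, image in after_journal:
        if s <= since_seq:
            continue
        if image is None:
            rows.pop(key, None)
        else:
            rows[key] = copy.deepcopy(image)
    return rows
```

In this sketch, the bad regional update would be undone with `rollback_to(checkpoint)`, while the hardware failure would be handled by restoring the monthly backup and calling `roll_forward` with the sequence number at which the backup was taken.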

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK, BEFORE JOURNAL, DUAL AFTER JOURNAL
   (emp        INTEGER,
    dept       INTEGER,
    lname      CHAR(20),
    fname      VARCHAR(20),
    salary     DECIMAL(10,2),
    hire_date  DATE FORMAT)
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

You just obey instructions; we'll take care of the obstructions.

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
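The four locking modes above amount to a compatibility table: given the locks already held on an object, may a newly requested lock proceed or must it wait in line? Here is a minimal Python sketch of that table. The names `COMPATIBLE` and `can_grant` are hypothetical, and one entry (a WRITE request proceeding while only ACCESS locks are held) is an assumption that goes slightly beyond what the list states.

```python
ACCESS, READ, WRITE, EXCLUSIVE = "ACCESS", "READ", "WRITE", "EXCLUSIVE"

# For each lock already held, the set of requested locks that may proceed.
COMPATIBLE = {
    ACCESS: {ACCESS, READ, WRITE},  # assumption: only EXCLUSIVE must wait
    READ: {ACCESS, READ},           # many readers, plus dirty readers
    WRITE: {ACCESS},                # only a dirty read may cut in
    EXCLUSIVE: set(),               # no access, period
}


def can_grant(requested, held_locks):
    """Return True if `requested` is compatible with every lock already held."""
    return all(requested in COMPATIBLE[held] for held in held_locks)
```

For example, `can_grant(READ, [READ, READ])` lets a third reader in, while `can_grant(WRITE, [READ])` makes the writer wait until the readers are done.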

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30!" Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation! Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the former table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
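To make the employee and payroll story concrete, here is a small Python sketch of the two RI rules: no payroll row without a matching employee, and no employee deletion while a payroll row still points at that employee. The class and method names are invented for this illustration; a real database enforces these rules through table constraints rather than application code.

```python
class ReferentialIntegrityError(Exception):
    """Raised when an insert or delete would leave an orphaned reference."""


class ParentChildTables:
    def __init__(self):
        self.employee = set()  # parent table: employee numbers
        self.payroll = {}      # child table: payroll id -> employee number

    def insert_employee(self, emp):
        self.employee.add(emp)

    def insert_payroll(self, payroll_id, emp):
        # Rule 1: the referenced value must already exist in the parent table.
        if emp not in self.employee:
            raise ReferentialIntegrityError("no employee %r" % emp)
        self.payroll[payroll_id] = emp

    def delete_payroll(self, payroll_id):
        self.payroll.pop(payroll_id, None)

    def delete_employee(self, emp):
        # Rule 2: the parent row cannot go while a child row still refers to it.
        if emp in self.payroll.values():
            raise ReferentialIntegrityError("employee %r is still on payroll" % emp)
        self.employee.discard(emp)
```

With these checks in place, the fired employee has to be removed from payroll before the employee row can be deleted, so nobody stays on payroll by accident.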


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that, just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
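The three Fastload phases just described (hand out blocks, hash and redistribute, sort) can be imitated in Python. Treat this strictly as a sketch: `zlib.crc32` is only a stand-in for Teradata's real hashing algorithm, the block size is counted in rows rather than in 64K of bytes, and the function names `amp_for` and `fastload` are made up for the example.

```python
import zlib


def amp_for(primary_index_value, amp_count):
    """Return (row_hash, owning_amp) for a primary index value.

    crc32 is an illustrative stand-in, not Teradata's hashing algorithm.
    """
    row_hash = zlib.crc32(str(primary_index_value).encode("utf-8"))
    return row_hash, row_hash % amp_count


def fastload(rows, amp_count, block_size=3):
    """rows is a list of (primary_index_value, payload) tuples."""
    # Phase 1: pass blocks of rows to the AMPs in turn, ignoring ownership.
    received = [[] for _ in range(amp_count)]
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        received[(start // block_size) % amp_count].extend(block)

    # Phase 2: each AMP hashes the rows it received and sends each row
    # to the AMP that owns it.
    owned = [[] for _ in range(amp_count)]
    for amp_rows in received:
        for key, payload in amp_rows:
            row_hash, owner = amp_for(key, amp_count)
            owned[owner].append((row_hash, key, payload))

    # Phase 3: each AMP sorts its own rows by row hash (standing in for Row ID).
    for amp_rows in owned:
        amp_rows.sort()
    return owned
```

In the real system phases 2 and 3 run on all AMPs at the same time; the plain loops here only show what each AMP does, not the parallelism.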

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

You don't drown by falling into the water; you drown by staying in the water.

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.
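The water faucet idea can be sketched as a schedule of load rates by hour of day. The following Python is purely illustrative: the function `rows_loaded`, the rate table, and the numbers are all hypothetical and do not correspond to any actual Tpump setting or command.

```python
def rows_loaded(pending_rows, hourly_rates, start_hour, hours):
    """Simulate a throttled load.

    hourly_rates maps an hour of day (0-23) to the maximum rows allowed in
    that hour; hours missing from the map load nothing.
    Returns (list of (hour, rows loaded), rows still pending).
    """
    loaded_by_hour = []
    remaining = pending_rows
    for offset in range(hours):
        hour = (start_hour + offset) % 24
        batch = min(remaining, hourly_rates.get(hour, 0))
        loaded_by_hour.append((hour, batch))
        remaining -= batch
    return loaded_by_hour, remaining
```

A schedule such as `{22: 600, 23: 600, 9: 50}` would open the faucet wide late at night and leave only a trickle at 9 a.m., which is the behavior the paragraph describes.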

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance - Only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


• As my environment changes, will the system be flexible enough to allow changes quickly and easily? Do the math.
• Will the system need so many Database Administrators (DBAs) that my system's cost skyrockets? Do the math.
• If we suddenly merged with another company and needed to incorporate into their mainframe or LAN environment, would the system be able to connect and include them? Do the math.
• Can I continue to meet my batch window timeframes? Do the math.
• Could I become the hero of the company one day, only to have some technical glitch blamed on me because of my poor foresight, and be thrown out of the company into a giant mud puddle? Do the bath.

Rule 9 - Take a Building Block Approach

Be not afraid of growing slowly; be afraid only of standing still.

Chinese Proverb

Ever since Vasco de Balboa discovered the Pacific coast of Panama in 1513, kings and businessmen alike dreamed of the impossible: to cut a waterway across the mountainous isthmus, creating a shortcut between the Atlantic and Pacific Oceans. Those dreams turned into reality during the Industrial Revolution. It took almost forty years of trial and error before the world's greatest engineering feat since the Pyramids was completed in 1914. Ships move through the locks of the canal, rising 85 feet above sea level before they descend to the opposite side. Since its grand opening in 1920, the Panama Canal has revolutionized trans-oceanic traffic, joining East and West. Its 50-mile stretch saves every vessel about 8,000 extra nautical miles of travel around the bottom tip of South America. Several modifications have been engineered through the years to accommodate the increasing size of ships.

Data warehouses, like the Panama Canal, must be built over time and changed over time to meet new demands. A data warehouse must grow with the environment, but the environment is unpredictable. All sailors know that they can't direct the wind, but that they can adjust their sails. In comparison, all data warehouse users know they can't direct the environment, but they can adjust their warehouse. Sometimes the data warehouse will grow quickly and sometimes it will grow slowly, but it should always be growing.

So take a building block approach to data warehousing. Teradata allows you to expand without boundaries, one building block at a time. Plus, adding on building blocks is easy.

There are two aspects to a building block approach. First, you need to add applications to your data warehouse in three- to six-month intervals. Once the first application works, then you are ready for more projects. As you become more experienced with this approach, you can add multiple projects in parallel by involving multiple organizations.

The second aspect of the building block approach is in the actual data warehouse architecture. It doesn't matter if yours is the smallest data warehouse in the world, the largest, or falls somewhere in between; power and scalability always fuel success.

Not long ago, a customer flew out to San Diego for a Teradata demonstration and benchmark. The benchmark ran late into the evening, but the numbers were more than 50% better than the competition. The customer was extremely impressed, but before buying he demanded to see the system scalability that everyone had been talking about. Although it was already late, a Teradata employee was called in the middle of the night, arrived within 10 minutes (in pajamas), hooked up the building blocks, and ran a utility called "config." She ran another called "reconfig," and in less than two hours the system size doubled.


As the environment changes in terms of users, data, complexity, capacity, batch windows, time changes, events, or opportunities, users should be able to continue building applications and architecture. The more a Teradata system grows, the more Teradata outshines the competition.

Rule 10 - Buy a Teradata Data Warehouse

Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing had happened.

Winston Churchill

Winston Churchill led Britain through World War II during what he called that country's "finest hour." When users see consistent data, the system too is in its finest hour. Teradata gives users the ability to ask questions they could never ask before. Users trust Teradata because of its industry performance and reputation, and because it never gives in. Constant use gives users optimal business experience, and no matter what a user asks, the system responds with a hearty "Yes, Sir!"

When we explained Teradata to Churchill, he said:

"A data WARe-house that consists of 250 Data marts is like poison, and if I were the MIS department responsible for maintaining them, I'd take it."

Teradata guarantees an Enterprise Data Warehouse with no scalability issues. Data loads like lightning, and system administration is a breeze. You can pick the performance level that meets your requirements for today and forever. The database can be normalized around detail data, and because of Teradata's power, users have the flexibility to ask any question, at any time, on any data.

All other databases are suspect in data loading capabilities, scalability, reference sites, decades of data warehouse experience, flexibility, system administration difficulties, and inability to handle the complex queries of today's users. These users are good.

Teradata - The Shining Star

Overview

Teradata has always been at the top of the data warehouse game, even if the experts weren't bright enough to know it. The incredible vision that the original designers had was tremendous. It was so far to the left of genius that most thought the idea was impossible.

Only he who attempts the ridiculous may achieve the impossible

Don Quixote

The Teradata database was originally designed in 1976, and many of the fundamental concepts still remain today. Nearly 25 years later, Teradata is still considered ahead of its time.

In 1976, IBM mainframes dominated the computer business. Everyone who was anyone had an IBM Mainframe. However, the original founders of Teradata noticed that it took about 4½ years for IBM to produce a new mainframe. They also noticed a little company called Intel. Intel created a new PC chip every 2½ years. With mainframes moving forward every 4½ years and PC chips every 2½ years, Teradata recognized their vision: to network enough PC chips together that the mainframe would be overpowered, yet costs would be hundreds of times cheaper than a mainframe. The Teradata team estimated the power surge would come in 1990.

IBM laughed out loud. They said, "Let's get this straight ... you are going to network a bunch of PC chips together and overpower our mainframes? That's like plowing a field with 1,000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as just a bunch of PCs in a cabinet.

Teradata was convinced it could produce a product that would power large amounts of data and achieve the impossible: using PC technology in mainframe territory. Its founders agreed with Napoleon Bonaparte, who asserted, "The word 'impossible' is not in my dictionary." Sure enough, when we looked in his dictionary, that word was not there. And it is not in Teradata's Data Dictionary either. The Teradata team set two goals: build a database that could

• Perform parallel processing, and
• Accommodate a Terabyte of data

Driving in the car one evening, Morgan's eight-year-old daughter, Kara, piped up from the back seat: "Daddy, can you buy Teradata in the store? I mean, what does Teradata really do?" Morgan thought for a moment and then replied, "Do you remember when you went on the Easter egg hunt last spring? Well, imagine that we had fifty eggs and you were the only child there. If I asked you to find all the purple eggs, would you be able to do that?" Kara said, "Sure! But it might take me a long time." Morgan continued, "What if we now let fifty children go in, and I asked them to show me all of the purple eggs. How long would that take?" His daughter responded, "It wouldn't take any time at all, because each child would only have to look at one egg." That is precisely how Teradata works. It divides up huge tasks among its processors and tackles each portion simultaneously with amazing speed. And it doesn't matter if you have a trillion eggs in your basket.

In 1984, the DBC/1012 was introduced. Since then, Teradata has been the dominant force in data warehousing. Teradata got the chickens plowing and is considered outstanding. Meanwhile, IBM's plow is out rusting in its field.

Parallel Processing

An invasion of armies can be resisted, but not an idea whose time has come.

Victor Hugo

The idea of parallel processing gives Teradata the ability to have unlimited users, unlimited power, and unlimited scalability. This is an idea whose time has come. And it all starts with something called parallel processing. So what is parallel processing? Let us explain.

It was 10 p.m. on a Saturday night, and two friends were having dinner and drinks. One of the friends looked at his watch and said, "I have to get going." The other friend responded, "What's the hurry?" His friend went on to tell him that he had to leave to do his laundry at the Laundromat. The other friend could not believe his ears. He responded, "What? You're leaving to do your laundry on a Saturday night? Do it tomorrow!" His buddy went on to explain that there were only 10 washing machines at the laundry. "If I wait until tomorrow, it will be crowded and I will be lucky to get one washing machine. I have 10 loads of laundry, so I will be there all day. If I go now, there will be nobody there and I can do all 10 loads at the same time. I'll be done in less than an hour and a half."


This story describes what we call "Parallel Processing." Teradata is the only database in the world that loads data, backs up data, and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds ... even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every database's clock in performance tests.
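The laundry arithmetic is easy to check. In the small Python sketch below, the function name `elapsed_hours` is made up, and the 1.5 hours per load is an assumed upper bound taken from the friend's "less than an hour and a half."

```python
import math


def elapsed_hours(loads, machines, hours_per_load=1.5):
    """Wall-clock time when each machine washes one load at a time."""
    rounds = math.ceil(loads / machines)  # full rounds of washing needed
    return rounds * hours_per_load
```

With all 10 machines free, `elapsed_hours(10, 10)` gives one round of washing; with a single machine, `elapsed_hours(10, 1)` gives ten rounds, which is the "I will be there all day" case.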

After enlightenment, the laundry.

Zen Proverb

After parallel processing the laundry, enlightenment!

Teradata Zen Proverb

With the computer world seeing Terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, this is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata has unlimited power, grows without boundaries, and was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

A ship in harbor is safe, but that's not why ships are built.

John Shedd

In 1805, the pivotal Battle of Trafalgar matched Britain's flotilla of battle ships against the almighty Spanish Armada. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to sail circles around the Armada, attacking them at their most vulnerable point, the stern. That battle paralyzed the Armada and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating them on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive, and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed managing terabytes of data.

A Personal Computer (PC) is made up of the following components:


Processor Chip - This is the brain of the computer. All tasks are done at the direction of the processor.

Memory - This is the hand of the computer. The memory allows data to be viewed, manipulated, changed, or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive - This is the spine of the computer. The hard drive stores data, applications, and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor executes your request onto the document while it is still being displayed in memory. Upon completion, you close the document and the processor writes all the changes to the disk where it is stored.

In the picture below, we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight best friends.

Teradata Spreads Data over Multiple Processors

I don't mind starting the season with unknowns. I just don't like finishing the season with them.

Coach Lou Holtz

With Teradata, you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The Teradata example on the next page shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask the system for a list of best friends, both processors will retrieve data in parallel and will return combined results over the connecting network. This example could easily return results at double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four records, and simultaneously the other processor reads the remaining four records. So how could we double the speed of this system again?
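The two-processor example can be sketched in Python with a thread pool standing in for the processors. Everything here is an illustration under assumptions: the names `spread`, `read_disk`, and `select_all` are invented, the friends are placeholder rows, and rows are dealt out round-robin rather than by Teradata's hashing.

```python
from concurrent.futures import ThreadPoolExecutor

# Eight placeholder rows standing in for the Best_Friends table.
BEST_FRIENDS = ["Friend %d" % n for n in range(1, 9)]


def spread(rows, processor_count):
    """Deal the rows out evenly, one per processor in turn."""
    disks = [[] for _ in range(processor_count)]
    for i, row in enumerate(rows):
        disks[i % processor_count].append(row)
    return disks


def read_disk(disk):
    """Each processor scans only the rows on its own disk."""
    return list(disk)


def select_all(disks):
    """Ask every processor for its rows at once and combine the answers."""
    with ThreadPoolExecutor(max_workers=len(disks)) as pool:
        parts = list(pool.map(read_disk, disks))
    return [row for part in parts for row in part]
```

Calling `spread(BEST_FRIENDS, 2)` leaves four rows on each disk, so each worker has half the reading to do while `select_all` still returns all eight friends.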

Teradata has Linear Scalability

Every ceiling, when reached, becomes a floor upon which one walks and now can see a new ceiling.

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." This allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck of data.

Notice in the system below there are four processors and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since data is spread evenly over four processors, each processor reads only two rows, and all four read simultaneously. Now the system is four times faster than the single-processor example.

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the AMPs and processed in parallel.
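The "double the processors, double the speed" claim can be written down as simple arithmetic. The Python below is an idealized model only: it assumes rows are spread perfectly evenly and ignores any coordination cost, and the function names and the per-processor scan rate are hypothetical.

```python
import math


def rows_per_processor(total_rows, processors):
    """Rows each processor must read when data is spread evenly."""
    return total_rows / processors


def ideal_speedup(processors, baseline_processors=1):
    """Speedup relative to a baseline system under perfectly linear scaling."""
    return processors / baseline_processors


def processors_needed(total_rows, rows_per_second_each, target_seconds):
    """Smallest processor count that scans total_rows within target_seconds."""
    return math.ceil(total_rows / (rows_per_second_each * target_seconds))
```

For the Best_Friends table, `rows_per_processor(8, 4)` is two rows each and `ideal_speedup(4)` is four, matching the example. `processors_needed` turns "Divide and Conquer" around: pick the response time first, then size the system.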


A Logical View of the Teradata Architecture

You are either making history or you are history

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when they actually never had a chance to succeed. In ancient days, Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies too will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and then is given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain information on their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.
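The request flow just described can be traced step by step in a toy Python model. All of it is assumption for the sake of illustration: the function names, the `rights` dictionary, the single-step "plan," and a syntax check that accepts only `SELECT * FROM <table>` are invented, and direct function calls stand in for the BYNET.

```python
class QueryRejected(Exception):
    """Raised when the toy Parsing Engine refuses a request."""


def amp_execute(amp_disk, plan):
    """An AMP carries out the plan against the rows on its own disk."""
    return list(amp_disk.get(plan["table"], []))


def parsing_engine(user, sql, rights, amps):
    # Step 1: check syntax. This toy PE accepts only "SELECT * FROM <table>".
    words = sql.strip().rstrip(";").split()
    if len(words) != 4 or [w.upper() for w in words[:3]] != ["SELECT", "*", "FROM"]:
        raise QueryRejected("syntax error")
    table = words[3]

    # Step 2: check that the user has rights on the table.
    if table not in rights.get(user, set()):
        raise QueryRejected("no access rights on " + table)

    # Step 3: create a plan for the AMPs (here, every AMP scans its portion).
    plan = {"step": "full_scan", "table": table}

    # Step 4: pass the plan to each AMP and collect what each one returns.
    answers = [amp_execute(amp, plan) for amp in amps]

    # Step 5: combine the answers and hand them back to the user.
    return [row for part in answers for row in part]
```

Here each AMP is modeled as a dictionary mapping a table name to the rows stored on that AMP's disk, so a query that passes both checks returns the rows from every AMP combined.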


Parsing Engine (PE)

Even a stopped clock is right twice a day

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards." Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER".

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and SouthWestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions, some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions, and Teradata returns the answers.

Access Module Processor (AMP)

Wise men talk because they have something to say; fools talk because they have to say something

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above: "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of little words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best way to picture it is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it is the only AMP allowed to read or write data on that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk and shares it with no other AMP, hence a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP reads only the rows on its own disk. Because this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today some Teradata systems have just four AMPs, while others have more than 2,000.

The BYNET

Even if you're on the right track, you'll still get run over if you just sit there

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax;
• The PE checks the user's security rights;
• The PE comes up with a plan for the AMPs to follow;
• The PE passes the plan along to the AMPs over the BYNET;
• The AMPs follow the plan and retrieve the data requested;
• The AMPs pass the data to the PE over the BYNET; and
• The PE then passes the final data to the user.
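The steps above can be sketched as a toy simulation. This is only an illustration under loud assumptions: the class names `Amp` and `ParsingEngine`, the "must start with SELECT" syntax rule, the user name, and the sample rows are all hypothetical, and the BYNET is reduced to an ordinary method call.

```python
from dataclasses import dataclass, field


@dataclass
class Amp:
    """One AMP with its own private disk (modelled as a list of rows)."""
    number: int
    disk: list = field(default_factory=list)

    def follow_plan(self, plan):
        # An AMP reads only the rows on its own disk.
        return [row for row in self.disk if plan(row)]


class ParsingEngine:
    """Toy PE: checks syntax and rights, then hands a plan to every AMP."""

    def __init__(self, amps, authorized_users):
        self.amps = amps
        self.authorized_users = set(authorized_users)

    def run(self, user, sql, plan):
        # Step 1: syntax check (hypothetical rule, for illustration only).
        if not sql.strip().upper().startswith("SELECT"):
            raise ValueError("syntax error")
        # Step 2: security check.
        if user not in self.authorized_users:
            raise PermissionError("user lacks access rights")
        # Steps 3-6: the plan goes to each AMP and the rows come back
        # (the BYNET is represented here by a plain method call).
        answer = []
        for amp in self.amps:
            answer.extend(amp.follow_plan(plan))
        # Step 7: the PE returns the final answer set to the user.
        return answer


amps = [Amp(1, ["row A", "row C"]), Amp(2, ["row B"])]
pe = ParsingEngine(amps, authorized_users={"valid_user"})
rows = pe.run("valid_user", "SELECT * FROM some_table", plan=lambda row: True)
```

A request from a user who is not in `authorized_users` is rejected before any AMP is contacted, which mirrors the guard-dog behavior described earlier.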

Teradata Building Block Approach

Better a diamond with a flaw than a pebble without one


Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array, where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them.


Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about ... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, Item No, Quantity, and Customer ID.

An example of each table follows.

CUSTOMER TABLE (called CustomerTable)

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillards       Atlanta    106227              Smith

ORDER TABLE (called OrderTable)

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables", "Rows", and "Columns". Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest unit that can be inserted into a table. A column is the smallest unit within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null or are sometimes represented by a

One column, or a combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being arbitrary, or an unordered set. Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.
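A PK-to-FK join can be tried out with a small sketch. Note the assumptions: it uses SQLite through Python's standard `sqlite3` module as a stand-in (not Teradata), it keeps only a trimmed set of columns from the example tables, and the column names are written without spaces.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerTable (
    CustomerID   INTEGER PRIMARY KEY,   -- PK: unique, not null
    CustomerName TEXT,
    CityName     TEXT
);
CREATE TABLE OrderTable (
    OrderNumber INTEGER PRIMARY KEY,    -- PK of the order table
    ItemNo      INTEGER,
    Quantity    INTEGER,
    CustomerID  INTEGER REFERENCES CustomerTable (CustomerID)  -- FK
);
""")

conn.executemany(
    "INSERT INTO CustomerTable VALUES (?, ?, ?)",
    [(1001, "JC Penney", "Dallas"),
     (1002, "Office Depot", "Columbia"),
     (1003, "Dillards", "Atlanta")],
)
conn.executemany(
    "INSERT INTO OrderTable VALUES (?, ?, ?, ?)",
    [(105372, 212, 20, 1001),
     (105799, 296, 52, 1002),
     (106227, 325, 17, 1003)],
)

# JOIN the two tables by matching the PK of one to the FK of the other.
joined = conn.execute("""
    SELECT c.CustomerName, o.OrderNumber, o.Quantity
    FROM CustomerTable AS c
    JOIN OrderTable AS o ON o.CustomerID = c.CustomerID
    ORDER BY o.OrderNumber
""").fetchall()
conn.close()
```

Each row of `joined` pairs a customer's name with one of that customer's orders, which is the relationship the common key makes possible.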

Here is a quick reference chart for Primary and Foreign Keys.

PRIMARY KEY                          FOREIGN KEY
Not optional                         Optional
Comprised of one or more columns     Comprised of one or more columns
Can only have one PK per table       Can have multiple FKs per table
No duplicates allowed                Duplicates allowed
No changes allowed                   Changes allowed
No nulls allowed                     Nulls allowed

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP 1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In the same way, the Customer rows are grouped together and the Order rows are grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.
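The value of an even spread can be illustrated with a little arithmetic: if all AMPs read in parallel, the elapsed time is set by the AMP holding the most rows. The sketch below assumes a made-up read rate (`rows_per_second`) and made-up row counts; it is a simplified model, not a measurement of any real system.

```python
def scan_time(rows_per_amp, rows_per_second=1000):
    """Elapsed time of a parallel read: the slowest (fullest) AMP decides.

    rows_per_second is a hypothetical per-AMP read rate.
    """
    return max(rows_per_amp) / rows_per_second


even = [250, 250, 250, 250]      # 1,000 rows spread evenly over four AMPs
skewed = [700, 100, 100, 100]    # the same 1,000 rows, badly skewed

even_time = scan_time(even)      # every AMP finishes together
skewed_time = scan_time(skewed)  # three AMPs sit idle waiting for one
```

Both layouts hold the same number of rows, yet the skewed layout takes longer because one AMP does most of the work while the others wait.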

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.


There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

CREATE TABLE employee
(
  emp       INTEGER,
  dept      INTEGER,
  lname     CHAR(20),
  fname     VARCHAR(20),
  salary    DECIMAL(10,2),
  hire_date DATE
)
UNIQUE PRIMARY INDEX (emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON.

CREATE TABLE TomC.employee
(
  emp       INTEGER,
  dept      INTEGER,
  lname     CHAR(20),
  fname     VARCHAR(20),
  salary    DECIMAL(10,2),
  hire_date DATE
)
PRIMARY INDEX (dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(
  emp       INTEGER,
  dept      INTEGER,
  lname     CHAR(20),
  fname     VARCHAR(20),
  salary    DECIMAL(10,2),
  hire_date DATE
)
UNIQUE PRIMARY INDEX (emp, dept);

Being related hardly insures relatability

Michael E Angier


All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label the column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label the column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once, a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that the AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8, and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
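The repeating pattern in the two diagrams can be reproduced with a short sketch. It assumes, as the simulated diagrams do, that AMP numbers are simply assigned to buckets in round-robin order; the function name `build_hash_map` is hypothetical, and nothing here claims to match how a production hash map is actually filled.

```python
from collections import Counter


def build_hash_map(num_amps, num_buckets=65536):
    """Return a list in which position i holds the AMP number for bucket i + 1.

    Round-robin assignment is an assumption taken from the simulated diagrams.
    """
    return [(i % num_amps) + 1 for i in range(num_buckets)]


four_amp_map = build_hash_map(4)
eight_amp_map = build_hash_map(8)

first_row_four = four_amp_map[:6]     # first diagram row, four-AMP system
first_rows_eight = eight_amp_map[:12]  # first two diagram rows, eight-AMP system

# Count how many buckets point at each AMP.
buckets_per_amp = Counter(four_amp_map)
```

Because 65,536 divides evenly by four and by eight, every AMP ends up with the same number of buckets in these two layouts, which is the "spread the data as equally as possible" idea in miniature.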

How the Hash Map and Primary Index Work Together

Choice, not chance, determines destiny

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the Wiz-Bang Formula (a secret NCR formula). First we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even-tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula: take a table's Primary Index column value and divide it by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we take the Primary Index value of the row, apply the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer points to a bucket in the hash map. Inside that bucket is the number of the AMP on which the row will reside.

Friend_Num   Friend_Name
2            Ben Hon

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see what AMP number is inside bucket number 1. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num   Friend_Name
16           Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (16), and the answer is:

16 divided by 2 = 8

Thus the hash map bucket number is now eight. Let's check our hash map to see which AMP number is inside bucket number eight. As you can see below, bucket eight in the hash map says the row's destination is AMP four.


If we continue the process until all data is laid out, the system would look like this:


Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the real formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it is consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this value a million times, we would still get the same result.
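The whole walk-through can be replayed in a few lines. This sketch uses the authors' illustrative "divide by 2" formula, not the real Teradata hashing algorithm, and it assumes the simulated round-robin four-AMP hash map from the diagram; the function names are hypothetical.

```python
BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]

NUM_AMPS = 4
# Simulated hash map from the diagram: 48 buckets holding 1, 2, 3, 4, 1, 2, ...
HASH_MAP = [(i % NUM_AMPS) + 1 for i in range(48)]


def bucket_for(friend_num):
    """Coffing/Jones illustrative formula: divide the Primary Index by 2."""
    return friend_num // 2


def amp_for(friend_num):
    """Look up the AMP number stored in the bucket (buckets count from 1)."""
    return HASH_MAP[bucket_for(friend_num) - 1]


def distribute(rows):
    """Place every row on the AMP that the hash map points to."""
    amps = {amp: [] for amp in range(1, NUM_AMPS + 1)}
    for friend_num, friend_name in rows:
        amps[amp_for(friend_num)].append((friend_num, friend_name))
    return amps


layout = distribute(BEST_FRIENDS)
```

With these eight rows, each AMP receives exactly two rows, and rerunning `amp_for` on the same value always returns the same AMP, which is the consistency the text emphasizes.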

If you always do what you always did, you'll always get what you always got

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;


The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, and it merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
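The one-AMP retrieval just described can be sketched as follows. It again relies on the authors' illustrative divide-by-2 formula and the simulated four-AMP hash map; the `AMP_DISKS` layout and the function name are assumptions made for this example, and the sketch is only meant for the even Friend_Num values in the sample table.

```python
NUM_AMPS = 4
# Simulated hash map: bucket 1 -> AMP 1, bucket 2 -> AMP 2, ... repeating.
HASH_MAP = [(i % NUM_AMPS) + 1 for i in range(48)]

# Hypothetical layout of Best_Friends rows on each AMP's disk,
# keyed by the Primary Index value (Friend_Num).
AMP_DISKS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def select_by_primary_index(friend_num):
    """Rerun the formula, find the bucket, and ask only the owning AMP."""
    bucket = friend_num // 2          # illustrative Coffing/Jones formula
    amp = HASH_MAP[bucket - 1]        # buckets are numbered from 1
    friend_name = AMP_DISKS[amp].get(friend_num)
    return amp, friend_name


amp_used, name_found = select_by_primary_index(8)
```

Only one AMP's disk is consulted for the lookup; the other three AMPs are never asked, which is why a Primary Index retrieval is described as the fastest path.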

The Full Table Scan

What matters is not the size of the dog in the fight, but the size of the fight in the dog

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan (FTS) means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. In principle, this is about 10 times faster than a system in which a single processor must read all 100 rows. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?", you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows that it individually owns.
• Each AMP passes its rows to the PE over the BYNET.
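The two bullets above can be imitated with threads standing in for AMPs. This is a rough sketch under assumptions: the `AMP_DISKS` layout is hypothetical, Python threads only loosely model independent processors, and the order in which rows are combined here is an artifact of the sketch rather than a statement about Teradata.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout of Best_Friends rows across a four-AMP system.
AMP_DISKS = {
    1: [(2, "Ben Hon"), (10, "Don Roy")],
    2: [(4, "Joe Davis"), (12, "Sam Mills")],
    3: [(6, "Mary Gray"), (14, "Kyle Marx")],
    4: [(8, "John Davis"), (16, "Lyn Jones")],
}


def amp_scan(amp_number):
    """Each AMP reads only the rows on its own disk."""
    return list(AMP_DISKS[amp_number])


def full_table_scan():
    """The PE hands the same plan to every AMP and gathers the results."""
    with ThreadPoolExecutor(max_workers=len(AMP_DISKS)) as pool:
        per_amp_rows = pool.map(amp_scan, sorted(AMP_DISKS))
        return [row for rows in per_amp_rows for row in rows]


all_rows = full_table_scan()
```

Every AMP contributes its own two rows, so all eight rows come back even though no single worker ever reads the whole table.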

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num   Friend_Name
6            Mary Gray
14           Kyle Marx
8            John Davis
16           Lyn Jones
2            Ben Hon
10           Don Roy
4            Joe Davis
12           Sam Mills


In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore there are two types of secondary indexes They are Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI) respectively referred to as USI and NUSI A table may have up to 32secondary indexes

The good news about secondary indexes is that they speed up queries The bad news is that every timesomeone creates a secondary index on a table Teradata creates and maintains a separate secondary index sub-table This action not only takes up space but also adds overhead

A classical secondary index is itself a table made up of rows having two main parts The first is the datacolumn inside the secondary index table and the second part is a pointer showing the location of the row in the

base table Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables

There are three values stored in every secondary index sub-table row

Secondary Index data value Secondary Index Row-ID (This is the hashed version of the value) Primary IndexRow-ID (This locates the AMP and the base row)

When a secondary index is created the Teradata PE tells each AMP to hash the secondary index column valuefor each of its rows It tells the PE to place the hash in a secondary index sub-table along with the ROW-ID that points to the base row where the desired value resides

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Number 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query asks "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds Ben Hon in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (the smiley face in this example) is revealed and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.
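The two-AMP retrieval just described can be sketched in a few lines of Python. This is only a toy model, not Teradata's implementation: `zlib.crc32` stands in for the hash formula and hash map, the AMPs are numbered 0 to 3, and the names "Al Jon" and "Cy Lee" (as well as every Friend_Num other than Ben Hon's) are made up for illustration.

```python
import zlib

NUM_AMPS = 4

def amp_for(value):
    """Stand-in for the hash formula plus hash map: any value -> AMP number."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_AMPS

# One dictionary per AMP for the base table and one for the USI sub-table.
base_table = [dict() for _ in range(NUM_AMPS)]    # Friend_Num -> Friend_Name
usi_subtable = [dict() for _ in range(NUM_AMPS)]  # Friend_Name -> (base AMP, Friend_Num)

def insert_friend(friend_num, friend_name):
    base_amp = amp_for(friend_num)        # the Primary Index decides the base AMP
    base_table[base_amp][friend_num] = friend_name
    index_amp = amp_for(friend_name)      # the index value decides the sub-table AMP
    usi_subtable[index_amp][friend_name] = (base_amp, friend_num)

def select_by_name(friend_name):
    """Return (row, set of AMPs touched) for WHERE Friend_Name = friend_name."""
    index_amp = amp_for(friend_name)                             # AMP holding the index row
    base_amp, friend_num = usi_subtable[index_amp][friend_name]  # the "smiley face" pointer
    row = (friend_num, base_table[base_amp][friend_num])         # AMP holding the base row
    return row, {index_amp, base_amp}

# Hypothetical sample data; only Ben Hon's number comes from the text.
for num, name in [(1, "Al Jon"), (2, "Ben Hon"), (3, "Cy Lee"), (4, "Don Roy")]:
    insert_friend(num, name)

row, amps_touched = select_by_name("Ben Hon")
print(row, "found using", len(amps_touched), "AMP(s)")
```

However the values hash, the lookup never touches more than two AMPs: one for the index row and one for the base row.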

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and add overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index. They run their normal joins, and the PE checks to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.

Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space to store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The name is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating The Plan. The Dictionary directs the PE on topics such as security access rights, tables, columns, indexes, macros, views, etc.

So, if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users and databases, they take ownership. Keep in mind that all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space

The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and can log on and submit SQL requests.

Both a database and a user can own perm space; therefore both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to or REVOKE rights from Tom.
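Finding the owners of Tom is simply a walk up the family tree. The short Python sketch below records each object's immediate parent, using the names from this chapter, and climbs until it reaches DBC. The `parent` dictionary and the `owners` function are illustrative names only; Teradata keeps this information in the Data Dictionary.

```python
# Immediate parent of each user or database in the chapter's example hierarchy.
parent = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}

def owners(name):
    """Return every parent (owner) above `name`, nearest first."""
    found = []
    while name in parent:        # DBC has no parent, so the walk stops there
        name = parent[name]
        found.append(name)
    return found

print("Owners of Tom:", owners("Tom"))
```

Every name in the returned list has implicit rights over Tom, simply by sitting above him.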

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space,
• Spool Space, and
• Temp Space.

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, the AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA, likewise, gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data, such as tables, secondary index sub-tables and permanent journals. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
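The difference between giving away perm and giving away spool is easy to see in a small bookkeeping sketch. The Python below is an illustration under stated assumptions, not Teradata code: the `SpaceOwner` class is hypothetical, sizes are in Gigabytes, the 100 GB system and the 80% share for SYSDBA come from the text, and Morgan's 10 GB is made up. Following the speed-limit analogy, the sketch also assumes a child cannot receive a higher spool limit than its owner.

```python
class SpaceOwner:
    """A user or database with a perm allocation and a spool limit (both in GB)."""

    def __init__(self, name, perm, spool):
        self.name = name
        self.perm = perm
        self.spool = spool

    def create_child(self, name, perm, spool):
        if perm > self.perm:
            raise ValueError("not enough perm space to give away")
        if spool > self.spool:
            raise ValueError("a child cannot get a higher spool limit than its owner")
        self.perm -= perm                      # perm is handed over, so the owner has less
        return SpaceOwner(name, perm, spool)   # spool is only a limit; the owner keeps its own

dbc = SpaceOwner("DBC", perm=100, spool=100)
sysdba = dbc.create_child("SYSDBA", perm=80, spool=100)
morgan = sysdba.create_child("Morgan", perm=10, spool=100)

print("perm: ", dbc.perm, sysdba.perm, morgan.perm)
print("spool:", dbc.spool, sysdba.spool, morgan.spool)
```

Perm shrinks at every hand-off, while each spool limit stays exactly where it was set.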

The following picture shows a logical view of a Customer Table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.
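Here is a rough Python sketch of that query. It uses the six-row Employee table listed later in this chapter rather than the four rows in the picture, and it makes several simplifying assumptions: rows are spread over four AMPs by Emp modulo 4, the AMPs are visited one after another instead of in parallel, and the spool limit is counted in rows rather than bytes. The names `run_query` and `OutOfSpool` are hypothetical.

```python
class OutOfSpool(Exception):
    """Raised when an answer set would exceed the user's spool limit."""

EMPLOYEE = [  # (Emp, Dept, Lname, Fname, Sal)
    (1, 10, "Johnson", "Manny", 100000),
    (2, 20, "Carlsbad", "Jan", 100000),
    (22, 30, "Winter", "Steve", 77000),
    (25, 10, "Lester", "Bonnie", 56000),
    (33, 10, "Samuels", "Todd", 120000),
    (99, 20, "Walter", "Misha", 104000),
]

def spread_over_amps(rows, num_amps=4):
    """Toy distribution: place each row on AMP number (Emp modulo num_amps)."""
    amps = [[] for _ in range(num_amps)]
    for row in rows:
        amps[row[0] % num_amps].append(row)
    return amps

def run_query(amps, wanted_dept, spool_limit_rows):
    """SELECT * FROM Employee WHERE Dept = wanted_dept, built in spool."""
    spool = []
    for amp_rows in amps:                 # real AMPs would scan in parallel
        for row in amp_rows:
            if row[1] == wanted_dept:
                if len(spool) >= spool_limit_rows:
                    raise OutOfSpool("query aborted: out of spool space")
                spool.append(row)
    answer = sorted(spool)                # deliver the answer set to the user...
    spool.clear()                         # ...and release the spool
    return answer

amps = spread_over_amps(EMPLOYEE)
for row in run_query(amps, wanted_dept=10, spool_limit_rows=10):
    print(row)
```

Calling `run_query` with a spool limit of two rows raises `OutOfSpool`, which mirrors a query that aborts when it exceeds the user's spool limit.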

What is a View

At Christmas time no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what portions of the data you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that only their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the view's data is not stored separately on the disks, so it does not duplicate data and take up more space. You are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
1    10    Johnson   Manny   100000
2    20    Carlsbad  Jan     100000
22   30    Winter    Steve   77000
25   10    Lester    Bonnie  56000
33   10    Samuels   Todd    120000
99   20    Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The definition of a view is stored in the Data Dictionary and is monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
1    10    Johnson   Manny
2    20    Carlsbad  Jan
22   30    Winter    Steve
25   10    Lester    Bonnie
33   10    Samuels   Todd
99   20    Walter    Misha
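If you don't have a Teradata system handy, the column-hiding effect of a view can be tried out with Python's built-in sqlite3 module. This is only a stand-in: SQLite is not Teradata and has no access rights, so the sketch shows that the view stores no data of its own and exposes no Sal column, but not the GRANT side of the story.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute(
    "CREATE TABLE Employee (Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view is just a stored SELECT; the salary column is left out.
conn.execute("CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee")

cursor = conn.execute("SELECT * FROM Employ_V")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)
for row in view_rows:
    print(row)
```

Anyone limited to Employ_V sees six rows and four columns, with no salaries in sight.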

What is a Macro

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple statements, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
SELECT * FROM Employ_v WHERE dept = 10;
SELECT * FROM Employ_v WHERE dept = 20;
SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform the given operation on the specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on everyone listed below him. MRKT has implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract: it is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (as in a MACRO). If all of the SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data then reverts to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, for the Transient Journal, "Cleanliness is next to Godliness." If it is clean, then things went well.
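The before-picture idea can be sketched with a tiny Python model of a single AMP. Everything here is hypothetical (the `Amp` class, `run_transaction`, the salary figures); it only illustrates the principle of saving a before-image for each change, restoring the images in reverse order when the transaction fails, and wiping the journal at COMMIT.

```python
_ABSENT = object()   # marks "this row did not exist before the change"

class Amp:
    def __init__(self, rows):
        self.rows = dict(rows)           # key -> value, e.g. Emp -> Sal
        self.transient_journal = []      # list of (key, before-image)

    def change(self, key, new_value):
        # Take the BEFORE picture first, then change the row.
        self.transient_journal.append((key, self.rows.get(key, _ABSENT)))
        self.rows[key] = new_value

    def commit(self):
        self.transient_journal.clear()   # success: the journal is wiped clean

    def rollback(self):
        for key, before in reversed(self.transient_journal):
            if before is _ABSENT:
                self.rows.pop(key, None) # the row was new, so remove it
            else:
                self.rows[key] = before  # restore the before-image
        self.transient_journal.clear()

def run_transaction(amp, statements):
    """Run all statements as one unit: either they all work or none do."""
    try:
        for statement in statements:
            statement(amp)
    except Exception:
        amp.rollback()
        return "rolled back"
    amp.commit()
    return "committed"

def give_raise(percent):
    def statement(amp):
        for emp, sal in list(amp.rows.items()):
            amp.change(emp, sal * (100 + percent) // 100)
    return statement

def simulated_failure(amp):
    raise RuntimeError("something went wrong mid-transaction")

amp = Amp({1: 100000, 25: 56000})
print(run_transaction(amp, [give_raise(100), simulated_failure]), amp.rows)
print(run_transaction(amp, [give_raise(10)]), amp.rows)
```

The accidental 100% raise disappears without a trace because the failure triggers a rollback, while the intended 10% raise commits and stays.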

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK we can lose any one AMP and still get to the data.
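To see why one lost AMP does no harm, here is a small Python sketch of eight rows on four AMPs. It is a toy model under loud assumptions: the AMPs are numbered 0 to 3 rather than 1 to 4, rows are identified by the numbers 0 to 7, and the `primary_amp` and `fallback_amp` formulas are invented for illustration. Teradata uses its Primary and Fallback hash maps instead.

```python
NUM_AMPS = 4

def primary_amp(key):
    """Toy stand-in for the Primary Hash Map."""
    return key % NUM_AMPS

def fallback_amp(key):
    """Toy stand-in for the Fallback Hash Map: always a different AMP."""
    offset = 1 + (key // NUM_AMPS) % (NUM_AMPS - 1)   # 1, 2 or 3, never 0
    return (primary_amp(key) + offset) % NUM_AMPS

def load(keys):
    """Place every row twice: one base copy and one FALLBACK copy."""
    amps = [{"base": set(), "fallback": set()} for _ in range(NUM_AMPS)]
    for key in keys:
        amps[primary_amp(key)]["base"].add(key)
        amps[fallback_amp(key)]["fallback"].add(key)
    return amps

def read(key, down_amps):
    """Return the AMP that can serve the row, given the set of failed AMPs."""
    if primary_amp(key) not in down_amps:
        return primary_amp(key)
    if fallback_amp(key) not in down_amps:
        return fallback_amp(key)
    raise LookupError("row %d is unavailable" % key)

amps = load(range(8))
stored = sum(len(a["base"]) + len(a["fallback"]) for a in amps)
print("rows stored:", stored, "for 8 logical rows")
print("with AMP 0 down, rows are served by:", [read(k, {0}) for k in range(8)])
```

Sixteen stored rows for eight logical rows is the price, and being able to lose any single AMP is the payoff. If two AMPs in this four-AMP system fail together, some rows may lose both copies, which is where clustering comes in.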

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP or disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. The chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation: the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), the FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), the FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in groups of two, every complex query will take twice as long to process while an AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is the increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
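The trade-off between cluster sizes comes down to simple arithmetic, sketched below in Python. The functions are illustrative and rest on two assumptions that the text only implies: the failed AMP's work is shared evenly by the survivors in its cluster, and AMPs are numbered from 0 with consecutive numbers belonging to the same cluster.

```python
from collections import Counter

def slowdown_with_one_amp_down(cluster_size):
    """How many times more work each surviving AMP in the cluster must do."""
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    # n AMPs' worth of work is now shared by n - 1 survivors.
    return cluster_size / (cluster_size - 1)

def system_survives(down_amps, cluster_size):
    """True if no cluster has lost more than one AMP."""
    down_per_cluster = Counter(amp // cluster_size for amp in down_amps)
    return all(count <= 1 for count in down_per_cluster.values())

for size in (2, 3, 4, 16):
    factor = slowdown_with_one_amp_down(size)
    print("cluster of %2d: survivors do %.2fx their normal work" % (size, factor))

print(system_survives({0, 4}, cluster_size=4))   # one AMP down in each cluster
print(system_survives({0, 1}, cluster_size=4))   # two AMPs down in the same cluster
```

A cluster of two doubles the survivor's load, a cluster of four adds about a third, and a cluster of 16 adds under 7 percent, but every extra AMP in a cluster is one more candidate for the second failure that takes the cluster down.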

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are still applied to their copies on the other AMPs in the cluster.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping, starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While You Were Sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.
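The catch-up idea can be modeled in a few lines of Python. This is a deliberately simplified sketch: the `Cluster` class and its methods are hypothetical, each AMP is just a dictionary, and the sketch only tracks the writes that the sleeping AMP misses. In a real system those changes are also applied to the surviving copy of each row on another AMP in the cluster.

```python
class Cluster:
    def __init__(self, amp_ids):
        self.data = {amp: {} for amp in amp_ids}   # AMP -> {row key: value}
        self.down = set()
        self.darj = {}                             # down AMP -> list of missed changes

    def fail(self, amp):
        self.down.add(amp)
        self.darj[amp] = []          # the surviving AMPs open a recovery journal

    def write(self, amp, key, value):
        if amp in self.down:
            self.darj[amp].append((key, value))    # remember it for later
        else:
            self.data[amp][key] = value

    def recover(self, amp):
        for key, value in self.darj.pop(amp):      # replay in order, then discard
            self.data[amp][key] = value
        self.down.discard(amp)

cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "Ben Hon", "old phone number")
cluster.fail(1)
cluster.write(1, "Ben Hon", "new phone number")    # AMP 1 is sleeping
print("while down:", cluster.data[1])
cluster.recover(1)
print("after recovery:", cluster.data[1], "journal left:", cluster.darj)
```

After recovery the AMP holds the new value and the journal is gone, just like the toll clerk's bedside updates once the patient wakes up.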

In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

I know that you believe that you understand what you think I said but I am not sure you realize that what youheard is not what I meant

Sign on Pentagon office wall

RAID never gets confused It always knows exactly what the disk said and it mirrors it exactly The disks in theDisk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer No doubtyou have heard people complain that their hard drive crashed Well disk drives crash inside modules that storemultiple disks too Redundant Array of Independent Disks (RAID) protects against a disk failure There aremany levels of RAID in the data storage industry The most common level and one that is used by Teradata isRAID-1 also called MIRRORING With RAID-1 each primary disk has a mirror image or an exact copy of all

its data on another disk The contents of both disks are identical

When data is written on the primary disk it is also written on the mirror disk However the dual-write processis invisible to the user This is the reason RAID-1 is also called transparent mirroring Mirrored disks providea high degree of reliability because when a disk fails no data is lost its actually fully accessible on the mirror disk Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk The down side of RAID-1 like FALLBACK is that it requires a 50 overhead of disk space

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
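The rule about which disks in a rank may be lost can be sketched in a few lines of Python. This is only an illustration, not how a Disk Array Controller is implemented: the disk names and the layout of the rank as a list of (primary, mirror) pairs are assumptions made for the example.

```python
def rank_is_readable(mirrored_pairs, failed_disks):
    """Return True if every mirrored pair still has at least one working disk."""
    failed = set(failed_disks)
    return all(primary not in failed or mirror not in failed
               for primary, mirror in mirrored_pairs)


# Hypothetical rank: each data disk is paired with its own mirror.
rank = [("data_1", "mirror_1"), ("data_2", "mirror_2")]

one_lost = rank_is_readable(rank, ["data_1"])                 # single failure
two_lost = rank_is_readable(rank, ["data_1", "mirror_2"])     # different pairs
pair_lost = rank_is_readable(rank, ["data_1", "mirror_1"])    # a disk and its mirror

print(one_lost, two_lost, pair_lost)
```

Only the last case loses data, because it is the only one in which both copies of the same rows are gone.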

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, if node one is lost, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.
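The migration just described can be pictured with a small Python sketch. It is a simplified model under stated assumptions: the node names, the `migrate_vprocs` helper, and the round-robin placement onto surviving nodes are all hypothetical, and the AMP-to-virtual-disk mapping is modeled as a plain dictionary that migration never touches.

```python
def migrate_vprocs(nodes, failed_node):
    """Move the failed node's AMPs onto the surviving nodes of the same clique."""
    survivors = [name for name in nodes if name != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    # Surviving nodes keep their own AMPs.
    placement = {name: list(nodes[name]) for name in survivors}
    # The failed node's AMPs are dealt out to the survivors in turn.
    for i, amp in enumerate(nodes[failed_node]):
        placement[survivors[i % len(survivors)]].append(amp)
    return placement


# Two-node, 32-AMP clique as in the text.
clique = {"node_1": list(range(1, 17)), "node_2": list(range(17, 33))}
# Each AMP owns exactly one virtual disk; ownership does not change on migration.
virtual_disk = {amp: f"vdisk_{amp}" for amps in clique.values() for amp in amps}

after_failure = migrate_vprocs(clique, "node_1")
print(len(after_failure["node_2"]), virtual_disk[16])
```

After node one fails, node two hosts all 32 AMPs, which is why its performance slows down, while AMP 16 still reads and writes its own virtual disk.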

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds." Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong.

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the captured transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, usually to restore data lost as a result of a hardware problem.
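The two recovery directions can be contrasted with a short Python sketch. This is a toy model, not Teradata's journal format: tables are plain dictionaries keyed by a row id, each journal is a list of (row id, image) entries in the order the changes happened, and the function names are hypothetical. Only inserted and updated rows are modeled.

```python
def roll_forward(backup, after_journal):
    """Reapply AFTER images, oldest first, on top of a restored backup."""
    table = dict(backup)
    for row_id, after_image in after_journal:
        table[row_id] = after_image
    return table


def roll_back(table, before_journal):
    """Undo updates by restoring BEFORE images, newest first."""
    restored = dict(table)
    for row_id, before_image in reversed(before_journal):
        restored[row_id] = before_image
    return restored


# AFTER journal: restore the backup from the 1st, then replay changes up to the 5th.
backup = {1: ("Smith", "Western"), 2: ("Jones", "Eastern")}
after_journal = [(2, ("Jones", "Southwest")), (3, ("Lee", "Western"))]
recovered = roll_forward(backup, after_journal)

# BEFORE journal: every representative was moved to Southeast by mistake.
damaged = {1: ("Smith", "Southeast"), 2: ("Jones", "Southeast")}
before_journal = [(1, ("Smith", "Western")), (2, ("Jones", "Eastern"))]
repaired = roll_back(damaged, before_journal)

print(recovered)
print(repaired)
```

Rolling forward starts from old data and moves toward the present; rolling back starts from current data and moves toward the past.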

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

    CREATE TABLE TomC.employee, FALLBACK,
        BEFORE JOURNAL,
        DUAL AFTER JOURNAL
    (
        emp        INTEGER,
        dept       INTEGER,
        lname      CHAR(20),
        fname      VARCHAR(20),
        salary     DECIMAL(10,2),
        hire_date  DATE   /* the original also gave a FORMAT string, which is not legible here */
    )
    UNIQUE PRIMARY INDEX (emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

You just obey instructions; we'll take care of the obstructions.

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
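The four modes above boil down to a compatibility table: for each requested lock, which locks already held on the same object it can coexist with. The Python sketch below encodes that table exactly as the list describes it; the `COMPATIBLE` dictionary and the `can_grant` helper are illustrative names, not part of any Teradata interface.

```python
# For each requested lock mode: the held lock modes it can coexist with.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held):
    """True if `requested` can be granted alongside every lock in `held`."""
    return all(mode in COMPATIBLE[requested] for mode in held)


print(can_grant("READ", ["READ", "READ"]))    # many readers share a table
print(can_grant("READ", ["WRITE"]))           # a reader waits behind a writer
print(can_grant("ACCESS", ["WRITE"]))         # a dirty read is allowed
print(can_grant("ACCESS", ["EXCLUSIVE"]))     # nothing gets past EXCLUSIVE
```

A request that cannot be granted simply waits in the queue until the conflicting locks are released.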

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the former table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and then make corrections to the original table.
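The employee and payroll story can be turned into a small Python sketch of the two RI checks. It is a toy model only: the `Tables` class, its method names, and the choice of employee as the referenced table and payroll as the referencing table are assumptions drawn from the story, not Teradata syntax.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would leave a payroll row without an employee."""


class Tables:
    def __init__(self):
        self.employee = set()   # referenced table: employee numbers
        self.payroll = {}       # referencing table: payroll id -> employee number

    def insert_payroll(self, payroll_id, emp):
        # A payroll row must point at an employee that exists.
        if emp not in self.employee:
            raise ReferentialIntegrityError(f"no employee {emp}")
        self.payroll[payroll_id] = emp

    def delete_employee(self, emp):
        # An employee cannot be deleted while payroll rows still reference it.
        if emp in self.payroll.values():
            raise ReferentialIntegrityError(f"employee {emp} is still on payroll")
        self.employee.discard(emp)


db = Tables()
db.employee.add(1001)
db.insert_payroll("P-1", 1001)

try:
    db.delete_employee(1001)      # rejected: the payroll row is still there
except ReferentialIntegrityError as err:
    print(err)

del db.payroll["P-1"]             # remove the payroll row first
db.delete_employee(1001)          # now the delete goes through
print(db.employee)
```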

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously, though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
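The two phases described above, handing out blocks first and redistributing rows by hash second, can be sketched in Python. Everything here is a stand-in chosen for illustration: `zlib.crc32` replaces Teradata's real hashing algorithm, blocks are counted in rows rather than 64K bytes, the first column of each row is assumed to be the primary index, and the function names are hypothetical.

```python
import zlib


def row_hash(primary_index_value):
    """Stand-in row hash (crc32); the real hashing algorithm is different."""
    return zlib.crc32(str(primary_index_value).encode())


def owning_amp(primary_index_value, amp_count):
    """The AMP that owns a row is chosen from the hash of its primary index."""
    return row_hash(primary_index_value) % amp_count


def fastload(rows, amp_count, block_size=4):
    # Phase 1: hand blocks of rows to the AMPs in turn, regardless of content.
    received = [[] for _ in range(amp_count)]
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    for i, block in enumerate(blocks):
        received[i % amp_count].extend(block)

    # Phase 2: each AMP hashes the rows it received and sends each to its owner.
    owned = [[] for _ in range(amp_count)]
    for amp_rows in received:
        for row in amp_rows:
            owned[owning_amp(row[0], amp_count)].append(row)

    # Finally, each AMP sorts its own rows (by hash, standing in for Row ID).
    for amp_rows in owned:
        amp_rows.sort(key=lambda row: row_hash(row[0]))
    return owned


rows = [(emp, f"name_{emp}") for emp in range(1, 21)]
amps = fastload(rows, amp_count=4)
for amp_number, amp_rows in enumerate(amps):
    print(amp_number, len(amp_rows))
```

The first phase is fast because no AMP inspects the data; the work of deciding where each row belongs is deferred to the second phase, where every AMP does it at once.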

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to load immediately into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

You don't drown by falling into the water; you drown by staying in the water.

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day and can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration.

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance - Only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

As the environment changes in terms of users, data, complexity, capacity, batch windows, time changes, events, or opportunities, users should be able to continue building applications and architecture. The more a Teradata system grows, the more Teradata outshines the competition.

Rule 10 - Buy a Teradata Data Warehouse

Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing had happened.

Winston Churchill

Winston Churchill led Britain through World War II during what he called that country's "finest hour." When users see consistent data, the system, too, is in its finest hour. Teradata gives users the ability to ask questions they could never ask before. Users trust Teradata because of its industry performance and reputation, and because it never gives in. Constant use gives users optimal business experience, and no matter what a user asks, the system responds with a hearty "Yes, Sir!"

When we explained Teradata to Churchill, he said:

"A data WARe-house that consists of 250 Data marts is like poison, and if I were the MIS department responsible for maintaining them, I'd take it."

Teradata guarantees an Enterprise Data Warehouse with no scalability issues. Data loads like lightning, and system administration is a breeze. You can pick the performance level that meets your requirements for today and forever. The database can be normalized around detail data, and because of Teradata's power, users have the flexibility to ask any question at any time on any data.

All other databases are suspect in data loading capabilities, scalability, reference sites, decades of data warehouse experience, flexibility, system administration difficulties, and inability to handle the complex queries of today's users. These users are good.

Teradata - The Shining Star

Overview

Teradata has always been at the top of the data warehouse game, even if the experts weren't bright enough to know it. The incredible vision that the original designers had was tremendous. It was so far to the left of genius that most thought the idea was impossible.

Only he who attempts the ridiculous may achieve the impossible.

Don Quixote

The Teradata database was originally designed in 1976, and many of the fundamental concepts still remain today. Nearly 25 years later, Teradata is still considered ahead of its time.

In 1976, IBM mainframes dominated the computer business. Everyone who was anyone had an IBM Mainframe. However, the original founders of Teradata noticed that it took about 4½ years for IBM to produce a new mainframe. They also noticed a little company called Intel. Intel created a new PC chip every 2½ years. With mainframes moving forward every 4½ years and PC chips every 2½ years, Teradata recognized their vision: to network enough PC chips together that the mainframe would be overpowered, yet costs would be hundreds of times cheaper than a mainframe. The Teradata team estimated the power surge would come in 1990.

IBM laughed out loud. They said, "Let's get this straight... you are going to network a bunch of PC chips together and overpower our mainframes? That's like plowing a field with 1,000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as just a bunch of PCs in a cabinet.

Teradata was convinced it could produce a product that would power large amounts of data and achieve the impossible: using PC technology in mainframe territory. Its founders agreed with Napoleon Bonaparte, who asserted, "The word 'impossible' is not in my dictionary." Sure enough, when we looked in his dictionary, that word was not there. And it is not in Teradata's Data Dictionary either. The Teradata team set two goals: build a database that could

• Perform parallel processing, and
• Accommodate a Terabyte of data

Driving in the car one evening, Morgan's eight-year-old daughter, Kara, piped up from the back seat, "Daddy, can you buy Teradata in the store? I mean, what does Teradata really do?" Morgan thought for a moment and then replied, "Do you remember when you went on the Easter egg hunt last spring? Well, imagine that we had fifty eggs and you were the only child there. If I asked you to find all the purple eggs, would you be able to do that?" Kara said, "Sure! But it might take me a long time." Morgan continued, "What if we now let fifty children go in, and I asked them to show me all of the purple eggs? How long would that take?" His daughter responded, "It wouldn't take any time at all, because each child would only have to look at one egg." That is precisely how Teradata works. It divides up huge tasks among its processors and tackles each portion simultaneously with amazing speed. And it doesn't matter if you have a trillion eggs in your basket.

In 1984, the DBC/1012 was introduced. Since then, Teradata has been the dominant force in data warehousing. Teradata got the chickens plowing and is considered outstanding. Meanwhile, IBM's plow is out rusting in its field.

Parallel Processing

An invasion of armies can be resisted, but not an idea whose time has come.

Victor Hugo

The idea of parallel processing gives Teradata the ability to have unlimited users, unlimited power, and unlimited scalability. This is an idea whose time has come. And it all starts with something called parallel processing. So what is parallel processing? Let us explain.

It was 10 p.m. on a Saturday night, and two friends were having dinner and drinks. One of the friends looked at his watch and said, "I have to get going." The other friend responded, "What's the hurry?" His friend went on to tell him that he had to leave to do his laundry at the Laundromat. The other friend could not believe his ears. He responded, "What? You're leaving to do your laundry on a Saturday night? Do it tomorrow!" His buddy went on to explain that there were only 10 washing machines at the laundry. "If I wait until tomorrow, it will be crowded, and I will be lucky to get one washing machine. I have 10 loads of laundry, so I will be there all day. If I go now, there will be nobody there, and I can do all 10 loads at the same time. I'll be done in less than an hour and a half."

This story describes what we call "Parallel Processing." Teradata is the only database in the world that loads data, backs up data, and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds... even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every database's clock in performance tests.

After enlightenment, the laundry.

Zen Proverb

After parallel processing the laundry, enlightenment.

Teradata Zen Proverb

With the computer world seeing Terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, this is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata has unlimited power, grows without boundaries, and was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

A ship in harbor is safe, but that's not why ships are built.

John Shedd

In 1805, the pivotal Battle of Trafalgar matched Britain's flotilla of battle ships against the almighty Spanish Armada. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to sail circles around the Armada, attacking them at their most vulnerable point, the stern. That battle paralyzed the Armada and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating them on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive, and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed managing terabytes of data.

A Personal Computer (PC) is made up of the following components:

Processor Chip - This is the brain of the computer. All tasks are done at the direction of the processor.

Memory - This is the hand of the computer. The memory allows data to be viewed, manipulated, changed, or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive - This is the spine of the computer. The hard drive stores data, applications, and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor executes your request onto the document while it is still being displayed in memory. Upon completion, you close the document, and the processor writes all the changes to the disk where it is stored.

In the picture below, we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight "best friends."

Teradata Spreads Data over Multiple Processors

I don't mind starting the season with unknowns. I just don't like finishing the season with them.

Coach Lou Holtz

With Teradata, you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The Teradata example on the next page shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask the system for a list of best friends, both processors will retrieve data in parallel and will return combined results over the connecting network. This example could easily run at double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four records, and simultaneously the other processor reads the remaining four records. So how could we double the speed of this system again?

Teradata has Linear Scalability

Every ceiling, when reached, becomes a floor upon which one walks and now can see a new ceiling.

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." This allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck.

Notice in the system below there are four processors and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since the data is spread evenly over four processors, each processor reads its two rows at the same time as the others. Now the system is four times faster than the single-processor example.
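The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration, not how Teradata itself places rows: the round-robin dealing and the helper names `spread` and `rows_per_processor` are assumptions made for the example.

```python
# Hypothetical sketch: deal the eight Best_Friends rows out to N processors
# and count how many rows the busiest processor must read.
BEST_FRIENDS = [
    "Ben Hon", "Joe Davis", "Mary Gray", "John Davis",
    "Don Roy", "Sam Mills", "Kyle Marx", "Lyn Jones",
]


def spread(rows, num_processors):
    """Deal rows out one at a time, round-robin (an assumption for illustration)."""
    parts = [[] for _ in range(num_processors)]
    for i, row in enumerate(rows):
        parts[i % num_processors].append(row)
    return parts


def rows_per_processor(rows, num_processors):
    """Return the largest share of rows that any one processor has to read."""
    return max(len(part) for part in spread(rows, num_processors))


for n in (1, 2, 4):
    print(n, "processor(s):", rows_per_processor(BEST_FRIENDS, n), "rows each")
```

With one processor the share is eight rows, with two it is four, and with four it is two, which is the halving described in the text each time the processor count doubles.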

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the AMPs and processed in parallel.

A Logical View of the Teradata Architecture

You are either making history or you are history

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem, and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when they actually never had a chance to succeed. In ancient days, Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies too will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and is then given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks the SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain the information from their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.

Parsing Engine (PE)

Even a stopped clock is right twice a day

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father, in front, said, "Boy, that sure was a hard climb." His son, in the back, responded, "Yes it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards." Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and SouthWestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions, some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions and Teradata returns the answers.

Access Module Processor (AMP)

Wise men talk because they have something to say; fools talk because they have to say something.

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above: "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of few words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best way to picture it is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it's the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk and it shares this with no other AMP, hence a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP will read only the rows on its particular disk. If this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.
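The "slowest AMP" remark can be made concrete with a small Python sketch. The row counts and the read rate below are invented for illustration, and the function name `scan_time` is hypothetical; the only idea taken from the text is that the AMPs work at the same time, so the query finishes when the most heavily loaded AMP finishes.

```python
# Hypothetical row counts per AMP for a 100-row table on four AMPs.
EVEN_SPREAD = {1: 25, 2: 25, 3: 25, 4: 25}
SKEWED_SPREAD = {1: 70, 2: 10, 3: 10, 4: 10}


def scan_time(rows_per_amp, rows_per_second=5):
    """Elapsed seconds when all AMPs read in parallel.

    rows_per_second is an assumed, made-up read rate. Because the AMPs
    run simultaneously, the AMP with the most rows sets the elapsed time.
    """
    return max(rows_per_amp.values()) / rows_per_second


print("even spread:  ", scan_time(EVEN_SPREAD), "seconds")
print("skewed spread:", scan_time(SKEWED_SPREAD), "seconds")
```

Both layouts hold the same 100 rows, yet the skewed one takes longer because one AMP carries most of the work while the others sit idle. This is why an even spread matters.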

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs, while others have more than 2,000.

The BYNET

Even if you're on the right track, you'll still get run over if you just sit there.

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax.
• The PE checks the user's security rights.
• The PE comes up with a plan for the AMPs to follow.
• The PE passes the plan along to the AMPs over the BYNET.
• The AMPs follow the plan and retrieve the data requested.
• The AMPs pass the data to the PE over the BYNET, and
• The PE then passes the final data to the user.
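The steps above can be walked through with a toy Python model. Everything in it is a stand-in invented for illustration: the `AMP_DISKS` and `ACCESS_RIGHTS` tables, the user name, the one-line "syntax check," and the `run_query` function. None of it reflects real Teradata internals, and the BYNET is reduced to ordinary function calls.

```python
# Toy model of the PE / AMP flow. All names and data structures are hypothetical.
AMP_DISKS = {
    1: [(2, "Ben Hon"), (10, "Don Roy")],
    2: [(4, "Joe Davis"), (12, "Sam Mills")],
    3: [(6, "Mary Gray"), (14, "Kyle Marx")],
    4: [(8, "John Davis"), (16, "Lyn Jones")],
}
ACCESS_RIGHTS = {"tom": {"Best_Friends"}}  # user -> tables the user may read


def run_query(user, table, sql):
    """Play the part of the Parsing Engine for a simple all-rows request."""
    # Step 1: check the SQL syntax (a deliberately crude stand-in).
    if not sql.strip().upper().startswith("SELECT"):
        raise ValueError("syntax error")
    # Step 2: check the user's security rights.
    if table not in ACCESS_RIGHTS.get(user, set()):
        raise PermissionError(f"{user} may not read {table}")
    # Step 3: build a plan. Here the plan is simply "every AMP returns its rows."
    plan = sorted(AMP_DISKS)
    # Steps 4-6: hand the plan to each AMP and collect what each one sends back.
    results = []
    for amp_number in plan:
        results.extend(AMP_DISKS[amp_number])
    # Step 7: pass the final data back to the user.
    return results


rows = run_query("tom", "Best_Friends", "SELECT Friend_Num, Friend_Name FROM Best_Friends")
print(len(rows), "rows returned")
```

A user without an entry in `ACCESS_RIGHTS` is rejected at step 2, before any AMP is asked to do work, which mirrors the guard-dog behavior described earlier.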

Teradata Building Block Approach

Better a diamond with a flaw than a pebble without one

Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them.

Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts, and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, Item No, Quantity, and Customer ID.

An example of each table follows:

CUSTOMER TABLE called CustomerTable

CustomerID (PK)  CustomerName  CityName  Order Number (FK)  Customer Rep
1001             JC Penney     Dallas    105372             Dreyer
1002             Office Depot  Columbia  105799             Crocker
1003             Dillard's     Atlanta   106227             Smith

ORDER TABLE called OrderTable

Order Number (PK)  Order Date  Item No  Quantity  Customer ID (FK)
105372             03/07/2001  212      20        1001
105799             04/18/2001  296      52        1002
106227             10/17/2001  325      17        1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to "files," "records," and "fields." Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest unit that can be inserted into a table. A column is the smallest unit within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null or are sometimes represented by a question mark (?).

One column, or a combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.
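A small runnable sketch of such a join is shown here using Python's built-in sqlite3 module. SQLite is used only because it needs no server; it is not Teradata, and details such as data types differ. The column names without spaces, the ISO date format, and the trimmed-down CustomerTable (CustomerID, name, and city only) are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database standing in for a real warehouse (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerTable (
    CustomerID   INTEGER PRIMARY KEY,   -- PK of CustomerTable
    CustomerName TEXT,
    CityName     TEXT
);
CREATE TABLE OrderTable (
    OrderNumber INTEGER PRIMARY KEY,    -- PK of OrderTable
    OrderDate   TEXT,
    ItemNo      INTEGER,
    Quantity    INTEGER,
    CustomerID  INTEGER REFERENCES CustomerTable(CustomerID)  -- FK
);
INSERT INTO CustomerTable VALUES
    (1001, 'JC Penney', 'Dallas'),
    (1002, 'Office Depot', 'Columbia'),
    (1003, 'Dillard''s', 'Atlanta');
INSERT INTO OrderTable VALUES
    (105372, '2001-03-07', 212, 20, 1001),
    (105799, '2001-04-18', 296, 52, 1002),
    (106227, '2001-10-17', 325, 17, 1003);
""")

# Match the PK of CustomerTable to the FK in OrderTable.
joined = conn.execute("""
    SELECT c.CustomerName, o.OrderNumber, o.Quantity
    FROM CustomerTable AS c
    JOIN OrderTable AS o ON o.CustomerID = c.CustomerID
    ORDER BY o.OrderNumber
""").fetchall()

for row in joined:
    print(row)
```

Each output row combines a customer's name from one table with that customer's order from the other, which is exactly what the PK and FK relationship makes possible.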

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                        FOREIGN KEY
Not optional                       Optional
Comprised of one or more columns   Comprised of one or more columns
Can only have one PK per table     Can have multiple FKs per table
No duplicates allowed              Duplicates allowed
No changes allowed                 Changes allowed
No nulls allowed                   Nulls allowed

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, then no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. Likewise, the rows of the Customer table and the rows of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.

There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

CREATE TABLE employee
(emp       INTEGER,
 dept      INTEGER,
 lname     CHAR(20),
 fname     VARCHAR(20),
 salary    DECIMAL(10,2),
 hire_date DATE)
UNIQUE PRIMARY INDEX (emp);

An example of creating a Non-Unique Primary Index is listed below. Notice that you never see the prefix NON; leaving out the word UNIQUE is what makes the index non-unique:

CREATE TABLE TomC.employee
(emp       INTEGER,
 dept      INTEGER,
 lname     CHAR(20),
 fname     VARCHAR(20),
 salary    DECIMAL(10,2),
 hire_date DATE)
PRIMARY INDEX (dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp       INTEGER,
 dept      INTEGER,
 lname     CHAR(20),
 fname     VARCHAR(20),
 salary    DECIMAL(10,2),
 hire_date DATE)
UNIQUE PRIMARY INDEX (emp, dept);

Being related hardly insures relatability

Michael E Angier

All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label the column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label the column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture below shows the hash map for a four-AMP system. It is simplified for illustration purposes; the actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that the AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for illustration purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8, and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
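The repeating pattern in both diagrams can be generated with one line of Python. This sketch only reproduces the simplified picture in this book: the function name `build_hash_map` and the 48-bucket size are assumptions for illustration, and a real system's map (with its 65,536 buckets) is maintained by Teradata itself.

```python
def build_hash_map(num_amps, num_buckets=48):
    """Fill each bucket with an AMP number, cycling 1..num_amps (illustrative only)."""
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]


def show(hash_map, per_row=6):
    """Print the buckets six to a row, like the diagrams above."""
    for start in range(0, len(hash_map), per_row):
        print(" ".join(str(amp) for amp in hash_map[start:start + per_row]))


four_amp_map = build_hash_map(4)
eight_amp_map = build_hash_map(8)

show(four_amp_map)
print()
show(eight_amp_map)
```

Because the AMP numbers cycle, every AMP ends up owning the same number of buckets whenever the bucket count is a multiple of the AMP count, which is what keeps the spread even.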

How the Hash Map and Primary Index Work Together

Choice not chance determines destiny

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the "Whiz-Bang Formula" (a secret NCR formula). First we'll explain the theory, and then we will invent our own Whiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

For this example, Teradata will attempt to spread the table rows among the AMPs of the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Whiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Whiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we take the Primary Index value of the row and apply the Coffing/Jones Whiz-Bang formula (divide by 2), and the answer points to a bucket in the hash map. Inside that bucket is the number of the AMP on which the row will reside.

Friend_Num  Friend_Name
2           Ben Hon

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Whiz-Bang Formula (divide by 2) to the value of Friend_Num (2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to find bucket number 1 and see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num Friend_Name

16 Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Whiz-Bang Formula (divide by 2) to the value of Friend_Num (16), and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to find bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
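Running the divide-by-two formula over all eight rows gives the final layout. The Python sketch below follows the book's teaching formula only, not the real (secret) hashing algorithm; the names `bucket_for`, `amp_for`, and `layout`, and the 48-bucket map size, are assumptions for the example. Buckets are counted from 1, as in the text.

```python
NUM_AMPS = 4
NUM_BUCKETS = 48  # the simplified diagram size, not the real 65,536

# Bucket 1 holds AMP 1, bucket 2 holds AMP 2, ... then the pattern repeats.
hash_map = [(b % NUM_AMPS) + 1 for b in range(NUM_BUCKETS)]

best_friends = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]


def bucket_for(primary_index_value):
    """Teaching formula from the text: divide the Primary Index value by 2."""
    return primary_index_value // 2


def amp_for(primary_index_value):
    """Look inside the bucket (numbered from 1) to find the destination AMP."""
    return hash_map[bucket_for(primary_index_value) - 1]


# Load every row onto the AMP its bucket points to.
layout = {amp: [] for amp in range(1, NUM_AMPS + 1)}
for friend_num, friend_name in best_friends:
    layout[amp_for(friend_num)].append((friend_num, friend_name))

for amp, rows in layout.items():
    print("AMP", amp, rows)
```

Under these assumptions each AMP ends up with two rows: AMP 1 holds friends 2 and 10, AMP 2 holds 4 and 12, AMP 3 holds 6 and 14, and AMP 4 holds 8 and 16.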

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Whiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the real formula is mathematical (similar to the Coffing/Jones Whiz-Bang Formula) and that it is consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this value a million times, we would still get the same result.

If you always do what you always did, you'll always get what you always got.

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
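The one-AMP lookup can be sketched in Python as well. As before, the divide-by-two formula is the book's stand-in for the real hashing algorithm, and the `amp_disks` dictionary, the 48-bucket map, and the function `select_by_primary_index` are hypothetical names made up for this illustration.

```python
NUM_AMPS = 4
hash_map = [(b % NUM_AMPS) + 1 for b in range(48)]  # simplified four-AMP map

# Rows as they were laid out by the divide-by-two formula (assumed layout).
amp_disks = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def select_by_primary_index(friend_num):
    """Rerun the teaching formula, then read from only the one AMP it points to."""
    bucket = friend_num // 2        # same formula used when the row was loaded
    amp = hash_map[bucket - 1]      # buckets are numbered from 1
    friend_name = amp_disks[amp].get(friend_num)  # no other AMP is touched
    return amp, friend_name


amp, name = select_by_primary_index(8)
print("AMP", amp, "returned", name)
```

Because the same formula is used for loading and for retrieval, the lookup never has to ask the other three AMPs anything.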

The Full Table Scan

What matters is not the size of the dog in the fight, but the size of the fight in the dog.

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time students respond, "NO!" After we complete training they respond, "Heck YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that it is not the size of the opponent that is to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so can speed up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is 10 times faster than a single processor reading all 100 rows. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?", you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
----------  -----------
         6  Mary Gray
        14  Kyle Marx
         8  John Davis
        16  Lyn Jones
         2  Ben Hon
        10  Don Roy
         4  Joe Davis
        12  Sam Mills


In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option: a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, and because secondary indexes use up space and add overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It tells each AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking WHERE Friend_Name = 'Ben Hon'. The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (smiley face in this example) is revealed and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.
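A rough Python sketch may help show why a USI lookup touches at most two AMPs: one to read the sub-table row, and one to read the base row it points to. The amp_of function is a made-up stand-in for the hash formula and hash map, and the dictionaries are hypothetical stand-ins for the base table and the sub-tables; only the three friends and their numbers come from the example.

```python
NUM_AMPS = 4

def amp_of(value) -> int:
    """Stand-in for hash formula plus hash map (an assumption for this sketch)."""
    return sum(str(value).encode()) % NUM_AMPS + 1

# A few Best_Friends rows: Friend_Num (Primary Index) -> Friend_Name.
FRIENDS = {2: "Ben Hon", 8: "John Davis", 10: "Don Roy"}

# Every AMP holds part of the base table and part of the USI sub-table.
base_table = {amp: {} for amp in range(1, NUM_AMPS + 1)}
usi_subtable = {amp: {} for amp in range(1, NUM_AMPS + 1)}

for friend_num, friend_name in FRIENDS.items():
    base_table[amp_of(friend_num)][friend_num] = friend_name   # hashed by Primary Index
    usi_subtable[amp_of(friend_name)][friend_name] = friend_num  # hashed by USI value

def usi_lookup(friend_name: str):
    """Return the base row and the list of AMPs visited to find it."""
    visited = []
    subtable_amp = amp_of(friend_name)           # first AMP: holds the sub-table row
    visited.append(subtable_amp)
    row_pointer = usi_subtable[subtable_amp][friend_name]
    base_amp = amp_of(row_pointer)               # second AMP: holds the base row
    visited.append(base_amp)
    return (row_pointer, base_table[base_amp][row_pointer]), visited

row, amps_visited = usi_lookup("Ben Hon")
```

No matter how many AMPs the system has, the lookup makes two stops. In this toy model both stops can land on the same AMP, which is why "two-AMP" is best read as an upper bound.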

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of ongoing queries. Secondary indexes take up space and add overhead, but boy, can they speed up queries!

Join Indexes

A bend in the road is not the end of the road unless you fail to make the turn

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index directly. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.


Teradata Databases, Users and Space

Overview

Choose a job you like and you will never have to work a day of your life

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The name is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than DBC.

DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

DBC is now ready to distribute space, but because DBC is so powerful this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to SYSDBA to distribute space.

Logical Picture of System Space


SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT privileges to Tom or REVOKE them from him.
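The "who owns Tom?" question is just a walk up the family tree. Here is a minimal Python sketch of that walk. The PARENT dictionary mirrors the hierarchy described in the text (DBC, then SYSDBA, then MRKT, SALES and Morgan, then Tom), and the function name owners_of is made up for the illustration.

```python
# Immediate parent of each user or database, as described in the text.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}

def owners_of(name: str) -> list:
    """Return every parent (owner) above `name`, nearest first."""
    owners = []
    while name in PARENT:      # DBC has no parent, so the walk stops there
        name = PARENT[name]
        owners.append(name)
    return owners

toms_owners = owners_of("Tom")  # ['Morgan', 'SYSDBA', 'DBC']
```

Every name in the returned list sits above Tom, so each one can GRANT or REVOKE on him.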


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, views, and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
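The difference between giving away perm and giving away spool can be shown with a short bookkeeping sketch in Python. The class name SpaceOwner, the sizes in gigabytes, and the rule that a child's spool limit may not exceed its parent's (taken loosely from the speed-limit analogy) are all assumptions for illustration.

```python
class SpaceOwner:
    """A user or database with a perm allocation and a spool limit, both in GB."""

    def __init__(self, name, perm_gb, spool_gb):
        self.name = name
        self.perm_gb = perm_gb
        self.spool_gb = spool_gb

    def create_child(self, name, perm_gb, spool_gb):
        """Give a new child some perm (subtracted) and a spool limit (not subtracted)."""
        if perm_gb > self.perm_gb:
            raise ValueError("not enough perm space to give away")
        if spool_gb > self.spool_gb:
            # Assumed rule for the sketch: nobody drives faster than the parent.
            raise ValueError("child spool limit cannot exceed the parent's")
        self.perm_gb -= perm_gb          # perm is handed over, so the parent has less
        return SpaceOwner(name, perm_gb, spool_gb)

# Hypothetical numbers: SYSDBA starts with 80 GB of perm and a 50 GB spool limit.
sysdba = SpaceOwner("SYSDBA", perm_gb=80, spool_gb=50)
morgan = sysdba.create_child("Morgan", perm_gb=10, spool_gb=50)
```

After the call, SYSDBA is down to 70 GB of perm, yet its spool limit is still 50 GB, the same limit Morgan received.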

The following picture shows a logical view of a Customer table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area, there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the view's data is not stored separately on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
---  ----  --------  ------  ------
  1    10  Johnson   Manny   100000
  2    20  Carlsbad  Jan     100000
 22    30  Winter    Steve    77000
 25    10  Lester    Bonnie   56000
 33    10  Samuels   Todd    120000
 99    20  Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
---  ----  --------  ------
  1    10  Johnson   Manny
  2    20  Carlsbad  Jan
 22    30  Winter    Steve
 25    10  Lester    Bonnie
 33    10  Samuels   Todd
 99    20  Walter    Misha
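To see why a view takes no extra disk space, it can help to think of it as a stored question rather than a stored answer. The Python sketch below is only an analogy: EMPLOYEE holds the rows from the employee table shown earlier, and the function employ_v (a hypothetical name) re-reads that table every time it is called, returning only the columns named in the view.

```python
# The base table, with the rows and columns shown in the employee table above.
EMPLOYEE = [
    {"Emp": 1,  "Dept": 10, "Lname": "Johnson",  "Fname": "Manny",  "Sal": 100000},
    {"Emp": 2,  "Dept": 20, "Lname": "Carlsbad", "Fname": "Jan",    "Sal": 100000},
    {"Emp": 22, "Dept": 30, "Lname": "Winter",   "Fname": "Steve",  "Sal": 77000},
    {"Emp": 25, "Dept": 10, "Lname": "Lester",   "Fname": "Bonnie", "Sal": 56000},
    {"Emp": 33, "Dept": 10, "Lname": "Samuels",  "Fname": "Todd",   "Sal": 120000},
    {"Emp": 99, "Dept": 20, "Lname": "Walter",   "Fname": "Misha",  "Sal": 104000},
]

# The view "definition": just the list of columns it exposes. No rows are stored.
VIEW_COLUMNS = ("Emp", "Dept", "Lname", "Fname")

def employ_v() -> list:
    """Run the stored definition against the base table each time it is selected."""
    return [{col: row[col] for col in VIEW_COLUMNS} for row in EMPLOYEE]

visible_rows = employ_v()
```

Anyone limited to employ_v() gets all six employees but never sees a Sal value.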


What is a Macro?

The axe soon forgets but the tree always remembers

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and executed with a simple command. If there are multiple statements, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10
• What employees are in department 20
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
  SELECT * FROM Employ_v WHERE dept = 10;
  SELECT * FROM Employ_v WHERE dept = 20;
  SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                          Macros
---------------------------------------------  ---------------------------------------------
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

Never insult seven men when all you're packing is a six gun.

Wild West Slogan


I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some!

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data without the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform the given operation on the specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has Implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights on Tom, and Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or disk goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

The afternoon knows what the morning never suspected

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (as in a MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message
• The entire transaction is rolled back and any changes made to the database are reversed
• Locks are released
• Spool files are discarded

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal cleanliness is next to godliness. If it is clean, then things went well.
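The haircut analogy maps neatly onto a few lines of Python. The sketch below keeps a BEFORE picture of every row the first time a transaction touches it, restores those pictures on rollback, and wipes the journal clean on commit. The class and method names are hypothetical, and a plain dictionary stands in for one AMP's table rows; it is an illustration of the before-image idea, not of Teradata internals.

```python
_NO_ROW = object()  # marker meaning "this row did not exist before"

class TransientJournal:
    """Toy before-image journal for one AMP's rows (a dict of key -> value)."""

    def __init__(self, table):
        self.table = table
        self.before_images = {}

    def change(self, key, new_value=_NO_ROW):
        """INSERT/UPDATE a row, or DELETE it when no new value is given."""
        if key not in self.before_images:                 # take the BEFORE picture once
            self.before_images[key] = self.table.get(key, _NO_ROW)
        if new_value is _NO_ROW:
            self.table.pop(key, None)
        else:
            self.table[key] = new_value

    def rollback(self):
        """Restore every changed row to its before-image, then clean the journal."""
        for key, image in self.before_images.items():
            if image is _NO_ROW:
                self.table.pop(key, None)
            else:
                self.table[key] = image
        self.before_images.clear()

    def commit(self):
        """The handshake: the changes stay and the journal is wiped clean."""
        self.before_images.clear()

# Salaries keyed by employee number; an accidental 100% raise gets rolled back.
salaries = {1: 100000, 25: 56000}
journal = TransientJournal(salaries)
journal.change(1, 200000)
journal.rollback()
```

After the rollback, employee 1 is back to the original salary and the journal is empty, just as it would be after a COMMIT.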

FALLBACK Protection

I asked my dentist if I had to floss all my teeth and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.
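The claim that any single AMP can be lost is easy to check with a small Python sketch. The only placements taken from the text are that Ben Hon and Don Roy live on AMP 1 with FALLBACK copies on AMP 2 and AMP 4; where the other six rows and their copies sit is an assumed layout for illustration. The one rule that matters is that a FALLBACK copy never shares an AMP with its base row.

```python
# Base rows per AMP (AMP 1 is from the text; the rest is an assumed layout).
PRIMARY = {
    1: ["Ben Hon", "Don Roy"],
    2: ["Joe Davis", "Sam Mills"],
    3: ["Mary Gray", "Kyle Marx"],
    4: ["John Davis", "Lyn Jones"],
}

# FALLBACK copies per AMP; each copy sits on a different AMP than its base row.
FALLBACK = {
    1: ["Sam Mills", "John Davis"],
    2: ["Ben Hon", "Kyle Marx"],
    3: ["Joe Davis", "Lyn Jones"],
    4: ["Don Roy", "Mary Gray"],
}

ALL_FRIENDS = {name for rows in PRIMARY.values() for name in rows}

def readable_rows(failed_amp: int) -> set:
    """Rows still reachable when one AMP is down, via base rows or FALLBACK copies."""
    reachable = set()
    for amp in PRIMARY:
        if amp != failed_amp:
            reachable.update(PRIMARY[amp])
            reachable.update(FALLBACK[amp])
    return reachable

stored_copies = sum(len(rows) for rows in PRIMARY.values()) + \
                sum(len(rows) for rows in FALLBACK.values())
```

Whichever AMP is removed, all eight friends remain readable, and the price is 16 stored rows instead of 8.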

You can't step into the same river twice.

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. The chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. The output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in groups of two, every complex query will take twice as long to process while an AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is the increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
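The work-sharing side of the cluster-size trade-off is simple arithmetic, sketched here in Python. It assumes the failed AMP's work is split evenly among the survivors in its cluster, which is a simplification; the function names are made up for the illustration.

```python
def extra_work_per_survivor(cluster_size: int) -> float:
    """Fraction of one AMP's workload each surviving AMP picks up after one failure."""
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    return 1 / (cluster_size - 1)

def slowdown_factor(cluster_size: int) -> float:
    """How much longer a survivor's work takes while one AMP in its cluster is down."""
    return 1 + extra_work_per_survivor(cluster_size)

# Cluster sizes discussed in the text: the two extremes and the popular middle.
slowdowns = {size: slowdown_factor(size) for size in (2, 4, 16)}
```

A cluster of two gives a factor of 2 (twice as long), a cluster of four gives about 1.33, and a cluster of 16 gives about 1.07. Smaller clusters hurt more when an AMP is down, while larger clusters leave more AMPs exposed to a second failure in the same cluster.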

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues with all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be used on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping, starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). Also, none of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
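The journal-then-replay idea can be sketched in a few lines of Python. This is a toy model only: the `Cluster` class, its method names and the in-memory dictionaries are all hypothetical, and the real DARJ lives in DBC's PERM space rather than in a Python list.

```python
class Cluster:
    """Toy model of one cluster whose surviving AMPs journal changes for a down AMP."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}  # rows stored on each AMP
        self.down = set()                         # AMPs that are currently offline
        self.journal = {}                         # down AMP -> changes it has missed

    def fail(self, amp):
        self.down.add(amp)
        self.journal[amp] = []                    # the survivors open a recovery journal

    def write(self, amp, row_id, value):
        if amp in self.down:
            self.journal[amp].append((row_id, value))  # remember it for later
        else:
            self.rows[amp][row_id] = value

    def recover(self, amp):
        # Replay everything that happened "while it was sleeping", then discard the journal.
        for row_id, value in self.journal.pop(amp):
            self.rows[amp][row_id] = value
        self.down.discard(amp)


cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "row-a", "v1")
cluster.fail(1)
cluster.write(1, "row-a", "v2")   # journaled, not applied yet
cluster.recover(1)
print(cluster.rows[1])
```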

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. We can still keep the shared nothing environment and have four physical disks: only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10 to 16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk and, similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.
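The migration described above can be pictured with a short Python sketch. It is a simplified model under stated assumptions: the function name `migrate_vprocs` is hypothetical, the vprocs are just strings, and the round-robin spread over survivors is an assumption for cliques with more than two nodes (the two-node example in the text has only one place to go).

```python
def migrate_vprocs(nodes: dict, failed: str) -> dict:
    """Return a new node -> vprocs mapping after `failed` goes down.

    Vprocs from the failed node are spread round-robin over the surviving
    nodes of the clique; each vproc keeps its identity (and thus its disks).
    """
    survivors = [name for name in nodes if name != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    result = {name: list(nodes[name]) for name in survivors}
    for i, vproc in enumerate(nodes[failed]):
        result[survivors[i % len(survivors)]].append(vproc)
    return result


# The two-node, 32-AMP system from the picture: AMPs 1-16 and AMPs 17-32.
clique = {
    "node1": [f"AMP{i}" for i in range(1, 17)],
    "node2": [f"AMP{i}" for i in range(17, 33)],
}
after = migrate_vprocs(clique, "node1")
print(len(after["node2"]))  # 32: node two now hosts twice as many AMPs
```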

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds." Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE or DELETE command. It keeps track of all new, deleted or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
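The two scenarios above, rolling back with BEFORE images and rolling forward with AFTER images, can be sketched in Python. This is a toy model, not Teradata's journal format: the dictionary-based table, the journal entries and all three function names are assumptions made purely for illustration.

```python
def apply_change(table, journal, key, new_value):
    """Change table[key] and record BEFORE and AFTER images (None means 'row absent')."""
    journal.append({"key": key, "before": table.get(key), "after": new_value})
    if new_value is None:
        table.pop(key, None)
    else:
        table[key] = new_value


def roll_back(table, journal):
    """Undo the journaled changes, newest first, using the BEFORE images."""
    for entry in reversed(journal):
        if entry["before"] is None:
            table.pop(entry["key"], None)
        else:
            table[entry["key"]] = entry["before"]


def roll_forward(backup, journal):
    """Rebuild the current state from a backup copy using the AFTER images."""
    restored = dict(backup)
    for entry in journal:
        if entry["after"] is None:
            restored.pop(entry["key"], None)
        else:
            restored[entry["key"]] = entry["after"]
    return restored


reps = {"rep1": "Western", "rep2": "Northern"}
backup = dict(reps)            # the "full system backup"
journal = []
apply_change(reps, journal, "rep1", "Southeast")   # the mistaken update
apply_change(reps, journal, "rep2", "Southeast")
print(roll_forward(backup, journal) == reps)       # backup + AFTER images = current data
roll_back(reps, journal)                           # BEFORE images undo the mistake
print(reps == backup)
```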

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

    CREATE TABLE TomC.employee, FALLBACK,
        BEFORE JOURNAL,
        DUAL AFTER JOURNAL
      ( emp        INTEGER,
        dept       INTEGER,
        lname      CHAR(20),
        fname      VARCHAR(20),
        salary     DECIMAL(10,2),
        -- The FORMAT string was lost from the source text;
        -- 'YYYY-MM-DD' is an assumed placeholder.
        hire_date  DATE FORMAT 'YYYY-MM-DD' )
    UNIQUE PRIMARY INDEX (emp);

The example above creates the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE or READ lock can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
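The four rules above boil down to a small compatibility table. The Python sketch below encodes that table as it reads from the list; the names `COMPATIBLE` and `can_grant` are hypothetical, and real lock queuing (who waits behind whom) is deliberately left out.

```python
# For each requested lock, the set of already-held locks it can coexist with.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},  # only an EXCLUSIVE lock keeps ACCESS out
    "READ": {"ACCESS", "READ"},             # many readers can share a table
    "WRITE": {"ACCESS"},                    # a writer tolerates only dirty readers
    "EXCLUSIVE": set(),                     # no access, period
}


def can_grant(requested: str, held: list) -> bool:
    """True if `requested` can be granted alongside every lock in `held`."""
    return all(lock in COMPATIBLE[requested] for lock in held)


print(can_grant("READ", ["READ", "READ"]))    # a thousand SELECTs are fine
print(can_grant("WRITE", ["READ"]))           # the writer must wait for the reader
print(can_grant("ACCESS", ["WRITE"]))         # dirty read allowed during a write
print(can_grant("ACCESS", ["EXCLUSIVE"]))     # nothing gets past EXCLUSIVE
```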

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30!" Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job and the company deletes you from its employee table, but forgets to delete you from the payroll table. That's not like getting fired... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
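Using the employee and payroll example from above, the two RI rules can be sketched in Python. This is only a toy check over in-memory sets, with hypothetical function names and employee numbers; it is not how Teradata implements RI internally.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would leave a payroll row without its employee."""


def insert_payroll(employee: set, payroll: set, emp: int) -> None:
    # Rule 1: a payroll row may only reference an employee that exists.
    if emp not in employee:
        raise ReferentialIntegrityError(f"employee {emp} does not exist")
    payroll.add(emp)


def delete_employee(employee: set, payroll: set, emp: int) -> None:
    # Rule 2: an employee cannot be deleted while payroll still references it.
    if emp in payroll:
        raise ReferentialIntegrityError(f"employee {emp} is still on the payroll")
    employee.discard(emp)


employee = {1, 2}      # hypothetical employee numbers
payroll = {1}
try:
    delete_employee(employee, payroll, 1)
except ReferentialIntegrityError as err:
    print("blocked:", err)
payroll.discard(1)                      # remove the payroll row first...
delete_employee(employee, payroll, 1)   # ...and now the delete goes through
print(employee)
```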

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously, though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel) and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
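The three phases just described (hand out blocks, redistribute by hash, sort) can be mimicked in Python. This sketch runs sequentially and is only meant to show the data flow: the function name `fastload`, the tiny block size and the use of `zlib.crc32` in place of Teradata's row hash are all assumptions.

```python
import zlib


def fastload(records, n_amps, block_size=3):
    """Toy model of the load phases; returns the rows each AMP ends up owning."""
    # Phase 1: hand blocks of records to the AMPs in turn, without looking at the rows.
    received = [[] for _ in range(n_amps)]
    for start in range(0, len(records), block_size):
        amp = (start // block_size) % n_amps
        received[amp].extend(records[start:start + block_size])

    # Phase 2: every AMP hashes the rows it received and sends each to its owning AMP.
    owned = [[] for _ in range(n_amps)]
    for rows in received:
        for key, payload in rows:
            row_hash = zlib.crc32(str(key).encode())  # stand-in for the real row hash
            owned[row_hash % n_amps].append((row_hash, key, payload))

    # Phase 3: each AMP sorts its own rows (here by hash, then key).
    for rows in owned:
        rows.sort()
    return owned


records = [(emp, f"employee-{emp}") for emp in range(1, 21)]
for amp, rows in enumerate(fastload(records, n_amps=4)):
    print(f"AMP {amp}: {len(rows)} rows")
```

In the real utility the AMPs do phases two and three in parallel, which is where the speed comes from; the loop here only preserves the order of the steps.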

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers and referential integrity
• It doesn't support Multi-set tables and
• It locks at the row level

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance - only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

vision to network enough PC chips together that the mainframe would be overpowered, yet costs would be hundreds of times cheaper than a mainframe. The Teradata team estimated the power surge would come in 1990.

IBM laughed out loud. They said, "Let's get this straight... you are going to network a bunch of PC chips together and overpower our mainframes? That's like plowing a field with 1,000 chickens!" In fact, IBM salespeople are still trying to dismiss Teradata as "just a bunch of PCs in a cabinet."

Teradata was convinced it could produce a product that would power large amounts of data and achieve the impossible: using PC technology in mainframe territory. Its founders agreed with Napoleon Bonaparte, who asserted, "The word 'impossible' is not in my dictionary." Sure enough, when we looked in his dictionary, that word was not there. And it is not in Teradata's Data Dictionary either. The Teradata team set two goals: build a database that could

• Perform parallel processing and
• Accommodate a Terabyte of data

Driving in the car one evening, Morgan's eight-year-old daughter Kara piped up from the back seat, "Daddy, can you buy Teradata in the store? I mean, what does Teradata really do?" Morgan thought for a moment and then replied, "Do you remember when you went on the Easter egg hunt last spring? Well, imagine that we had fifty eggs and you were the only child there. If I asked you to find all the purple eggs, would you be able to do that?" Kara said, "Sure! But it might take me a long time." Morgan continued, "What if we now let fifty children go in and I asked them to show me all of the purple eggs? How long would that take?" His daughter responded, "It wouldn't take any time at all, because each child would only have to look at one egg." That is precisely how Teradata works. It divides up huge tasks among its processors and tackles each portion simultaneously, with amazing speed. And it doesn't matter if you have a trillion eggs in your basket.

In 1984 the DBC/1012 was introduced. Since then, Teradata has been the dominant force in data warehousing. Teradata got the chickens plowing and is considered outstanding. Meanwhile, IBM's plow is out rusting in its field.

Parallel Processing

"An invasion of armies can be resisted, but not an idea whose time has come."

Victor Hugo

The idea of parallel processing gives Teradata the ability to have unlimited users, unlimited power and unlimited scalability. This is an idea whose time has come. And it all starts with something called parallel processing. So what is parallel processing? Let us explain.

It was 10 p.m. on a Saturday night, and two friends were having dinner and drinks. One of the friends looked at his watch and said, "I have to get going." The other friend responded, "What's the hurry?" His friend went on to tell him that he had to leave to do his laundry at the Laundromat. The other friend could not believe his ears. He responded, "What? You're leaving to do your laundry on a Saturday night? Do it tomorrow!" His buddy went on to explain that there were only 10 washing machines at the laundry. "If I wait until tomorrow, it will be crowded and I will be lucky to get one washing machine. I have 10 loads of laundry, so I will be there all day. If I go now, there will be nobody there and I can do all 10 loads at the same time. I'll be done in less than an hour and a half."

This story describes what we call "Parallel Processing." Teradata is the only database in the world that loads data, backs up data and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds... even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every other database's clock in performance tests.

"After enlightenment, the laundry."

Zen Proverb

"After parallel processing the laundry, enlightenment!"

Teradata Zen Proverb

With the computer world seeing Terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, this is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata has unlimited power, grows without boundaries, and was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

"A ship in harbor is safe, but that's not why ships are built."

John Shedd

In 1805, the pivotal Battle of Trafalgar matched Britain's flotilla of battleships against the combined French and Spanish fleet. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to sail circles around the enemy fleet, attacking the ships at their most vulnerable point, the stern. That battle paralyzed the combined fleet and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating the mainframes on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed managing terabytes of data.

A Personal Computer (PC) is made up of the following components

Processor Chip - This is the brain of the computer. All tasks are done at the direction of the processor.

Memory - This is the hand of the computer. The memory allows data to be viewed, manipulated, changed or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive - This is the spine of the computer. The hard drive stores data, applications and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor executes your request onto the document while it is still being displayed in memory. Upon completion, you close the document and the processor writes all the changes to the disk where it is stored.

In the picture below, we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight best friends.

Teradata Spreads Data over Multiple Processors

I dont mind starting the season with unknowns I just dont like finishing the season with them

Coach Lou Holtz

With Teradata, you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The Teradata example on the next page shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask for a list of best friends, both processors will retrieve data in parallel and return combined results over the connecting network. This example could easily run at double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four records while, simultaneously, the other processor reads the remaining four records. So how could we double the speed of this system again?

Teradata has Linear Scalability

"Every ceiling, when reached, becomes a floor, upon which one walks and now can see a new ceiling."

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." This allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck.

Notice in the system below that there are four processors and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since the data is spread evenly over four processors, Teradata reads two rows on each of the four processors simultaneously. Now the system is four times faster than the single-processor example.

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the AMPs and processed in parallel.
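The arithmetic behind the one-, two-, and four-processor examples can be sketched in a few lines of Python. This is an idealized model only: the function names are hypothetical, and it assumes every row costs the same to read and that the scan finishes when the busiest processor finishes.

```python
def rows_per_processor(total_rows, processors):
    """Spread rows as evenly as possible; return the row count held by each processor."""
    base, extra = divmod(total_rows, processors)
    return [base + (1 if i < extra else 0) for i in range(processors)]


def relative_speedup(total_rows, processors):
    """Idealized speedup over one processor: the busiest processor sets the pace."""
    return total_rows / max(rows_per_processor(total_rows, processors))


# The eight Best_Friends rows on one, two, and four processors.
for n in (1, 2, 4):
    print(n, rows_per_processor(8, n), relative_speedup(8, n))
```

Under these assumptions, doubling the processor count halves the rows each processor must read, which is the intuition behind linear scalability.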

A Logical View of the Teradata Architecture

"You are either making history or you are history."

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem, and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when it actually never had a chance to succeed. In ancient days, Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies, too, will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and then is given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks the SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain the information from their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.

Parsing Engine (PE)

"Even a stopped clock is right twice a day."

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father, in front, said, "Boy, that sure was a hard climb." His son, in the back, responded, "Yes, it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards." Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and the design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and Southwestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions, some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions, and Teradata returns the answers.

Access Module Processor (AMP)

"Wise men talk because they have something to say; fools talk because they have to say something."

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above: "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of few words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best way to picture it is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it is the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk, and it shares this with no other AMP; hence, a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP will read only the rows on its particular disk. If this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs, while others have more than 2,000.

The BYNET

"Even if you're on the right track, you'll still get run over if you just sit there."

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax.
• The PE checks the user's security rights.
• The PE comes up with a plan for the AMPs to follow.
• The PE passes the plan along to the AMPs over the BYNET.
• The AMPs follow the plan and retrieve the data requested.
• The AMPs pass the data to the PE over the BYNET.
• The PE then passes the final data to the user.
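The steps above can be sketched as a toy Python program. Everything here is a hypothetical stand-in, not Teradata code: the `Amp` class, the `parsing_engine` function, the `rights` dictionary, and the single statement shape it accepts are all assumptions made for illustration, and a plain loop plays the part of the BYNET.

```python
from dataclasses import dataclass, field


@dataclass
class Amp:
    """Hypothetical stand-in for an AMP: it owns its rows and shares them with no one."""
    rows: list = field(default_factory=list)

    def execute(self, plan):
        # Apply the PE's plan (a simple row filter here) to this AMP's own rows.
        return [row for row in self.rows if plan(row)]


def parsing_engine(user, sql, amps, rights):
    """Follow the listed steps for a toy request of the form 'SELECT * FROM <table>'."""
    words = sql.split()
    # Check the SQL syntax (this sketch accepts only one statement shape).
    if len(words) != 4 or [w.upper() for w in words[:3]] != ["SELECT", "*", "FROM"]:
        raise SyntaxError("this sketch only understands SELECT * FROM <table>")
    table = words[3]
    # Check the user's security rights.
    if table not in rights.get(user, set()):
        raise PermissionError(f"{user} may not read {table}")

    # Come up with a plan for the AMPs to follow.
    def plan(row):
        return row["table"] == table

    # Pass the plan to every AMP and collect what each one returns.
    answer = []
    for amp in amps:
        answer.extend(amp.execute(plan))
    # Pass the final data back to the user.
    return answer


amps = [
    Amp([{"table": "Best_Friends", "Friend_Num": 2}, {"table": "Orders", "Order": 1}]),
    Amp([{"table": "Best_Friends", "Friend_Num": 4}]),
]
rights = {"tom": {"Best_Friends"}}
print(parsing_engine("tom", "SELECT * FROM Best_Friends", amps, rights))
```

A user without rights to the table is rejected before any AMP is asked to do work, which mirrors the guard-dog behavior described for the PE.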

Teradata Building Block Approach

"Better a diamond with a flaw than a pebble without one."

Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

"Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them."

Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about: the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts, and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, customer name, city, and order number. Another table, the OrderTable, might hold data like order number, order date, item number, quantity, and customer number.

An example of each table follows:

CUSTOMER TABLE called CustomerTable

CustomerID (PK)  CustomerName  CityName  Order Number (FK)  Customer Rep
1001             JC Penney     Dallas    105372             Dreyer
1002             Office Depot  Columbia  105799             Crocker
1003             Dillard's     Atlanta   106227             Smith

ORDER TABLE called OrderTable

Order Number (PK)  Order Date  Item No  Quantity  Customer ID (FK)
105372             03/07/2001  212      20        1001
105799             04/18/2001  296      52        1002
106227             10/17/2001  325      17        1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest unit that can be inserted into a table. A column is the smallest unit within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null, which is sometimes represented by a question mark (?).

One column, or a combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                        FOREIGN KEY
Not optional                       Optional
Comprised of one or more columns   Comprised of one or more columns
Can only have one PK per table     Can have multiple FKs per table
No duplicates allowed              Duplicates allowed
No changes allowed                 Changes allowed
No nulls allowed                   Nulls allowed
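To make the PK/FK relationship concrete, here is a minimal Python sketch that joins the two sample tables by matching the CustomerID primary key of CustomerTable to the Customer ID foreign key of OrderTable. The function name and the dictionary layout are illustrative assumptions; a real database would perform the join from a SQL statement.

```python
# Sample rows taken from the CustomerTable and OrderTable examples.
customers = [
    {"CustomerID": 1001, "CustomerName": "JC Penney", "CityName": "Dallas"},
    {"CustomerID": 1002, "CustomerName": "Office Depot", "CityName": "Columbia"},
    {"CustomerID": 1003, "CustomerName": "Dillard's", "CityName": "Atlanta"},
]
orders = [
    {"OrderNumber": 105372, "ItemNo": 212, "Quantity": 20, "CustomerID": 1001},
    {"OrderNumber": 105799, "ItemNo": 296, "Quantity": 52, "CustomerID": 1002},
    {"OrderNumber": 106227, "ItemNo": 325, "Quantity": 17, "CustomerID": 1003},
]


def join_on_customer_id(customers, orders):
    """Match each order's foreign key to the customer whose primary key equals it."""
    # The primary key is unique and never null, so it can serve as a lookup key.
    by_primary_key = {c["CustomerID"]: c for c in customers}
    joined = []
    for order in orders:
        # A foreign key may be null or unmatched; such orders are skipped here.
        customer = by_primary_key.get(order["CustomerID"])
        if customer is not None:
            joined.append({**customer, **order})
    return joined


for row in join_on_customer_id(customers, orders):
    print(row["CustomerName"], row["OrderNumber"], row["Quantity"])
```

The lookup dictionary works only because the primary key allows no duplicates; the foreign key side is free to repeat the same CustomerID on many orders.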

Teradata Spreads the Data Evenly Across the AMPs

"A chain is only as strong as its weakest link."

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind that the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice that each AMP holds a portion of the rows for every table. AMP1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at nearly the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In the same way, the rows of the Customer table are grouped together, as are the rows of the Order table. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.
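The layout just described, where every AMP holds a share of every table and keeps each table's rows grouped together, can be pictured with a small Python sketch. The `distribute` function and the row counts are hypothetical, and round-robin placement is only a stand-in for the Primary Index hashing explained later in the chapter.

```python
from collections import defaultdict


def distribute(tables, num_amps):
    """Return one dict per AMP that maps a table name to that AMP's rows of the table."""
    amps = [defaultdict(list) for _ in range(num_amps)]
    for table_name, rows in tables.items():
        for i, row in enumerate(rows):
            # Round-robin placement stands in for Teradata's hash-based placement.
            amps[i % num_amps][table_name].append(row)
    return amps


# Made-up row counts, chosen so each table divides evenly over four AMPs.
tables = {
    "Employee": list(range(8)),
    "Customer": list(range(12)),
    "Order": list(range(16)),
}
amps = distribute(tables, 4)
for number, amp in enumerate(amps, start=1):
    print("AMP", number, {name: len(rows) for name, rows in amp.items()})
```

Because each AMP keeps one group of rows per table, a request for Customer rows goes straight to the Customer group on every AMP without touching Employee or Order rows.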

Primary Indexes

"Every road has two directions."

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him, saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.

There are two types of Primary Indexes

"A man who chases two rabbits catches none."

Roman Proverb

"A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR."

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column emp follows:

CREATE TABLE employee
(emp       INTEGER,
 dept      INTEGER,
 lname     CHAR(20),
 fname     VARCHAR(20),
 salary    DECIMAL(10,2),
 hire_date DATE)
UNIQUE PRIMARY INDEX (emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON: a Primary Index declared without the UNIQUE keyword is non-unique.

CREATE TABLE TomC.employee
(emp       INTEGER,
 dept      INTEGER,
 lname     CHAR(20),
 fname     VARCHAR(20),
 salary    DECIMAL(10,2),
 hire_date DATE)
PRIMARY INDEX (dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp       INTEGER,
 dept      INTEGER,
 lname     CHAR(20),
 fname     VARCHAR(20),
 salary    DECIMAL(10,2),
 hire_date DATE)
UNIQUE PRIMARY INDEX (emp, dept);
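The difference between a UPI and a NUPI can be modeled with a toy Python class. `ToyTable` is a hypothetical illustration, not how Teradata stores rows: it only shows that a unique index rejects a second row with the same index value, while a non-unique index accepts it, and that a multi-column index treats its columns as one composite value.

```python
class ToyTable:
    """Toy table keyed by its Primary Index columns (one column or a composite)."""

    def __init__(self, pi_columns, unique):
        self.pi_columns = tuple(pi_columns)
        self.unique = unique
        self.rows_by_pi = {}

    def insert(self, row):
        # Build the index value from the PI column(s), treated as one composite unit.
        key = tuple(row[column] for column in self.pi_columns)
        existing = self.rows_by_pi.setdefault(key, [])
        if self.unique and existing:
            raise ValueError(f"duplicate unique primary index value {key}")
        existing.append(row)


upi_table = ToyTable(["emp"], unique=True)     # like UNIQUE PRIMARY INDEX (emp)
nupi_table = ToyTable(["dept"], unique=False)  # like PRIMARY INDEX (dept)

for row in ({"emp": 1, "dept": 10}, {"emp": 2, "dept": 10}):
    upi_table.insert(row)   # emp values differ, so the UPI accepts both rows
    nupi_table.insert(row)  # both rows share dept 10, which a NUPI allows

print(len(nupi_table.rows_by_pi[(10,)]))
```

Inserting a second row with emp 1 into `upi_table` would raise an error, which is the uniqueness guarantee the UNIQUE keyword asks for.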

"Being related hardly insures relatability."

Michael E Angier

All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label the column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label the column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

"The map is not the territory."

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610–ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number and that the AMP number goes 1, 2, 3, 4, then starts over again. Why? Because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8, and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
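The two diagrams follow the same rule, which can be written down as a short Python sketch. The function name `build_hash_map` is hypothetical, and the simple cycling pattern reproduces the simplified diagrams above rather than any actual Teradata map; the 48-bucket size matches the eight rows of six buckets drawn in the diagrams.

```python
def build_hash_map(num_amps, num_buckets):
    """Fill each bucket with an AMP number, cycling 1..num_amps as in the diagrams."""
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]


def print_hash_map(hash_map, per_row=6):
    """Print the buckets in rows of six, the layout used by the diagrams."""
    for start in range(0, len(hash_map), per_row):
        print(*hash_map[start:start + per_row])


print_hash_map(build_hash_map(4, 48))   # four-AMP diagram
print()
print_hash_map(build_hash_map(8, 48))   # eight-AMP diagram
```

With the same rule and 65,536 buckets on a four-AMP system, each AMP number would appear in exactly 16,384 buckets, which is why the cycling pattern spreads rows evenly.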

How the Hash Map and Primary Index Work Together

"Choice, not chance, determines destiny."

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

The destination is determined using the "Whiz-Bang Formula" (a secret NCR formula). First we'll explain the theory, and then we will invent our own Whiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even-tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

For this example, Teradata will attempt to spread the table rows among the AMPs of the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Whiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Whiz-Bang Formula: take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we take the Primary Index value of the row and apply the Coffing/Jones Whiz-Bang Formula (divide by 2), and the answer points to a bucket in the hash map. Inside that bucket is the number of the AMP on which the row will reside.

Friend_Num  Friend_Name
2           Ben Hon

Since we designated Friend_Num as the Primary Index, we merely divide the value of Friend_Num (2) by 2, per the Coffing/Jones Whiz-Bang Formula:

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to find bucket number one and see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another row.

Friend_Num  Friend_Name
16          Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely divide the value of Friend_Num (16) by 2, per the Coffing/Jones Whiz-Bang Formula, and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to find bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP 4.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Whiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the real formula is mathematical (similar to the Coffing/Jones Whiz-Bang Formula) and that it is consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this value a million times, we would still get the same result.
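The whole walk-through can be replayed in Python. This sketch uses only the teaching formula from the text (divide the Primary Index value by 2), not Teradata's real hashing formula; the names `bucket_for`, `amp_for`, and `load_rows` are hypothetical, and the 48-bucket map matches the simplified diagram.

```python
BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}

# Four-AMP hash map from the diagram: 48 buckets holding 1, 2, 3, 4, repeating.
HASH_MAP = [1, 2, 3, 4] * 12


def bucket_for(friend_num):
    """Teaching formula only: divide the Primary Index value by 2."""
    return friend_num // 2


def amp_for(friend_num):
    """Look inside the bucket to find the destination AMP (buckets count from 1)."""
    return HASH_MAP[bucket_for(friend_num) - 1]


def load_rows(rows):
    """Send every row to the AMP named by its hash map bucket."""
    amps = {1: [], 2: [], 3: [], 4: []}
    for friend_num, friend_name in rows.items():
        amps[amp_for(friend_num)].append((friend_num, friend_name))
    return amps


for amp_number, rows in load_rows(BEST_FRIENDS).items():
    print("AMP", amp_number, rows)
```

Each AMP ends up with two of the eight rows, and because the formula is a pure function of the Primary Index value, running it again on the same value always points to the same bucket and the same AMP.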

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
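A one-AMP retrieval can be sketched the same way. As before, this is an illustration built on the teaching formula, not Teradata's real hashing; `select_by_primary_index`, the `AMPS` layout, and the 48-bucket `HASH_MAP` are hypothetical names chosen to mirror the Best_Friends example.

```python
# Four-AMP hash map from the diagram and the rows each AMP owns after loading.
HASH_MAP = [1, 2, 3, 4] * 12
AMPS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def select_by_primary_index(friend_num):
    """Rerun the teaching formula, find the bucket, and ask only the AMP it names."""
    bucket = friend_num // 2             # teaching formula: divide the PI by 2
    amp_number = HASH_MAP[bucket - 1]    # buckets are numbered from 1
    friend_name = AMPS[amp_number].get(friend_num)
    return amp_number, friend_name


# Equivalent in spirit to: SELECT ... FROM Best_Friends WHERE Friend_Num = 8
print(select_by_primary_index(8))
```

Only one AMP's rows are consulted; the other three AMPs are never asked, which is what makes Primary Index access a one-AMP operation.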

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is 10 times faster than most systems. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!
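The 100-row, 10-AMP example can be sketched as a small Python simulation. It is a minimal illustration, not a model of real AMP or BYNET behavior. The modulo distribution of rows and the use of a thread pool to stand in for parallel AMPs are assumptions made for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 10
ROWS = list(range(1, 101))  # 100 rows, identified by number only

# Assumed distribution for the sketch: row number modulo the AMP count,
# which gives each AMP exactly 10 rows.
amps = {amp: [row for row in ROWS if row % NUM_AMPS == amp] for amp in range(NUM_AMPS)}


def scan(amp_id: int) -> list:
    """Each AMP reads only the rows it owns."""
    return list(amps[amp_id])


def full_table_scan() -> list:
    """All AMPs scan at the same time; the PE collects one result per AMP."""
    with ThreadPoolExecutor(max_workers=NUM_AMPS) as pool:
        return list(pool.map(scan, range(NUM_AMPS)))


per_amp = full_table_scan()
rows_read_per_amp = [len(rows) for rows in per_amp]
total_rows = sum(rows_read_per_amp)
print(rows_read_per_amp, total_rows)
```

No single AMP reads more than a tenth of the table, which is where the "10 times faster" figure in the example comes from.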

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?", you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to ask literally any question about any data at any time. Teradata has the experience, power and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records, so this is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it owns.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
----------  -----------
         6  Mary Gray
        14  Kyle Marx
         8  John Davis
        16  Lyn Jones
         2  Ben Hon
        10  Don Roy
         4  Joe Davis
        12  Sam Mills


In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option: a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

- Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, and since secondary indexes use up space and add overhead, they should only be used on KNOWN QUERIES, that is, queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

- Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It also tells the AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of each AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once that is complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (a smiley face in this example) is revealed and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.
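The two-AMP retrieval can be sketched in Python as follows. Treat it as a toy model. The bucket-to-AMP layout, the name hash (sum of character codes) and the simplified pointer (AMP number plus Friend_Num in place of a real row-id) are all assumptions made for illustration. The name hash was picked so that 'Ben Hon' lands on AMP 2, as in the picture.

```python
# Toy model of a Unique Secondary Index lookup on a four-AMP system.
NUM_AMPS = 4
HASH_MAP = {1: 1, 2: 2, 3: 3, 4: 4, 5: 1, 6: 2, 7: 3, 8: 4}  # assumed bucket -> AMP layout
BEST_FRIENDS = {
    6: "Mary Gray", 14: "Kyle Marx", 8: "John Davis", 16: "Lyn Jones",
    2: "Ben Hon", 10: "Don Roy", 4: "Joe Davis", 12: "Sam Mills",
}


def base_amp(friend_num: int) -> int:
    """Teaching hash formula: divide the Primary Index by two, then look up the bucket."""
    return HASH_MAP[friend_num // 2]


def usi_amp(friend_name: str) -> int:
    """Hypothetical hash for the index value, mapped onto AMPs 1..4."""
    return (sum(ord(ch) for ch in friend_name) - 1) % NUM_AMPS + 1


# Build the base table and the secondary index sub-table on each AMP.
base_table = {amp: {} for amp in range(1, NUM_AMPS + 1)}
usi_subtable = {amp: {} for amp in range(1, NUM_AMPS + 1)}
for num, name in BEST_FRIENDS.items():
    base_table[base_amp(num)][num] = name
    usi_subtable[usi_amp(name)][name] = (base_amp(num), num)  # pointer to the base row


def select_by_name(friend_name: str):
    """Return the matching row and the list of AMPs that had to do work."""
    index_amp = usi_amp(friend_name)
    touched = [index_amp]
    pointer = usi_subtable[index_amp].get(friend_name)
    if pointer is None:
        return None, touched
    row_amp, friend_num = pointer
    touched.append(row_amp)
    return (friend_num, base_table[row_amp][friend_num]), touched


row, amps_used = select_by_name("Ben Hon")
print(row, amps_used)
```

One AMP answers from its sub-table and one AMP returns the base row. The remaining AMPs are never asked to do anything.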

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and add overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined, and an actual table is built containing the joined data. The users don't ever query the join index directly. They run their normal joins, and the PE checks to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.


Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

- Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.
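Before moving on, the space hand-offs described above can be sketched as simple bookkeeping in Python. The 100 Gigabyte system and the 80/20 split come from the text. The amounts given to MRKT, SALES, Morgan and Tom are hypothetical numbers picked only to make the arithmetic visible, and SpaceOwner is an illustrative class, not a Teradata object.

```python
class SpaceOwner:
    """A database or user that owns PERM space (toy bookkeeping only)."""

    def __init__(self, name, perm_gb=0, parent=None):
        self.name = name
        self.perm_gb = perm_gb
        self.parent = parent
        self.children = []

    def give(self, child_name, gb):
        """Create a child and hand it some of this owner's own PERM space."""
        if gb > self.perm_gb:
            raise ValueError(f"{self.name} does not own {gb} GB to give away")
        self.perm_gb -= gb  # space given to a child is subtracted from the parent
        child = SpaceOwner(child_name, gb, parent=self)
        self.children.append(child)
        return child

    def total_gb(self):
        """Space owned here plus everything owned underneath."""
        return self.perm_gb + sum(child.total_gb() for child in self.children)


dbc = SpaceOwner("DBC", perm_gb=100)   # in the beginning, DBC owns all PERM space
sysdba = dbc.give("SYSDBA", 80)        # about 80% goes to SYSDBA
mrkt = sysdba.give("MRKT", 20)         # hypothetical amounts from here down
sales = sysdba.give("SALES", 20)
morgan = sysdba.give("Morgan", 10)
tom = morgan.give("Tom", 4)

print(dbc.perm_gb, sysdba.perm_gb, morgan.perm_gb, tom.perm_gb, dbc.total_gb())
```

However the space is divided, the total under DBC stays at 100 Gigabytes. That is the "all space is owned" rule in action.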

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from him.
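The "who is the owner of Tom?" question is a walk up the hierarchy, which a short Python sketch can make concrete. The PARENT table below is simply the hierarchy described in this chapter, written out by hand. The owners function is an illustrative helper, not a Teradata feature.

```python
# Immediate parent of each user or database in the example hierarchy.
# DBC has no parent because it sits at the top.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name: str) -> list:
    """Return every parent (owner) above the given user or database, nearest first."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain


print(owners("Tom"))   # ['Morgan', 'SYSDBA', 'DBC']
print(owners("DBC"))   # [] since nobody sits above DBC
```

Every name in the returned list is an owner in the sense used above.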


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see the Data Protection chapter).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts. The user is then out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data, such as tables, secondary index sub-tables and permanent journals. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.

The following picture shows a logical view of a Customer Table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.
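The life cycle of spool (fill it while the query runs, abort if the limit is exceeded, release it at the end) can be sketched in Python. This is a simplified illustration. Measuring the spool limit in rows instead of bytes is an assumption, the SpoolExceeded error is a made-up name, and the sample rows are borrowed from the Employee table shown later in this chapter.

```python
class SpoolExceeded(Exception):
    """Raised when a query needs more spool than the user's upper limit."""


# (Emp, Dept, Lname, Fname, Sal)
EMPLOYEE = [
    (1, 10, "Johnson", "Manny", 100000),
    (2, 20, "Carlsbad", "Jan", 100000),
    (22, 30, "Winter", "Steve", 77000),
    (25, 10, "Lester", "Bonnie", 56000),
    (33, 10, "Samuels", "Todd", 120000),
    (99, 20, "Walter", "Misha", 104000),
]


def run_query(table, qualifies, spool_limit_rows):
    """Read every row, add qualifying rows to spool, then release the spool."""
    spool = []
    try:
        for row in table:
            if qualifies(row):
                if len(spool) >= spool_limit_rows:
                    raise SpoolExceeded("query aborted: out of spool space")
                spool.append(row)
        return list(spool)  # the answer set delivered to the user
    finally:
        spool.clear()       # spool is released whether the query worked or not


# All columns for the employees whose department is equal to 10.
answer = run_query(EMPLOYEE, lambda row: row[1] == 10, spool_limit_rows=10)
for row in answer:
    print(row)
```

With a limit of two rows, the same query would abort with SpoolExceeded, which is the credit-limit behavior described earlier.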

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), no separate copy of the data is stored on the disks, so the view does not duplicate data or take up more space. You are simply looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
---  ----  --------  ------  ------
  1    10  Johnson   Manny   100000
  2    20  Carlsbad  Jan     100000
 22    30  Winter    Steve    77000
 25    10  Lester    Bonnie   56000
 33    10  Samuels   Todd    120000
 99    20  Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, security is enhanced. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. Anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
---  ----  --------  ------
  1    10  Johnson   Manny
  2    20  Carlsbad  Jan
 22    30  Winter    Steve
 25    10  Lester    Bonnie
 33    10  Samuels   Todd
 99    20  Walter    Misha


What is a Macro?

"The axe soon forgets, but the tree always remembers."

- Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10?
• What employees are in department 20? and
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
SELECT * FROM Employ_v WHERE dept = 10;
SELECT * FROM Employ_v WHERE dept = 20;
SELECT * FROM Employ_v ORDER BY lname; );

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                          Macros
---------------------------------------------  ---------------------------------------------
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

- Wild West Slogan


I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on everyone listed below him. MRKT has implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), and Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

- Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

- Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (as in a MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message
• The entire transaction is rolled back and any changes made to the database are reversed
• Locks are released, and
• Spool files are discarded

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal cleanliness is next to Godliness. If it is clean, then things went well.
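The before-picture idea can be sketched in Python for a single AMP. It is a minimal illustration of the concept, not of Teradata's internal format. The Amp class, the run_transaction function and the use of a Python dictionary for the rows are all assumptions made for the sketch.

```python
_MISSING = object()  # marks "this row did not exist before the transaction"


class Amp:
    """One AMP with some rows and its own transient journal (toy model)."""

    def __init__(self, rows):
        self.rows = dict(rows)  # key -> value, e.g. Emp -> Sal
        self.journal = []       # before-images: (key, value before the change)


def run_transaction(amp, statements):
    """Apply (key, new_value) changes; new_value None means DELETE.

    Returns True on COMMIT, False if the transaction was rolled back.
    """
    try:
        for key, new_value in statements:
            amp.journal.append((key, amp.rows.get(key, _MISSING)))  # BEFORE picture
            if new_value is None:
                del amp.rows[key]  # deleting a missing row fails the transaction
            else:
                amp.rows[key] = new_value
    except Exception:
        # Roll back: restore every before-image, newest first.
        for key, before in reversed(amp.journal):
            if before is _MISSING:
                amp.rows.pop(key, None)
            else:
                amp.rows[key] = before
        amp.journal.clear()
        return False
    amp.journal.clear()  # COMMIT: the journal is wiped clean
    return True


amp = Amp({1: 100000, 25: 56000})
failed = run_transaction(amp, [(1, 110000), (999, None)])  # second statement fails
print(failed, amp.rows)     # rows are back to their original state
committed = run_transaction(amp, [(25, 60000)])
print(committed, amp.rows)
```

Either way, the journal ends up empty: after a rollback because the before-images have been put back, and after a COMMIT because they are no longer needed.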

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

- Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.
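A short Python sketch shows why losing any single AMP still leaves every row reachable. Both maps below are assumptions for this four-AMP teaching system. Only two facts are taken from the text: Ben Hon and Don Roy live on AMP 1, and their FALLBACK copies sit on AMP 2 and AMP 4.

```python
# Assumed bucket -> AMP layouts (teaching hash formula: Friend_Num divided by two).
HASH_MAP = {1: 1, 2: 2, 3: 3, 4: 4, 5: 1, 6: 2, 7: 3, 8: 4}      # base rows
FALLBACK_MAP = {1: 2, 2: 3, 3: 4, 4: 1, 5: 4, 6: 1, 7: 2, 8: 3}  # second copies

BEST_FRIENDS = {
    6: "Mary Gray", 14: "Kyle Marx", 8: "John Davis", 16: "Lyn Jones",
    2: "Ben Hon", 10: "Don Roy", 4: "Joe Davis", 12: "Sam Mills",
}


def read_table(down_amp=None):
    """Return {Friend_Num: (Friend_Name, AMP that served the row)}."""
    result = {}
    for num, name in BEST_FRIENDS.items():
        bucket = num // 2
        primary, fallback = HASH_MAP[bucket], FALLBACK_MAP[bucket]
        # Use the base row unless its AMP is down, then use the FALLBACK copy.
        serving_amp = primary if primary != down_amp else fallback
        result[num] = (name, serving_amp)
    return result


with_amp1_down = read_table(down_amp=1)
print(with_amp1_down[2], with_amp1_down[10])  # Ben Hon from AMP 2, Don Roy from AMP 4
```

The price is visible in the two maps: every row is written twice, once per map, which is the doubled disk space mentioned above.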

"You can't step into the same river twice."

- Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.
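The rule in capital letters can be written as a one-line check in Python. This sketch assumes the eight AMPs are numbered 1 to 4 in cluster one and 5 to 8 in cluster two. It also simplifies by treating any two down AMPs in the same cluster as lost data, on the assumption that some row has its base copy on one and its FALLBACK copy on the other.

```python
# Assumed AMP numbering for the eight-AMP, two-cluster example.
CLUSTERS = {1: {1, 2, 3, 4}, 2: {5, 6, 7, 8}}


def table_available(down_amps) -> bool:
    """A FALLBACK table stays available if no cluster has lost more than one AMP."""
    down = set(down_amps)
    return all(len(members & down) <= 1 for members in CLUSTERS.values())


print(table_available([]))       # nothing down
print(table_available([1, 5]))   # one AMP down in each cluster
print(table_available([1, 2]))   # two AMPs down in the same cluster
```

Two failures are survivable only when they land in different clusters.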

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
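The trade-off comes down to simple arithmetic, sketched here in Python. The sketch assumes the failed AMP's work is shared evenly among the surviving AMPs in its cluster. The function name is made up for the illustration.

```python
def extra_work_per_survivor(cluster_size: int) -> float:
    """Fraction of additional work each surviving AMP picks up when one AMP fails."""
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    return 1 / (cluster_size - 1)


for size in (2, 3, 4, 16):
    extra = extra_work_per_survivor(size)
    print(f"cluster of {size:2d}: each survivor does {extra:.0%} more work")
```

A cluster of two doubles the survivor's workload, which matches the "twice as long" figure above. A cluster of four adds about a third, and a cluster of 16 adds only about 7%, at the cost of more AMPs that could fail together.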

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, along with all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.
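The life cycle of the journal (open on failure, record missed changes, replay on recovery, discard) can be sketched in a few lines of Python. The Cluster class and its method names are hypothetical and exist only for this illustration; they are not Teradata internals.

```python
class Cluster:
    """Toy model of one cluster whose surviving AMPs keep a recovery journal."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}  # rows held by each AMP
        self.down = set()                          # AMPs currently offline
        self.journal = {}                          # down AMP -> list of missed changes

    def fail(self, amp):
        self.down.add(amp)
        self.journal[amp] = []                     # the survivors open a journal

    def write(self, amp, key, value):
        if amp in self.down:
            self.journal[amp].append((key, value))  # remember what the AMP missed
        else:
            self.rows[amp][key] = value

    def recover(self, amp):
        for key, value in self.journal.pop(amp):   # replay, then discard the journal
            self.rows[amp][key] = value
        self.down.discard(amp)
```

After recover() returns, the journal entry for that AMP no longer exists, matching the "Then the DARJ is discarded" step.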


In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The downside of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
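The rule about which disk losses a rank can tolerate can be written as a short check. This Python sketch assumes a rank laid out as two mirrored pairs; the disk names and the exact layout are assumptions for illustration, since the picture is not reproduced here.

```python
# Assumed layout: a rank of four disks arranged as two data disks and their mirrors.
MIRROR_PAIRS = [("data1", "mirror1"), ("data2", "mirror2")]


def rank_survives(failed_disks, mirror_pairs=MIRROR_PAIRS):
    """Data stays available as long as no pair has lost both of its disks."""
    failed = set(failed_disks)
    return all(
        not (data in failed and mirror in failed)
        for data, mirror in mirror_pairs
    )
```

Under this layout, losing data1 and mirror2 is survivable, but losing data1 and mirror1 is not, because that pair has no good copy left.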

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in the surviving node's memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disks via a different physical cable.

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.
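The migration can be pictured with a short Python sketch. The function name and the round-robin placement are assumptions made for this illustration; the text only says that the VPROCs move to other nodes in the clique and keep access to their own disks.

```python
def migrate_vprocs(nodes, failed_node):
    """Move the failed node's VPROCs to the surviving nodes of its clique."""
    survivors = [name for name in nodes if name != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    placement = {name: list(nodes[name]) for name in survivors}
    # Hand the orphaned VPROCs out to the survivors in turn.
    for i, vproc in enumerate(nodes[failed_node]):
        placement[survivors[i % len(survivors)]].append(vproc)
    return placement


clique = {
    "node1": [f"AMP{n}" for n in range(1, 17)],
    "node2": [f"AMP{n}" for n in range(17, 33)],
}
after_failure = migrate_vprocs(clique, "node1")
```

In the two-node example, node2 ends up hosting all 32 AMPs, which is why its performance slows down until node1 is repaired.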

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by a restoration that takes a couple of hours to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE or DELETE command. It keeps track of all new, deleted or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward and is usually used to restore data lost as a result of a hardware problem.
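Both scenarios can be modeled with a toy table in Python. The class, the function and the checkpoint argument are hypothetical names invented for this sketch; it only illustrates the idea of undoing changes with BEFORE images and rebuilding from a backup with AFTER images, not how Teradata stores or applies its journals.

```python
class JournaledTable:
    """Toy table that records BEFORE and AFTER images of every change."""

    def __init__(self):
        self.rows = {}
        self.before = []  # (key, image before the change); None means the row did not exist
        self.after = []   # (key, image after the change); None means the row was deleted

    def change(self, key, new_value):
        self.before.append((key, self.rows.get(key)))
        self.after.append((key, new_value))
        if new_value is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = new_value

    def rollback(self, checkpoint):
        """Undo every change made after `checkpoint`, newest first, using BEFORE images."""
        while len(self.before) > checkpoint:
            key, old = self.before.pop()
            self.after.pop()
            if old is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = old


def roll_forward(backup_rows, after_images):
    """Rebuild a table from a backup copy plus the AFTER images captured since then."""
    rows = dict(backup_rows)
    for key, new in after_images:
        if new is None:
            rows.pop(key, None)
        else:
            rows[key] = new
    return rows
```

In the Customer Representative story, the checkpoint would be taken just before the bad update, and rollback(checkpoint) would put every representative back in the original region. In the hardware-failure story, roll_forward() plays the part of the reload from the monthly backup followed by the journal.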

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
  BEFORE JOURNAL,
  DUAL AFTER JOURNAL
  (emp INTEGER,
   dept INTEGER,
   lname CHAR(20),
   fname VARCHAR(20),
   salary DECIMAL(10,2),
   hire_date DATE FORMAT)
UNIQUE PRIMARY INDEX(emp);

The example above creates the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line, only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
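The four rules above boil down to a compatibility table, sketched here in Python. The table and the can_grant function are an illustration built from the descriptions in this list; they are not a Teradata API, and the queueing of waiting requests is left out.

```python
# For each lock already held, the set of lock requests that may proceed alongside it.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held_locks):
    """A request proceeds only if it is compatible with every lock already held."""
    return all(requested in COMPATIBLE[held] for held in held_locks)
```

For example, can_grant("READ", ["READ", "READ"]) is True because many readers can share a table, while can_grant("WRITE", ["READ"]) is False because the writer must wait its turn. An ACCESS request gets past a WRITE lock but never past an EXCLUSIVE lock.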

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood." "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
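The employee and payroll story can be expressed as two small checks in Python. The dictionaries, function names and exception class are made up for this sketch; a real database enforces these rules itself once the relationship between the tables is declared.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would leave a payroll row without its employee."""


employee = {101: "Jones", 102: "Coffing"}  # parent table: emp -> lname
payroll = {}                                # child table: emp -> salary


def insert_payroll(emp, salary):
    # A payroll row may only reference an employee that exists.
    if emp not in employee:
        raise ReferentialIntegrityError(f"employee {emp} does not exist")
    payroll[emp] = salary


def delete_employee(emp):
    # An employee may not be deleted while a payroll row still points at it.
    if emp in payroll:
        raise ReferentialIntegrityError(f"employee {emp} still has a payroll row")
    del employee[emp]
```

With these checks in place, the fired employee has to be removed from payroll before the employee row can go, so nobody keeps collecting paychecks by accident.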


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously, though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced bodybuilder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sandbags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
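The three steps just described (hand out blocks, redistribute rows by hash, sort) can be sketched in Python. Everything here is a simplified stand-in: the CRC32 hash, the block size measured in rows rather than 64K, and the function names are assumptions for illustration, and the work that real AMPs do in parallel is shown here as ordinary loops.

```python
import zlib


def row_hash(key):
    """Stand-in hashing algorithm (CRC32); not Teradata's real one."""
    return zlib.crc32(str(key).encode("utf-8"))


def fastload(rows, num_amps, block_size=4):
    """Load (key, value) rows into num_amps AMPs; returns each AMP's sorted rows."""
    # Step 1: hand out blocks of rows to the AMPs in turn, with no regard to ownership.
    received = [[] for _ in range(num_amps)]
    for i in range(0, len(rows), block_size):
        received[(i // block_size) % num_amps].extend(rows[i:i + block_size])

    # Step 2: each AMP hashes the rows it received and sends each row to its owning AMP.
    owned = [[] for _ in range(num_amps)]
    for amp_rows in received:
        for key, value in amp_rows:
            h = row_hash(key)
            owned[h % num_amps].append((h, key, value))

    # Step 3: each AMP sorts the rows it owns by row hash.
    return [sorted(amp_rows) for amp_rows in owned]
```

Calling fastload([(i, f"friend{i}") for i in range(20)], num_amps=4) returns four sorted lists, one per AMP, that together hold all 20 rows.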

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to load immediately into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance – Only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


This story describes what we call "Parallel Processing." Teradata is the only database in the world that loads data, backs up data and processes data in parallel. Teradata was born to be parallel, and instead of allowing just 10 loads of wash to be done simultaneously, Teradata allows for hundreds ... even thousands of loads to be done simultaneously. Teradata users may not be washing clothes, but this is the technology that has been cleaning every database's clock in performance tests.

"After enlightenment, the laundry."

Zen Proverb

"After parallel processing the laundry, enlightenment."

Teradata Zen Proverb

With the computer world seeing Terabytes of data, hundreds to thousands of users are asking a wide variety of complex questions and need instantaneous access to data. In short, this is the technology needed in a data warehouse environment. What we find most fascinating is that Teradata has unlimited power, grows without boundaries, and was born out of the PC (personal computer) world by people with vision.

Components of a Personal Computer

"A ship in harbor is safe, but that's not why ships are built."

John Shedd

In 1805, the pivotal Battle of Trafalgar matched Britain's flotilla of battleships against the almighty Spanish Armada. Spain had huge battleships, some having four tiered decks of cannons. But Britain's Admiral Horatio Nelson used two lines of ships to sail circles around the Armada, attacking them at their most vulnerable point, the stern. That battle paralyzed the Armada and turned the world of naval warfare upside down. Teradata stunned the data-warehousing world by taking personal computer technology right into the mighty mainframe-dominated environment and beating them on their own turf. Armed with a lightweight technology built on Intel processor chips, memory, a hard drive and an operating system, Teradata achieved the unthinkable: lightning-fast processing speed managing terabytes of data.

A Personal Computer (PC) is made up of the following components:


Processor Chip – This is the brain of the computer. All tasks are done at the direction of the processor.

Memory – This is the hand of the computer. The memory allows data to be viewed, manipulated, changed or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive – This is the spine of the computer. The hard drive stores data, applications and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor applies your changes to the document while it is held in memory. Upon completion, you close the document and the processor writes all the changes to the disk where it is stored.

In the picture below, we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight best friends.

Teradata Spreads Data over Multiple Processors

"I don't mind starting the season with unknowns. I just don't like finishing the season with them."

Coach Lou Holtz

With Teradata you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The Teradata example on the next page shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask the system for a list of best friends, both processors will retrieve data in parallel and will return combined results over the connecting network. This example could easily double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four records, and simultaneously the other processor reads the remaining four records. So how could we double the speed of this system again?

Teradata has Linear Scalability

"Every ceiling, when reached, becomes a floor upon which one walks and now can see a new ceiling."

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." This allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck of data.

Notice in the system below there are four processors and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since data is spread evenly over four processors, Teradata reads two rows simultaneously on each of the four processors. Now the system is four times faster.

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the AMPs and processed in parallel.
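The Best_Friends arithmetic can be checked with a few lines of Python. This is an idealized sketch: it assumes the rows are spread evenly and ignores any cost of coordinating the processors, and the function names are invented for the example.

```python
import math


def rows_per_processor(total_rows, processors):
    """Rows each processor must read when the data is spread evenly."""
    return math.ceil(total_rows / processors)


def speedup(total_rows, processors):
    """Idealized speedup compared with one processor reading every row."""
    return rows_per_processor(total_rows, 1) / rows_per_processor(total_rows, processors)
```

With eight rows, two processors read four rows each and four processors read two rows each, so speedup(8, 2) is 2.0 and speedup(8, 4) is 4.0.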


A Logical View of the Teradata Architecture

You are either making history or you are history

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem, and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when it actually never had a chance to succeed. In ancient days Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies too will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and then is given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain information on their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.


Parsing Engine (PE)

Even a stopped clock is right twice a day

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes, it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards." Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER".

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and Southwestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions, some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions, and Teradata returns the answers.

Access Module Processor (AMP)

Wise men talk because they have something to say; fools talk because they have to say something.

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above: "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of few words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best way to picture it is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it's the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk and it shares this with no other AMP, hence a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP will read only the rows on its particular disk. If this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.
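The "slowest AMP" remark can be made concrete with a small Python sketch. The function name and the timings below are invented for illustration; the only idea taken from the text is that the AMPs work at the same time, so a parallel step ends when the last AMP finishes.

```python
def query_time(amp_read_times):
    """Elapsed time of one parallel step, given each AMP's own read time.

    Hypothetical model: all AMPs start together, so the step is over
    only when the slowest AMP has finished.
    """
    return max(amp_read_times)


even_spread = [2.0, 2.0, 2.0, 2.0]    # every AMP holds the same share of rows
uneven_spread = [1.0, 1.0, 1.0, 5.0]  # same total work, but one AMP holds most rows

print(query_time(even_spread))
print(query_time(uneven_spread))
```

Both lists add up to the same total work, yet the uneven one takes longer, which is why spreading rows evenly matters.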

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs while others have more than 2,000.

The BYNET

Even if you're on the right track, you'll still get run over if you just sit there.

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax;
• The PE checks the user's security rights;
• The PE comes up with a plan for the AMPs to follow;
• The PE passes the plan along to the AMPs over the BYNET;
• The AMPs follow the plan and retrieve the data requested;
• The AMPs pass the data to the PE over the BYNET; and
• The PE then passes the final data to the user.
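The steps above can be sketched as a toy Python model. Everything here is an assumption made for illustration: the class names, the idea that a "plan" is just a row filter, and the use of a callable check as a stand-in for SQL syntax checking. It is not how Teradata is implemented; it only mirrors the order of the steps.

```python
class AMP:
    """One AMP: the only worker allowed to read its own disk."""

    def __init__(self, rows):
        self.rows = rows  # rows stored on this AMP's disk

    def follow_plan(self, plan):
        # In this toy model the plan is simply a row filter from the PE.
        return [row for row in self.rows if plan(row)]


class ParsingEngine:
    """Toy PE: checks the request, then directs every AMP."""

    def __init__(self, amps, authorized_users):
        self.amps = amps
        self.authorized_users = authorized_users

    def run(self, user, plan):
        if not callable(plan):                    # stand-in for the syntax check
            raise ValueError("invalid request")
        if user not in self.authorized_users:     # security (access rights) check
            raise PermissionError("query rejected")
        answer = []
        for amp in self.amps:                     # plan goes out "over the BYNET"
            answer.extend(amp.follow_plan(plan))  # data comes back to the PE
        return answer                             # PE hands the data to the user


pe = ParsingEngine([AMP([2, 10]), AMP([4, 12])], authorized_users={"tom"})
print(pe.run("tom", lambda friend_num: friend_num > 3))
```

A user who is not in authorized_users gets a PermissionError before any AMP is asked to do work, which matches the guard-dog behavior described for the PE.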

Teradata Building Block Approach

Better a diamond with a flaw than a pebble without one


Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs

Teradata Tables

Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them.


Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, Item No, Quantity, and Customer ID.

An example of each table follows:

CUSTOMER TABLE called CustomerTable

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillards       Atlanta    106227              Smith

ORDER TABLE called OrderTable

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to "files", "records", and "fields". Relational databases use the terms "Tables", "Rows", and "Columns". Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest value that can be inserted into a table. A column is the smallest value within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null, or are sometimes represented by a "?".

One column, or combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set". Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.
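To make the PK/FK match concrete, here is a small Python sketch of a join between the two example tables. It is not Teradata SQL and not how Teradata performs joins; the dictionary layout and the function name join_orders_to_customers are our own, and only three columns per table are carried over to keep it short.

```python
# A few columns from the example tables, one dictionary per row.
customers = [
    {"CustomerID": 1001, "CustomerName": "JC Penney"},
    {"CustomerID": 1002, "CustomerName": "Office Depot"},
    {"CustomerID": 1003, "CustomerName": "Dillards"},
]
orders = [
    {"OrderNumber": 105372, "Quantity": 20, "CustomerID": 1001},
    {"OrderNumber": 105799, "Quantity": 52, "CustomerID": 1002},
    {"OrderNumber": 106227, "Quantity": 17, "CustomerID": 1003},
]


def join_orders_to_customers(customers, orders):
    """Match each order's foreign key to the customer's primary key."""
    by_primary_key = {c["CustomerID"]: c for c in customers}  # PK is unique
    joined = []
    for order in orders:
        customer = by_primary_key.get(order["CustomerID"])  # FK lookup
        if customer is not None:
            joined.append(
                (customer["CustomerName"], order["OrderNumber"], order["Quantity"])
            )
    return joined


for row in join_orders_to_customers(customers, orders):
    print(row)
```

Because the primary key is unique, each order finds at most one customer, while one customer could appear on many orders.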

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                          FOREIGN KEY
Not optional                         Optional
Comprised of one or more columns     Comprised of one or more columns
Can only have one PK per table       Can have multiple FKs per table
No duplicates allowed                Duplicates allowed
No changes allowed                   Changes allowed
No nulls allowed                     Nulls allowed

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, then no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the rows of the Customer and Order tables are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.
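The layout just described, where every AMP holds a share of every table and keeps each table's rows grouped together, can be sketched in Python. Note the assumptions: the row values are made up, the function name spread is our own, and rows are dealt out in simple round-robin fashion purely for illustration. Teradata itself places rows by hashing the Primary Index, as explained later.

```python
from collections import defaultdict


def spread(tables, amp_count):
    """Deal each table's rows across the AMPs, grouped by table on each AMP.

    Hypothetical round-robin placement, used only to show the layout.
    """
    amps = [defaultdict(list) for _ in range(amp_count)]
    for table_name, rows in tables.items():
        for position, row in enumerate(rows):
            amps[position % amp_count][table_name].append(row)
    return amps


tables = {
    "Employee": ["emp%d" % n for n in range(8)],
    "Customer": ["cust%d" % n for n in range(8)],
    "Order": ["ord%d" % n for n in range(8)],
}

for number, amp in enumerate(spread(tables, 4), start=1):
    print("AMP", number, dict(amp))
```

Each of the four AMPs ends up with two rows from each of the three tables, and on each AMP the rows of one table sit together like a school of fish.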

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs; and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.

There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI, in other words a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

CREATE TABLE employee
(
  emp        INTEGER,
  dept       INTEGER,
  lname      CHAR(20),
  fname      VARCHAR(20),
  salary     DECIMAL(10,2),
  hire_date  DATE
)
UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice that the SQL never uses the prefix NON; a plain PRIMARY INDEX is non-unique.

CREATE TABLE TomC.employee
(
  emp        INTEGER,
  dept       INTEGER,
  lname      CHAR(20),
  fname      VARCHAR(20),
  salary     DECIMAL(10,2),
  hire_date  DATE
)
PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(
  emp        INTEGER,
  dept       INTEGER,
  lname      CHAR(20),
  fname      VARCHAR(20),
  salary     DECIMAL(10,2),
  hire_date  DATE
)
UNIQUE PRIMARY INDEX(emp, dept);

Being related hardly insures relatability

Michael E Angier


All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label the column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label the column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that the AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8, and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
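The two simulated hash maps can be generated with a short Python sketch. This only reproduces the repeating pattern in the diagrams; the function name build_hash_map and the 48-bucket default (eight rows of six buckets, as drawn) are our own choices, and we make no claim about how the real 65,536-bucket hash map assigns AMP numbers.

```python
def build_hash_map(amp_count, bucket_count=48):
    """Return a list in which position 0 is bucket 1, position 1 is bucket 2, etc.

    Hypothetical helper: AMP numbers simply cycle 1..amp_count,
    matching the simulated diagrams.
    """
    return [(bucket % amp_count) + 1 for bucket in range(bucket_count)]


def print_hash_map(hash_map, width=6):
    # Print the buckets six to a row, the way the diagrams are drawn.
    for start in range(0, len(hash_map), width):
        print(" ".join(str(amp) for amp in hash_map[start:start + width]))


print_hash_map(build_hash_map(4))
print()
print_hash_map(build_hash_map(8))
```

Keeping the map as a flat list rather than a grid matches the point made earlier that the hash map is not really a two-dimensional array.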

How the Hash Map and Primary Index Work Together

Choice not chance determines destiny

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the Whiz-Bang Formula (a secret NCR formula). First we'll explain the theory, and then we will invent our own Whiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even-tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Whiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Whiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we will take the Primary Index value of the row, run it through the Coffing/Jones Whiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num   Friend_Name
2            Ben Hon

Since we designated Friend_Num as the Primary Index, we merely divide the value of Friend_Num (2) by 2, as the Coffing/Jones Whiz-Bang Formula says:

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num   Friend_Name
16           Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely divide the value of Friend_Num (16) by 2, as the Coffing/Jones Whiz-Bang Formula says, and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
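Continuing the process for all eight rows can be sketched in Python. This uses the teaching formula from the text (divide the Primary Index value by 2), not the real Teradata hashing formula, which remains a secret. The names HASH_MAP, bucket_for, and amp_for are our own, and the map is the simulated 48-bucket, four-AMP map from the diagram.

```python
# Simulated four-AMP hash map: list position 0 is bucket 1, and so on.
HASH_MAP = [(bucket % 4) + 1 for bucket in range(48)]

BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}


def bucket_for(friend_num):
    """Teaching formula only: divide the Primary Index value by 2."""
    return friend_num // 2


def amp_for(friend_num):
    """Look inside the bucket to find the destination AMP number."""
    return HASH_MAP[bucket_for(friend_num) - 1]


layout = {}
for friend_num, friend_name in BEST_FRIENDS.items():
    layout.setdefault(amp_for(friend_num), []).append((friend_num, friend_name))

for amp_number in sorted(layout):
    print("AMP", amp_number, layout[amp_number])
```

With this toy formula each of the four AMPs receives exactly two of the eight rows, and running the same calculation again always sends a row to the same AMP.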

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Whiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the real formula is mathematical (similar to the Coffing/Jones Whiz-Bang Formula) and that it is consistent. When we divided Friend_Num 2 by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.

If you always do what you always did, you'll always get what you always got.

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number, which in our hash map is AMP four. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
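The one-AMP retrieval can be sketched in Python as follows. As before, this relies on the teaching formula (divide by 2) and the simulated four-AMP hash map, not on Teradata's real hashing; the names AMPS and retrieve_by_primary_index are hypothetical, and the sketch only handles the even, positive Friend_Num values used in the example.

```python
# Simulated four-AMP hash map: list position 0 is bucket 1, and so on.
HASH_MAP = [(bucket % 4) + 1 for bucket in range(48)]

# Rows as they were laid out on each AMP's own disk by the teaching formula.
AMPS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def retrieve_by_primary_index(friend_num):
    """Rerun the formula, find the bucket, and ask only that one AMP."""
    bucket = friend_num // 2            # teaching formula only
    amp_number = HASH_MAP[bucket - 1]   # the bucket holds the AMP number
    friend_name = AMPS[amp_number].get(friend_num)  # the other AMPs stay idle
    return amp_number, friend_name


print(retrieve_by_primary_index(8))
```

Only one dictionary, the one standing in for AMP four's disk, is consulted, which is what makes a Primary Index lookup a one-AMP operation.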

The Full Table Scan

What matters is not the size of the dog in the fight, but the size of the fight in the dog.

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan (FTS) means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. Because the ten reads happen at the same time, this process is 10 times faster than reading all 100 rows one after another. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" then you are asking the system to read through an entire table. Full Table Scans are fundamental and an important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns.
• Each AMP passes its rows to the PE over the BYNET.

Lets run through the SQL again and see the result

SELECT Friend_Num Friend_Name

FROM Best_Friends

8 rows returned

Friend_Num  Friend_Name
----------  -----------
         6  Mary Gray
        14  Kyle Marx
         8  John Davis
        16  Lyn Jones
         2  Ben Hon
        10  Don Roy
         4  Joe Davis
        12  Sam Mills
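One way to see whether a request will run as a Full Table Scan before you run it is to put the EXPLAIN keyword in front of it. The sketch below is illustrative only. It assumes Friend_Num is the Primary Index of Best_Friends, and the exact wording of the plan text varies from release to release.

```sql
-- Ask the PE for its plan instead of running the query.
-- With no WHERE clause, the plan text typically describes an
-- all-AMPs RETRIEVE step by way of an all-rows scan.
EXPLAIN
SELECT Friend_Num, Friend_Name
FROM Best_Friends;

-- Assumption: Friend_Num is the Primary Index of Best_Friends.
-- A request on a Primary Index value should show a single-AMP step.
EXPLAIN
SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 2;
```

EXPLAIN does not touch the data. It only returns the plan the PE would hand to the AMPs.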


In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option: a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

- Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and add overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

- Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It tells the AMP to place the hash in a secondary index sub-table, along with the Row-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking WHERE Friend_Name = 'Ben Hon'. The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds Ben Hon in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (the smiley face in this example) is revealed and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of ongoing queries. Secondary indexes take up space and add overhead, but boy, can they speed up queries!
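For comparison with the USI above, here is a minimal sketch of a NUSI. It borrows the Employee table and its Dept column from later in this book. Choosing Dept as the indexed column is only an illustration, and whether the optimizer actually uses the index depends on the data.

```sql
-- Dept repeats across many Employee rows, so an index on it
-- cannot be unique. Leaving out the UNIQUE keyword creates a NUSI.
CREATE INDEX (Dept) ON Employee;

-- A query that can use the NUSI: every AMP checks its own
-- sub-table for Dept = 10 instead of reading all of its base rows.
SELECT Emp, Lname, Fname
FROM Employee
WHERE Dept = 10;

-- If the index stops paying for its space and overhead, drop it.
DROP INDEX (Dept) ON Employee;
```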

Join Indexes

"A bend in the road is not the end of the road unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index directly. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.
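As a rough sketch of what creating a join index might look like, consider two base tables related through a customer number. Customer_Table, Order_Table, Cust_Ord_JI, and every column name below are hypothetical and exist only for this illustration.

```sql
-- Hypothetical base tables: Customer_Table and Order_Table,
-- related through Cust_Num. All names here are assumptions.
CREATE JOIN INDEX Cust_Ord_JI AS
SELECT Customer_Table.Cust_Num,
       Customer_Table.Cust_Name,
       Order_Table.Order_Num,
       Order_Table.Order_Total
FROM Customer_Table, Order_Table
WHERE Customer_Table.Cust_Num = Order_Table.Cust_Num;

-- Users keep writing their normal join against the base tables.
-- The PE decides whether the join index can satisfy it.
SELECT Customer_Table.Cust_Name, Order_Table.Order_Total
FROM Customer_Table, Order_Table
WHERE Customer_Table.Cust_Num = Order_Table.Cust_Num;
```

Notice that the second statement never mentions Cust_Ord_JI. That is the point: the join index is a tool for the PE, not something users name in their SQL.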


Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

- Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The name is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating The Plan. The Dictionary directs the PE on topics such as security access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users and databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result, many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.
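The step just described could be sketched as follows for the 100-gigabyte system mentioned earlier. The byte counts, the spool figure, and the password are placeholders chosen for this illustration, not recommended values.

```sql
-- Logged on as DBC. Assumption: a 100-gigabyte system, with
-- 1 gigabyte counted as 1,000,000,000 bytes for round numbers.
-- The password value is a placeholder, not a recommendation.
CREATE USER SYSDBA FROM DBC
AS PERM = 80000000000,      -- about 80% of the system's PERM space
   SPOOL = 80000000000,     -- illustrative spool limit
   PASSWORD = ChangeMe1;
```

After this statement, DBC is left with roughly 20 gigabytes of PERM space, and SYSDBA can hand out the rest.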

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.
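A hierarchy like the one in the picture could be built with statements along these lines. All sizes and passwords are made-up values for illustration. The FROM clause is what places each new database or user under its owner.

```sql
-- Logged on as SYSDBA. Sizes are illustrative assumptions.
CREATE DATABASE MRKT  FROM SYSDBA AS PERM = 20000000000;
CREATE DATABASE SALES FROM SYSDBA AS PERM = 20000000000;

CREATE USER Morgan FROM SYSDBA
AS PERM = 10000000000,
   SPOOL = 5000000000,
   PASSWORD = ChangeMe1;

-- Tom's PERM comes out of Morgan's space. This can be submitted by
-- SYSDBA, or by Morgan if he holds the CREATE USER privilege.
CREATE USER Tom FROM Morgan
AS PERM = 2000000000,
   SPOOL = 5000000000,
   PASSWORD = ChangeMe2;
```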

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from Tom.
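If you would rather ask the Data Dictionary than read a diagram, a query like the following sketch can list everyone above Tom. It relies on the DBC.Children dictionary view. Treat the expected answer as an assumption that holds only for the hierarchy described in this chapter.

```sql
-- DBC.Children is a Data Dictionary view that pairs each child
-- with the parents (owners) above it in the hierarchy.
SELECT Parent
FROM DBC.Children
WHERE Child = 'Tom';
-- With the hierarchy described in this chapter, the answer set
-- should contain Morgan, SYSDBA, and DBC.
```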


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space,
• Spool Space, and
• Temp Space.

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. Will a bank that holds onto all of its capital be successful? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is used to store real data, such as tables and their secondary index sub-tables. (View and macro definitions are kept in the Data Dictionary and need no perm space of their own.) If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.
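To watch this bookkeeping in action, you can compare each owner's perm limit with what it currently uses. The sketch below reads the DBC.DiskSpace dictionary view. The four names in the WHERE clause are simply the users from this chapter's example hierarchy.

```sql
-- DBC.DiskSpace reports space per AMP, so sum across AMPs
-- to get one total per database or user.
SELECT DatabaseName,
       SUM(MaxPerm)     AS Perm_Limit,
       SUM(CurrentPerm) AS Perm_In_Use
FROM DBC.DiskSpace
WHERE DatabaseName IN ('DBC', 'SYSDBA', 'Morgan', 'Tom')
GROUP BY DatabaseName
ORDER BY DatabaseName;
```

Run it before and after a parent gives perm space to a child, and the parent's Perm_Limit should drop by the amount given away.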

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, up to your own limit, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
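A query-only user of the kind just described might be set up like this. Report_User, the spool figures, and the password are hypothetical values for illustration.

```sql
-- Hypothetical query-only user: no PERM, so no tables of its own,
-- but enough SPOOL to build answer sets.
CREATE USER Report_User FROM Morgan
AS PERM = 0,
   SPOOL = 1000000000,
   PASSWORD = ChangeMe3;

-- If this user's queries start running out of spool,
-- an owner can raise the limit later.
MODIFY USER Report_User AS SPOOL = 2000000000;
```

Because spool is only a limit, neither statement takes anything away from Morgan.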

The following picture shows a logical view of a Customer table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area, there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.
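The statement in the picture is not reproduced in this text. Going only by the description above, it would look roughly like this:

```sql
-- Every AMP scans its own Employee rows. Each row with Dept = 10
-- is copied into spool, and the spool becomes the answer set.
SELECT *
FROM Employee
WHERE Dept = 10;
```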

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored a second time on the disks, so the view does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
---  ----  --------  ------  ------
  1    10  Johnson   Manny   100000
  2    20  Carlsbad  Jan     100000
 22    30  Winter    Steve    77000
 25    10  Lester    Bonnie   56000
 33    10  Samuels   Todd    120000
 99    20  Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.
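In practice, that restriction comes down to a pair of access-rights statements. Here is a minimal sketch, using Mary purely as an example user and assuming she had previously been granted SELECT on the base table.

```sql
-- Take away direct access to the base table
-- (assumes Mary was granted it explicitly at some point)...
REVOKE SELECT ON Employee FROM Mary;

-- ...and give access to the salary-free view instead.
GRANT SELECT ON Employ_V TO Mary;
```

If the view lives in a different database from the table, the database that holds the view generally also needs SELECT on the table WITH GRANT OPTION before users of the view can read through it.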

Perm Space is required to create a table, but it is not needed to create a view. The definition of a view is stored in the Data Dictionary and is monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
---  ----  --------  ------
  1    10  Johnson   Manny
  2    20  Carlsbad  Jan
 22    30  Winter    Steve
 25    10  Lester    Bonnie
 33    10  Samuels   Todd
 99    20  Walter    Misha


What is a Macro?

"The axe soon forgets, but the tree always remembers."

- Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple statements, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
SELECT * FROM Employ_v WHERE dept = 10;
SELECT * FROM Employ_v WHERE dept = 20;
SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
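Macros can also take parameters, which saves writing one SELECT per department. The following is a small sketch. Dept_mac and dept_no are names invented for this illustration.

```sql
-- Hypothetical macro with one parameter, dept_no.
-- Inside the macro body the parameter is referenced with a colon.
CREATE MACRO Dept_mac (dept_no INTEGER) AS (
  SELECT * FROM Employ_V WHERE Dept = :dept_no;
);

-- Run it for department 10, then for department 20.
EXEC Dept_mac (10);
EXEC Dept_mac (20);
```

EXEC is simply the short form of EXECUTE.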

Here is a handy reference chart that compares views with macros:

Views                                          Macros
---------------------------------------------  ---------------------------------------------
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE
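The last row of the chart can be sketched as follows. The new definitions are arbitrary examples. The point is only that REPLACE swaps in a new definition under the same name.

```sql
-- Redefine the view to show department 10 only.
REPLACE VIEW Employ_V AS
SELECT Emp, Dept, Lname, Fname
FROM Employee
WHERE Dept = 10;

-- Redefine the macro to keep just the sorted list.
REPLACE MACRO Emp_mac AS (
  SELECT * FROM Employ_V ORDER BY Lname;
);
```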

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

- Wild West Slogan


I taught in one place that was so rough ... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it ... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.
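The Tom and Mary example could be written along these lines. The Campaign_Notes table and its columns are invented for illustration. The sketch also assumes the person submitting the GRANT is entitled to grant rights on Tom, for example Tom himself holding the right with grant authority, or one of his owners.

```sql
-- Submitted by Tom (assuming he has grant authority on his own
-- space) or by one of Tom's owners, such as Morgan:
GRANT CREATE TABLE ON Tom TO Mary;

-- Mary can now create a table inside Tom's space.
-- Hypothetical table, created while logged on as Mary:
CREATE TABLE Tom.Campaign_Notes
  (Note_Num  INTEGER,
   Note_Text VARCHAR(200))
UNIQUE PRIMARY INDEX (Note_Num);

-- The same right can be taken back later.
REVOKE CREATE TABLE ON Tom FROM Mary;
```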

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
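Since records of these rights are kept in the Data Dictionary, you can list them. The following sketch reads the DBC.AllRights view. The AccessRight column holds short codes rather than spelled-out privilege names.

```sql
-- Rights that Tom holds, the objects they apply to,
-- and who granted them.
SELECT UserName, DatabaseName, TableName, AccessRight, GrantorName
FROM DBC.AllRights
WHERE UserName = 'Tom'
ORDER BY DatabaseName, TableName, AccessRight;
```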

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

- Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or disk goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

- Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (as in a MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal". Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, "Transient Journal Cleanliness is next to Godliness." If it is clean, then things went well.
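To make the before-picture idea concrete, here is a sketch that replays the accidental 100% raise mentioned in the overview. It assumes a session running in Teradata transaction mode, where BT and ET mark the start and end of an explicit transaction. The Employee table and its Sal column come from the earlier chapter.

```sql
-- BT (BEGIN TRANSACTION) opens an explicit transaction
-- and ET (END TRANSACTION) commits it.
BT;

-- Oops: this doubles every salary (a 100% raise).
UPDATE Employee SET Sal = Sal * 2;

-- Nothing is committed yet. ROLLBACK makes the AMPs put back
-- the before-images held in the Transient Journal.
ROLLBACK;

-- Try again with the intended 10% raise, then commit.
BT;
UPDATE Employee SET Sal = Sal * 1.10;
ET;
```

Once the ET completes, the before-images are no longer needed and the Transient Journal entries for this transaction can be discarded.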

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

- Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.
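Asking for FALLBACK when a table is first created takes a single keyword. The table below is hypothetical. Only the placement of the FALLBACK option matters here.

```sql
-- Hypothetical table, FALLBACK protected from the start.
-- Every row inserted gets a second copy on another AMP.
CREATE TABLE Top_Customers, FALLBACK
  (Cust_Num  INTEGER,
   Cust_Name CHAR(30))
UNIQUE PRIMARY INDEX (Cust_Num);
```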

"You can't step into the same river twice."

- Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP or disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. The chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. This allocation is done by taking the Primary Index value and running it through the hashing algorithm. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take about twice as long to process while one AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is the increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
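The trade-off can be put into rough numbers. As a back-of-the-envelope sketch, assume the failed AMP's work is shared evenly by the remaining AMPs in its cluster of n AMPs. This is an idealization, not a measured figure.

```latex
% Assumption: the failed AMP's work is spread evenly
% over the other n - 1 AMPs in its cluster.
\[
  \text{load per surviving AMP} = 1 + \frac{1}{n-1} = \frac{n}{n-1}
\]
\[
  n = 2:\ \frac{2}{1} = 2.00 \qquad
  n = 4:\ \frac{4}{3} \approx 1.33 \qquad
  n = 16:\ \frac{16}{15} \approx 1.07
\]
```

So a survivor in a two-AMP cluster does double duty, a survivor in a four-AMP cluster does about a third more, and a survivor in a 16-AMP cluster barely notices.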

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to "Teradata SQL - Unleash the Power" by Mike Larkins and Tom Coffing.)
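For a quick taste of what adding and dropping FALLBACK looks like, here is a sketch using the tables and databases from this book's examples. The MRKT statement changes only the default for tables created later in that database.

```sql
-- Add FALLBACK protection to an existing table...
ALTER TABLE Best_Friends, FALLBACK;

-- ...or remove it to get the disk space back.
ALTER TABLE Best_Friends, NO FALLBACK;

-- Database-level default for tables created later in MRKT.
MODIFY DATABASE MRKT AS FALLBACK;
```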


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues using the FALLBACK copies of that AMP's rows, and all subsequent changes to those rows are still captured.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be used on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping starring Sandra Bullock told a fascinating love story Ayoung woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded thetrain each day at her station However the man only knew her as the woman in the booth who collected his fareOne day the dashing young man tripped and fell into the path of the train Only quick action by the lovesick tolclerk kept him from certain death Although he avoided death he fell into a coma While he was in a coma themans family fell in love with the toll clerk who visited him in the hospital Because she visited so often themans family actual thought she was the mans fianceacutee The movie continues to tell how the man regainsconsciousness and the events the immediately follow At the end of the movie it turns out that all along thewoman had been telling the man everything that happened While you were sleeping

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.


In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). Notice also that none of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
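The journal-and-replay idea can be sketched as follows. The Cluster class and its methods are hypothetical names invented for this illustration; the real DARJ is internal to Teradata and is not driven this way.

```python
class Cluster:
    """Toy cluster: one dict of rows per AMP, plus a recovery journal."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}
        self.down = set()
        self.darj = []  # (amp, key, value) changes missed by down AMPs

    def write(self, amp, key, value):
        if amp in self.down:
            self.darj.append((amp, key, value))  # remember it for later
        else:
            self.rows[amp][key] = value

    def fail(self, amp):
        self.down.add(amp)

    def recover(self, amp):
        """Replay missed changes in order, then discard the journal entries."""
        self.down.discard(amp)
        for target, key, value in self.darj:
            if target == amp:
                self.rows[amp][key] = value
        self.darj = [entry for entry in self.darj if entry[0] != amp]

cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "a", 10)
cluster.fail(1)
cluster.write(1, "a", 11)   # AMP 1 is asleep: journaled, not applied
cluster.write(1, "b", 20)
while_down = dict(cluster.rows[1])
cluster.recover(1)
```

While AMP 1 is down its rows are untouched; after recovery it holds both missed changes and the journal is empty.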

Redundant Array of Independent Disks (RAID)

I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant.

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it is still fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.
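A minimal sketch of RAID-1 behavior, written in Python for illustration only. The MirroredDisk class is a made-up name, and for simplicity it reads from the first healthy copy rather than choosing the closest actuator as a real controller would.

```python
class MirroredDisk:
    """Toy RAID-1 pair: index 0 is the primary disk, index 1 is its mirror."""

    def __init__(self):
        self.copies = [{}, {}]
        self.failed = [False, False]

    def write(self, block, data):
        # The dual write is invisible to the caller.
        for i, copy in enumerate(self.copies):
            if not self.failed[i]:
                copy[block] = data

    def read(self, block):
        # Either healthy copy can satisfy the read.
        for i, copy in enumerate(self.copies):
            if not self.failed[i]:
                return copy[block]
        raise IOError("both copies lost")

    def fail(self, i):
        self.failed[i] = True
        self.copies[i] = {}

    def replace(self, i):
        """Rebuild a replacement disk by copying from the surviving one."""
        survivor = 1 - i
        if self.failed[survivor]:
            raise IOError("no surviving copy to rebuild from")
        self.copies[i] = dict(self.copies[survivor])
        self.failed[i] = False

pair = MirroredDisk()
pair.write("blk1", "Best_Friends rows")
pair.fail(0)                       # primary crashes
still_readable = pair.read("blk1")
pair.write("blk2", "more rows")    # operations continue on the mirror
pair.replace(0)                    # rebuild the primary from the mirror
```

Losing one disk of the pair loses no data, but losing both before the rebuild finishes would.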

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. However, we can still keep the shared-nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory; there are about 10 to 16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory on the surviving node. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, and they are accessing their own virtual disks via a different physical cable.
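The migration can be pictured with a short Python sketch. The node names, the migrate function and the disk_owner mapping are all hypothetical; the point is only that AMPs change nodes while disk ownership stays the same.

```python
def migrate(nodes, failed):
    """Move the failed node's AMPs onto the surviving nodes of the clique."""
    survivors = [name for name in nodes if name != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    for i, amp in enumerate(nodes[failed]):
        nodes[survivors[i % len(survivors)]].append(amp)
    nodes[failed] = []
    return nodes

# Two-node, 32-AMP system as in the text.
nodes = {"node1": list(range(1, 17)), "node2": list(range(17, 33))}
# Each AMP keeps its own virtual disk no matter where it runs.
disk_owner = {amp: f"vdisk{amp}" for amp in range(1, 33)}

migrate(nodes, "node1")
```

After node one fails, node two hosts all 32 AMPs, which is why its performance slows down, yet AMP 16 still owns vdisk16.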

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica – a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by a restoration that takes a couple of hours to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures; plus, it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE or DELETE command. It keeps track of all new, deleted or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
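Both scenarios can be sketched in Python. The functions below are hypothetical helpers written for this illustration, and they cover inserts and updates only; real recovery is performed with Teradata's own archive and recovery tools, not with code like this.

```python
def change(table, journal, key, new_value):
    """Apply an insert or update, journaling BEFORE and AFTER images."""
    journal.append({"key": key, "before": table.get(key), "after": new_value})
    table[key] = new_value

def roll_back(table, journal, checkpoint):
    """Undo changes newest-first until the journal is back at the checkpoint."""
    while len(journal) > checkpoint:
        entry = journal.pop()
        if entry["before"] is None:       # the row did not exist before
            del table[entry["key"]]
        else:
            table[entry["key"]] = entry["before"]

def roll_forward(backup, journal):
    """Start from a backup and reapply AFTER images oldest-first."""
    restored = dict(backup)
    for entry in journal:
        restored[entry["key"]] = entry["after"]
    return restored

# BEFORE images: undo the faulty region update.
reps = {"rep1": "Western", "rep2": "Northern"}
journal = []
checkpoint = len(journal)
for rep in list(reps):
    change(reps, journal, rep, "Southeast")   # the accidental update
roll_back(reps, journal, checkpoint)

# AFTER images: rebuild lost data from the monthly backup.
backup = {"rep1": "Western"}
live = dict(backup)
after_journal = []
change(live, after_journal, "rep1", "Southwest")
change(live, after_journal, "rep3", "Eastern")
restored = roll_forward(backup, after_journal)
```

The rollback returns every representative to the original region, and the roll forward reproduces the table as it stood just before the failure.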

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

    CREATE TABLE TomC.employee, FALLBACK,
      BEFORE JOURNAL,
      DUAL AFTER JOURNAL
    ( emp        INTEGER
     ,dept       INTEGER
     ,lname      CHAR(20)
     ,fname      VARCHAR(20)
     ,salary     DECIMAL(10,2)
     ,hire_date  DATE FORMAT)
    UNIQUE PRIMARY INDEX(emp)

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO – meaning if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

You just obey instructions; we'll take care of the obstructions.

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line, only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables, and it restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE or READ LOCK can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK – one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
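The four modes above amount to a compatibility table, which can be sketched in Python. The COMPATIBLE mapping and the can_grant helper are names chosen for this illustration; they encode only the rules stated in the list.

```python
# For each requested lock, the set of already-held locks it can coexist with.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},   # blocked only by EXCLUSIVE
    "READ": {"ACCESS", "READ"},              # many readers at once
    "WRITE": {"ACCESS"},                     # only dirty readers may coexist
    "EXCLUSIVE": set(),                      # no company at all
}

def can_grant(requested, held):
    """Return True if `requested` is compatible with every lock in `held`."""
    return all(lock in COMPATIBLE[requested] for lock in held)

thousand_readers = can_grant("READ", ["READ"] * 1000)
writer_behind_reader = can_grant("WRITE", ["READ"])
dirty_read = can_grant("ACCESS", ["WRITE"])
access_vs_exclusive = can_grant("ACCESS", ["EXCLUSIVE"])
```

A request that cannot be granted simply waits in the queue until the conflicting locks are released.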

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30!" Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired … it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was first deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table unless the value in its referencing column also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
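The employee and payroll example can be sketched in Python. The sets, function names and RIError exception are invented for this illustration; in Teradata the rule is declared on the table and enforced by the database itself.

```python
class RIError(Exception):
    """Raised when a change would break referential integrity."""

employee = {101, 102}   # parent table: valid employee numbers
payroll = []            # child table: (emp, amount) rows that reference employee

def insert_payroll(emp, amount):
    """A payroll row may only reference an employee that exists."""
    if emp not in employee:
        raise RIError(f"employee {emp} does not exist")
    payroll.append((emp, amount))

def delete_employee(emp):
    """An employee may not be deleted while payroll rows still reference it."""
    if any(row_emp == emp for row_emp, _ in payroll):
        raise RIError(f"employee {emp} still has payroll rows")
    employee.discard(emp)

insert_payroll(101, 500)
delete_employee(102)          # allowed: nothing references 102
try:
    delete_employee(101)      # blocked: payroll still points at 101
    blocked = False
except RIError:
    blocked = True
```

The fired employee with a live paycheck cannot happen here, because the delete is refused until the payroll rows are gone.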


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that – just loading data. Some people actually ask, "Are data loads that critical?" Come on: ASCII stupid question and get a stupid ANSI.

Seriously, though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel) and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
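The three phases just described can be sketched in Python. This is a toy model under stated assumptions: four AMPs, CRC32 as a stand-in for Teradata's row hash, tiny blocks of three rows instead of 64K blocks, and a single loop where the real AMPs work in parallel.

```python
import zlib

NUM_AMPS = 4  # assumed system size for the illustration

def row_hash(key):
    """Stand-in for the Teradata row hash (assumption: CRC32 of the key)."""
    return zlib.crc32(str(key).encode())

def owner(key):
    """The AMP that should finally hold the row."""
    return row_hash(key) % NUM_AMPS

def fastload(rows, block_size=3):
    # Phase 1: hand out blocks of rows to the AMPs, round-robin.
    received = {amp: [] for amp in range(NUM_AMPS)}
    for i in range(0, len(rows), block_size):
        received[(i // block_size) % NUM_AMPS].extend(rows[i:i + block_size])
    # Phase 2: every AMP hashes its rows and sends each to the proper AMP.
    final = {amp: [] for amp in range(NUM_AMPS)}
    for amp_rows in received.values():
        for row in amp_rows:
            final[owner(row[0])].append(row)
    # Phase 3: each AMP sorts what it now owns by row hash ("Row ID").
    for amp_rows in final.values():
        amp_rows.sort(key=lambda row: (row_hash(row[0]), row[0]))
    return final

rows = [(emp, f"name{emp}") for emp in range(20)]
loaded = fastload(rows)
```

Note that phase 1 places rows wherever the blocks happen to land; only the hash in phase 2 decides where each row finally lives.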

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

You don't drown by falling into the water; you drown by staying in the water.

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers and referential integrity
• It doesn't support Multi-set tables
• It locks at the row level

Conclusion – A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance – only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


Processor Chip – This is the brain of the computer. All tasks are done at the direction of the processor.

Memory – This is the hand of the computer. The memory allows data to be viewed, manipulated, changed or altered. Data is brought in from the hard drive, and the processor works with the data in memory. Once changes are made in memory, the processor can command that the information be written back to disk.

Hard Drive – This is the spine of the computer. The hard drive stores data, applications and the Operating System inside the PC. The hard drive, also called the disk drive, holds the contents of the data for the system on its disk.

For example, suppose you made three new good friends this month and want to add their names to your list. Opening that document brings it up from the hard drive and displays it on your screen. As you type in the new names, the processor executes your request onto the document while it is still being displayed in memory. Upon completion, you close the document and the processor writes all the changes to the disk where it is stored.

In the picture below, we see the basic components of a Personal Computer. Note that it also holds a file called Best_Friends, which lists eight best friends.

Teradata Spreads Data over Multiple Processors

I don't mind starting the season with unknowns. I just don't like finishing the season with them.

Coach Lou Holtz

With Teradata, you will never finish with any unknowns about your business; you can know it all. One reason why this assertion is true can be found by looking at the unique way this database places the data into the system and processes it. Teradata takes every table in the system and spreads the data across multiple processors. Each processor works on its portion of the database in parallel when requested to do so. This is why we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, a single processor would have to read all eight rows.

The Teradata example on the next page shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask the system for a list of best friends, both processors will retrieve data in parallel and will return combined results over the connecting network. This example could easily double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four records; while one reads its four, the other processor simultaneously reads the remaining four. So how could we double the speed of this system again?

Teradata has Linear Scalability

Every ceiling, when reached, becomes a floor upon which one walks and now can see a new ceiling.

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." This allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck.

Notice in the system below that there are four processors and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since the data is spread evenly over four processors, each processor reads its two rows at the same time as the others. Now the system is four times faster.

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the AMPs and processed in parallel.
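The arithmetic behind the eight-row example can be written out in Python. This is an idealized sketch: it assumes perfectly even distribution and ignores the cost of combining results, and the function names are made up for the illustration.

```python
import math

def rows_per_processor(total_rows, processors):
    """Rows each processor must read when data is spread evenly."""
    return math.ceil(total_rows / processors)

def speedup(total_rows, processors):
    """How many times faster the scan is than on a single processor."""
    return total_rows / rows_per_processor(total_rows, processors)

one = rows_per_processor(8, 1)    # 8 rows on a single processor
two = rows_per_processor(8, 2)    # 4 rows each
four = rows_per_processor(8, 4)   # 2 rows each
```

Doubling the processors halves the rows each one reads, which is the linear relationship the section describes.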


A Logical View of the Teradata Architecture

You are either making history or you are history

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem, and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when they actually never had a chance to succeed. In ancient days, Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies too will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and then is given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain information on their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.
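The request flow can be sketched in Python. Everything here is a simplification invented for illustration: the syntax check is skipped, the "plan" is a plain filter function, each AMP is a dictionary standing in for its disk, and the AMPs are visited in a loop rather than in parallel.

```python
class AccessDenied(Exception):
    """Raised when the user lacks rights on the requested table."""

def parsing_engine(user, table, wanted, rights, amps):
    # Check access rights before any AMP is asked to do work.
    if (user, table) not in rights:
        raise AccessDenied(f"{user} may not read {table}")

    # The "plan": what every AMP should do with the rows on its own disk.
    def plan(rows):
        return [row for row in rows if wanted(row)]

    # Each AMP runs the plan against its own disk only.
    partial_results = [plan(amp.get(table, [])) for amp in amps]

    # The PE combines the answers and returns them to the user.
    return [row for part in partial_results for row in part]

amps = [{"Best_Friends": ["Ann", "Bob"]}, {"Best_Friends": ["Cy", "Dee"]}]
rights = {("tom", "Best_Friends")}
result = parsing_engine("tom", "Best_Friends", lambda name: True, rights, amps)
```

A user without rights is rejected before any AMP reads a single row.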


Parsing Engine (PE)

Even a stopped clock is right twice a day

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards." Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check users' security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T and SouthWestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions – some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions, and Teradata returns the answers.

Access Module Processor (AMP)

Wise men talk because they have something to say; fools talk because they have to say something.

Plato

Two men decided to go ice fishing They found a good spot on some ice and began digging As soon as they

finished the hole they heard a voice from above saying There are no fish here Taking that as a sign theymoved about thirty feet and began digging again A second time they heard the voice saying There are no fishhere So they moved another thirty feet and began to dig a third hole This time the impatient voice spoke fromabove There are no fish here in this ice skating rink Some people just dont listen But this is never the casewith Teradatas Access Module Processors

The Access Module Processor (AMP) is a processor of few words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best way to picture it is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it is the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk and shares it with no other AMP, hence a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP reads only the rows on its own disk. Because this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs, while others have more than 2,000.
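The "slowest AMP" remark can be put into numbers. The sketch below is only an illustration in Python, not anything Teradata runs: the function name `parallel_read_time` and the cost of one millisecond per row are assumptions made up for the example. It shows that when every AMP reads at once, the elapsed time is set by whichever AMP holds the most rows.

```python
def parallel_read_time(rows_per_amp, ms_per_row=1):
    """Elapsed time (ms) when all AMPs read at once: the busiest AMP sets the pace.

    ms_per_row is an invented cost, used only to make the comparison concrete.
    """
    return max(rows * ms_per_row for rows in rows_per_amp)


even_spread = [250, 250, 250, 250]  # 1,000 rows spread evenly over four AMPs
lopsided = [700, 100, 100, 100]     # the same 1,000 rows, badly skewed

print(parallel_read_time(even_spread))  # 250
print(parallel_read_time(lopsided))     # 700
```

Both layouts hold the same 1,000 rows, yet the lopsided one takes nearly three times as long in this toy model, which is why an even spread matters so much.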

The BYNET

Even if you're on the right track, you'll still get run over if you just sit there.

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax.
• The PE checks the user's security rights.
• The PE comes up with a plan for the AMPs to follow.
• The PE passes the plan along to the AMPs over the BYNET.
• The AMPs follow the plan and retrieve the data requested.
• The AMPs pass the data to the PE over the BYNET.
• The PE then passes the final data to the user.
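The steps above can be walked through in a few lines of Python. This is a toy model under loud assumptions: `run_query`, `authorized_users`, and `amp_disks` are hypothetical names, the "syntax check" is a one-line stand-in for a real parser, and the BYNET is reduced to ordinary function calls.

```python
def run_query(user, sql, authorized_users, amp_disks):
    """Walk one SELECT through the PE/AMP steps and return the answer set."""
    # Step 1: the PE checks the SQL syntax (a toy check, not a real parser).
    if not sql.strip().upper().startswith("SELECT"):
        raise ValueError("syntax error")
    # Step 2: the PE checks the user's security rights.
    if user not in authorized_users:
        raise PermissionError(f"{user} has no access rights")
    # Steps 3-5: the plan here is simply "return all of your rows";
    # each AMP follows it against its own disk and no other.
    amp_answers = [list(disk) for disk in amp_disks]
    # Steps 6-7: the AMPs send their rows back and the PE hands them to the user.
    return [row for answer in amp_answers for row in answer]


amp_disks = [["row A"], ["row B"], ["row C"], ["row D"]]
result = run_query("tom", "SELECT * FROM t", {"tom"}, amp_disks)
print(result)  # ['row A', 'row B', 'row C', 'row D']
```

A user who is not in `authorized_users` never reaches the AMPs at all; the guard dog stops the query at step 2.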

Teradata Building Block Approach

Better a diamond with a flaw than a pebble without one


Anonymous

Teradata builds its data warehouses in building blocks called "nodes." Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array, where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them.


Arthur Schopenhauer

Do you have one of those notoriously messy "junk drawers" in your kitchen? You know the one we're talking about... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, Customer ID, Item No, and Quantity.

An example of each table follows:

CUSTOMER TABLE called CustomerTable

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillards       Atlanta    106227              Smith

ORDER TABLE called OrderTable

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to "files," "records," and "fields." Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest unit that can be inserted into a table. A column is the smallest unit within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null, or are sometimes represented by a "?".

One column, or a combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.
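To see a JOIN on a common key in action, here is a small, hedged sketch. It uses Python's built-in `sqlite3` module purely as a stand-in, because a Teradata system is not something you can spin up in a snippet; the column names without spaces (`OrderNumber`, `CustomerRep`, and so on) and the slash-formatted dates are assumptions made for the example. It matches the Order Number foreign key in the CustomerTable to the Order Number primary key in the OrderTable.

```python
import sqlite3

# SQLite stands in for Teradata here; only the idea of the PK/FK join carries over.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerTable (
    CustomerID   INTEGER PRIMARY KEY,
    CustomerName TEXT,
    CityName     TEXT,
    OrderNumber  INTEGER,   -- foreign key: primary key of OrderTable
    CustomerRep  TEXT
);
CREATE TABLE OrderTable (
    OrderNumber INTEGER PRIMARY KEY,
    OrderDate   TEXT,
    ItemNo      INTEGER,
    Quantity    INTEGER,
    CustomerID  INTEGER     -- foreign key: primary key of CustomerTable
);
INSERT INTO CustomerTable VALUES
    (1001, 'JC Penney',    'Dallas',   105372, 'Dreyer'),
    (1002, 'Office Depot', 'Columbia', 105799, 'Crocker'),
    (1003, 'Dillards',     'Atlanta',  106227, 'Smith');
INSERT INTO OrderTable VALUES
    (105372, '03/07/2001', 212, 20, 1001),
    (105799, '04/18/2001', 296, 52, 1002),
    (106227, '10/17/2001', 325, 17, 1003);
""")

# Join the two tables by matching the common Order Number key.
joined = conn.execute("""
    SELECT c.CustomerName, o.OrderNumber, o.Quantity
    FROM CustomerTable AS c
    JOIN OrderTable AS o ON c.OrderNumber = o.OrderNumber
    ORDER BY o.OrderNumber
""").fetchall()

for row in joined:
    print(row)
```

Each customer name comes from one table and each quantity from the other; the shared key is the only thing that ties them together.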

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                         FOREIGN KEY
Not optional                        Optional
Comprised of one or more columns    Comprised of one or more columns
Can only have one PK per table      Can have multiple FKs per table
No duplicates allowed               Duplicates allowed
No changes allowed                  Changes allowed
No nulls allowed                    Nulls allowed

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, then no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind, the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP 1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also, notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the rows of the Customer table are grouped together, as are those of the Order table. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.
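The layout just described, with every AMP holding a slice of every table and keeping each table's rows in their own school, can be sketched as a small data structure. This is a simplification under stated assumptions: `distribute` is a hypothetical helper, rows are dealt out round-robin rather than by any real hashing, and the row counts are invented.

```python
def distribute(tables, amp_count):
    """Deal each table's rows across the AMPs, keeping rows grouped by table.

    Round-robin dealing is an assumption for illustration only.
    """
    amps = [{name: [] for name in tables} for _ in range(amp_count)]
    for name, rows in tables.items():
        for position, row in enumerate(rows):
            amps[position % amp_count][name].append(row)
    return amps


tables = {
    "Employee": list(range(8)),   # 8 invented rows
    "Customer": list(range(12)),  # 12 invented rows
    "Order": list(range(16)),     # 16 invented rows
}
amps = distribute(tables, amp_count=4)

for number, amp in enumerate(amps, start=1):
    print(number, {name: len(rows) for name, rows in amp.items()})
```

Every AMP ends up with one-fourth of each table, and a request for Customer rows only has to look inside each AMP's `"Customer"` group.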

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him, saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.


There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

CREATE TABLE employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON:

CREATE TABLE TomC.employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
UNIQUE PRIMARY INDEX(emp, dept);

Being related hardly insures relatability

Michael E Angier


All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once, a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture below shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
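The pattern in the two diagrams is easy to generate. The Python sketch below is a teaching model only: `build_hash_map` is a hypothetical helper that fills buckets with AMP numbers in the simple repeating order the diagrams show, and it should not be read as how a real system assigns buckets. The 65,536 bucket count comes from the text above.

```python
from collections import Counter


def build_hash_map(amp_count, bucket_count=65536):
    """Return a list in which entry i holds the AMP number for bucket i + 1.

    The repeating 1, 2, 3, ... order mirrors the simulated diagrams only.
    """
    return [(bucket % amp_count) + 1 for bucket in range(bucket_count)]


four_amp_map = build_hash_map(4)
eight_amp_map = build_hash_map(8)

print(four_amp_map[:6])     # first row of the four-AMP diagram
print(eight_amp_map[6:12])  # second row of the eight-AMP diagram

# Every AMP owns the same number of buckets, so rows have an even chance
# of landing on any AMP.
buckets_per_amp = Counter(four_amp_map)
print(buckets_per_amp[1], buckets_per_amp[4])
```

Because the buckets hold nothing but AMP numbers, the same map can be handed to every PE and AMP without moving a single data row.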

How the Hash Map and Primary Index Work Together

Choice, not chance, determines destiny.

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the "Wiz-Bang Formula" (a secret NCR formula). First we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.


Let's take our first row and determine on which AMP it will reside. Remember, we will get the Primary Index value of the row, apply the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num   Friend_Name
2            Ben Hon

Since we designated Friend_Num as the Primary Index, we merely divide the value of Friend_Num (2) by 2, as the Coffing/Jones Wiz-Bang Formula says:

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num   Friend_Name
16           Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely divide the value of Friend_Num (16) by 2, as the Coffing/Jones Wiz-Bang Formula says, and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.


If we continue the process until all data is laid out, the system would look like this:


Best_Friends Table

Friend_Num Friend_Name

2 Ben Hon

4 Joe Davis

6 Mary Gray

8 John Davis

10 Don Roy

12 Sam Mills

14 Kyle Marx

16 Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. The purpose is to show you how the hash map works in theory to distribute and locate rows. Simply, you should understand that the formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it will be consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.

If you always do what you always did, you'll always get what you always got.

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.
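The whole placement exercise can be replayed in Python. As in the text, this uses the Coffing/Jones teaching formula (divide by 2), not the real and secret hashing formula; the names `wiz_bang_bucket`, `destination_amp`, and `rows_on_amp` are hypothetical, and the 48-bucket map is the simulated diagram rather than the real 65,536-bucket map.

```python
AMP_COUNT = 4
# Simulated 48-bucket hash map for a four-AMP system: 1 2 3 4 1 2 ...
HASH_MAP = [(bucket % AMP_COUNT) + 1 for bucket in range(48)]


def wiz_bang_bucket(primary_index_value):
    """Coffing/Jones teaching formula: divide the Primary Index value by 2."""
    return primary_index_value // 2


def destination_amp(primary_index_value):
    """Look inside the bucket the formula points to (buckets count from 1)."""
    bucket = wiz_bang_bucket(primary_index_value)
    return HASH_MAP[bucket - 1]


best_friends = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}

rows_on_amp = {}
for friend_num in best_friends:
    rows_on_amp.setdefault(destination_amp(friend_num), []).append(friend_num)

print(rows_on_amp)  # {1: [2, 10], 2: [4, 12], 3: [6, 14], 4: [8, 16]}
```

Run it as many times as you like: Friend_Num 2 always lands on AMP 1 and Friend_Num 16 always lands on AMP 4, which is exactly the consistency the hash formula is prized for.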

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;


The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
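Here is the one-AMP retrieval as a small Python model. Everything in it is an assumption for illustration: `ToySystem` and its methods are invented names, the divide-by-2 formula is the teaching formula from the text, and the toy only handles even Friend_Num values of 2 or more.

```python
class ToySystem:
    """A four-AMP toy: rows are placed and found with the divide-by-2 formula."""

    def __init__(self, amp_count=4, bucket_count=48):
        self.hash_map = [(b % amp_count) + 1 for b in range(bucket_count)]
        self.amp_disks = {amp: {} for amp in range(1, amp_count + 1)}

    def amp_for(self, friend_num):
        # Teaching formula: Friend_Num / 2 gives the bucket (numbered from 1).
        return self.hash_map[friend_num // 2 - 1]

    def insert(self, friend_num, friend_name):
        self.amp_disks[self.amp_for(friend_num)][friend_num] = friend_name

    def select_by_primary_index(self, friend_num):
        # Rerun the same formula, then ask only the one AMP it points to.
        amp = self.amp_for(friend_num)
        return amp, self.amp_disks[amp].get(friend_num)


system = ToySystem()
for num, name in [(2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"),
                  (8, "John Davis"), (10, "Don Roy"), (12, "Sam Mills"),
                  (14, "Kyle Marx"), (16, "Lyn Jones")]:
    system.insert(num, name)

print(system.select_by_primary_index(8))  # (4, 'John Davis')
```

The lookup never touches AMPs 1, 2, or 3; the formula that stored the row is the same formula that finds it.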

The Full Table Scan

What matters is not the size of the dog in the fight, but the size of the fight in the dog.

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that it is not the size of the opponent that is to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan (FTS) means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. On large systems, doing so can speed up a Full Table Scan by a factor of hundreds or even thousands.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. In principle, this is 10 times faster than a single processor reading all 100 rows. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1,900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?", you are asking the system to read through an entire table. Full Table Scans are fundamental and an important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows that it individually owns.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num Friend_Name

6 Mary Gray

14 Kyle Marx

8 John Davis

16 Lyn Jones

2 Ben Hon

10 Don Roy

4 Joe Davis

12 Sam Mills
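A Full Table Scan can be mimicked in Python with one worker per AMP. This is a hedged sketch: threads from the standard `concurrent.futures` module stand in for AMPs working in parallel, the row placement is the one produced by the divide-by-2 teaching formula, and the names `amp_scan` and `full_table_scan` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Best_Friends rows as the divide-by-2 teaching formula would place them.
amp_disks = {
    1: [(2, "Ben Hon"), (10, "Don Roy")],
    2: [(4, "Joe Davis"), (12, "Sam Mills")],
    3: [(6, "Mary Gray"), (14, "Kyle Marx")],
    4: [(8, "John Davis"), (16, "Lyn Jones")],
}


def amp_scan(amp_number):
    """One AMP reads every Best_Friends row on its own disk and nothing else."""
    return list(amp_disks[amp_number])


def full_table_scan():
    """Ask every AMP at once, then gather the rows the way the PE would."""
    with ThreadPoolExecutor(max_workers=len(amp_disks)) as pool:
        per_amp_rows = pool.map(amp_scan, amp_disks)
        return [row for rows in per_amp_rows for row in rows]


answer_set = full_table_scan()
print(len(answer_set), "rows returned")
```

Notice that the answer set comes back grouped by AMP rather than sorted by Friend_Num, just as the result above is not in Friend_Num order. Rows in a table are an unordered set unless the query asks for a sort.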


In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option: a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on "KNOWN QUERIES," or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

There are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It also tells each AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX(Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once that is complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (the smiley face in this example) is revealed, and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.


Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from him.
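If it helps to see the family tree as data, here is a small Python sketch. It assumes the hierarchy described in the text (DBC at the top, then SYSDBA, then MRKT, SALES, and Morgan, with Tom under Morgan); the function names are invented for the illustration and are not Teradata commands.

```python
# Immediate parent of each user or database, following the hierarchy in the text.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}

def owners(name):
    """Return every parent (owner) sitting above `name`, nearest first."""
    result = []
    while name in PARENT:
        name = PARENT[name]
        result.append(name)
    return result

def can_grant_or_revoke(owner, target):
    """An owner anywhere above the target may GRANT or REVOKE rights on it."""
    return owner in owners(target)

print(owners("Tom"))   # walks up the tree from Tom to DBC
```

Note that MRKT and SALES sit beside Morgan, not above Tom, so they are not Tom's owners.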


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space,
• Spool Space, and
• Temp Space.

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see the Data Protection chapter).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts. The user has then run out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, secondary index sub-tables, and permanent journals. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
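The difference between giving away perm and giving away spool can be sketched as simple bookkeeping. The Python below is an illustration only; the `Owner` class is invented, the sizes (in Gigabytes) are examples, and the rule that a child's spool limit may not exceed its parent's is an assumption drawn from the speed-limit analogy.

```python
class Owner:
    """A user or database with a PERM allocation and a SPOOL limit (both in GB)."""

    def __init__(self, name, perm=0, spool=0):
        self.name, self.perm, self.spool = name, perm, spool

    def create_child(self, name, perm, spool):
        if perm > self.perm:
            raise ValueError("not enough PERM space to give away")
        if spool > self.spool:
            raise ValueError("spool limit exceeds the parent's own limit (assumed rule)")
        self.perm -= perm                 # PERM handed down is subtracted from the giver
        return Owner(name, perm, spool)   # SPOOL is only a limit; the giver keeps its own

dbc = Owner("DBC", perm=100, spool=100)               # example system: 100 GB
sysdba = dbc.create_child("SYSDBA", perm=80, spool=100)
morgan = sysdba.create_child("Morgan", perm=10, spool=100)

print(dbc.perm, sysdba.perm, morgan.perm)      # PERM is divided up
print(dbc.spool, sysdba.spool, morgan.spool)   # every spool limit is intact
```

The perm figures always add back up to the original total, which is the "all space is owned" rule in action, while the spool limits behave like speed limits and are never used up by handing them out.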

The following picture shows a logical view of a Customer Table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. Plus, when the answer is returned, the spool is released.

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
1    10    Johnson   Manny   100000
2    20    Carlsbad  Jan     100000
22   30    Winter    Steve   77000
25   10    Lester    Bonnie  56000
33   10    Samuels   Todd    120000
99   20    Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
1    10    Johnson   Manny
2    20    Carlsbad  Jan
22   30    Winter    Steve
25   10    Lester    Bonnie
33   10    Samuels   Todd
99   20    Walter    Misha
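If you don't have a Teradata system handy, the same idea can be tried with Python's built-in sqlite3 module. This is only a stand-in: SQLite is not Teradata, it has no access rights to grant or revoke, and the example simply shows that the view exposes the four chosen columns and nothing else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no rows are copied.
conn.execute("CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee")

cursor = conn.execute("SELECT * FROM Employ_V")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```

Every employee comes back, but there is no Sal column to peek at.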


What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
    SELECT * FROM Employ_v WHERE dept = 10;
    SELECT * FROM Employ_v WHERE dept = 20;
    SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
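The "either they all work or none of them work" behavior is easiest to see with statements that change data. The sketch below imitates it with Python's built-in sqlite3 module; `run_macro` is a made-up helper, not a Teradata feature, and the two statements are hypothetical ones chosen so that the second fails.

```python
import sqlite3

def run_macro(conn, statements):
    """Run every statement as one transaction: commit them all or undo them all."""
    try:
        for sql in statements:
            conn.execute(sql)
        conn.commit()
        return True
    except sqlite3.Error:
        conn.rollback()
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp INTEGER PRIMARY KEY, Dept INTEGER)")
conn.execute("INSERT INTO Employee VALUES (1, 10)")
conn.commit()

succeeded = run_macro(conn, [
    "UPDATE Employee SET Dept = 20 WHERE Emp = 1",   # works on its own
    "INSERT INTO Employee VALUES (1, 30)",           # fails: employee 1 already exists
])

dept = conn.execute("SELECT Dept FROM Employee WHERE Emp = 1").fetchone()[0]
print(succeeded, dept)
```

Because the second statement fails, the first one is undone as well, and employee 1 is still in department 10.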

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan


I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has Implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupt. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted back to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, then you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores it to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal". Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness". If it is clean, then things went well.
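Here is the BEFORE-picture idea as a short Python sketch. It is a toy model, not how Teradata stores its journal: the `Amp` class and its methods are invented for the illustration, and it assumes stored values are never `None` so that `None` can mark "this row did not exist yet".

```python
class Amp:
    """Toy AMP that keeps a before-image of every row it changes."""

    def __init__(self):
        self.rows = {}                # row_id -> value
        self.transient_journal = []   # (row_id, before_image) entries

    def write(self, row_id, value):
        # Take the BEFORE picture first; None means the row did not exist.
        self.transient_journal.append((row_id, self.rows.get(row_id)))
        self.rows[row_id] = value

    def commit(self):
        # Success: the journal is wiped clean.
        self.transient_journal.clear()

    def rollback(self):
        # Failure: restore before-images, newest first.
        for row_id, before in reversed(self.transient_journal):
            if before is None:
                self.rows.pop(row_id, None)
            else:
                self.rows[row_id] = before
        self.transient_journal.clear()

amp = Amp()
amp.write("emp-25", 56000)
amp.commit()                    # the original salary is now permanent

amp.write("emp-25", 112000)     # oops: a 100% raise instead of a 10% raise
amp.write("emp-26", 40000)      # a new row inside the same transaction
amp.rollback()                  # the transaction fails, so both changes are undone

print(amp.rows)
```

After the rollback, the salary is back to its committed value and the half-inserted row is gone.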

FALLBACK Protection

I asked my dentist if I had to floss all my teeth and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.
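A tiny Python model shows why any single AMP can be lost. The placement rule used here (put the copy on the "next" AMP) is an assumption made for the sketch; Teradata uses its Fallback Hash Map to choose the AMP, and the row values are placeholders.

```python
NUM_AMPS = 4

def primary_amp(row_id):
    return row_id % NUM_AMPS + 1

def fallback_amp(row_id):
    # Assumed rule for this sketch: the copy lives on the next AMP, never the same one.
    return (row_id + 1) % NUM_AMPS + 1

base = {amp: {} for amp in range(1, NUM_AMPS + 1)}
fallback = {amp: {} for amp in range(1, NUM_AMPS + 1)}

def insert(row_id, value):
    base[primary_amp(row_id)][row_id] = value
    fallback[fallback_amp(row_id)][row_id] = value   # second copy, different AMP

def read(row_id, down_amp=None):
    amp = primary_amp(row_id)
    if amp != down_amp:
        return base[amp][row_id]
    return fallback[fallback_amp(row_id)][row_id]     # base AMP is down: use the copy

for row_id in range(8):
    insert(row_id, f"friend-{row_id}")

base_count = sum(len(rows) for rows in base.values())
fallback_count = sum(len(rows) for rows in fallback.values())
print(base_count, fallback_count)
```

Eight base rows plus eight FALLBACK rows is the "twice as much disk space" cost; in return, `read` succeeds for every row no matter which one AMP is down.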

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.
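The chain from Primary Index value to hash to bucket to AMP can be sketched as follows. Everything specific here is an assumption for illustration: CRC32 stands in for Teradata's hashing algorithm, the number of buckets is simply chosen for the example, and the buckets are dealt out to the AMPs round-robin.

```python
import zlib

NUM_AMPS = 8
NUM_BUCKETS = 65536   # assumed bucket count for this sketch

# The hash map: each bucket holds the number of the AMP that owns it.
hash_map = [bucket % NUM_AMPS + 1 for bucket in range(NUM_BUCKETS)]

def row_hash(primary_index_value):
    """Stand-in for the hashing algorithm."""
    return zlib.crc32(str(primary_index_value).encode())

def amp_for(primary_index_value):
    bucket = row_hash(primary_index_value) % NUM_BUCKETS   # the hash points to a bucket
    return hash_map[bucket]                                # the bucket names the AMP

for friend_num in (1, 2, 3):
    print(friend_num, "->", "AMP", amp_for(friend_num))
```

The same Primary Index value always lands on the same AMP, which is what lets Teradata find a row again without searching every AMP.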

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in a group of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. With clustering in a group of two and one AMP down, every complex query will take twice as long to process.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
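The workload side of this trade-off is simple arithmetic. The sketch below assumes the failed AMP's work is shared evenly among the surviving AMPs in its cluster; `load_factor` is just a name chosen for the example.

```python
def load_factor(cluster_size):
    """Relative workload on each surviving AMP when one AMP in the cluster is down."""
    return cluster_size / (cluster_size - 1)

for size in (2, 3, 4, 16):
    print(f"cluster of {size:2d}: each survivor does {load_factor(size):.2f}x its normal work")
```

A cluster of two doubles the survivor's workload, a cluster of four adds about a third, and a cluster of 16 adds only about 7 percent, at the price of a higher chance that a second AMP in the same cluster fails.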

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are tracked until it returns (see the Down AMP Recovery Journal section).

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping", starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping".

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.
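In Python terms, the DARJ behaves like a list of missed changes that is replayed and then thrown away. The sketch below is a simplified model with invented names (`Cluster`, `fail`, `recover`); it keeps one journal per cluster and ignores the FALLBACK copies themselves.

```python
class Cluster:
    """Toy cluster that journals changes destined for a down AMP."""

    def __init__(self, amps):
        self.rows = {amp: {} for amp in amps}   # amp -> rows it should hold
        self.down = None
        self.darj = []                           # changes the down AMP has missed

    def fail(self, amp):
        self.down = amp
        self.darj = []                           # surviving AMPs open the journal

    def write(self, amp, row_id, value):
        if amp == self.down:
            self.darj.append((row_id, value))    # remember it for later
        else:
            self.rows[amp][row_id] = value

    def recover(self):
        for row_id, value in self.darj:          # replay in the original order
            self.rows[self.down][row_id] = value
        self.down = None
        self.darj = []                           # the journal is discarded

cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "row-a", "old")
cluster.fail(1)
cluster.write(1, "row-a", "new")        # AMP 1 is asleep and misses these changes
cluster.write(1, "row-b", "added")
while_sleeping = dict(cluster.rows[1])  # snapshot of AMP 1 before recovery
cluster.recover()

print(while_sleeping, cluster.rows[1])
```

While AMP 1 is down its rows are stale; after recovery it has been told everything that happened while it was sleeping.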


In the previous picture, there are two clusters, but notice that AMP one has failed. After failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). Also, none of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within the cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it is still fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school you can walk into the cafeteria and immediately identify the cliques (pronounced clicks) Inother words they are groups of students that hang around together because they have formed a common identityand a common bond The cliques in Teradata are similar to yet different from high school cliques

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk and, similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in the surviving node's memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, and they are accessing their own virtual disks via a different physical cable.
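
The migration described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the names `fail_over`, `clique`, and `disk_owner` are hypothetical, and the round-robin placement is an illustrative choice, not a description of how Teradata actually places migrating virtual processors.

```python
def fail_over(nodes, failed_node):
    """Move every AMP from the failed node onto the surviving nodes of its clique."""
    survivors = [name for name in nodes if name != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    after = {name: list(nodes[name]) for name in survivors}
    after[failed_node] = []
    # Deal the migrating AMPs out round-robin across the survivors (assumption).
    for i, amp in enumerate(nodes[failed_node]):
        after[survivors[i % len(survivors)]].append(amp)
    return after

# The two-node, 32-AMP clique from the text: AMPs 1-16 on node one, 17-32 on node two.
clique = {"node1": list(range(1, 17)), "node2": list(range(17, 33))}

# Each AMP owns exactly one virtual disk; ownership does not change on failover.
disk_owner = {f"vdisk{amp}": amp for amp in range(1, 33)}

after_failure = fail_over(clique, "node1")
print(len(after_failure["node2"]), "AMPs now run on node2")
```

The point of the model is that only the AMP-to-node mapping changes; `disk_owner` is never touched, which mirrors the statement that migrated AMPs keep access to their own virtual disks.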

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds." Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by a restoration that takes a couple of hours to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures, and it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
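
The two scenarios above can be modeled with a short Python sketch. It is a simplified illustration, not Teradata's journal format: the function names and the dictionary-based "table" are hypothetical, and only updates (not deletes) are captured in the after-image list.

```python
def apply_update(table, key, new_value, before_journal, after_journal):
    """Change one row, recording its before image and its after image."""
    before_journal.append((key, table.get(key)))
    table[key] = new_value
    after_journal.append((key, new_value))

def roll_back(table, before_journal):
    """Undo changes by restoring before images, newest first."""
    for key, old_value in reversed(before_journal):
        if old_value is None:
            table.pop(key, None)  # the row did not exist before the change
        else:
            table[key] = old_value

def roll_forward(backup, after_journal):
    """Rebuild current data from a backup by reapplying after images, oldest first."""
    restored = dict(backup)
    for key, new_value in after_journal:
        restored[key] = new_value
    return restored

reps = {"Smith": "Western", "Crocker": "Eastern"}
backup = dict(reps)                      # the full backup taken before the update
before, after = [], []

for name in list(reps):                  # the faulty update moves everyone
    apply_update(reps, name, "Southeast", before, after)

damaged = dict(reps)
rebuilt = roll_forward(backup, after)    # AFTER journal: backup + changes = latest state
roll_back(reps, before)                  # BEFORE journal: return to the earlier state
```

Rolling forward reproduces the most recent state from an older backup, while rolling back returns the table to how it looked before the bad update, which is why the two journals serve opposite recovery needs.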

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

    CREATE TABLE TomC.employee, FALLBACK,
        BEFORE JOURNAL,
        DUAL AFTER JOURNAL
    (
        emp        INTEGER,
        dept       INTEGER,
        lname      CHAR(20),
        fname      VARCHAR(20),
        salary     DECIMAL(10,2),
        hire_date  DATE FORMAT '...'
    )
    UNIQUE PRIMARY INDEX (emp);

The example above creates the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions, we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions, we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait a long time in a line, only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE, or READ lock can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
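
The four rules above amount to a compatibility table. The Python sketch below encodes exactly what the list states; the names `COMPATIBLE` and `can_grant` are hypothetical and this is a model of the rules, not Teradata's lock manager.

```python
ACCESS, READ, WRITE, EXCLUSIVE = "ACCESS", "READ", "WRITE", "EXCLUSIVE"

# For each lock already held on an object, the requested locks that may
# proceed alongside it. Anything not listed has to wait in the queue.
COMPATIBLE = {
    ACCESS: {ACCESS, READ, WRITE},  # an ACCESS reader blocks only EXCLUSIVE
    READ: {ACCESS, READ},           # many readers, no writers
    WRITE: {ACCESS},                # only "dirty" reads get past a writer
    EXCLUSIVE: set(),               # no access, period
}

def can_grant(held, requested):
    """Return True if `requested` can be granted while `held` is in place."""
    return requested in COMPATIBLE[held]

for held in (ACCESS, READ, WRITE, EXCLUSIVE):
    allowed = sorted(COMPATIBLE[held])
    print(f"{held:9} held -> may also grant: {allowed}")
```

Reading the table row by row shows why an ACCESS LOCK suits reporting queries that can tolerate slightly stale data: it is the only request that never waits behind a writer.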

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the matching value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
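
To see the employee and payroll rule in action, here is a small runnable sketch. It uses SQLite from Python's standard library purely as a stand-in, so the table definitions and error behavior are SQLite's, not Teradata's; the table and column names are illustrative assumptions based on the story above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.execute("CREATE TABLE employee (emp INTEGER PRIMARY KEY, lname TEXT)")
conn.execute(
    "CREATE TABLE payroll (emp INTEGER REFERENCES employee(emp), salary REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Jones')")
conn.execute("INSERT INTO payroll VALUES (1, 50000.0)")

def attempt(sql):
    """Run one statement and report whether referential integrity allowed it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

# A payroll row for an employee who does not exist is refused.
orphan_insert = attempt("INSERT INTO payroll VALUES (2, 40000.0)")

# The employee cannot be deleted while a payroll row still points at them.
early_delete = attempt("DELETE FROM employee WHERE emp = 1")

# Once the payroll row is gone, the delete goes through.
conn.execute("DELETE FROM payroll WHERE emp = 1")
late_delete = attempt("DELETE FROM employee WHERE emp = 1")

print(orphan_insert, early_delete, late_delete)
```

The order of the last two deletes is the whole lesson: the referencing row has to go first, so nobody can stay on the payroll after leaving the employee table.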

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sandbags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
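
The three phases just described (blocks arrive, rows are hashed to their owning AMP, each AMP sorts) can be sketched as follows. This is a toy model under loud assumptions: MD5 and the modulo step are stand-ins chosen for illustration, not Teradata's hashing algorithm, the sort key is the row hash rather than a full Row ID, and `row_hash` and `fastload` are hypothetical names.

```python
import hashlib

def row_hash(primary_index_value):
    """Stand-in hash of a primary index value (assumption: MD5, first 4 bytes)."""
    digest = hashlib.md5(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def fastload(blocks, amp_count):
    """Distribute rows from incoming blocks to AMPs, then sort each AMP's rows."""
    amps = [[] for _ in range(amp_count)]
    for block in blocks:                      # phase 1: a block of rows arrives
        for row in block:                     # phase 2: hash and redistribute
            amps[row_hash(row[0]) % amp_count].append(row)
    for rows in amps:                         # phase 3: each AMP sorts its own rows
        rows.sort(key=lambda row: row_hash(row[0]))
    return amps

# 100 employee rows, delivered in blocks of 10, loaded onto 4 AMPs.
rows = [(emp, f"name{emp}") for emp in range(1, 101)]
blocks = [rows[i:i + 10] for i in range(0, len(rows), 10)]
amps = fastload(blocks, amp_count=4)

for number, owned in enumerate(amps):
    print(f"AMP {number}: {len(owned)} rows")
```

Because the owning AMP is a pure function of the primary index value, any later lookup on that value can go straight to one AMP, which is the payoff of doing the redistribution at load time.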

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTs, Multiload is meant to process INSERTs, UPDATEs, and DELETEs on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTs, UPDATEs, and DELETEs are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTs, UPDATEs, or DELETEs may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it acts, in theory, like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse "rush hour." It can also be preset to load at different rates at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTs, UPDATEs, or DELETEs
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

we call it parallel processing. In the previous example, one processor listed eight best friends on its disk. In that case, Teradata would read eight rows.

The Teradata example on the next page shows two processors, each having direct access to its own physical disk. The Best_Friends table has been spread out evenly across both processors. When we ask the system for a list of best friends, both processors will retrieve data in parallel and return combined results over the connecting network. This arrangement could easily double the speed of the previous example.

Even though we still need to read eight records, each processor is only responsible for reading four records, and the other processor simultaneously reads the remaining four records. So how could we double the speed of this system again?

Teradata has Linear Scalability

"Every ceiling, when reached, becomes a floor upon which one walks and now can see a new ceiling."

Tom Stoppard

There is no ceiling on the Teradata database's ability to grow. Any time you want to double the speed, simply double the number of processors. This is called "Linear Scalability." It allows unlimited growth with minimal effects on response time. Each time a new processor is added in Teradata, a new storage disk is also added. By doing so, the system can continually grow, and there are no worries about the disk becoming the bottleneck.

Notice in the system below that there are four processors and that each is assigned two rows of data. When we ask for our Best_Friends, the system will read all eight rows. Since the data is spread evenly over four processors, Teradata reads two rows on each of the four processors simultaneously. Now the system is four times faster than the single-processor example.

Most data warehouses have tables that hold millions, even billions, of rows. Teradata allows you to decide how many processors are needed to get the desired response time. This is called the "Divide and Conquer" theory. To accommodate desired response rates, some customers have thousands of processors. Tasks are divided up between the AMPs and processed in parallel.
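
The arithmetic behind the eight-row Best_Friends example is easy to check. The sketch below is an idealized model that assumes perfectly even distribution and no coordination overhead; the function names are hypothetical.

```python
def rows_per_processor(total_rows, processors):
    """Rows the busiest processor must read when rows are spread evenly."""
    return -(-total_rows // processors)  # ceiling division

def speedup(total_rows, processors):
    """Idealized speedup over one processor: total work / busiest processor's work."""
    return total_rows / rows_per_processor(total_rows, processors)

# The Best_Friends table has eight rows.
table = {n: (rows_per_processor(8, n), speedup(8, n)) for n in (1, 2, 4)}

for n, (rows, gain) in table.items():
    print(f"{n} processor(s): {rows} rows each, {gain:.0f}x the speed of one")
```

With one, two, and four processors the busiest processor reads eight, four, and two rows, so the idealized speedups are 1x, 2x, and 4x, matching the figures in the text.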

A Logical View of the Teradata Architecture

"You are either making history, or you are history."

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director!"

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when it actually never had a chance to succeed. In ancient days, Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies, too, will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and is then given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks the SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain the information on their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.

Parsing Engine (PE)

"Even a stopped clock is right twice a day."

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards." Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and Southwestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions, some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions and Teradata returns the answers.

Access Module Processor (AMP)

"Wise men talk because they have something to say; fools talk because they have to say something."

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above: "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of few words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best way to picture an AMP is as a computer processor attached to its own disk.

Every AMP has its own disk, and it's the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk, and it shares this with no other AMP; hence, a Shared-Nothing architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP will read only the rows on its particular disk. If this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs, while others have more than 2,000.

The BYNET

"Even if you're on the right track, you'll still get run over if you just sit there."

Will Rogers

The BYNET ensures that communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of the two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax
• The PE checks the user's security rights
• The PE comes up with a plan for the AMPs to follow
• The PE passes the plan along to the AMPs over the BYNET
• The AMPs follow the plan and retrieve the data requested
• The AMPs pass the data to the PE over the BYNET, and
• The PE then passes the final data to the user
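
The steps above can be traced in a small Python model. Everything here is a hypothetical illustration: `run_request`, the `rights` mapping, and the per-AMP dictionaries are invented for the sketch, the syntax check is left out, and the "plan" is always a simple scan of the requested table.

```python
def run_request(user, table, rights, amp_disks):
    """Model of a PE handling one request against a set of AMPs."""
    # Security check: reject users without rights on the table.
    if table not in rights.get(user, set()):
        raise PermissionError(f"{user} has no access rights on {table}")

    # The PE builds a plan (assumption: always a full scan of one table).
    plan = {"action": "scan", "table": table}

    # Each AMP follows the plan against its own disk only, then returns its rows.
    partial_results = [disk.get(plan["table"], []) for disk in amp_disks]

    # The PE combines the partial results and hands them to the user.
    return [row for rows in partial_results for row in rows]

# Two AMPs, each holding four of the eight Best_Friends rows on its own disk.
amp_disks = [
    {"Best_Friends": ["friend1", "friend2", "friend3", "friend4"]},
    {"Best_Friends": ["friend5", "friend6", "friend7", "friend8"]},
]
rights = {"tom": {"Best_Friends"}}

answer = run_request("tom", "Best_Friends", rights, amp_disks)
print(len(answer), "rows returned")

try:
    run_request("stranger", "Best_Friends", rights, amp_disks)
    outcome = "allowed"
except PermissionError:
    outcome = "rejected"
```

In this model the AMPs are visited one after another for simplicity; in the architecture the text describes, they work on their own disks at the same time.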

Teradata Building Block Approach

"Better a diamond with a flaw than a pebble without one."

Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array, where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

"Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them."

Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, CustomerName, Item No, and Quantity.

An example of each table follows:

CUSTOMER TABLE called CustomerTable

CustomerID (PK)  CustomerName  CityName  Order Number (FK)  Customer Rep
1001             JC Penney     Dallas    105372             Dreyer
1002             Office Depot  Columbia  105799             Crocker
1003             Dillard's     Atlanta   106227             Smith

ORDER TABLE called OrderTable

Order Number (PK)  Order Date  Item No  Quantity  Customer ID (FK)
105372             03/07/2001  212      20        1001
105799             04/18/2001  296      52        1002
106227             10/17/2001  325      17        1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest unit that can be inserted into a table. A column is the smallest unit within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null or are sometimes represented by a

One column, or combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                        FOREIGN KEY
Not optional                       Optional
Comprised of one or more columns   Comprised of one or more columns
Can only have one PK per table     Can have multiple FKs per table
No duplicates allowed              Duplicates allowed
No changes allowed                 Changes allowed
No nulls allowed                   Nulls allowed
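
The PK-to-FK match described above can be tried out in a few lines. What follows is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in (it is not Teradata), loads a trimmed-down copy of the two sample tables, and joins them on CustomerID. The column names without spaces and the variable names are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerTable ("
    " CustomerID INTEGER PRIMARY KEY,"
    " CustomerName TEXT, CityName TEXT)"
)
conn.execute(
    "CREATE TABLE OrderTable ("
    " OrderNumber INTEGER PRIMARY KEY,"
    " OrderDate TEXT, ItemNo INTEGER, Quantity INTEGER,"
    " CustomerID INTEGER REFERENCES CustomerTable(CustomerID))"
)
conn.executemany(
    "INSERT INTO CustomerTable VALUES (?, ?, ?)",
    [(1001, "JC Penney", "Dallas"),
     (1002, "Office Depot", "Columbia"),
     (1003, "Dillards", "Atlanta")],
)
conn.executemany(
    "INSERT INTO OrderTable VALUES (?, ?, ?, ?, ?)",
    [(105372, "03/07/2001", 212, 20, 1001),
     (105799, "04/18/2001", 296, 52, 1002),
     (106227, "10/17/2001", 325, 17, 1003)],
)

# JOIN: match the PK of CustomerTable to the FK in OrderTable.
joined = conn.execute(
    "SELECT c.CustomerName, o.OrderNumber, o.Quantity"
    " FROM CustomerTable c"
    " JOIN OrderTable o ON o.CustomerID = c.CustomerID"
    " ORDER BY o.OrderNumber"
).fetchall()
for row in joined:
    print(row)
```

Each output row pairs a customer name from one table with the order details from the other, which is all a join really does.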

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way:

First, when each AMP has nearly the same quantity of table rows, then no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at nearly the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the rows of the Customer table and the rows of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.


There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON:

CREATE TABLE TomC.employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
UNIQUE PRIMARY INDEX(emp, dept);

Being related hardly insures relatability

Michael E Angier


All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEPs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8, and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
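
The two simulated diagrams above follow one simple pattern, and a short script can reproduce them. This is only a sketch of the simulation in the diagrams: the function name, the 48-bucket size, and the round-robin fill are assumptions taken from the pictures, not a statement of how the real 65,536-bucket map is populated.

```python
def build_hash_map(num_amps, num_buckets=48):
    """Return a list where entry i holds the AMP number for bucket i + 1.

    AMP numbers are handed out 1, 2, ..., num_amps and then start over,
    exactly as in the simulated diagrams (an assumption, not the real map).
    """
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]


four_amp_map = build_hash_map(4)
eight_amp_map = build_hash_map(8)

# Print each map six buckets per row, like the diagrams.
for hash_map in (four_amp_map, eight_amp_map):
    for start in range(0, len(hash_map), 6):
        print(*hash_map[start:start + 6])
    print()
```

Because every AMP number appears in the same number of buckets, rows that land evenly across the buckets also land evenly across the AMPs.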

How the Hash Map and Primary Index Work Together

Choice not chance determines destiny

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the "Wiz-Bang Formula" (a secret NCR formula). First we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even-tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.


Let's take our first row and determine on which AMP it will reside. Remember, we will take the Primary Index value of the row, run it through the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside. Here is our first row:

Friend_Num  Friend_Name
2           Ben Hon

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (2) through the Coffing/Jones Wiz-Bang Formula (divide by 2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row:

Friend_Num  Friend_Name
16          Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (16) through the Coffing/Jones Wiz-Bang Formula (divide by 2), and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. The purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and it will be consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.

If you always do what you always did, you'll always get what you always got.

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.
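
For readers who like to see the walk-through in code, here is a minimal sketch of the Coffing/Jones divide-by-2 formula applied to all eight friends. It models only the teaching formula, not the real (secret) hashing algorithm; the names HASH_MAP, toy_bucket, and amp_for are made up for the example, and the 48-bucket map is the simulated four-AMP diagram.

```python
from collections import defaultdict

NUM_AMPS = 4
# Simulated four-AMP hash map: bucket k (counting from 1) holds an AMP number.
HASH_MAP = [(i % NUM_AMPS) + 1 for i in range(48)]

BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}


def toy_bucket(primary_index_value):
    """Coffing/Jones teaching formula: divide the Primary Index value by 2."""
    return primary_index_value // 2


def amp_for(primary_index_value):
    """Look inside the bucket the formula points to and return its AMP."""
    bucket = toy_bucket(primary_index_value)
    return HASH_MAP[bucket - 1]  # buckets are numbered from 1


rows_by_amp = defaultdict(list)
for friend_num in BEST_FRIENDS:
    rows_by_amp[amp_for(friend_num)].append(friend_num)

for amp in sorted(rows_by_amp):
    print(f"AMP {amp}: {rows_by_amp[amp]}")
```

Running amp_for on the same value always returns the same AMP, which is the consistency the text describes: the formula that places a row is the same one that finds it again.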

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;


The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used and merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
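
The one-AMP retrieval just described can be sketched the same way. Again, this is only an illustration under the teaching formula: the per-AMP dictionaries stand in for each AMP's disk, and select_by_primary_index is a hypothetical helper, not a Teradata call.

```python
NUM_AMPS = 4
HASH_MAP = [(i % NUM_AMPS) + 1 for i in range(48)]  # simulated four-AMP map


def amp_for(pi_value):
    """Teaching formula: PI value divided by 2 gives the bucket (from 1)."""
    return HASH_MAP[pi_value // 2 - 1]


# Load: each AMP keeps only the rows that hash to it.
amps = {amp: {} for amp in range(1, NUM_AMPS + 1)}
for friend_num, name in [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]:
    amps[amp_for(friend_num)][friend_num] = name


def select_by_primary_index(friend_num):
    """Rerun the formula and ask only the one AMP that owns the row."""
    target_amp = amp_for(friend_num)
    return target_amp, amps[target_amp].get(friend_num)


amp_used, friend_name = select_by_primary_index(8)
print(amp_used, friend_name)
```

Only one dictionary is ever consulted, no matter how many AMPs the system has. That is why Primary Index access stays fast as the system grows.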

The Full Table Scan

What matters is not the size of the dog in the fight but the size of the fight in the dog

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond "NO!" After we complete training, they respond "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's abilities to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PEP. This process is 10 times faster than most systems. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
6           Mary Gray
14          Kyle Marx
8           John Davis
16          Lyn Jones
2           Ben Hon
10          Don Roy
4           Joe Davis
12          Sam Mills
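
The Full Table Scan can be pictured as every AMP doing the same small job at the same time. Below is a minimal sketch that uses Python threads as stand-ins for AMPs. The AMPS dictionary reuses the row layout produced by the teaching formula, and scan_amp and full_table_scan are hypothetical names for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

# Rows as laid out on the four AMPs by the divide-by-2 teaching formula.
AMPS = {
    1: [(2, "Ben Hon"), (10, "Don Roy")],
    2: [(4, "Joe Davis"), (12, "Sam Mills")],
    3: [(6, "Mary Gray"), (14, "Kyle Marx")],
    4: [(8, "John Davis"), (16, "Lyn Jones")],
}


def scan_amp(amp_number):
    """Each AMP reads only the rows it owns."""
    return list(AMPS[amp_number])


def full_table_scan():
    """Hand every AMP the same plan, then collect what each one returns."""
    with ThreadPoolExecutor(max_workers=len(AMPS)) as pool:
        per_amp_rows = pool.map(scan_amp, AMPS)  # one task per AMP
        return [row for amp_rows in per_amp_rows for row in amp_rows]


result = full_table_scan()
print(len(result), "rows returned")
for friend_num, friend_name in result:
    print(friend_num, friend_name)
```

The sketch happens to collect rows in AMP-number order. The answer set shown above arrived in a different order, which is a reminder that rows in a table are an unordered set unless the query asks for a sort.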


In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on "KNOWN QUERIES," or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It tells the AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query asks "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (smiley face in this example) is revealed and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!
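
The two-AMP USI lookup can be sketched with the same toy machinery. Several things here are assumptions made purely for illustration: the character-code hash in amp_for_name is invented (so the sub-table row will not necessarily land on the AMP shown in the book's picture), the pointer stored in the sub-table is simply the Friend_Num rather than a real row-id, and select_by_usi is a hypothetical helper.

```python
NUM_AMPS = 4
HASH_MAP = [(i % NUM_AMPS) + 1 for i in range(48)]  # simulated four-AMP map

FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]


def amp_for_pi(friend_num):
    """Teaching formula for the Primary Index: divide by 2, read the bucket."""
    return HASH_MAP[friend_num // 2 - 1]


def amp_for_name(name):
    """Invented stand-in hash for character data: sum of character codes."""
    return HASH_MAP[sum(ord(ch) for ch in name) % len(HASH_MAP)]


base_rows = {amp: {} for amp in range(1, NUM_AMPS + 1)}
usi_subtable = {amp: {} for amp in range(1, NUM_AMPS + 1)}
for friend_num, name in FRIENDS:
    base_rows[amp_for_pi(friend_num)][friend_num] = name
    # Sub-table row: index value plus a pointer back to the base row.
    usi_subtable[amp_for_name(name)][name] = friend_num


def select_by_usi(name):
    """Step 1: the AMP holding the sub-table row. Step 2: the base-row AMP."""
    index_amp = amp_for_name(name)
    friend_num = usi_subtable[index_amp].get(name)
    if friend_num is None:
        return None, [index_amp]
    base_amp = amp_for_pi(friend_num)
    return (friend_num, base_rows[base_amp][friend_num]), [index_amp, base_amp]


row, amps_touched = select_by_usi("Ben Hon")
print(row, amps_touched)
```

However many AMPs the system has, the lookup consults at most two of them: one for the sub-table row and one for the base row.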

Join Indexes

A bend in the road is not the end of the road unless you fail to make the turn

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.
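
The idea of a pre-joined table that keeps itself current can be sketched in plain Python. This is a conceptual toy only, not how Teradata implements a join index: the three dictionaries and the function names are assumptions made for the example.

```python
customers = {}   # CustomerID -> CustomerName
orders = {}      # OrderNumber -> (CustomerID, Quantity)
join_index = {}  # OrderNumber -> (CustomerName, Quantity), stored pre-joined


def insert_customer(customer_id, name):
    customers[customer_id] = name
    # Keep the pre-joined rows current for orders already on file.
    for order_number, (cust_id, quantity) in orders.items():
        if cust_id == customer_id:
            join_index[order_number] = (name, quantity)


def insert_order(order_number, customer_id, quantity):
    orders[order_number] = (customer_id, quantity)
    if customer_id in customers:
        join_index[order_number] = (customers[customer_id], quantity)


def join_customers_orders():
    """Answer the join from the pre-joined rows instead of re-joining."""
    return sorted((num, name, qty) for num, (name, qty) in join_index.items())


insert_customer(1001, "JC Penney")
insert_order(105372, 1001, 20)
insert_order(105799, 1002, 52)         # customer 1002 is not loaded yet
insert_customer(1002, "Office Depot")  # the pre-joined data catches up
result = join_customers_orders()
print(result)
```

The user-facing question never mentions join_index. The maintenance work happens on every insert, which is the trade-off the text describes: faster joins in exchange for extra space and upkeep.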


Teradata Databases, Users and Space

Overview

Choose a job you like and you will never have to work a day of your life

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a "Dictionary" to users who want to look up system information and as a "Directory" to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan." The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So, if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from Tom.
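
The "who owns Tom?" question is just a walk up the family tree. Here is a minimal sketch that records each object's immediate parent, using the users and databases named in this chapter, and then climbs to the top. The PARENT dictionary and the function names are assumptions for illustration, not Teradata dictionary tables.

```python
# Immediate parent of each object in the sample hierarchy; DBC sits at the top.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners_of(name):
    """Return every object above `name`, nearest parent first."""
    owners = []
    while name in PARENT:
        name = PARENT[name]
        owners.append(name)
    return owners


def children_of(name):
    """Return every object directly or indirectly below `name`."""
    return sorted(child for child in PARENT if name in owners_of(child))


print(owners_of("Tom"))
print(children_of("SYSDBA"))
```

Anyone returned by owners_of("Tom") is in a position to GRANT or REVOKE Tom's rights, which matches the answer given above.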


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space,
• Spool Space, and
• Temp Space.

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts. Then the user is "out of spool space."

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

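Perm space is actually used to store real data, such as tables. (Views and macros take no perm space, because only their definitions are stored, and those live in the Data Dictionary.) If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.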

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
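
To make the credit-line idea concrete, here is a minimal sketch of a query-only user who gets spool but no perm space, followed by a later increase in that user's spool limit. The sizes and the password are illustrative assumptions, and we assume the MRKT database already exists. The last statement reads the DBC.DiskSpace dictionary view to compare the limits with actual usage.

```sql
/* A user who only runs queries: no perm space, just spool. */
CREATE USER Mary FROM MRKT AS
    PERM     = 0,
    SPOOL    = 50000000,
    PASSWORD = ChangeMe_1;

/* Raising the "credit limit" later, when Mary's queries run out of spool. */
MODIFY USER Mary AS SPOOL = 100000000;

/* Limits versus usage, summed over all AMPs. */
SELECT DatabaseName,
       SUM(MaxPerm)     AS perm_limit,
       SUM(CurrentPerm) AS perm_used,
       SUM(MaxSpool)    AS spool_limit,
       SUM(PeakSpool)   AS spool_peak
FROM   DBC.DiskSpace
WHERE  DatabaseName IN ('MRKT', 'Mary')
GROUP  BY DatabaseName
ORDER  BY DatabaseName;
```

Because DBC.DiskSpace keeps one row per AMP for each database or user, the SUMs roll those rows up into system-wide totals.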

The following picture shows a logical view of a Customer table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. Plus, when the answer is returned, the spool is released.
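
The request described above would look something like the statement below. This is our reconstruction from the description, not a copy of the text in the picture.

```sql
/* All columns for employees in department 10.
   Each qualifying row is copied to spool before the answer is returned. */
SELECT *
FROM   Employee
WHERE  Dept = 10;
```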

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are a natural choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored again on the disks, so a view does not duplicate data or take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp    Dept    Lname       Fname     Sal
1      10      Johnson     Manny     100000
2      20      Carlsbad    Jan       100000
22     30      Winter      Steve     77000
25     10      Lester      Bonnie    56000
33     10      Samuels     Todd      120000
99     20      Walter      Misha     104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.
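
A minimal sketch of that arrangement follows. The user name Mary is only an example, and we assume the view and the table live in the same database and that Mary has never been granted anything on the Employee table itself.

```sql
/* Mary may look through the "store window" ... */
GRANT SELECT ON Employ_V TO Mary;

/* ... so this request, run by Mary, works: */
SELECT * FROM Employ_V;

/* ... while this one, run by Mary, is rejected for lack of access rights,
   because no SELECT right on the base table was ever granted to her: */
SELECT Sal FROM Employee;
```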

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp    Dept    Lname       Fname
1      10      Johnson     Manny
2      20      Carlsbad    Jan
22     30      Winter      Steve
25     10      Lester      Bonnie
33     10      Samuels     Todd
99     20      Walter      Misha

What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
    SELECT * FROM Employ_v WHERE dept = 10;
    SELECT * FROM Employ_v WHERE dept = 20;
    SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                            Macros
• We select from views                           • We execute macros
• Uses the keyword AS                            • Uses the keyword AS
• Definition is stored in the Data Dictionary    • Definition is stored in the Data Dictionary
• Accesses certain portions of the data          • Accesses the real data itself
• Is changed using the keyword REPLACE           • Is changed using the keyword REPLACE
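
The last row of the chart can be sketched as follows. The new definitions are our own illustrations: the view is narrowed to department 10 and the macro is trimmed to a single report.

```sql
/* Change the view: same columns, but now only department 10 shows through. */
REPLACE VIEW Employ_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM   Employee
WHERE  Dept = 10;

/* Change the macro: keep only the sorted list. */
REPLACE MACRO Emp_mac AS (
    SELECT * FROM Employ_V ORDER BY Lname;
);

/* Views are selected from; macros are executed. */
SELECT * FROM Employ_V;
EXECUTE Emp_mac;
```

In both cases only the definition in the Data Dictionary changes; no rows in the Employee table are touched.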

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough … security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it … nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
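
An explicit right is simply the result of a GRANT statement. The sketch below assumes the users Tom and Mary from the hierarchy and is submitted by Tom or by one of his owners. The final query reads the DBC.AllRights dictionary view, where access rights appear as short codes.

```sql
/* Explicitly let Mary create tables inside Tom's space. */
GRANT CREATE TABLE ON Tom TO Mary;

/* See what Mary now holds on Tom, and who granted it. */
SELECT UserName,
       DatabaseName,
       TableName,
       AccessRight,
       GrantorName
FROM   DBC.AllRights
WHERE  UserName     = 'Mary'
AND    DatabaseName = 'Tom';

/* An owner can take the privilege back at any time. */
REVOKE CREATE TABLE ON Tom FROM Mary;
```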

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (as in a MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal." Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, for the Transient Journal, "Cleanliness is next to Godliness." If it is clean, then things went well.
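
Besides a macro, another way to bundle several statements into one all-or-nothing unit is an explicit transaction. The sketch below assumes a session running in Teradata (BTET) transaction mode and reuses the Employee table from the earlier examples; the 10% raise is just an illustration.

```sql
/* BT begins the transaction. From here on, the Transient Journal
   keeps a BEFORE image of every row that gets changed. */
BT;

UPDATE Employee SET Sal = Sal * 1.10 WHERE Dept = 10;
UPDATE Employee SET Sal = Sal * 1.10 WHERE Dept = 20;

/* ET ends the transaction: the work is committed and the
   before-images are no longer needed. */
ET;
```

If either UPDATE fails, or the session is lost before ET, the before-images are used to put every changed row back the way it was, so nobody ends up with half of the raises applied.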

FALLBACK Protection

"I asked my dentist if I had to floss all my teeth, and he responded, 'No, just the ones you want to keep.'"

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2,000 AMPs. Therefore, the chance of losing two AMPs in a 2,000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
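
Teradata exposes both hash maps through SQL functions, so you can ask where a row and its FALLBACK copy would land. The sketch below uses the TomC.employee table, whose Primary Index is the emp column; substitute any table and its Primary Index column(s). The AMP numbers returned depend entirely on your system's configuration.

```sql
/* HASHROW    : Primary Index value  -> row hash
   HASHBUCKET : row hash             -> hash bucket
   HASHAMP    : bucket -> AMP in the Primary Hash Map
   HASHBAKAMP : bucket -> AMP in the Fallback Hash Map */
SELECT HASHAMP(HASHBUCKET(HASHROW(emp)))    AS primary_amp,
       HASHBAKAMP(HASHBUCKET(HASHROW(emp))) AS fallback_amp,
       COUNT(*)                             AS row_count
FROM   TomC.employee
GROUP  BY 1, 2
ORDER  BY 1, 2;
```

On a clustered system you would expect each fallback_amp to differ from its primary_amp while belonging to the same cluster.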

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two and one AMP down, every complex query will take twice as long to process.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)
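
Adding or dropping FALLBACK on an existing table might look like the sketch below. The table name TomC.employee is just the example table used in this chapter.

```sql
/* Drop FALLBACK: the duplicate rows are removed and the space is freed. */
ALTER TABLE TomC.employee, NO FALLBACK;

/* Add FALLBACK back: a second copy of every row is built on
   another AMP in the same cluster. */
ALTER TABLE TomC.employee, FALLBACK;

/* Display the table definition to confirm the current setting. */
SHOW TABLE TomC.employee;
```

Keep in mind that adding FALLBACK to a large table means writing a copy of every row, so it is not an instant operation.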

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, with all subsequent changes to that AMP's rows applied to their FALLBACK copies on the other AMPs in the cluster.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be used on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space (half of the raw capacity holds the mirror copies).

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
   (emp        INTEGER,
    dept       INTEGER,
    lname      CHAR(20),
    fname      VARCHAR(20),
    salary     DECIMAL(10,2),
    hire_date  DATE)
UNIQUE PRIMARY INDEX (emp);

The example above created the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

You just obey instructions, we'll take care of the obstructions

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees and buildings below. But the local air traffic controller commanded him, "You just obey instructions, we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
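The four modes above can be summarized as a small compatibility table. The Python sketch below is a hypothetical model built from the descriptions in this list; the names BLOCKS and must_wait are invented for illustration, and the entry saying that a held ACCESS lock makes an EXCLUSIVE request wait is an assumption that goes slightly beyond what the list states.

```python
# Hypothetical model of the four lock modes described above.
# BLOCKS[held] is the set of requested modes that must wait behind a held lock.
BLOCKS = {
    "EXCLUSIVE": {"EXCLUSIVE", "WRITE", "READ", "ACCESS"},  # prevents any access
    "WRITE":     {"EXCLUSIVE", "WRITE", "READ"},            # only ACCESS gets past
    "READ":      {"EXCLUSIVE", "WRITE"},                    # other readers are welcome
    "ACCESS":    {"EXCLUSIVE"},                             # assumption, see lead-in
}

def must_wait(held, requested):
    """Return True if a request for `requested` has to queue behind `held`."""
    return requested in BLOCKS[held]

for held in BLOCKS:
    waiting = sorted(BLOCKS[held])
    print(f"{held:9} lock held -> these requests wait: {', '.join(waiting)}")
```

For example, must_wait("WRITE", "ACCESS") is False, which is the dirty read case from item 2, while must_wait("READ", "READ") is also False, which is why a thousand users can SELECT at once.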

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation! Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was first deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
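The two RI rules, no insert without a matching parent value and no delete while a referencing row remains, can be illustrated with a short Python sketch. Everything here (the employee and payroll collections, the function names, and the exception class) is hypothetical and only mimics the behavior described above; it is not how Teradata implements RI.

```python
# Hypothetical sketch of Referential Integrity between two tables.
class ReferentialIntegrityError(Exception):
    """Raised when an insert or delete would break the parent/child link."""

employee = {101, 102}                    # parent table: employee numbers
payroll = [(101, 5000), (102, 6200)]     # child table: (employee number, salary)

def insert_payroll(emp, salary):
    # A payroll row may only reference an employee that exists.
    if emp not in employee:
        raise ReferentialIntegrityError(f"employee {emp} does not exist")
    payroll.append((emp, salary))

def delete_employee(emp):
    # An employee may only be deleted once no payroll row references it.
    if any(fk == emp for fk, _ in payroll):
        raise ReferentialIntegrityError(f"employee {emp} is still on the payroll")
    employee.discard(emp)

def delete_payroll(emp):
    payroll[:] = [row for row in payroll if row[0] != emp]

# The "fired but still paid" oversight is rejected until payroll is cleaned up.
try:
    delete_employee(101)
except ReferentialIntegrityError as err:
    print("rejected:", err)

delete_payroll(101)
delete_employee(101)
print(sorted(employee), payroll)
```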


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that, just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel) and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
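The three phases just described (hand out blocks, hash and redistribute, sort) can be mimicked in a short Python sketch. This is a toy model under stated assumptions: the number of AMPs, the tiny block size, and the CRC32-based row_hash function are all stand-ins chosen for illustration, since the real hashing algorithm and block handling are internal to Teradata.

```python
import zlib

NUM_AMPS = 3      # assumed system size for illustration
BLOCK_ROWS = 4    # stand-in for a 64K block of rows

def row_hash(primary_index_value):
    # Placeholder hash; Teradata's actual hashing algorithm is not shown here.
    return zlib.crc32(str(primary_index_value).encode())

def fastload(flat_file_rows):
    # Phase 1: hand blocks of rows to the AMPs in turn, wherever they belong.
    received = [[] for _ in range(NUM_AMPS)]
    for start in range(0, len(flat_file_rows), BLOCK_ROWS):
        amp = (start // BLOCK_ROWS) % NUM_AMPS
        received[amp].extend(flat_file_rows[start:start + BLOCK_ROWS])

    # Phase 2: each AMP hashes its rows and sends each one to the owning AMP.
    owned = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for row in amp_rows:
            owned[row_hash(row[0]) % NUM_AMPS].append(row)

    # Phase 3: each AMP sorts the rows it now owns by their hash value.
    for amp_rows in owned:
        amp_rows.sort(key=lambda row: row_hash(row[0]))
    return owned

rows = [(emp, f"name{emp}") for emp in range(1, 21)]   # (emp, lname) sample rows
amps = fastload(rows)
for number, amp_rows in enumerate(amps):
    print(f"AMP {number} owns {len(amp_rows)} rows")
```

In this sketch the loops run one after another; on a real system the AMPs would do the hashing and sorting at the same time.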

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

You don't drown by falling into the water; you drown by staying in the water

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


A Logical View of the Teradata Architecture

You are either making history or you are history

Leonard Sweet

A frustrated choral director was preparing for a concert, then suddenly stopped and said, "I've got to tell you, eight years ago I was directing another choir in this anthem and they made the same mistake you're making." He continued, "Do any of you have a clue as to what the mistake is?" Just then a voice from the choir called out, "Same director."

Many data warehouse environments have an architecture that is not designed for Decision Support, yet company officials wonder why their data warehouse failed when they actually never had a chance to succeed. In ancient days Solomon wrote, "Where there is no vision, the people perish." It is no different today. Company leaders must cast a new vision that enables Decision Support with technology that can handle it, or their companies too will be history.

The following picture shows a logical view of Teradata. The illustration shows a proper architecture for a data warehouse. In the example, a user logs on to Teradata from a LAN or mainframe host and then is given a session with a Parsing Engine processor (PE). The user then asks a specific query using SQL.

The PE checks SQL syntax, then checks to see if the user has proper rights (authority) to access the table. Next, the PE creates a plan for the Access Module Processors (AMPs) to execute. The PE passes the plan to the AMPs over the BYNET. The AMPs obtain information on their disks, then pass it to the PE over the BYNET. The PE then passes the data back to the user.


Parsing Engine (PE)

Even a stopped clock is right twice a day

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards." Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T and SouthWestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions, some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions and Teradata returns the answers.

Access Module Processor (AMP)

Wise men talk because they have something to say; fools talk because they have to say something

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above, "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of few words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best example is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it's the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk and it shares this with no other AMP, hence a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP will read the rows only on its particular disk. If this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs, while others have more than 2000.

The BYNET

Even if you're on the right track, you'll still get run over if you just sit there

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax
• The PE checks the user's security rights
• The PE comes up with a plan for the AMPs to follow
• The PE passes the plan along to the AMPs over the BYNET
• The AMPs follow the plan and retrieve the data requested
• The AMPs pass the data to the PE over the BYNET, and
• The PE then passes the final data to the user
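The steps above can be sketched as a toy Python model. All names here (AMP_DISKS, ACCESS_RIGHTS, parsing_engine) are hypothetical, the "plan" is reduced to a simple row filter, the SQL syntax check is left out, and a plain loop stands in for the AMPs working in parallel over the BYNET.

```python
# Toy model: each AMP owns a private list of rows (its "disk").
AMP_DISKS = [
    [{"emp": 1, "dept": 10}, {"emp": 5, "dept": 20}],   # AMP 0
    [{"emp": 2, "dept": 10}],                           # AMP 1
    [{"emp": 3, "dept": 30}, {"emp": 4, "dept": 10}],   # AMP 2
]

# Which tables each user may read (hypothetical security rights).
ACCESS_RIGHTS = {"tom": {"employee"}}

def parsing_engine(user, table, predicate):
    # Step: check the user's security rights before doing any work.
    if table not in ACCESS_RIGHTS.get(user, set()):
        raise PermissionError(f"{user} may not access {table}")

    # Step: build a plan. Here the plan is just the row filter itself.
    plan = predicate

    # Step: every AMP follows the plan against its own disk only.
    partial_results = [[row for row in disk if plan(row)] for disk in AMP_DISKS]

    # Step: the PE collects the partial answers and returns them to the user.
    return [row for part in partial_results for row in part]

answer = parsing_engine("tom", "employee", lambda row: row["dept"] == 10)
print(answer)
```

Note that no AMP ever reads another AMP's list, which mirrors the Shared-Nothing idea described earlier.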

Teradata Building Block Approach

Better a diamond with a flaw than a pebble without one


Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array, where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs

Teradata Tables

Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them.


Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about ... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, CustomerName, Item No and Quantity.

An example of each table follows

CUSTOMER TABLE called CustomerTable

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillards       Atlanta    106227              Smith

ORDER TABLE called OrderTable

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an "extended family," joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to "files," "records" and "fields." Relational databases use the terms "Tables," "Rows" and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest unit that can be inserted into a table. A column is the smallest unit within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null, or are sometimes represented by a "?".

One column, or combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (e.g., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.
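As an illustration of matching a primary key to a foreign key, the Python sketch below joins the sample rows from the CustomerTable and OrderTable shown earlier. It is a conceptual stand-in for a JOIN, using a dictionary and a list chosen for illustration rather than actual Teradata SQL.

```python
# CustomerTable keyed by its primary key, CustomerID.
customers = {
    1001: "JC Penney",
    1002: "Office Depot",
    1003: "Dillards",
}

# OrderTable rows: (Order Number PK, Order Date, Item No, Quantity, Customer ID FK).
orders = [
    (105372, "03/07/2001", 212, 20, 1001),
    (105799, "04/18/2001", 296, 52, 1002),
    (106227, "10/17/2001", 325, 17, 1003),
]

# Join: for each order, follow its foreign key to the matching customer row.
joined = [
    (customers[customer_id], order_number, quantity)
    for order_number, _order_date, _item_no, quantity, customer_id in orders
    if customer_id in customers
]

for customer_name, order_number, quantity in joined:
    print(customer_name, order_number, quantity)
```

Because CustomerID is unique in the CustomerTable, each order matches at most one customer, which is what makes the primary key a dependable join target.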

Here is a quick reference chart for Primary and Foreign Keys

PRIMARY KEY                          FOREIGN KEY
Not optional                         Optional
Comprised of one or more columns     Comprised of one or more columns
Can only have one PK per table       Can have multiple FKs per table
No duplicates allowed                Duplicates allowed
No changes allowed                   Changes allowed
No nulls allowed                     Nulls allowed

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way

First, when each AMP has nearly the same quantity of table rows, then no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind, the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer and Order. Notice each AMP holds a portion of the rows for every table. AMP1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In the same way, the rows of the Customer table and the rows of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.
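To make the first direction concrete, here is a hedged Python sketch of how a Primary Index value could decide which AMP receives a row. The MD5-based amp_for function, the four-AMP system, and the sample employee rows are assumptions for illustration only; Teradata's actual hashing algorithm and hash map work differently in detail.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4   # assumed system size for illustration

def amp_for(primary_index_value):
    """Map a Primary Index value to an AMP number (placeholder hashing)."""
    digest = hashlib.md5(str(primary_index_value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS

# 1000 hypothetical employees spread over 3 departments: (emp, dept).
rows = [(emp, emp % 3) for emp in range(1, 1001)]

# Primary Index on emp: every value is different, so rows scatter widely.
by_emp = Counter(amp_for(emp) for emp, _dept in rows)

# Primary Index on dept: only 3 distinct values, so rows pile up on few AMPs.
by_dept = Counter(amp_for(dept) for _emp, dept in rows)

print("PI on emp :", dict(sorted(by_emp.items())))
print("PI on dept:", dict(sorted(by_dept.items())))
```

The same value always hashes to the same AMP, which is also why a lookup by Primary Index value can go straight to one AMP instead of asking all of them.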


There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

CREATE TABLE employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON:

CREATE TABLE TomC.employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
UNIQUE PRIMARY INDEX(emp, dept);

Being related hardly insures relatability

Michael E Angier

All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked! Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture below shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
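
The two diagrams can be reproduced with a few lines of Python. This is only a sketch of the simulated maps shown here: the 48-bucket size matches the diagrams rather than the real 65,536 buckets, and the function names build_hash_map and print_hash_map are made up for illustration.

```python
# Toy model of the simulated hash maps in the diagrams. The real hash map
# has 65,536 buckets; 48 buckets (8 rows of 6) is used here only to match
# the pictures. Function names are hypothetical.

def build_hash_map(num_amps, num_buckets=48):
    """Return {bucket_number: amp_number}, numbering both from 1."""
    return {bucket: (bucket - 1) % num_amps + 1
            for bucket in range(1, num_buckets + 1)}


def print_hash_map(hash_map, per_row=6):
    """Print the AMP numbers bucket by bucket, six to a row."""
    amps = [hash_map[bucket] for bucket in sorted(hash_map)]
    for start in range(0, len(amps), per_row):
        print(" ".join(str(amp) for amp in amps[start:start + per_row]))


four_amp_map = build_hash_map(4)
eight_amp_map = build_hash_map(8)

print_hash_map(four_amp_map)
print()
print_hash_map(eight_amp_map)
```

Each bucket holds nothing but an AMP number, which is why the same function serves any number of AMPs.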

How the Hash Map and Primary Index Work Together

Choice not chance determines destiny

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the Whiz-Bang Formula (a secret NCR formula). First we'll explain the theory, and then we will invent our own Whiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Whiz-Bang Formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Whiz-Bang Formula: take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we take the Primary Index value of the row, apply the Coffing/Jones Whiz-Bang Formula (divide by 2), and the answer points to a bucket in the hash map. Inside that bucket is the number of the AMP on which the row will reside.

Friend_Num  Friend_Name
2           Ben Hon

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (2) through the Coffing/Jones Whiz-Bang Formula (divide by 2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row:

Friend_Num  Friend_Name
16          Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (16) through the Coffing/Jones Whiz-Bang Formula (divide by 2), and the answer is:

16 divided by 2 = 8

Thus the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
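
The whole layout can be worked out by applying the divide-by-2 formula to every row. The sketch below does exactly that for the eight Best_Friends rows on the four-AMP hash map. It models only the book's stand-in formula, not the real Teradata hashing formula, and the names whiz_bang and distribute are invented for this illustration.

```python
# Sketch of row distribution using the book's stand-in formula
# (divide the Primary Index value by 2). Not the real Teradata hash.

BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]

NUM_AMPS = 4
# Buckets are numbered from 1; each bucket holds only an AMP number.
HASH_MAP = {bucket: (bucket - 1) % NUM_AMPS + 1 for bucket in range(1, 49)}


def whiz_bang(primary_index_value):
    """Stand-in formula: Primary Index value divided by 2 gives the bucket."""
    return primary_index_value // 2


def distribute(rows):
    """Send each row to the AMP named in the bucket its PI value points to."""
    amps = {amp: [] for amp in range(1, NUM_AMPS + 1)}
    for friend_num, friend_name in rows:
        bucket = whiz_bang(friend_num)
        amps[HASH_MAP[bucket]].append((friend_num, friend_name))
    return amps


layout = distribute(BEST_FRIENDS)
for amp, rows in layout.items():
    print(f"AMP {amp}: {rows}")
```

Under these assumptions every AMP ends up with two rows, which is the even spread the hash map is meant to achieve.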

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Whiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply, you should understand that the formula is mathematical (similar to the Coffing/Jones Whiz-Bang Formula) and it will be consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.

If you always do what you always did, you'll always get what you always got.

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, and it merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
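
The retrieval path can be sketched the same way as the load. The example below assumes the rows sit where the divide-by-2 formula puts them on a four-AMP system, and it reruns that formula to go straight to one AMP. The names AMP_STORAGE and select_by_primary_index are hypothetical, and the formula is the book's stand-in, not the real hash.

```python
# Sketch of a Primary Index retrieval: rerun the formula, read one bucket,
# contact one AMP. Uses the book's divide-by-2 stand-in formula.

NUM_AMPS = 4
HASH_MAP = {bucket: (bucket - 1) % NUM_AMPS + 1 for bucket in range(1, 49)}

# Where the divide-by-2 formula places the eight rows on four AMPs.
AMP_STORAGE = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def select_by_primary_index(friend_num):
    """Return (amp, amps_contacted, friend_name) for one PI value."""
    bucket = friend_num // 2          # rerun the same formula used at load time
    amp = HASH_MAP[bucket]            # the bucket names exactly one AMP
    amps_contacted = [amp]            # no other AMP is asked to do any work
    return amp, amps_contacted, AMP_STORAGE[amp].get(friend_num)


amp, amps_contacted, name = select_by_primary_index(8)
print(f"Friend_Num 8 is on AMP {amp}: {name}")
```

Because the same formula is used for loading and for retrieval, the lookup never has to search the other three AMPs.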

The Full Table Scan

What matters is not the size of the dog in the fight but the size of the fight in the dog

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO." After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's abilities to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan (FTS) means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so can speed up a Full Table Scan hundreds to thousands of times on large systems.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is roughly 10 times faster than a system in which a single process has to read all 100 rows. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it owns.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
6           Mary Gray
14          Kyle Marx
8           John Davis
16          Lyn Jones
2           Ben Hon
10          Don Roy
4           Joe Davis
12          Sam Mills
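
Notice that the rows come back grouped by the AMP that owns them rather than sorted by Friend_Num. A small Python sketch shows why: each AMP scans only its own rows and the answer set is simply the pieces put together. Threads stand in for AMPs here, the row placement assumes the divide-by-2 layout on four AMPs, and the names amp_scan and full_table_scan are invented for the illustration.

```python
# Sketch of a Full Table Scan: every AMP reads only the rows it owns,
# all AMPs work at once, and the PE collects the pieces.
from concurrent.futures import ThreadPoolExecutor

# Assumed placement of the eight rows on four AMPs (divide-by-2 layout).
AMP_STORAGE = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def amp_scan(amp):
    """One AMP reads its own rows and nothing else."""
    return [(num, name) for num, name in AMP_STORAGE[amp].items()]


def full_table_scan():
    """Run every AMP's scan in parallel and concatenate the results."""
    with ThreadPoolExecutor(max_workers=len(AMP_STORAGE)) as pool:
        per_amp = list(pool.map(amp_scan, AMP_STORAGE))
    return [row for rows in per_amp for row in rows]


answer_set = full_table_scan()
rows_read_per_amp = {amp: len(rows) for amp, rows in AMP_STORAGE.items()}
print(answer_set)
print(rows_read_per_amp)
```

In this sketch the pieces are always joined in AMP order. On a real system the order of an answer set should not be relied upon unless the query asks for a sort.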

In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way, and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space, but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It tells the AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a Unique Secondary Index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of the AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (smiley face in this example) is revealed, and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!
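
The two-AMP path of a USI lookup can be sketched in Python. Everything here is a simplified assumption: CRC32 stands in for the secret hashing formula, the pair (AMP, Friend_Num) stands in for a real row-id, and the names amp_for, usi_subtable and select_by_usi are made up. Because the hash is a stand-in, the AMP numbers will not match the picture described in the text.

```python
# Sketch of a Unique Secondary Index lookup as a two-AMP operation.
# CRC32 is only a stand-in for the real (secret) hashing formula.
import zlib

NUM_AMPS = 4

BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]


def amp_for(value):
    """Hash any value to an AMP number from 1 to NUM_AMPS (stand-in hash)."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_AMPS + 1


base_table = {amp: {} for amp in range(1, NUM_AMPS + 1)}
usi_subtable = {amp: {} for amp in range(1, NUM_AMPS + 1)}

for friend_num, friend_name in BEST_FRIENDS:
    base_amp = amp_for(friend_num)            # base row goes where its PI hashes
    base_table[base_amp][friend_num] = friend_name
    # The sub-table row goes where the index value hashes, and it carries
    # a pointer back to the base row.
    usi_subtable[amp_for(friend_name)][friend_name] = (base_amp, friend_num)


def select_by_usi(friend_name):
    """Return (row, amps_touched) for a lookup on the USI column."""
    index_amp = amp_for(friend_name)                       # first AMP: sub-table
    base_amp, friend_num = usi_subtable[index_amp][friend_name]
    row = (friend_num, base_table[base_amp][friend_num])   # second AMP: base row
    return row, [index_amp, base_amp]


row, amps_touched = select_by_usi("Ben Hon")
print(row, amps_touched)
```

The lookup never visits more than two AMPs. In this toy version the two may happen to be the same AMP.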

Join Indexes

A bend in the road is not the end of the road unless you fail to make the turn

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index directly. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.

Teradata Databases, Users and Space

Overview

Choose a job you like and you will never have to work a day of your life

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan." The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result, many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space

The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these items is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT or REVOKE rights from Tom.
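
The "who owns Tom?" question is just a walk up the hierarchy. The sketch below encodes the parent of each object named in the text (DBC at the top, then SYSDBA, with MRKT, SALES and Morgan under SYSDBA, and Tom under Morgan) and collects every owner on the way up. The PARENT dictionary and the owners function are illustrative names only.

```python
# Sketch of the ownership hierarchy described in the text. Every object
# above you in the chain is one of your parents (owners).

PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """Return every parent of `name`, nearest first, ending at DBC."""
    result = []
    while name in PARENT:
        name = PARENT[name]
        result.append(name)
    return result


print(owners("Tom"))
print(owners("DBC"))
```

DBC has no entry in PARENT, so its list of owners is empty, which matches the idea that no user sits above it.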

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space,
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, secondary index sub-tables and permanent journals. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
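
The different bookkeeping for perm and spool can be shown with a small Python class. It uses the 100 Gigabyte system and the 80/20 split from the text. The spool figures, the Report_User name, and the class and method names are hypothetical and chosen only for illustration.

```python
# Sketch of space bookkeeping: perm given to a child is subtracted from
# the giver, while spool is only an upper limit and costs the giver nothing.
# Spool sizes and the Report_User name are made-up examples.

class SpaceOwner:
    """A user or database with perm and spool limits in Gigabytes."""

    def __init__(self, name, perm_gb, spool_gb, parent=None):
        self.name = name
        self.perm_gb = perm_gb
        self.spool_gb = spool_gb
        self.parent = parent

    def create_child(self, name, perm_gb, spool_gb):
        if perm_gb > self.perm_gb:
            raise ValueError(f"{self.name} does not own {perm_gb} GB of perm")
        # Perm space is handed over, so the giver owns that much less.
        self.perm_gb -= perm_gb
        # Spool is like a speed limit: granting it takes nothing away.
        return SpaceOwner(name, perm_gb, spool_gb, parent=self)


dbc = SpaceOwner("DBC", perm_gb=100, spool_gb=50)
sysdba = dbc.create_child("SYSDBA", perm_gb=80, spool_gb=50)
morgan = sysdba.create_child("Morgan", perm_gb=10, spool_gb=50)
report_user = sysdba.create_child("Report_User", perm_gb=0, spool_gb=50)

for owner in (dbc, sysdba, morgan, report_user):
    print(owner.name, owner.perm_gb, owner.spool_gb)
```

After these steps the perm figures still add up to the original 100 Gigabytes, while every owner keeps the full spool limit. Report_User, who only runs queries, holds spool and no perm.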

The following picture shows a logical view of a Customer Table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. Once the answer is returned, the spool is released.

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select a view of a table(s), the data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
1    10    Johnson   Manny   100000
2    20    Carlsbad  Jan     100000
22   30    Winter    Steve   77000
25   10    Lester    Bonnie  56000
33   10    Samuels   Todd    120000
99   20    Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW Employ_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM Employee;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the Employ_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
1    10    Johnson   Manny
2    20    Carlsbad  Jan
22   30    Winter    Steve
25   10    Lester    Bonnie
33   10    Samuels   Todd
99   20    Walter    Misha
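
For readers without a Teradata system at hand, the column-hiding effect of the view can be tried with Python's built-in sqlite3 module. SQLite is only a stand-in here: it has no users or access rights, so this sketch shows what the view exposes, not how Teradata enforces security. The column types are assumptions.

```python
# Sketch using SQLite (not Teradata) to show that a view which leaves out
# the salary column returns every row but never the salary.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee "
    "(emp INTEGER, dept INTEGER, lname TEXT, fname TEXT, sal INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no rows are copied.
conn.execute(
    "CREATE VIEW employ_v AS SELECT emp, dept, lname, fname FROM employee"
)

cursor = conn.execute("SELECT * FROM employ_v")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)
for row in view_rows:
    print(row)
```

All six employees come back through the view, but the sal column is simply not part of what the view can return.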

What is a Macro?

The axe soon forgets but the tree always remembers

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS
(SELECT * FROM Employ_v WHERE dept = 10;
 SELECT * FROM Employ_v WHERE dept = 20;
 SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
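
SQLite has no macros, but the idea of a named group of statements that runs as one unit can be imitated with a small Python function around the sqlite3 module. This is only an analogy under stated assumptions: the function emp_mac and the helper make_database are invented for the sketch, and a Python function is not stored in any Data Dictionary.

```python
# Sketch of the macro idea using SQLite (not Teradata): one name runs
# three statements as a single unit.
import sqlite3


def make_database():
    """Build an in-memory employee table and the Employ_v view."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee "
        "(emp INTEGER, dept INTEGER, lname TEXT, fname TEXT, sal INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
        [
            (1, 10, "Johnson", "Manny", 100000),
            (2, 20, "Carlsbad", "Jan", 100000),
            (22, 30, "Winter", "Steve", 77000),
            (25, 10, "Lester", "Bonnie", 56000),
            (33, 10, "Samuels", "Todd", 120000),
            (99, 20, "Walter", "Misha", 104000),
        ],
    )
    conn.execute(
        "CREATE VIEW employ_v AS SELECT emp, dept, lname, fname FROM employee"
    )
    return conn


def emp_mac(conn):
    """Run the three reports together and return all three answer sets."""
    # 'with conn' commits if the block succeeds and rolls back if it raises.
    with conn:
        dept_10 = conn.execute(
            "SELECT * FROM employ_v WHERE dept = 10").fetchall()
        dept_20 = conn.execute(
            "SELECT * FROM employ_v WHERE dept = 20").fetchall()
        by_lname = conn.execute(
            "SELECT * FROM employ_v ORDER BY lname").fetchall()
    return dept_10, dept_20, by_lname


conn = make_database()
dept_10, dept_20, by_lname = emp_mac(conn)
print(dept_10)
print(dept_20)
print(by_lname)
```

Calling emp_mac(conn) plays the role of EXECUTE Emp_mac: the user remembers one name and gets all three reports.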

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

Never insult seven men when all youre packing is a six gun

Wild West Slogan

I taught in one place that was so rough … security actually checked me for weapons. When they found out that I had no weapons, they gave me some!

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it … nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the U.S. Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has explicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
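The idea that Implicit Rights follow the hierarchy can be sketched in a few lines of Python. This is only a rough illustration, not how Teradata stores rights: the chart layout below is an assumption (the picture is not reproduced here), and the `PARENT` table and `has_implicit_rights` function are hypothetical names.

```python
# Hypothetical ownership chart (child -> parent). The layout is an assumption
# pieced together from the names mentioned in the text.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "Morgan": "SYSDBA",
    "Mary": "MRKT",
    "Tom": "Morgan",
}


def has_implicit_rights(owner, target):
    """Return True if `owner` sits anywhere above `target` in the chart."""
    current = PARENT.get(target)
    while current is not None:
        if current == owner:
            return True
        current = PARENT.get(current)
    return False


print(has_implicit_rights("DBC", "Tom"))     # DBC is above everyone
print(has_implicit_rights("Morgan", "Tom"))  # Morgan created Tom
print(has_implicit_rights("Tom", "Mary"))    # Tom is not above Mary
```

Anything Tom wants to let Mary do would have to be granted explicitly, because walking up from Mary never reaches Tom.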

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message;
• The entire transaction is rolled back and any changes made to the database are reversed;
• Locks are released; and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, then you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores it to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal." Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness." If it is clean, then things went well.
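The BEFORE-picture idea can be sketched in a few lines of Python. This is only a toy model, not Teradata's implementation: the `Amp` class, its method names, and the sample salary rows are all assumptions made up for the illustration.

```python
class Amp:
    """Toy model of one AMP with a transient journal (illustrative only)."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})
        self.transient_journal = []  # (row_id, before_image) pairs

    def change(self, row_id, new_value):
        # Take the BEFORE picture first; None means the row did not exist.
        self.transient_journal.append((row_id, self.rows.get(row_id)))
        if new_value is None:          # DELETE
            self.rows.pop(row_id, None)
        else:                          # INSERT or UPDATE
            self.rows[row_id] = new_value

    def commit(self):
        # Success: the journal is wiped clean.
        self.transient_journal.clear()

    def rollback(self):
        # Failure: restore the before-images, newest first.
        for row_id, before in reversed(self.transient_journal):
            if before is None:
                self.rows.pop(row_id, None)
            else:
                self.rows[row_id] = before
        self.transient_journal.clear()


amp = Amp({"emp1": 50000})
amp.change("emp1", 100000)   # the accidental raise
amp.change("emp2", 40000)    # a new row in the same transaction
amp.rollback()               # the transaction failed
print(amp.rows)
```

Restoring newest-first matters when one transaction touches the same row more than once: the oldest picture is applied last, so the row ends up exactly as it started.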

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission-critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2,000 AMPs. The chance of losing two AMPs in a 2,000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.
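The path from Primary Index value to AMP number can be sketched roughly as follows. This is a simplified model under stated assumptions: CRC32 stands in for Teradata's own hashing algorithm, the 64-bucket hash map is far smaller than a real one, and the names `HASH_MAP`, `row_hash`, and `amp_for` are hypothetical.

```python
import zlib

NUM_AMPS = 8        # matches the eight-AMP example in the text
NUM_BUCKETS = 64    # assumption: a tiny hash map, for illustration only

# The hash map: each bucket holds the number of the AMP that owns it.
HASH_MAP = [bucket % NUM_AMPS + 1 for bucket in range(NUM_BUCKETS)]


def row_hash(primary_index_value):
    # Stand-in hashing algorithm (CRC32); Teradata's own algorithm differs.
    return zlib.crc32(str(primary_index_value).encode("utf-8"))


def amp_for(primary_index_value):
    # The hash output points to a bucket; the bucket holds the AMP number.
    bucket = row_hash(primary_index_value) % NUM_BUCKETS
    return HASH_MAP[bucket]


for name in ("Ben Hon", "Don Roy"):
    print(name, "-> AMP", amp_for(name))
```

Because the same value always hashes to the same bucket, a row can be found again later without searching every AMP.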

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
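Here is a small sketch of why one lost AMP per cluster is survivable while two in the same cluster may not be. The placement rule in `fallback_amp` is an assumption chosen for the illustration (it is not the actual Fallback Hash Map), and the function names are hypothetical.

```python
CLUSTERS = [[1, 2, 3, 4], [5, 6, 7, 8]]  # eight AMPs, two clusters of four


def cluster_of(amp):
    return next(c for c in CLUSTERS if amp in c)


def fallback_amp(primary_amp, row_hash):
    # Assumption: pick one of the OTHER AMPs in the same cluster by row hash.
    others = [a for a in cluster_of(primary_amp) if a != primary_amp]
    return others[row_hash % len(others)]


def all_rows_reachable(rows, down_amps):
    # A row is lost only if both its primary and fallback AMPs are down.
    return all(
        primary not in down_amps or fallback_amp(primary, h) not in down_amps
        for primary, h in rows
    )


# Three sample rows per AMP, identified by (primary AMP, row hash).
rows = [(amp, h) for amp in range(1, 9) for h in range(3)]

print(all_rows_reachable(rows, {1, 5}))  # one AMP down in each cluster
print(all_rows_reachable(rows, {1, 2}))  # two AMPs down in the same cluster
```

The second case fails because at least one row has its base copy on AMP 1 and its FALLBACK copy on AMP 2, so both copies are gone at once.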

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
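The workload side of this trade-off is simple arithmetic. As a rough sketch, assuming the failed AMP's work is spread evenly over the surviving AMPs in its cluster (the function name `workload_factor` is hypothetical):

```python
from fractions import Fraction


def workload_factor(cluster_size):
    """Work per surviving AMP, relative to normal, after one AMP fails."""
    return Fraction(cluster_size, cluster_size - 1)


for size in (2, 3, 4, 16):
    print(size, float(workload_factor(size)))
```

A cluster of two doubles the survivor's load, a cluster of four raises each survivor's load by a third, and a cluster of 16 raises it by only one fifteenth.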

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues with all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes destined for the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
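The catch-up behavior can be sketched with a toy journal. This is not Teradata's internal format: the `DownAmpRecoveryJournal` class, its methods, and the sample row values are all made up for the illustration.

```python
class DownAmpRecoveryJournal:
    """Toy journal: remembers changes aimed at an AMP that is down."""

    def __init__(self):
        self.entries = []  # (row_id, new_value) in arrival order

    def log(self, row_id, new_value):
        self.entries.append((row_id, new_value))

    def catch_up(self, amp_rows):
        # Replay everything the AMP missed, then discard the journal.
        for row_id, new_value in self.entries:
            if new_value is None:          # the row was deleted
                amp_rows.pop(row_id, None)
            else:                          # the row was inserted or updated
                amp_rows[row_id] = new_value
        self.entries.clear()


amp1_rows = {"Ben Hon": "Ohio", "Don Roy": "Iowa"}   # AMP 1 goes down here
fallback_copy = dict(amp1_rows)                      # held by its cluster mates
darj = DownAmpRecoveryJournal()

# While AMP 1 sleeps, changes hit the FALLBACK copy and are journaled.
for row_id, value in [("Ben Hon", "Texas"), ("Don Roy", None)]:
    if value is None:
        fallback_copy.pop(row_id, None)
    else:
        fallback_copy[row_id] = value
    darj.log(row_id, value)

darj.catch_up(amp1_rows)            # AMP 1 comes back online
print(amp1_rows == fallback_copy)
```

Replaying in arrival order is what lets the recovered AMP end up matching the FALLBACK copy exactly.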

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.
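The dual-write idea can be sketched with a toy mirrored pair. This is a simplified model and not how a Disk Array Controller is built: the `MirroredPair` class, its method names, and the sample blocks are assumptions made for the illustration.

```python
class MirroredPair:
    """Toy RAID-1 pair: every write goes to both disks (illustrative)."""

    def __init__(self):
        self.disks = {"primary": {}, "mirror": {}}
        self.failed = set()

    def write(self, block, data):
        # The caller writes once; both healthy disks receive the data.
        for name, disk in self.disks.items():
            if name not in self.failed:
                disk[block] = data

    def read(self, block):
        # Either healthy copy can serve the read.
        for name, disk in self.disks.items():
            if name not in self.failed:
                return disk[block]
        raise OSError("both disks in the pair have failed")

    def fail(self, name):
        self.failed.add(name)
        self.disks[name] = {}

    def replace(self, name):
        # Copy the surviving disk onto the replacement disk.
        survivor = "mirror" if name == "primary" else "primary"
        self.disks[name] = dict(self.disks[survivor])
        self.failed.discard(name)


pair = MirroredPair()
pair.write(0, "Ben Hon")
pair.fail("primary")          # the primary disk crashes
print(pair.read(0))           # still served from the mirror
pair.write(1, "Don Roy")      # operations continue
pair.replace("primary")       # the replacement is rebuilt from the mirror
print(pair.disks["primary"] == pair.disks["mirror"])
```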

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. We can still keep the shared nothing environment and have four physical disks, because only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel Processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
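Both directions of journaling can be sketched in a few lines of Python. This is only a toy model under stated assumptions: tables are plain dictionaries, a `None` image marks a row that does not exist, and the function names and sample regions are made up for the illustration.

```python
def roll_forward(backup, after_journal):
    """Rebuild a table from a full backup plus AFTER images, oldest first."""
    table = dict(backup)
    for row_id, after_image in after_journal:
        if after_image is None:        # assumption: None marks a deleted row
            table.pop(row_id, None)
        else:
            table[row_id] = after_image
    return table


def roll_back(table, before_journal):
    """Undo changes by restoring BEFORE images, newest first."""
    restored = dict(table)
    for row_id, before_image in reversed(before_journal):
        if before_image is None:       # assumption: None means the row was new
            restored.pop(row_id, None)
        else:
            restored[row_id] = before_image
    return restored


# AFTER journal: reload the backup from the 1st, then reapply later changes.
backup = {"rep1": "Western", "rep2": "Northern"}
after_journal = [("rep3", "Southern"), ("rep1", "Southwest")]
print(roll_forward(backup, after_journal))

# BEFORE journal: undo an update that worked but was not accurate.
damaged = {"rep1": "Southeast", "rep2": "Southeast"}
before_journal = [("rep1", "Western"), ("rep2", "Northern")]
print(roll_back(damaged, before_journal))
```

Note the direction of travel: roll forward starts from an old copy and moves toward the present, while rollback starts from the present and moves toward the past.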

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
  BEFORE JOURNAL,
  DUAL AFTER JOURNAL
 (emp INTEGER,
  dept INTEGER,
  lname CHAR(20),
  fname VARCHAR(20),
  salary DECIMAL(10,2),
  hire_date DATE FORMAT)
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind, these commands are writing actions. No other EXCLUSIVE, WRITE, or READ locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, which allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
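The four modes boil down to a small compatibility chart. The sketch below is one reading of the descriptions above, not an official Teradata table, and it models only which locks can coexist, not the order of the waiting line; the `COMPATIBLE` table and `can_grant` function are hypothetical names.

```python
# For each requested lock, the held locks it can coexist with,
# as read from the four lock descriptions above.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},
    "READ": {"ACCESS", "READ"},
    "WRITE": {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held_locks):
    """True if `requested` is compatible with every lock currently held."""
    return all(held in COMPATIBLE[requested] for held in held_locks)


print(can_grant("READ", ["READ", "READ"]))   # many readers at once
print(can_grant("WRITE", ["READ"]))          # a writer waits for readers
print(can_grant("ACCESS", ["WRITE"]))        # a dirty read is allowed
print(can_grant("ACCESS", ["EXCLUSIVE"]))    # nothing gets past EXCLUSIVE
```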

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30!" Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired … it's more like getting fired up for a Bahamas vacation! Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is added to an existing table that has RI violations, the ALTER TABLE will still proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
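The two RI rules can be sketched with a pair of toy tables. This is a simplified illustration, not Teradata's enforcement mechanism: the dictionaries, function names, exception class, and sample employees are all assumptions made for the example.

```python
class ReferentialIntegrityError(Exception):
    pass


employee = {10: "Tom", 20: "Morgan"}   # parent table: emp -> name
payroll = {}                           # child table: emp -> salary


def insert_payroll(emp, salary):
    # A payroll row needs a matching value in the employee table.
    if emp not in employee:
        raise ReferentialIntegrityError(f"no employee {emp}")
    payroll[emp] = salary


def delete_employee(emp):
    # An employee row cannot go while a payroll row still refers to it.
    if emp in payroll:
        raise ReferentialIntegrityError(f"employee {emp} is still on payroll")
    del employee[emp]


insert_payroll(10, 50000)
try:
    delete_employee(10)          # blocked: payroll still points at emp 10
except ReferentialIntegrityError as err:
    print("rejected:", err)

del payroll[10]                  # remove the payroll row first
delete_employee(10)              # now the delete goes through
print(sorted(employee))
```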

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that, just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
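The three steps just described (deal out blocks, hash and redistribute, then sort) can be sketched as a small simulation. It runs the steps one after another rather than in parallel, and it rests on several assumptions: CRC32 stands in for the real row hash, blocks hold three rows instead of 64K, and the names `owner_amp` and `fastload` are hypothetical.

```python
import zlib

NUM_AMPS = 4
BLOCK_ROWS = 3   # assumption: rows per block (the text uses 64K blocks)


def row_hash(row):
    # Stand-in row hash (CRC32) on the first field, the primary index.
    return zlib.crc32(str(row[0]).encode("utf-8"))


def owner_amp(row):
    return row_hash(row) % NUM_AMPS


def fastload(flat_file_rows):
    # Step 1: deal blocks of rows to the AMPs in turn, with no hashing yet.
    received = [[] for _ in range(NUM_AMPS)]
    for i in range(0, len(flat_file_rows), BLOCK_ROWS):
        block = flat_file_rows[i:i + BLOCK_ROWS]
        received[(i // BLOCK_ROWS) % NUM_AMPS].extend(block)

    # Step 2: each AMP hashes its rows and sends them to the proper AMP.
    final = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for row in amp_rows:
            final[owner_amp(row)].append(row)

    # Step 3: each AMP sorts its rows (by hash, standing in for Row ID).
    for amp_rows in final:
        amp_rows.sort(key=row_hash)
    return final


rows = [(emp, f"name{emp}") for emp in range(1, 21)]
table = fastload(rows)
print([len(amp_rows) for amp_rows in table])
```

Getting the data onto the system quickly first, and only then worrying about where each row belongs, is the design choice that keeps every AMP busy during the load.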

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file;
• Only one table may be loaded at a time;
• The table to be loaded must be empty;
• There can be no secondary indexes, referential integrity, or triggers;
• It doesn't support Multi-set tables; and
• It locks at the table level.

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance – Only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

Parsing Engine (PE)

"Even a stopped clock is right twice a day."

Polish Proverb

A man and his son were riding a bicycle built for two when they came to a steep hill. It took a great deal of struggle for them to complete what proved to be a very steep climb. When they got to the top, the father in front said, "Boy, that sure was a hard climb." His son in the back responded, "Yes, it was, Dad. And if I hadn't kept the brakes on all the way, we would have rolled down backwards." Teradata has an ingenious way to keep this type of situation from happening inside the data warehouse. Most databases make educated guesses about the best way to retrieve data. The Teradata PE, or Optimizer, has both the experience and design to KNOW the best way to retrieve data.

When users log on to Teradata, they are connecting to a Parsing Engine (PE). When a user submits a query, the PE takes action. The PE creates a PLAN that tells the AMPs exactly what to do in order to get the data. The PE knows how many AMPs are in the system, how many rows are in the table, and the best way to get to the data. Teradata's PE has been continually enhanced since 1984. It has such a great reputation for speeding up data access that it has earned the name "The OPTIMIZER."

The PE loves to serve valid Teradata users, but it was raised like a guard dog. A good guard dog loves its family, but it barks and may bite when strangers approach. The PE will always check a user's security (access) rights to ensure the user has the proper authority to obtain the information that is being requested. If the user has authority, the PE instructs the AMPs to get the data. If the user doesn't have proper access rights, the query is rejected.

The PE doesn't like to brag, but it did graduate at the top of its class. Customers like Wal-Mart, Anthem Blue Cross and Blue Shield, Bank of America, AT&T, and SouthWestern Bell have continually pushed the data warehouse envelope. This has given the PE years of experience in guiding AMPs to answer complex questions – some of which have never been asked before in their respective industries. This experience allows users to ask any question, regardless of its complexity. The PE isn't called "The Optimizer" for nothing. It needs no tuning by a Database Administrator (DBA) or hints from the user. Teradata users ask the questions and Teradata returns the answers.

Access Module Processor (AMP)

"Wise men talk because they have something to say; fools talk because they have to say something."

Plato

Two men decided to go ice fishing. They found a good spot on some ice and began digging. As soon as they finished the hole, they heard a voice from above saying, "There are no fish here." Taking that as a sign, they moved about thirty feet and began digging again. A second time they heard the voice saying, "There are no fish here." So they moved another thirty feet and began to dig a third hole. This time the impatient voice spoke from above: "There are no fish here in this ice skating rink!" Some people just don't listen. But this is never the case with Teradata's Access Module Processors.

The Access Module Processor (AMP) is a processor of little words. It keeps its mouth shut and its ears open. Each AMP listens to the PE via the BYNET network for instructions. Each AMP retrieves data from its disk or writes data to its disk. The AMP is the worker bee of the system. It is the perfect employee. It never complains, rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best example is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it's the only AMP allowed to read or write data to that disk. This arrangement is referred to as a "Shared-Nothing" architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk and it shares this with no other AMP, hence a "Shared-Nothing" architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP will read only the rows on its particular disk. If this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.
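The "slowest AMP" remark can be made concrete with a small sketch. The per-row cost and the row counts below are invented numbers, used only to show why an even spread matters when all AMPs work at the same time.

```python
def query_seconds(rows_per_amp, seconds_per_row=0.001):
    # All AMPs read their own disks at the same time, so the elapsed
    # time is set by the AMP that has the most rows to read.
    return max(rows_per_amp) * seconds_per_row


even_spread = [250, 250, 250, 250]    # 1,000 rows spread evenly
skewed_spread = [700, 100, 100, 100]  # the same 1,000 rows, badly spread

print(query_seconds(even_spread))
print(query_seconds(skewed_spread))
```

Both layouts hold the same number of rows, but the skewed layout takes longer because one AMP is still reading after the others have finished.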

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs while others have more than 2000.

The BYNET

"Even if you're on the right track, you'll still get run over if you just sit there."

Will Rogers

The BYNET ensures communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son, David, about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax
• The PE checks the user's security rights
• The PE comes up with a plan for the AMPs to follow
• The PE passes the plan along to the AMPs over the BYNET
• The AMPs follow the plan and retrieve the data requested
• The AMPs pass the data to the PE over the BYNET, and
• The PE then passes the final data to the user
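The seven steps above can be walked through in a short sketch. Everything here is hypothetical: the function names, the dictionary standing in for access rights, and the list of per-AMP "disks" are illustrative assumptions, and the syntax check is reduced to a trivial test on the table name.

```python
def amp_retrieve(disk, plan):
    # An AMP follows the plan and reads only from its own disk.
    return list(disk.get(plan["table"], []))


def pe_run_query(user, table, access_rights, amp_disks):
    # Step 1: syntax check (reduced here to "is this a plausible name?").
    if not table.isidentifier():
        raise ValueError(f"syntax error near {table!r}")
    # Step 2: security check.
    if table not in access_rights.get(user, set()):
        raise PermissionError(f"{user} may not read {table}")
    # Step 3: build a plan for the AMPs.
    plan = {"table": table}
    # Steps 4-6: the plan goes to every AMP, each AMP retrieves its
    # rows, and the partial results come back to the PE.
    partial_results = [amp_retrieve(disk, plan) for disk in amp_disks]
    # Step 7: the PE hands the combined answer to the user.
    return [row for part in partial_results for row in part]


amp_disks = [
    {"Customer": [1001]},
    {"Customer": [1002]},
    {"Customer": [1003]},
    {"Customer": []},
]
access_rights = {"tom": {"Customer"}}

print(pe_run_query("tom", "Customer", access_rights, amp_disks))
```

A user with no entry in `access_rights` gets a `PermissionError` before any AMP is asked to do work, which mirrors the guard-dog behavior of the PE.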

Teradata Building Block Approach

"Better a diamond with a flaw than a pebble without one."

Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

"Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them."

Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about ... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, CustomerName, Item No, and Quantity.

An example of each table follows:

CUSTOMER TABLE called CustomerTable

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillards       Atlanta    106227              Smith

ORDER TABLE called OrderTable

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest value that can be inserted into a table. A column is the smallest value within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null or are sometimes represented by a

One column, or combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (e.g., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                        FOREIGN KEY
Not optional                       Optional
Comprised of one or more columns   Comprised of one or more columns
Can only have one PK per table     Can have multiple FKs per table
No duplicates allowed              Duplicates allowed
No changes allowed                 Changes allowed
No nulls allowed                   Nulls allowed
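To see a PK-to-FK join in action, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is used only because it runs anywhere; it is not Teradata, and the trimmed-down column names (no spaces) are an assumption made so the example stays short. The join matches the CustomerID primary key in CustomerTable to the CustomerID foreign key in OrderTable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerTable (
        CustomerID   INTEGER PRIMARY KEY,   -- PK: unique, not null
        CustomerName TEXT,
        CityName     TEXT
    );
    CREATE TABLE OrderTable (
        OrderNumber  INTEGER PRIMARY KEY,   -- PK of the order table
        OrderDate    TEXT,
        ItemNo       INTEGER,
        Quantity     INTEGER,
        CustomerID   INTEGER REFERENCES CustomerTable (CustomerID)  -- FK
    );
    INSERT INTO CustomerTable VALUES (1001, 'JC Penney', 'Dallas');
    INSERT INTO CustomerTable VALUES (1002, 'Office Depot', 'Columbia');
    INSERT INTO CustomerTable VALUES (1003, 'Dillards', 'Atlanta');
    INSERT INTO OrderTable VALUES (105372, '2001-03-07', 212, 20, 1001);
    INSERT INTO OrderTable VALUES (105799, '2001-04-18', 296, 52, 1002);
    INSERT INTO OrderTable VALUES (106227, '2001-10-17', 325, 17, 1003);
""")

# JOIN the two tables by matching the PK of one to the FK of the other.
joined = conn.execute("""
    SELECT c.CustomerName, o.OrderNumber, o.Quantity
    FROM CustomerTable AS c
    JOIN OrderTable AS o ON o.CustomerID = c.CustomerID
    ORDER BY o.OrderNumber
""").fetchall()

for row in joined:
    print(row)
```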

Teradata Spreads the Data Evenly Across the AMPs

"A chain is only as strong as its weakest link."

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, then no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind, the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also, notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the Customer and Order table rows are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.
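The four-AMP, three-table layout can be modeled with a small sketch. The row counts are invented, and a simple round-robin assignment stands in for the real hashing; the only point is that every AMP ends up holding one quarter of each table, with the rows of each table kept together.

```python
NUM_AMPS = 4

# Invented row counts; each "row" is just a number.
tables = {
    "Employee": list(range(8)),
    "Customer": list(range(12)),
    "Order": list(range(16)),
}

# Each AMP keeps its rows grouped by table, like schools of fish.
amps = [{name: [] for name in tables} for _ in range(NUM_AMPS)]
for name, rows in tables.items():
    for position, row in enumerate(rows):
        # Round-robin is a stand-in for hashing (assumption).
        amps[position % NUM_AMPS][name].append(row)

for amp_number, disk in enumerate(amps, start=1):
    print(amp_number, {name: len(rows) for name, rows in disk.items()})

# A query for all Customer rows touches only each AMP's Customer group.
customer_rows = [row for disk in amps for row in disk["Customer"]]
```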

Primary Indexes

"Every road has two directions."

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.

There are two types of Primary Indexes

"A man who chases two rabbits catches none."

Roman Proverb

"A man who chases two rabbits misses both by a HARE! A person who chases two Primary Indexes misses both by an ERR!"

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

CREATE TABLE employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON.

CREATE TABLE TomC.employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
UNIQUE PRIMARY INDEX(emp, dept);
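Because the Primary Index decides which AMP receives each row, the choice between a UPI and a NUPI also shapes how rows spread out. The sketch below is an illustration under stated assumptions: `amp_for` uses a CRC as a stand-in for the real hashing formula, and the 100 employees split across two departments are invented data.

```python
from collections import Counter
from zlib import crc32

NUM_AMPS = 4  # illustrative system size (assumption)


def amp_for(*pi_values):
    # Hypothetical stand-in for hashing one or more Primary Index columns.
    key = "|".join(str(value) for value in pi_values).encode()
    return crc32(key) % NUM_AMPS + 1


# Invented data: 100 employees, half in dept 10 and half in dept 20.
employees = [(emp, 10 if emp % 2 else 20) for emp in range(1, 101)]

# UNIQUE PRIMARY INDEX(emp): every row hashes on its own unique value.
upi_layout = Counter(amp_for(emp) for emp, dept in employees)

# PRIMARY INDEX(dept): all rows with the same dept land on the same AMP.
nupi_layout = Counter(amp_for(dept) for emp, dept in employees)

# UNIQUE PRIMARY INDEX(emp, dept): the columns are hashed as one unit.
composite_layout = Counter(amp_for(emp, dept) for emp, dept in employees)

print("UPI rows per AMP:      ", dict(upi_layout))
print("NUPI rows per AMP:     ", dict(nupi_layout))
print("Composite rows per AMP:", dict(composite_layout))
```

With only two distinct dept values, the NUPI layout can use at most two AMPs, so a column with few distinct values tends to make a lopsided Primary Index.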

"Being related hardly insures relatability."

Michael E Angier

All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their "relatability" in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

"The map is not the territory."

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing – the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8, and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
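The two simulated hash maps above follow one simple pattern, which can be sketched as a short function. The function name and the list representation are assumptions for illustration; the repeating 1, 2, 3, ... pattern is the simplified one shown in the diagrams, and 65536 is the bucket count stated in the text.

```python
def build_hash_map(num_amps, num_buckets=65536):
    # Bucket 1 is stored at index 0, bucket 2 at index 1, and so on.
    # Each bucket holds just one thing: the number of an AMP.
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]


four_amp_map = build_hash_map(4)
eight_amp_map = build_hash_map(8)

print(four_amp_map[:12])   # first two rows of the four-AMP diagram
print(eight_amp_map[:12])  # first two rows of the eight-AMP diagram
```

Because the same function builds the map for any number of AMPs, adding AMPs changes only the numbers stored in the buckets, not the number of buckets.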

How the Hash Map and Primary Index Work Together

"Choice, not chance, determines destiny."

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the "Wiz-Bang Formula" (a secret NCR formula). First, we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even-tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we will take the Primary Index value of the row, apply the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num   Friend_Name
2            Ben Hon

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num   Friend_Name
16           Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (16), and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map at bucket number eight to determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. The purpose is to show you how the hash map works in theory to distribute and locate rows. Simply, you should understand that the formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it will be consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.
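The whole Best_Friends walk-through fits in a few lines of code. This sketch uses the made-up Coffing/Jones formula from the text (divide by 2), not the real hashing formula, and a 48-bucket map matching the simulated diagram; the function and variable names are assumptions.

```python
NUM_AMPS = 4
# The simulated four-AMP hash map from the diagram: 48 buckets,
# bucket 1 stored at index 0.
HASH_MAP = [(bucket % NUM_AMPS) + 1 for bucket in range(48)]

BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}


def amp_for(friend_num):
    # Coffing/Jones Wiz-Bang formula: divide the Primary Index by 2
    # to get the bucket, then read the AMP number from that bucket.
    bucket = friend_num // 2
    return HASH_MAP[bucket - 1]


layout = {amp: [] for amp in range(1, NUM_AMPS + 1)}
for friend_num in BEST_FRIENDS:
    layout[amp_for(friend_num)].append(friend_num)

for amp, friend_nums in layout.items():
    print(f"AMP {amp}: {friend_nums}")
```

Each AMP ends up with two of the eight rows, and running the loop again always gives the same layout because the formula never changes.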

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used and merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
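The one-AMP retrieval can be sketched the same way. As before, the divide-by-2 formula is the invented Coffing/Jones one, and the per-AMP dictionaries standing in for disks are an assumption; the layout matches what the formula produces for the Best_Friends table on a four-AMP system.

```python
NUM_AMPS = 4

# Rows as they sit on each AMP's disk after loading with the
# Coffing/Jones formula.
AMP_DISKS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def select_by_primary_index(friend_num):
    # The PE reruns the formula, finds the bucket, and reads the AMP
    # number stored there.
    bucket = friend_num // 2
    amp = ((bucket - 1) % NUM_AMPS) + 1
    # Only that one AMP is asked for the row: a one-AMP operation.
    return amp, AMP_DISKS[amp].get(friend_num)


print(select_by_primary_index(8))
```

No other AMP is touched, which is why a Primary Index lookup is described as the fastest way to get a row.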

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that it is not the size of the opponent that is to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. On systems with hundreds or thousands of AMPs, doing so can speed up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is 10 times faster than on most systems. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!
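The 100-row, 10-AMP example can be mimicked with a short Python sketch. It is only a model: threads stand in for AMPs, a simple modulo stands in for the hashing algorithm, and the row contents are made up for illustration.

```python
# Toy model of a Full Table Scan. Threads only mimic the idea of AMPs
# working in parallel; they are not how Teradata is implemented.
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 10
TABLE = [(row_id, f"row-{row_id}") for row_id in range(100)]  # made-up rows


def distribute(rows, num_amps):
    """Spread rows evenly across AMPs (modulo stands in for hashing)."""
    amps = [[] for _ in range(num_amps)]
    for row in rows:
        amps[row[0] % num_amps].append(row)
    return amps


def amp_scan(own_rows):
    """Each AMP reads only the rows it owns."""
    return list(own_rows)


def full_table_scan(amps):
    """The PE asks every AMP to scan in parallel, then gathers the results."""
    with ThreadPoolExecutor(max_workers=len(amps)) as pool:
        per_amp = list(pool.map(amp_scan, amps))
    return [row for rows in per_amp for row in rows]


amps = distribute(TABLE, NUM_AMPS)
result = full_table_scan(amps)
print(len(amps[0]), len(result))  # 10 100
```

Each simulated AMP touches 10 rows rather than 100, which is where the "10 times faster" figure in the example comes from.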

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to ask literally any question about any data at any time. Teradata has the experience, power and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records, so this is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns
• Each AMP passes its rows to the PE over the BYNET

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
----------  -----------
         6  Mary Gray
        14  Kyle Marx
         8  John Davis
        16  Lyn Jones
         2  Ben Hon
        10  Don Roy
         4  Joe Davis
        12  Sam Mills

In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, and since secondary indexes use up space and add overhead, they should only be used on KNOWN QUERIES, that is, queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

There are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space, but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (This is the hashed version of the value)
• Primary Index Row-ID (This locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It also tells the AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Number 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once that is complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (a smiley face in this example) is revealed and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.
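The two-AMP path can be sketched in Python as follows. Everything here is a simplified model: both hash functions are made-up stand-ins for Teradata's real algorithm, the name hash (a sum of character codes) is hypothetical, and the sub-table stores the Primary Index value as its pointer instead of a real row-id.

```python
# Toy model of a Unique Secondary Index (USI) lookup on a four-AMP system.
NUM_AMPS = 4

BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}


def base_amp(friend_num):
    """Primary Index path: divide by two, then map the bucket to an AMP."""
    bucket = friend_num // 2
    return (bucket - 1) % NUM_AMPS + 1


def index_amp(friend_name):
    """Secondary index path: hypothetical hash of the name's characters."""
    return sum(ord(ch) for ch in friend_name) % NUM_AMPS or NUM_AMPS


def build(table):
    """Create the base table and the secondary index sub-table on each AMP."""
    base = {amp: {} for amp in range(1, NUM_AMPS + 1)}
    sub_tables = {amp: {} for amp in range(1, NUM_AMPS + 1)}
    for friend_num, friend_name in table.items():
        base[base_amp(friend_num)][friend_num] = friend_name
        # Sub-table row: index value -> pointer to the base row.
        sub_tables[index_amp(friend_name)][friend_name] = friend_num
    return base, sub_tables


def select_by_usi(base, sub_tables, friend_name):
    """Return the row and the list of AMPs touched (always two)."""
    first_amp = index_amp(friend_name)        # AMP holding the sub-table row
    friend_num = sub_tables[first_amp][friend_name]
    second_amp = base_amp(friend_num)         # AMP holding the base row
    return (friend_num, base[second_amp][friend_num]), [first_amp, second_amp]


base, sub_tables = build(BEST_FRIENDS)
print(select_by_usi(base, sub_tables, "Ben Hon"))  # ((2, 'Ben Hon'), [2, 1])
```

However many AMPs the system has, the lookup consults exactly one sub-table and one base table, which is why a USI access stays a two-AMP operation.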

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and add overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined, and an actual table is built containing the joined data. Users never query the join index directly. They run their normal joins, and the PE checks whether the join can be satisfied by the Join Index table. If it can, Teradata pulls the data from the Join Index table. Best of all, once the join index is created, its data is automatically maintained as the underlying base tables are updated.

Teradata Databases, Users and Space

Overview

"Choose a job you like, and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space to store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer and 1012 stands for 10 to the 12th power, or a terabyte. There is no user with greater privileges than DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind that all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space

The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called "MRKT" and to another one called "SALES". It has also given space to a user called "Morgan".

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT or REVOKE rights from Tom.
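The "who owns Tom?" question is a walk up a tree, which a short Python sketch can make concrete. The parent links below are taken from the users and databases named in this chapter; the function names are hypothetical and exist only for this illustration.

```python
# Space-ownership hierarchy from the example in the text: child -> parent.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """Walk up the hierarchy; everyone above `name` is a parent (owner)."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain


def children(name):
    """Every user or database sitting anywhere below `name`."""
    return sorted(child for child in PARENT if name in owners(child))


print(owners("Tom"))       # ['Morgan', 'SYSDBA', 'DBC']
print(children("Morgan"))  # ['Tom']
```

DBC has no owners and every other object appears among its children, which matches the statement that there is no user with greater privileges than DBC.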

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit of space that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. Will a bank that holds onto all of its capital be successful? No. If it is destined for success, it will lend out its capital in the form of credit lines or mortgages, and these actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data, such as tables and their secondary index sub-tables. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
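The difference between handing out perm and handing out spool can be shown with a small Python model. The SpaceOwner class and its method are hypothetical names for this sketch, the spool figures are made up, and the rule that a child's spool limit may not exceed its parent's is an assumption drawn from the speed-limit analogy above. The 100 Gigabyte and 80% figures come from the text.

```python
class SpaceOwner:
    """A user or database with a perm allocation and a spool limit (in GB)."""

    def __init__(self, name, perm=0, spool=0):
        self.name = name
        self.perm = perm
        self.spool = spool

    def create_child(self, name, perm, spool):
        if perm > self.perm:
            raise ValueError(f"{self.name} does not own {perm} GB of perm space")
        if spool > self.spool:
            # Assumption: like a speed limit, a child gets at most the parent's.
            raise ValueError("a child's spool limit cannot exceed its parent's")
        # Perm handed to a child is subtracted from what the parent owns.
        self.perm -= perm
        # Spool is only an upper limit, so the parent's own limit is unchanged.
        return SpaceOwner(name, perm, spool)


dbc = SpaceOwner("DBC", perm=100, spool=100)
sysdba = dbc.create_child("SYSDBA", perm=80, spool=100)
morgan = sysdba.create_child("Morgan", perm=10, spool=100)

print(dbc.perm, sysdba.perm, morgan.perm)     # 20 70 10
print(dbc.spool, sysdba.spool, morgan.spool)  # 100 100 100
```

Note that the perm figures always add back up to the original 100 Gigabytes, while the spool limit can be handed down repeatedly without shrinking.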

The following picture shows a logical view of a Customer Table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. Once the answer is returned, the spool is released.

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what portions of the data you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored on the disks again, so a view does not duplicate data or take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
---  ----  --------  ------  ------
  1    10  Johnson   Manny   100000
  2    20  Carlsbad  Jan     100000
 22    30  Winter    Steve    77000
 25    10  Lester    Bonnie   56000
 33    10  Samuels   Todd    120000
 99    20  Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit access to certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, security is enhanced. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
---  ----  --------  ------
  1    10  Johnson   Manny
  2    20  Carlsbad  Jan
 22    30  Winter    Steve
 25    10  Lester    Bonnie
 33    10  Samuels   Todd
 99    20  Walter    Misha
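The idea that a view stores only a definition, never a second copy of the data, can be sketched in Python. The dictionaries below are a loose stand-in for the Data Dictionary and for PERM space; the structure and the function name are assumptions made for illustration, not how Teradata stores views internally.

```python
# Base table rows live in PERM space (modeled here as a list of dicts).
TABLES = {
    "EMPLOYEE": [
        {"Emp": 1, "Dept": 10, "Lname": "Johnson", "Fname": "Manny", "Sal": 100000},
        {"Emp": 2, "Dept": 20, "Lname": "Carlsbad", "Fname": "Jan", "Sal": 100000},
        {"Emp": 22, "Dept": 30, "Lname": "Winter", "Fname": "Steve", "Sal": 77000},
        {"Emp": 25, "Dept": 10, "Lname": "Lester", "Fname": "Bonnie", "Sal": 56000},
        {"Emp": 33, "Dept": 10, "Lname": "Samuels", "Fname": "Todd", "Sal": 120000},
        {"Emp": 99, "Dept": 20, "Lname": "Walter", "Fname": "Misha", "Sal": 104000},
    ]
}

# The view is only a stored definition: which table, which columns.
DATA_DICTIONARY = {
    "EMPLOY_V": {"table": "EMPLOYEE", "columns": ["Emp", "Dept", "Lname", "Fname"]}
}


def select_from_view(view_name):
    """Resolve the definition and project the columns at query time."""
    definition = DATA_DICTIONARY[view_name]
    rows = TABLES[definition["table"]]
    return [{col: row[col] for col in definition["columns"]} for row in rows]


for row in select_from_view("EMPLOY_V"):
    print(row)
```

Because the projection happens each time the view is selected, a change to the base table shows up in the view immediately, and the Sal column never leaves the base table.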

What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple statements, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20 and
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
  SELECT * FROM Employ_v WHERE Dept = 10;
  SELECT * FROM Employ_v WHERE Dept = 20;
  SELECT * FROM Employ_v ORDER BY Lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                         Macros
--------------------------------------------  --------------------------------------------
• We select from views                        • We execute macros
• Uses the keyword AS                         • Uses the keyword AS
• Definition is stored in the Data Dictionary • Definition is stored in the Data Dictionary
• Accesses certain portions of the data       • Accesses the real data itself
• Is changed using the keyword REPLACE        • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some!

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), and Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails at exactly the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't become corrupt. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released and
• Spool files are discarded

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she would ask if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal". Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, "Transient Journal cleanliness is next to Godliness." If it is clean, then things went well.
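The before-image idea can be sketched in Python as a small model of one AMP. The class, the function names and the rule that a negative salary makes a statement fail are all hypothetical, chosen only to trigger a rollback in the example; the salaries come from the Employee table earlier in the book.

```python
class Amp:
    """One AMP holding rows plus a transient journal of before-images."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.transient_journal = []

    def update(self, key, value):
        # Take the BEFORE picture first (None means the row did not exist).
        self.transient_journal.append((key, self.rows.get(key)))
        self.rows[key] = value

    def commit(self):
        """Success: the journal is wiped clean."""
        self.transient_journal.clear()

    def rollback(self):
        """Failure: restore before-images in reverse order."""
        while self.transient_journal:
            key, before = self.transient_journal.pop()
            if before is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before


def run_transaction(amp, statements):
    """Apply all statements or none of them."""
    try:
        for emp, salary in statements:
            if salary < 0:  # made-up failure condition for this sketch
                raise ValueError("statement failed")
            amp.update(emp, salary)
    except ValueError:
        amp.rollback()
        return False
    amp.commit()
    return True


amp = Amp({1: 100000, 25: 56000})
ok = run_transaction(amp, [(1, 110000), (25, -1)])
print(ok, amp.rows)  # False {1: 100000, 25: 56000}
```

The first update succeeds and is then undone when the second one fails, so the table ends up exactly as it started and the journal is empty again.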

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing 'Ben Hon' and 'Don Roy'. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get 'Ben Hon' from AMP2 and 'Don Roy' from AMP4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice, if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. The chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes (clusters of two versus clusters of 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
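To put rough numbers on this trade-off, here is a minimal Python sketch. It assumes the failed AMP's work is split evenly among the surviving AMPs in its cluster. The function names are made up for illustration and are not part of any Teradata tool.

```python
def extra_load_per_survivor(cluster_size: int) -> float:
    """Extra work, as a fraction of its normal load, that each surviving
    AMP takes on when exactly one AMP in its cluster is down."""
    if not 2 <= cluster_size <= 16:
        raise ValueError("cluster size must be between 2 and 16 AMPs")
    return 1.0 / (cluster_size - 1)


def fatal_partners(cluster_size: int) -> int:
    """How many other AMPs could fail next and take the cluster down."""
    return cluster_size - 1


for size in (2, 3, 4, 16):
    print(f"cluster of {size:2d}: +{extra_load_per_survivor(size):.0%} work per "
          f"survivor, {fatal_partners(size)} AMP(s) whose loss would be fatal")
```

A cluster of two doubles the load on the survivor, while a cluster of 16 barely notices the extra work but has 15 AMPs whose loss would stop the system. Four sits in between.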

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are made to the copies held by the other AMPs in its cluster.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.
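The idea of a primary map and a fallback map can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `zlib.crc32` stands in for Teradata's real hashing algorithm, the AMPs are numbered 0-7, and the rule used to pick the fallback AMP is invented here. The only properties it tries to show are the ones described above: the copy lands on a different AMP, and that AMP is in the same cluster.

```python
import zlib

NUM_AMPS = 8        # eight-AMP system, as in the picture
CLUSTER_SIZE = 4    # two clusters of four AMPs


def row_hash(primary_index_value: str) -> int:
    """Stand-in hash; Teradata's real hashing algorithm is different."""
    return zlib.crc32(primary_index_value.encode("utf-8"))


def primary_amp(primary_index_value: str) -> int:
    """AMP (numbered 0-7 here) that stores the base row."""
    return row_hash(primary_index_value) % NUM_AMPS


def fallback_amp(primary_index_value: str) -> int:
    """A different AMP in the same cluster that stores the FALLBACK copy."""
    primary = primary_amp(primary_index_value)
    cluster_start = (primary // CLUSTER_SIZE) * CLUSTER_SIZE
    # Step 1..CLUSTER_SIZE-1 places around the cluster, so the copy never
    # lands on the primary AMP and never leaves the cluster.
    step = 1 + (row_hash(primary_index_value) // NUM_AMPS) % (CLUSTER_SIZE - 1)
    return cluster_start + (primary - cluster_start + step) % CLUSTER_SIZE


for name in ("Tom", "Morgan", "David", "Mike"):
    print(name, "base row on AMP", primary_amp(name),
          "fallback on AMP", fallback_amp(name))
```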

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.


In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
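The journal-then-catch-up idea can be sketched in Python. This is a toy model, not Teradata's implementation: the `Amp` class and both functions are hypothetical, and for simplicity a single surviving AMP keeps the journal, whereas the text above has all the surviving AMPs in the cluster open one.

```python
class Amp:
    def __init__(self, amp_id: int) -> None:
        self.amp_id = amp_id
        self.online = True
        self.rows = {}    # row key -> value stored on this AMP
        self.darj = []    # (down AMP id, row key, value) kept for a neighbour


def write_row(target: Amp, cluster: list, key: str, value: str) -> None:
    """Apply a change to `target`, or journal it if `target` is down."""
    if target.online:
        target.rows[key] = value
        return
    keeper = next(amp for amp in cluster if amp.online)
    keeper.darj.append((target.amp_id, key, value))


def recover(target: Amp, cluster: list) -> None:
    """Bring `target` back online, replay its journaled changes, discard them."""
    target.online = True
    for amp in cluster:
        for amp_id, key, value in amp.darj:
            if amp_id == target.amp_id:
                target.rows[key] = value
        amp.darj = [entry for entry in amp.darj if entry[0] != target.amp_id]


cluster = [Amp(n) for n in (1, 2, 3, 4)]
cluster[0].online = False                        # AMP 1 fails
write_row(cluster[0], cluster, "row-7", "new value")
recover(cluster[0], cluster)                     # AMP 1 wakes up and is caught up
print(cluster[0].rows)
```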

Redundant Array of Independent Disks (RAID)

I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant.

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's still fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.
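Here is a minimal Python sketch of the mirroring idea, assuming a toy disk modeled as a dictionary of blocks. The `MirroredDisk` class is hypothetical and is not how a Disk Array Controller is actually built; it only shows the dual write, the read from whichever copy is healthy, and the rebuild of a replacement disk from the mirror.

```python
class MirroredDisk:
    """One logical disk backed by a primary and a mirror (RAID-1)."""

    def __init__(self) -> None:
        self.primary = {}           # block number -> data
        self.mirror = {}
        self.primary_ok = True
        self.mirror_ok = True

    def write(self, block: int, data: bytes) -> None:
        # Dual write: the caller issues one write, both healthy copies get it.
        if self.primary_ok:
            self.primary[block] = data
        if self.mirror_ok:
            self.mirror[block] = data

    def read(self, block: int) -> bytes:
        # Either healthy copy can serve the read.
        if self.primary_ok:
            return self.primary[block]
        if self.mirror_ok:
            return self.mirror[block]
        raise IOError("both copies are lost")

    def replace_primary(self) -> None:
        # Rebuild a replacement primary from the mirror.
        self.primary = dict(self.mirror)
        self.primary_ok = True


disk = MirroredDisk()
disk.write(0, b"Best_Friends rows")
disk.primary_ok = False             # the primary disk crashes
disk.primary = {}
print(disk.read(0))                 # still served from the mirror
disk.replace_primary()
```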

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This arrangement is called a Shared Nothing environment. We can still keep the Shared Nothing environment and have four physical disks: only the AMP that owns the virtual disk can access its four disks.


In the picture above, one AMP has one virtual disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk and, similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, if node one is lost, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in the surviving node's memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, and they access their own virtual disks via a different physical cable.
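The migration can be sketched in Python. This is a toy model under stated assumptions: the `Node` class, `fail_node`, and the `vdisk_of` mapping are invented for illustration, and AMPs are just integers. The point it shows is that the AMPs change nodes while their virtual disk ownership stays exactly the same.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    amps: list = field(default_factory=list)
    up: bool = True


def fail_node(failed: Node, clique: list) -> None:
    """Move the failed node's AMPs to the surviving nodes of its clique."""
    failed.up = False
    survivors = [node for node in clique if node.up]
    if not survivors:
        raise RuntimeError("no surviving node left in the clique")
    for i, amp in enumerate(failed.amps):
        survivors[i % len(survivors)].amps.append(amp)
    failed.amps = []


node1 = Node("node 1", list(range(1, 17)))      # AMPs 1-16
node2 = Node("node 2", list(range(17, 33)))     # AMPs 17-32
vdisk_of = {amp: f"vdisk-{amp}" for amp in node1.amps + node2.amps}

fail_node(node1, [node1, node2])
print(len(node2.amps), "AMPs now run on", node2.name)
print("AMP 16 still owns", vdisk_of[16])
```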

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica – a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. This would be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission-critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost-effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
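Both directions can be sketched in Python with a table modeled as a dictionary. Everything here is hypothetical and for illustration only (the function names, the representatives, the regions), and the sketch handles only inserts and updates, not deletes. The BEFORE images undo a bad update; a backup plus the AFTER images rebuilds the data that was lost.

```python
def apply_change(table, key, new_row, before_journal, after_journal):
    """Insert or update a row, recording its before and after images."""
    before_journal.append((key, table.get(key)))   # None means "row did not exist"
    table[key] = new_row
    after_journal.append((key, new_row))


def roll_back(table, before_journal):
    """Undo journaled changes, newest first, using the before images."""
    for key, old_row in reversed(before_journal):
        if old_row is None:
            table.pop(key, None)
        else:
            table[key] = old_row


def roll_forward(backup, after_journal):
    """Rebuild the table from a backup plus the after images, oldest first."""
    restored = dict(backup)
    for key, new_row in after_journal:
        restored[key] = new_row
    return restored


reps = {"Jones": "Western", "Smith": "Eastern"}
backup = dict(reps)                      # full backup taken on the 1st
before, after = [], []

for rep in list(reps):                   # the faulty update moves everyone
    apply_change(reps, rep, "Southeast", before, after)

rebuilt = roll_forward(backup, after)    # backup + after images = latest data
roll_back(reps, before)                  # before images undo the bad update
print(reps)
```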

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
   (emp        INTEGER
   ,dept       INTEGER
   ,lname      CHAR(20)
   ,fname      VARCHAR(20)
   ,salary     DECIMAL(10,2)
   ,hire_date  DATE)
UNIQUE PRIMARY INDEX(emp);

The example above created the table called employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO – meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

You just obey instructions, we'll take care of the obstructions.

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions, we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait in a line for a long time only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE, or READ lock can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK – one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
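The four rules above boil down to a small compatibility table, sketched here in Python. The table is derived only from the descriptions in this section, and the names `COMPATIBLE_WITH` and `can_grant` are made up for illustration; this is not Teradata's lock manager.

```python
# For each lock already held, the set of requests that may proceed alongside it.
COMPATIBLE_WITH = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},
    "READ": {"ACCESS", "READ"},
    "WRITE": {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested: str, held: list) -> bool:
    """True if `requested` does not have to wait behind any lock in `held`."""
    return all(requested in COMPATIBLE_WITH[lock] for lock in held)


print(can_grant("READ", ["READ", "READ"]))      # many readers at once
print(can_grant("WRITE", ["READ"]))             # a writer waits for readers
print(can_grant("ACCESS", ["WRITE"]))           # a dirty read slips past a writer
print(can_grant("ACCESS", ["EXCLUSIVE"]))       # nothing gets past EXCLUSIVE
```

A request is granted only if it is compatible with every lock already held, which is why a single EXCLUSIVE LOCK is enough to make everyone wait.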

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired … it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the table that refers to it.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will still proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
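The employee and payroll story can be sketched in Python with two dictionaries standing in for the tables. The function names and sample rows are hypothetical; the sketch only shows the two checks RI performs, one on insert and one on delete.

```python
employee = {1001: "Dreyer", 1002: "Crocker"}      # parent table, keyed by emp
payroll = {}                                      # child table: emp -> salary


def insert_payroll(emp: int, salary: float) -> None:
    """Reject a payroll row whose emp value has no matching employee row."""
    if emp not in employee:
        raise ValueError(f"RI violation: employee {emp} does not exist")
    payroll[emp] = salary


def delete_employee(emp: int) -> None:
    """Reject the delete while a payroll row still refers to the employee."""
    if emp in payroll:
        raise ValueError(f"RI violation: payroll still references {emp}")
    del employee[emp]


insert_payroll(1001, 52000.0)
try:
    delete_employee(1001)             # blocked: still on the payroll
except ValueError as err:
    print(err)
del payroll[1001]                     # remove the referencing row first
delete_employee(1001)                 # now the delete goes through
```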


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that – just loading data. Some people actually ask, "Are data loads that critical?" Come on: ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? Through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to proceed: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sandbags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
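The two phases described above can be sketched in Python. This is an illustration under loud assumptions: `zlib.crc32` stands in for Teradata's hashing algorithm, blocks are counted in rows rather than 64K bytes, everything runs in one loop rather than in parallel, and the function names are invented.

```python
import zlib

NUM_AMPS = 4


def row_hash(primary_index_value: str) -> int:
    """Stand-in for Teradata's hashing algorithm."""
    return zlib.crc32(primary_index_value.encode("utf-8"))


def fastload(rows, block_size=2):
    """rows: list of (primary index value, payload) tuples from the flat file."""
    # Phase 1: hand fixed-size blocks to the AMPs in turn, without looking inside.
    received = [[] for _ in range(NUM_AMPS)]
    for i in range(0, len(rows), block_size):
        received[(i // block_size) % NUM_AMPS].extend(rows[i:i + block_size])

    # Phase 2: every AMP hashes the rows it received and ships each one
    # to the AMP that owns that hash value.
    final = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for key, payload in amp_rows:
            final[row_hash(key) % NUM_AMPS].append((row_hash(key), key, payload))

    # Finally each AMP sorts its own rows (here by hash, standing in for Row ID).
    for amp_rows in final:
        amp_rows.sort()
    return final


rows = [(str(n), f"row {n}") for n in range(10)]
for amp_number, amp_rows in enumerate(fastload(rows)):
    print("AMP", amp_number, "holds", len(amp_rows), "rows")
```

Phase 1 is fast because no AMP has to think about where a row belongs; the hashing work is deferred to phase 2, where every AMP does its share.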

Fastload Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTs, Multiload is meant to process INSERTs, UPDATEs, and DELETEs on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTs, UPDATEs, and DELETEs are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually, in parallel, as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Up to 20 INSERTs, UPDATEs, or DELETEs may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to load immediately into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). An ADW combines OLTP transactions with a Decision Support System (DSS).

You don't drown by falling into the water; you drown by staying in the water.

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it acts, in theory, like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be preset to load automatically at different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.
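The water faucet idea can be sketched as a time-of-day schedule in Python. To be clear, this is not Tpump syntax, and every number and name below is a made-up assumption; it only illustrates how a preset schedule turns the flow up during off-peak hours and down during the rush hour.

```python
# Hypothetical schedule: (first hour, last hour, rows per minute).
SCHEDULE = [
    (0, 5, 100_000),    # off-peak: full throttle
    (6, 17, 500),       # business hours: a trickle
    (18, 23, 20_000),   # evening: somewhere in between
]


def rows_per_minute(hour: int) -> int:
    """Load rate allowed at the given hour of the day (0-23)."""
    for first, last, rate in SCHEDULE:
        if first <= hour <= last:
            return rate
    raise ValueError("hour must be between 0 and 23")


def minutes_to_load(row_count: int, hour: int) -> float:
    """Rough time to trickle `row_count` rows at that hour's rate."""
    return row_count / rows_per_minute(hour)


for hour in (3, 12, 20):
    print(f"{hour:02d}:00 -> {rows_per_minute(hour)} rows per minute")
```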

Tpump Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Processes INSERTs, UPDATEs, or DELETEs
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion: A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance – only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


rarely calls in sick, and lives to take direction from its boss, the Parsing Engine (PE). The best example is to think of each AMP as a computer processor attached to its own disk.

Every AMP has its own disk, and it's the only AMP allowed to read or write data to that disk. This arrangement is referred to as a Shared-Nothing architecture. Although AMPs are the perfect workers, they are not the perfect playmates. Even as children, AMPs would never share toys with other AMPs on the playground. Each AMP has its own disk, and it shares this with no other AMP; hence, a Shared-Nothing architecture.

Teradata spreads the rows of a table evenly across all AMPs in the system. When the PE asks the AMPs to get the data, each AMP will read only the rows on its own disk. If this is done simultaneously, all AMPs should finish at about the same time. As a matter of fact, when we explained this philosophy to Confucius, he stated, "A query is only as fast as the slowest AMP." Confucius, however, did say not to quote him.
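The "slowest AMP" rule is easy to see with a little arithmetic, sketched here in Python. The scan rate and row counts are invented for illustration; the only assumption is that every AMP scans its own rows at the same speed and that all AMPs work at the same time.

```python
def query_time(rows_per_amp, rows_per_second=1000.0):
    """All AMPs scan in parallel, so the query ends when the busiest AMP does."""
    return max(rows_per_amp) / rows_per_second


even = [250_000] * 4                              # one million rows spread evenly
skewed = [700_000, 100_000, 100_000, 100_000]     # same rows, badly distributed

print("even distribution:  ", query_time(even), "seconds")
print("skewed distribution:", query_time(skewed), "seconds")
```

Both systems hold the same one million rows, but the skewed one has to wait for the single overloaded AMP. That is why an even spread of rows matters so much.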

Again, an AMP's job is to read and write data to its disk. The AMP takes its direction from the Parsing Engine (PE). The number of AMPs varies per system. Today, some Teradata systems have just four AMPs, while others have more than 2,000.

The BYNET

Even if you're on the right track, you'll still get run over if you just sit there.

Will Rogers

The BYNET ensures that communication between AMPs and PEs is on the right track and that it happens rapidly. When communication between AMPs and PEs is necessary, the BYNET operates as a communication superhighway.

There are always two BYNETs per system. They are called BYNET 0 and BYNET 1. The duplication is insurance in case one BYNET fails, and it also enhances performance. As an example, think of two BYNETs as two telephone lines in your home. AMPs and PEs can talk to one another over either BYNET or over both.

Morgan Jones, co-author, has been talking to his four-year-old son David about AMPs, PEs, and the BYNET. Little David asked, "Daddy, what happens when the AMPs and PEs get lonely?" Morgan replied, "They talk to each other over the BYNET."

Here are the steps that outline exactly how the AMPs, PEs, and BYNETs work together. A user performs a LOGON to Teradata. A PE is assigned to manage all SQL for that particular user. The user then asks Teradata a question. Next:

• The PE checks the user's SQL syntax
• The PE checks the user's security rights
• The PE comes up with a plan for the AMPs to follow
• The PE passes the plan along to the AMPs over the BYNET
• The AMPs follow the plan and retrieve the data requested
• The AMPs pass the data to the PE over the BYNET, and
• The PE then passes the final data to the user
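The steps above can be sketched as one small Python function. This is a greatly simplified toy, not how a real Parsing Engine works: it understands only "SELECT * FROM <table>", each AMP is a dictionary of the rows on its own disk, the BYNET is just a function call, and all names are hypothetical.

```python
def parsing_engine(sql: str, user_rights: set, amps: list) -> list:
    """Follow the steps above for a single-table SELECT (greatly simplified)."""
    words = sql.strip().rstrip(";").split()
    # 1. Syntax check (only "SELECT * FROM <table>" is understood here).
    if len(words) != 4 or [w.upper() for w in words[:3]] != ["SELECT", "*", "FROM"]:
        raise SyntaxError("unsupported SQL in this sketch")
    table = words[3]
    # 2. Security check.
    if table not in user_rights:
        raise PermissionError(f"no SELECT right on {table}")
    # 3-4. The plan: every AMP scans its own share of the table.
    plan = {"action": "scan", "table": table}
    # 5-6. Each AMP follows the plan and returns its rows to the PE.
    answer = []
    for amp in amps:
        answer.extend(amp.get(plan["table"], []))
    # 7. The PE hands the combined answer set to the user.
    return answer


amps = [{"Best_Friends": ["Tom"]}, {"Best_Friends": ["Morgan", "David"]}]
print(parsing_engine("SELECT * FROM Best_Friends;", {"Best_Friends"}, amps))
```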

Teradata Building Block Approach

Better a diamond with a flaw than a pebble without one


Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array, where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them.

Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about … the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, CustomerName, Item No, and Quantity.

An example of each table follows:

CUSTOMER TABLE called CustomerTable

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillards       Atlanta    106227              Smith

ORDER TABLE called OrderTable

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.
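The "marriage" of the two Order Number columns can be sketched in Python using the sample rows from the tables above (only some of the columns are carried over). The function name is made up for illustration; it simply matches each customer row to the order row with the same Order Number.

```python
customers = [
    {"CustomerID": 1001, "CustomerName": "JC Penney", "OrderNumber": 105372},
    {"CustomerID": 1002, "CustomerName": "Office Depot", "OrderNumber": 105799},
    {"CustomerID": 1003, "CustomerName": "Dillards", "OrderNumber": 106227},
]
orders = [
    {"OrderNumber": 105372, "ItemNo": 212, "Quantity": 20},
    {"OrderNumber": 105799, "ItemNo": 296, "Quantity": 52},
    {"OrderNumber": 106227, "ItemNo": 325, "Quantity": 17},
]


def join_on_order_number(customer_rows, order_rows):
    """Pair each customer row with the order row that shares its Order Number."""
    orders_by_number = {order["OrderNumber"]: order for order in order_rows}
    joined = []
    for customer in customer_rows:
        order = orders_by_number.get(customer["OrderNumber"])
        if order is not None:
            joined.append({**customer, **order})
    return joined


for row in join_on_order_number(customers, orders):
    print(row["CustomerName"], "ordered", row["Quantity"], "of item", row["ItemNo"])
```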

Earlier programming languages referred to files, records and fields. Relational databases use the terms "Tables", "Rows" and "Columns". Each row of a table is comprised of one or more fields identified by a column name. A row is the smallest value that can be inserted into a table. A column is the smallest value within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null, or are sometimes represented by a "?".

One column or combination of columns in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being arbitrary, or an unordered set. Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (e.g., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                        FOREIGN KEY
Not optional                       Optional
Comprised of one or more columns   Comprised of one or more columns
Can only have one PK per table     Can have multiple FKs per table
No duplicates allowed              Duplicates allowed
No changes allowed                 Changes allowed
No nulls allowed                   Nulls allowed
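The PK/FK match described above can be sketched in a few lines of Python. This is only an illustration of the idea of joining on a common key, not how Teradata performs a join internally; the dictionary field names and the function `join_on_customer_id` are made up for this sketch, while the row values come from the two example tables.

```python
# Rows taken from the CustomerTable and OrderTable examples.
customers = [
    {"CustomerID": 1001, "CustomerName": "JC Penney"},
    {"CustomerID": 1002, "CustomerName": "Office Depot"},
    {"CustomerID": 1003, "CustomerName": "Dillards"},
]
orders = [
    {"OrderNumber": 105372, "Quantity": 20, "CustomerID": 1001},
    {"OrderNumber": 105799, "Quantity": 52, "CustomerID": 1002},
    {"OrderNumber": 106227, "Quantity": 17, "CustomerID": 1003},
]


def join_on_customer_id(customers, orders):
    """Match each order's foreign key to the customer's primary key."""
    # The PK is unique and never null, so it can safely serve as a lookup key.
    by_pk = {c["CustomerID"]: c for c in customers}
    return [
        (by_pk[o["CustomerID"]]["CustomerName"], o["OrderNumber"], o["Quantity"])
        for o in orders
        if o["CustomerID"] in by_pk
    ]


joined = join_on_customer_id(customers, orders)
for row in joined:
    print(row)
```

Because the primary key allows no duplicates, each order finds at most one customer; the foreign key side may repeat, so one customer could match many orders.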

Teradata Spreads the Data Evenly Across the AMPs

"A chain is only as strong as its weakest link."

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, then no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer and Order. Notice each AMP holds a portion of the rows for every table. AMP 1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at nearly the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the rows of the Customer table and the rows of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.
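The layout just described, with every AMP holding a slice of every table and keeping each table's rows grouped together, can be mimicked with a small Python sketch. The placement rule used here (row number modulo the number of AMPs) and the eight rows per table are assumptions made purely for illustration; they are not Teradata's real distribution logic.

```python
from collections import defaultdict

NUM_AMPS = 4
# Three tables from the example, each given eight hypothetical row ids.
tables = {
    "Employee": range(1, 9),
    "Customer": range(1, 9),
    "Order": range(1, 9),
}

# amps[amp_number][table_name] -> row ids that AMP owns, grouped by table
amps = {amp: defaultdict(list) for amp in range(1, NUM_AMPS + 1)}
for table_name, row_ids in tables.items():
    for row_id in row_ids:
        amp = (row_id - 1) % NUM_AMPS + 1  # toy placement rule (assumption)
        amps[amp][table_name].append(row_id)

for amp, groups in amps.items():
    print("AMP", amp, dict(groups))
```

Each AMP ends up with one quarter of each table, and a request for Customer rows only has to look in that AMP's Customer group.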

Primary Indexes

"Every road has two directions."

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.

There are two types of Primary Indexes

"A man who chases two rabbits catches none."

Roman Proverb

"A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR."

Tera-Tom Coffing

Each table may have only one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column emp follows:

CREATE TABLE employee
(emp INTEGER
,dept INTEGER
,lname CHAR(20)
,fname VARCHAR(20)
,salary DECIMAL(10,2)
,hire_date DATE
)
UNIQUE PRIMARY INDEX (emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON:

CREATE TABLE TomC.employee
(emp INTEGER
,dept INTEGER
,lname CHAR(20)
,fname VARCHAR(20)
,salary DECIMAL(10,2)
,hire_date DATE
)
PRIMARY INDEX (dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp INTEGER
,dept INTEGER
,lname CHAR(20)
,fname VARCHAR(20)
,salary DECIMAL(10,2)
,hire_date DATE
)
UNIQUE PRIMARY INDEX (emp, dept);

"Being related hardly insures relatability."

Michael E. Angier

All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

"The map is not the territory."

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture below shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
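The two simulated hash maps above follow one simple pattern: the AMP numbers cycle through the buckets. A minimal Python sketch can rebuild both diagrams. The function names `build_hash_map` and `show`, and the choice of 48 buckets (8 rows of 6, matching the diagrams rather than the real 65,536), are assumptions made for this illustration.

```python
def build_hash_map(num_amps, num_buckets=48):
    """Return a list of AMP numbers, one per bucket, cycling 1..num_amps."""
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]


def show(hash_map, width=6):
    """Print the buckets six to a row, the way the diagrams lay them out."""
    for start in range(0, len(hash_map), width):
        print(" ".join(str(amp) for amp in hash_map[start:start + width]))


four_amp_map = build_hash_map(4)
eight_amp_map = build_hash_map(8)

show(four_amp_map)
print()
show(eight_amp_map)
```

Because the AMP numbers cycle, every AMP appears in the same number of buckets (12 each on the four-AMP map, 6 each on the eight-AMP map), which is what lets the map spread rows evenly.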

How the Hash Map and Primary Index Work Together

"Choice, not chance, determines destiny."

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the Wiz-Bang Formula (a secret NCR formula). First we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang Formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang Formula: take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we will get the Primary Index value of the row and run it through the Coffing/Jones Wiz-Bang Formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num  Friend_Name
2           Ben Hon

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (2) through the Coffing/Jones Wiz-Bang Formula (divide by 2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and what AMP number is inside that bucket. The first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num  Friend_Name
16          Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely run the value of Friend_Num (16) through the Coffing/Jones Wiz-Bang Formula (divide by 2), and the answer is:

16 divided by 2 = 8

Thus the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. Bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply, you should understand that the real formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it will be consistent. When we divided Friend_Num 2 by two, we got bucket one in the hash map. If we ran the formula on this value a million times, we would still get the same result.

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.
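The whole walk-through can be replayed in Python. This sketch uses only the make-believe Coffing/Jones Wiz-Bang Formula (divide by 2) and the simulated four-AMP hash map from this chapter; it says nothing about the real, secret Teradata hash. The names `wiz_bang_bucket` and `amp_for`, and the 48-bucket map size, are invented for the illustration.

```python
# Simulated four-AMP hash map: bucket 1 holds AMP 1, bucket 2 holds AMP 2, ...
HASH_MAP = [(bucket % 4) + 1 for bucket in range(48)]

BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}


def wiz_bang_bucket(primary_index_value):
    """Coffing/Jones Wiz-Bang Formula: divide the Primary Index value by 2."""
    return primary_index_value // 2


def amp_for(primary_index_value):
    """Look inside the bucket to find the destination AMP."""
    bucket = wiz_bang_bucket(primary_index_value)
    return HASH_MAP[bucket - 1]  # buckets are numbered from 1


placement = {}
for friend_num in BEST_FRIENDS:
    placement.setdefault(amp_for(friend_num), []).append(friend_num)

for amp in sorted(placement):
    print("AMP", amp, "holds Friend_Num", placement[amp])
```

Running the same value through `amp_for` again always lands on the same AMP, which is the consistency the summary above relies on: each of the four AMPs ends up owning two of the eight rows.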

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, and it merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
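To make the one-AMP operation concrete, here is a small Python sketch of the retrieval just described. It again relies on the make-believe divide-by-2 formula and a simulated 48-bucket, four-AMP hash map; the `amps` dictionary and the function `select_by_primary_index` are hypothetical stand-ins for AMP storage and the PE's plan.

```python
HASH_MAP = [(bucket % 4) + 1 for bucket in range(48)]  # simulated four-AMP map

ROWS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}


def amp_for(friend_num):
    """Divide by 2 to get the bucket, then read the AMP number in it."""
    return HASH_MAP[friend_num // 2 - 1]


# Load phase: every row goes to the AMP its Primary Index hashes to.
amps = {amp: {} for amp in range(1, 5)}
for friend_num, friend_name in ROWS.items():
    amps[amp_for(friend_num)][friend_num] = friend_name


def select_by_primary_index(friend_num):
    """Rerun the formula and ask only the one AMP that can own the row."""
    amp = amp_for(friend_num)
    return amp, (friend_num, amps[amp].get(friend_num))


print(select_by_primary_index(8))
```

The other three AMPs are never consulted, which is why the Primary Index is the fastest path to a row.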

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time students respond, "NO!" After we complete training they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's abilities to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. On large systems, doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. With ten AMPs sharing the work, this process is about 10 times faster than a single processor reading all 100 rows. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" then you are asking the system to read through an entire table. Full Table Scans are fundamental and an important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns
• Each AMP passes its rows to the PE over the BYNET

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
6           Mary Gray
14          Kyle Marx
8           John Davis
16          Lyn Jones
2           Ben Hon
10          Don Roy
4           Joe Davis
12          Sam Mills
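A Full Table Scan can be imitated in Python by giving each simulated AMP its own worker. This is a conceptual sketch only: the `AMPS` dictionary reuses the row placement produced by the make-believe divide-by-2 formula, threads stand in for AMPs, and the names `scan_amp` and `full_table_scan` are invented here.

```python
from concurrent.futures import ThreadPoolExecutor

# Rows as they were laid out by the make-believe formula on four AMPs.
AMPS = {
    1: [(2, "Ben Hon"), (10, "Don Roy")],
    2: [(4, "Joe Davis"), (12, "Sam Mills")],
    3: [(6, "Mary Gray"), (14, "Kyle Marx")],
    4: [(8, "John Davis"), (16, "Lyn Jones")],
}


def scan_amp(amp_number):
    """Each AMP reads only the rows it owns."""
    return list(AMPS[amp_number])


def full_table_scan():
    """Play the PE: hand every AMP the same plan and gather what comes back."""
    with ThreadPoolExecutor(max_workers=len(AMPS)) as pool:
        per_amp_rows = pool.map(scan_amp, AMPS)  # one task per AMP number
        return [row for rows in per_amp_rows for row in rows]


rows = full_table_scan()
print(len(rows), "rows returned")
```

Notice in the answer set above that rows arrive grouped by AMP (6 and 14 together, 8 and 16 together, and so on) rather than sorted. Without an ORDER BY, the order simply reflects which AMP delivered its rows when.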

In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It tells the AMP to place the hash in a secondary index sub-table along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of the AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' secondary index sub-table row. Once the 'Ben Hon' sub-table row is found, the real row-id (the smiley face in this example) is revealed, and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.
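The two-AMP USI lookup can be sketched in Python as well. Everything about the hashing here is an assumption for illustration: base rows are placed with the make-believe divide-by-2 formula, and names are hashed with a toy function (`index_amp`, summing character codes) that has nothing to do with Teradata's real hash, so the AMP numbers it produces will not necessarily match the picture described above.

```python
HASH_MAP = [(bucket % 4) + 1 for bucket in range(48)]  # simulated four-AMP map

ROWS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}


def base_amp(friend_num):
    """Primary Index placement using the make-believe divide-by-2 formula."""
    return HASH_MAP[friend_num // 2 - 1]


def index_amp(friend_name):
    """Toy hash for the secondary index value (an assumption, not Teradata's)."""
    bucket = sum(ord(ch) for ch in friend_name) % len(HASH_MAP)
    return HASH_MAP[bucket]


base = {amp: {} for amp in range(1, 5)}      # base table rows per AMP
subtable = {amp: {} for amp in range(1, 5)}  # secondary index sub-table per AMP
for num, name in ROWS.items():
    base[base_amp(num)][num] = name
    # Sub-table row: index value -> pointer (AMP, Primary Index value) to base row
    subtable[index_amp(name)][name] = (base_amp(num), num)


def select_by_usi(friend_name):
    """Step 1: the sub-table AMP finds the pointer. Step 2: the base AMP reads the row."""
    first = index_amp(friend_name)
    pointer = subtable[first].get(friend_name)
    if pointer is None:
        return None
    second, num = pointer
    return {"amps_touched": (first, second), "row": (num, base[second][num])}


print(select_by_usi("Ben Hon"))
```

However many AMPs the system has, a USI lookup in this sketch consults at most two of them: the one holding the sub-table row and the one holding the base row.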

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index directly. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.

Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So, if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space

The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these items is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT or REVOKE rights from Tom.
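The "who owns Tom?" question can be answered mechanically by walking up the hierarchy. This Python sketch encodes the parent of each user and database exactly as the text describes them (DBC, then SYSDBA, then MRKT, SALES and Morgan, with Tom under Morgan); the `PARENT` dictionary and the `owners` function are just an illustration, not a Teradata facility.

```python
# child -> immediate parent, following the hierarchy described in the text
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """Return every parent (owner) above the given user or database."""
    result = []
    while name in PARENT:
        name = PARENT[name]
        result.append(name)
    return result


print(owners("Tom"))
```

DBC has no entry in `PARENT` because nothing sits above it, so its list of owners is empty.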


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space,
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see the protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tablesThese tables will be discussed in another chapter

SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it is destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions provide the bank with a healthy profit. SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is where real data, such as tables, is actually stored. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
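The bookkeeping difference between perm and spool can be sketched as follows. This is a toy Python model under stated assumptions, not Teradata code: the `Owner` class, its method names, and the numbers are hypothetical, and the rule that a child's spool limit may not exceed its parent's is taken from the speed-limit analogy above.

```python
class Owner:
    """Hypothetical stand-in for a Teradata user or database."""

    def __init__(self, name, perm, spool):
        self.name = name
        self.perm = perm      # perm space currently owned
        self.spool = spool    # an upper limit only; it is never "spent"

    def create_child(self, name, perm, spool):
        if perm > self.perm:
            raise ValueError("not enough perm space to give away")
        if spool > self.spool:
            # Assumption drawn from the speed-limit analogy in the text.
            raise ValueError("child spool limit exceeds the parent's limit")
        self.perm -= perm     # perm given away is subtracted from the giver
        return Owner(name, perm, spool)   # spool: nothing is subtracted


# Arbitrary illustrative units.
sysdba = Owner("SYSDBA", perm=80, spool=65)
morgan = sysdba.create_child("Morgan", perm=30, spool=65)
tom = morgan.create_child("Tom", perm=10, spool=65)
```

After these calls SYSDBA holds 50 units of perm and Morgan holds 20, yet all three still have the full spool limit of 65.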

The following picture shows a logical view of a Customer table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.

What is a View

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the result is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
1    10    Johnson   Manny   100000
2    20    Carlsbad  Jan     100000
22   30    Winter    Steve   77000
25   10    Lester    Bonnie  56000
33   10    Samuels   Todd    120000
99   20    Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
1    10    Johnson   Manny
2    20    Carlsbad  Jan
22   30    Winter    Steve
25   10    Lester    Bonnie
33   10    Samuels   Todd
99   20    Walter    Misha
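If you have no Teradata system at hand, the same view idea can be tried with Python's built-in sqlite3 module. This is only a stand-in sketch: SQLite is not Teradata and has no access rights or Data Dictionary, so it shows only that the view exposes four columns and stores no second copy of the rows. The table and view names mirror the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view selects every column except Sal.
conn.execute(
    "CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

cursor = conn.execute("SELECT * FROM Employ_V")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(columns)
print(len(rows), "rows returned")
```

The salary column never appears in `columns`, which is the point of the department store window.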

What is a Macro

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and executed with a simple command. If there are multiple statements, Teradata treats them as one single transaction. In other words, either they all work or none of them work. As with views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10?
• What employees are in department 20?
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
    SELECT * FROM Employ_v WHERE dept = 10;
    SELECT * FROM Employ_v WHERE dept = 20;
    SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six-gun."

Wild West Slogan

I taught in one place that was so rough… security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform the given operation on the specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in DBC. Owners, or parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it… nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the U.S. Constitution.

In the picture above, DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
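The three kinds of rights can be sketched as a small permission check. This is a deliberately simplified Python model, not how the Parsing Engine is implemented: the names `PARENT`, `explicit_rights`, `is_owner`, `grant`, and `pe_allows` are hypothetical, automatic rights are reduced to "an object has rights on itself", and the individual 16 and 20 rights are not enumerated.

```python
# Hypothetical hierarchy matching the users and databases named in the text.
PARENT = {"SYSDBA": "DBC", "MRKT": "SYSDBA", "Morgan": "SYSDBA",
          "Mary": "MRKT", "Tom": "Morgan"}
explicit_rights = set()   # (grantee, privilege, object) triples


def is_owner(user, obj):
    """Implicit rights: True when `user` sits above `obj` in the hierarchy."""
    while obj in PARENT:
        obj = PARENT[obj]
        if obj == user:
            return True
    return False


def grant(grantor, grantee, privilege, obj):
    """Record an explicit right; only the object or one of its owners may grant."""
    if grantor != obj and not is_owner(grantor, obj):
        raise PermissionError(f"{grantor} has no rights on {obj}")
    explicit_rights.add((grantee, privilege, obj))


def pe_allows(user, privilege, obj):
    """A much simplified version of the permission check described in the text."""
    return (user == obj                      # automatic rights on oneself
            or is_owner(user, obj)           # implicit rights of a parent
            or (user, privilege, obj) in explicit_rights)


before = pe_allows("Mary", "CREATE TABLE", "Tom")
grant("Tom", "Mary", "CREATE TABLE", "Tom")
after = pe_allows("Mary", "CREATE TABLE", "Tom")
```

Mary is refused before the grant and allowed after it, while Morgan is allowed throughout simply because he sits above Tom.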

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission-critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract: it is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or disk goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (as in a MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message;
• The entire transaction is rolled back and any changes made to the database are reversed;
• Locks are released; and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a Transient Journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal cleanliness is next to godliness. If it is clean, then things went well.
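The before-image idea can be sketched as a toy Python class. It is an illustration only and says nothing about Teradata's internal row or journal format: the `Amp` class, its method names, and the salary figures are hypothetical, and `None` is used as an assumed marker for "this row did not exist before".

```python
class Amp:
    """Toy AMP holding rows and a transient journal of before-images."""

    def __init__(self, rows):
        self.rows = dict(rows)   # primary key -> row value
        self.journal = []        # before-images for the open transaction

    def write(self, key, value):
        # Take the BEFORE picture first; None means the row is new.
        self.journal.append((key, self.rows.get(key)))
        self.rows[key] = value

    def commit(self):
        # Success: the journal is simply wiped clean.
        self.journal.clear()

    def rollback(self):
        # Failure: restore before-images, newest change first.
        for key, before in reversed(self.journal):
            if before is None:
                self.rows.pop(key, None)   # undo an insert
            else:
                self.rows[key] = before    # undo an update
        self.journal.clear()


amp = Amp({1: 100000, 25: 56000})
amp.write(25, 112000)    # an update that will be undone
amp.write(99, 104000)    # an insert that will be undone
amp.rollback()
```

After the rollback the rows are back to their original state and the journal is empty, just as it would be after a COMMIT.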

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK-protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission-critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2,000 AMPs. The chance of losing two AMPs in a 2,000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm accomplishes this allocation. The output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, which is the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
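The two placement decisions described above can be sketched in Python. Everything here is an assumption made for illustration: `zlib.crc32` stands in for Teradata's hashing algorithm (which is different), the hash maps are reduced to simple arithmetic, and `primary_amp` and `fallback_amp` are hypothetical names. The only property the sketch aims to show is that a FALLBACK row lands on a different AMP inside the same cluster.

```python
import zlib

NUM_AMPS = 8        # the eight-AMP system from the picture
CLUSTER_SIZE = 4    # two clusters of four AMPs (numbered 0-3 and 4-7 here)


def row_hash(pi_value):
    """Stand-in hash of the Primary Index value (not Teradata's algorithm)."""
    return zlib.crc32(str(pi_value).encode())


def primary_amp(pi_value):
    """Plays the role of the Primary Hash Map: pick the base row's AMP."""
    return row_hash(pi_value) % NUM_AMPS


def fallback_amp(pi_value):
    """Plays the role of the Fallback Hash Map: another AMP, same cluster."""
    h = row_hash(pi_value)
    primary = h % NUM_AMPS
    base = primary - primary % CLUSTER_SIZE              # first AMP of the cluster
    offset = 1 + (h // NUM_AMPS) % (CLUSTER_SIZE - 1)    # 1..3, never 0
    return base + (primary - base + offset) % CLUSTER_SIZE


for name in ("Ben Hon", "Don Roy"):
    print(name, primary_amp(name), fallback_amp(name))
```

Because the offset is never zero, losing any single AMP always leaves either the base row or its FALLBACK copy reachable within the cluster.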

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is groups of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in groups of two, every complex query will take twice as long to process while an AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share the extra work.
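The trade-off can be put into numbers with a rough Python sketch. It assumes the failed AMP's work is spread evenly over the surviving AMPs in its cluster; the function names are hypothetical and the figures are simple arithmetic, not measurements.

```python
def workload_factor(cluster_size):
    """Work per surviving AMP after one AMP in the cluster fails (1.0 = normal)."""
    return cluster_size / (cluster_size - 1)


def exposed_amps(cluster_size):
    """How many other AMPs could take the cluster down with a second failure."""
    return cluster_size - 1


for size in (2, 4, 16):
    print(size, round(workload_factor(size), 2), exposed_amps(size))
```

A cluster of two doubles the survivor's load but exposes only one other AMP; a cluster of 16 adds under 7% load but exposes 15; a cluster of four sits in between, with about a third more work and three exposed AMPs.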

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, with all subsequent changes to that AMP's rows recorded on the other AMPs in its cluster.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping, starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping".

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
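The "while you were sleeping" bookkeeping can be sketched as a toy Python class. This is a simplification under stated assumptions: the `Cluster` class and its methods are hypothetical, the journal is a plain list of changes, and the FALLBACK copies themselves (which would also be updated) are left out to keep the sketch short.

```python
class Cluster:
    """Toy cluster whose surviving AMPs keep a journal for a down AMP."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}   # rows owned by each AMP
        self.darj = {}                             # down AMP -> missed changes

    def fail(self, amp):
        self.darj[amp] = []          # the other AMPs open a journal for it

    def write(self, amp, key, value):
        if amp in self.darj:
            self.darj[amp].append((key, value))    # AMP is down: log the change
        else:
            self.rows[amp][key] = value

    def recover(self, amp):
        # Replay everything the AMP missed, then discard the journal.
        for key, value in self.darj.pop(amp):
            self.rows[amp][key] = value


cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "Ben Hon", "v1")
cluster.fail(1)
cluster.write(1, "Ben Hon", "v2")    # missed while AMP 1 is down
cluster.write(1, "Don Roy", "v1")    # missed while AMP 1 is down
rows_while_down = dict(cluster.rows[1])
cluster.recover(1)
```

While AMP 1 is down its rows stay frozen at the old values; once it recovers, the journal is replayed in order and then thrown away.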

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to the hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written to the primary disk, it is also written to the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it is still fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The downside of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. We can still keep the shared nothing environment and have four physical disks: only the AMP that actually owns the virtual disk can access its four disks.

In the picture above, one AMP has one virtual disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
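The rule about which two disks may be lost can be sketched in Python. The layout below is an assumption for illustration only (a rank of four disks arranged as two data disks, each with its own mirror), and the disk names and the `data_available` function are hypothetical.

```python
# Hypothetical rank of four disks: (data disk, its mirror) pairs.
MIRROR_PAIRS = [("disk1", "disk2"), ("disk3", "disk4")]


def data_available(failed_disks):
    """Data survives unless a data disk AND its own mirror are both down."""
    return all(
        not (data in failed_disks and mirror in failed_disks)
        for data, mirror in MIRROR_PAIRS
    )


print(data_available({"disk1"}))            # one disk lost
print(data_available({"disk1", "disk3"}))   # two disks from different pairs
print(data_available({"disk1", "disk2"}))   # a data disk and its own mirror
```

Only the last case loses data, because both copies of the same information are gone at once.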

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory; there are about 10-16 AMPs and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk and, similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in the surviving node's memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, and they are accessing their own virtual disks via a different physical cable.
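The migration can be sketched with a small Python model of the two-node, 32-AMP clique. It is illustrative only: the `clique` and `vdisk_of` structures and the `migrate` function are hypothetical names, and spreading vprocs round-robin over the surviving nodes is an assumption for cliques with more than two nodes.

```python
# Two nodes, 16 AMPs each, as in the picture described above.
clique = {
    "node1": [f"AMP{i}" for i in range(1, 17)],
    "node2": [f"AMP{i}" for i in range(17, 33)],
}
# Each AMP keeps its own virtual disk no matter where it runs.
vdisk_of = {amp: f"vdisk{amp[3:]}" for amps in clique.values() for amp in amps}


def migrate(nodes, failed):
    """Move the failed node's vprocs to the surviving nodes in the clique."""
    survivors = [name for name in nodes if name != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    for i, vproc in enumerate(nodes[failed]):
        nodes[survivors[i % len(survivors)]].append(vproc)   # round-robin
    nodes[failed] = []


migrate(clique, "node1")
print(len(clique["node2"]), vdisk_of["AMP16"])
```

Node two now hosts all 32 AMPs, twice its usual load, while AMP 16 still maps to the same virtual disk it owned before the failure.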

People who come from colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. This would be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission-critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost-effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed by an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward and is usually used to restore data lost as a result of a hardware problem.
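The two recovery directions can be sketched in a few lines of Python. This is only a conceptual model under stated assumptions: the dictionaries, the journal lists, and the function names (apply_update, roll_back, roll_forward) are hypothetical and are not Teradata commands or utilities.

```python
# Hypothetical in-memory model of BEFORE and AFTER journal images.
# None of these names are Teradata APIs; they only illustrate the idea.

def apply_update(table, before_journal, after_journal, key, new_row):
    """Change one row and record its image before and after the change."""
    before_journal.append((key, table.get(key)))  # image before the change
    table[key] = new_row
    after_journal.append((key, new_row))          # image after the change

def roll_back(table, before_journal):
    """Undo changes by restoring BEFORE images, newest first."""
    for key, old_row in reversed(before_journal):
        if old_row is None:
            table.pop(key, None)   # the row did not exist before the change
        else:
            table[key] = old_row

def roll_forward(backup, after_journal):
    """Rebuild current data from a backup by replaying AFTER images in order."""
    restored = dict(backup)
    for key, new_row in after_journal:
        restored[key] = new_row
    return restored

reps = {1: "Western", 2: "Northern"}   # representative number -> region
backup = dict(reps)                    # stands in for the full system backup
before, after = [], []

# The faulty update: every representative is moved, not just the Western ones.
for rep_id in list(reps):
    apply_update(reps, before, after, rep_id, "Southeast")

recovered = roll_forward(backup, after)  # backup plus AFTER images
roll_back(reps, before)                  # BEFORE images undo the bad update
```

In this sketch, roll_back answers the programming-error scenario and roll_forward answers the hardware-failure scenario; a real journal also records timestamps so that recovery can stop at a chosen point in time.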

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK, BEFORE JOURNAL, DUAL AFTER JOURNAL
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
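The four modes above can be summarized as a small compatibility table. The Python sketch below is an illustration only: the BLOCKED_BY table and the can_grant function are hypothetical names, and the last entry (a held ACCESS lock making an EXCLUSIVE request wait) is an assumption inferred from the statement that the EXCLUSIVE LOCK prevents any access.

```python
# For each lock already held, the set of requested locks that must wait.
# Hypothetical names; this is a reading of the four modes described above.
BLOCKED_BY = {
    "EXCLUSIVE": {"EXCLUSIVE", "WRITE", "READ", "ACCESS"},  # nothing gets in
    "WRITE": {"EXCLUSIVE", "WRITE", "READ"},                # only ACCESS may pass
    "READ": {"EXCLUSIVE", "WRITE"},                         # other readers welcome
    "ACCESS": {"EXCLUSIVE"},                                # assumption, see lead-in
}

def can_grant(requested, held):
    """Return True if `requested` can be granted while `held` is in place."""
    return requested not in BLOCKED_BY[held]

# A dirty read is allowed while a row is being written...
dirty_read_allowed = can_grant("ACCESS", "WRITE")
# ...but an ordinary SELECT has to wait for the write to finish.
clean_read_allowed = can_grant("READ", "WRITE")
```

Reading the table by row shows why the READ LOCK is so popular: a held READ lock blocks only the two modes that would change the data or its structure.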

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation! Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the former table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
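The employee and payroll story can be sketched as two checks, one on insert and one on delete. This is a minimal Python illustration, not Teradata syntax: the two sets, the sample employee numbers, and the function names are all hypothetical.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would leave a payroll row without an employee."""

employee = {1001, 1002}   # parent table: employee numbers (made-up data)
payroll = {1001}          # child table: employee numbers that are being paid

def insert_payroll(emp):
    # A payroll row may only reference an employee that exists.
    if emp not in employee:
        raise ReferentialIntegrityError(f"employee {emp} does not exist")
    payroll.add(emp)

def delete_employee(emp):
    # An employee may not be removed while payroll still references him or her.
    if emp in payroll:
        raise ReferentialIntegrityError(f"employee {emp} is still on payroll")
    employee.discard(emp)

def delete_payroll(emp):
    payroll.discard(emp)
```

With these rules in place, the fired employee must be removed from payroll first (delete_payroll(1001)) before delete_employee(1001) succeeds, which is exactly the oversight RI prevents.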

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
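The three steps just described (hand out blocks, hash rows to their owning AMP, sort on each AMP) can be sketched as follows. This is a conceptual Python model only: fastload_sketch is a hypothetical name, the block size is counted in rows rather than 64K bytes, and the modulo operation is a stand-in for the real hashing formula, which is not public.

```python
def fastload_sketch(rows, n_amps, block_size):
    """Toy model of the load described above; rows are (key, value) tuples."""
    # Step 1: cut the flat file into blocks and deal them out to the AMPs in turn.
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    received = [[] for _ in range(n_amps)]
    for i, block in enumerate(blocks):
        received[i % n_amps].extend(block)

    # Step 2: every AMP hashes the rows it received and sends each row
    # to the AMP that owns it (key % n_amps stands in for the real hash).
    owned = [[] for _ in range(n_amps)]
    for amp_rows in received:
        for row in amp_rows:
            owned[row[0] % n_amps].append(row)

    # Step 3: each AMP sorts the rows it now owns.
    for amp_rows in owned:
        amp_rows.sort()
    return owned

sample_rows = [(k, f"name{k}") for k in (7, 2, 5, 4, 1, 6, 3, 0)]
placement = fastload_sketch(sample_rows, n_amps=4, block_size=2)
```

The point of the sketch is that the first step ignores the row contents entirely, which is why it is fast, while the second step is where each row finds its permanent home.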

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on! Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

Anonymous

Teradata builds its data warehouses in building blocks called nodes. Each building block is a gem composed of four Intel processors. Each node is connected flawlessly to other nodes through two BYNETs. The AMPs and PEs reside inside the node's memory. Each node is connected to a disk array where each AMP has direct access to one virtual disk.

Below is a picture of a Teradata system. It has four Intel processors, and the AMPs and PEs reside in memory. Each AMP is directly attached to its one virtual disk.

The following picture shows two nodes connected together over the BYNETs.

Teradata Tables

"Nearly everyone takes the limits of his own vision for the limits of the world. A few do not. Join them."

Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about ... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, CustomerName, Item No, and Quantity.

An example of each table follows:

CUSTOMER TABLE, called CustomerTable

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillards       Atlanta    106227              Smith

ORDER TABLE, called OrderTable

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an "extended family," joined by the "marriage" of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest value that can be inserted into a table. A column is the smallest value within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null or are sometimes represented by a

One column, or combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being "arbitrary" or an "unordered set." Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.
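As an illustration of matching a primary key to a foreign key, the Python sketch below joins the sample customer and order rows on CustomerID. It only models the idea of a join; the dictionary keys are simplified versions of the column names, and the variable names are hypothetical.

```python
# Sample rows taken from the CustomerTable and OrderTable examples.
customers = [
    {"CustomerID": 1001, "CustomerName": "JC Penney", "CityName": "Dallas"},
    {"CustomerID": 1002, "CustomerName": "Office Depot", "CityName": "Columbia"},
    {"CustomerID": 1003, "CustomerName": "Dillards", "CityName": "Atlanta"},
]
orders = [
    {"OrderNumber": 105372, "ItemNo": 212, "Quantity": 20, "CustomerID": 1001},
    {"OrderNumber": 105799, "ItemNo": 296, "Quantity": 52, "CustomerID": 1002},
    {"OrderNumber": 106227, "ItemNo": 325, "Quantity": 17, "CustomerID": 1003},
]

# Index the customers by their primary key (unique, so one row per key).
customer_by_pk = {c["CustomerID"]: c for c in customers}

# Join: for each order, follow its foreign key to the matching customer row.
joined = [
    (customer_by_pk[o["CustomerID"]]["CustomerName"], o["OrderNumber"], o["Quantity"])
    for o in orders
    if o["CustomerID"] in customer_by_pk
]
```

Because the primary key is unique, each order matches at most one customer, while a single customer could match many orders.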

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                          FOREIGN KEY
Not optional                         Optional
Comprised of one or more columns     Comprised of one or more columns
Can only have one PK per table       Can have multiple FKs per table
No duplicates allowed                Duplicates allowed
No changes allowed                   Changes allowed
No nulls allowed                     Nulls allowed

Teradata Spreads the Data Evenly Across the AMPs

"A chain is only as strong as its weakest link."

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way:

First, when each AMP has nearly the same quantity of table rows, then no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind, the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer Table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee Table are grouped together. In addition, the rows of the Customer table and the rows of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.

Primary Indexes

"Every road has two directions."

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.

There are two types of Primary Indexes

"A man who chases two rabbits catches none."

Roman Proverb

"A man who chases two rabbits misses both by a HARE! A person who chases two Primary Indexes misses both by an ERR!"

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI, in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

CREATE TABLE employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON; leaving out the word UNIQUE is what makes the index non-unique.

CREATE TABLE TomC.employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
UNIQUE PRIMARY INDEX(emp, dept);

Being related hardly insures relatability

Michael E Angier

All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture below shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
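The repeating pattern in both diagrams can be generated with one line of arithmetic. The Python sketch below is a simulation in the same spirit as the diagrams: build_hash_map is a hypothetical name, and the simple round-robin assignment is an assumption made for illustration, not a description of how a real system fills its hash map.

```python
from collections import Counter

def build_hash_map(n_amps, n_buckets):
    """Return a list where position 0 is bucket 1, position 1 is bucket 2, etc.
    Each bucket holds one AMP number, assigned in a repeating 1..n_amps cycle."""
    return [(bucket % n_amps) + 1 for bucket in range(n_buckets)]

four_amp_map = build_hash_map(n_amps=4, n_buckets=48)    # as in the first diagram
eight_amp_map = build_hash_map(n_amps=8, n_buckets=48)   # as in the second diagram

# With the full 65,536 buckets, every AMP owns the same number of buckets.
buckets_per_amp = Counter(build_hash_map(n_amps=4, n_buckets=65536))
```

Since every AMP owns the same share of buckets, rows that hash evenly across the buckets also land evenly across the AMPs.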

How the Hash Map and Primary Index Work Together

Choice not chance determines destiny

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the Whiz-Bang Formula (a secret NCR formula). First we'll explain the theory, and then we will invent our own Whiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Whiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Whiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we will get the Primary Index value of the row, apply the Coffing/Jones Whiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the AMP number in which the row will reside. Let's take our first row and determine its proper location:

Friend_Num   Friend_Name
2            Ben Hon

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Whiz-Bang Formula (divide by 2) to the value of Friend_Num (2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row:

Friend_Num   Friend_Name
16           Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Whiz-Bang Formula (divide by 2) to the value of Friend_Num (16), and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 3158

If we continue the process until all data is laid out, the system would look like this:


Best_Friends Table

Friend_Num Friend_Name

2 Ben Hon

4 Joe Davis

6 Mary Gray

8 John Davis

10 Don Roy

12 Sam Mills

14 Kyle Marx

16 Lyn Jones

HASH MAP

1 2 3 4 1 2 3 4 1 2 3 4

1 2 3 4 1 2 3 4 1 2 3 4

1 2 3 4 1 2 3 4 1 2 3 4

1 2 3 4 1 2 3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the real formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it is consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this value a million times, we would still get the same result.

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.
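For readers who like to see the theory run, here is a minimal Python sketch of the Coffing/Jones Wiz-Bang formula laying out the Best_Friends table. It is a toy, not Teradata's real hashing formula. The 48-bucket map size and the names HASH_MAP, wiz_bang_bucket, amp_for and distribute are our own assumptions for illustration.

```python
# Toy model of the Coffing/Jones Wiz-Bang formula; NOT Teradata's real hash.
NUM_AMPS = 4
NUM_BUCKETS = 48  # assumed size, chosen to match the illustration

# Each bucket holds an AMP number; the AMPs repeat 1, 2, 3, 4 across the buckets.
HASH_MAP = {bucket: (bucket - 1) % NUM_AMPS + 1
            for bucket in range(1, NUM_BUCKETS + 1)}

BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]


def wiz_bang_bucket(primary_index_value):
    """Divide the Primary Index value by 2 to get a hash map bucket number."""
    return primary_index_value // 2


def amp_for(primary_index_value):
    """Look inside the bucket to find the AMP that will hold the row."""
    return HASH_MAP[wiz_bang_bucket(primary_index_value)]


def distribute(rows):
    """Place every row on the AMP its Primary Index value points to."""
    layout = {amp: [] for amp in range(1, NUM_AMPS + 1)}
    for friend_num, friend_name in rows:
        layout[amp_for(friend_num)].append((friend_num, friend_name))
    return layout


if __name__ == "__main__":
    for amp, rows in distribute(BEST_FRIENDS).items():
        print(f"AMP {amp}: {rows}")
```

Because the same input always yields the same bucket, running amp_for on a Friend_Num at load time and again at query time lands on the same AMP. That consistency is the whole point.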

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name

FROM Best_Friends

WHERE Friend_Num = 8;


The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
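The one-AMP operation described above can be sketched in a few lines of Python. This is only an illustration under our toy divide-by-2 formula. The AMPS dictionary and the function names are hypothetical stand-ins for the PE and the AMPs, not anything Teradata exposes.

```python
# Rows as laid out on the four AMPs by the toy divide-by-2 formula.
AMPS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def amp_for(friend_num):
    """Toy hash: divide the PI by two, then read the AMP from the bucket."""
    bucket = friend_num // 2
    return (bucket - 1) % len(AMPS) + 1  # buckets hold AMPs 1, 2, 3, 4, 1, ...


def select_by_primary_index(friend_num):
    """Return the matching rows and the list of AMPs that were contacted."""
    amp = amp_for(friend_num)
    amps_contacted = [amp]  # only one AMP ever receives the plan
    name = AMPS[amp].get(friend_num)
    rows = [] if name is None else [(friend_num, name)]
    return rows, amps_contacted


if __name__ == "__main__":
    print(select_by_primary_index(8))
```

Notice that the other three AMPs are never asked anything, which is why a Primary Index lookup is the fastest path to a row.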

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. On systems with hundreds or thousands of AMPs, doing so can speed up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is about 10 times faster than a system that reads all 100 rows serially. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most FTS bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?", you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name

FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it owns.

• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name

FROM Best_Friends;

8 rows returned

Friend_Num Friend_Name

6 Mary Gray

14 Kyle Marx

8 John Davis

16 Lyn Jones

2 Ben Hon

10 Don Roy

4 Joe Davis

12 Sam Mills
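The Full Table Scan above can be imitated with a small Python sketch in which each "AMP" is a worker that reads only the rows it owns. Threads stand in for AMPs and a plain list stands in for the BYNET. These are our own simplifications, and the names amp_scan and full_table_scan are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Rows as laid out on the four AMPs by the toy divide-by-2 formula.
AMPS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def amp_scan(amp_number):
    """One AMP reads only the rows it owns."""
    return list(AMPS[amp_number].items())


def full_table_scan():
    """The 'PE' hands every AMP the same plan and gathers the answers."""
    with ThreadPoolExecutor(max_workers=len(AMPS)) as pool:
        per_amp_rows = list(pool.map(amp_scan, AMPS))
    # Flatten the per-AMP answers into one answer set.
    return [row for rows in per_amp_rows for row in rows]


if __name__ == "__main__":
    for friend_num, friend_name in full_table_scan():
        print(friend_num, friend_name)
```

As in the answer set shown above, the rows come back grouped by AMP rather than sorted by Friend_Num, because nothing in the query asked for an order.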


In this chapter we have shown you two opposite approaches to retrieving data. In our first query we used the Primary Index to retrieve one row. In the next query we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option: a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, and since secondary indexes use up space and add overhead, they should only be used on KNOWN QUERIES, that is, queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value

• Secondary Index Row-ID (this is the hashed version of the value)

• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It tells the AMP to place the hash in a secondary index sub-table, along with the Row-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name

FROM Best_Friends

WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (a smiley face in this example) is revealed and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!
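The two-AMP USI lookup can be sketched in Python as follows. This is a simplified model: the name hash (summing character codes) is our own made-up stand-in for the secret formula, so the AMP it picks for 'Ben Hon' may not match the picture, and the "pointer" kept in the sub-table is simply the Friend_Num rather than a real Row-ID.

```python
NUM_AMPS = 4

ROWS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]


def pi_amp(friend_num):
    """Toy Primary Index hash: divide by two, then read the bucket."""
    return (friend_num // 2 - 1) % NUM_AMPS + 1


def si_amp(friend_name):
    """Hypothetical hash for the secondary index value (not Teradata's)."""
    return sum(ord(ch) for ch in friend_name) % NUM_AMPS + 1


# Build the base table and the USI sub-table on every AMP.
base = {amp: {} for amp in range(1, NUM_AMPS + 1)}
usi_subtable = {amp: {} for amp in range(1, NUM_AMPS + 1)}
for friend_num, friend_name in ROWS:
    base[pi_amp(friend_num)][friend_num] = friend_name
    usi_subtable[si_amp(friend_name)][friend_name] = friend_num  # pointer


def select_by_usi(friend_name):
    """Return the matching rows and the AMPs visited, in order."""
    visited = [si_amp(friend_name)]  # step 1: the AMP holding the index row
    pointer = usi_subtable[visited[0]].get(friend_name)
    if pointer is None:
        return [], visited
    visited.append(pi_amp(pointer))  # step 2: the AMP holding the base row
    return [(pointer, base[visited[1]][pointer])], visited


if __name__ == "__main__":
    print(select_by_usi("Ben Hon"))
```

However many AMPs the system has, the lookup visits at most two of them: one for the sub-table row and one for the base row.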

Join Indexes

"A bend in the road is not the end of the road unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index directly. They run their normal joins, and the PE checks to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.


Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer and 1012 stands for 10 to the 12th power, or a terabyte. There is no user with greater privileges than DBC.

DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security, access rights, tables, columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users and databases, they take ownership. Keep in mind that all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to SYSDBA to distribute space.

Logical Picture of System Space


SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us: who is the owner of Tom? The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from Tom.
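The "who owns Tom?" question is just a walk up the family tree. Here is a minimal Python sketch of that walk. The PARENT dictionary mirrors the hierarchy described in this chapter, and the function name owners is our own invention.

```python
# Each user or database points to the parent that gave it space.
PARENT = {
    "DBC": None,
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """List every parent above the given user or database, nearest first."""
    result = []
    parent = PARENT[name]
    while parent is not None:
        result.append(parent)
        parent = PARENT[parent]
    return result


if __name__ == "__main__":
    print(owners("Tom"))
```

Every name returned by owners("Tom") sits above Tom in the hierarchy, so each of them is a parent or owner of Tom.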


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space

• Spool Space, and

• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts. The user is then out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. Will a bank that holds onto all of its capital be successful? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, views and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
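The difference between giving away perm and giving away spool can be sketched with a small Python class. This is only a bookkeeping illustration. The Owner class, its method names, and the spool figures are hypothetical, and the rule that a child's spool limit cannot exceed its parent's is our reading of the speed-limit analogy.

```python
class Owner:
    """A user or database with a perm allotment and a spool limit (in GB)."""

    def __init__(self, name, perm_gb, spool_gb):
        self.name = name
        self.perm_gb = perm_gb
        self.spool_gb = spool_gb
        self.children = []

    def create_child(self, name, perm_gb, spool_gb):
        if perm_gb > self.perm_gb:
            raise ValueError("not enough perm space to give away")
        if spool_gb > self.spool_gb:
            raise ValueError("a child's spool limit cannot exceed the parent's")
        self.perm_gb -= perm_gb  # perm is subtracted from the giver
        child = Owner(name, perm_gb, spool_gb)  # spool is only a limit
        self.children.append(child)
        return child


if __name__ == "__main__":
    dbc = Owner("DBC", perm_gb=100, spool_gb=100)
    sysdba = dbc.create_child("SYSDBA", perm_gb=80, spool_gb=100)
    print(dbc.perm_gb, dbc.spool_gb)        # DBC keeps 20 GB perm, full spool
    print(sysdba.perm_gb, sysdba.spool_gb)
```

A query-only user would simply be created with perm_gb=0 and a spool limit, which matches the users described above who "will just receive spool."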

The following picture shows a logical view of a Customer table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. Then, when the answer is returned, the spool is released.

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window: users can see selected portions of a table, yet they aren't able to see sensitive data. You can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored again on the disks, so it does not duplicate data or take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp Dept Lname Fname Sal

1 10 Johnson Manny 100000

2 20 Carlsbad Jan 100000

22 30 Winter Steve 77000

25 10 Lester Bonnie 56000

33 10 Samuels Todd 120000

99 20 Walter Misha 104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS

SELECT Emp,

       Dept,

       Lname,

       Fname

FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp Dept Lname Fname

1 10 Johnson Manny

2 20 Carlsbad Jan

22 30 Winter Steve

25 10 Lester Bonnie

33 10 Samuels Todd

99 20 Walter Misha
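If you do not have a Teradata system handy, the view idea can be tried out with Python's built-in sqlite3 module. This is only a stand-in: SQLite is not Teradata, it has no GRANT or REVOKE, and so the sketch shows the column filtering but not the access rights.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no rows are copied.
conn.execute(
    "CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

cursor = conn.execute("SELECT * FROM Employ_V")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```

The Sal column never appears in the answer set, even though every row of the underlying table is still there.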


What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple statements, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10?

• What employees are in department 20? and

• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (

SELECT * FROM Employ_v WHERE dept = 10;

SELECT * FROM Employ_v WHERE dept = 20;

SELECT * FROM Employ_v ORDER BY lname; );

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
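The "either they all work or none of them work" behavior is easiest to see with statements that change data. Below is a minimal Python sketch using the built-in sqlite3 module as a stand-in for Teradata. The run_macro function is our own hypothetical helper that runs a named group of statements as one transaction. It is not how Teradata implements macros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp INTEGER PRIMARY KEY, Sal INTEGER)")
conn.execute("INSERT INTO Employee VALUES (1, 100000)")
conn.commit()


def run_macro(connection, statements):
    """Run every statement as one transaction: all succeed or none do."""
    try:
        with connection:  # commits on success, rolls back on an exception
            for sql in statements:
                connection.execute(sql)
        return True
    except sqlite3.Error:
        return False


# The second statement fails (duplicate key), so the raise is undone too.
ok = run_macro(conn, [
    "UPDATE Employee SET Sal = Sal + 1000 WHERE Emp = 1",
    "INSERT INTO Employee VALUES (1, 50000)",
])
salary = conn.execute("SELECT Sal FROM Employee WHERE Emp = 1").fetchone()[0]

print(ok, salary)
```

Because the second statement failed, the salary change made by the first statement is rolled back along with it.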

Here is a handy reference chart that compares views with macros:

Views                                           Macros

• We select from views                          • We execute macros

• Uses the keyword AS                           • Uses the keyword AS

• Definition is stored in the Data Dictionary   • Definition is stored in the Data Dictionary

• Accesses certain portions of the data         • Accesses the real data itself

• Is changed using the keyword REPLACE          • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan


I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some!

Actually, on a recent consulting trip I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has explicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
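Here is a rough Python sketch of how implicit and explicit rights interact. It is a simplification under our own assumptions: the hierarchy places Mary under MRKT, only owners (or the object itself) may grant and revoke, and the function names grant, revoke and pe_allows are hypothetical. The automatic rights (the 16 and 20 privileges) are left out to keep it short.

```python
PARENT = {
    "DBC": None, "SYSDBA": "DBC", "MRKT": "SYSDBA",
    "Morgan": "SYSDBA", "Mary": "MRKT", "Tom": "Morgan",
}

explicit_rights = set()  # (grantee, privilege, target) triples


def owners(name):
    """Everyone above the given user or database in the hierarchy."""
    result, parent = [], PARENT[name]
    while parent is not None:
        result.append(parent)
        parent = PARENT[parent]
    return result


def _check_grantor(grantor, target):
    # Implicit rights: owners may grant or revoke on anything below them.
    if grantor != target and grantor not in owners(target):
        raise PermissionError(f"{grantor} has no rights over {target}")


def grant(grantor, grantee, privilege, target):
    _check_grantor(grantor, target)
    explicit_rights.add((grantee, privilege, target))


def revoke(grantor, grantee, privilege, target):
    _check_grantor(grantor, target)
    explicit_rights.discard((grantee, privilege, target))


def pe_allows(user, privilege, target):
    """The PE's check: has this user been given this privilege?"""
    return (user, privilege, target) in explicit_rights


if __name__ == "__main__":
    grant("Tom", "Mary", "CREATE TABLE", "Tom")
    print(pe_allows("Mary", "CREATE TABLE", "Tom"))
    revoke("Morgan", "Mary", "CREATE TABLE", "Tom")
    print(pe_allows("Mary", "CREATE TABLE", "Tom"))
```

Tom can hand Mary the privilege on his own space, and Morgan, sitting above Tom, can take it away again. Mary, who is not above Tom, cannot grant anything on him.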

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept

• Transient Journal

• FALLBACK

• RAID

• Clustering

• Cliques

• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message

• The entire transaction is rolled back and any changes made to the database are reversed

• Locks are released, and

• Spool files are discarded

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal". Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, "Transient Journal Cleanliness is next to Godliness." If it is clean, then things went well.
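The BEFORE picture idea can be sketched in Python with a tiny model of one AMP. This is an illustration only. The Amp class, its method names, and the way before-images are kept in a list are our own assumptions, not Teradata internals.

```python
_MISSING = object()  # marks "the row did not exist before the change"


class Amp:
    """One AMP with its rows and its own transient journal."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.transient_journal = []  # (key, before_image) pairs

    def write(self, key, new_value):
        """INSERT or UPDATE: take the BEFORE picture, then change the row."""
        self.transient_journal.append((key, self.rows.get(key, _MISSING)))
        self.rows[key] = new_value

    def delete(self, key):
        """DELETE: take the BEFORE picture, then remove the row."""
        self.transient_journal.append((key, self.rows.get(key, _MISSING)))
        self.rows.pop(key, None)

    def commit(self):
        """Success: the journal is wiped clean."""
        self.transient_journal.clear()

    def rollback(self):
        """Failure: restore the before-images, newest first."""
        while self.transient_journal:
            key, before = self.transient_journal.pop()
            if before is _MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before


if __name__ == "__main__":
    amp = Amp({1: 100000, 25: 56000})
    amp.write(1, 200000)   # oops: a 100% raise instead of a 10% raise
    amp.delete(25)
    amp.rollback()
    print(amp.rows)
```

After the rollback the rows look exactly as they did before the transaction started, and the journal is empty again.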

FALLBACK Protection

I asked my dentist if I had to floss all my teeth and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgans Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing 'Ben Hon' and 'Don Roy'. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get 'Ben Hon' from AMP 2 and 'Don Roy' from AMP 4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why FALLBACK protection can be vital for mission-critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2,000 AMPs. The chance of losing two AMPs in a 2,000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation: the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
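To make the two hash maps concrete, here is a minimal Python sketch of the eight-AMP, two-cluster system. The hash function (`zlib.crc32`), the bucket count of 64, and the rule used to choose the fallback AMP are all stand-ins chosen for illustration; Teradata's actual hashing algorithm and map layout are different.

```python
import zlib

CLUSTERS = [[1, 2, 3, 4], [5, 6, 7, 8]]     # eight AMPs, two clusters of four
AMPS = [amp for cluster in CLUSTERS for amp in cluster]
BUCKETS = 64                                 # assumed size of the toy hash map
PRIMARY_MAP = [AMPS[b % len(AMPS)] for b in range(BUCKETS)]   # bucket -> AMP number


def row_hash(primary_index_value):
    # Stand-in hash of the Primary Index value.
    return zlib.crc32(str(primary_index_value).encode())


def primary_amp(value):
    # The hash points to a bucket; the bucket holds the AMP number.
    return PRIMARY_MAP[row_hash(value) % BUCKETS]


def fallback_amp(value):
    # Choose a different AMP inside the primary AMP's own cluster.
    home = primary_amp(value)
    cluster = next(c for c in CLUSTERS if home in c)
    step = 1 + (row_hash(value) // BUCKETS) % (len(cluster) - 1)
    return cluster[(cluster.index(home) + step) % len(cluster)]


for name in ("Ben Hon", "Don Roy"):
    print(name, "primary AMP:", primary_amp(name), "fallback AMP:", fallback_amp(name))
```

Because `step` is always between 1 and the cluster size minus one, the fallback copy never lands on the same AMP as the base row, and it never leaves the cluster.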

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in groups of two, every complex query will take twice as long to process while an AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low, and if one AMP is lost, the other three share the extra work.
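The tradeoff can be put into rough numbers. The sketch below assumes the failed AMP's work is shared equally among the survivors and that AMPs fail independently with some probability `p_fail`; both are simplifying assumptions, and the 1% figure is purely hypothetical.

```python
def load_per_survivor(cluster_size):
    # Work each surviving AMP carries, relative to normal, with one AMP down.
    return cluster_size / (cluster_size - 1)


def risk_of_second_failure(cluster_size, p_fail):
    # Chance that at least one of the remaining AMPs in the cluster also fails.
    return 1 - (1 - p_fail) ** (cluster_size - 1)


for size in (2, 4, 16):
    print(size,
          round(load_per_survivor(size), 2),
          round(risk_of_second_failure(size, p_fail=0.01), 4))
```

A cluster of two doubles the survivor's load, a cluster of four raises it by about a third, and a cluster of 16 barely changes it, while the risk of a second failure grows with every AMP added to the cluster.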

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, including all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.
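The journal's life cycle (open on failure, record missed changes, replay on recovery, discard) can be sketched as follows. The `Cluster` class and its methods are hypothetical names for a toy model; it tracks only each AMP's primary rows and leaves out the FALLBACK copies that keep serving queries in the meantime.

```python
class Cluster:
    def __init__(self, amp_ids):
        self.up = set(amp_ids)
        self.rows = {amp: {} for amp in amp_ids}   # primary rows held by each AMP
        self.darj = {}                             # down AMP -> changes it missed

    def fail(self, amp):
        self.up.discard(amp)
        self.darj[amp] = []          # the surviving AMPs open a recovery journal

    def write(self, amp, row_id, value):
        if amp in self.up:
            self.rows[amp][row_id] = value
        else:
            self.darj[amp].append((row_id, value))   # remember it for later

    def recover(self, amp):
        # Replay everything the AMP missed, in order, then discard the journal.
        for row_id, value in self.darj.pop(amp):
            self.rows[amp][row_id] = value
        self.up.add(amp)


cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "a", "v1")
cluster.fail(1)
cluster.write(1, "a", "v2")     # change made while AMP 1 was sleeping
cluster.write(1, "b", "new")
cluster.recover(1)
print(cluster.rows[1])          # {'a': 'v2', 'b': 'new'}
```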


In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it is still fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The downside of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. We can still keep the Shared Nothing environment and have four physical disks: only the AMP that owns the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
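The rule about which disks a rank can afford to lose is easy to check in code. This sketch assumes a rank made up of two data disks and their two mirrors; the disk names and that exact layout are assumptions for illustration.

```python
# Assumed layout: a rank of four disks, two data disks each paired with a mirror.
RANK = {"data1": "mirror1", "data2": "mirror2"}


def data_available(failed_disks):
    # Data is lost only when a data disk and its own mirror are both down.
    failed = set(failed_disks)
    return not any(d in failed and m in failed for d, m in RANK.items())


print(data_available(["data1"]))               # True: the mirror still has the data
print(data_available(["data1", "mirror2"]))    # True: two disks lost, different pairs
print(data_available(["data1", "mirror1"]))    # False: a disk and its mirror are gone
```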

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10 to 16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.
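The migration itself can be sketched with a dictionary of nodes. The function name, the node labels, and the round-robin placement of migrating AMPs are assumptions for illustration only.

```python
def fail_node(nodes, failed, clique):
    # Move the failed node's AMPs to the surviving clique members, one by one.
    survivors = [n for n in clique if n != failed]
    for i, amp in enumerate(nodes[failed]):
        nodes[survivors[i % len(survivors)]].append(amp)
    nodes[failed] = []
    return nodes


# The two-node, 32-AMP system: AMPs 1-16 on node one, 17-32 on node two.
nodes = {"node1": list(range(1, 17)), "node2": list(range(17, 33))}
fail_node(nodes, "node1", clique=["node1", "node2"])
print(len(nodes["node2"]))   # 32
```

Node two now hosts all 32 AMPs, which is why its performance slows down until node one is repaired.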

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds." Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. This would be quite costly, because FALLBACK stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
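Both scenarios can be sketched in Python. The journal format used here, a list of `(time, key, image)` tuples where an image of `None` means "no row", is an assumption for illustration and is not how Teradata stores journal images; the representative names and regions are made up.

```python
def roll_back(table, before_journal, to_time):
    # Undo every change made after to_time, newest first, using BEFORE images.
    for time, key, image in sorted(before_journal, key=lambda e: e[0], reverse=True):
        if time > to_time:
            if image is None:
                table.pop(key, None)
            else:
                table[key] = image
    return table


def roll_forward(backup, after_journal, from_time):
    # Start from the restored backup and reapply AFTER images in time order.
    table = dict(backup)
    for time, key, image in sorted(after_journal, key=lambda e: e[0]):
        if time >= from_time:
            if image is None:
                table.pop(key, None)
            else:
                table[key] = image
    return table


# The bad update moved everyone to Southeast at time 5; roll back to time 4.
reps = {"Ann": "Southeast", "Bob": "Southeast"}
before = [(5, "Ann", "Western"), (5, "Bob", "Eastern")]
print(roll_back(reps, before, to_time=4))      # {'Ann': 'Western', 'Bob': 'Eastern'}

# Backup taken on day 1; the failure happens on day 5.
backup = {"Ann": "Western"}
after = [(2, "Bob", "Eastern"), (4, "Ann", "Southwest")]
print(roll_forward(backup, after, from_time=1))  # {'Ann': 'Southwest', 'Bob': 'Eastern'}
```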

The following example shows the use of FALLBACK and the PERMANENT JOURNAL

CREATE TABLE TomC.employee, FALLBACK,
     BEFORE JOURNAL,
     DUAL AFTER JOURNAL
     (emp        INTEGER,
      dept       INTEGER,
      lname      CHAR(20),
      fname      VARCHAR(20),
      salary     DECIMAL(10,2),
      hire_date  DATE FORMAT)
UNIQUE PRIMARY INDEX(emp);

The example above creates the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait in line for a long time only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE, or READ locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
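The four rules above amount to a compatibility table, which a short Python sketch can express directly. The `COMPATIBLE` mapping and the `can_grant` function are illustrative names; the table simply encodes the rules as stated in the list, nothing more.

```python
# For each lock already held, the set of lock requests that may proceed alongside it.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held_locks):
    # A request is granted only if it is compatible with every lock already held.
    return all(requested in COMPATIBLE[held] for held in held_locks)


print(can_grant("READ", ["READ", "READ"]))   # True: many readers at once
print(can_grant("WRITE", ["READ"]))          # False: the writer waits for the reader
print(can_grant("ACCESS", ["WRITE"]))        # True: a dirty read is allowed
print(can_grant("ACCESS", ["EXCLUSIVE"]))    # False: nothing gets past EXCLUSIVE
```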

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood." "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table unless it contains a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
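The employee and payroll story can be sketched as two checks, one on insert and one on delete. The table contents, the function names, and the exception class are hypothetical, chosen only to illustrate the rule.

```python
class ReferentialIntegrityError(Exception):
    pass


employee = {1001: "Smith", 1002: "Crocker"}   # parent table, keyed by emp
payroll = {}                                   # child table: check number -> emp


def insert_payroll(check_no, emp):
    # A payroll row must point at an employee that exists.
    if emp not in employee:
        raise ReferentialIntegrityError(f"emp {emp} is not in the employee table")
    payroll[check_no] = emp


def delete_employee(emp):
    # An employee cannot be deleted while payroll rows still point at it.
    if emp in payroll.values():
        raise ReferentialIntegrityError(f"emp {emp} still has payroll rows")
    del employee[emp]


insert_payroll(1, 1001)
try:
    delete_employee(1001)
except ReferentialIntegrityError as err:
    print("blocked:", err)
```

Deleting the payroll row first would let the employee delete go through, which is exactly the order RI forces on the company in the story.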


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what Teradata table the data should be loaded into.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sandbags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
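The three steps just described (hand out blocks, hash and redistribute, sort) can be sketched in Python. This is a single-process toy: `zlib.crc32` stands in for the real hashing algorithm, `block_size` counts rows rather than 64K of bytes, and the row hash stands in for the Row ID.

```python
import zlib


def fastload(records, amp_count, block_size=3):
    # Step 1: hand out blocks of rows to the AMPs in turn, without looking inside.
    received = [[] for _ in range(amp_count)]
    blocks = [records[i:i + block_size] for i in range(0, len(records), block_size)]
    for n, block in enumerate(blocks):
        received[n % amp_count].extend(block)

    # Step 2: every AMP hashes its rows and sends each one to the AMP that owns it.
    owned = [[] for _ in range(amp_count)]
    for rows in received:
        for key, data in rows:
            row_hash = zlib.crc32(str(key).encode())   # stand-in hash
            owned[row_hash % amp_count].append((row_hash, key, data))

    # Step 3: each AMP sorts the rows it now owns by row hash.
    for rows in owned:
        rows.sort()
    return owned


records = [(i, f"row{i}") for i in range(20)]
for amp, rows in enumerate(fastload(records, amp_count=4), start=1):
    print("AMP", amp, "holds", len(rows), "rows")
```

The first step is fast precisely because it ignores the content of the rows; the hashing work is left to the AMPs, which do it in parallel.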

Fastload Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to load into a data warehouse immediately. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be preset to load at different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables
• It locks at the row level

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


Arthur Schopenhauer

Do you have one of those notoriously messy junk drawers in your kitchen? You know the one we're talking about ... the one next to the silverware drawer. This drawer may often contain old washer and dryer warranties, matches, half-used flashlight batteries, straws, odd nuts, bolts, and washers, corncob holders, etc. Fortunately, the dresser drawers in your bedroom are typically much more organized. In fact, you probably store your clothing in those drawers much more neatly so you can get to what you need quickly.

Relational databases store data much like we organize our dresser drawers. Just as you might put all of your t-shirts in one drawer and your socks in another, the database will store data about one topic in one table, and data that pertains to another topic is kept in another table. For example, a database might contain a CustomerTable containing items to track such as customer number, CustomerName, city, and order number. Another table, the OrderTable, might hold data like Order Number, Order Date, CustomerName, Item No, and Quantity.

An example of each table follows:

CUSTOMER TABLE called CustomerTable

CustomerID (PK)   CustomerName   CityName   Order Number (FK)   Customer Rep
1001              JC Penney      Dallas     105372              Dreyer
1002              Office Depot   Columbia   105799              Crocker
1003              Dillards       Atlanta    106227              Smith

ORDER TABLE called OrderTable

Order Number (PK)   Order Date   Item No   Quantity   Customer ID (FK)
105372              03/07/2001   212       20         1001
105799              04/18/2001   296       52         1002
106227              10/17/2001   325       17         1003

The data stored in the CustomerTable is logically related to the data stored in the OrderTable. The two tables both have columns called Order Number. These tables make up an extended family, joined by the marriage of the columns named Order Number in each table.

Earlier programming languages referred to files, records, and fields. Relational databases use the terms "Tables," "Rows," and "Columns." Each Row of a table is comprised of one or more fields identified by a column name. A Row is the smallest unit that can be inserted into a table. A column is the smallest unit within a table that can be updated or modified. The data value stored in each column must match the data type for that column. For example, you cannot enter the name of a city in a column that is defined as a decimal data type. Columns that are defined but have no data value will display a null or are sometimes represented by a

One column, or a combination of columns, in each table is chosen to be the Primary Key (PK). This is a logical modeling term. The primary key contains a unique value for each row and enforces the uniqueness of that row. The PK cannot be null and should contain values that will not change. In the CustomerTable, the primary key is the CustomerID column. Each customer has a unique CustomerID. The data in the columns of every row must be consistent with the unique CustomerID for that row. The rows in a table need not be stored in any particular order. This is also called being arbitrary, or an unordered set. Before the table is defined, the order of the columns is also arbitrary. It doesn't matter if you place CustomerName before CityName or after it. However, once the table is created, the order of the columns (i.e., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between the two tables. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.
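To make the PK/FK match concrete, here is a minimal sketch of such a join. It uses Python's built-in sqlite3 module purely as a stand-in engine (it is not Teradata), and the column names without spaces (OrderNumber, ItemNo, and so on) are our own spelling of the sample columns shown earlier.

```python
import sqlite3

# SQLite (in memory) is only a stand-in engine here; it is not Teradata.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerTable (
        CustomerID   INTEGER PRIMARY KEY,
        CustomerName TEXT,
        CityName     TEXT,
        OrderNumber  INTEGER,  -- FK: matches OrderTable.OrderNumber
        CustomerRep  TEXT
    );
    CREATE TABLE OrderTable (
        OrderNumber  INTEGER PRIMARY KEY,
        OrderDate    TEXT,
        ItemNo       INTEGER,
        Quantity     INTEGER,
        CustomerID   INTEGER   -- FK: matches CustomerTable.CustomerID
    );
""")
conn.executemany(
    "INSERT INTO CustomerTable VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "JC Penney", "Dallas", 105372, "Dreyer"),
        (1002, "Office Depot", "Columbia", 105799, "Crocker"),
        (1003, "Dillards", "Atlanta", 106227, "Smith"),
    ],
)
conn.executemany(
    "INSERT INTO OrderTable VALUES (?, ?, ?, ?, ?)",
    [
        (105372, "03072001", 212, 20, 1001),
        (105799, "04182001", 296, 52, 1002),
        (106227, "10172001", 325, 17, 1003),
    ],
)

# Match the PK of CustomerTable to the FK of OrderTable.
joined = conn.execute("""
    SELECT c.CustomerName, o.OrderNumber, o.Quantity
    FROM CustomerTable AS c
    JOIN OrderTable AS o
      ON o.CustomerID = c.CustomerID
    ORDER BY c.CustomerID
""").fetchall()

for row in joined:
    print(row)
```

Each output row combines a customer's name with the order number and quantity from the matching order row.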

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                         FOREIGN KEY
Not optional                        Optional
Comprised of one or more columns    Comprised of one or more columns
Can only have one PK per table      Can have multiple FKs per table
No duplicates allowed               Duplicates allowed
No changes allowed                  Changes allowed
No nulls allowed                    Nulls allowed

Teradata Spreads the Data Evenly Across the AMPs

A chain is only as strong as its weakest link

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works: the AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind, the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP 1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the rows of the Customer table and the rows of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.

Primary Indexes

Every road has two directions

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.


There are two types of Primary Indexes

A man who chases two rabbits catches none

Roman Proverb

A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR.

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column emp follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix "NON":

CREATE TABLE TomC.employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE)
UNIQUE PRIMARY INDEX(emp, dept);

Being related hardly insures relatability

Michael E Angier


All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label the column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label the column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once, a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2


3 4 1 2 3 4

1 2 3 4 1 2

3 4 1 2 3 4

1 2 3 4 1 2

3 4 1 2 3 4

1 2 3 4 1 2

3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6

7 8 1 2 3 4

5 6 7 8 1 2

3 4 5 6 7 8

1 2 3 4 5 6

7 8 1 2 3 4

5 6 7 8 1 2

3 4 5 6 7 8
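The two diagrams can be reproduced with a few lines of Python. This is only a sketch of the simplified picture above, where AMP numbers repeat in strict rotation; the function name build_hash_map is ours, and we make no claim that a real Teradata system fills its buckets this way.

```python
def build_hash_map(num_amps, num_buckets=65536):
    """Return a list in which entry i holds the AMP number for bucket i + 1.

    AMP numbers cycle 1, 2, ..., num_amps, as in the simplified diagrams.
    """
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]


# The diagrams show 48 buckets (8 rows of 6) for simulation purposes.
four_amp_map = build_hash_map(4, num_buckets=48)
eight_amp_map = build_hash_map(8, num_buckets=48)

# Print the four-AMP map six buckets per row, like the diagram.
for start in range(0, len(four_amp_map), 6):
    print(" ".join(str(amp) for amp in four_amp_map[start:start + 6]))
```

Changing the first argument to 8 prints the eight-AMP layout instead.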

How the Hash Map and Primary Index Work Together

Choice not chance determines destiny

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the Wiz-Bang Formula (a secret NCR formula). First we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

For this example, Teradata will attempt to spread the table rows among the AMPs of the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2

3 4 1 2 3 4

1 2 3 4 1 2

3 4 1 2 3 4

1 2 3 4 1 2

3 4 1 2 3 4

1 2 3 4 1 2

3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula: take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.

Let's take our first row and determine on which AMP it will reside. Remember, we will get the Primary Index value of the row, apply the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num   Friend_Name
2            Ben Hon

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num Friend_Name

16 Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (16), and the answer is:

16 divided by 2 = 8

Thus the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

HASH MAP

1 2 3 4 1 2

3 4 1 2 3 4

1 2 3 4 1 2

3 4 1 2 3 4

1 2 3 4 1 2

3 4 1 2 3 4

1 2 3 4 1 2

3 4 1 2 3 4
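The whole walk-through can be sketched in Python. Keep in mind that this uses our made-up Coffing/Jones formula (divide by 2), not the real Teradata hashing formula, and that the names wiz_bang_bucket and amp_for are invented for this illustration.

```python
# Four-AMP hash map as drawn above: bucket 1 -> AMP 1, bucket 2 -> AMP 2, ...
HASH_MAP = [(bucket % 4) + 1 for bucket in range(48)]

BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]


def wiz_bang_bucket(primary_index_value):
    """Coffing/Jones Wiz-Bang formula: divide the Primary Index value by 2."""
    return primary_index_value // 2


def amp_for(primary_index_value):
    """Look in the bucket the formula points to and return its AMP number."""
    bucket = wiz_bang_bucket(primary_index_value)
    return HASH_MAP[bucket - 1]  # buckets are numbered starting at 1


# Load the table: every row goes to the AMP named in its hash map bucket.
amps = {amp: [] for amp in range(1, 5)}
for friend_num, friend_name in BEST_FRIENDS:
    amps[amp_for(friend_num)].append((friend_num, friend_name))

for amp, rows in amps.items():
    print("AMP", amp, rows)
```

Each of the four AMPs ends up with two of the eight rows, and running the load again always produces the same layout because the formula is consistent.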

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. The purpose is to show you how the hash map works in theory to distribute and locate rows. Simply, you should understand that the formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it will be consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.

If you always do what you always did, you'll always get what you always got.

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants two columns, Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number, which is four. The PE passes a plan to retrieve the data ONLY to AMP number four, as this is a one-AMP operation.
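The retrieval steps can be sketched the same way. Again, this relies on our illustrative divide-by-2 formula rather than the real hash, and AMP_STORAGE and select_by_primary_index are names we made up; the storage layout is simply what the formula produced when the table was loaded.

```python
# Four-AMP hash map: bucket 1 -> AMP 1, bucket 2 -> AMP 2, and so on.
HASH_MAP = [(bucket % 4) + 1 for bucket in range(48)]

# Rows as laid out by the Coffing/Jones formula: AMP -> {Friend_Num: Friend_Name}
AMP_STORAGE = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def select_by_primary_index(friend_num):
    """Rerun the formula, read the bucket, and ask only that one AMP."""
    bucket = friend_num // 2          # illustrative hash formula
    amp = HASH_MAP[bucket - 1]        # buckets are numbered starting at 1
    friend_name = AMP_STORAGE[amp].get(friend_num)
    return amp, friend_name


amp, name = select_by_primary_index(8)
print("Asked only AMP", amp, "and found", name)
```

No other AMP is consulted, which is why a Primary Index lookup is a one-AMP operation.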

The Full Table Scan

What matters is not the size of the dog in the fight but the size of the fight in the dog

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is about 10 times faster than reading all 100 rows one after another. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?", you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num   Friend_Name
6            Mary Gray
14           Kyle Marx
8            John Davis
16           Lyn Jones
2            Ben Hon
10           Don Roy
4            Joe Davis
12           Sam Mills
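Here is a small sketch of the same Full Table Scan in Python, with one worker thread standing in for each AMP. The thread pool is only our stand-in for AMPs working in parallel, and the names scan_amp and full_table_scan are invented for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Rows each AMP owns (the layout produced by the Coffing/Jones formula).
AMP_STORAGE = {
    1: [(2, "Ben Hon"), (10, "Don Roy")],
    2: [(4, "Joe Davis"), (12, "Sam Mills")],
    3: [(6, "Mary Gray"), (14, "Kyle Marx")],
    4: [(8, "John Davis"), (16, "Lyn Jones")],
}


def scan_amp(amp):
    """An AMP reads only the rows it owns."""
    return list(AMP_STORAGE[amp])


def full_table_scan():
    """The 'PE' hands every AMP the same plan and collects the answers."""
    with ThreadPoolExecutor(max_workers=len(AMP_STORAGE)) as pool:
        per_amp_rows = list(pool.map(scan_amp, sorted(AMP_STORAGE)))
    return [row for rows in per_amp_rows for row in rows]


rows = full_table_scan()
print(len(rows), "rows returned")
for friend_num, friend_name in rows:
    print(friend_num, friend_name)
```

The sketch happens to list rows AMP by AMP, while the answer set above arrives in a different order. Neither order should be relied upon unless the query asks for a sort.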


In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option: a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, and since secondary indexes use up space and add overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. Each AMP then places the hash in a secondary index sub-table, along with the Row-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a Unique Secondary Index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. Each value went to the AMP to which it hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once that row is found, the real row-id (the smiley face in this example) is revealed, and Teradata can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of ongoing queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!
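The two-AMP nature of a USI lookup can be sketched as follows. Several things here are our own assumptions: toy_hash_amp uses a CRC-32 checksum as a stand-in for the secret hash formula, the "row-id" is simplified to a (base AMP, Friend_Num) pair, and the AMP that ends up holding the 'Ben Hon' index row will not necessarily be AMP 2 as in the picture.

```python
import zlib

NUM_AMPS = 4

# Base table rows as stored by Primary Index: AMP -> {Friend_Num: Friend_Name}
BASE_TABLE = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def toy_hash_amp(value):
    """Stand-in hash (assumption): map a name to an AMP number 1..NUM_AMPS."""
    return zlib.crc32(value.encode("utf-8")) % NUM_AMPS + 1


# Build the sub-tables: each AMP hashes Friend_Name for its own rows and the
# index row (value plus pointer to the base row) goes to the AMP it hashes to.
USI_SUBTABLE = {amp: {} for amp in range(1, NUM_AMPS + 1)}
for base_amp, base_rows in BASE_TABLE.items():
    for friend_num, friend_name in base_rows.items():
        index_amp = toy_hash_amp(friend_name)
        USI_SUBTABLE[index_amp][friend_name] = (base_amp, friend_num)


def select_by_usi(friend_name):
    """Step 1: one AMP reads its sub-table. Step 2: one AMP reads the base row."""
    index_amp = toy_hash_amp(friend_name)
    pointer = USI_SUBTABLE[index_amp].get(friend_name)
    if pointer is None:
        return None, [index_amp]
    base_amp, friend_num = pointer
    row = (friend_num, BASE_TABLE[base_amp][friend_num])
    return row, [index_amp, base_amp]


row, amps_used = select_by_usi("Ben Hon")
print(row, "found using AMPs", amps_used)
```

At most two AMPs take part: the one holding the index row and the one holding the base row. A NUSI would instead have every AMP check its own sub-table.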

Join Indexes

A bend in the road is not the end of the road unless you fail to make the turn

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users never query the join index directly. They run their normal joins, and the PE checks whether the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.
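The idea of a pre-joined table that maintains itself can be sketched in plain Python. This is a conceptual toy under our own assumptions, not how Teradata implements join indexes; the class JoinIndexDemo and its methods are invented for the illustration.

```python
class JoinIndexDemo:
    """Two base tables plus a pre-joined copy that is kept up to date."""

    def __init__(self):
        self.customers = {}   # CustomerID -> CustomerName
        self.orders = {}      # OrderNumber -> (CustomerID, Quantity)
        self.join_index = {}  # OrderNumber -> (CustomerName, Quantity)

    def insert_customer(self, customer_id, customer_name):
        self.customers[customer_id] = customer_name
        # Maintain the pre-joined rows for any orders already on file.
        for order_number, (cust_id, quantity) in self.orders.items():
            if cust_id == customer_id:
                self.join_index[order_number] = (customer_name, quantity)

    def insert_order(self, order_number, customer_id, quantity):
        self.orders[order_number] = (customer_id, quantity)
        # Maintain the pre-joined row as the base table changes.
        if customer_id in self.customers:
            name = self.customers[customer_id]
            self.join_index[order_number] = (name, quantity)

    def customer_orders(self):
        """The user's 'normal join', answered from the pre-joined rows."""
        return sorted(self.join_index.items())


demo = JoinIndexDemo()
demo.insert_customer(1001, "JC Penney")
demo.insert_order(105372, 1001, 20)
demo.insert_order(105799, 1002, 52)      # customer row arrives later
demo.insert_customer(1002, "Office Depot")
print(demo.customer_orders())
```

The user only ever calls customer_orders; the bookkeeping that keeps the pre-joined rows current happens behind the scenes on every insert.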


Teradata Databases, Users and Space

Overview

Choose a job you like and you will never have to work a day of your life

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than DBC.

DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan." The Dictionary directs the PE on topics such as security access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from Tom.
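The ownership question can be answered mechanically by walking up the family tree. The sketch below encodes the hierarchy described in the text (SYSDBA under DBC; MRKT, SALES, and Morgan under SYSDBA; Tom under Morgan); the PARENT table and the owners_of function are our own illustrative names.

```python
# Each user or database maps to its immediate parent (None for the top dog).
PARENT = {
    "DBC": None,
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners_of(name):
    """Return every parent (owner) above `name`, nearest first."""
    owners = []
    parent = PARENT[name]
    while parent is not None:
        owners.append(parent)
        parent = PARENT[parent]
    return owners


print("Owners of Tom:", owners_of("Tom"))
print("Owners of MRKT:", owners_of("MRKT"))
```

Everything the walk passes on the way up to DBC counts as a parent or owner of the starting point.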


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company A bank that holds onto all of its capital will not be successful or will it If its destined for success it will lend outits capital in the form of credit lines or mortgages These actions will provide the bank with a healthy profit TheSYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable

SYSDBA gives out two kinds of space Perm space and Spool space When you receive a credit card from the

bank you are given an upper limit to your line of credit In order to spend more than that limit you must getapproval from the bank In the same way the SYSDBA gives a new user an upper limit of space to use Whenthat amount is used up the user must request an increase Another way to free up some space is to drop sometables from the database

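Perm space is actually used to store real data, that is, the rows of tables. Views and macros need no perm space of their own, because only their definitions are kept, and those live in the Data Dictionary. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.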

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway: if your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will receive only spool.
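The difference between giving away perm and giving away spool can be sketched as a small Python model. The `Space` class, its field names, and the byte figures are all hypothetical, and the rule that a child's spool limit may not exceed its parent's is an assumption drawn from the speed-limit analogy above rather than a statement of Teradata's exact behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Space:
    """Hypothetical model of a user or database and its space limits."""
    name: str
    perm: int    # perm space owned, in bytes
    spool: int   # spool ceiling, in bytes
    children: list = field(default_factory=list)

    def create_child(self, name, perm, spool):
        if perm > self.perm:
            raise ValueError("not enough perm space to give away")
        if spool > self.spool:
            raise ValueError("assumed rule: child spool cannot exceed the parent's")
        self.perm -= perm                  # perm is handed over, so the parent loses it
        child = Space(name, perm, spool)   # spool is only a ceiling; the parent keeps its own
        self.children.append(child)
        return child


sysdba = Space("SYSDBA", perm=100_000_000, spool=50_000_000)
tom = sysdba.create_child("Tom", perm=30_000_000, spool=50_000_000)
mary = sysdba.create_child("Mary", perm=0, spool=50_000_000)   # query-only user

print(sysdba.perm, sysdba.spool)   # perm has shrunk, spool has not
```

Mary is the query-only user described above: she holds no perm space at all, yet she can still run queries up to her spool ceiling.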

The following picture shows a logical view of a Customer Table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.
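The life cycle of spool during that query can be sketched in a few lines of Python. This is only an illustration: the rows are borrowed from the six-row Employee table shown later in this chapter (the picture itself has four), the four-AMP layout is assumed, and `emp % NUM_AMPS` is a stand-in for Teradata's real hashing.

```python
# Employee rows: (Emp, Dept, Lname, Fname, Sal)
EMPLOYEE = [
    (1, 10, "Johnson", "Manny", 100000),
    (2, 20, "Carlsbad", "Jan", 100000),
    (22, 30, "Winter", "Steve", 77000),
    (25, 10, "Lester", "Bonnie", 56000),
    (33, 10, "Samuels", "Todd", 120000),
    (99, 20, "Walter", "Misha", 104000),
]
NUM_AMPS = 4   # assumed system size


def distribute(rows, num_amps):
    """Spread rows across AMPs (PERM). The modulo is a stand-in for hashing."""
    amps = [[] for _ in range(num_amps)]
    for row in rows:
        amps[row[0] % num_amps].append(row)
    return amps


def select_by_dept(amps, dept):
    """Each AMP scans its own rows and copies qualifying rows into its spool."""
    spool = [[] for _ in amps]
    for amp_number, amp_rows in enumerate(amps):
        for row in amp_rows:
            if row[1] == dept:
                spool[amp_number].append(row)
    answer = sorted(row for amp_spool in spool for row in amp_spool)
    for amp_spool in spool:
        amp_spool.clear()          # the spool is released once the answer is returned
    return answer, spool


amps = distribute(EMPLOYEE, NUM_AMPS)
answer, spool_after = select_by_dept(amps, 10)
for row in answer:
    print(row)
```

The table rows in PERM are never touched; only the temporary copies in spool come and go.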

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window: you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are a natural choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored on the disks a second time, so the view does not duplicate data or take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
1    10    Johnson   Manny   100000
2    20    Carlsbad  Jan     100000
22   30    Winter    Steve    77000
25   10    Lester    Bonnie   56000
33   10    Samuels   Todd    120000
99   20    Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above would allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit the columns and rows that users can see.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, security is enhanced. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
1    10    Johnson   Manny
2    20    Carlsbad  Jan
22   30    Winter    Steve
25   10    Lester    Bonnie
33   10    Samuels   Todd
99   20    Walter    Misha
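For readers without a Teradata system at hand, the same idea can be tried with Python's built-in sqlite3 module. SQLite is only a stand-in here: it stores the view definition without copying any rows, much as described above, but it has no GRANT or REVOKE, so the access-rights half of the story is not modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; the salary column is simply not part of it.
conn.execute("CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee")

cursor = conn.execute("SELECT * FROM Employ_V")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)
for row in view_rows:
    print(row)
```

Because the view holds no rows of its own, any later change to the Employee table shows up in the view the next time it is selected.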

What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple statements, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10?
• What employees are in department 20? and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS
(
SELECT * FROM Employ_v WHERE dept = 10;
SELECT * FROM Employ_v WHERE dept = 20;
SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
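The all-or-nothing behavior of a multi-statement macro is easiest to see when the statements change data. The sketch below uses Python's built-in sqlite3 module as a stand-in for Teradata; `run_macro` is a hypothetical helper, not a Teradata API, and the cut-down Employee table is made up for the illustration.

```python
import sqlite3


def run_macro(conn, statements):
    """Run every statement as one transaction: either all apply or none do."""
    try:
        with conn:                     # commits on success, rolls back on any exception
            for sql in statements:
                conn.execute(sql)
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp INTEGER, Dept INTEGER, Sal INTEGER)")
conn.execute("INSERT INTO Employee VALUES (1, 10, 100000)")
conn.commit()

succeeded = run_macro(conn, [
    "UPDATE Employee SET Sal = Sal + 1000 WHERE Emp = 1",   # works on its own
    "UPDATE No_Such_Table SET x = 1",                       # fails, so nothing is kept
])
salary = conn.execute("SELECT Sal FROM Employee WHERE Emp = 1").fetchone()[0]
print(succeeded, salary)
```

The first UPDATE was valid, but because its partner failed, the raise is undone along with it.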

Here is a handy reference chart that compares views with macros:

Views                                           Macros
• We select from views                          • We execute macros
• Uses the keyword AS                           • Uses the keyword AS
• Definition is stored in the Data Dictionary   • Definition is stored in the Data Dictionary
• Accesses certain portions of the data         • Accesses the real data itself
• Is changed using the keyword REPLACE          • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform the given operation on the specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has Implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
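The split between implicit and explicit rights can be sketched as a walk up the hierarchy plus a list of recorded grants. Everything in this Python sketch is hypothetical: the `PARENT` map follows the names used above, placing MRKT directly under SYSDBA is an assumption (the picture is not reproduced here), and the individual automatic rights are not modeled.

```python
# Child -> immediate parent. MRKT's position under SYSDBA is an assumption.
PARENT = {
    "SYSDBA": "DBC",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
    "MRKT": "SYSDBA",
    "Mary": "MRKT",
}

explicit_rights = set()   # recorded (grantee, privilege, object) triples


def owners(name):
    """Everyone above `name` in the hierarchy, nearest parent first."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain


def has_implicit_rights(user, obj):
    """Owners (parents) hold implicit rights on everything below them."""
    return user in owners(obj)


def grant(grantor, privilege, obj, grantee):
    """Record an explicit right; only the object itself or one of its owners may grant."""
    if grantor != obj and not has_implicit_rights(grantor, obj):
        raise PermissionError(f"{grantor} may not grant rights on {obj}")
    explicit_rights.add((grantee, privilege, obj))


def revoke(grantor, privilege, obj, grantee):
    """Remove an explicit right, subject to the same ownership check."""
    if grantor != obj and not has_implicit_rights(grantor, obj):
        raise PermissionError(f"{grantor} may not revoke rights on {obj}")
    explicit_rights.discard((grantee, privilege, obj))


print(owners("Tom"))
grant("Tom", "CREATE TABLE", "Tom", "Mary")   # Tom explicitly lets Mary create tables
print(explicit_rights)
```

Mary now holds an explicit right on Tom's database, but she owns nothing above Tom, so she cannot hand out rights on it herself.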

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract: it is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or disk goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she would ask if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal." Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data then reverts to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal cleanliness is next to Godliness: if it is clean, then things went well.
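The before-picture idea fits in a few lines of Python. This is a toy model under stated assumptions, not Teradata's implementation: the `Amp` class and its methods are hypothetical, a row is just a dictionary entry, and a before-image of `None` is used to mean "the row did not exist yet."

```python
class Amp:
    """Toy AMP: rows keyed by primary key, plus a transient journal of before-images."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})
        self.journal = []   # (key, before_image) pairs; None means the row did not exist

    def write(self, key, new_value):
        """INSERT or UPDATE a row (or DELETE it when new_value is None)."""
        self.journal.append((key, self.rows.get(key)))   # take the BEFORE picture first
        if new_value is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = new_value

    def rollback(self):
        """Transaction failed: restore before-images, newest first, then clean up."""
        for key, before in reversed(self.journal):
            if before is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
        self.journal.clear()

    def commit(self):
        """Transaction succeeded: the journal is wiped clean."""
        self.journal.clear()


amp = Amp({1: 50000})
amp.write(1, 100000)     # the accidental 100% raise
amp.write(2, 70000)      # a new row in the same transaction
amp.rollback()
print(amp.rows, amp.journal)
```

Restoring the images newest first matters when the same row is changed twice in one transaction: the oldest before-image must be the last one applied.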

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice, if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. The chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index value and running it through the hashing algorithm accomplishes this allocation. The output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
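The two hash maps can be imitated with plain arithmetic to check the "one AMP in each cluster" claim. In this Python sketch the AMPs are numbered 0 to 7 rather than 1 to 8, and both placement functions are made-up stand-ins: real Teradata hashes the Primary Index value and looks the result up in its hash maps.

```python
NUM_AMPS = 8
CLUSTER_SIZE = 4


def cluster_of(amp):
    return amp // CLUSTER_SIZE


def primary_amp(key):
    # Stand-in for hashing the Primary Index and consulting the Primary Hash Map.
    return key % NUM_AMPS


def fallback_amp(key):
    # Stand-in for the Fallback Hash Map: a different AMP in the same cluster.
    primary = primary_amp(key)
    start = cluster_of(primary) * CLUSTER_SIZE
    offset = 1 + (key // NUM_AMPS) % (CLUSTER_SIZE - 1)   # 1..3, never 0
    return start + (primary - start + offset) % CLUSTER_SIZE


def is_available(key, down_amps):
    """A row can be read if its base copy or its FALLBACK copy is on a working AMP."""
    return primary_amp(key) not in down_amps or fallback_amp(key) not in down_amps


keys = range(1000)
one_down_per_cluster = all(is_available(k, {0, 4}) for k in keys)
two_down_same_cluster = all(is_available(k, {0, 1}) for k in keys)
print(one_down_per_cluster, two_down_same_cluster)
```

Losing AMP 0 and AMP 4 leaves every row reachable, because each row's backup sits on another AMP of its own cluster. Losing AMP 0 and AMP 1 together does not, since some rows have both copies on exactly those two AMPs.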

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
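The trade-off can be put into rough numbers. The two Python functions below are back-of-the-envelope assumptions rather than Teradata formulas: the first assumes a failed AMP's work is shared equally by the survivors in its cluster, and the second assumes a second failure is equally likely to hit any of the remaining AMPs in the system.

```python
def extra_load_per_survivor(cluster_size):
    """Fraction of extra work each surviving AMP takes on when one AMP in its cluster fails."""
    return 1 / (cluster_size - 1)


def second_failure_in_same_cluster(cluster_size, total_amps):
    """Chance that a second, uniformly random AMP failure lands in the already degraded cluster."""
    return (cluster_size - 1) / (total_amps - 1)


TOTAL_AMPS = 2000   # the large system mentioned earlier

for size in (2, 4, 16):
    load = extra_load_per_survivor(size)
    risk = second_failure_in_same_cluster(size, TOTAL_AMPS)
    print(f"cluster of {size:2}: extra load {load:.0%}, second-failure risk {risk:.2%}")
```

A cluster of two doubles the survivor's workload, which matches the "twice as long" remark above, while a cluster of 16 spreads the load thinly but leaves fifteen AMPs whose failure would take rows offline.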

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, with all subsequent changes to that AMP's rows being made to their FALLBACK copies.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping, starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
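The catch-up step can be sketched as a log that is replayed when the AMP wakes up. This Python model is a simplification under stated assumptions: the `Cluster` class is hypothetical, the journal is kept once per cluster rather than on each surviving AMP, and the caller names the primary and fallback AMP for each row instead of using hash maps.

```python
class Cluster:
    """Toy cluster: each AMP holds copies of rows; a DARJ logs what a down AMP misses."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}   # copies (base or FALLBACK) held per AMP
        self.down = set()
        self.darj = {}                             # down AMP -> list of missed (key, value) changes

    def fail(self, amp):
        self.down.add(amp)
        self.darj[amp] = []                        # the surviving AMPs open a DARJ

    def write(self, key, value, primary, fallback):
        for amp in (primary, fallback):
            if amp in self.down:
                self.darj[amp].append((key, value))   # remember it for later
            else:
                self.rows[amp][key] = value

    def recover(self, amp):
        for key, value in self.darj.pop(amp):      # replay in order, then discard the DARJ
            self.rows[amp][key] = value
        self.down.discard(amp)


cluster = Cluster([1, 2, 3, 4])
cluster.write("Ben Hon", "version 1", primary=1, fallback=2)
cluster.fail(1)
cluster.write("Ben Hon", "version 2", primary=1, fallback=2)   # AMP 1 sleeps through this
stale_copy = cluster.rows[1]["Ben Hon"]
cluster.recover(1)
print(stale_copy, "->", cluster.rows[1]["Ben Hon"])
```

While AMP 1 is down, queries are answered from the FALLBACK copy on AMP 2, which is already current; the replay only brings AMP 1 back in step.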

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space: half of the raw capacity holds the mirror copies.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This is called a Shared Nothing environment. We can still keep the shared nothing environment and have four physical disks: only the AMP that owns the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
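The rule about which two disks may be lost is easy to enumerate. The Python sketch below assumes a rank of four disks made up of two data disks, each with its own mirror; the disk names are invented for the illustration.

```python
from itertools import combinations

# Assumed layout: a rank of four disks = two data disks, each with its own mirror.
MIRROR_PAIRS = [("data1", "mirror1"), ("data2", "mirror2")]
ALL_DISKS = [disk for pair in MIRROR_PAIRS for disk in pair]


def rank_survives(failed):
    """Data is lost only when a data disk and its own mirror are both down."""
    return not any(data in failed and mirror in failed for data, mirror in MIRROR_PAIRS)


two_disk_failures = [set(pair) for pair in combinations(ALL_DISKS, 2)]
survivable = [failed for failed in two_disk_failures if rank_survives(failed)]

print(len(survivable), "of", len(two_disk_failures), "two-disk failures are survivable")
```

Any single failure is survivable, and so is any double failure that spares at least one disk of every mirrored pair.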

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. In a two-node clique, the receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk and, similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in the surviving node's memory. Each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.
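The migration can be sketched as two lookup tables, only one of which ever changes. In this Python sketch the dictionaries and the `fail_node` helper are hypothetical; the AMP numbering follows the two-node, 32-AMP example above.

```python
# Two-node clique from the example: AMPs 1-16 start on node 1, AMPs 17-32 on node 2.
amp_node = {amp: (1 if amp <= 16 else 2) for amp in range(1, 33)}
amp_vdisk = {amp: f"vdisk{amp}" for amp in range(1, 33)}   # disk ownership never changes


def fail_node(failed, clique_nodes):
    """Move the failed node's AMPs to the surviving nodes of the same clique."""
    survivors = [node for node in clique_nodes if node != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    migrating = sorted(amp for amp, node in amp_node.items() if node == failed)
    for position, amp in enumerate(migrating):
        amp_node[amp] = survivors[position % len(survivors)]   # spread the AMPs evenly
    return migrating


disks_before = dict(amp_vdisk)
moved = fail_node(1, clique_nodes=[1, 2])

amps_on_node_2 = sum(1 for node in amp_node.values() if node == 2)
print(len(moved), "AMPs migrated; node 2 now runs", amps_on_node_2, "AMPs")
```

Only the node assignment changes. Each migrated AMP still maps to the same virtual disk, which it now reaches over the added cables.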

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. This would be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
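Both directions of travel can be sketched with two short Python functions. This is a toy model under stated assumptions: a table is a dictionary, a journal is a list of (key, image) pairs, and the representative names and regions are invented to echo the two scenarios above.

```python
def roll_forward(backup, after_journal):
    """Rebuild a table from a backup by reapplying AFTER images, oldest first."""
    table = dict(backup)
    for key, after_image in after_journal:
        table[key] = after_image
    return table


def roll_back(current, before_journal):
    """Undo changes by restoring BEFORE images, newest first."""
    table = dict(current)
    for key, before_image in reversed(before_journal):
        if before_image is None:          # the row did not exist before the change
            table.pop(key, None)
        else:
            table[key] = before_image
    return table


# AFTER journal: backup from the 1st, changes captured up to the failure on the 5th.
backup = {"Ann": "Western", "Bob": "Eastern"}
after_journal = [("Cal", "Northern"), ("Ann", "Southwest")]
restored = roll_forward(backup, after_journal)

# BEFORE journal: a successful but wrong update moved everyone to the Southeast.
damaged = {"Ann": "Southeast", "Bob": "Southeast"}
before_journal = [("Ann", "Western"), ("Bob", "Eastern")]
repaired = roll_back(damaged, before_journal)

print(restored)
print(repaired)
```

Rolling forward starts from an old copy and moves toward the present; rolling back starts from the present and moves toward the past.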

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
       BEFORE JOURNAL,
       DUAL AFTER JOURNAL
(emp        INTEGER
,dept       INTEGER
,lname      CHAR(20)
,fname      VARCHAR(20)
,salary     DECIMAL(10,2)
,hire_date  DATE FORMAT)
UNIQUE PRIMARY INDEX(emp);

The example above creates the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE, or READ lock can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
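The four rules above boil down to a small "who waits behind whom" table. Here is a minimal Python sketch of that table. The names `WAITS_BEHIND` and `must_wait` are invented for this illustration, and the last row (an EXCLUSIVE request waiting behind a held ACCESS lock) is an assumption based on the statement that the EXCLUSIVE LOCK prevents any access.

```python
# For each lock that is already held, the set of requested locks that must wait.
WAITS_BEHIND = {
    "EXCLUSIVE": {"EXCLUSIVE", "WRITE", "READ", "ACCESS"},  # nothing gets past it
    "WRITE": {"EXCLUSIVE", "WRITE", "READ"},                # only ACCESS may pass
    "READ": {"EXCLUSIVE", "WRITE"},                         # other readers may pass
    "ACCESS": {"EXCLUSIVE"},                                # assumption, see lead-in
}

def must_wait(held, requested):
    """Return True if a request has to queue behind a lock that is already held."""
    return requested in WAITS_BEHIND[held]

print(must_wait("WRITE", "READ"))     # a SELECT queues behind an UPDATE
print(must_wait("WRITE", "ACCESS"))   # a dirty read goes straight through
print(must_wait("READ", "READ"))      # a thousand readers share the table
```

Keeping the rules in one dictionary makes it easy to see at a glance that ACCESS is the least restrictive lock and EXCLUSIVE the most.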

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
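The employee and payroll story can be acted out with two Python sets. This is only a sketch of the rule itself, not of how Teradata enforces it: the tables, the employee numbers, and the function names `insert_payroll` and `delete_employee` are all hypothetical.

```python
class ReferentialIntegrityError(Exception):
    """Raised when an insert or delete would leave an orphaned row."""

employee = {101, 102}   # parent table: employee numbers (made-up data)
payroll = set()         # child table: employee numbers that point at employee

def insert_payroll(emp):
    # A payroll row may only be inserted if its employee exists.
    if emp not in employee:
        raise ReferentialIntegrityError(f"employee {emp} does not exist")
    payroll.add(emp)

def delete_employee(emp):
    # An employee may only be deleted once no payroll row refers to it.
    if emp in payroll:
        raise ReferentialIntegrityError(f"employee {emp} still has a payroll row")
    employee.discard(emp)

insert_payroll(101)
try:
    delete_employee(101)            # fired, but still on the payroll
except ReferentialIntegrityError as err:
    print("rejected:", err)

payroll.discard(101)                # remove the payroll row first...
delete_employee(101)                # ...and now the delete goes through
print(sorted(employee))
```

No one gets a Bahamas vacation out of this version: the delete is refused until the payroll row is gone.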


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously, though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced bodybuilder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sandbags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
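The three steps just described (hand out blocks, hash and redistribute, sort) can be imitated in Python. Treat this strictly as a conceptual sketch: `toy_hash` is a made-up stand-in for the real hashing formula, the "blocks" hold two rows instead of 64K of data, and AMPs are numbered from 0 for convenience.

```python
def toy_hash(pi_value):
    """Made-up stand-in for the real (secret) hashing formula."""
    return (pi_value * 31) % 65536

def toy_fastload(rows, num_amps, block_size=2):
    # Step 1: pass blocks of rows to the AMPs in turn, without looking inside.
    received = [[] for _ in range(num_amps)]
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    for n, block in enumerate(blocks):
        received[n % num_amps].extend(block)

    # Step 2: each AMP hashes the rows it received and sends each to its owner.
    owned = [[] for _ in range(num_amps)]
    for amp_rows in received:
        for row in amp_rows:
            owned[toy_hash(row[0]) % num_amps].append(row)

    # Step 3: each AMP sorts its own rows by their hash (the toy "Row ID").
    for amp_rows in owned:
        amp_rows.sort(key=lambda row: toy_hash(row[0]))
    return owned

rows = [(n, f"name{n}") for n in range(1, 9)]   # hypothetical flat-file records
for amp, amp_rows in enumerate(toy_fastload(rows, num_amps=4)):
    print(amp, [row[0] for row in amp_rows])
```

Notice that where a row lands in step 1 has nothing to do with where it ends up. The first pass only gets the data onto the system quickly; the hash decides the final home.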

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind that many systems have hundreds to thousands of AMPs. The load takes place continually, in parallel, as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to load immediately into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables
• It locks at the row level

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


after it. However, once the table is created, the order of the columns (e.g., the row format for the table) must remain the same. Plus, you cannot have multiple row formats within a table.

What forms the relationship between the tables in a relational database? A key that is common to each table forms it. A Foreign Key (FK) is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another. When you need to display data from more than one table, you can JOIN the two tables by matching a common key between them. A great choice is to match the primary key of one table to the foreign key of the other table. Remember that a table may have only one PK, but it may have multiple FKs.

Here is a quick reference chart for Primary and Foreign Keys:

PRIMARY KEY                          FOREIGN KEY
Not optional                         Optional
Comprised of one or more columns     Comprised of one or more columns
Can only have one PK per table       Can have multiple FKs per table
No duplicates allowed                Duplicates allowed
No changes allowed                   Changes allowed
No nulls allowed                     Nulls allowed

Teradata Spreads the Data Evenly Across the AMPs

"A chain is only as strong as its weakest link."

Because Teradata spreads data evenly, no AMP or disk is ever the weakest link. Teradata is the only database that strings hundreds and thousands of processors together to achieve awesome processing power for today's data warehouses. Today, the AMPs (Access Module Processors) are software processors that reside in memory. Teradata always attempts to spread data evenly so each AMP will manage approximately the same amount of data. As a result, the rows of every table are distributed across all of the AMPs. In other words, every AMP stores a portion of every table in the database on its virtual disk (VDISK). If a data warehouse has 200 tables, then each AMP will hold a portion of 200 tables. This method of data distribution is unique to Teradata.

There are some significant benefits to handling data this way.

First, when each AMP has nearly the same quantity of table rows, then no one AMP becomes a data bottleneck. AMPs can all retrieve their portion of the data in parallel, so you do not have AMPs sitting idle while one or two others are chugging away. Baseball phenomenon Casey Stengel once said, "It's easy to get good players. Gettin' 'em to play together, that's the hard part." AMPs love to work together in parallel.

Second, each AMP is unaware of any data except its own portion. The only AMP that can read or write to a particular row of data is the AMP that actually owns that row. This makes retrieving data from a particular row very efficient, as all AMPs do their own work.

Third, each AMP automatically groups all of its rows by the tables from which they come. Have you ever been to a large aquarium and seen one of the displays that look like a very tall, clear cylinder? As you walk around the glass, the fish tend to swim in schools. Similarly, Teradata does this with the rows on the AMPs to boost performance. When you ask for data from any given table, an AMP will immediately go to that particular group of rows and then select what you need. It doesn't need to look through the rows of many tables before it finds what you need. This is how parallel processing works. The AMPs retrieve data in parallel, then pass it over the BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind, the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice that each AMP holds a portion of the rows for every table. AMP1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also, notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the rows of the Customer and Order tables are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.

Primary Indexes

"Every road has two directions."

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him, saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and
2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.


There are two types of Primary Indexes

"A man who chases two rabbits catches none."

Roman Proverb

"A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR."

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

CREATE TABLE employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice that you never see the prefix NON:

CREATE TABLE TomC.employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE)
UNIQUE PRIMARY INDEX(emp, dept);

"Being related hardly insures relatability."

Michael E Angier


All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical concept, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

"The map is not the territory."

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once, a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture below shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that the AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8, and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
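Both simulated maps follow the same pattern, and a short Python sketch can generate them. The function name `build_hash_map` is invented here, and filling the buckets in strict 1, 2, 3, ... rotation is the simplification used in the diagrams, not a claim about how a real system assigns its buckets.

```python
def build_hash_map(num_amps, num_buckets=65536):
    """Return a list in which entry i holds the AMP number for bucket i + 1."""
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]

# Print the first two rows of the simulated four-AMP map, six buckets per row.
four_amp_map = build_hash_map(4, num_buckets=48)
for start in range(0, 12, 6):
    print(four_amp_map[start:start + 6])

# The same function covers the eight-AMP diagram.
eight_amp_map = build_hash_map(8, num_buckets=48)
print(eight_amp_map[:8])
```

Because the list holds nothing but AMP numbers, it stays tiny no matter how many data rows the system stores.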

How the Hash Map and Primary Index Work Together

"Choice, not chance, determines destiny."

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the Wiz-Bang Formula (a secret NCR formula). First we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even-tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.


Let's take our first row and determine on which AMP it will reside. Remember, we will take the Primary Index value of the row and apply the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num   Friend_Name
2            Ben Hon

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num   Friend_Name
16           Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (16), and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.


If we continue the process until all data is laid out, the system would look like this:


Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. The purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it will be consistent. When we divided Friend_Num 2 by 2, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.
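For readers who like to check the arithmetic, here is the Coffing/Jones teaching formula as a minimal Python sketch. It places all eight Best_Friends rows on the simulated four-AMP system. The function names `bucket_for` and `amp_for` are made up for the illustration, and the divide-by-2 rule is the book's teaching device, not the real hashing formula.

```python
BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]
NUM_AMPS = 4
# Simulated 48-bucket hash map: 1, 2, 3, 4, 1, 2, ... (bucket 1 is HASH_MAP[0]).
HASH_MAP = [(i % NUM_AMPS) + 1 for i in range(48)]

def bucket_for(friend_num):
    """Coffing/Jones teaching formula: divide the Primary Index value by 2."""
    return friend_num // 2

def amp_for(friend_num):
    """Look inside the bucket to find the destination AMP number."""
    return HASH_MAP[bucket_for(friend_num) - 1]

amps = {amp: [] for amp in range(1, NUM_AMPS + 1)}
for friend_num, friend_name in BEST_FRIENDS:
    amps[amp_for(friend_num)].append((friend_num, friend_name))

for amp, rows in amps.items():
    print(f"AMP {amp}: {rows}")
```

Each AMP ends up with exactly two of the eight rows, and running the script again always gives the same layout, which is the consistency the text describes.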

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;


The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
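The same one-AMP lookup can be sketched in Python. Everything here is a toy: `AMP_ROWS` is a made-up picture of which rows sit on which AMP under the divide-by-2 teaching formula, and `select_by_primary_index` is a hypothetical name for what the PE's plan accomplishes.

```python
NUM_AMPS = 4
HASH_MAP = [(i % NUM_AMPS) + 1 for i in range(48)]   # bucket 1 is HASH_MAP[0]

# Rows as they would sit on each AMP under the divide-by-2 teaching formula.
AMP_ROWS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}

def select_by_primary_index(friend_num):
    """Hash the PI value, read the bucket, and ask only that one AMP."""
    bucket = friend_num // 2          # Coffing/Jones teaching formula
    amp = HASH_MAP[bucket - 1]        # the bucket holds an AMP number
    return amp, AMP_ROWS[amp].get(friend_num)

print(select_by_primary_index(8))     # only AMP 4 is asked for the row
```

The other three AMPs are never touched, which is why a Primary Index retrieval stays fast no matter how large the system grows.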

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that it is not the size of the opponent that is to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is roughly 10 times faster than on a system that reads all 100 rows one after another. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!
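The 100-row, 10-AMP example translates directly into a Python sketch. A thread pool stands in for the AMPs here purely to show the shape of the work; Python threads will not give the real speedup that separate processors do, and the names `amp_rows` and `scan` are invented for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 10

# Spread 100 row ids evenly, so each toy AMP owns 10 rows.
amp_rows = [[] for _ in range(NUM_AMPS)]
for row_id in range(100):
    amp_rows[row_id % NUM_AMPS].append(row_id)

def scan(rows):
    """An AMP reads only the rows it owns."""
    return list(rows)

# Every AMP scans at the same time; the PE role is played by the main thread.
with ThreadPoolExecutor(max_workers=NUM_AMPS) as pool:
    per_amp = list(pool.map(scan, amp_rows))

# The PE gathers each AMP's answer set into one result for the user.
result = [row for rows in per_amp for row in rows]
print(len(per_amp[0]), len(result))
```

Each worker touches a tenth of the table, so the longest any one of them works is a tenth of the full job.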

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to ask literally any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows, and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns
• Each AMP passes its rows to the PE over the BYNET

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num   Friend_Name
6            Mary Gray
14           Kyle Marx
8            John Davis
16           Lyn Jones
2            Ben Hon
10           Don Roy
4            Joe Davis
12           Sam Mills


In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option: a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES, that is, queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It tells the AMP to place the hash in a secondary index sub-table, along with the Row-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a unique secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, whose Friend_Num is 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query asks for "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds Ben Hon in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once that row is found, the real row-id (the smiley face in this example) is revealed, and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.
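To make the two-AMP USI lookup concrete, here is a minimal Python sketch. Several things in it are assumptions for illustration only: `zlib.crc32` stands in for Teradata's hash formula and hash map, the system has four AMPs, and the sub-table stores the Primary Index value as its pointer, where a real sub-table row holds the base Row-ID.

```python
import zlib

NUM_AMPS = 4  # assumed toy system size


def amp_for(value):
    """Stand-in for the hash formula and hash map: choose an AMP number."""
    return zlib.crc32(str(value).encode()) % NUM_AMPS


friends = [(6, "Mary Gray"), (14, "Kyle Marx"), (8, "John Davis"), (16, "Lyn Jones"),
           (2, "Ben Hon"), (10, "Don Roy"), (4, "Joe Davis"), (12, "Sam Mills")]

base = {n: {} for n in range(NUM_AMPS)}       # AMP -> {Friend_Num: Friend_Name}
usi_sub = {n: {} for n in range(NUM_AMPS)}    # AMP -> {Friend_Name: pointer}

for num, name in friends:
    base[amp_for(num)][num] = name            # base row hashed on the Primary Index
    usi_sub[amp_for(name)][name] = num        # sub-table row hashed on the USI value


def usi_lookup(name):
    """Return the base row and the list of AMPs touched (at most two)."""
    index_amp = amp_for(name)                 # step 1: hash the USI value
    touched = [index_amp]
    pointer = usi_sub[index_amp].get(name)
    if pointer is None:
        return None, touched
    base_amp = amp_for(pointer)               # step 2: follow the pointer
    touched.append(base_amp)
    return (pointer, base[base_amp][pointer]), touched


row, touched = usi_lookup("Ben Hon")
print(row, "AMPs touched:", touched)
```

The lookup never visits more than two AMPs, however many AMPs the system has. A NUSI version would instead ask every AMP to check its own sub-table.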

Create secondary indexes only on columns used repeatedly in the WHERE clause of ongoing queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users never query the join index directly. They run their normal joins, and the PE checks to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.


Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The name is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a terabyte. There is no user with greater privileges than DBC.

DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security access rights, table columns, indexes, macros, views, etc.

So, if your system comes with 100 gigabytes of permanent disk space, then DBC owns 100 gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to SYSDBA to distribute space.

Logical Picture of System Space


SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems use.

As you can see in the following picture, DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to, or REVOKE rights from, Tom.
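The "who owns Tom?" question is simply a walk up the hierarchy. The short Python sketch below records the parent of each object named in this chapter (the dictionary layout and the function name `owners` are illustrative choices, not anything Teradata exposes) and collects every ancestor.

```python
# Immediate parent of each user or database described in this chapter.
parent = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """Return every parent (owner) above `name`, nearest first."""
    chain = []
    while name in parent:
        name = parent[name]
        chain.append(name)
    return chain


print(owners("Tom"))
```

Each name in the returned list sits above Tom, so each one is an owner in the sense used above. DBC has no parent, so its list is empty.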


Three Types of Teradata Space

There are three types of space in Teradata:

• Perm Space
• Spool Space
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the user's spool space upper limit, the query aborts, and the user is said to be out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. SYSDBA, likewise, gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, views, and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
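The different bookkeeping for perm and spool can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class name `SpaceOwner`, its methods, and the numbers are invented for illustration, and the rule that a child's spool limit may not exceed its parent's is taken from the speed-limit analogy above.

```python
class SpaceOwner:
    """A user or database with a PERM allotment and a SPOOL limit."""

    def __init__(self, name, perm, spool):
        self.name, self.perm, self.spool = name, perm, spool

    def create_child(self, name, perm, spool):
        if perm > self.perm:
            raise ValueError("not enough PERM space to give away")
        if spool > self.spool:
            raise ValueError("child spool limit cannot exceed the parent's")
        self.perm -= perm          # PERM is handed over, so the parent loses it
        # SPOOL is only a limit, so the parent's own value is unchanged
        return SpaceOwner(name, perm, spool)

    def run_query(self, spool_needed):
        if spool_needed > self.spool:
            raise RuntimeError("query aborted: out of spool space")
        return "answer delivered, spool released"


sysdba = SpaceOwner("SYSDBA", perm=80, spool=65)
morgan = sysdba.create_child("Morgan", perm=30, spool=65)
tom = morgan.create_child("Tom", perm=10, spool=65)
print(sysdba.perm, morgan.perm, tom.perm, sysdba.spool, tom.spool)
```

After the two hand-offs, SYSDBA and Morgan each own less perm space, while all three still have the same spool limit.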

The following picture shows a logical view of a Customer table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area, there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.

What is a View

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored on the disks a second time, so the view does not duplicate data or take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
---  ----  --------  ------  ------
  1    10  Johnson   Manny   100000
  2    20  Carlsbad  Jan     100000
 22    30  Winter    Steve    77000
 25    10  Lester    Bonnie   56000
 33    10  Samuels   Todd    120000
 99    20  Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, security is enhanced. With this restriction, no user can actually see the list of employee salaries.

Perm space is required to create a table, but it is not needed to create a view. The definition of a view is stored in the Data Dictionary and is monitored by DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
---  ----  --------  ------
  1    10  Johnson   Manny
  2    20  Carlsbad  Jan
 22    30  Winter    Steve
 25    10  Lester    Bonnie
 33    10  Samuels   Todd
 99    20  Walter    Misha
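One way to see why a view takes no extra space is to model it as a stored definition that is re-evaluated on each select. The Python sketch below is only an analogy: the list of dictionaries stands in for the Employee table, and the function `employ_v` stands in for the view definition kept in the Data Dictionary.

```python
# The base table: this is the only place the data lives.
employee = [
    {"Emp": 1,  "Dept": 10, "Lname": "Johnson",  "Fname": "Manny",  "Sal": 100000},
    {"Emp": 2,  "Dept": 20, "Lname": "Carlsbad", "Fname": "Jan",    "Sal": 100000},
    {"Emp": 22, "Dept": 30, "Lname": "Winter",   "Fname": "Steve",  "Sal": 77000},
    {"Emp": 25, "Dept": 10, "Lname": "Lester",   "Fname": "Bonnie", "Sal": 56000},
    {"Emp": 33, "Dept": 10, "Lname": "Samuels",  "Fname": "Todd",   "Sal": 120000},
    {"Emp": 99, "Dept": 20, "Lname": "Walter",   "Fname": "Misha",  "Sal": 104000},
]

VIEW_COLUMNS = ("Emp", "Dept", "Lname", "Fname")  # Sal is deliberately left out


def employ_v():
    """The 'view': a definition evaluated against the base table on demand."""
    return [{col: row[col] for col in VIEW_COLUMNS} for row in employee]


rows = employ_v()
print(len(rows), "rows returned")
```

Because nothing is copied ahead of time, a later change to the base table shows up the next time the view is selected, and the salary column never appears.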


What is a Macro

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and executed with a simple command. If there are multiple statements, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10?
• What employees are in department 20?
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
  SELECT * FROM Employ_v WHERE dept = 10;
  SELECT * FROM Employ_v WHERE dept = 20;
  SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                         Macros
--------------------------------------------  --------------------------------------------
• We select from views                        • We execute macros
• Uses the keyword AS                         • Uses the keyword AS
• Definition is stored in the Data Dictionary • Definition is stored in the Data Dictionary
• Accesses certain portions of the data       • Accesses the real data itself
• Is changed using the keyword REPLACE        • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan


I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some!

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform the given operation on the specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the U.S. Constitution.

In the picture above, DBC has Implicit rights on all databases and users. SYSDBA also has Implicit rights on everyone listed below it. MRKT has Implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), and Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or disk goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (as in a macro). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message.
• The entire transaction is rolled back, and any changes made to the database are reversed.
• Locks are released.
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data then reverts to its original state. When a transaction is successful, the PE and the AMPs shake hands on it, and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal cleanliness is next to Godliness. If it is clean, then things went well.
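The before-image idea can be sketched in Python as follows. This is a simplified model, not Teradata's implementation: the class `Amp`, its method names, and the in-memory dictionary are assumptions for illustration. It replays the accidental 100% raise mentioned earlier, rolls it back, and then commits the intended 10% raise.

```python
_MISSING = object()  # marks "this row did not exist before the change"


class Amp:
    def __init__(self, rows):
        self.rows = dict(rows)
        self.transient_journal = []            # list of (key, before-image)

    def write(self, key, value):
        # Take the BEFORE picture, then change the row.
        self.transient_journal.append((key, self.rows.get(key, _MISSING)))
        self.rows[key] = value

    def commit(self):
        self.transient_journal.clear()         # success: journal wiped clean

    def rollback(self):
        while self.transient_journal:          # undo newest change first
            key, before = self.transient_journal.pop()
            if before is _MISSING:
                del self.rows[key]
            else:
                self.rows[key] = before


amp = Amp({1: 100000, 25: 56000})              # Emp -> Sal

for emp in list(amp.rows):                     # oops: a 100% raise
    amp.write(emp, amp.rows[emp] * 2)
amp.rollback()                                 # transaction aborted
after_rollback = dict(amp.rows)

for emp in list(amp.rows):                     # the intended 10% raise
    amp.write(emp, amp.rows[emp] * 110 // 100)
amp.commit()
print(after_rollback, amp.rows)
```

After the rollback the salaries match their original values, and after the commit the journal is empty, mirroring the "wiped clean" handshake described above.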

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. The chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. The output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs in the same cluster would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will then take twice as long to process.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is the increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
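The "one AMP per cluster" rule can be checked with a small Python model of the eight-AMP, two-cluster system. Everything about the placement arithmetic is an assumption for illustration: simple modulo functions stand in for the Primary and Fallback Hash Maps, and the only properties carried over from the text are that a FALLBACK row lives on a different AMP from its base row and stays inside the same cluster.

```python
NUM_AMPS = 8
CLUSTER_SIZE = 4  # AMPs 0-3 form cluster one, AMPs 4-7 form cluster two


def primary_amp(key):
    """Stand-in for the Primary Hash Map."""
    return key % NUM_AMPS


def fallback_amp(key):
    """Stand-in for the Fallback Hash Map: another AMP in the same cluster."""
    primary = primary_amp(key)
    cluster_start = (primary // CLUSTER_SIZE) * CLUSTER_SIZE
    offset = 1 + (key // NUM_AMPS) % (CLUSTER_SIZE - 1)   # 1, 2, or 3; never 0
    return cluster_start + (primary - cluster_start + offset) % CLUSTER_SIZE


def table_available(keys, down_amps):
    """A row is reachable if its base copy or its FALLBACK copy is on a live AMP."""
    return all(primary_amp(k) not in down_amps or fallback_amp(k) not in down_amps
               for k in keys)


keys = range(100)
print(table_available(keys, {0, 4}))   # one AMP down in each cluster
print(table_available(keys, {0, 1}))   # two AMPs down in the same cluster
```

With one AMP down in each cluster, every row still has a live copy. With two AMPs down in the same cluster, any row whose two copies sit on exactly those AMPs is unreachable, which is the risk that grows with larger clusters.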

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL: Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are tracked until it returns (see the Down AMP Recovery Journal below).

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping, starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.


In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
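The journal's life cycle (open on failure, record changes, replay on recovery, discard) can be sketched as follows. The Cluster class and its method names are hypothetical; this is a toy model under stated assumptions, not Teradata's implementation.

```python
class Cluster:
    """Toy model of one cluster and its Down AMP Recovery Journal."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}   # amp -> {row_id: value}
        self.down = set()                          # AMPs currently "sleeping"
        self.darj = []                             # changes owed to down AMPs

    def write(self, amp, row_id, value):
        if amp in self.down:
            # The AMP is down: remember the change for later.
            self.darj.append((amp, row_id, value))
        else:
            self.rows[amp][row_id] = value

    def fail(self, amp):
        self.down.add(amp)

    def recover(self, amp):
        """Bring the AMP back, replay its journal entries, then discard them."""
        self.down.discard(amp)
        pending = [entry for entry in self.darj if entry[0] == amp]
        for _, row_id, value in pending:
            self.rows[amp][row_id] = value
        self.darj = [entry for entry in self.darj if entry[0] != amp]
        return len(pending)

cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "r1", "old")
cluster.fail(1)
cluster.write(1, "r1", "new")      # journaled, not applied
cluster.write(1, "r2", "added")    # journaled, not applied
replayed = cluster.recover(1)      # AMP 1 is caught up; the journal is emptied
```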

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability, because when a disk fails no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. We can still keep the shared-nothing environment and have four physical disks, because only the AMP that actually owns the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
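The rule about which disk losses a rank can survive is easy to express in code. In the sketch below, the rank layout (two data disks, each with one mirror) and every name are assumptions chosen for illustration, not a description of a real disk array controller.

```python
# Hypothetical rank: two data disks, each paired with its own mirror.
MIRROR_PAIRS = [("data0", "mirror0"), ("data1", "mirror1")]

def rank_survives(failed_disks, pairs=MIRROR_PAIRS):
    """A rank keeps all its data if every pair still has one good disk."""
    failed = set(failed_disks)
    return all(not (data in failed and mirror in failed)
               for data, mirror in pairs)

def read_source(pair, failed_disks):
    """Pick a disk to read from: the primary if healthy, else its mirror."""
    data, mirror = pair
    if data not in failed_disks:
        return data
    if mirror not in failed_disks:
        return mirror
    raise IOError("both copies lost")
```

For example, rank_survives(["data0", "mirror1"]) is True because each pair still has one good copy, while rank_survives(["data0", "mirror0"]) is False because a data disk and its own mirror are both gone.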

Cliques

In high school, you can walk into the cafeteria and immediately identify the "cliques" (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in the surviving node's memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds." Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.
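The migration itself can be sketched as moving a list of vprocs from the failed node to the survivors. The function name, the node names, and the round-robin assignment are assumptions for illustration; the sketch only models where the vprocs run, since disk ownership does not change.

```python
def migrate_vprocs(nodes, failed_node):
    """Move every vproc on the failed node to the surviving clique nodes.

    `nodes` maps a node name to the list of vprocs (AMPs/PEs) running on it.
    Vprocs are dealt out round-robin to the survivors. Disk ownership is
    untouched because every node in the clique is cabled to every disk.
    """
    survivors = [name for name in nodes if name != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    result = {name: list(nodes[name]) for name in survivors}
    for i, vproc in enumerate(nodes[failed_node]):
        result[survivors[i % len(survivors)]].append(vproc)
    return result

# The two-node, 32-AMP configuration from the text.
clique = {
    "node1": [f"AMP{n}" for n in range(1, 17)],
    "node2": [f"AMP{n}" for n in range(17, 33)],
}
after_failure = migrate_vprocs(clique, "node1")
```

After node1 fails, node2 runs all 32 AMPs, which is why the text warns that the receiving node slows down.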

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost-effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the changes that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
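Both scenarios can be sketched with one small journal that records a BEFORE and an AFTER image for each change. This is a conceptual model only: the function names, the dictionary-based table, and the use of None to mean "no image" are assumptions, not Teradata's journal format.

```python
def apply_change(table, journal, key, new_value):
    """Change one row and record its BEFORE and AFTER images."""
    journal.append({"key": key, "before": table.get(key), "after": new_value})
    if new_value is None:
        table.pop(key, None)          # a DELETE leaves no AFTER image
    else:
        table[key] = new_value

def roll_back(table, journal, to_position):
    """Undo entries newest-first until the journal is `to_position` long."""
    while len(journal) > to_position:
        entry = journal.pop()
        if entry["before"] is None:
            table.pop(entry["key"], None)
        else:
            table[entry["key"]] = entry["before"]

def roll_forward(backup, journal):
    """Rebuild a table from a backup copy plus the AFTER images."""
    table = dict(backup)
    for entry in journal:
        if entry["after"] is None:
            table.pop(entry["key"], None)
        else:
            table[entry["key"]] = entry["after"]
    return table

reps = {"Ann": "Western", "Bob": "Eastern"}
backup = dict(reps)                   # the "full backup" taken before changes
journal = []
apply_change(reps, journal, "Ann", "Southeast")
apply_change(reps, journal, "Bob", "Southeast")   # the accidental update

restored = roll_forward(backup, journal)  # AFTER images: rebuild lost data
roll_back(reps, journal, 0)               # BEFORE images: undo the mistake
```

Rolling forward from the backup reproduces the table as it stood after the changes, while rolling back returns the live table to its state before them.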

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

   CREATE TABLE TomC.employee, FALLBACK,
      BEFORE JOURNAL,
      DUAL AFTER JOURNAL
      (emp        INTEGER,
       dept       INTEGER,
       lname      CHAR(20),
       fname      VARCHAR(20),
       salary     DECIMAL(10,2),
       hire_date  DATE FORMAT)
   UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables, and it restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
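The four descriptions above amount to a compatibility table: given a lock that is already held, which requested locks can be granted without waiting? The sketch below encodes that table in Python. The entries for a held ACCESS lock are inferred by symmetry from the text rather than stated in it, and the names are illustrative.

```python
LOCKS = ("ACCESS", "READ", "WRITE", "EXCLUSIVE")

# Maps a lock that is already HELD to the set of REQUESTED locks that can
# be granted alongside it, following the four descriptions in the text.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},   # inferred by symmetry
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}

def can_grant(held, requested):
    """True if `requested` may proceed while `held` is in place."""
    return requested in COMPATIBLE[held]
```

A request that cannot be granted waits in the queue until the held lock is released. For example, can_grant("READ", "READ") is True, which is why a thousand users can SELECT at once, while can_grant("READ", "WRITE") is False.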

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood." "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the table that refers to it.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is added to an existing table that already contains RI violations, the ALTER TABLE will still proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
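The two RI rules, no child row without a matching parent and no deleting a parent that is still referenced, can be sketched as follows. The employee and payroll tables follow the story above; the Python names and the in-memory tables are assumptions for illustration and say nothing about how Teradata enforces RI internally.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would break the employee/payroll relationship."""

employee = {1001, 1002}          # parent key values (employee numbers)
payroll = [(1001, 52000)]        # child rows: (emp, salary)

def insert_payroll(emp, salary):
    """Reject a child row whose emp value has no match in the parent table."""
    if emp not in employee:
        raise ReferentialIntegrityError(f"emp {emp} not in employee table")
    payroll.append((emp, salary))

def delete_employee(emp):
    """Reject deleting a parent row that child rows still reference."""
    if any(child_emp == emp for child_emp, _ in payroll):
        raise ReferentialIntegrityError(f"emp {emp} still on payroll")
    employee.discard(emp)
```

In this sketch, delete_employee(1001) raises an error until the payroll row for 1001 is removed, which is exactly the oversight the story describes.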


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
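The two phases described above (deal out blocks, then hash and redistribute) can be sketched as a small simulation. Everything here is an assumption made for illustration: the block size is counted in rows rather than 64K bytes, blocks are dealt round-robin, and CRC-32 stands in for Teradata's real hashing algorithm, which the book does not disclose.

```python
from zlib import crc32

def fastload(rows, amp_count, block_size=3):
    """Toy two-phase load; returns {amp_number: sorted list of (hash, row)}.

    Phase 1 deals blocks of rows to the AMPs round-robin. Phase 2 has each
    AMP hash its rows and forward them to the owning AMP, which then sorts
    by the hash value (a stand-in for the Row ID).
    """
    # Phase 1: hand out fixed-size blocks without looking at the contents.
    received = {amp: [] for amp in range(1, amp_count + 1)}
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    for i, block in enumerate(blocks):
        received[i % amp_count + 1].extend(block)

    # Phase 2: send each row to the AMP its primary index value hashes to.
    owned = {amp: [] for amp in received}
    for held_rows in received.values():
        for row in held_rows:
            row_hash = crc32(str(row[0]).encode())   # row[0]: primary index
            owned[row_hash % amp_count + 1].append((row_hash, row))
    return {amp: sorted(items) for amp, items in owned.items()}

friends = [(2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis")]
loaded = fastload(friends, amp_count=4)
```

The point of the design is that phase 1 needs no knowledge of the data, so the file can be read as fast as possible, while phase 2 does the hashing work on all AMPs in parallel.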

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


BYNET to the Parsing Engine (PE), and the PE ensures the data is delivered to the user. Keep in mind, the BYNET is an internal Teradata network over which the PEs and the AMPs communicate.

The example below shows the information we have just discussed. Notice that the system has four AMPs and three tables: Employee, Customer, and Order. Notice each AMP holds a portion of the rows for every table. AMP 1, for example, holds 1/4th of the Employee table rows, 1/4th of the Customer table rows, and 1/4th of the Order table rows.

Plus, the data is spread evenly for all tables. If a query asks for all rows in the Customer table, then each AMP will retrieve its Customer table rows in parallel with the other AMPs. Each AMP will then pass its data to the PE via the BYNET. Because the data in the Customer table is spread evenly among all AMPs, each should finish reading at about the same time.

Also, notice how each AMP separates each table. Just like schools of fish, the rows of the Employee table are grouped together. In addition, the rows of the Customer table and the rows of the Order table are each grouped together. This is important in a data warehouse environment because most queries read millions of rows to satisfy a single query. Performance is enhanced when table rows are grouped together and Teradata is permitted to bring blocks of rows into memory.

Primary Indexes

"Every road has two directions."

Russian Proverb

When world-renowned explorer Dr. David Livingstone was working in Africa, a group of friends wrote to him saying, "We would like to send other men to you. Have you found a good road into your area yet?" According to a member of his family, Dr. Livingstone sent this message in response: "If you have men who will only come if there is a good road, I don't want them. I want men who will come if there is no road at all."

Although it doesn't have to cut its way through the dense African jungle, the PRIMARY INDEX (PI) is the trailblazer in Teradata that paves the way for the rest of the data to follow. The PI is so important to Teradata functionality that every table in the database is required to have one. As the quote above states, "Every road has two directions." The Primary Index is used in two directions:

1. The Primary Index WILL DETERMINE which rows go to which AMPs, and

2. The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.


There are two types of Primary Indexes

"A man who chases two rabbits catches none."

Roman Proverb

"A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR."

Tera-Tom Coffing

Each table may only have one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

   CREATE TABLE employee
      (emp        INTEGER,
       dept       INTEGER,
       lname      CHAR(20),
       fname      VARCHAR(20),
       salary     DECIMAL(10,2),
       hire_date  DATE)
   UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON:

   CREATE TABLE TomC.employee
      (emp        INTEGER,
       dept       INTEGER,
       lname      CHAR(20),
       fname      VARCHAR(20),
       salary     DECIMAL(10,2),
       hire_date  DATE)
   PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

   CREATE TABLE employee
      (emp        INTEGER,
       dept       INTEGER,
       lname      CHAR(20),
       fname      VARCHAR(20),
       salary     DECIMAL(10,2),
       hire_date  DATE)
   UNIQUE PRIMARY INDEX(emp, dept);

"Being related hardly insures relatability."

Michael E Angier


All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label the column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label the column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical term, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher, and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

"The map is not the territory."

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8, and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
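The simulated hash maps in these diagrams follow a simple pattern that can be generated in one line. The sketch below is a teaching model only: the function name is made up, buckets are numbered from 0 in the list (so the book's first bucket is index 0), and the real hash map need not be filled in this strict round-robin order.

```python
def build_hash_map(amp_count, bucket_count=65536):
    """Fill every bucket with an AMP number, cycling 1..amp_count."""
    return [bucket % amp_count + 1 for bucket in range(bucket_count)]

four_amp_map = build_hash_map(4)     # starts 1, 2, 3, 4, 1, 2, ...
eight_amp_map = build_hash_map(8)    # starts 1, 2, ..., 8, 1, 2, ...
```

Because 65,536 divides evenly by both 4 and 8, every AMP owns the same number of buckets in these two maps, which is what lets the data spread as equally as possible.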

How the Hash Map and Primary Index Work Together

"Choice, not chance, determines destiny."

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the Wiz-Bang Formula (a secret NCR formula). First we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even-tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

For this example, Teradata will attempt to spread the table rows across the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula: take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.


Let's take our first row and determine on which AMP it will reside. Remember, we take the Primary Index value of the row and apply the Coffing/Jones Wiz-Bang formula (divide by 2); the answer points to a bucket in the hash map. Inside that bucket is the number of the AMP on which the row will reside.

Friend_Num  Friend_Name
2           Ben Hon

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see which AMP number is inside bucket number 1. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another row:

Friend_Num  Friend_Name
16          Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (16), and the answer is:

16 divided by 2 = 8

Thus the hash map bucket number is now eight. Let's check our hash map to see which AMP number is inside bucket number eight. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system will look like this:

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
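To see the whole layout at once, the walk-through can be sketched in Python. This uses the book's divide-by-2 stand-in, not Teradata's real hashing formula, and the names bucket_for, amp_for and distribute are invented for the illustration.

```python
# Toy model of row distribution on a four-AMP system.
# Assumptions: buckets are numbered from 1 and the hash map cycles
# through AMPs 1, 2, 3, 4 exactly as in the diagram.

NUM_AMPS = 4

def bucket_for(primary_index_value):
    # Stand-in for the secret hashing formula: divide by 2.
    return primary_index_value // 2

def amp_for(bucket):
    # Look inside the bucket: buckets 1, 2, 3, 4, 5 hold AMPs 1, 2, 3, 4, 1.
    return (bucket - 1) % NUM_AMPS + 1

best_friends = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]

def distribute(rows):
    """Group rows by the AMP that the hash map assigns them to."""
    amps = {amp: [] for amp in range(1, NUM_AMPS + 1)}
    for friend_num, friend_name in rows:
        amps[amp_for(bucket_for(friend_num))].append((friend_num, friend_name))
    return amps

layout = distribute(best_friends)
for amp, rows in layout.items():
    print(f"AMP {amp}: {rows}")
```

Under these assumptions each AMP ends up with two rows, which is the even spread the Primary Index choice was meant to produce.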

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the real formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and that it is consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on that value a million times, we would still get the same result.

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;


The Parsing Engine understands that the user wants two columns, Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight, because it recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
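The same reasoning can be sketched in Python. The amp_storage dictionary below is a hypothetical picture of what each AMP holds after the load, and the divide-by-2 rule is again the book's stand-in rather than the real hash formula.

```python
# Toy model of a Primary Index retrieval: rerun the formula, find the
# AMP, and ask only that AMP for the row.

NUM_AMPS = 4

def amp_for_primary_index(value):
    bucket = value // 2                   # toy formula: divide by 2
    return (bucket - 1) % NUM_AMPS + 1    # AMP number stored in that bucket

# Rows as each AMP would hold them after the load (assumed layout).
amp_storage = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}

def select_by_primary_index(friend_num):
    """Return (amp_used, friend_name); only one AMP is consulted."""
    amp = amp_for_primary_index(friend_num)
    return amp, amp_storage[amp].get(friend_num)

amp, name = select_by_primary_index(8)
print(f"Friend_Num 8 is on AMP {amp}: {name}")
```

The other three AMPs are never touched, which is why a Primary Index retrieval is the fastest path to a row.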

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that it is not the size of the opponent that is to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PEP. This process is 10 times faster than most systems. But what happens with systems that have hundreds or

even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records, so it is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (the PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns
• Each AMP passes its rows to the PE over the BYNET

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
6           Mary Gray
14          Kyle Marx
8           John Davis
16          Lyn Jones
2           Ben Hon
10          Don Roy
4           Joe Davis
12          Sam Mills
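A rough Python sketch of the same idea follows. Threads stand in for AMPs and a plain list stands in for the rows arriving at the PE; the names amp_storage, scan_amp and full_table_scan are invented for the illustration, and real AMPs are of course not Python threads.

```python
# Toy model of a Full Table Scan: every AMP reads only the rows it owns,
# all AMPs work at the same time, and the PE collects the results.
from concurrent.futures import ThreadPoolExecutor

# Assumed layout of Best_Friends across four AMPs.
amp_storage = {
    1: [(2, "Ben Hon"), (10, "Don Roy")],
    2: [(4, "Joe Davis"), (12, "Sam Mills")],
    3: [(6, "Mary Gray"), (14, "Kyle Marx")],
    4: [(8, "John Davis"), (16, "Lyn Jones")],
}

def scan_amp(amp):
    """One AMP's share of the work: read its own rows and nothing else."""
    return list(amp_storage[amp])

def full_table_scan():
    """The plan: every AMP scans in parallel, then the rows are gathered."""
    with ThreadPoolExecutor(max_workers=len(amp_storage)) as pool:
        per_amp = list(pool.map(scan_amp, amp_storage))
    return [row for rows in per_amp for row in rows]

rows = full_table_scan()
print(len(rows), "rows returned")
```

The answer set comes back grouped by AMP rather than sorted by Friend_Num, which is why the result above is not in numeric order. The exact order in which AMPs respond is not something this sketch tries to reproduce.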


In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option: the Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, and since secondary indexes use up space and add overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

There are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It also tells the AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (the smiley face in this example) is revealed and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but each AMP can easily check its secondary index sub-table to see if it has one or more qualifying rows.
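The two-AMP path of a USI lookup can be sketched in Python. Everything here is an assumption made for illustration: CRC32 stands in for the secret hash of the name, the divide-by-2 rule places base rows, and a (base AMP, Friend_Num) pair plays the part of the row-id pointer. With this toy hash, the index row for a given name may land on a different AMP than the one in the book's picture.

```python
# Toy model of a Unique Secondary Index (USI) on Friend_Name.
import zlib

NUM_AMPS = 4

def amp_for_primary_index(friend_num):
    # Base rows are placed with the toy divide-by-2 formula.
    return (friend_num // 2 - 1) % NUM_AMPS + 1

def amp_for_index_value(value):
    # Stand-in hash for the secondary index value (assumption: CRC32).
    return zlib.crc32(value.encode("utf-8")) % NUM_AMPS + 1

base_table = {amp: {} for amp in range(1, NUM_AMPS + 1)}
usi_subtable = {amp: {} for amp in range(1, NUM_AMPS + 1)}

def insert_row(friend_num, friend_name):
    base_amp = amp_for_primary_index(friend_num)
    base_table[base_amp][friend_num] = friend_name
    # The index row goes to the AMP its value hashes to and carries a
    # pointer back to the base row.
    index_amp = amp_for_index_value(friend_name)
    usi_subtable[index_amp][friend_name] = (base_amp, friend_num)

def select_by_name(friend_name):
    """Return (row, amps_visited); at most two AMPs are involved."""
    index_amp = amp_for_index_value(friend_name)
    pointer = usi_subtable[index_amp].get(friend_name)
    if pointer is None:
        return None, [index_amp]
    base_amp, friend_num = pointer
    return (friend_num, base_table[base_amp][friend_num]), [index_amp, base_amp]

for num, name in [(2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"),
                  (8, "John Davis"), (10, "Don Roy"), (12, "Sam Mills"),
                  (14, "Kyle Marx"), (16, "Lyn Jones")]:
    insert_row(num, name)

row, amps_visited = select_by_name("Ben Hon")
print(row, "found by visiting AMPs", amps_visited)
```

The first AMP visited holds the index row and the second holds the base row. A NUSI is not modelled here, because its sub-table rows stay on the same AMP as the base rows they describe and every AMP has to be asked.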

Create secondary indexes only on columns used repeatedly in the WHERE clause of ongoing queries. Secondary indexes take up space and add overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined, and an actual table is built containing the joined data. Users never query the join index directly. They run their normal joins, and the PE checks to see if the join can be satisfied by the Join Index table. If it can, Teradata pulls the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.


Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space to store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The name is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users and databases, they take ownership. Keep in mind that all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT or REVOKE rights from Tom.
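The "who owns Tom" question is a walk up the family tree, and a short Python sketch makes that concrete. The parent_of dictionary simply encodes the hierarchy described in this chapter; the function name owners is invented for the illustration.

```python
# Toy model of the ownership hierarchy: each user or database records
# its immediate parent, and owners are found by walking upward.

parent_of = {
    "DBC": None,          # top of the hierarchy
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}

def owners(name):
    """Return every parent above `name`, nearest first."""
    result = []
    parent = parent_of[name]
    while parent is not None:
        result.append(parent)
        parent = parent_of[parent]
    return result

print("Owners of Tom:", owners("Tom"))
print("Owners of SALES:", owners("SALES"))
```

Anyone who appears in the list returned for Tom is a parent, and so holds the implied ability to GRANT or REVOKE rights on him.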


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, the AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, views and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
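The difference between handing out perm and handing out spool can be sketched as follows. The class SpaceOwner, its method create_child and the spool figure of 50 are hypothetical; the 100 and 80 Gigabyte figures come from the DBC and SYSDBA example earlier in the chapter. The sketch does not model any cap that Teradata may place on a child's spool.

```python
# Toy model: perm given to a child is subtracted from the parent,
# while spool is only an upper limit and is not used up by giving.

class SpaceOwner:
    def __init__(self, name, perm, spool):
        self.name = name
        self.perm = perm      # gigabytes of perm space still owned
        self.spool = spool    # spool limit in gigabytes

    def create_child(self, name, perm, spool):
        if perm > self.perm:
            raise ValueError(f"{self.name} does not own {perm} GB of perm")
        self.perm -= perm     # perm is really handed over
        # Spool is like a speed limit: the parent keeps its own limit.
        return SpaceOwner(name, perm, spool)

dbc = SpaceOwner("DBC", perm=100, spool=50)
sysdba = dbc.create_child("SYSDBA", perm=80, spool=50)
query_user = sysdba.create_child("Query_User", perm=0, spool=50)

print(dbc.name, dbc.perm, dbc.spool)
print(sysdba.name, sysdba.perm, sysdba.spool)
print(query_user.name, query_user.perm, query_user.spool)
```

The last user receives no perm at all, only spool, which matches the case of someone whose job is just to run queries.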

The following picture shows a logical view of a Customer table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window, because you can see selected portions of a table yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the view's data is not stored on the disks, so it does not duplicate data or take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
1    10    Johnson   Manny   100000
2    20    Carlsbad  Jan     100000
22   30    Winter    Steve   77000
25   10    Lester    Bonnie  56000
33   10    Samuels   Todd    120000
99   20    Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
1    10    Johnson   Manny
2    20    Carlsbad  Jan
22   30    Winter    Steve
25   10    Lester    Bonnie
33   10    Samuels   Todd
99   20    Walter    Misha
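For readers without a Teradata system at hand, the effect of the view can be tried out with Python's built-in sqlite3 module. SQLite is only a stand-in here: it has no Data Dictionary, users or access rights, so the sketch shows the column filtering and nothing about security.

```python
# Sketch: a view that hides the Sal column, using SQLite in place of Teradata.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# Only the view definition is stored; no rows are copied.
conn.execute(
    "CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

cursor = conn.execute("SELECT * FROM Employ_V")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
print(len(rows), "rows returned")
```

Selecting everything from the view returns all six employees but no salary column, because the view never asked for it.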


What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and executed with a simple command. If there are multiple statements, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10?
• What employees are in department 20? and
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
  SELECT * FROM Employ_v WHERE Dept = 10;
  SELECT * FROM Employ_v WHERE Dept = 20;
  SELECT * FROM Employ_v ORDER BY Lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
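SQLite has no macros, but the idea of a named group of statements run by one command can be imitated in Python. The MACROS dictionary and execute_macro function are hypothetical helpers, Employ_V is created as a plain table to keep the sketch short, and the all-or-nothing transaction behavior of a real Teradata macro is not modelled.

```python
# Sketch: a "macro" as a stored, named list of SQL statements.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employ_V (Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT)"
)
conn.executemany(
    "INSERT INTO Employ_V VALUES (?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny"),
        (2, 20, "Carlsbad", "Jan"),
        (22, 30, "Winter", "Steve"),
        (25, 10, "Lester", "Bonnie"),
        (33, 10, "Samuels", "Todd"),
        (99, 20, "Walter", "Misha"),
    ],
)

# The definition is stored once, under a name.
MACROS = {
    "Emp_mac": [
        "SELECT * FROM Employ_V WHERE Dept = 10",
        "SELECT * FROM Employ_V WHERE Dept = 20",
        "SELECT * FROM Employ_V ORDER BY Lname",
    ],
}

def execute_macro(name):
    """Run every statement of the named macro and return one result per statement."""
    return [conn.execute(sql).fetchall() for sql in MACROS[name]]

dept_10, dept_20, by_last_name = execute_macro("Emp_mac")
print(len(dept_10), len(dept_20), [row[2] for row in by_last_name])
```

One call produces all three reports, and nobody has to remember the individual statements.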

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan


I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has explicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), and Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission-critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract: it is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected." Likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journals job is to ensure if things do fail then the rows affected can be reverted back to their original state In Teradata all SQL statements are considered transactions This applies whether you have onestatement or multiple statements executing (MACRO) If all SQL statements cannot be performed successfullythe following happens

bull The user receives immediate feedback in the form of a failure messagebull The entire transaction is rolled back and any changes made to the database are reversedbull Locks are released andbull Spool files are discarded

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she would ask whether you liked it. If you didn't, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is stored in the AMP's Transient Journal. Every AMP has a Transient Journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal, and the data reverts to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal cleanliness is next to godliness: if it is clean, then things went well.
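The before-image idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how Teradata is implemented: the `Amp` class, its dictionary of rows, and the `give_raise` helper are all hypothetical names invented for illustration.

```python
class Amp:
    """Toy AMP holding rows plus a transient journal of BEFORE images."""

    def __init__(self, rows):
        self.rows = dict(rows)      # row key -> row value
        self.journal = []           # (key, before_image) pairs, oldest first

    def write(self, key, new_value):
        # Take the BEFORE picture first; None marks "row did not exist".
        self.journal.append((key, self.rows.get(key)))
        if new_value is None:
            self.rows.pop(key, None)    # DELETE
        else:
            self.rows[key] = new_value  # INSERT or UPDATE

    def commit(self):
        self.journal.clear()        # success: wipe the journal clean

    def rollback(self):
        # Undo in reverse order so the oldest BEFORE image is applied last.
        for key, before in reversed(self.journal):
            if before is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
        self.journal.clear()


def give_raise(amp, percent, fail=False):
    """Run one multi-row UPDATE as a single all-or-nothing transaction."""
    try:
        for key, salary in list(amp.rows.items()):
            amp.write(key, salary * (100 + percent) // 100)
        if fail:
            raise RuntimeError("transaction aborted")
        amp.commit()
        return True
    except RuntimeError:
        amp.rollback()
        return False
```

Calling `give_raise(amp, 100, fail=True)` leaves every salary exactly as it was, because the rollback replays the stored BEFORE images; a successful call simply empties the journal.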

FALLBACK Protection

"I asked my dentist if I had to floss all my teeth, and he responded, 'No, just the ones you want to keep.'"

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why FALLBACK protection can be vital for mission-critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2,000 AMPs, and the chance of losing two AMPs in a 2,000-AMP system is much greater. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. The output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, which is the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
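A rough Python sketch of the two hash maps follows. Everything in it is an illustrative assumption: `zlib.crc32` stands in for the real hashing algorithm, the 16-bucket map size is arbitrary, and the rule used to pick the FALLBACK AMP is invented; only the eight-AMP, two-cluster layout comes from the example above.

```python
import zlib

CLUSTERS = [[1, 2, 3, 4], [5, 6, 7, 8]]        # eight AMPs, two clusters of four
AMPS = [amp for cluster in CLUSTERS for amp in cluster]
BUCKETS = 16                                    # toy hash-map size (hypothetical)

# Primary hash map: bucket number -> AMP that owns the base row.
PRIMARY_MAP = {b: AMPS[b % len(AMPS)] for b in range(BUCKETS)}


def row_hash(primary_index_value):
    # crc32 is a deterministic stand-in for the real hashing algorithm.
    return zlib.crc32(str(primary_index_value).encode())


def primary_amp(primary_index_value):
    """Hash the Primary Index value, find its bucket, read the AMP number."""
    return PRIMARY_MAP[row_hash(primary_index_value) % BUCKETS]


def fallback_amp(primary_index_value):
    """Pick a different AMP in the same cluster for the FALLBACK copy."""
    base = primary_amp(primary_index_value)
    cluster = next(c for c in CLUSTERS if base in c)
    others = [amp for amp in cluster if amp != base]
    # Reuse the hash so FALLBACK rows spread across the cluster-mates.
    return others[row_hash(primary_index_value) // BUCKETS % len(others)]
```

The property worth noticing is that `fallback_amp` never returns the row's own AMP and never leaves the row's cluster, which is why one AMP per cluster can be lost without losing data.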

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs in a pair would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in groups of two, every complex query will take twice as long to process while an AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share the extra work.
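The trade-off can be put into numbers with a small Python sketch. It assumes, purely for illustration, that a failed AMP's work is shared evenly by its cluster-mates and that a second failure is equally likely to hit any other AMP in the system; both function names are hypothetical.

```python
from fractions import Fraction


def workload_after_one_failure(cluster_size):
    """Relative workload of each surviving AMP when one cluster-mate is down."""
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    return Fraction(cluster_size, cluster_size - 1)


def second_failure_risk(cluster_size, total_amps):
    """Chance that a second, random AMP failure lands in the same cluster."""
    return Fraction(cluster_size - 1, total_amps - 1)
```

Under these assumptions, a cluster of two doubles the survivor's workload but gives a second failure only 1 chance in 1,999 of being fatal on a 2,000-AMP system; a cluster of 16 raises the workload by just 1/15 but raises that risk to 15 in 1,999. A cluster of four sits in between at 4/3 of the workload and 3 in 1,999.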

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are tracked so they can be applied when it returns.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be used on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.


In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
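The catch-up behavior can be sketched as follows. This is a simplified Python model, not Teradata's implementation: it assumes at most one AMP in the cluster is down, lets only the AMP holding a row's other copy journal the missed change, and uses the hypothetical names `ClusterAmp`, `write_row`, and `recover`.

```python
class ClusterAmp:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.rows = {}     # rows this AMP stores (base or FALLBACK copies)
        self.darj = []     # changes owed to a down cluster-mate


def write_row(owner, fallback, key, value):
    """Apply a change to both copies, journaling for whichever AMP is down."""
    for target, partner in ((owner, fallback), (fallback, owner)):
        if target.up:
            target.rows[key] = value
        else:
            # Target is "sleeping": the partner remembers the change for it.
            partner.darj.append((target.name, key, value))


def recover(amp, cluster):
    """Bring an AMP back online, replay what it missed, discard the journal."""
    amp.up = True
    for mate in cluster:
        for target, key, value in mate.darj:
            if target == amp.name:
                amp.rows[key] = value
        mate.darj = [entry for entry in mate.darj if entry[0] != amp.name]
```

While `AMP1` is marked down, `write_row` keeps updating the copy on the surviving AMP and queues the same change in that AMP's `darj`; `recover` then replays the queue in order and empties it.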

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it is still fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The downside of RAID-1, like FALLBACK, is its cost in disk space: it requires a 50% overhead, because half of the disks hold mirror copies.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. We can still keep the shared nothing environment and have four physical disks: only the AMP that owns the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, as long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory; there are about 10 to 16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, and they are accessing their own virtual disks via a different physical cable.
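The migration step can be modeled with a short Python sketch. It is an illustration only: the `migrate` function and the round-robin placement across surviving nodes are assumptions, and the dictionary of node names to vproc lists is a stand-in for the real configuration.

```python
def migrate(nodes, failed, clique):
    """Move the failed node's vprocs to surviving nodes in the same clique."""
    survivors = [n for n in clique if n != failed and n in nodes]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    # Copy the layout without the failed node; the input is left untouched.
    placement = {node: list(vprocs) for node, vprocs in nodes.items() if node != failed}
    for i, vproc in enumerate(nodes[failed]):
        # Round-robin across survivors; each vproc keeps its own virtual disk.
        placement[survivors[i % len(survivors)]].append(vproc)
    return placement
```

With the two-node, 32-AMP example, losing node one leaves all 32 AMPs running in node two's memory, which is why the receiving node slows down while data stays reachable.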

People who come from colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures, and it is a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. In addition, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
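Both directions can be sketched in Python. The model below is a simplification built on assumptions: a table is a plain dictionary, the journal is a list holding a BEFORE and an AFTER image per change, and the names `apply_change`, `roll_back`, and `roll_forward` are hypothetical rather than Teradata commands.

```python
def apply_change(table, journal, key, new_value):
    """Change one row and record its BEFORE and AFTER images in the journal."""
    journal.append({"key": key, "before": table.get(key), "after": new_value})
    if new_value is None:
        table.pop(key, None)        # a DELETE leaves no AFTER image
    else:
        table[key] = new_value


def roll_back(table, journal, checkpoint):
    """Undo every journaled change made after position `checkpoint`."""
    for entry in reversed(journal[checkpoint:]):
        if entry["before"] is None:
            table.pop(entry["key"], None)
        else:
            table[entry["key"]] = entry["before"]
    return table


def roll_forward(backup, journal):
    """Rebuild current data from a backup copy plus the AFTER images."""
    table = dict(backup)
    for entry in journal:
        if entry["after"] is None:
            table.pop(entry["key"], None)
        else:
            table[entry["key"]] = entry["after"]
    return table
```

In the region example, `roll_back` restores every representative's original region after the faulty update, while `roll_forward` starts from the backup taken on the 1st and replays the AFTER images to reach the state just before the hardware failure.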

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
BEFORE JOURNAL,
DUAL AFTER JOURNAL
(emp INTEGER,
 dept INTEGER,
 lname CHAR(20),
 fname VARCHAR(20),
 salary DECIMAL(10,2),
 hire_date DATE FORMAT)
UNIQUE PRIMARY INDEX(emp);

The example above creates the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind, these commands are writing actions. No other EXCLUSIVE, WRITE, or READ lock can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
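The four rules above amount to a small compatibility table, sketched here in Python. The `COMPATIBLE` dictionary and `can_grant` function are illustrative names; the table simply encodes the rules as this section states them.

```python
# For each lock already held, the set of requested locks that may proceed.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held_locks):
    """True when `requested` is compatible with every lock currently held."""
    return all(requested in COMPATIBLE[held] for held in held_locks)
```

For example, `can_grant("READ", ["READ"] * 1000)` is true because readers never block each other, while `can_grant("WRITE", ["READ"])` is false and the writer must wait its turn.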

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood." "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was first deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table unless the value in its referencing column also exists in the table it refers to. Conversely, a row whose value is still referenced by rows in another table may not be deleted until those referencing rows are removed first.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
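The two RI rules can be sketched in Python using the employee and payroll example. This is a toy model under assumptions, not Teradata's enforcement mechanism: the employee table is reduced to a set of keys, payroll rows are dictionaries, and `RIError`, `insert_child`, and `delete_parent` are hypothetical names.

```python
class RIError(Exception):
    """Raised when a change would break referential integrity."""


def insert_child(parent_keys, child_rows, row, fk):
    """Insert `row` into the child table only if its foreign key has a parent."""
    if row[fk] not in parent_keys:
        raise RIError(f"no parent row for {fk}={row[fk]!r}")
    child_rows.append(row)


def delete_parent(parent_keys, child_rows, key, fk):
    """Delete a parent key only when no child row still references it."""
    if any(row[fk] == key for row in child_rows):
        raise RIError(f"{key!r} is still referenced")
    parent_keys.discard(key)
```

Here the employee table plays the parent and payroll plays the child, so an employee cannot be removed while a payroll row still points at that employee number.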


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on: ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? Through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
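The two phases described above can be sketched in Python. The model makes several assumptions for the sake of illustration: blocks are counted in rows rather than 64K bytes, `zlib.crc32` stands in for the real hashing algorithm, the row hash stands in for the Row ID, and the `fastload` function name is hypothetical.

```python
import zlib


def fastload(records, amp_count, block_size=3):
    """Toy two-phase load: deal blocks to AMPs, then hash-redistribute and sort."""
    # Phase 1: hand out fixed-size blocks of rows round-robin, ignoring content.
    blocks = [records[i:i + block_size] for i in range(0, len(records), block_size)]
    received = [[] for _ in range(amp_count)]
    for n, block in enumerate(blocks):
        received[n % amp_count].extend(block)

    # Phase 2: each AMP hashes its rows and sends them to the owning AMP.
    owned = [[] for _ in range(amp_count)]
    for rows in received:
        for pi_value, payload in rows:
            row_hash = zlib.crc32(str(pi_value).encode())   # stand-in hash
            owned[row_hash % amp_count].append((row_hash, pi_value, payload))

    # Finally every AMP sorts its rows by row hash (standing in for Row ID).
    return [sorted(rows, key=lambda row: row[0]) for rows in owned]
```

The point of the first phase is speed: rows are accepted without inspection. Only in the second phase does each row travel to the AMP that its Primary Index hash selects.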

Fastload Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to load immediately into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be preset to load at different rates at certain times of the day, and the settings can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison averaged only 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance – only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


There are two types of Primary Indexes

"A man who chases two rabbits catches none."

Roman Proverb

"A man who chases two rabbits misses both by a HARE. A person who chases two Primary Indexes misses both by an ERR."

Tera-Tom Coffing

Each table may have only one Primary Index, but every table must have a Primary Index defined. It is either a UPI or a NUPI; in other words, a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI). The Primary Index is created when the table is created. An example of creating a Unique Primary Index on the column EMP follows:

CREATE TABLE employee
(emp       INTEGER
,dept      INTEGER
,lname     CHAR(20)
,fname     VARCHAR(20)
,salary    DECIMAL(10,2)
,hire_date DATE
)
UNIQUE PRIMARY INDEX(emp);

An example of creating a Non-Unique Primary Index is listed below. Notice you never see the prefix NON:

CREATE TABLE TomC.employee
(emp       INTEGER
,dept      INTEGER
,lname     CHAR(20)
,fname     VARCHAR(20)
,salary    DECIMAL(10,2)
,hire_date DATE
)
PRIMARY INDEX(dept);

PRIMARY INDEXES may be defined on one column or on a set of columns viewed as a composite unit. Up to 16 columns may be defined as a Primary Index. An example of creating a multi-column Unique Primary Index follows:

CREATE TABLE employee
(emp       INTEGER
,dept      INTEGER
,lname     CHAR(20)
,fname     VARCHAR(20)
,salary    DECIMAL(10,2)
,hire_date DATE
)
UNIQUE PRIMARY INDEX(emp, dept);

"Being related hardly insures relatability."

Michael E. Angier


All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label the column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label the column(s) used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

"The map is not the territory."

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP, in combination with the PRIMARY INDEX, to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEPs use the very same HASH MAP.

The picture below shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
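The two diagrams can be reproduced with a few lines of Python. This is only a sketch for simulation purposes: the function names, the 48-bucket size and the six-bucket row width are assumptions chosen to match the pictures, not anything taken from Teradata itself.

```python
# Illustrative only: the real hash map has 65,536 buckets. The text says
# nothing about its layout beyond "each bucket holds one AMP number".

def build_hash_map(num_amps, num_buckets=48):
    """Return a list where entry i holds the AMP number for bucket i + 1."""
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]


def as_grid(hash_map, width=6):
    """Split the flat bucket list into rows, the way the diagrams draw it."""
    return [hash_map[i:i + width] for i in range(0, len(hash_map), width)]


four_amp = build_hash_map(4)
eight_amp = build_hash_map(8)

for row in as_grid(four_amp):
    print(" ".join(str(amp) for amp in row))
```

Swapping `four_amp` for `eight_amp` in the loop prints the eight-AMP picture. The only thing that changes between the two maps is how far the AMP numbers count before starting over.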

How the Hash Map and Primary Index Work Together

"Choice, not chance, determines destiny."

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the "Whiz-Bang Formula" (a secret NCR formula). First we'll explain the theory, and then we will invent our own Whiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

For this example, Teradata will attempt to spread the table rows among the AMPs of the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Whiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Whiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.


Let's take our first row and determine on which AMP it will reside. Remember, we will get the Primary Index value of the row, apply the Coffing/Jones Whiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num  Friend_Name
2           Ben Hon

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Whiz-Bang Formula (divide by 2) to the value of Friend_Num (2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. The first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row:

Friend_Num  Friend_Name
16          Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Whiz-Bang Formula (divide by 2) to the value of Friend_Num (16), and the answer is:

16 divided by 2 = 8

Thus the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. Bucket eight in the hash map says the row's destination is AMP four.


If we continue the process until all data is laid out, the system would look like this:


Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Whiz-Bang Formula did not crack the code. The purpose is to show you how the hash map works in theory to distribute and locate rows. Simply, you should understand that the formula is mathematical (similar to the Coffing/Jones Whiz-Bang Formula) and that it will be consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.
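To watch every row find its home, here is a small Python sketch of the Coffing/Jones Whiz-Bang Formula at work. It is strictly for explanation purposes: the divide-by-2 rule is the book's stand-in for the secret formula, and the names `bucket_for`, `amp_for` and `distribute`, along with the 48-bucket map, are assumptions made for this example.

```python
BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]

NUM_AMPS = 4
# Four-AMP hash map from the diagram: buckets hold 1, 2, 3, 4, 1, 2, ...
HASH_MAP = [(bucket % NUM_AMPS) + 1 for bucket in range(48)]


def bucket_for(friend_num):
    """Coffing/Jones stand-in formula: divide the Primary Index value by 2."""
    return friend_num // 2


def amp_for(friend_num):
    """Look inside the bucket (numbered from 1) to find the AMP number."""
    return HASH_MAP[bucket_for(friend_num) - 1]


def distribute(rows):
    """Place each row on the AMP its Primary Index value points to."""
    amps = {amp: [] for amp in range(1, NUM_AMPS + 1)}
    for friend_num, friend_name in rows:
        amps[amp_for(friend_num)].append((friend_num, friend_name))
    return amps


layout = distribute(BEST_FRIENDS)
for amp, rows in layout.items():
    print("AMP", amp, rows)
```

Because the formula is consistent, calling `amp_for(2)` a million times keeps giving the same AMP, which is exactly what lets the system find a row again later. With these eight even-numbered friends, each of the four AMPs ends up holding two rows.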

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;


The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number, which on our four-AMP hash map is AMP four. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
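The same walk-through can be sketched in Python. The row layout in `AMP_STORAGE` follows from the divide-by-2 example formula. The function name and the idea of returning a list of "AMPs touched" are assumptions added here to make the one-AMP point visible, not a description of how the PE is actually built.

```python
NUM_AMPS = 4
HASH_MAP = [(bucket % NUM_AMPS) + 1 for bucket in range(48)]

# Where the example formula placed each Best_Friends row.
AMP_STORAGE = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def select_by_primary_index(friend_num):
    """Rerun the example hash formula and visit only the AMP it points to."""
    bucket = friend_num // 2          # Coffing/Jones stand-in formula
    amp = HASH_MAP[bucket - 1]        # buckets are numbered from 1
    amps_touched = [amp]
    name = AMP_STORAGE[amp].get(friend_num)
    row = (friend_num, name) if name is not None else None
    return row, amps_touched


row, amps_touched = select_by_primary_index(8)
print(row, amps_touched)
```

No matter how many AMPs the system has, `amps_touched` never holds more than one entry for a Primary Index lookup.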

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach "Bear" Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's abilities to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PEP. This process is 10 times faster than most systems. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most FTS bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" you are asking the system to read through an entire table. Full Table Scans are fundamental and an important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns
• Each AMP passes its rows to the PE over the BYNET
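The two steps above can be sketched in Python, with one worker thread standing in for each AMP. This is a loose analogy under stated assumptions: `amp_scan` and `full_table_scan` are names invented for this example, threads are not AMPs, and the BYNET is reduced to a plain list being handed back.

```python
from concurrent.futures import ThreadPoolExecutor

# Where the example formula placed each Best_Friends row.
AMP_STORAGE = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def amp_scan(amp):
    """Each AMP reads only the rows it owns."""
    return [(num, name) for num, name in AMP_STORAGE[amp].items()]


def full_table_scan():
    """The 'PE' hands every AMP the same plan and gathers what comes back."""
    with ThreadPoolExecutor(max_workers=len(AMP_STORAGE)) as pool:
        per_amp = list(pool.map(amp_scan, AMP_STORAGE))
    return [row for rows in per_amp for row in rows]


rows = full_table_scan()
print(len(rows), "rows returned")
```

Every AMP does the same small amount of work at the same time, so adding AMPs shrinks each AMP's share of the table rather than adding to anyone's load.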

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num Friend_Name

6 Mary Gray

14 Kyle Marx

8 John Davis

16 Lyn Jones

2 Ben Hon

10 Don Roy

4 Joe Davis

12 Sam Mills


In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It also tells the AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (the smiley face in this example) is revealed, and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.
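Here is a rough Python sketch of that two-step lookup. Everything about the hashing is an assumption: Teradata's formula is secret, so `usi_amp` borrows `zlib.crc32` purely as a stand-in, which means 'Ben Hon' will not necessarily land on AMP 2 as he does in the picture. The pointer stored in the sub-table is simplified to a pair of base AMP and Friend_Num.

```python
import zlib

NUM_AMPS = 4

# Base table rows, placed by the divide-by-2 example formula.
BASE = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def usi_amp(name):
    """Stand-in hash (NOT Teradata's): pick the AMP holding the index row."""
    return zlib.crc32(name.encode("utf-8")) % NUM_AMPS + 1


def build_usi():
    """Each AMP hashes its names; the index row goes where the hash points."""
    subtables = {amp: {} for amp in BASE}
    for base_amp, rows in BASE.items():
        for friend_num, name in rows.items():
            subtables[usi_amp(name)][name] = (base_amp, friend_num)
    return subtables


USI = build_usi()


def select_by_usi(name):
    """Step 1: visit the index AMP. Step 2: follow the pointer to the base row."""
    index_amp = usi_amp(name)
    pointer = USI[index_amp].get(name)
    if pointer is None:
        return None, [index_amp]
    base_amp, friend_num = pointer
    return (friend_num, BASE[base_amp][friend_num]), [index_amp, base_amp]


row, amps_visited = select_by_usi("Ben Hon")
print(row, amps_visited)
```

The list of AMPs visited never grows past two entries, one for the sub-table row and one for the base row, which is the whole point of a USI.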

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.


Teradata Databases, Users and Space

Overview

"Choose a job you like, and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So, if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT or REVOKE rights from Tom.
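Finding Tom's owners is just a walk up the family tree, which a few lines of Python can sketch. The `PARENT` dictionary records only the relationships the text spells out (DBC gave space to SYSDBA, SYSDBA to MRKT, SALES and Morgan, and Morgan to Tom). The dictionary itself and the `owners` function are assumptions made for illustration.

```python
# Child -> immediate parent, as described in the text.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """Return every user or database sitting above `name`, nearest first."""
    found = []
    while name in PARENT:
        name = PARENT[name]
        found.append(name)
    return found


print(owners("Tom"))
```

DBC has no entry of its own in `PARENT`, so the walk always ends there. The top dog has no owner.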


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts. Then the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA, likewise, gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, views and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.
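The bookkeeping for perm space can be sketched in Python. The 100 Gigabytes and the 80/20 split between SYSDBA and DBC come from the text. The `SpaceOwner` class, the `give_perm` method and the 10 Gigabytes handed to Morgan are assumptions invented for this example.

```python
class SpaceOwner:
    """A user or database that can own perm space (sizes in gigabytes)."""

    def __init__(self, name, perm_gb=0):
        self.name = name
        self.perm_gb = perm_gb

    def give_perm(self, child, amount_gb):
        """Giving perm space away subtracts it from what the giver owns."""
        if amount_gb > self.perm_gb:
            raise ValueError(f"{self.name} does not own {amount_gb} GB")
        self.perm_gb -= amount_gb
        child.perm_gb += amount_gb


dbc = SpaceOwner("DBC", perm_gb=100)   # in the beginning, DBC owns it all
sysdba = SpaceOwner("SYSDBA")
morgan = SpaceOwner("Morgan")

dbc.give_perm(sysdba, 80)              # about 80% goes to SYSDBA
sysdba.give_perm(morgan, 10)           # hypothetical amount for illustration

for owner in (dbc, sysdba, morgan):
    print(owner.name, owner.perm_gb)
```

Add up the three balances and you still get 100 Gigabytes. Space never goes unaccounted for; it only changes owners.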

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
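Spool behaves like a ceiling rather than a bank balance, and a short Python sketch shows the difference. The `User` class, the `SpoolExceeded` error and the user called Reporter are all hypothetical, and the limit is counted in rows instead of bytes just to keep the numbers small.

```python
class SpoolExceeded(Exception):
    """Raised when a query's answer set outgrows the user's spool limit."""


class User:
    def __init__(self, name, spool_limit):
        self.name = name
        self.spool_limit = spool_limit  # counted in rows here, for simplicity

    def create_child(self, name, spool_limit):
        # Unlike perm space, nothing is subtracted from the parent.
        return User(name, spool_limit)

    def run_query(self, answer_rows):
        spool = []
        for row in answer_rows:
            if len(spool) >= self.spool_limit:
                raise SpoolExceeded(f"{self.name} is out of spool space")
            spool.append(row)
        answer = list(spool)
        spool.clear()  # spool is released once the answer is delivered
        return answer


morgan = User("Morgan", spool_limit=65)
tom = morgan.create_child("Tom", spool_limit=65)
reporter = morgan.create_child("Reporter", spool_limit=65)

print(morgan.spool_limit, tom.spool_limit, reporter.spool_limit)
print(len(tom.run_query(range(10))), "rows delivered")
```

Morgan hands out two limits of 65 and still has 65 of his own, just like the speed limit. A query that tries to build more than 65 rows of answer set aborts instead of borrowing from anyone else.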

The following picture shows a logical view of a Customer Table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. Plus, when the answer is returned, the spool is released.

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window, because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select a view of a table(s), the data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp Dept Lname Fname Sal

1 10 Johnson Manny 100000

2 20 Carlsbad Jan 100000

22 30 Winter Steve 77000

25 10 Lester Bonnie 56000

33 10 Samuels Todd 120000

99 20 Walter Misha 104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW Employ_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM Employee;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the Employ_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp Dept Lname Fname

1 10 Johnson Manny

2 20 Carlsbad Jan

22 30 Winter Steve

25 10 Lester Bonnie

33 10 Samuels Todd

99 20 Walter Misha
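If you have no Teradata system handy, the column-hiding effect of the view can be tried out with Python's built-in `sqlite3` module. SQLite is only a stand-in here: it has no users or access rights, so this sketch shows just the "window display" part of the story, with the table and view names borrowed from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no rows are copied.
conn.execute(
    "CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

cursor = conn.execute("SELECT * FROM Employ_V")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
print(len(rows), "rows returned")
```

The view returns all six employees but only four columns. Sal never makes it into the window.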


What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10
• What employees are in department 20, and
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS
(
SELECT * FROM Employ_v WHERE dept = 10;
SELECT * FROM Employ_v WHERE dept = 20;
SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
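SQLite has no macros, but the idea of a named group of statements run with one command can be imitated in Python with the built-in `sqlite3` module. Treat this as a loose sketch: the `MACROS` dictionary and the `create_macro` and `execute_macro` helpers are inventions for this example. The `with conn:` block gives commit-or-rollback behavior for statements that change data, although for these three SELECTs it has no visible effect.

```python
import sqlite3

MACROS = {}  # hypothetical stand-in for macro definitions in the dictionary


def create_macro(name, statements):
    """Remember a named group of SQL statements."""
    MACROS[name] = list(statements)


def execute_macro(conn, name):
    """Run every statement in the macro as a group and collect the answers."""
    results = []
    with conn:  # commits on success, rolls back if any statement raises
        for sql in MACROS[name]:
            results.append(conn.execute(sql).fetchall())
    return results


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)
conn.execute(
    "CREATE VIEW Employ_v AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

create_macro("Emp_mac", [
    "SELECT * FROM Employ_v WHERE dept = 10",
    "SELECT * FROM Employ_v WHERE dept = 20",
    "SELECT * FROM Employ_v ORDER BY lname",
])

reports = execute_macro(conn, "Emp_mac")
for report in reports:
    print(len(report), "rows")
```

One call to `execute_macro` brings back all three reports, so nobody has to remember the individual steps.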

Here is a handy reference chart that compares views with macros:

Views                                           Macros
• We select from views                          • We execute macros
• Uses the keyword AS                           • Uses the keyword AS
• Definition is stored in the Data Dictionary   • Definition is stored in the Data Dictionary
• Accesses certain portions of the data         • Accesses the real data itself
• Is changed using the keyword REPLACE          • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has explicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract: it is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or disk goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she would ask if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal." Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness." If it is clean, then things went well.
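The before-picture idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Amp` class, its `rows` dictionary, and the `run_transaction` helper are hypothetical names invented for illustration, and nothing here reflects how Teradata actually stores its journal.

```python
class Amp:
    """Toy AMP: a dict of rows plus a transient journal of before-images."""

    def __init__(self, rows):
        self.rows = dict(rows)
        # Each entry is (key, before_image); None means the row did not exist.
        self.journal = []

    def write(self, key, new_value):
        # Take the BEFORE picture first, then change the row.
        self.journal.append((key, self.rows.get(key)))
        if new_value is None:
            self.rows.pop(key, None)      # DELETE
        else:
            self.rows[key] = new_value    # INSERT or UPDATE

    def commit(self):
        # Success: the journal is wiped clean.
        self.journal.clear()

    def rollback(self):
        # Failure: restore before-images, newest change first.
        for key, before in reversed(self.journal):
            if before is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
        self.journal.clear()


def run_transaction(amp, changes):
    """Apply every (key, value) change or none of them."""
    try:
        for key, value in changes:
            if isinstance(value, Exception):
                raise value               # simulate a failing statement
            amp.write(key, value)
    except Exception:
        amp.rollback()
        return False
    amp.commit()
    return True
```

A transaction whose last statement fails leaves `amp.rows` exactly as it was before the first statement ran, which is the "either it works or it fails" behavior described above.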

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.
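The path from Primary Index value to AMP can be sketched as follows. This is a minimal illustration, not Teradata's implementation: the bucket count, the round-robin layout of `HASH_MAP`, and the use of CRC32 as a stand-in hashing algorithm are all assumptions made for the example.

```python
import zlib

NUM_AMPS = 8       # the eight-AMP system discussed above
NUM_BUCKETS = 16   # toy size; an assumption, not Teradata's real bucket count

# Hash map: bucket number -> AMP number (buckets dealt out round-robin).
HASH_MAP = {bucket: bucket % NUM_AMPS + 1 for bucket in range(NUM_BUCKETS)}


def row_hash(primary_index_value):
    """Stand-in hash (CRC32); the real hashing algorithm is not reproduced here."""
    return zlib.crc32(str(primary_index_value).encode("utf-8"))


def amp_for(primary_index_value):
    """Primary Index value -> hash -> bucket -> AMP number."""
    bucket = row_hash(primary_index_value) % NUM_BUCKETS
    return HASH_MAP[bucket]
```

Because the same Primary Index value always produces the same hash, `amp_for("Ben Hon")` names the same AMP on every call, which is what lets a row be found again without searching every disk.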

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
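A small sketch can show why one lost AMP per cluster is survivable. The cluster layout mirrors the eight-AMP example, but the rule used by `fallback_amp` to choose the host AMP is an assumption invented for illustration; the source does not describe how the Fallback Hash Map actually picks.

```python
CLUSTERS = [[1, 2, 3, 4], [5, 6, 7, 8]]   # two clusters of four AMPs


def cluster_of(amp):
    """Return the cluster (list of AMP numbers) that contains this AMP."""
    for cluster in CLUSTERS:
        if amp in cluster:
            return cluster
    raise ValueError(f"unknown AMP {amp}")


def fallback_amp(primary_amp, row_hash):
    """Pick a different AMP in the same cluster to host the FALLBACK copy.

    The modulo rule is a hypothetical placement policy, used only so that
    copies are spread across the other AMPs in the cluster.
    """
    others = [a for a in cluster_of(primary_amp) if a != primary_amp]
    return others[row_hash % len(others)]


def table_available(rows, down_amps):
    """rows: iterable of (primary_amp, row_hash) pairs.

    True if every row still has at least one copy on a running AMP.
    """
    down = set(down_amps)
    return all(
        primary not in down or fallback_amp(primary, h) not in down
        for primary, h in rows
    )
```

With rows spread over all eight AMPs, `table_available(rows, [1, 5])` holds because each cluster has lost only one AMP, while `table_available(rows, [1, 2])` fails as soon as some row has its base copy on AMP 1 and its FALLBACK copy on AMP 2.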

In most systems, AMPs are clustered in a group of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
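The trade-off can be put into rough numbers. The sketch below assumes the down AMP's work is split evenly among the survivors in its cluster, and that two failures strike two AMPs chosen uniformly at random; both are simplifying assumptions, and the function names are invented for this example.

```python
from math import comb


def surviving_load_factor(cluster_size):
    """Workload multiplier on each surviving AMP after one AMP in the cluster fails."""
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    return cluster_size / (cluster_size - 1)


def same_cluster_pair_fraction(total_amps, cluster_size):
    """Fraction of all two-AMP failure pairs that land inside a single cluster."""
    if total_amps % cluster_size != 0:
        raise ValueError("total_amps must be a multiple of cluster_size")
    clusters = total_amps // cluster_size
    return clusters * comb(cluster_size, 2) / comb(total_amps, 2)


for size in (2, 4, 16):
    print(size,
          round(surviving_load_factor(size), 3),
          round(same_cluster_pair_fraction(16, size), 3))
```

Under these assumptions, a cluster of two doubles the load on the survivor, a cluster of four raises it by a third, and a cluster of 16 raises it by about 7 percent. In the other direction, on a 16-AMP system only 1 in 15 double failures hits the same cluster of two, 3 in 15 hit the same cluster of four, and every double failure hits the single cluster of 16.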

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, including all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.
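The catch-up behavior can be modeled with a short sketch. Everything here is a hypothetical simplification: the `Cluster` class, its per-AMP dictionaries, and the single shared journal per down AMP are invented for illustration and say nothing about the journal's real format.

```python
class Cluster:
    """Toy cluster: per-AMP row storage plus a recovery journal for down AMPs."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}   # copies stored on each AMP
        self.down = set()
        self.darj = {}     # down AMP -> list of (key, value) changes it missed

    def fail(self, amp):
        # The surviving AMPs open a recovery journal for the down AMP.
        self.down.add(amp)
        self.darj[amp] = []

    def write(self, amp, key, value):
        if amp in self.down:
            # The AMP is "sleeping": remember the change for later.
            self.darj[amp].append((key, value))
        else:
            self.rows[amp][key] = value

    def recover(self, amp):
        # Replay the missed changes in order, then discard the journal.
        for key, value in self.darj.pop(amp):
            self.rows[amp][key] = value
        self.down.discard(amp)
```

After `fail(1)`, writes aimed at AMP 1 pile up in the journal instead of being lost; `recover(1)` applies them in their original order and removes the journal, matching the "catch up, then discard" sequence described above.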

In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). Also, none of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within the cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The downside of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
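The rule about which disk losses a rank can survive reduces to one check per mirrored pair. The sketch below assumes the rank is organized as (data disk, mirror disk) pairs; the disk names and the pairing are hypothetical, chosen only to illustrate the rule.

```python
def rank_has_all_data(mirror_pairs, failed_disks):
    """True if no data disk has failed together with its own mirror.

    mirror_pairs: list of (data_disk, mirror_disk) names.
    failed_disks: iterable of disk names that are currently down.
    """
    failed = set(failed_disks)
    return all(
        not (data in failed and mirror in failed)
        for data, mirror in mirror_pairs
    )


# Hypothetical rank layout: two data disks, each with its own mirror.
RANK = [("data1", "mirror1"), ("data2", "mirror2")]
```

Losing `data1` and `data2` together, or `data1` and `mirror2`, still leaves one good copy of everything; losing `data1` and `mirror1` is the one combination of two failures that loses data.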

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel Processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds." Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
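The recovery scenario above (restore the backup, then roll forward) can be sketched as a replay loop. This is a toy model only: the `(day, key, after_image)` journal format and the `roll_forward` name are assumptions made for the example, and only added or modified rows are modeled, as in the scenario.

```python
def roll_forward(backup, after_journal, up_to_day):
    """Rebuild a table from a full backup plus AFTER images.

    backup:        dict of key -> row as of the full system backup.
    after_journal: list of (day, key, after_image) entries, in capture order.
    up_to_day:     last day whose changes should be reapplied.
    """
    table = dict(backup)   # reload from the backup; leave the backup untouched
    # sorted() is stable, so same-day entries keep their capture order.
    for day, key, after_image in sorted(after_journal, key=lambda entry: entry[0]):
        if day > up_to_day:
            break
        table[key] = after_image   # the AFTER image replaces whatever was there
    return table
```

For example, with a backup of `{1: "Western"}` taken on the 1st and journal entries captured on days 2 and 3, `roll_forward(backup, journal, 5)` returns the table as it stood just before the failure on the 5th.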

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
  BEFORE JOURNAL,
  DUAL AFTER JOURNAL
( emp INTEGER,
  dept INTEGER,
  lname CHAR(20),
  fname VARCHAR(20),
  salary DECIMAL(10,2),
  hire_date DATE FORMAT)
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE, or READ locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
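The four rules above boil down to a small compatibility table. The sketch below encodes that table as read from the text; the `COMPATIBLE` dictionary and `can_grant` function are illustrative names, and queueing order is deliberately left out.

```python
# For each requested lock mode, the held modes it can coexist with,
# as read off the four rules above.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},   # blocked only by EXCLUSIVE
    "READ": {"ACCESS", "READ"},              # many readers may share a table
    "WRITE": {"ACCESS"},                     # only dirty reads may proceed
    "EXCLUSIVE": set(),                      # nothing else is allowed in
}


def can_grant(requested, held_locks):
    """True if the requested lock is compatible with every lock already held."""
    return all(held in COMPATIBLE[requested] for held in held_locks)
```

So `can_grant("READ", ["READ", "READ"])` is true (a thousand users can SELECT at once), `can_grant("WRITE", ["READ"])` is false (the writer waits its turn), and `can_grant("ACCESS", ["WRITE"])` is true (the dirty read goes ahead).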

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the former table.
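Both halves of the rule can be sketched with the employee and payroll tables from the story. This is a hypothetical in-memory model: the `Tables` class, its method names, and the `RIError` exception are invented for illustration and are not Teradata syntax.

```python
class RIError(Exception):
    """Raised when a change would break referential integrity."""


class Tables:
    def __init__(self):
        self.employee = set()   # employee numbers (the referenced table)
        self.payroll = {}       # payroll id -> employee number (the referencing table)

    def insert_employee(self, emp):
        self.employee.add(emp)

    def insert_payroll(self, pay_id, emp):
        # Rule 1: the referenced value must already exist.
        if emp not in self.employee:
            raise RIError(f"employee {emp} does not exist")
        self.payroll[pay_id] = emp

    def delete_payroll(self, pay_id):
        self.payroll.pop(pay_id, None)

    def delete_employee(self, emp):
        # Rule 2: no payroll row may still point at this employee.
        if emp in self.payroll.values():
            raise RIError(f"employee {emp} is still on the payroll")
        self.employee.discard(emp)
```

In this model, the fired employee cannot be removed from `employee` until the matching `payroll` row is gone, so the Bahamas vacation never gets funded.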

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that, just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel) and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
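The two phases described above can be sketched in a few lines of Python. This is only a conceptual illustration, not Teradata code: the names NUM_AMPS, BLOCK_ROWS, toy_hash and fastload are hypothetical, the block size is counted in rows rather than 64K bytes, and the hash is a simple stand-in for the real formula.

```python
# Conceptual sketch of the Fastload phases described in the text.
# NUM_AMPS, BLOCK_ROWS and toy_hash are illustrative assumptions, not Teradata internals.
from itertools import islice

NUM_AMPS = 4
BLOCK_ROWS = 3  # stand-in for "as many rows as fit in a 64K block"


def toy_hash(primary_index_value):
    """Hypothetical stand-in for the real hash: maps a value to an AMP (0-based)."""
    return primary_index_value % NUM_AMPS


def fastload(rows):
    # Phase 1: hand blocks of rows to the AMPs in turn, without inspecting the data.
    received = [[] for _ in range(NUM_AMPS)]
    source = iter(rows)
    amp = 0
    while True:
        block = list(islice(source, BLOCK_ROWS))
        if not block:
            break
        received[amp].extend(block)
        amp = (amp + 1) % NUM_AMPS

    # Phase 2: each AMP hashes the rows it received and sends each row to its proper AMP.
    owned = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for row in amp_rows:
            owned[toy_hash(row[0])].append(row)

    # Finally, each AMP sorts its own rows (by key here, standing in for Row ID).
    for amp_rows in owned:
        amp_rows.sort(key=lambda row: row[0])
    return owned


rows = [(n, f"row-{n}") for n in range(20, 0, -1)]
amps = fastload(rows)
for number, amp_rows in enumerate(amps):
    print(number, [row[0] for row in amp_rows])
```

The point of the split is that the first phase gets data onto the system as quickly as possible, while the hashing and sorting work is shared by all AMPs afterwards.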

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance - only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


All of the tables in a Teradata database are related to each other. But the Primary Key and Primary Index ensure their relatability in day-to-day use. What is the difference between a PRIMARY KEY and a PRIMARY INDEX? A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) that are used to store and locate rows of data.

To illustrate, imagine a library. The Primary Key, the logical, is like the actual construction of the library. Do you know what part of the library is reserved for fiction? What about for non-fiction? Plus, where will the card catalog reside? Once the library is logically correct, it is ready to receive books. A Primary Key on a table helps to logically determine what data to track in the table.

The Primary Index is much like a card catalog in the library. Inside the card catalog drawers are thousands of index cards that provide the book's title, author, publisher and the Dewey Decimal number. By taking that index card, you can immediately find where that book is shelved within the library. The Primary Index column value for a Teradata table tells where the row should reside. It's also the fastest mechanism to retrieve data.

Teradata uses the Primary Index to distribute each table's rows to the proper AMPs. Teradata also uses the Primary Index to retrieve rows at lightning speed.

Exactly how does Teradata actually accomplish this? Well, I'm glad you asked. Let's look at the HASH MAP next.

The Hash Map

The map is not the territory

Alfred Korzybski

The first map of all the known lands in the world has been attributed to the Greek philosopher Anaximander of Miletus (610-ca. 546 BC). He may have been the first person to attempt such a map, although others had drawn local maps before. The Hash Map was created by a group of individuals so Teradata could maximize its parallel processing roots. It's the hash map that tells which AMP holds a particular row. It does not contain any data rows; it just shows where to find them. Overall, the idea of the hash map is to spread the data as equally as possible.

Once, a travel agent received a call from a man asking, "Is it possible to see England from Canada?" The agent said, "No." The man replied, "But they look so close on the map."

Teradata uses a map called the HASH MAP in combination with the PRIMARY INDEX to distribute data rows. The HASH MAP is not a two-dimensional array, although it appears that way in diagrams. It is more like a honeycomb with myriad buckets. But while the honeycomb holds honey in its buckets, the HASH MAP buckets contain just one thing: the number of an AMP. All AMPs and PEPs use the very same HASH MAP.

The picture on the following page shows the hash map for a four-AMP system. This is shown for simulation purposes. The actual hash map has 65,536 buckets. On the diagram, notice that inside each bucket is an AMP number, and that AMP number goes 1, 2, 3, 4, then starts over again. Why? It's because this is the hash map for a four-AMP system.

Hash Map

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
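The two diagrams can be reproduced with a short Python sketch. It assumes, as the simplified diagrams do, that AMP numbers are handed out to buckets in a repeating 1, 2, ..., N pattern; the function name build_hash_map is hypothetical, and a real system's bucket assignment may differ.

```python
# Sketch of the simplified hash maps shown in the diagrams.
# Assumption: buckets receive AMP numbers in a repeating 1..N pattern.
from collections import Counter


def build_hash_map(num_amps, num_buckets=65536):
    """Return a list where position i holds the AMP number stored in bucket i + 1."""
    return [(bucket % num_amps) + 1 for bucket in range(num_buckets)]


four_amp_map = build_hash_map(4)
eight_amp_map = build_hash_map(8)

print(four_amp_map[:6])    # [1, 2, 3, 4, 1, 2]
print(eight_amp_map[:12])  # [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4]

# Every AMP appears in the same number of buckets, which is what spreads rows evenly.
print(Counter(four_amp_map))
```

Because each AMP number fills an equal share of the buckets, rows that hash evenly across buckets also land evenly across AMPs.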

How the Hash Map and Primary Index Work Together

"Choice, not chance, determines destiny."

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the Wiz-Bang Formula (a secret NCR formula). First we'll explain the theory, and then we will invent our own Wiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even-tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.


Let's take our first row and determine on which AMP it will reside. Remember, we will get the Primary Index value of the row, apply the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num  Friend_Name
2           Ben Hon

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num Friend_Name

16 Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (16), and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to find bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.


If we continue the process until all data is laid out, the system would look like this:


Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
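The walk-through for rows 2 and 16 can be repeated for all eight rows with a small Python sketch. It uses only the illustrative divide-by-2 formula from this chapter, not the real Teradata hash; the names HASH_MAP, wiz_bang_bucket and amp_for are hypothetical, and buckets are numbered from one to match the text.

```python
# Sketch of the illustrative "divide by 2" formula applied to every Best_Friends row.
# This is the book's teaching formula, not the real Teradata hashing algorithm.
NUM_AMPS = 4
HASH_MAP = [(i % NUM_AMPS) + 1 for i in range(48)]  # the 48 buckets drawn in the diagram

BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}


def wiz_bang_bucket(friend_num):
    """Divide the Primary Index value by 2 to get a bucket number (buckets start at 1)."""
    return friend_num // 2


def amp_for(friend_num):
    """Look inside the bucket to find the AMP number that owns the row."""
    return HASH_MAP[wiz_bang_bucket(friend_num) - 1]


placement = {amp: [] for amp in range(1, NUM_AMPS + 1)}
for friend_num in BEST_FRIENDS:
    placement[amp_for(friend_num)].append(friend_num)

for amp, friend_nums in placement.items():
    print(f"AMP {amp}: {friend_nums}")
# AMP 1: [2, 10]
# AMP 2: [4, 12]
# AMP 3: [6, 14]
# AMP 4: [8, 16]
```

Each AMP ends up with two of the eight rows, which is the even spread the hash map is meant to produce.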

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. The purpose is to show you how the hash map works in theory to distribute and locate rows. Simply, you should understand that the formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and it will be consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;


The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, and it merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
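The one-AMP retrieval just described can be sketched as follows. As before, this assumes the chapter's illustrative divide-by-2 formula rather than the real hash, and the names amps, amp_for and select_by_primary_index are hypothetical.

```python
# Sketch of a Primary Index lookup using the illustrative "divide by 2" formula.
# Only the AMP named by the hash map bucket is asked for the row.
NUM_AMPS = 4
HASH_MAP = [(i % NUM_AMPS) + 1 for i in range(48)]

BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}


def amp_for(friend_num):
    """Bucket = value / 2 (buckets start at 1); the bucket holds the AMP number."""
    return HASH_MAP[friend_num // 2 - 1]


# Load: each AMP keeps only the rows that hash to it.
amps = {amp: {} for amp in range(1, NUM_AMPS + 1)}
for friend_num, friend_name in BEST_FRIENDS.items():
    amps[amp_for(friend_num)][friend_num] = friend_name


def select_by_primary_index(friend_num):
    """Rerun the same formula used at load time, then read from that single AMP."""
    amp = amp_for(friend_num)
    return amp, amps[amp].get(friend_num)


print(select_by_primary_index(8))  # (4, 'John Davis')
```

The lookup works only because the formula used to retrieve the row is the same one used to place it.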

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PEP. This process is 10 times faster than most systems. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!
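The 100-row, 10-AMP example can be modeled in Python. This is a sketch of the idea only: the threads stand in for AMPs and give no real speedup for a task this small, the rows are spread with a simple modulo rather than a hash map, and the names amps and scan_amp are hypothetical.

```python
# Sketch of a parallel Full Table Scan: 100 rows spread over 10 AMPs.
# Threads stand in for AMPs; this models the division of work, not actual performance.
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 10
table = list(range(1, 101))  # 100 rows, identified by number

# Each AMP owns the rows that map to it (simple modulo as a stand-in for hashing).
amps = [[row for row in table if row % NUM_AMPS == amp] for amp in range(NUM_AMPS)]


def scan_amp(rows):
    """An AMP reads only the rows it owns."""
    return [row for row in rows]


with ThreadPoolExecutor(max_workers=NUM_AMPS) as pool:
    partial_results = list(pool.map(scan_amp, amps))

# The requester collects each AMP's rows into one answer set.
answer_set = [row for part in partial_results for row in part]

print([len(rows) for rows in amps])  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
print(len(answer_set))               # 100
```

No single worker reads more than a tenth of the table, which is where the "10 times faster" figure in the example comes from.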

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?", you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows" and then "pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it owns
• Each AMP passes its rows to the PE over the BYNET

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
6           Mary Gray
14          Kyle Marx
8           John Davis
16          Lyn Jones
2           Ben Hon
10          Don Roy
4           Joe Davis
12          Sam Mills


In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space, but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It tells the AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of the AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once the 'Ben Hon' sub-table row is found, the real row-id (the smiley face in this example) is revealed, and Teradata can find the matching base row in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.
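The two-AMP nature of a USI lookup can be sketched in Python. This is an illustration under stated assumptions: zlib.crc32 is used as a stand-in hash (the real formula is not public in this text), so the AMP numbers it produces will not match the picture, and the names amp_of, base, usi and select_by_name are hypothetical.

```python
# Sketch of a Unique Secondary Index lookup as a two-AMP operation.
# zlib.crc32 is a stand-in hash, so AMP numbers here will differ from the diagram.
import zlib

NUM_AMPS = 4

BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}


def amp_of(value):
    """Hash any value to an AMP number between 1 and NUM_AMPS."""
    return zlib.crc32(str(value).encode()) % NUM_AMPS + 1


base = {amp: {} for amp in range(1, NUM_AMPS + 1)}  # base table rows per AMP
usi = {amp: {} for amp in range(1, NUM_AMPS + 1)}   # index sub-table rows per AMP

for friend_num, friend_name in BEST_FRIENDS.items():
    home_amp = amp_of(friend_num)                  # row placed by Primary Index
    base[home_amp][friend_num] = friend_name
    # The index row goes to the AMP the *name* hashes to and points at the base row.
    usi[amp_of(friend_name)][friend_name] = (home_amp, friend_num)


def select_by_name(friend_name):
    """Return the matching row and the list of AMPs that were asked to do work."""
    index_amp = amp_of(friend_name)
    pointer = usi[index_amp].get(friend_name)
    if pointer is None:
        return None, [index_amp]
    base_amp, friend_num = pointer
    return (friend_num, base[base_amp][friend_num]), [index_amp, base_amp]


row, amps_touched = select_by_name("Ben Hon")
print(row, amps_touched)
```

One request goes to the AMP holding the index row and one to the AMP holding the base row; no other AMP is involved, unlike a NUSI, where every AMP must check its own sub-table.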

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!

Join Indexes

A bend in the road is not the end of the road unless you fail to make the turn

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. Users don't ever query the join index directly. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.


Teradata Databases, Users and Space

Overview

Choose a job you like and you will never have to work a day of your life

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security access rights, table columns, indexes, macros, views, etc.

So, if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to, or REVOKE rights from, Tom.


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the user is out of spool space and the query aborts.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. Will a bank that holds onto all of its capital be successful? No. If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA, likewise, gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, views and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
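The difference between handing out perm space and handing out spool can be modeled with a small Python sketch. The class name Owner, its methods, and the gigabyte figures for Morgan, Tom and spool are hypothetical choices for illustration; only the 100 GB system and the 80/20 split between SYSDBA and DBC come from the text.

```python
# Sketch of the space hierarchy: perm space is subtracted from the giver, spool is not.
# The Owner class and the sizes for Morgan, Tom and spool are illustrative assumptions.
class Owner:
    """A user or database in the ownership hierarchy (sizes in gigabytes)."""

    def __init__(self, name, perm_gb, spool_gb, parent=None):
        self.name = name
        self.perm_gb = perm_gb
        self.spool_gb = spool_gb
        self.parent = parent

    def create_child(self, name, perm_gb, spool_gb):
        """Give perm space away (it leaves this owner) and set a spool limit (it does not)."""
        if perm_gb > self.perm_gb:
            raise ValueError(f"{self.name} does not own {perm_gb} GB of perm space")
        self.perm_gb -= perm_gb
        return Owner(name, perm_gb, spool_gb, parent=self)

    def owners(self):
        """Every user or database above this one in the hierarchy, nearest first."""
        chain = []
        node = self.parent
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


dbc = Owner("DBC", perm_gb=100, spool_gb=100)
sysdba = dbc.create_child("SYSDBA", perm_gb=80, spool_gb=100)
morgan = sysdba.create_child("Morgan", perm_gb=10, spool_gb=100)
tom = morgan.create_child("Tom", perm_gb=5, spool_gb=100)

for owner in (dbc, sysdba, morgan, tom):
    print(owner.name, owner.perm_gb, owner.spool_gb)
print(tom.owners())  # ['Morgan', 'SYSDBA', 'DBC']
```

After these calls the perm figures still add up to the original 100 GB, while every owner keeps the same spool limit, which mirrors the speed-limit analogy.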

The following picture shows a logical view of a Customer table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. Then, when the answer is returned, the spool is released.

What is a View

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes For example the Human Resourcesdatabase might contain an employee table Management can create a view of the table that hides the salarycolumn yet still allows an administrative associate to view names phone numbers and department numbers of employees In this scenario the salary column is not shown As a result views are the best choice for protectingsensitive data

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select a view of a table(s), the data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
  1    10  Johnson   Manny   100000
  2    20  Carlsbad  Jan     100000
 22    30  Winter    Steve    77000
 25    10  Lester    Bonnie   56000
 33    10  Samuels   Todd    120000
 99    20  Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the EMPLOYEE table but are given access rights to the EMPLOY_V view, security is enhanced. With this restriction, no user can actually see the list of employee salaries.

Perm space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
  1    10  Johnson   Manny
  2    20  Carlsbad  Jan
 22    30  Winter    Steve
 25    10  Lester    Bonnie
 33    10  Samuels   Todd
 99    20  Walter    Misha
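The column-hiding behavior of a view can be tried out without a Teradata system. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it shows only the idea of a view as a filtered window. SQLite has no access rights, so the GRANT side of the story is not modeled, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no rows are copied.
conn.execute(
    "CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

cursor = conn.execute("SELECT * FROM Employ_V")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

Selecting from the view returns all six employees, but the Sal column never appears in the result.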

What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS
(
  SELECT * FROM Employ_v WHERE dept = 10;
  SELECT * FROM Employ_v WHERE dept = 20;
  SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
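The "all work or none work" behavior of a macro can be imitated in Python with sqlite3. SQLite has no macros, so the dictionary of named statement lists below is a made-up stand-in for the Data Dictionary, and Bad_mac is a hypothetical macro added only to show the rollback.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE Employee (Emp INTEGER PRIMARY KEY, Dept INTEGER, Lname TEXT)"
)
conn.execute("INSERT INTO Employee VALUES (1, 10, 'Johnson'), (2, 20, 'Carlsbad')")

macros = {}  # stand-in for macro definitions kept in the Data Dictionary


def create_macro(name, statements):
    macros[name] = list(statements)


def execute_macro(name):
    """Run every statement of the macro as one single transaction."""
    results = []
    conn.execute("BEGIN")
    try:
        for sql in macros[name]:
            results.append(conn.execute(sql).fetchall())
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # one failure undoes every statement
        raise
    return results


create_macro("Emp_mac", [
    "SELECT * FROM Employee WHERE Dept = 10",
    "SELECT * FROM Employee WHERE Dept = 20",
    "SELECT * FROM Employee ORDER BY Lname",
])
reports = execute_macro("Emp_mac")

create_macro("Bad_mac", [
    "UPDATE Employee SET Dept = 30 WHERE Emp = 1",
    "INSERT INTO Employee VALUES (1, 40, 'Duplicate')",  # breaks PRIMARY KEY
])
try:
    execute_macro("Bad_mac")
except sqlite3.Error:
    pass
dept_after_failure = conn.execute(
    "SELECT Dept FROM Employee WHERE Emp = 1"
).fetchone()[0]
```

Because the second statement of Bad_mac fails, the UPDATE in front of it is rolled back too, and employee 1 is still in department 10.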

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough ... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it ... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has explicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
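The interplay of implicit and explicit rights can be sketched as a small Python model. Everything here is an assumption made for illustration: the hierarchy is a guess at the chart described above, the rule that an object may grant rights on itself is a simplification, and automatic rights are left out entirely.

```python
# Hypothetical hierarchy: child -> parent (the real chart is in the picture).
parent_of = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "Morgan": "SYSDBA",
    "Mary": "MRKT",
    "Tom": "Morgan",
}

rights = set()  # explicit rights as (grantee, privilege, object) triples


def owners(obj):
    """Yield every owner above obj in the hierarchy, nearest first."""
    while obj in parent_of:
        obj = parent_of[obj]
        yield obj


def has_implicit_rights(user, obj):
    # Anyone listed above an object may GRANT or REVOKE on it.
    return user in owners(obj)


def grant(grantor, grantee, privilege, obj):
    # Simplification: an object may also grant rights on itself.
    if grantor != obj and not has_implicit_rights(grantor, obj):
        raise PermissionError(f"{grantor} cannot grant rights on {obj}")
    rights.add((grantee, privilege, obj))


def pe_allows(user, privilege, obj):
    """The permission check the Parsing Engine runs on every request."""
    return (user, privilege, obj) in rights


grant("Tom", "Mary", "CREATE TABLE", "Tom")
```

After the grant, Mary may create a table in Tom's database, but she holds no other privilege there, and she cannot hand out rights on Morgan because Morgan is not below her.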

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, then you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores it to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness." If it is clean, then things went well.
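The BEFORE picture idea can be shown with a toy AMP in Python. This is a minimal sketch of before-image journaling in general, with invented class and method names. It does not describe how Teradata lays out its journal internally.

```python
class Amp:
    """Toy AMP holding rows plus a transient journal of before-images."""

    _MISSING = object()  # marks "this row did not exist before the change"

    def __init__(self):
        self.rows = {}
        self.journal = []  # (key, before_image), oldest change first

    def write(self, key, value):
        # Take the BEFORE picture, then make the change (INSERT or UPDATE).
        self.journal.append((key, self.rows.get(key, self._MISSING)))
        self.rows[key] = value

    def delete(self, key):
        self.journal.append((key, self.rows.get(key, self._MISSING)))
        self.rows.pop(key, None)

    def commit(self):
        # Transaction succeeded: the journal is wiped clean.
        self.journal.clear()

    def rollback(self):
        # Transaction failed: restore before-images, newest change first.
        for key, before in reversed(self.journal):
            if before is self._MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
        self.journal.clear()


amp = Amp()
amp.write("Johnson", 100000)
amp.commit()

amp.write("Johnson", 200000)   # oops: a 100% raise instead of 10%
amp.write("Winter", 154000)    # a row that did not exist before
amp.rollback()                 # the transaction is aborted
```

After the rollback, Johnson's salary is back to 100000, the Winter row is gone, and the journal is empty again.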

FALLBACK Protection

I asked my dentist if I had to floss all my teeth and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is low; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map that knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
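A small Python model makes the "one AMP in each cluster" rule concrete. The two functions below are stand-ins for the Primary and Fallback Hash Maps. They use plain modulo arithmetic, which is an assumption for the example and not Teradata's hashing algorithm, and the AMPs are numbered 0 to 7 for convenience.

```python
CLUSTERS = [[0, 1, 2, 3], [4, 5, 6, 7]]  # eight AMPs in two clusters of four
AMP_COUNT = 8


def primary_amp(row_hash):
    # Stand-in for the Primary Hash Map (plain modulo, for illustration).
    return row_hash % AMP_COUNT


def fallback_amp(row_hash):
    # Stand-in for the Fallback Hash Map: always a different AMP in the
    # same cluster as the primary copy.
    primary = primary_amp(row_hash)
    cluster = next(c for c in CLUSTERS if primary in c)
    others = [amp for amp in cluster if amp != primary]
    return others[(row_hash // AMP_COUNT) % len(others)]


def table_available(row_hashes, down_amps):
    """True if every row still has a copy on at least one working AMP."""
    return all(
        primary_amp(h) not in down_amps or fallback_amp(h) not in down_amps
        for h in row_hashes
    )


rows = range(64)
one_per_cluster = table_available(rows, {0, 4})   # one AMP lost per cluster
two_in_a_cluster = table_available(rows, {0, 1})  # two AMPs lost in one cluster
```

Losing one AMP in each cluster leaves every row reachable. Losing two AMPs in the same cluster does not, because some rows have both of their copies on those two AMPs.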

In most systems, AMPs are clustered in a group of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
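The trade-off can be put into rough numbers. The sketch below assumes that the failed AMP's work is shared evenly by the survivors in its cluster, and that any two AMPs in the system are equally likely to be the two that fail. Both are simplifications, and the 48-AMP system is only an example size that divides evenly by 2, 4 and 16.

```python
from math import comb


def surviving_amp_load(cluster_size):
    """Relative workload per surviving AMP after one AMP in the cluster fails."""
    return cluster_size / (cluster_size - 1)


def same_cluster_pairs(total_amps, cluster_size):
    """Fraction of all possible two-AMP failures that hit a single cluster."""
    clusters = total_amps // cluster_size
    return clusters * comb(cluster_size, 2) / comb(total_amps, 2)


TOTAL_AMPS = 48  # example system size (an assumption)
summary = {
    size: (surviving_amp_load(size), same_cluster_pairs(TOTAL_AMPS, size))
    for size in (2, 4, 16)
}
```

With clusters of two, the surviving AMP carries twice its normal load, while only 1 in 47 double failures lands in one cluster. With clusters of 16, the extra load is under 7 percent, but 15 in 47 double failures are fatal. Clusters of four sit in between, with a one-third increase in load and 3 in 47.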

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are tracked by the other AMPs in its cluster.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates and deletes because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture, there are two clusters, but notice that AMP one has failed. After failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). Also, none of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within the cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
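The catch-up step can be sketched as a tiny Python class. The names are invented for the example, and the FALLBACK copies that keep serving queries while the AMP is down are left out to keep the sketch short. It shows only the journal-and-replay idea.

```python
class ClusterWithDarj:
    """Toy cluster that journals changes aimed at a down AMP."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}  # primary rows per AMP
        self.down = set()
        self.darj = {}  # down AMP -> list of (key, value) changes it missed

    def fail(self, amp):
        self.down.add(amp)
        self.darj[amp] = []  # the surviving AMPs open a recovery journal

    def write(self, amp, key, value):
        if amp in self.down:
            self.darj[amp].append((key, value))  # remember it for later
        else:
            self.rows[amp][key] = value

    def recover(self, amp):
        # Replay what the AMP missed while it was sleeping, then discard.
        for key, value in self.darj.pop(amp):
            self.rows[amp][key] = value
        self.down.discard(amp)


cluster = ClusterWithDarj([1, 2, 3, 4])
cluster.write(1, "Ben Hon", "version 1")
cluster.fail(1)
cluster.write(1, "Ben Hon", "version 2")   # AMP 1 is down: journaled
cluster.write(1, "Don Roy", "version 1")   # also journaled
rows_while_down = dict(cluster.rows[1])
cluster.recover(1)
```

While AMP 1 is down, its own rows stay frozen at the old values. After recovery, both journaled changes have been applied and the journal no longer exists.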

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has been typically provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
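Mirroring itself is a simple idea, as this Python sketch of one mirrored pair shows. The class and its methods are hypothetical names for the example. A real Disk Array Controller does this in hardware and also picks the closer actuator for reads, which is not modeled here.

```python
class MirroredPair:
    """Toy RAID-1 pair: every write lands on both disks."""

    def __init__(self):
        self.disks = [{}, {}]          # index 0 = primary, 1 = mirror
        self.failed = [False, False]

    def write(self, block, data):
        # The dual write is invisible to the caller.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                disk[block] = data

    def read(self, block):
        # Either healthy copy will do.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                return disk[block]
        raise RuntimeError("both disks in the pair have failed")

    def fail(self, index):
        self.failed[index] = True
        self.disks[index] = {}

    def replace(self, index):
        # The controller copies the surviving disk onto the replacement.
        self.disks[index] = dict(self.disks[1 - index])
        self.failed[index] = False


pair = MirroredPair()
pair.write("block0", "Ben Hon")
pair.fail(0)                        # the primary disk crashes
read_after_crash = pair.read("block0")
pair.write("block1", "Don Roy")     # operations continue on the mirror
pair.replace(0)                     # a new primary disk is swapped in
```

The read after the crash still returns the data, and once the replacement disk is rebuilt, both disks are identical again.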

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk and, similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.
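The migration can be sketched in Python for the two-node, 32-AMP system described above. The function name is invented, PEs are left out, and spreading the migrating AMPs round-robin over the survivors is an assumption. The text only says that they move to other nodes in the clique.

```python
def migrate_vprocs(nodes, failed_node):
    """Move the failed node's vprocs to the surviving nodes of its clique.

    `nodes` maps a node name to the list of vprocs in its memory.
    """
    survivors = [name for name in nodes if name != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    result = {name: list(nodes[name]) for name in survivors}
    for i, vproc in enumerate(nodes[failed_node]):
        # Round-robin placement is an assumption made for this sketch.
        result[survivors[i % len(survivors)]].append(vproc)
    return result


clique = {
    "node1": [f"AMP{i}" for i in range(1, 17)],    # AMPs 1-16
    "node2": [f"AMP{i}" for i in range(17, 33)],   # AMPs 17-32
}
after_failure = migrate_vprocs(clique, "node1")
```

After node one fails, node two holds all 32 AMPs in memory, which is why its performance slows down until the failed node is repaired.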

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds." Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE or DELETE command. It keeps track of all new, deleted or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward and is usually done to restore data lost as a result of a hardware problem.
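Both recovery directions can be sketched in Python. The journal layout here, a list of (day, key, image) entries with None standing for "no row", is an assumption made for the example and not the Permanent Journal's real format. The region values mirror the two scenarios above.

```python
def roll_forward(backup, after_journal, up_to_day):
    """Rebuild a table from a full backup plus the AFTER images."""
    table = dict(backup)
    for day, key, after_image in after_journal:
        if day <= up_to_day:
            if after_image is None:       # the row was deleted
                table.pop(key, None)
            else:
                table[key] = after_image
    return table


def roll_back(table, before_journal, to_day):
    """Undo changes made after `to_day` by restoring the BEFORE images."""
    restored = dict(table)
    for day, key, before_image in reversed(before_journal):
        if day > to_day:
            if before_image is None:      # the row did not exist before
                restored.pop(key, None)
            else:
                restored[key] = before_image
    return restored


# AFTER journal: full backup on day 1, hardware failure on day 5.
backup_day_1 = {"rep1": "Western", "rep2": "Eastern"}
after_journal = [(2, "rep3", "Northern"), (4, "rep1", "Southwest")]
recovered = roll_forward(backup_day_1, after_journal, up_to_day=5)

# BEFORE journal: a bad update on day 3 moved everyone to Southeast.
damaged = {"rep1": "Southeast", "rep2": "Southeast"}
before_journal = [(3, "rep1", "Western"), (3, "rep2", "Eastern")]
repaired = roll_back(damaged, before_journal, to_day=2)
```

Rolling forward replays the changes made after the backup. Rolling back puts every representative in the region he or she was in before the inaccurate update.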

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
       BEFORE JOURNAL,
       DUAL AFTER JOURNAL
( emp        INTEGER
 ,dept       INTEGER
 ,lname      CHAR(20)
 ,fname      VARCHAR(20)
 ,salary     DECIMAL(10,2)
 ,hire_date  DATE FORMAT
)
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
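The four rules above boil down to a small compatibility table, sketched here in Python. The table is derived from the descriptions in this list. The dictionary layout and the function name are inventions for the example, and queueing order is not modeled.

```python
# For each requested lock: the locks it can coexist with on the same object.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held_locks):
    """True if `requested` can be granted alongside every lock already held."""
    return all(held in COMPATIBLE[requested] for held in held_locks)


many_readers = can_grant("READ", ["READ"] * 1000)   # a thousand SELECTs
writer_vs_reader = can_grant("WRITE", ["READ"])     # must wait in line
dirty_read = can_grant("ACCESS", ["WRITE"])         # allowed, may be stale
blocked_access = can_grant("ACCESS", ["EXCLUSIVE"]) # nothing gets past this
```

A thousand readers coexist happily, a writer waits for the readers, an ACCESS lock slips past a writer, and nothing slips past an EXCLUSIVE lock.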

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired; it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table that has RI violations, the ALTER TABLE will still proceed. In addition, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
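The employee and payroll story can be tried out in a few lines. The sketch below uses Python's built-in sqlite3 module as a stand-in, since it needs no Teradata system; the table layouts, the sample values, and the try_sql helper are assumptions made for illustration, and SQLite's foreign keys are only an analogy for Teradata RI.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces RI only when enabled
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE payroll ("
    " emp_no INTEGER REFERENCES employee(emp_no),"
    " salary INTEGER)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Jeff')")
conn.execute("INSERT INTO payroll VALUES (1, 60)")


def try_sql(statement):
    """Run one statement and report whether referential integrity allowed it."""
    try:
        conn.execute(statement)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# A payroll row for an employee who does not exist is refused.
orphan_insert = try_sql("INSERT INTO payroll VALUES (99, 30)")
# The employee cannot be deleted while a payroll row still points at him.
early_delete = try_sql("DELETE FROM employee WHERE emp_no = 1")
# Once the payroll row is gone, the delete goes through.
conn.execute("DELETE FROM payroll WHERE emp_no = 1")
late_delete = try_sql("DELETE FROM employee WHERE emp_no = 1")

print(orphan_insert, early_delete, late_delete)
```

The order of the two deletes is the whole point: the row that refers to the employee has to go first.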


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on: ASCII stupid question and get a stupid ANSI.

Seriously, though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to proceed: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
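The three steps just described (hand out blocks, hash and redistribute, sort) can be imitated in a toy Python model. Everything here is an assumption made for illustration: the CRC32 function merely stands in for Teradata's undisclosed hashing formula, BLOCK_ROWS stands in for "as many rows as fit in a 64K block", and the loops run one after another where real AMPs would work in parallel.

```python
import zlib

NUM_AMPS = 4
BLOCK_ROWS = 3  # stand-in for the number of rows that fit in one 64K block


def row_hash(key):
    """Hypothetical stand-in for Teradata's secret hashing formula."""
    return zlib.crc32(str(key).encode())


def fastload(rows):
    """Toy model of a Fastload run; returns the rows each AMP ends up owning."""
    # Step 1: hand blocks to the AMPs in turn, without looking inside them.
    received = [[] for _ in range(NUM_AMPS)]
    blocks = [rows[i:i + BLOCK_ROWS] for i in range(0, len(rows), BLOCK_ROWS)]
    for n, block in enumerate(blocks):
        received[n % NUM_AMPS].extend(block)

    # Step 2: each AMP hashes the rows it received and sends each row
    # to the AMP that owns it (the first field plays the Primary Index).
    owned = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for row in amp_rows:
            owned[row_hash(row[0]) % NUM_AMPS].append(row)

    # Step 3: each AMP sorts its own rows by row hash.
    for amp_rows in owned:
        amp_rows.sort(key=lambda row: row_hash(row[0]))
    return owned


flat_file = [(n, f"name{n}") for n in range(1, 21)]
for amp, amp_rows in enumerate(fastload(flat_file), start=1):
    print("AMP", amp, "owns", len(amp_rows), "rows")
```

The point of the first step is speed: no AMP waits for the "right" rows to arrive, because any AMP can accept any block and sort out ownership afterward.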

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind that many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

You don't drown by falling into the water; you drown by staying in the water

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be preset to load at different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion: A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

The next diagram shows the hash map for an eight-AMP system. As before, this is for simulation purposes. Notice that the AMP number for this hash map goes 1, 2, 3, 4, 5, 6, 7, 8 and then starts over again. Why? Because this hash map is for an eight-AMP system.

1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
1 2 3 4 5 6
7 8 1 2 3 4
5 6 7 8 1 2
3 4 5 6 7 8
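The simulated hash maps follow a simple round-robin pattern, which a few lines of Python can reproduce. This is only a sketch of the illustration: the function name build_hash_map is made up, and the 48-bucket, six-per-row layout is chosen to match the printed grids rather than a real Teradata hash map.

```python
def build_hash_map(num_amps, num_buckets=48, per_row=6):
    """Fill buckets with AMP numbers 1..num_amps in round-robin order."""
    buckets = [(b % num_amps) + 1 for b in range(num_buckets)]
    # Cut the flat bucket list into rows for display, like the printed grid.
    return [buckets[i:i + per_row] for i in range(0, num_buckets, per_row)]


for num_amps in (4, 8):
    print(f"{num_amps}-AMP hash map")
    for row in build_hash_map(num_amps):
        print(" ".join(str(amp) for amp in row))
```

Adding AMPs changes only the numbers inside the buckets, not the number of buckets, which is why the same grid shape works for both systems.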

How the Hash Map and Primary Index Work Together

Choice, not chance, determines destiny

Anonymous

The choice made for the Primary Index determines the exact AMP destination for each row in a table. It must not be left up to chance.

Here is how the Hash Map and Primary Index work together. When a table is being loaded with data, the rows will be spread among all AMPs. The Hash Map determines the actual DESTINATION AMP for each row of the table.

Destination is determined using the Whiz-Bang Formula (a secret NCR formula). First we'll explain the theory, and then we will invent our own Whiz-Bang Formula to show you how it works conceptually.

Let's start with a table to load on our four-AMP system. Imagine you have listed your eight best friends in a table called Best_Friends. You have two columns in the table. They are titled Friend_Num and Friend_Name. We've chosen only even numbers for Friend_Num because our friends are so even-tempered. We have also made Friend_Num a Unique Primary Index (UPI) on the table.

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

For this example, Teradata will attempt to spread the table rows among the AMPs of the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Instead of trying to figure out the NCR Whiz-Bang formula (a secret), we can show you the theory of distributing and retrieving data with our own formula. It is called the Coffing/Jones Whiz-Bang formula: take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.


Let's take our first row and determine on which AMP it will reside. Remember, we will take the Primary Index value of the row, apply the Coffing/Jones Whiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the number of the AMP on which the row will reside.

Friend_Num Friend_Name

2 Ben Hon

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Whiz-Bang Formula (divide by 2) to the value of Friend_Num (2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to find bucket number 1 and see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num Friend_Name

16 Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Whiz-Bang Formula (divide by 2) to the value of Friend_Num (16), and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to find bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.


If we continue the process until all data is laid out, the system would look like this:


Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret. The Coffing/Jones Whiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the real formula is mathematical (similar to the Coffing/Jones Whiz-Bang Formula) and that it will be consistent. When we divided Friend_Num 2 by two, we got bucket one in the hash map. If we ran the formula on this value a million times, we would still get the same result.
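For readers who like to see the arithmetic run, the whole walk-through fits in a short Python sketch. It uses only the illustrative divide-by-2 formula and the simulated four-AMP hash map from this chapter; the names bucket_for, amp_for, and placement are invented for the example, and none of this is Teradata's real hashing.

```python
NUM_AMPS = 4
# Simulated hash map: bucket 1 holds AMP 1, bucket 2 holds AMP 2, and so on,
# wrapping around after AMP 4 (48 buckets, as in the printed grid).
HASH_MAP = [(b % NUM_AMPS) + 1 for b in range(48)]

BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}


def bucket_for(friend_num):
    """The illustrative formula: divide the Primary Index value by 2."""
    return friend_num // 2


def amp_for(friend_num):
    """Look inside the bucket to find the destination AMP."""
    return HASH_MAP[bucket_for(friend_num) - 1]  # buckets are numbered from 1


placement = {}
for num, name in BEST_FRIENDS.items():
    placement.setdefault(amp_for(num), []).append((num, name))

for amp in sorted(placement):
    print("AMP", amp, placement[amp])
```

Running amp_for on the same Friend_Num always gives the same AMP, which is the consistency the paragraph above relies on.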

If you always do what you always did, you'll always get what you always got

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;


The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
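The one-AMP operation can be sketched the same way. In the Python below, the AMPS dictionary simply lists the rows as the divide-by-2 illustration would place them, and select_by_primary_index is a made-up name for what the PE does; it is a conceptual model, not Teradata code.

```python
NUM_AMPS = 4
HASH_MAP = [(b % NUM_AMPS) + 1 for b in range(48)]  # simulated four-AMP hash map

# Rows as they sit on each AMP after the divide-by-2 illustration.
AMPS = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def select_by_primary_index(friend_num):
    """Return the matching name and the list of AMPs that had to do work."""
    bucket = friend_num // 2        # rerun the illustrative hash formula
    amp = HASH_MAP[bucket - 1]      # the bucket names the one AMP to ask
    amps_touched = [amp]            # no other AMP is disturbed
    return AMPS[amp].get(friend_num), amps_touched


print(select_by_primary_index(8))
```

Three of the four AMPs never hear about this query, which is what leaves them free to serve other users.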

The Full Table Scan

What matters is not the size of the dog in the fight but the size of the fight in the dog

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond "NO!" After we complete training, they respond "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that it is not the size of the opponent that is to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is 10 times faster than most systems. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!
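The 100-row, 10-AMP example can be mimicked in Python with one worker per AMP. This is a sketch under stated assumptions: the even spread of rows is made up, scan and full_table_scan are invented names, and Python threads only imitate the shape of the work; they do not deliver the true hardware parallelism of real AMPs.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 10
# 100 rows spread evenly: AMP n owns the rows whose number ends in n.
amps = [[r for r in range(100) if r % NUM_AMPS == n] for n in range(NUM_AMPS)]


def scan(amp_rows):
    """Each AMP reads only the rows it owns."""
    return list(amp_rows)


def full_table_scan():
    """Ask every AMP to scan at once, then gather the results (the PE's job)."""
    with ThreadPoolExecutor(max_workers=NUM_AMPS) as pool:
        partial_results = pool.map(scan, amps)  # one scan task per AMP
    return [row for part in partial_results for row in part]


result = full_table_scan()
print(len(result), "rows returned,", len(amps[0]), "rows read per AMP")
```

No single worker ever reads more than 10 rows, however large the table of 100 looks from the outside.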

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?", you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it owns
• Each AMP passes its rows to the PE over the BYNET

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num   Friend_Name
6            Mary Gray
14           Kyle Marx
8            John Davis
16           Lyn Jones
2            Ben Hon
10           Don Roy
4            Joe Davis
12           Sam Mills


In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option: a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, and since secondary indexes use up space and add overhead, they should only be used on KNOWN QUERIES, that is, queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It tells the AMPs to place the hash in a secondary index sub-table along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a unique secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (the smiley face in this example) is revealed, and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.
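A toy Python model makes the two-AMP idea concrete. The assumptions are loud ones: amp_of uses CRC32 purely as a stand-in for the secret hash formula, the sub-table is a plain dictionary per AMP, and only four of the friends are loaded. With this stand-in hash the AMP numbers will not match the smiley-face diagram; only the shape of the lookup does.

```python
import zlib

NUM_AMPS = 4


def amp_of(value):
    """Hypothetical stand-in hash: maps any value to an AMP number 0..3."""
    return zlib.crc32(str(value).encode()) % NUM_AMPS


ROWS = {2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis"}

# Base table: a row lives on the AMP its Primary Index value hashes to.
base = [dict() for _ in range(NUM_AMPS)]
# USI sub-table: an entry lives on the AMP the *indexed value* hashes to.
usi = [dict() for _ in range(NUM_AMPS)]
for num, name in ROWS.items():
    base[amp_of(num)][num] = name
    usi[amp_of(name)][name] = num  # pointer back to the base row


def select_by_usi(name):
    """Return the base row and the set of AMPs that took part."""
    index_amp = amp_of(name)          # first AMP: holds the sub-table row
    num = usi[index_amp].get(name)
    if num is None:
        return None, {index_amp}
    base_amp = amp_of(num)            # second AMP: holds the base row
    return (num, base[base_amp][num]), {index_amp, base_amp}


row, amps_touched = select_by_usi("Ben Hon")
print(row, "found using", len(amps_touched), "AMP(s)")
```

At most two AMPs take part, and it can be just one when the index entry and the base row happen to hash to the same AMP. A NUSI differs in that each AMP indexes only its own rows, so every AMP has to check its own sub-table.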

Create secondary indexes only on columns used repeatedly in the WHERE clause of ongoing queries. Secondary indexes take up space and add overhead, but boy, can they speed up queries!

Join Indexes

A bend in the road is not the end of the road unless you fail to make the turn

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index directly. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.


Teradata Databases, Users and Space

Overview

Choose a job you like and you will never have to work a day of your life

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So, if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind that all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to, or REVOKE rights from, Tom.
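The "who owns Tom?" question is just a walk up the family tree. Here is a small Python sketch of that walk, using only the names from this chapter; the PARENT dictionary and the owners function are illustrative inventions, not a Teradata interface.

```python
# Each user or database points at its immediate parent. DBC has no parent.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """Return every parent (owner) above `name`, nearest first."""
    result = []
    while name in PARENT:
        name = PARENT[name]
        result.append(name)
    return result


print(owners("Tom"))
```

Anyone in the returned list sits above Tom, so in the book's terms any of them could grant or revoke his rights.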


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, the AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts, and the user is told that he or she is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, views, and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
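The difference between giving away perm and giving away spool shows up clearly in a small bookkeeping sketch. The Owner class below is hypothetical, the gigabyte figures are only examples, and the rule that a child's spool limit may not exceed its parent's is an assumption drawn from the speed-limit analogy rather than a documented Teradata rule.

```python
class Owner:
    """A user or database with a perm allotment and a spool limit (in GB)."""

    def __init__(self, name, perm_gb, spool_gb):
        self.name = name
        self.perm_gb = perm_gb
        self.spool_gb = spool_gb

    def create_child(self, name, perm_gb, spool_gb):
        if perm_gb > self.perm_gb:
            raise ValueError("not enough perm space to give away")
        if spool_gb > self.spool_gb:
            # Assumption for this sketch: "up to 65 mph", never faster.
            raise ValueError("spool limit cannot exceed the parent's limit")
        self.perm_gb -= perm_gb                # perm is handed over and subtracted
        return Owner(name, perm_gb, spool_gb)  # the spool limit is only copied


dbc = Owner("DBC", perm_gb=100, spool_gb=10)
sysdba = dbc.create_child("SYSDBA", perm_gb=80, spool_gb=10)
analyst = sysdba.create_child("QueryOnlyUser", perm_gb=0, spool_gb=10)

print(dbc.perm_gb, dbc.spool_gb)          # perm shrank, spool did not
print(analyst.perm_gb, analyst.spool_gb)  # a spool-only user can still run queries
```

Notice that the query-only user received no perm space at all, yet holds the same spool limit as everyone above.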

The following picture shows a logical view of a Customer Table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system In the PERM area there is a table called Employee Thistable has five columns Emp Dept Lname Fname and Sal The table has four employees Notice the SQLstatement at the bottom of the picture is asking to see all columns where the employees department is equal to10 To complete the query the AMPs will read the rows of the table and each time they find a row where Deptis equal to 10 a row is added to spool Plus when the answer is returned the spool is released

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select a view of a table(s), the data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
1    10    Johnson   Manny   100000
2    20    Carlsbad  Jan     100000
22   30    Winter    Steve   77000
25   10    Lester    Bonnie  56000
33   10    Samuels   Todd    120000
99   20    Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, security is enhanced. With this restriction, no user can actually see the list of employee salaries.

Perm space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
1    10    Johnson   Manny
2    20    Carlsbad  Jan
22   30    Winter    Steve
25   10    Lester    Bonnie
33   10    Samuels   Todd
99   20    Walter    Misha
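For readers without a Teradata system at hand, the column-hiding effect of the view can be tried out with Python's built-in `sqlite3` module. This is a stand-in, not Teradata: SQLite has no access rights, so the sketch shows only that the salary column never appears in the view, not the GRANT side of the story.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)
# Same column list as the EMPLOY_V view in the text: Sal is left out.
conn.execute("CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee")

cursor = conn.execute("SELECT * FROM Employ_V")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)     # ['Emp', 'Dept', 'Lname', 'Fname']
print(len(view_rows))   # 6
```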


What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
    SELECT * FROM Employ_v WHERE dept = 10;
    SELECT * FROM Employ_v WHERE dept = 20;
    SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
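The "either they all work or none of them work" behavior is easy to see in miniature. The sketch below uses Python's `sqlite3` module as a stand-in; SQLite has no macros, so the hypothetical helper `run_as_one_transaction` simply runs a list of statements inside one transaction, and the two-column Employee table is a cut-down assumption.

```python
import sqlite3


def run_as_one_transaction(conn, statements):
    """Run every statement or none of them, as the text describes for a macro."""
    try:
        with conn:  # commits on success, rolls back if any statement raises
            for sql in statements:
                conn.execute(sql)
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp INTEGER PRIMARY KEY, Dept INTEGER)")
conn.execute("INSERT INTO Employee VALUES (1, 10)")
conn.commit()

ok = run_as_one_transaction(conn, [
    "UPDATE Employee SET Dept = 20 WHERE Emp = 1",   # succeeds on its own
    "INSERT INTO Employee VALUES (1, 30)",           # fails: duplicate key
])
dept_after = conn.execute("SELECT Dept FROM Employee WHERE Emp = 1").fetchone()[0]

print(ok, dept_after)   # False 10
```

Because the second statement fails, the first statement's update is undone as well, and the employee is still in department 10.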

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan


I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has explicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
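The idea that owners hold implicit rights over everything beneath them can be modeled as a walk up a tree. The following Python sketch is a toy model under stated assumptions: the `Node` class, the two helper functions, and the exact shape of the hierarchy (the picture is not reproduced here) are all hypothetical, and real Teradata rights checking is considerably richer.

```python
class Node:
    """Hypothetical user or database in an ownership hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.explicit = set()   # (grantee name, privilege) pairs on this object

    def owners(self):
        # Everyone listed above this object in the hierarchy.
        node, found = self.parent, []
        while node is not None:
            found.append(node.name)
            node = node.parent
        return found


def has_implicit_rights(owner, target):
    return owner.name in target.owners()


def grant(grantor, privilege, target, grantee):
    # In this sketch only an owner (implicit rights) may grant an explicit right.
    if not has_implicit_rights(grantor, target):
        raise PermissionError(f"{grantor.name} does not own {target.name}")
    target.explicit.add((grantee.name, privilege))


# Assumed hierarchy: DBC > SYSDBA > {MRKT > Mary, Morgan > Tom}
dbc = Node("DBC")
sysdba = Node("SYSDBA", dbc)
mrkt = Node("MRKT", sysdba)
mary = Node("Mary", mrkt)
morgan = Node("Morgan", sysdba)
tom = Node("Tom", morgan)

grant(morgan, "CREATE TABLE", tom, mary)   # an explicit right for Mary on Tom
print(has_implicit_rights(dbc, tom), has_implicit_rights(mary, tom))   # True False
```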

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

The afternoon knows what the morning never suspected

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupt. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, then the rows affected can be reverted back to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, then you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores it to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal". Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, "Transient Journal cleanliness is next to Godliness." If it is clean, then things went well.
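The BEFORE-picture idea can be sketched as a small Python model. Nothing here is Teradata internals: the `Amp` class, its dictionary of rows, and the sample salary figures are hypothetical, and the sketch only shows how saved before-images let a failed transaction be undone.

```python
class Amp:
    """Hypothetical AMP holding rows plus a transient journal of before-images."""

    def __init__(self):
        self.rows = {}
        self.journal = []   # (key, before_image) in the order changes were made

    def write(self, key, new_value):
        # Take the BEFORE picture first; None records that the row did not exist.
        self.journal.append((key, self.rows.get(key)))
        if new_value is None:
            self.rows.pop(key, None)     # DELETE
        else:
            self.rows[key] = new_value   # INSERT or UPDATE

    def commit(self):
        self.journal.clear()             # success: the journal is wiped clean

    def rollback(self):
        # Undo in reverse order so each row ends at its oldest before-image.
        for key, before in reversed(self.journal):
            if before is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
        self.journal.clear()


amp = Amp()
amp.write(1, {"Lname": "Johnson", "Sal": 100000})
amp.commit()

amp.write(1, {"Lname": "Johnson", "Sal": 200000})   # the accidental 100% raise
amp.write(2, {"Lname": "Carlsbad", "Sal": 100000})
amp.rollback()                                      # the transaction failed

print(amp.rows)   # {1: {'Lname': 'Johnson', 'Sal': 100000}}
```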

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight rows in the Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
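A rough Python sketch can make the two hash maps concrete. Everything in it is an assumption made for illustration: CRC32 stands in for Teradata's hashing algorithm, the AMPs are numbered 0 to 7, and the rule for picking the fallback AMP is invented. The one property it preserves from the text is that a row's FALLBACK copy lives on a different AMP in the same cluster.

```python
import zlib

CLUSTERS = [[0, 1, 2, 3], [4, 5, 6, 7]]   # eight AMPs in two clusters of four
AMP_COUNT = 8


def row_hash(primary_index_value):
    # Stand-in only: CRC32 is not Teradata's hashing algorithm.
    return zlib.crc32(str(primary_index_value).encode())


def primary_amp(primary_index_value):
    return row_hash(primary_index_value) % AMP_COUNT


def fallback_amp(primary_index_value):
    # Assumed rule: choose a different AMP inside the primary AMP's cluster.
    home = primary_amp(primary_index_value)
    cluster = next(c for c in CLUSTERS if home in c)
    others = [amp for amp in cluster if amp != home]
    return others[(row_hash(primary_index_value) // AMP_COUNT) % len(others)]


def row_is_available(primary_index_value, down_amps):
    copies = {primary_amp(primary_index_value), fallback_amp(primary_index_value)}
    return bool(copies - set(down_amps))


keys = range(1, 101)
survives_one_per_cluster = all(row_is_available(k, [0, 4]) for k in keys)
lost_two_in_cluster = [k for k in keys if not row_is_available(k, [0, 1])]

print(survives_one_per_cluster)    # True
print(len(lost_two_in_cluster))    # rows lost when AMPs 0 and 1 both fail
```

Losing one AMP in each cluster never costs a row, because the two copies of a row always sit on two different AMPs of one cluster. Losing two AMPs in the same cluster can cost exactly those rows whose two copies were on that pair.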

In most systems, AMPs are clustered in a group of four. The next most popular clustering scheme is a group of three. However, the minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while an AMP is down.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
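The trade-off between cluster sizes can be put into numbers with a short Python sketch. The two function names are hypothetical, and the probability assumes a 16-AMP system in which two failed AMPs are equally likely to be any pair, which is a simplification.

```python
from fractions import Fraction
from math import comb


def survivor_workload(cluster_size):
    # With one AMP down, its work is shared by the remaining cluster_size - 1
    # AMPs, so each survivor carries size / (size - 1) of its normal load.
    return Fraction(cluster_size, cluster_size - 1)


def same_cluster_pair_probability(total_amps, cluster_size):
    # Chance that two randomly chosen failed AMPs sit in the same cluster.
    clusters = total_amps // cluster_size
    return Fraction(clusters * comb(cluster_size, 2), comb(total_amps, 2))


for size in (2, 4, 16):
    print(size, survivor_workload(size), same_cluster_pair_probability(16, size))
```

For a cluster of two, the surviving AMP carries twice its normal load, but only 1 pair of failures in 15 would hit the same cluster. For a cluster of 16, each survivor carries just 16/15 of its load, but any two failures land in the same cluster. A cluster of four sits in between, at 4/3 of the load and 1 pair in 5.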

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are tracked until it returns.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping", starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.


In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). Also, none of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within the cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
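The catch-up behavior can be sketched as a toy Python model. The `Cluster` class and its methods are hypothetical and leave out the FALLBACK copies themselves; the sketch shows only the journal being opened on failure, filled while the AMP is down, replayed on recovery, and then discarded.

```python
class Cluster:
    """Hypothetical cluster that journals changes destined for a down AMP."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}   # rows stored on each AMP
        self.down = set()
        self.darj = {}                             # down AMP -> missed changes

    def fail(self, amp):
        self.down.add(amp)
        self.darj[amp] = []        # the surviving AMPs open a recovery journal

    def write(self, amp, key, value):
        if amp in self.down:
            self.darj[amp].append((key, value))    # remember it for later
        else:
            self.rows[amp][key] = value

    def recover(self, amp):
        self.down.discard(amp)
        # Catch the AMP up on what it missed; pop() then discards the journal.
        for key, value in self.darj.pop(amp):
            self.rows[amp][key] = value


cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "Ben Hon", "v1")
cluster.fail(1)
cluster.write(1, "Ben Hon", "v2")    # happens while AMP 1 is sleeping
cluster.write(1, "Don Roy", "v1")
cluster.recover(1)

print(cluster.rows[1])   # {'Ben Hon': 'v2', 'Don Roy': 'v1'}
```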

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
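The rule about which disks a rank can afford to lose is simple enough to check in Python. The layout below is an assumption made for the sketch: a rank of four disks is taken to be two data disks, each paired with its own mirror, and the disk names are invented.

```python
# Hypothetical rank of four disks: each data disk is paired with a mirror.
MIRROR_PAIRS = [("data0", "mirror0"), ("data1", "mirror1")]


def data_survives(failed_disks):
    # Data is lost only if both members of some mirrored pair have failed.
    failed = set(failed_disks)
    return not any(primary in failed and mirror in failed
                   for primary, mirror in MIRROR_PAIRS)


def usable_fraction(pairs):
    # RAID-1 stores every block twice, so half of the raw capacity holds copies.
    return len(pairs) / (2 * len(pairs))


print(data_survives(["data0"]))              # True
print(data_survives(["data0", "mirror1"]))   # True
print(data_survives(["data0", "mirror0"]))   # False
print(usable_fraction(MIRROR_PAIRS))         # 0.5
```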

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.
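The migration can be pictured with a small Python sketch of the two-node, 32-AMP configuration. The function `migrate_vprocs`, the node and disk names, and the round-robin placement are all assumptions for illustration; the point is that the AMPs change nodes while the AMP-to-virtual-disk mapping stays exactly as it was.

```python
def migrate_vprocs(nodes, failed_node):
    """Spread the failed node's vprocs over the surviving nodes of its clique."""
    survivors = [name for name in nodes if name != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    for i, vproc in enumerate(nodes[failed_node]):
        target = survivors[i % len(survivors)]   # assumed round-robin placement
        nodes[target].append(vproc)
    nodes[failed_node] = []
    return nodes


clique = {
    "node1": [f"AMP{i}" for i in range(1, 17)],
    "node2": [f"AMP{i}" for i in range(17, 33)],
}
# Each AMP keeps its own virtual disk no matter which node hosts it.
vdisk_of = {amp: f"vdisk{amp[3:]}" for amps in clique.values() for amp in amps}

migrate_vprocs(clique, "node1")

print(len(clique["node2"]))   # 32
print(vdisk_of["AMP16"])      # vdisk16
```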

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE or DELETE command. It keeps track of all new, deleted or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
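The roll-forward recovery just described can be sketched in Python. The function `roll_forward`, the tuple layout of the journal entries, and the sample rows and days are hypothetical; the sketch covers added and changed rows only, matching the scenario above.

```python
def roll_forward(backup, after_journal, up_to_day):
    """Rebuild a table from a full backup plus AFTER images captured since."""
    table = dict(backup)                 # start from the state on backup day
    for day, key, after_image in after_journal:
        if day <= up_to_day:
            table[key] = after_image     # reapply each AFTER image in order
    return table


# Full system backup taken on the 1st of the month (sample data).
full_backup_day1 = {1: ("Johnson", 100000), 2: ("Carlsbad", 100000)}

# AFTER images captured since the backup: (day, row key, image after change).
after_journal = [
    (2, 3, ("Winter", 77000)),      # day 2: a new row was added
    (4, 1, ("Johnson", 110000)),    # day 4: an existing row was changed
]

# Hardware fails on the 5th: reload the backup, then roll forward.
restored = roll_forward(full_backup_day1, after_journal, up_to_day=5)
print(restored)
```

After the roll forward, the table holds the changed Johnson row, the untouched Carlsbad row, and the Winter row that was added after the backup.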

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK, BEFORE JOURNAL, DUAL AFTER JOURNAL
(emp        INTEGER
,dept       INTEGER
,lname      CHAR(20)
,fname      VARCHAR(20)
,salary     DECIMAL(10,2)
,hire_date  DATE)
UNIQUE PRIMARY INDEX (emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds even thousands of users to access the data warehouse concurrently However therewould be a lot of confusion about which user had access to a table first if it were not for the LOCKINGMODES No one likes to be waiting for a long time in a line only to have someone cut in front of him or herTeradata uses LOCKS to help maintain data integrity Locks are activated on the targeted database table or row while the SQL request is executed Those locks are released upon query completion

There are four modes of locking

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE or READ locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
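The four modes above can be summarized as a small "who has to wait" table. The following Python sketch is only a model of the rules as stated in the list, not Teradata code. The names BLOCKS and must_wait are made up for illustration, and the last row (an EXCLUSIVE request waiting behind an ACCESS LOCK) is an assumption that the list does not spell out.

```python
EXCLUSIVE, WRITE, READ, ACCESS = "EXCLUSIVE", "WRITE", "READ", "ACCESS"

# For each lock already held, the set of requested locks that must wait.
BLOCKS = {
    EXCLUSIVE: {EXCLUSIVE, WRITE, READ, ACCESS},  # prevents any access, period
    WRITE: {EXCLUSIVE, WRITE, READ},              # only ACCESS may cut in line
    READ: {EXCLUSIVE, WRITE},                     # other readers are welcome
    ACCESS: {EXCLUSIVE},                          # assumption: only a structural change waits
}

def must_wait(held, requested):
    """Return True if `requested` has to queue behind the lock `held`."""
    return requested in BLOCKS[held]

if __name__ == "__main__":
    for held in (EXCLUSIVE, WRITE, READ, ACCESS):
        waiting = sorted(r for r in BLOCKS if must_wait(held, r))
        print(f"{held:9} held -> must wait: {', '.join(waiting)}")
```

Reading the table by row shows why a thousand users can SELECT at once (READ does not block READ) and why a dirty read is possible (WRITE does not block ACCESS).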

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job and the company deletes you from its employee table, but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the former table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
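The employee and payroll story can be played out in a few lines of Python. This is a sketch of the concept only, using two in-memory sets in place of tables. The function names and the ReferentialIntegrityError class are hypothetical and are not part of any Teradata interface.

```python
class ReferentialIntegrityError(Exception):
    """Raised when an insert or delete would leave an orphaned reference."""

employee = {1, 2, 22}   # parent table: employee numbers
payroll = set()         # child table: employee numbers that receive pay

def insert_payroll(emp):
    # A payroll row must point at an employee that exists.
    if emp not in employee:
        raise ReferentialIntegrityError(f"employee {emp} does not exist")
    payroll.add(emp)

def delete_employee(emp):
    # An employee cannot be removed while payroll still references it.
    if emp in payroll:
        raise ReferentialIntegrityError(f"employee {emp} is still on payroll")
    employee.discard(emp)

if __name__ == "__main__":
    insert_payroll(1)
    try:
        delete_employee(1)          # the "fired but still paid" oversight
    except ReferentialIntegrityError as err:
        print("rejected:", err)
    payroll.discard(1)              # remove the payroll row first
    delete_employee(1)              # now the delete is allowed
    print(sorted(employee))
```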


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel) and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
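The three steps just described (hand out blocks, hash and redistribute, sort) can be sketched in Python. This is a toy model under stated assumptions, not the Fastload utility. NUM_AMPS, BLOCK_ROWS, row_hash and owner_amp are hypothetical names, a block of three rows stands in for a 64K block, and zlib.crc32 stands in for the secret Teradata hash formula.

```python
import zlib

NUM_AMPS = 4
BLOCK_ROWS = 3  # stand-in for a 64K block of rows

def row_hash(pi_value):
    # Stand-in hash (assumption); the real formula is not public.
    return zlib.crc32(str(pi_value).encode())

def owner_amp(pi_value):
    return row_hash(pi_value) % NUM_AMPS

def fastload(rows):
    # Step 1: blocks are handed to the AMPs in turn, regardless of content.
    received = [[] for _ in range(NUM_AMPS)]
    blocks = [rows[i:i + BLOCK_ROWS] for i in range(0, len(rows), BLOCK_ROWS)]
    for n, block in enumerate(blocks):
        received[n % NUM_AMPS].extend(block)

    # Step 2: each AMP hashes the rows it received and sends each row
    # to the AMP that owns it.
    owned = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for row in amp_rows:
            owned[owner_amp(row[0])].append(row)

    # Step 3: each AMP sorts its own rows by row hash.
    for amp_rows in owned:
        amp_rows.sort(key=lambda r: (row_hash(r[0]), r[0]))
    return owned

if __name__ == "__main__":
    flat_file = [(n, f"name{n}") for n in range(1, 21)]
    for amp, amp_rows in enumerate(fastload(flat_file)):
        print(f"AMP {amp}: {[r[0] for r in amp_rows]}")
```

In the real system the three steps run in parallel on every AMP; the loops here run one after another only to keep the sketch short.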

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers and referential integrity
• It doesn't support Multi-set tables and
• It locks at the row level

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


Best_Friends Table

Friend_Num  Friend_Name
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

For this example, Teradata will attempt to spread the table rows among the four-AMP system. A picture of the four-AMP configuration follows.

Since there is a four-AMP configuration, the system will use a four-AMP hash map. Here is an illustration:

1 2 3 4 1 2 3 4 1 2 3 4
1 2 3 4 1 2 3 4 1 2 3 4
1 2 3 4 1 2 3 4 1 2 3 4
1 2 3 4 1 2 3 4 1 2 3 4

Instead of trying to figure out the NCR Wiz-Bang formula (a secret), we can show you the theory of distributing data and retrieving data with our own formula. It is called the Coffing/Jones Wiz-Bang formula. Take a table's Primary Index and divide the column value by 2. The answer points to a hash map bucket, and that bucket tells which AMP will hold the row.


Let's take our first row and determine on which AMP it will reside. Remember, we will get the Primary Index value of the row, apply the Coffing/Jones Wiz-Bang formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the AMP number in which the row will reside. Let's take our first row and determine its proper location.

Friend_Num  Friend_Name
2           Ben Hon

Since we designated Friend_Num as the Primary Index, we merely take the value of Friend_Num (2) and apply the Coffing/Jones Wiz-Bang Formula (divide by 2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row.

Friend_Num  Friend_Name
16          Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely take the value of Friend_Num (16) and apply the Coffing/Jones Wiz-Bang Formula (divide by 2), and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.


If we continue the process until all data is laid out, the system would look like this:


Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2 3 4 1 2 3 4
1 2 3 4 1 2 3 4 1 2 3 4
1 2 3 4 1 2 3 4 1 2 3 4
1 2 3 4 1 2 3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret. The Coffing/Jones Wiz-Bang Formula did not crack the code. The purpose is to show you how the hash map works in theory to distribute and locate rows. Simply, you should understand that the formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and it will be consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.
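For readers who like to see the arithmetic run, here is the Coffing/Jones Wiz-Bang formula as a few lines of Python. It is the teaching formula from this chapter (divide by 2), not the real Teradata hash, and the names HASH_MAP, bucket_for and amp_for are made up for this sketch.

```python
# Four-AMP hash map: buckets 1, 2, 3, ... hold AMP numbers 1 2 3 4 1 2 3 4 ...
HASH_MAP = [1, 2, 3, 4] * 12

def bucket_for(primary_index):
    # Coffing/Jones Wiz-Bang formula: divide the Primary Index value by 2.
    return primary_index // 2

def amp_for(primary_index):
    # Buckets are numbered from 1, Python lists from 0.
    return HASH_MAP[bucket_for(primary_index) - 1]

BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]

if __name__ == "__main__":
    for friend_num, friend_name in BEST_FRIENDS:
        print(f"{friend_num:2} {friend_name:10} -> bucket "
              f"{bucket_for(friend_num)} -> AMP {amp_for(friend_num)}")
```

Running the same Friend_Num through amp_for any number of times gives the same AMP, which is the whole point: the formula that placed the row is the formula that finds it again.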

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;


The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's abilities to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan (FTS) means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is 10 times faster than most systems. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" then you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns
• Each AMP passes its rows to the PE over the BYNET

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
6           Mary Gray
14          Kyle Marx
8           John Davis
16          Lyn Jones
2           Ben Hon
10          Don Roy
4           Joe Davis
12          Sam Mills
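The Full Table Scan plan can be mimicked in Python with one worker per AMP. This is a sketch of the idea only: the AMP_ROWS layout follows the divide-by-2 teaching formula from the previous chapter, threads stand in for AMPs, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Rows as laid out on the four AMPs by the Coffing/Jones teaching formula.
AMP_ROWS = {
    1: [(2, "Ben Hon"), (10, "Don Roy")],
    2: [(4, "Joe Davis"), (12, "Sam Mills")],
    3: [(6, "Mary Gray"), (14, "Kyle Marx")],
    4: [(8, "John Davis"), (16, "Lyn Jones")],
}

def scan_amp(amp):
    # Each AMP reads only the rows it owns.
    return list(AMP_ROWS[amp])

def full_table_scan():
    # The PE hands the same plan to every AMP; the AMPs work side by side
    # and pass their rows back to the PE.
    with ThreadPoolExecutor(max_workers=len(AMP_ROWS)) as pool:
        per_amp = list(pool.map(scan_amp, AMP_ROWS))
    return [row for rows in per_amp for row in rows]

if __name__ == "__main__":
    rows = full_table_scan()
    print(len(rows), "rows returned")
    for friend_num, friend_name in rows:
        print(friend_num, friend_name)
```

No AMP reads more than two rows here, even though the answer set has eight. The rows come back grouped by AMP rather than sorted by Friend_Num, which is also how the answer set above looks.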


In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It also tells the AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of the AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (smiley face in this example) is revealed and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!
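The two-AMP USI lookup can be modeled in a few lines of Python. This is a simplified sketch, not how Teradata stores its sub-tables: zlib.crc32 stands in for the secret hash formula, the sub-table stores the base AMP number and Friend_Num instead of a real Row-ID, and every name here (amp_for, base, usi, select_by_name) is hypothetical.

```python
import zlib

NUM_AMPS = 4

def amp_for(value):
    # Stand-in hash (assumption): any consistent formula shows the idea.
    return zlib.crc32(str(value).encode()) % NUM_AMPS + 1

ROWS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]

# One base table and one secondary index sub-table per AMP.
base = {amp: {} for amp in range(1, NUM_AMPS + 1)}  # Friend_Num -> row
usi = {amp: {} for amp in range(1, NUM_AMPS + 1)}   # Friend_Name -> pointer

for friend_num, friend_name in ROWS:
    home = amp_for(friend_num)                  # base row is placed by Primary Index
    base[home][friend_num] = (friend_num, friend_name)
    # The index row is placed by hashing the indexed value and points to the base row.
    usi[amp_for(friend_name)][friend_name] = (home, friend_num)

def select_by_name(friend_name):
    index_amp = amp_for(friend_name)                # AMP 1 of 2: index sub-table
    home, friend_num = usi[index_amp][friend_name]
    row = base[home][friend_num]                    # AMP 2 of 2: base table
    return row, {index_amp, home}

if __name__ == "__main__":
    row, amps_touched = select_by_name("Ben Hon")
    print(row, "found using AMPs", sorted(amps_touched))
```

However many AMPs the system has, the lookup never touches more than two of them: one for the index row and one for the base row.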

Join Indexes

"A bend in the road is not the end of the road unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users never query the join index directly. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.
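The pre-join idea can be sketched with two tiny in-memory tables. Everything in this Python example is hypothetical (the customers and orders data, and the function names); it only illustrates that the joined rows are built once, kept in step with the base tables, and used to answer the normal join.

```python
# Hypothetical base tables for illustration only.
customers = {1: "Acme", 2: "Globex"}          # customer id -> name
orders = [(100, 1, 250.0), (101, 2, 75.0)]    # (order id, customer id, amount)

def build_join_index():
    # The pre-joined rows: each order together with its customer's name.
    return [(oid, customers[cid], amount) for oid, cid, amount in orders]

join_index = build_join_index()

def add_order(oid, cid, amount):
    # Updating the base table also maintains the pre-joined rows.
    orders.append((oid, cid, amount))
    join_index.append((oid, customers[cid], amount))

def orders_with_names():
    # The user's normal join, answered from the pre-joined rows
    # instead of joining the base tables again.
    return list(join_index)

if __name__ == "__main__":
    add_order(102, 1, 10.0)
    for row in orders_with_names():
        print(row)
```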


Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan." The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So, if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these items is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT or REVOKE rights from Tom.
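The "who owns Tom?" question is just a walk up the family tree. Here is a small Python sketch of the hierarchy described in this chapter. The PARENT dictionary and the owners function are illustrative names, not anything from Teradata.

```python
# Each user or database and its immediate parent, as described in the text.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}

def owners(name):
    """Return every parent (owner) above `name`, nearest first."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain

if __name__ == "__main__":
    print("Owners of Tom:", owners("Tom"))
    print("Owners of DBC:", owners("DBC"))
```

DBC has no owners at all, which matches its role as the top dog.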


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts. Then the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, views and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
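The difference between giving away perm and giving away spool shows up clearly in a small bookkeeping model. This Python sketch is an illustration under stated assumptions: the SpaceOwner class and its methods are invented for this example, the numbers are arbitrary gigabytes, and capping a child's spool at the parent's own limit is our reading of the speed limit analogy.

```python
class SpaceOwner:
    """A user or database with a perm allotment and a spool limit (in GB)."""

    def __init__(self, name, perm=0, spool=0):
        self.name = name
        self.perm = perm
        self.spool = spool

    def create_child(self, name, perm, spool):
        if perm > self.perm:
            raise ValueError(f"{self.name} does not own {perm} GB of perm")
        if spool > self.spool:
            # Assumption: a child's limit cannot exceed the parent's.
            raise ValueError(f"{self.name} has a spool limit of {self.spool} GB")
        self.perm -= perm          # perm given away is subtracted from the parent
        # The parent's spool limit is NOT reduced: it is a limit, not a holding.
        return SpaceOwner(name, perm, spool)

    def run_query(self, spool_needed):
        if spool_needed > self.spool:
            raise RuntimeError(f"{self.name} is out of spool space")
        return "answer set delivered, spool released"

if __name__ == "__main__":
    sysdba = SpaceOwner("SYSDBA", perm=80, spool=20)
    morgan = sysdba.create_child("Morgan", perm=30, spool=20)
    print("SYSDBA perm:", sysdba.perm, "spool:", sysdba.spool)
    print("Morgan perm:", morgan.perm, "spool:", morgan.spool)
    print(morgan.run_query(5))
```

After the handout, SYSDBA has 30 GB less perm but exactly the same spool limit, and a query that needs more spool than the user's limit aborts.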

The following picture shows a logical view of a CustomerTable Note the table is stored in PERM space Whena user submits a query against this table the answer is stored temporarily in SPOOL When the query iscompleted the answer is delivered to the user and then the SPOOL is released

The next picture shows a logical Teradata system In the PERM area there is a table called Employee Thistable has five columns Emp Dept Lname Fname and Sal The table has four employees Notice the SQLstatement at the bottom of the picture is asking to see all columns where the employees department is equal to10 To complete the query the AMPs will read the rows of the table and each time they find a row where Deptis equal to 10 a row is added to spool Plus when the answer is returned the spool is released

What is a View?

"At Christmas time, no one cares about the past or the future. All that matters is the present."

One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the view's data is not stored separately on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname    Sal
1     10     Johnson    Manny    100000
2     20     Carlsbad   Jan      100000
22    30     Winter     Steve    77000
25    10     Lester     Bonnie   56000
33    10     Samuels    Todd     120000
99    20     Walter     Misha    104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, those users cannot actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha
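
If you don't have a Teradata system handy, the same idea can be tried out with Python's built-in sqlite3 module. SQLite is only a stand-in here: it has no users or access rights, so the sketch shows just the column-hiding part of a view, using the rows from the Employee table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no rows are copied.
conn.execute(
    "CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

cursor = conn.execute("SELECT * FROM Employ_V")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)
print(len(view_rows), "rows returned")
```

The salary column never shows up in view_columns, even though the query asked for every column of the view.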

What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
  SELECT * FROM Employ_v WHERE dept = 10;
  SELECT * FROM Employ_v WHERE dept = 20;
  SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
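
The "all work or none work" behavior of a macro can be imitated with Python's sqlite3 module. This is a sketch under assumptions: SQLite has no macros, so the hypothetical helper run_as_macro simply wraps a list of statements in one transaction, and the two-column Employee table is trimmed down for the example.

```python
import sqlite3


def run_as_macro(conn, statements):
    """Run every statement as one transaction (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for sql in statements:
                conn.execute(sql)
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp INTEGER PRIMARY KEY, Sal INTEGER)")
conn.execute("INSERT INTO Employee VALUES (1, 100000)")
conn.commit()

ok = run_as_macro(conn, [
    "UPDATE Employee SET Sal = Sal + 5000 WHERE Emp = 1",
    "INSERT INTO Employee VALUES (1, 50000)",  # fails: employee 1 already exists
])
salary = conn.execute("SELECT Sal FROM Employee WHERE Emp = 1").fetchone()[0]

print(ok, salary)
```

Because the second statement fails, the raise made by the first statement is undone as well, and the salary is back where it started.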

Here is a handy reference chart that compares views with macros:

Views                                           Macros
• We select from views                          • We execute macros
• Uses the keyword AS                           • Uses the keyword AS
• Definition is stored in the Data Dictionary   • Definition is stored in the Data Dictionary
• Accesses certain portions of the data         • Accesses the real data itself
• Is changed using the keyword REPLACE          • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
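
Implicit and explicit rights can be told apart with a small Python model. Everything here is illustrative: the PARENT dictionary is only a guess at the hierarchy in the picture (MRKT and Morgan are assumed to sit directly under SYSDBA), and the function names are made up for the sketch.

```python
# Each object names its immediate owner (parent); DBC sits at the top.
# The exact shape of this hierarchy is an assumption.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "Mary": "MRKT",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners_of(name):
    """Return every object above `name` in the hierarchy, nearest first."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain


def has_implicit_rights(owner, target):
    # Implicit rights come purely from being listed above the target.
    return owner in owners_of(target)


explicit_rights = set()  # (grantee, privilege, object) triples from GRANTs


def grant(grantor, grantee, privilege, obj):
    # In this toy model, only the object itself or one of its owners may grant.
    if grantor != obj and not has_implicit_rights(grantor, obj):
        raise PermissionError(f"{grantor} cannot grant rights on {obj}")
    explicit_rights.add((grantee, privilege, obj))


grant("Tom", "Mary", "CREATE TABLE", "Tom")
print(owners_of("Tom"))
print(has_implicit_rights("MRKT", "Tom"))
```

Mary ends up with a right on Tom's database only because somebody explicitly granted it; nothing in the hierarchy implies it.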

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupt. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted back to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand, and then, after cutting your hair, asked if you liked it? If you didn't like it, then you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal". Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, "Transient Journal Cleanliness is next to Godliness". If it is clean, then things went well.
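
The BEFORE picture idea can be sketched as a tiny Python class. This is a simplified model, not how Teradata implements it: the Amp class, its dictionary of rows and the use of None to mean "no such row" are all assumptions made for illustration.

```python
class Amp:
    """Toy AMP: rows in a dict plus a transient journal of before-images."""

    def __init__(self):
        self.rows = {}
        self.journal = []  # (key, before_image); None means the row did not exist

    def write(self, key, value):
        # Take the BEFORE picture first, then change the row.
        self.journal.append((key, self.rows.get(key)))
        if value is None:
            self.rows.pop(key, None)  # DELETE
        else:
            self.rows[key] = value    # INSERT or UPDATE

    def commit(self):
        # Success: the journal is wiped clean.
        self.journal.clear()

    def rollback(self):
        # Failure: restore before-images, newest first.
        for key, before in reversed(self.journal):
            if before is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
        self.journal.clear()


amp = Amp()
amp.write(1, "Johnson")
amp.commit()

amp.write(1, "Johnston")   # UPDATE
amp.write(2, "Carlsbad")   # INSERT
amp.rollback()             # the transaction failed

print(amp.rows, amp.journal)
```

After the rollback the AMP holds exactly what it held at the last COMMIT, and the journal is empty again.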

FALLBACK Protection

"I asked my dentist if I had to floss all my teeth, and he responded, 'No, just the ones you want to keep.'"

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice, if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. This allocation is done by taking the Primary Index and running it through the hashing algorithm. The output of the hashing algorithm then points to a bucket in the hash map, and inside that bucket is the AMP number, the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map that knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
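
The two hash maps can be imitated in Python to show the one rule that matters: the FALLBACK copy lands on a different AMP in the same cluster. This is a sketch under loud assumptions. CRC32 stands in for Teradata's hashing algorithm, AMPs are numbered 0 to 7, and the way the fallback AMP is picked is invented for the example.

```python
import zlib

NUM_AMPS = 8       # the eight-AMP system from the picture
CLUSTER_SIZE = 4   # two clusters of four AMPs (numbered 0-7 here)


def row_hash(primary_index_value):
    # Stand-in for Teradata's hashing algorithm (an assumption, not the real one).
    return zlib.crc32(str(primary_index_value).encode("utf-8"))


def primary_amp(primary_index_value):
    # "Primary hash map": hash value -> AMP that owns the base row.
    return row_hash(primary_index_value) % NUM_AMPS


def fallback_amp(primary_index_value):
    # "Fallback hash map": a different AMP inside the same cluster.
    primary = primary_amp(primary_index_value)
    cluster_start = primary - primary % CLUSTER_SIZE
    step = 1 + (row_hash(primary_index_value) // NUM_AMPS) % (CLUSTER_SIZE - 1)
    return cluster_start + (primary % CLUSTER_SIZE + step) % CLUSTER_SIZE


for name in ("Ben Hon", "Don Roy"):
    print(name, "base AMP:", primary_amp(name), "fallback AMP:", fallback_amp(name))
```

Whatever AMP numbers come out, the base row and its FALLBACK copy never share an AMP and never leave their cluster, which is why one AMP per cluster can be lost.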

In most systems, AMPs are clustered in a group of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. With clustering in a group of two and one AMP down, every complex query will take twice as long to process.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
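
The trade-off between small and large clusters comes down to simple arithmetic, sketched here in Python. Two simplifying assumptions are made: the down AMP's work is shared evenly by the survivors in its cluster, and a second failure is equally likely to hit any of the remaining AMPs. The function names are made up for the example.

```python
from fractions import Fraction


def workload_factor(cluster_size):
    """Work per surviving AMP in the cluster, relative to normal, with one AMP down."""
    return Fraction(cluster_size, cluster_size - 1)


def same_cluster_risk(cluster_size, total_amps):
    """Chance that a second failed AMP sits in the same cluster as the first."""
    return Fraction(cluster_size - 1, total_amps - 1)


for size in (2, 4, 16):
    print(
        "cluster of", size,
        "| workload x", float(workload_factor(size)),
        "| risk in a 2000-AMP system:", float(same_cluster_risk(size, 2000)),
    )
```

A cluster of two doubles the survivor's work, a cluster of 16 barely notices the loss but is the most exposed to a second failure, and a cluster of four sits between the two.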

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are made to their copies on the other AMPs in the cluster.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping", starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping".

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that should have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). Also, none of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space: half of the raw capacity holds mirror copies.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE like birds flying south in winter. In a two-node clique, the receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel Processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two-to-three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by the couple of hours it takes for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE or DELETE command. It keeps track of all new, deleted or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
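
Both recovery stories above can be walked through with a short Python sketch. It is a toy model only: tables are plain dictionaries, journals are lists of (key, image) pairs, deletes are not modeled in the roll forward, and the function names and sample regions are made up for illustration.

```python
def roll_forward(backup, after_journal):
    """Rebuild a table from a full backup plus AFTER images, oldest first."""
    table = dict(backup)
    for key, after_image in after_journal:
        table[key] = after_image
    return table


def roll_back(table, before_journal):
    """Undo changes by applying BEFORE images, newest first."""
    restored = dict(table)
    for key, before_image in reversed(before_journal):
        if before_image is None:       # the row did not exist before the change
            restored.pop(key, None)
        else:
            restored[key] = before_image
    return restored


# AFTER journal: backup from the 1st, changes captured up to the 5th.
backup = {1: "West", 2: "East"}
after_journal = [(1, "Southwest"), (3, "North")]
recovered = roll_forward(backup, after_journal)

# BEFORE journal: every representative was wrongly moved to Southeast.
damaged = {1: "Southeast", 2: "Southeast"}
before_journal = [(1, "West"), (2, "East")]
repaired = roll_back(damaged, before_journal)

print(recovered)
print(repaired)
```

The AFTER images carry the table forward from the backup to the moment of the failure, while the BEFORE images carry a damaged table back to the moment just ahead of the bad update.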

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
     BEFORE JOURNAL,
     DUAL AFTER JOURNAL
(emp        INTEGER
,dept       INTEGER
,lname      CHAR(20)
,fname      VARCHAR(20)
,salary     DECIMAL(10,2)
,hire_date  DATE FORMAT
)
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE, or READ locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
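The four rules above can be read as a small compatibility table. The sketch below is only an illustration in Python, not Teradata's lock manager; the names COMPATIBLE and can_grant are made up for this example, and the ACCESS row is an assumption (we assume a held ACCESS LOCK blocks only an EXCLUSIVE request, mirroring rule 4).

```python
# For each lock already held on a table: which new requests may proceed.
# Derived from the four rules in the text; the ACCESS row is an assumption.
COMPATIBLE = {
    "EXCLUSIVE": set(),                     # prevents any access, period
    "WRITE": {"ACCESS"},                    # only a dirty read may pass
    "READ": {"READ", "ACCESS"},             # many readers, no writers
    "ACCESS": {"ACCESS", "READ", "WRITE"},  # assumed: blocks only EXCLUSIVE
}


def can_grant(held_locks, requested):
    """Return True if `requested` is compatible with every lock currently held."""
    return all(requested in COMPATIBLE[held] for held in held_locks)


if __name__ == "__main__":
    for held, wanted in [("READ", "READ"), ("READ", "WRITE"),
                         ("WRITE", "ACCESS"), ("EXCLUSIVE", "ACCESS")]:
        print(held, "held,", wanted, "requested ->", can_grant([held], wanted))
```

A request that cannot be granted simply waits in the queue behind the lock that blocks it, which is the "no cutting in line" behavior described above.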

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.
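To see both halves of that rule in action, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here, not Teradata, and the employee and payroll tables and their columns are hypothetical names borrowed from the story above.

```python
import sqlite3


def demo_referential_integrity():
    """Try an orphan insert and a parent delete; report what the database allowed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces RI only when asked
    conn.execute("CREATE TABLE employee (emp INTEGER PRIMARY KEY, lname TEXT)")
    conn.execute(
        "CREATE TABLE payroll ("
        " emp INTEGER NOT NULL REFERENCES employee(emp),"
        " salary INTEGER)"
    )
    conn.execute("INSERT INTO employee VALUES (1, 'Johnson')")
    conn.execute("INSERT INTO payroll VALUES (1, 100000)")

    results = {}
    try:
        # Employee 99 does not exist, so this payroll row has no parent.
        conn.execute("INSERT INTO payroll VALUES (99, 50000)")
        results["orphan_insert"] = "allowed"
    except sqlite3.IntegrityError:
        results["orphan_insert"] = "rejected"

    try:
        # Employee 1 is still referenced by a payroll row.
        conn.execute("DELETE FROM employee WHERE emp = 1")
        results["parent_delete"] = "allowed"
    except sqlite3.IntegrityError:
        results["parent_delete"] = "rejected"

    # Remove the referencing row first; then the delete goes through.
    conn.execute("DELETE FROM payroll WHERE emp = 1")
    conn.execute("DELETE FROM employee WHERE emp = 1")
    results["delete_after_cleanup"] = "allowed"
    conn.close()
    return results


if __name__ == "__main__":
    print(demo_referential_integrity())
```

With RI in place, the fired employee cannot vanish from the employee table while a payroll row still points at him.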

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will still proceed. In addition, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on: ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
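The load steps just described (deal out blocks, hash and redistribute, sort) can be sketched in a few lines of Python. This is a toy model under stated assumptions: the four-AMP system, the block size counted in rows instead of 64K bytes, and the crc32 stand-in hash are all made up for illustration, since the real hashing formula is not public.

```python
import zlib

NUM_AMPS = 4          # hypothetical system size
ROWS_PER_BLOCK = 3    # stand-in for a 64K block, counted in rows for simplicity


def row_hash(primary_index_value):
    """Stand-in hash; Teradata's real hashing formula is not public."""
    return zlib.crc32(str(primary_index_value).encode("utf-8"))


def fastload(rows):
    """rows: list of tuples whose first element is the Primary Index value."""
    # Phase 1: cut the flat file into blocks and deal them out to the AMPs in turn.
    blocks = [rows[i:i + ROWS_PER_BLOCK] for i in range(0, len(rows), ROWS_PER_BLOCK)]
    received = [[] for _ in range(NUM_AMPS)]
    for n, block in enumerate(blocks):
        received[n % NUM_AMPS].extend(block)

    # Phase 2: every AMP hashes the rows it received and forwards each row
    # to the AMP that owns it (the hop over the BYNET in the text).
    owned = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for row in amp_rows:
            h = row_hash(row[0])
            owned[h % NUM_AMPS].append((h, row))

    # Phase 3: each AMP sorts what it owns by row hash.
    for amp in owned:
        amp.sort(key=lambda pair: pair[0])
    return owned


if __name__ == "__main__":
    sample = [(n, "Friend %d" % n) for n in range(1, 11)]
    for amp_number, amp in enumerate(fastload(sample)):
        print("AMP", amp_number, [row for _, row in amp])
```

Notice that no AMP needs to know where a row belongs when it first receives a block; the hash decides that in the second phase, which is why the first phase can be a simple hand-off.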

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

You don't drown by falling into the water; you drown by staying in the water

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion: A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


Let's take our first row and determine on which AMP it will reside. Remember, we will get the Primary Index value of the row, run it through the Coffing/Jones Wiz-Bang Formula (divide by 2), and the answer will point to a bucket in the hash map. Inside that bucket will be the AMP number on which the row will reside.

Friend_Num  Friend_Name
2           Ben Hon

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (2):

2 divided by 2 = 1

The hash map bucket number is one. Let's check the hash map to see bucket number 1 and to see what AMP number is inside that bucket. As seen in the picture below, the first bucket in the hash map says the row's destination is AMP 1.

Let's look at another random row:

Friend_Num  Friend_Name
16          Lyn Jones

Since we designated Friend_Num as the Primary Index, we merely apply the Coffing/Jones Wiz-Bang Formula (divide by 2) to the value of Friend_Num (16), and the answer is:

16 divided by 2 = 8

Thus, the hash map bucket number is now eight. Let's check our hash map to see bucket number eight and determine which AMP number is inside that bucket. As you can see below, bucket eight in the hash map says the row's destination is AMP four.

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Wiz-Bang Formula did not crack the code. The purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the formula is mathematical (similar to the Coffing/Jones Wiz-Bang Formula) and it will be consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.
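For readers who like to see the arithmetic run, here is the toy formula and the hash map from the picture as a short Python sketch. It models only the divide-by-2 teaching formula, not the real Teradata hash; the function names are made up for this example, and the toy formula is only meaningful for the even Friend_Num values used in the Best_Friends table.

```python
# The 48 buckets of the pictured hash map, read left to right, top to bottom.
HASH_MAP = [1, 2, 3, 4] * 12

BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]


def bucket_for(friend_num):
    """The toy Wiz-Bang formula: divide the Primary Index value by 2."""
    return friend_num // 2


def amp_for(friend_num):
    """Look inside the bucket (numbered from 1) to find the AMP number."""
    return HASH_MAP[bucket_for(friend_num) - 1]


def lay_out(rows):
    """Place every row on the AMP its Primary Index value points to."""
    amps = {1: [], 2: [], 3: [], 4: []}
    for friend_num, friend_name in rows:
        amps[amp_for(friend_num)].append((friend_num, friend_name))
    return amps


if __name__ == "__main__":
    for amp_number, rows in lay_out(BEST_FRIENDS).items():
        print("AMP", amp_number, rows)
```

Running amp_for on the same value any number of times gives the same AMP, which is the whole point: the formula is consistent, so a row can always be found again.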

If you always do what you always did, you'll always get what you always got

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;


The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
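The one-AMP retrieval can be sketched the same way. Again this is a toy model in Python that uses the divide-by-2 teaching formula, not the real hash; the function names and the list of "AMPs asked" are illustrative assumptions.

```python
HASH_MAP = [1, 2, 3, 4] * 12  # the pictured hash map, buckets numbered from 1

BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]


def amp_for(friend_num):
    """Toy formula: divide by 2 to get the bucket, then read the AMP number."""
    return HASH_MAP[friend_num // 2 - 1]


def load(rows):
    """Store each row on its AMP, keyed by the Primary Index value."""
    amps = {1: {}, 2: {}, 3: {}, 4: {}}
    for friend_num, friend_name in rows:
        amps[amp_for(friend_num)][friend_num] = friend_name
    return amps


def select_by_primary_index(amps, friend_num):
    """Return the matching row (or None) and the list of AMPs that were asked."""
    target = amp_for(friend_num)          # the PE reruns the hash formula
    name = amps[target].get(friend_num)   # only that one AMP does any work
    row = (friend_num, name) if name is not None else None
    return row, [target]


if __name__ == "__main__":
    amps = load(BEST_FRIENDS)
    print(select_by_primary_index(amps, 8))
```

The other three AMPs are never contacted, which is why Primary Index access stays fast no matter how large the system grows.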

The Full Table Scan

What matters is not the size of the dog in the fight but the size of the fight in the dog

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's abilities to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is 10 times faster than most systems. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most FTS bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" you are asking the system to read through an entire table. Full Table Scans are fundamental and an important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it owns
• Each AMP passes its rows to the PE over the BYNET
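The two steps above can be imitated with a small Python sketch in which a thread pool plays the part of the AMPs. The row placement follows the toy divide-by-2 layout from the hashing example; the thread pool and the function names are assumptions made for illustration and say nothing about how Teradata is actually built.

```python
from concurrent.futures import ThreadPoolExecutor

# Rows as they sit on each AMP under the toy divide-by-2 layout.
AMPS = {
    1: [(2, "Ben Hon"), (10, "Don Roy")],
    2: [(4, "Joe Davis"), (12, "Sam Mills")],
    3: [(6, "Mary Gray"), (14, "Kyle Marx")],
    4: [(8, "John Davis"), (16, "Lyn Jones")],
}


def amp_scan(amp_number):
    """Each AMP reads only the rows it owns."""
    return list(AMPS[amp_number])


def full_table_scan():
    """The 'PE' asks every AMP at once, then gathers what they send back."""
    with ThreadPoolExecutor(max_workers=len(AMPS)) as pool:
        per_amp_results = list(pool.map(amp_scan, AMPS))
    return [row for rows in per_amp_results for row in rows]


if __name__ == "__main__":
    answer = full_table_scan()
    print(len(answer), "rows returned")
    for row in answer:
        print(row)
```

Each worker reads two rows instead of eight. Without an ORDER BY, the order of the answer set depends on how the AMPs respond, so the rows need not come back sorted by Friend_Num.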

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
6           Mary Gray
14          Kyle Marx
8           John Davis
16          Lyn Jones
2           Ben Hon
10          Don Roy
4           Joe Davis
12          Sam Mills


In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It also tells each AMP to place the hash in a secondary index sub-table, along with the Row-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking WHERE Friend_Name = 'Ben Hon'. The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (smiley face in this example) is revealed and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.
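Here is a rough Python sketch of that two-step lookup. Everything in it is an assumption made for illustration: crc32 stands in for the secret hash formula, the sub-table is a plain dictionary, and the AMP numbers it produces will not necessarily match the picture (where 'Ben Hon' landed on AMP 2).

```python
import zlib

NUM_AMPS = 4

BEST_FRIENDS = [
    (2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
    (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones"),
]


def amp_for(value):
    """Stand-in hash: any value (number or name) maps to an AMP from 1 to 4."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_AMPS + 1


def build(rows):
    """Build the base table (hashed on Friend_Num) and the USI sub-table (hashed on name)."""
    base = {amp: {} for amp in range(1, NUM_AMPS + 1)}
    usi = {amp: {} for amp in range(1, NUM_AMPS + 1)}
    for friend_num, friend_name in rows:
        base_amp = amp_for(friend_num)
        base[base_amp][friend_num] = friend_name
        # The sub-table row keeps the index value plus a pointer to the base row.
        usi[amp_for(friend_name)][friend_name] = (base_amp, friend_num)
    return base, usi


def select_by_usi(base, usi, friend_name):
    """Step 1: ask the AMP the name hashes to. Step 2: follow the pointer."""
    index_amp = amp_for(friend_name)
    pointer = usi[index_amp].get(friend_name)
    if pointer is None:
        return None, [index_amp]
    base_amp, friend_num = pointer
    return (friend_num, base[base_amp][friend_num]), [index_amp, base_amp]


if __name__ == "__main__":
    base, usi = build(BEST_FRIENDS)
    print(select_by_usi(base, usi, "Ben Hon"))
```

However many AMPs the system has, the lookup never asks more than two of them: one for the sub-table row and one for the base row.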

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!

Join Indexes

A bend in the road is not the end of the road unless you fail to make the turn

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.


Teradata Databases, Users and Space

Overview

Choose a job you like and you will never have to work a day of your life

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan." The Dictionary directs the PE on topics such as security, access rights, tables, columns, indexes, macros, views, etc.

So, if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these items is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to, or REVOKE rights from, Tom.
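Finding every owner of an object is just a walk up the family tree. The Python sketch below uses the names from the text; the PARENT dictionary and the owners function are made-up illustrations, not a Teradata API.

```python
# Who sits directly above whom, using the names from the text.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """Walk up the hierarchy and collect every parent (owner) above `name`."""
    found = []
    while name in PARENT:
        name = PARENT[name]
        found.append(name)
    return found


if __name__ == "__main__":
    print("Owners of Tom:", owners("Tom"))
    print("Owners of DBC:", owners("DBC"))
```

DBC has no owners at all, which is another way of saying it is the top dog.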


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, views, and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.
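The perm space bookkeeping works like a ledger: whatever a parent gives away is subtracted from its own balance, so the system total never changes. The Python sketch below illustrates this; the SpaceOwner class is hypothetical, and apart from the 100 Gigabyte system and the 80/20 split mentioned in the text, the amounts are made up.

```python
class SpaceOwner:
    """A user or database that owns perm space (hypothetical helper class)."""

    def __init__(self, name, perm_gb, parent=None):
        self.name = name
        self.perm_gb = perm_gb
        self.parent = parent
        self.children = []

    def create_child(self, name, perm_gb):
        """Give `perm_gb` of this owner's perm space to a new child object."""
        if perm_gb > self.perm_gb:
            raise ValueError("%s does not own %d GB to give away" % (self.name, perm_gb))
        self.perm_gb -= perm_gb  # the parent must subtract what it gives
        child = SpaceOwner(name, perm_gb, parent=self)
        self.children.append(child)
        return child


def total_perm(owner):
    """Perm space owned by `owner` plus everything owned beneath it."""
    return owner.perm_gb + sum(total_perm(child) for child in owner.children)


if __name__ == "__main__":
    dbc = SpaceOwner("DBC", 100)
    sysdba = dbc.create_child("SYSDBA", 80)
    sysdba.create_child("MRKT", 20)       # illustrative amounts from here on
    sysdba.create_child("SALES", 20)
    morgan = sysdba.create_child("Morgan", 10)
    morgan.create_child("Tom", 5)
    print("DBC keeps", dbc.perm_gb, "GB; SYSDBA keeps", sysdba.perm_gb, "GB")
    print("System total is still", total_perm(dbc), "GB")
```

No matter how the space is handed down, all of it is owned by someone, and the total stays at 100 GB.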

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.

The following picture shows a logical view of a Customer Table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.
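The life of a spool file can be sketched in Python as follows. This is a simplified model under stated assumptions: the spool limit is counted in rows rather than bytes, the SpoolSpaceExceeded exception and run_query function are made-up names, and the four sample rows are borrowed from the Employee table shown later in this chapter.

```python
class SpoolSpaceExceeded(Exception):
    """Raised when a query needs more spool than the user's upper limit."""


# Four sample rows: Emp, Dept, Lname, Fname, Sal.
EMPLOYEE = [
    (1, 10, "Johnson", "Manny", 100000),
    (2, 20, "Carlsbad", "Jan", 100000),
    (22, 30, "Winter", "Steve", 77000),
    (25, 10, "Lester", "Bonnie", 56000),
]


def run_query(rows, predicate, spool_limit_rows):
    """Build the answer set in spool, deliver it, then release the spool."""
    spool = []
    for row in rows:                      # the AMPs read the rows of the table
        if predicate(row):
            if len(spool) >= spool_limit_rows:
                raise SpoolSpaceExceeded("query aborted: out of spool space")
            spool.append(row)             # a qualifying row is added to spool
    answer = list(spool)                  # the answer is delivered to the user
    spool.clear()                         # and the spool is released
    return answer


if __name__ == "__main__":
    dept_10 = run_query(EMPLOYEE, lambda row: row[1] == 10, spool_limit_rows=10)
    print(dept_10)
```

Calling run_query with a spool limit of one row would abort the same query, just as a user with too little spool space would see in practice.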

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the view's data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname    Sal
1     10     Johnson    Manny    100000
2     20     Carlsbad   Jan      100000
22    30     Winter     Steve    77000
25    10     Lester     Bonnie   56000
33    10     Samuels    Todd     120000
99    20     Walter     Misha    104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. When users are denied access to the employee table but are given access rights to the EMPLOY_V view, security is enhanced. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha
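The view idea can be tried out without a Teradata system. The following is a minimal sketch in Python that uses SQLite as a stand-in (an assumption made purely for illustration; SQLite has no PERM space, Data Dictionary or access rights of the Teradata kind). It loads the six employees, creates the view, and confirms that the salary column never appears in the answer set.

```python
import sqlite3

# SQLite stands in for Teradata here; only the view concept carries over.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no rows are copied.
conn.execute(
    "CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

cursor = conn.execute("SELECT * FROM Employ_V")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

Here `view_columns` holds only the four selected column names, and `view_rows` holds one tuple per employee with no salary value in it.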


What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and executed with a simple command. If there are multiple statements, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
  SELECT * FROM Employ_v WHERE dept = 10;
  SELECT * FROM Employ_v WHERE dept = 20;
  SELECT * FROM Employ_v ORDER BY lname; );

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                            Macros
• We select from views                           • We execute macros
• Uses the keyword AS                            • Uses the keyword AS
• Definition is stored in the Data Dictionary    • Definition is stored in the Data Dictionary
• Accesses certain portions of the data          • Accesses the real data itself
• Is changed using the keyword REPLACE           • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan


I taught in one place that was so rough ... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it ... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the U.S. Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has explicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
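Implicit rights follow directly from position in the hierarchy, so they can be modeled as a walk up a parent chain. The sketch below is only an illustration in Python: the `PARENT` map is a hypothetical hierarchy assembled from the names mentioned above (the exact layout of the picture is an assumption), and the function names are invented for this example.

```python
# Hypothetical ownership hierarchy: child -> parent (owner).
# The exact shape of the book's picture is an assumption.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "Morgan": "SYSDBA",
    "Mary": "MRKT",
    "Tom": "Morgan",
}


def owners_of(name):
    """Return every owner above `name`, nearest owner first."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain


def has_implicit_rights(owner, target):
    """An owner anywhere above the target may GRANT or REVOKE on it."""
    return owner in owners_of(target)
```

In this model Tom has no implicit rights over Mary, because he is not above her in the chain. Anything he gives her would have to be an explicit grant.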

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgans Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected." Likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal." Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal Cleanliness is next to Godliness. If it is clean, then things went well.
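The before-image idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Teradata implements it: one in-memory `Amp` object, rows kept in a dictionary, and a made-up rule (negative salaries are rejected) whose only purpose is to force a failure. All names here are hypothetical.

```python
MISSING = object()  # before-image marker meaning "the row did not exist"


class Amp:
    """Toy AMP: rows plus a transient journal of before-images."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.journal = []  # (key, before_image) pairs, oldest first

    def change(self, key, new_value):
        # Take the BEFORE picture first, then change the row.
        self.journal.append((key, self.rows.get(key, MISSING)))
        if new_value is MISSING:
            self.rows.pop(key, None)   # DELETE
        else:
            self.rows[key] = new_value  # INSERT or UPDATE

    def rollback(self):
        # Undo the newest change first so the oldest before-image wins.
        while self.journal:
            key, before = self.journal.pop()
            if before is MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before

    def commit(self):
        self.journal.clear()  # the journal is wiped clean


def run_transaction(amp, changes):
    """Apply every (key, value) change, or none of them."""
    try:
        for key, value in changes:
            if isinstance(value, (int, float)) and value < 0:
                raise ValueError("negative salary")  # illustrative failure
            amp.change(key, value)
    except ValueError:
        amp.rollback()
        return False
    amp.commit()
    return True
```

A failed transaction leaves `amp.rows` exactly as it was, and in both the failed and the successful case the journal ends up empty.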

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgans Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. The chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
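The two hash maps can be pictured with a small Python sketch. Everything concrete in it is an assumption chosen for illustration: `zlib.crc32` stands in for Teradata's real hashing algorithm, the map has only 64 buckets, and the rule for choosing the fallback AMP is invented. The one property it is meant to show is that a fallback copy always lands on a different AMP inside the same cluster as the base row.

```python
import zlib

AMPS = list(range(8))
CLUSTERS = [AMPS[0:4], AMPS[4:8]]  # two clusters of four, as in the picture
BUCKETS = 64                       # toy map size (an assumption)

# Primary hash map: bucket number -> AMP that owns the base row.
PRIMARY_MAP = [AMPS[bucket % len(AMPS)] for bucket in range(BUCKETS)]


def bucket_for(primary_index_value):
    # crc32 is a stand-in for the real hashing algorithm.
    return zlib.crc32(str(primary_index_value).encode()) % BUCKETS


def primary_amp(value):
    """Look up the base row's AMP in the primary hash map."""
    return PRIMARY_MAP[bucket_for(value)]


def fallback_amp(value):
    """Pick a different AMP in the same cluster as the primary AMP."""
    home = primary_amp(value)
    cluster = next(c for c in CLUSTERS if home in c)
    others = [amp for amp in cluster if amp != home]
    return others[(bucket_for(value) // len(AMPS)) % len(others)]
```

Because the fallback copy never shares an AMP with its base row and never leaves the cluster, losing one AMP in each cluster still leaves a copy of every row reachable.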

In most systems, AMPs are clustered in a group of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
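The workload side of this trade-off is simple arithmetic. The sketch below, in Python, assumes the failed AMP's work is shared evenly among the survivors in its cluster (a simplification; the function name is hypothetical). Under that assumption each surviving AMP carries n / (n - 1) times its normal load in a cluster of n.

```python
def load_factor_after_one_failure(cluster_size):
    """Workload multiplier for each surviving AMP in the cluster.

    Assumes the failed AMP's work is split evenly among the survivors.
    """
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    return cluster_size / (cluster_size - 1)


two_amp = load_factor_after_one_failure(2)       # survivor does double work
four_amp = load_factor_after_one_failure(4)      # about one third extra
sixteen_amp = load_factor_after_one_failure(16)  # about one fifteenth extra
```

The factor of 2 for a two-AMP cluster matches the "twice as long" figure above, while a four-AMP cluster keeps the extra load to roughly a third.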

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are tracked so they can be applied when the AMP returns.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates and deletes because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.
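The catch-up behavior can be sketched as a small Python model. It is a toy under stated assumptions: one cluster, rows kept in per-AMP dictionaries, and a single journal list per down AMP. The class and method names are hypothetical, and the real DARJ records more than a key and a value.

```python
class Cluster:
    """Toy cluster that keeps a recovery journal for each down AMP."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}  # per-AMP row storage
        self.down = set()
        self.darj = {}  # down AMP -> list of changes it missed

    def fail(self, amp):
        self.down.add(amp)
        self.darj[amp] = []  # the surviving AMPs open a DARJ

    def write(self, amp, key, value):
        if amp in self.down:
            # The AMP is "sleeping": remember the change for later.
            self.darj[amp].append((key, value))
        else:
            self.rows[amp][key] = value

    def recover(self, amp):
        # Replay everything that happened, then discard the journal.
        for key, value in self.darj.pop(amp):
            self.rows[amp][key] = value
        self.down.discard(amp)
```

After `recover`, the returning AMP holds every change made while it was down, and its journal no longer exists.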


In the previous picture there are two clusters, but notice that AMP one has failed. After failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). Also, none of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within the cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has been typically provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel Processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in the surviving node's memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.
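The migration can be sketched as a bookkeeping exercise in Python. This is only a model under assumptions: nodes are dictionary keys, vprocs are name strings, and orphaned vprocs are dealt round-robin to the surviving nodes of the clique. The `migrate` function and the node names are hypothetical.

```python
def migrate(nodes, failed, clique):
    """Move the failed node's vprocs to surviving nodes in its clique."""
    survivors = [n for n in clique if n != failed and n in nodes]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    orphans = nodes.pop(failed)
    for i, vproc in enumerate(orphans):
        # Each vproc keeps its own virtual disk; only its host changes.
        nodes[survivors[i % len(survivors)]].append(vproc)
    return nodes


# The two-node, 32-AMP configuration described above.
nodes = {
    "node1": [f"AMP{i}" for i in range(1, 17)],
    "node2": [f"AMP{i}" for i in range(17, 33)],
}
after = migrate(nodes, "node1", clique=["node1", "node2"])
```

After node one fails, node two hosts all 32 AMPs, which is why the surviving node slows down while the system keeps running.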

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds." Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company can tolerate a couple of hours while a restoration completes, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE or DELETE command. It keeps track of all new, deleted or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
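Both journal directions can be sketched with dictionaries in Python. This is a simplified model under assumptions: a table is a dictionary of key to row image, journal entries are (key, image) pairs in the order they were captured, and deletes are left out to keep the example short. The function names and the sample representatives are hypothetical.

```python
def roll_forward(backup, after_images):
    """Rebuild a table from a full backup plus AFTER images, oldest first."""
    table = dict(backup)
    for key, image in after_images:
        table[key] = image
    return table


def roll_back(table, before_images):
    """Undo changes with BEFORE images, newest first."""
    restored = dict(table)
    for key, image in reversed(before_images):
        restored[key] = image
    return restored


# AFTER journal: backup taken on the 1st, hardware failure on the 5th.
backup_1st = {"rep1": "Western", "rep2": "Eastern"}
after_journal = [("rep1", "Southwest"), ("rep3", "Northern")]
recovered = roll_forward(backup_1st, after_journal)

# BEFORE journal: a successful but wrong update moved everyone.
bad_table = {"rep1": "Southeast", "rep2": "Southeast"}
before_journal = [("rep1", "Western"), ("rep2", "Eastern")]
repaired = roll_back(bad_table, before_journal)
```

Rolling forward starts from the old backup and arrives at the latest good state; rolling back starts from the current bad state and returns to the state before the faulty update.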

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
   (emp INTEGER,
    dept INTEGER,
    lname CHAR(20),
    fname VARCHAR(20),
    salary DECIMAL(10,2),
    hire_date DATE FORMAT)
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line, only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables, and it restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
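The four rules above amount to a small compatibility table. The Python sketch below encodes it as a dictionary: for each lock already held, the set of requested locks that must wait in line. The names `BLOCKS` and `must_wait` are hypothetical, and the last row (a held ACCESS lock makes only an EXCLUSIVE request wait) is an assumption inferred from "prevents any access, period" rather than something the list states outright.

```python
# For each lock already held, the requested locks that must wait.
BLOCKS = {
    "EXCLUSIVE": {"EXCLUSIVE", "WRITE", "READ", "ACCESS"},
    "WRITE": {"EXCLUSIVE", "WRITE", "READ"},  # ACCESS may still proceed
    "READ": {"EXCLUSIVE", "WRITE"},           # other readers share the table
    "ACCESS": {"EXCLUSIVE"},                  # assumption, see lead-in
}


def must_wait(held, requested):
    """Return True when `requested` has to queue behind `held`."""
    return requested in BLOCKS[held]


dirty_read_allowed = not must_wait("WRITE", "ACCESS")
readers_share = not must_wait("READ", "READ")
```

A table like this makes the "cut in line" language concrete: an ACCESS request passes a WRITE lock, a READ request does not, and nothing passes an EXCLUSIVE lock.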

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into atable if it does not contain a column value that also exists in another table within the database Conversely arow with a corresponding value in another table may not be deleted unless the common value is first removedfrom the former table

An important function of RI on a newly created table is that it will not allow invalid data values to be enteredinto a column If RI is enforced on an existing table with RI violations the ALTER TABLE will proceed Plusit will copy and store the table and any related RI violations for review and correction Then the user will needto locate the table copy and then make corrections to the original table
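To see the two RI rules in action without a Teradata system, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite is not Teradata, and the table layout and the helper attempt are assumptions made for illustration; only the employee and payroll idea comes from the story above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks RI only when asked to
conn.execute("CREATE TABLE employee (emp INTEGER PRIMARY KEY, lname TEXT)")
conn.execute(
    "CREATE TABLE payroll (emp INTEGER REFERENCES employee(emp), sal INTEGER)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Johnson')")
conn.execute("INSERT INTO payroll VALUES (1, 100000)")


def attempt(sql):
    """Run one statement and report whether RI let it through."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# A payroll row for an employee who does not exist is refused.
orphan_insert = attempt("INSERT INTO payroll VALUES (99, 50000)")
# The employee cannot be deleted while a payroll row still points at him.
early_delete = attempt("DELETE FROM employee WHERE emp = 1")
# Once the payroll row is gone, the delete goes through.
conn.execute("DELETE FROM payroll WHERE emp = 1")
late_delete = attempt("DELETE FROM employee WHERE emp = 1")

print(orphan_insert, early_delete, late_delete)
```

The fired employee in the story never gets the Bahamas vacation: the database refuses the half-finished delete.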

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel) and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
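The three steps just described (deal out the blocks, hash the rows to their proper AMP, sort by Row ID) can be pictured with a short Python sketch. Everything in it is an assumption for illustration: toy_hash is a made-up stand-in for the secret Teradata hash, BLOCK_ROWS stands in for a 64K block, and the four AMPs are plain Python lists.

```python
NUM_AMPS = 4
BLOCK_ROWS = 3  # hypothetical stand-in for a 64K block of rows


def toy_hash(key):
    """Made-up multiplicative hash; NOT the Teradata hashing formula."""
    return key * 2654435761 % 2**32


def fastload(rows):
    # Step 1: cut the flat file into blocks and deal them out to the AMPs.
    blocks = [rows[i:i + BLOCK_ROWS] for i in range(0, len(rows), BLOCK_ROWS)]
    received = [[] for _ in range(NUM_AMPS)]
    for n, block in enumerate(blocks):
        received[n % NUM_AMPS].extend(block)

    # Step 2: every AMP hashes the rows it received and sends each row
    # to the AMP that owns it (the BYNET's job in a real system).
    owned = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for key, payload in amp_rows:
            row_hash = toy_hash(key)
            owned[row_hash % NUM_AMPS].append((row_hash, key, payload))

    # Step 3: each AMP sorts its own rows by row hash (our toy Row ID).
    for amp_rows in owned:
        amp_rows.sort()
    return owned


flat_file = [(n, f"row {n}") for n in range(1, 21)]
amps = fastload(flat_file)
for number, amp_rows in enumerate(amps, start=1):
    print(f"AMP {number}: {len(amp_rows)} rows")
```

In a real system steps 2 and 3 happen on all AMPs at the same time; the loops here run one after another only to keep the sketch short.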

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on! Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

If we continue the process until all data is laid out, the system would look like this:

Best_Friends Table

Friend_Num  Friend_Name
2           Ben Hon
4           Joe Davis
6           Mary Gray
8           John Davis
10          Don Roy
12          Sam Mills
14          Kyle Marx
16          Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Whiz-Bang Formula did not crack the code. The purpose is to show you how the hash map works in theory to distribute and locate rows. Simply, you should understand that the formula is mathematical (similar to the Coffing/Jones Whiz-Bang Formula) and that it will be consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.
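For readers who like to see the theory run, here is a small Python sketch of the same idea. It uses the divide-by-two Coffing/Jones formula and the 48-bucket hash map shown above; the function names and the dictionaries standing in for the four AMPs are assumptions made for illustration, and none of this is real Teradata code.

```python
NUM_AMPS = 4
# The hash map above: 48 buckets holding AMP numbers 1, 2, 3, 4, 1, 2, ...
# List index 0 is bucket one.
HASH_MAP = [(bucket % NUM_AMPS) + 1 for bucket in range(48)]


def coffing_jones_hash(primary_index):
    """The teaching formula: divide the Primary Index value by two."""
    return primary_index // 2


def amp_for(primary_index):
    bucket = coffing_jones_hash(primary_index)
    return HASH_MAP[bucket - 1]  # look in the bucket to find the AMP number


BEST_FRIENDS = {
    2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
    10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones",
}

# Distribute: every row lands on the AMP its Primary Index hashes to.
amps = {amp: {} for amp in range(1, NUM_AMPS + 1)}
for friend_num, friend_name in BEST_FRIENDS.items():
    amps[amp_for(friend_num)][friend_num] = friend_name


def select_by_primary_index(friend_num):
    """Locate: rerun the same formula and ask only the one AMP it points to."""
    amp = amp_for(friend_num)
    return amp, amps[amp].get(friend_num)


print(select_by_primary_index(8))
for amp, rows in amps.items():
    print(amp, rows)
```

Because the same formula is used to place a row and to find it again, a Primary Index lookup never has to ask more than one AMP.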

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, and it merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's abilities to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan (FTS) means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is 10 times faster than most systems. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!

Most Full Table Scans bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like, "Who are my best and worst customers?" you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns
• Each AMP passes its rows to the PE over the BYNET

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
6           Mary Gray
14          Kyle Marx
8           John Davis
16          Lyn Jones
2           Ben Hon
10          Don Roy
4           Joe Davis
12          Sam Mills
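The Full Table Scan can be sketched the same way. In this toy Python version each AMP is a list of the rows it owns (placed according to the divide-by-two teaching formula and the hash map shown earlier), and a thread pool plays the part of the AMPs working in parallel. The names AMPS, scan_amp and full_table_scan are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

# Rows grouped by owning AMP, following the teaching hash map.
AMPS = {
    1: [(2, "Ben Hon"), (10, "Don Roy")],
    2: [(4, "Joe Davis"), (12, "Sam Mills")],
    3: [(6, "Mary Gray"), (14, "Kyle Marx")],
    4: [(8, "John Davis"), (16, "Lyn Jones")],
}


def scan_amp(amp_number):
    """Each AMP reads only the rows it owns."""
    return list(AMPS[amp_number])


def full_table_scan():
    # One worker per AMP, so every AMP scans at the same time.
    with ThreadPoolExecutor(max_workers=len(AMPS)) as pool:
        per_amp = pool.map(scan_amp, AMPS)
        # The PE gathers what each AMP sends back.
        return [row for rows in per_amp for row in rows]


answer = full_table_scan()
print(len(answer), "rows returned")
for friend_num, friend_name in answer:
    print(friend_num, friend_name)
```

This sketch hands the rows back in AMP-number order. In the result shown above the rows arrive in a different order (AMP 3's rows first), presumably because the AMPs answer whenever they finish; without an ORDER BY, no particular order is promised.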

In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No! There is another option in a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, and since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

There are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space, but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It tells each AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of the AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (a smiley face in this example) is revealed, and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.
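Here is a toy Python sketch of the two-AMP USI lookup. The base rows sit where the divide-by-two teaching formula put them; the sub-table rows are placed with zlib.crc32, which is only a stand-in for the secret Teradata hash, so the index AMP chosen here will not necessarily match the AMP 2 of the diagram. All names are assumptions for illustration.

```python
import zlib

NUM_AMPS = 4
# Base table rows by owning AMP (from the teaching hash map).
BASE = {
    1: {2: "Ben Hon", 10: "Don Roy"},
    2: {4: "Joe Davis", 12: "Sam Mills"},
    3: {6: "Mary Gray", 14: "Kyle Marx"},
    4: {8: "John Davis", 16: "Lyn Jones"},
}


def index_amp(value):
    """Stand-in hash for the index value; NOT the Teradata formula."""
    return zlib.crc32(value.encode()) % NUM_AMPS + 1


# Build the USI sub-tables: each name goes to the AMP it hashes to,
# together with a pointer (base AMP, Friend_Num) to the base row.
usi = {amp: {} for amp in BASE}
for base_amp, rows in BASE.items():
    for friend_num, friend_name in rows.items():
        usi[index_amp(friend_name)][friend_name] = (base_amp, friend_num)


def select_by_usi(friend_name):
    first = index_amp(friend_name)           # AMP holding the sub-table row
    pointer = usi[first].get(friend_name)
    if pointer is None:
        return [first], None                  # no such friend
    base_amp, friend_num = pointer            # the "smiley face" row-id
    return [first, base_amp], (friend_num, BASE[base_amp][friend_num])


amps_touched, row = select_by_usi("Ben Hon")
print(amps_touched, row)
```

A NUSI would look different: each AMP would keep a sub-table covering only its own rows, so the PE would have to ask all four AMPs to check.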

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.

Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan." The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So, if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space

The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT or REVOKE rights from Tom.
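The "who owns Tom?" question is just a walk up the family tree. The short Python sketch below records the hierarchy described in this chapter as a parent lookup; the PARENT dictionary and the owners function are assumptions for illustration, not anything Teradata provides.

```python
# Each user or database and its immediate parent, as described above.
PARENT = {
    "DBC": None,
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """Everything above `name` in the hierarchy, nearest parent first."""
    result = []
    parent = PARENT[name]
    while parent is not None:
        result.append(parent)
        parent = PARENT[parent]
    return result


print(owners("Tom"))
print(owners("DBC"))
```

Every name the function returns for Tom is a parent or owner, and so may GRANT or REVOKE rights from Tom.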

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts. Then the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA, likewise, gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data, such as tables, views and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
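The difference between giving away perm and giving away spool is easy to model. The Python sketch below is a toy: the class SpaceOwner, its methods and the spool figures are hypothetical, while the 100 Gigabytes and the 80% handed to SYSDBA come from the text above.

```python
class SpaceOwner:
    """A user or database with a perm allotment and a spool limit (in GB)."""

    def __init__(self, name, perm, spool):
        self.name = name
        self.perm = perm
        self.spool = spool

    def create_child(self, name, perm, spool):
        if perm > self.perm:
            raise ValueError(f"{self.name} does not own {perm} GB of perm")
        self.perm -= perm  # perm given to a child is subtracted from the parent
        # Spool is only a limit, like a speed limit: nothing is subtracted.
        return SpaceOwner(name, perm, spool)

    def run_query(self, spool_needed):
        if spool_needed > self.spool:
            raise RuntimeError(f"{self.name} is out of spool space; query aborted")
        return "answer delivered, spool released"


dbc = SpaceOwner("DBC", perm=100, spool=100)
sysdba = dbc.create_child("SYSDBA", perm=80, spool=100)
morgan = sysdba.create_child("Morgan", perm=10, spool=50)

print(dbc.perm, dbc.spool)        # DBC keeps 20 GB of perm and all its spool
print(sysdba.perm, sysdba.spool)
print(morgan.run_query(40))
```

A query from Morgan that needs more than his 50 GB spool limit would raise the "out of spool space" error instead of returning an answer.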

The following picture shows a logical view of a Customer table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area, there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.

What is a View

At Christmas time, no one cares about the past or the future. All that matters is the present! One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select a view of a table(s), the data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
1    10    Johnson   Manny   100000
2    20    Carlsbad  Jan     100000
22   30    Winter    Steve   77000
25   10    Lester    Bonnie  56000
33   10    Samuels   Todd    120000
99   20    Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. When users are denied access to the employee table but are given access rights to the EMPLOY_V view, security is enhanced. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
1    10    Johnson   Manny
2    20    Carlsbad  Jan
22   30    Winter    Steve
25   10    Lester    Bonnie
33   10    Samuels   Todd
99   20    Walter    Misha
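If you have no Teradata system at hand, the same view can be tried out with Python's built-in sqlite3 module. This is a stand-in sketch only: SQLite is not Teradata, it has no access rights to GRANT or REVOKE, and the lower-case names are simply this example's choice. It does show the window-shopping effect: the salary column cannot be seen through the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee "
    "(emp INTEGER, dept INTEGER, lname TEXT, fname TEXT, sal INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no rows are copied.
conn.execute(
    "CREATE VIEW employ_v AS SELECT emp, dept, lname, fname FROM employee"
)

cursor = conn.execute("SELECT * FROM employ_v")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)
print(len(rows), "rows returned")

# Asking the view for the hidden column fails.
try:
    conn.execute("SELECT sal FROM employ_v")
    salary_visible = True
except sqlite3.OperationalError:
    salary_visible = False
print("salary visible through the view:", salary_visible)
```

On Teradata, the second half of the protection comes from access rights: users are given rights on the view and none on the employee table itself.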

What is a Macro

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10
• What employees are in department 20, and
• A list of employee names sorted by last name

A macro can easily be created to run all three commands The syntax would be

CREATE MACRO Emp_mac AS(

SELECT from Employ_v WHERE dept = 10

SELECT from Employ_v WHERE dept = 20

SELECT FROM Employ_v Order by lname)

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
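The "all work or none work" behavior of a macro can be imitated outside Teradata. The sketch below is only an analogy: it uses Python's built-in sqlite3 module, and the function run_macro and the table dept_log are hypothetical names invented for the illustration. It runs a list of statements as one transaction and rolls everything back if any statement fails.

```python
import sqlite3

def run_macro(conn, statements):
    """Run every statement as one unit of work: commit all or roll back all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for sql in statements:
                conn.execute(sql)
        return True
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept_log (id INTEGER PRIMARY KEY)")

# Both inserts succeed, so both are committed.
ok = run_macro(conn, ["INSERT INTO dept_log VALUES (1)",
                      "INSERT INTO dept_log VALUES (2)"])

# The second insert hits a duplicate key, so the first one is undone too.
failed = run_macro(conn, ["INSERT INTO dept_log VALUES (3)",
                          "INSERT INTO dept_log VALUES (1)"])

count = conn.execute("SELECT COUNT(*) FROM dept_log").fetchone()[0]
```

After the failed run, row 3 is not in the table, which mirrors how Teradata treats the statements of a macro as a single transaction.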

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan


I taught in one place that was so rough ... security actually checked me for weapons. When they found out that I had no weapons, they gave me some!

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform the given operation on the specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it ... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the U.S. Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has explicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract: it is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails at just the wrong time? Or an AMP or disk goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the affected rows can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message;
• The entire transaction is rolled back and any changes made to the database are reversed;
• Locks are released; and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal." Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data then reverts to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness." If it is clean, then things went well.
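The before-image idea can be pictured with a few lines of Python. This is a toy model, not Teradata's implementation: the class Amp and its methods are hypothetical names, rows are plain dictionary entries, and passing None as a value stands in for a DELETE.

```python
class Amp:
    """Toy AMP: holds rows plus a transient journal of before-images."""

    def __init__(self, rows):
        self.rows = dict(rows)   # row id -> row value
        self.journal = {}        # row id -> before-image (None = row did not exist)

    def write(self, row_id, value):
        # Take the BEFORE picture only once per row per transaction.
        if row_id not in self.journal:
            self.journal[row_id] = self.rows.get(row_id)
        if value is None:
            self.rows.pop(row_id, None)   # DELETE
        else:
            self.rows[row_id] = value     # INSERT or UPDATE

    def commit(self):
        # Success: the journal is wiped clean.
        self.journal.clear()

    def rollback(self):
        # Failure: put every touched row back the way it was.
        for row_id, before in self.journal.items():
            if before is None:
                self.rows.pop(row_id, None)
            else:
                self.rows[row_id] = before
        self.journal.clear()


amp = Amp({1: 50000})
amp.write(1, 100000)   # accidental raise
amp.write(2, 40000)    # new row in the same transaction
amp.rollback()         # transaction aborted
```

After the rollback, the AMP again holds only the original row, and its journal is empty.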

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK-protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission-critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2,000 AMPs. The chance of losing two AMPs in a 2,000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. The output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.
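The path from Primary Index value to AMP number can be sketched in Python. Everything specific here is an assumption made for illustration: the hash function is a SHA-256 prefix rather than Teradata's own algorithm, and NUM_BUCKETS, hash_map, row_hash, and amp_for are hypothetical names.

```python
import hashlib

NUM_BUCKETS = 65536   # assumed hash-map size, for illustration only
NUM_AMPS = 8          # the eight-AMP system described above

# Hash map: bucket number -> AMP number (buckets dealt out evenly to AMPs 1..8).
hash_map = [bucket % NUM_AMPS + 1 for bucket in range(NUM_BUCKETS)]

def row_hash(primary_index_value):
    """Stand-in hash: first 4 bytes of SHA-256 (not Teradata's algorithm)."""
    digest = hashlib.sha256(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def amp_for(primary_index_value):
    """Hash the Primary Index, find its bucket, and read the AMP number from it."""
    bucket = row_hash(primary_index_value) % NUM_BUCKETS
    return hash_map[bucket]
```

The same Primary Index value always lands on the same AMP, which is what lets the system find a row again without searching every AMP.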

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
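A small Python sketch can make the cluster rule concrete. The placement rule in fallback_amp is hypothetical (the text does not say how the Fallback Hash Map chooses the AMP); the only property taken from the text is that the FALLBACK copy lives on a different AMP in the same cluster.

```python
CLUSTERS = {1: [1, 2, 3, 4], 2: [5, 6, 7, 8]}   # eight AMPs, two clusters

def cluster_of(amp):
    """Return the list of AMPs that share a cluster with `amp`."""
    for members in CLUSTERS.values():
        if amp in members:
            return members
    raise ValueError(f"unknown AMP {amp}")

def fallback_amp(primary_amp, row_hash):
    """Pick a different AMP in the same cluster (hypothetical placement rule)."""
    others = [a for a in cluster_of(primary_amp) if a != primary_amp]
    return others[row_hash % len(others)]

def data_available(failed_amps):
    """All rows stay reachable as long as no cluster has lost two AMPs."""
    failed = set(failed_amps)
    return all(len(failed & set(members)) <= 1 for members in CLUSTERS.values())
```

For example, data_available([1, 5]) is True because each cluster lost only one AMP, while data_available([1, 2]) is False because rows whose base and FALLBACK copies sit on AMPs 1 and 2 can no longer be reached.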

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
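The trade-off can be put into numbers. The sketch below assumes the failed AMP's work is shared evenly among the survivors in its cluster; load_factor is a hypothetical helper, not a Teradata function.

```python
from fractions import Fraction

def load_factor(cluster_size):
    """Work per surviving AMP after one AMP in the cluster fails, relative to normal."""
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    return Fraction(cluster_size, cluster_size - 1)

two = load_factor(2)       # the lone survivor does double duty
four = load_factor(4)      # each of three survivors takes on one third more
sixteen = load_factor(16)  # each of fifteen survivors takes on one fifteenth more
```

Under that assumption, a two-AMP cluster doubles the survivor's load, a four-AMP cluster raises it by a third, and a 16-AMP cluster raises it by only a fifteenth.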

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, including all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be used on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping, starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in the coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was his fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ catches the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.
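The catch-up behavior can be modeled in a few lines of Python. This is a simplified sketch with hypothetical names (Cluster, fail, write, recover): it keeps a single journal per cluster and handles only one down AMP, whereas the text describes the surviving AMPs in the cluster maintaining the journal.

```python
class Cluster:
    """Toy cluster: one dict of rows per AMP plus a recovery journal for a down AMP."""

    def __init__(self, amps):
        self.rows = {amp: {} for amp in amps}
        self.down = None   # at most one down AMP in this sketch
        self.darj = []     # (row_id, value) changes missed by the down AMP

    def fail(self, amp):
        self.down, self.darj = amp, []

    def write(self, amp, row_id, value):
        if amp == self.down:
            self.darj.append((row_id, value))   # remember it for later
        else:
            self.rows[amp][row_id] = value

    def recover(self):
        for row_id, value in self.darj:         # replay in the original order
            self.rows[self.down][row_id] = value
        self.down, self.darj = None, []         # the journal is discarded


cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "a", 1)
cluster.fail(1)
cluster.write(1, "a", 2)   # missed by AMP 1, recorded in the journal
cluster.write(1, "b", 3)
rows_while_down = dict(cluster.rows[1])
cluster.recover()
```

While AMP 1 is down its rows are frozen at their old values; after recover() they reflect every change made in the meantime.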


In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to the hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it is still fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The downside of RAID-1, like FALLBACK, is that it requires a 50% overhead in disk space: half of the raw capacity holds mirror copies.
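The dual write and the rebuild from the surviving copy can be sketched as follows. MirroredDisk is a hypothetical toy class; a real Disk Array Controller works at the block-device level and is far more involved.

```python
class MirroredDisk:
    """Toy RAID-1 pair: every write lands on both the primary and the mirror."""

    def __init__(self):
        self.copies = {"primary": {}, "mirror": {}}
        self.failed = set()

    def write(self, block, data):
        # The caller issues one write; both healthy copies receive it.
        for name, disk in self.copies.items():
            if name not in self.failed:
                disk[block] = data

    def read(self, block):
        # Serve the read from whichever copy is still healthy.
        for name, disk in self.copies.items():
            if name not in self.failed:
                return disk[block]
        raise IOError("both copies are down")

    def fail(self, name):
        self.failed.add(name)

    def replace(self, name):
        # Rebuild the replacement disk from the surviving copy.
        survivor = "mirror" if name == "primary" else "primary"
        self.copies[name] = dict(self.copies[survivor])
        self.failed.discard(name)


disk = MirroredDisk()
disk.write(0, "Ben Hon")
disk.fail("primary")
still_readable = disk.read(0)   # served from the mirror
disk.write(1, "Don Roy")        # operations continue on the mirror alone
disk.replace("primary")         # new primary is copied from the mirror
```

The caller never chooses a copy, which is the sense in which the mirroring is transparent.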

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in the surviving node's memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disks via a different physical cable.

People who come from colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica - a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. This would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost-effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day!

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, usually to restore data lost as a result of a hardware problem.
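Both scenarios above come down to replaying stored row images. The Python sketch below is a toy model: roll_forward and roll_back are hypothetical helpers, tables are plain dictionaries keyed by row id, and an image of None marks a row that was deleted (AFTER) or did not yet exist (BEFORE).

```python
def roll_forward(backup, after_images):
    """Rebuild a table from a backup plus AFTER images recorded since then."""
    table = dict(backup)
    for row_id, image in after_images:   # oldest change first
        if image is None:
            table.pop(row_id, None)      # the row was deleted
        else:
            table[row_id] = image
    return table

def roll_back(current, before_images):
    """Undo changes by restoring BEFORE images, newest change first."""
    table = dict(current)
    for row_id, image in reversed(before_images):
        if image is None:
            table.pop(row_id, None)      # the row did not exist before
        else:
            table[row_id] = image
    return table


# AFTER journal: restore the backup from the 1st, then reapply later changes.
backup = {1: "West", 2: "East"}
restored = roll_forward(backup, [(1, "Southwest"), (3, "North")])

# BEFORE journal: undo the accidental transfer of every representative.
damaged = {1: "Southeast", 2: "Southeast"}
repaired = roll_back(damaged, [(1, "West"), (2, "East")])
```

The region names are sample data chosen to echo the story above, not values from a real system.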

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
(
    emp        INTEGER,
    dept       INTEGER,
    lname      CHAR(20),
    fname      VARCHAR(20),
    salary     DECIMAL(10,2),
    hire_date  DATE FORMAT
)
UNIQUE PRIMARY INDEX (emp);

The example above created the table called employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait a long time in line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are placed on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps into action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read lock can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
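The four lock modes can be summarized as a compatibility table. The Python sketch below encodes the rules described in the list above; COMPATIBLE and can_grant are hypothetical names, and queueing order is left out to keep the example short.

```python
# For each requested lock: the held locks it can coexist with.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}

def can_grant(requested, held_locks):
    """True if `requested` is compatible with every lock currently held."""
    return all(held in COMPATIBLE[requested] for held in held_locks)


many_readers = can_grant("READ", ["READ", "READ"])    # readers share the table
writer_waits = can_grant("WRITE", ["READ"])           # a writer waits for readers
dirty_read = can_grant("ACCESS", ["WRITE"])           # ACCESS reads past a writer
blocked = can_grant("ACCESS", ["EXCLUSIVE"])          # nothing gets past EXCLUSIVE
```

A request that cannot be granted simply waits in line until the conflicting locks are released.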

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30!" Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation! Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table unless the value in its referencing column also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the table that references it.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is added to an existing table that contains RI violations, the ALTER TABLE will still proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate that table copy and make corrections to the original table.
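The employee and payroll example can be tried out with any database that enforces foreign keys. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Teradata; the column names, the sample salary, and the helper attempt are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces RI only when asked
conn.execute("CREATE TABLE employee (emp INTEGER PRIMARY KEY, lname TEXT)")
conn.execute("CREATE TABLE payroll (emp INTEGER REFERENCES employee(emp), salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Johnson')")
conn.execute("INSERT INTO payroll VALUES (1, 50000.0)")   # invented sample salary
conn.commit()

def attempt(sql):
    """Run one statement and report whether RI allowed it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# No employee 99 exists, so a payroll row for 99 is rejected.
orphan_insert = attempt("INSERT INTO payroll VALUES (99, 1.0)")
# Employee 1 still has a payroll row, so the delete is rejected.
early_delete = attempt("DELETE FROM employee WHERE emp = 1")
# Remove the payroll row first, and then the employee delete goes through.
payroll_delete = attempt("DELETE FROM payroll WHERE emp = 1")
late_delete = attempt("DELETE FROM employee WHERE emp = 1")
```

The order matters: the referencing payroll row has to go before the employee row it points to.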


Loading the Data

Overview

One night, I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that - just loading data. Some people actually ask, "Are data loads that critical?" Come on: ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced bodybuilder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table Thisis how a Teradata table is populated the first time I have personally seen Teradata load over one billion largerows in less than 6 hours Plus I have seen Teradata load millions of rows in minutes Teradata has the quickesttime to solution and has the most powerful performance in the data warehousing industry How is Teradatasspeed and performance accomplished Its done through parallel processing

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
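The three phases just described (hand out blocks, redistribute by hash, sort) can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the four-AMP system, the two-row "blocks", the sample rows, and the `owning_amp` formula are all made up for illustration, and the real Teradata hash formula is not public.

```python
from collections import defaultdict

NUM_AMPS = 4  # assumed system size, chosen only for illustration


def owning_amp(primary_index: int) -> int:
    """Toy stand-in for the real (undisclosed) hash formula: halve the
    value to get a bucket, then map buckets to AMPs 1..4 in rotation."""
    bucket = primary_index // 2
    return (bucket - 1) % NUM_AMPS + 1


def fastload(rows, rows_per_block=2):
    # Phase 1: hand fixed-size blocks to the AMPs in turn, ignoring content.
    blocks = [rows[i:i + rows_per_block]
              for i in range(0, len(rows), rows_per_block)]
    received = defaultdict(list)
    for n, block in enumerate(blocks):
        received[n % NUM_AMPS + 1].extend(block)

    # Phase 2: every AMP hashes the rows it received and forwards each row
    # to the AMP that owns it (the BYNET's job in the real system).
    owned = defaultdict(list)
    for amp in range(1, NUM_AMPS + 1):
        for row in received[amp]:
            owned[owning_amp(row[0])].append(row)

    # Phase 3: each AMP sorts its rows (by key here, standing in for Row ID).
    return {amp: sorted(owned[amp]) for amp in range(1, NUM_AMPS + 1)}


# Hypothetical flat file contents: (key, name) pairs in arrival order.
flat_file = [(16, "Lyn Jones"), (2, "Ben Hon"), (14, "Kyle Marx"),
             (4, "Joe Davis"), (12, "Sam Mills"), (6, "Mary Gray"),
             (10, "Don Roy"), (8, "John Davis")]
table = fastload(flat_file)
for amp, amp_rows in table.items():
    print(amp, amp_rows)
```

The point of the sketch is that no AMP needs to know where a row belongs when the block first arrives; ownership is only settled in the second phase, which every AMP performs at the same time.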

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to load immediately into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load at different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.
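The faucet idea, a load rate that is preset by time of day, can be pictured with a small Python sketch. Everything here is hypothetical: the schedule, the rates, and the function names are invented for illustration and do not describe Tpump's actual parameters or syntax.

```python
def rate_for_hour(hour, schedule, default_rate):
    """Return the statements-per-minute rate in force at a given hour (0-23)."""
    for start, end, rate in schedule:
        if start <= hour < end:
            return rate
    return default_rate


# Hypothetical schedule: (start hour, end hour, statements per minute).
SCHEDULE = [
    (8, 18, 100),     # data warehouse rush hour: just a trickle
    (18, 22, 5_000),  # evening: open the faucet part way
]
OFF_PEAK_RATE = 60_000  # overnight: full throttle


def minutes_to_drain(backlog, hour):
    """How long a backlog of statements takes at the rate for that hour."""
    rate = rate_for_hour(hour, SCHEDULE, OFF_PEAK_RATE)
    return -(-backlog // rate)  # ceiling division


for hour in (2, 9, 20):
    print(hour, rate_for_hour(hour, SCHEDULE, OFF_PEAK_RATE),
          minutes_to_drain(1_000_000, hour))
```

With these made-up numbers, a backlog of one million statements drains in minutes overnight but would take days at the rush-hour trickle, which is exactly the trade the faucet is meant to make.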

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison averaged only 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

Best_Friends Table

Friend_Num   Friend_Name
2            Ben Hon
4            Joe Davis
6            Mary Gray
8            John Davis
10           Don Roy
12           Sam Mills
14           Kyle Marx
16           Lyn Jones

HASH MAP

1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4
1 2 3 4 1 2
3 4 1 2 3 4

Remember, the Teradata hashing formula is a secret, and the Coffing/Jones Whiz-Bang Formula did not crack the code. Its purpose is to show you how the hash map works in theory to distribute and locate rows. Simply put, you should understand that the formula is mathematical (similar to the Coffing/Jones Whiz-Bang Formula) and that it will be consistent. When we divided Friend_Num two by two, we got bucket one in the hash map. If we ran the formula on this premise a million times, we would still get the same result.

"If you always do what you always did, you'll always get what you always got."

Verne Hill

In summary, Teradata will always be able to find a row if it knows the Primary Index. It can rerun the hash formula, point to the bucket in the hash map, and then retrieve the row from the correct AMP. The Teradata hashing formula always does what it always did and always gets what it always got. Since it always runs the same formula, it is consistent.
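To make the bucket lookup concrete, here is a minimal Python sketch of the divide-by-two teaching formula and the first two rows of the hash map above. It is not Teradata's real hash formula (which is not public); the function names are made up for illustration, and only the first twelve buckets are modeled.

```python
from collections import Counter

# First two rows of the hash map shown above: bucket 1 -> AMP 1, bucket 2 -> AMP 2, ...
HASH_MAP = [1, 2, 3, 4, 1, 2,
            3, 4, 1, 2, 3, 4]


def bucket_for(friend_num: int) -> int:
    """Teaching formula from the text: divide the Primary Index by two."""
    return friend_num // 2


def amp_for(friend_num: int) -> int:
    """Look up the AMP number stored in the row's hash map bucket."""
    return HASH_MAP[bucket_for(friend_num) - 1]  # buckets are numbered from 1


friend_nums = [2, 4, 6, 8, 10, 12, 14, 16]
placement = {num: amp_for(num) for num in friend_nums}
rows_per_amp = Counter(placement.values())

print(placement)
print(rows_per_amp)
```

Because `amp_for` depends only on the Primary Index value, rerunning it for the same value always lands on the same AMP, and in this toy system the eight rows spread evenly, two per AMP.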

Retrieving the Data

When Teradata needs to retrieve data, the fastest and most efficient way is via the Primary Index. An example of SQL showing how Teradata retrieves the data follows:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Num = 8;

The Parsing Engine understands that the user wants two columns, Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.

The Full Table Scan

"What matters is not the size of the dog in the fight, but the size of the fight in the dog."

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that it is not the size of the opponent that is to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's ability to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so can speed up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. Because the 10 AMPs read at the same time, this process is about 10 times faster than reading all 100 rows one after another. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!
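The 100-row, 10-AMP example can be sketched in Python as follows. This is a structural illustration only: the round-robin row placement and the `scan` function are assumptions made for the sketch, and Python threads stand in for AMPs without reproducing the real speedup of independent hardware.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 10

# Spread 100 rows over 10 AMPs so that each AMP owns exactly 10 of them.
amps = [[] for _ in range(NUM_AMPS)]
for row_id in range(100):
    amps[row_id % NUM_AMPS].append({"row_id": row_id})


def scan(amp_rows):
    """One AMP's share of the Full Table Scan: read only the rows it owns."""
    return list(amp_rows)


# Every AMP scans at the same time; the PE gathers the partial answers.
with ThreadPoolExecutor(max_workers=NUM_AMPS) as pool:
    partial_results = list(pool.map(scan, amps))

answer_set = [row for part in partial_results for row in part]
print(len(answer_set), "rows returned;",
      max(len(part) for part in partial_results), "rows read per AMP")
```

Each worker touches 10 rows rather than 100, which is where the "10 times faster" figure in the text comes from; adding AMPs shrinks each worker's share further.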

Most FTS bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to ask literally any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns
• Each AMP passes its rows to the PE over the BYNET

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num   Friend_Name
6            Mary Gray
14           Kyle Marx
8            John Davis
16           Lyn Jones
2            Ben Hon
10           Don Roy
4            Joe Davis
12           Sam Mills

In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option: a Secondary Index.

Secondary Indexes

"Measure a thousand times and cut once."

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and add overhead, they should only be used on KNOWN QUERIES, that is, queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

"Measure a thousand query times and create a secondary index."

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It also tells the AMPs to place the hash in a secondary index sub-table, along with the Row-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base Row-ID. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking WHERE Friend_Name = 'Ben Hon'. The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once that is complete, Teradata can see the real Row-ID and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the Row-ID (the smiley face in this example) is revealed, and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations
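A rough Python sketch of the two-AMP lookup follows. It makes several simplifying assumptions that are not from the text: CRC32 stands in for the undisclosed hash formula, the pointer stored in the sub-table is the Primary Index value rather than a true Row-ID, and the dictionaries `base` and `usi` are invented names for each AMP's base rows and index sub-table.

```python
import zlib

NUM_AMPS = 4


def amp_for_key(value) -> int:
    """Toy hash (CRC32) standing in for Teradata's formula; AMPs are 1..4."""
    return zlib.crc32(str(value).encode()) % NUM_AMPS + 1


friends = {2: "Ben Hon", 4: "Joe Davis", 6: "Mary Gray", 8: "John Davis",
           10: "Don Roy", 12: "Sam Mills", 14: "Kyle Marx", 16: "Lyn Jones"}

base = {amp: {} for amp in range(1, NUM_AMPS + 1)}  # base rows, by Friend_Num
usi = {amp: {} for amp in range(1, NUM_AMPS + 1)}   # sub-table: name -> pointer

for num, name in friends.items():
    base[amp_for_key(num)][num] = name   # base row lives where the PI hashes
    usi[amp_for_key(name)][name] = num   # index row lives where the name hashes


def select_by_name(name):
    """Return the matching row and the list of AMPs that had to do work."""
    touched = [amp_for_key(name)]        # AMP holding the index sub-table row
    pointer = usi[touched[0]].get(name)
    if pointer is None:
        return None, touched
    touched.append(amp_for_key(pointer))  # AMP holding the base row
    return (pointer, base[touched[1]][pointer]), touched


row, amps_used = select_by_name("Ben Hon")
print(row, "found using AMPs", amps_used)
```

However many AMPs the system has, the lookup never visits more than two of them: one for the index sub-table row and one for the base row.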

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of ongoing queries. Secondary indexes take up space and add overhead, but boy, can they speed up queries!

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index directly. They run their normal joins, and the PE checks to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.

Teradata Databases, Users and Space

Overview

"Choose a job you like, and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space to store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan." The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space

The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from him.
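The ownership hierarchy and the way perm space is handed down can be modeled with a short Python sketch. The `Owner` class and its methods are invented for illustration; the 100 GB system and the 80/20 split come from the text, while the amounts given to Morgan and Tom are assumptions.

```python
class Owner:
    """A user or database that owns perm space (hypothetical model)."""

    def __init__(self, name, perm_gb, parent=None):
        self.name = name
        self.perm_gb = perm_gb
        self.parent = parent
        self.children = []

    def create_child(self, name, perm_gb):
        # Space given to a child is subtracted from the giver's own total.
        if perm_gb > self.perm_gb:
            raise ValueError(f"{self.name} does not own {perm_gb} GB to give")
        self.perm_gb -= perm_gb
        child = Owner(name, perm_gb, parent=self)
        self.children.append(child)
        return child

    def owners(self):
        """Everyone above this object in the hierarchy, nearest first."""
        chain, node = [], self.parent
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


dbc = Owner("DBC", 100)                    # DBC starts out owning everything
sysdba = dbc.create_child("SYSDBA", 80)    # about 80% goes to SYSDBA
morgan = sysdba.create_child("Morgan", 10)  # assumed amount
tom = morgan.create_child("Tom", 4)         # assumed amount

print(tom.owners())
print({o.name: o.perm_gb for o in (dbc, sysdba, morgan, tom)})
```

Walking up from Tom yields Morgan, SYSDBA, and DBC, the same answer the text gives, and the four balances still add up to the original 100 GB because space is only ever moved, never created.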

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, the AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data, such as tables, views, and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.

The following picture shows a logical view of a Customer table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.
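The life cycle of spool (fill it while the query runs, abort if the limit is exceeded, release it afterwards) can be sketched in Python. This is a simplified model under stated assumptions: the four sample rows are made up, the limit is counted in rows rather than bytes, and `run_query` and `SpoolLimitExceeded` are hypothetical names.

```python
class SpoolLimitExceeded(Exception):
    """Raised when a query needs more spool than the user's upper limit."""


# Four sample employees (Emp, Dept, Lname, Fname, Sal); values are assumed.
EMPLOYEE = [
    (1, 10, "Johnson", "Manny", 100000),
    (2, 20, "Carlsbad", "Jan", 100000),
    (22, 30, "Winter", "Steve", 77000),
    (25, 10, "Lester", "Bonnie", 56000),
]

spool = []  # temporary work area, empty between queries


def run_query(rows, predicate, spool_limit_rows):
    """Build the answer set in spool, hand it back, then release the spool."""
    try:
        for row in rows:
            if predicate(row):
                if len(spool) >= spool_limit_rows:
                    raise SpoolLimitExceeded("query aborted: out of spool")
                spool.append(row)
        return list(spool)  # the answer delivered to the user
    finally:
        spool.clear()       # released whether the query finished or aborted


dept_10 = run_query(EMPLOYEE, lambda row: row[1] == 10, spool_limit_rows=10)
print(dept_10)
print("spool rows held after the query:", len(spool))
```

The `finally` clause mirrors the behavior described in the text: spool is handed back as soon as the query ends, so the same limit is available again for the user's next query.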

What is a View

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column yet still allows an administrative associate to view the names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored on the disks a second time, so the view does not duplicate data or take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname    Sal
1     10     Johnson    Manny    100000
2     20     Carlsbad   Jan      100000
22    30     Winter     Steve    77000
25    10     Lester     Bonnie   56000
33    10     Samuels    Todd     120000
99    20     Walter     Misha    104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha
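The same view can be tried out without a Teradata system. The Python sketch below uses the standard library's SQLite engine purely as a stand-in, which is an assumption of this example: SQLite is not Teradata, it has no user-level access rights, and so only the column filtering is demonstrated, not the security side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no employee data is copied.
conn.execute(
    "CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

cursor = conn.execute("SELECT * FROM Employ_V")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```

Selecting everything from the view returns all six employees but only four columns, so the salary column never reaches the person running the query.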

What is a Macro

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10
• What employees are in department 20, and
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS
(
SELECT * FROM Employ_v WHERE Dept = 10;
SELECT * FROM Employ_v WHERE Dept = 20;
SELECT * FROM Employ_v ORDER BY Lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
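The "all work or none work" behavior of a macro can be imitated in Python with the standard library's SQLite engine. This is only an analogy under stated assumptions: SQLite has no macros, `execute_macro` is a hypothetical helper, and a small table named Employ_v stands in for the view.

```python
import sqlite3


def execute_macro(conn, statements):
    """Run the statements as one unit; undo everything if any of them fails."""
    results = []
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for sql in statements:
                results.append(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employ_v (Emp INTEGER, Dept INTEGER, Lname TEXT)")
conn.executemany(
    "INSERT INTO Employ_v VALUES (?, ?, ?)",
    [(1, 10, "Johnson"), (2, 20, "Carlsbad"), (25, 10, "Lester")],
)
conn.commit()

emp_mac = [
    "SELECT * FROM Employ_v WHERE Dept = 10",
    "SELECT * FROM Employ_v WHERE Dept = 20",
    "SELECT * FROM Employ_v ORDER BY Lname",
]
reports = execute_macro(conn, emp_mac)

# A second "macro" whose last step fails: its first step must be undone too.
broken_mac = ["UPDATE Employ_v SET Dept = 99", "SELECT * FROM No_Such_Table"]
failed = execute_macro(conn, broken_mac)
depts_after = [row[0] for row in conn.execute(
    "SELECT Dept FROM Employ_v ORDER BY Emp")]

print(reports)
print(failed, depts_after)
```

The three reports come back together from one call, and when the second group hits a bad statement, the department numbers are left exactly as they were before it ran.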

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data without the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform the given operation on the specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on everyone listed below it. MRKT has Implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
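The difference between implicit and explicit rights can be sketched in a few lines of Python. This is only a toy model, not how Teradata stores rights in the DBC: the shape of the ownership chart, the `PARENT` table, and the function names are all assumptions made for illustration.

```python
# Hypothetical ownership chart: child -> parent (owner).
# The exact shape of the book's picture is an assumption.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "Morgan": "SYSDBA",
    "Mary": "MRKT",
    "Tom": "Morgan",
}

# Explicit rights: (grantee, privilege, object) triples recorded by a grant.
explicit_rights = set()


def owners_of(name):
    """Return every owner above `name` in the chart, nearest first."""
    owners = []
    while name in PARENT:
        name = PARENT[name]
        owners.append(name)
    return owners


def has_implicit_rights(owner, target):
    """Anyone listed above `target` may GRANT or REVOKE on it."""
    return owner in owners_of(target)


def grant(grantor, grantee, privilege, obj):
    """Record an explicit right if the grantor is, or owns, the object."""
    if grantor != obj and not has_implicit_rights(grantor, obj):
        raise PermissionError(f"{grantor} cannot grant rights on {obj}")
    explicit_rights.add((grantee, privilege, obj))


# Tom lets Mary create tables in his database.
grant("Tom", "Mary", "CREATE TABLE", "Tom")
```

Nothing had to be recorded for DBC or Morgan to have rights over Tom, because those rights are implied by position in the chart. Mary's right exists only because somebody granted it.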

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

Please sleep on it tonight, and if you wake up in the morning, let me know what you think.

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

The afternoon knows what the morning never suspected.

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected." Likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't become corrupt. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the affected rows can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal." Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data then reverts to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, "Transient Journal Cleanliness is next to Godliness." If it is clean, then things went well.
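The before-picture idea is easy to try out in a toy model. The sketch below is not Teradata's implementation. The `Amp` class, its method names, and the use of `None` to stand for a DELETE are all assumptions chosen to keep the example short.

```python
class Amp:
    """Toy AMP: rows keyed by id, plus a transient journal of before-images."""

    _MISSING = object()  # marks "row did not exist before the change"

    def __init__(self):
        self.rows = {}
        self.journal = []  # list of (row_id, before_image)

    def write(self, row_id, new_value):
        # Take the BEFORE picture first, then change the row.
        before = self.rows.get(row_id, self._MISSING)
        self.journal.append((row_id, before))
        if new_value is None:        # None models a DELETE
            self.rows.pop(row_id, None)
        else:                        # INSERT or UPDATE
            self.rows[row_id] = new_value

    def commit(self):
        # Success: the journal is wiped clean.
        self.journal.clear()

    def rollback(self):
        # Failure: restore before-images, newest first.
        while self.journal:
            row_id, before = self.journal.pop()
            if before is self._MISSING:
                self.rows.pop(row_id, None)
            else:
                self.rows[row_id] = before


amp = Amp()
amp.write(1, {"name": "Ben Hon", "salary": 100})
amp.commit()

amp.write(1, {"name": "Ben Hon", "salary": 200})  # the accidental 100% raise
amp.write(2, {"name": "Don Roy", "salary": 90})
amp.rollback()                                    # the transaction failed
```

After the rollback, the AMP holds only the committed row for Ben Hon with his original salary, and the journal is empty again.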

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

If you're not TRUE to your teeth, they'll be FALSE to you.

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

You can't step into the same river twice.

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice, if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table, Best_Friends (listed at the top of all disks), is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
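The two hash maps can be imitated with a small Python sketch. Treat it as an illustration only: `zlib.crc32` merely stands in for Teradata's hashing algorithm, AMPs are numbered from 0 for convenience, and the rule used here to pick the FALLBACK AMP is an assumption, since the real Fallback Hash Map is not described in this book.

```python
import zlib

CLUSTER_SIZE = 4
NUM_AMPS = 8  # two clusters of four, as in the picture


def row_hash(primary_index_value):
    # Stand-in for Teradata's hashing algorithm (an assumption).
    return zlib.crc32(str(primary_index_value).encode())


def primary_amp(pi_value):
    """Primary map: the AMP that owns the base row."""
    return row_hash(pi_value) % NUM_AMPS


def fallback_amp(pi_value):
    """Fallback map: a different AMP inside the primary AMP's cluster."""
    primary = primary_amp(pi_value)
    cluster_start = (primary // CLUSTER_SIZE) * CLUSTER_SIZE
    # An offset of 1..CLUSTER_SIZE-1 means the copy never lands on the primary.
    offset = 1 + (row_hash(pi_value) // NUM_AMPS) % (CLUSTER_SIZE - 1)
    return cluster_start + (primary - cluster_start + offset) % CLUSTER_SIZE


def readable_amp(pi_value, down_amps):
    """Return an AMP that can serve the row, or None if both copies are down."""
    for amp in (primary_amp(pi_value), fallback_amp(pi_value)):
        if amp not in down_amps:
            return amp
    return None
```

Because the base row and its FALLBACK copy always sit on two different AMPs of the same cluster, `readable_amp` finds a copy for every row as long as no cluster has lost more than one AMP.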

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)
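The trade-off between small and large clusters comes down to simple arithmetic. The sketch below is a back-of-the-envelope model, not a Teradata formula. It assumes that work is spread evenly over the surviving AMPs and that any two AMPs in the system are equally likely to be the two that fail.

```python
def survivor_load_factor(cluster_size):
    """Work per surviving AMP, relative to normal, after one AMP fails."""
    return cluster_size / (cluster_size - 1)


def same_cluster_probability(num_amps, cluster_size):
    """Chance that two failed AMPs share a cluster.

    Assumes the two failed AMPs are a uniformly random pair: once the
    first AMP is down, cluster_size - 1 of the remaining num_amps - 1
    AMPs are its cluster mates.
    """
    return (cluster_size - 1) / (num_amps - 1)


for size in (2, 4, 16):
    load = survivor_load_factor(size)
    risk = same_cluster_probability(2000, size)
    print(f"cluster of {size:2d}: load x{load:.2f}, risk {risk:.4f}")
```

Under these assumptions, a cluster of two doubles the survivor's workload, a cluster of four adds one third to each survivor, and a cluster of 16 adds only one fifteenth. The risk that a second failure hits the same cluster moves the other way and grows with cluster size.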

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, including all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.
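Here is a minimal sketch of that catch-up behavior. It is a simplification: the `Cluster` class and its methods are hypothetical names, and the model keeps a single journal per down AMP without showing the FALLBACK copies that keep serving queries in the meantime.

```python
class Cluster:
    """Toy cluster: each AMP holds rows, with one recovery journal per down AMP."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}
        self.down = set()
        self.darj = {}  # down AMP id -> list of (row_id, value) it missed

    def fail(self, amp):
        self.down.add(amp)
        self.darj[amp] = []  # the surviving AMPs open a DARJ

    def write(self, amp, row_id, value):
        if amp in self.down:
            self.darj[amp].append((row_id, value))  # remember it for later
        else:
            self.rows[amp][row_id] = value

    def recover(self, amp):
        # Replay everything the AMP missed, in order, then discard the journal.
        for row_id, value in self.darj.pop(amp):
            self.rows[amp][row_id] = value
        self.down.discard(amp)


cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "r1", "old")
cluster.fail(1)
cluster.write(1, "r1", "new")    # AMP 1 is asleep, so this is journaled
cluster.write(1, "r2", "added")
cluster.recover(1)               # AMP 1 wakes up and is caught up
```

Once `recover` has run, AMP 1 holds both changes it slept through, and no journal remains for it.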

In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant.

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel Processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk and, similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disks via a different physical cable.

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds." Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong.

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
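The roll-forward scenario can be walked through with a tiny Python sketch. The function name, the tuple layout of the journal, and the sample rows are all invented for illustration. The sketch models only added and modified rows, as in the scenario above.

```python
def roll_forward(backup, after_journal, up_to_day):
    """Rebuild a table from a full backup plus AFTER images.

    backup        : dict of row_id -> row as of the full backup
    after_journal : list of (day, row_id, after_image) in the order written
    up_to_day     : apply images written on or before this day
    """
    table = dict(backup)              # start from the restored backup
    for day, row_id, image in after_journal:
        if day <= up_to_day:
            table[row_id] = image     # the AFTER image replaces the row
    return table


backup_day1 = {101: "Western", 102: "Eastern"}
journal = [
    (2, 103, "Southern"),    # new row added on day 2
    (4, 101, "Southwest"),   # existing row changed on day 4
]
restored = roll_forward(backup_day1, journal, up_to_day=5)
```

The restored table contains the day 1 backup with the day 2 insert and the day 4 update applied on top, which is where the data stood when the hardware failed.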

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
  ( emp        INTEGER
   ,dept       INTEGER
   ,lname      CHAR(20)
   ,fname      VARCHAR(20)
   ,salary     DECIMAL(10,2)
   ,hire_date  DATE FORMAT )
UNIQUE PRIMARY INDEX (emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

You just obey instructions; we'll take care of the obstructions.

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind, these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
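The four locking modes boil down to a small compatibility table. The sketch below is one reading of the rules in the list above, written in Python. The `COMPATIBLE` table and the `can_grant` function are illustrative names, and the row for a held ACCESS lock is inferred from the statement that only an EXCLUSIVE lock shuts out ACCESS.

```python
# For each lock already held, the set of requested locks that may proceed.
COMPATIBLE = {
    "EXCLUSIVE": set(),                      # prevents any access, period
    "WRITE": {"ACCESS"},                     # only a dirty read gets past
    "READ": {"READ", "ACCESS"},              # many readers at once
    "ACCESS": {"ACCESS", "READ", "WRITE"},   # inferred: blocks only EXCLUSIVE
}


def can_grant(requested, held_locks):
    """True if `requested` is compatible with every lock currently held."""
    return all(requested in COMPATIBLE[held] for held in held_locks)


# A thousand users can share a READ lock, but a writer must wait for them.
readers = ["READ"] * 1000
print(can_grant("READ", readers))    # True
print(can_grant("WRITE", readers))   # False
print(can_grant("ACCESS", ["WRITE"]))  # True, a "dirty" read
```

A request that cannot be granted simply waits in the queue until the conflicting locks are released at query completion.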

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row whose value is referenced by a row in another table may not be deleted until the referencing value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
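The employee and payroll story makes a handy test case. The Python sketch below shows the two RI checks in miniature, using plain dictionaries for the tables. The function names and the exception class are hypothetical, and a real database enforces these rules itself through a declared constraint.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would leave a payroll row without an employee."""


employee = {}  # parent table: emp -> name
payroll = {}   # child table:  emp -> salary (emp must exist in employee)


def insert_payroll(emp, salary):
    # A payroll row may only reference an employee that exists.
    if emp not in employee:
        raise ReferentialIntegrityError(f"emp {emp} is not in employee")
    payroll[emp] = salary


def delete_employee(emp):
    # An employee may not be deleted while payroll still references him or her.
    if emp in payroll:
        raise ReferentialIntegrityError(f"emp {emp} is still in payroll")
    employee.pop(emp, None)


employee[1] = "Jeff"
insert_payroll(1, 60)

try:
    delete_employee(1)       # forgotten payroll row: the delete is refused
except ReferentialIntegrityError as err:
    print(err)

del payroll[1]               # remove the referencing row first
delete_employee(1)           # now the delete goes through
```

With the checks in place, nobody keeps drawing a paycheck after leaving the employee table.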

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that, just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to proceed: where the flat file is located, what its file definition is, and what Teradata table the data should be loaded into.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sandbags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
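The three phases just described (deal out blocks, hash and redistribute, sort) can be mimicked in a few lines of Python. This is a sequential toy, not the parallel utility: `zlib.crc32` stands in for the Teradata hashing algorithm, a block is a handful of records rather than 64K, and the bare row hash stands in for the Row ID.

```python
import zlib

NUM_AMPS = 4


def row_hash(pi_value):
    # Stand-in for Teradata's hashing algorithm (an assumption).
    return zlib.crc32(str(pi_value).encode())


def fastload(records, block_size=3):
    """Load (primary_index_value, payload) records into an empty toy table."""
    # Phase 1: deal blocks of incoming records to the AMPs in turn.
    received = [[] for _ in range(NUM_AMPS)]
    for i in range(0, len(records), block_size):
        amp = (i // block_size) % NUM_AMPS
        received[amp].extend(records[i:i + block_size])

    # Phase 2: each AMP hashes its rows and sends them to the owning AMP.
    owned = [[] for _ in range(NUM_AMPS)]
    for block in received:
        for pi_value, payload in block:
            h = row_hash(pi_value)
            owned[h % NUM_AMPS].append((h, pi_value, payload))

    # Phase 3: each AMP sorts its rows by row hash.
    for rows in owned:
        rows.sort(key=lambda row: row[0])
    return owned


table = fastload([(emp, f"employee {emp}") for emp in range(20)])
```

Whichever AMP first receives a record, the record ends up on the AMP chosen by its hash, so later lookups by Primary Index know exactly where to go.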

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file,
• Only one table may be loaded at a time,
• The table to be loaded must be empty,
• There can be no secondary indexes, referential integrity, or triggers,
• It doesn't support Multi-set tables, and
• It locks at the table level.

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file,
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables,
• Receiving tables are usually populated,
• There can be no Unique secondary indexes, referential integrity, or triggers,
• It doesn't support Multi-set tables, and
• It locks at the table level.

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

You don't drown by falling into the water; you drown by staying in the water.

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.
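The faucet idea can be pictured as a simple schedule lookup. The Python sketch below is only an illustration of presetting different load levels for different times of day; the hours and rates are made-up numbers, and `rate_for_hour` is a hypothetical helper, not a Tpump option or command.

```python
# Hypothetical throttle schedule: (start_hour, end_hour, statements_per_minute).
# All hours and rates are invented for illustration.
SCHEDULE = [
    (0, 6, 60_000),   # off-peak hours: full throttle
    (6, 18, 600),     # data warehouse rush hour: trickle
    (18, 24, 6_000),  # evening: somewhere in between
]


def rate_for_hour(hour: int) -> int:
    """Return the load rate in effect at the given hour (0-23)."""
    for start, end, rate in SCHEDULE:
        if start <= hour < end:
            return rate
    raise ValueError(f"hour out of range: {hour}")


print(rate_for_hour(2), rate_for_hour(9), rate_for_hour(23))
```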

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


The Parsing Engine understands that the user wants to have two columns, titled Friend_Num and Friend_Name, returned. The PE gets excited when it notices that we are after Friend_Num eight. It recognizes that Friend_Num is the PRIMARY INDEX. The PE then runs the hash formula for eight. For explanation purposes, the Coffing/Jones hash formula is used, which merely divides the PI by two. When the PE divides the value eight by two, it receives an answer of four. It looks in bucket four and sees the AMP number. The PE passes a plan to retrieve the data to ONLY AMP number four, as this is a one-AMP operation.
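The walk-through above can be traced in a few lines of Python. This is a sketch of the teaching example only: `coffing_jones_hash` is the divide-by-two formula from the text, while the contents of `HASH_MAP` are an assumption (only "bucket four points to AMP four" comes from the text), as is the use of plain dictionaries to stand in for AMPs.

```python
# Toy model of a Primary Index lookup (a one-AMP operation).

def coffing_jones_hash(pi_value: int) -> int:
    """The text's teaching formula: divide the Primary Index value by two."""
    return pi_value // 2


# Assumed hash map (bucket -> AMP number) for a four-AMP system.
# Only bucket 4 -> AMP 4 is stated in the text; the rest is illustrative.
HASH_MAP = {1: 1, 2: 2, 3: 3, 4: 4, 5: 1, 6: 2, 7: 3, 8: 4}

BEST_FRIENDS = [
    (6, "Mary Gray"), (14, "Kyle Marx"), (8, "John Davis"), (16, "Lyn Jones"),
    (2, "Ben Hon"), (10, "Don Roy"), (4, "Joe Davis"), (12, "Sam Mills"),
]

# Place every row on the AMP its Primary Index hashes to.
AMPS = {amp: {} for amp in (1, 2, 3, 4)}
for friend_num, friend_name in BEST_FRIENDS:
    AMPS[HASH_MAP[coffing_jones_hash(friend_num)]][friend_num] = friend_name


def select_by_primary_index(friend_num: int):
    """Hash the value, look up the bucket, and ask only that one AMP."""
    bucket = coffing_jones_hash(friend_num)
    amp = HASH_MAP[bucket]
    return amp, AMPS[amp].get(friend_num)


print(select_by_primary_index(8))
```

No other AMP is consulted, which is what makes the Primary Index the fastest path to a row.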

The Full Table Scan

What matters is not the size of the dog in the fight but the size of the fight in the dog

Coach Bear Bryant

When we travel the globe teaching Teradata classes, we often ask students, "Are Full Table Scans acceptable in a data warehouse?"

About 80% of the time, students respond, "NO!" After we complete training, they respond, "Heck, YES!"

Tom told me that he wrestled his way through high school and college. I said, "Really? I didn't think the classes were that difficult myself." Actually, Tom earned a wrestling scholarship to college and achieved the All-American level. His wrestling coach drilled into the wrestlers' minds that the size of the opponent is not to be feared, but the size of their will. The truth is that most databases do not have the FIGHT in them to handle a Full Table Scan. That's why so many students are surprised at Teradata's abilities to actually handle Full Table Scans.

A Full Table Scan (FTS) is a query that reads every row of a table. The table may be small or have billions of rows. With Teradata, a Full Table Scan means every AMP reads only the rows it owns, in parallel with all other AMPs in the system. Doing so speeds up a Full Table Scan hundreds to thousands of times.

For example, imagine a table that has 100 rows in a system that has 10 AMPs. Each AMP owns 10 rows. On a Full Table Scan, each AMP reads its 10 rows. Next, each AMP passes the information over the BYNET to the PE. This process is 10 times faster than most systems. But what happens with systems that have hundreds or even thousands of AMPs? Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1900 AMPs in its system helped return results very rapidly. Talk about efficiency!
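The 100-row, 10-AMP example can be sketched in Python as follows. This is a toy model under stated assumptions: rows are dealt out evenly with a modulo rule rather than a real hash, and a thread pool merely stands in for AMPs working at the same time, so it shows the division of labor rather than any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 10
ROWS = list(range(100))  # the 100-row table from the example

# Assumption: rows are spread evenly, so each AMP owns 10 of them.
amp_rows = {
    amp: [row for row in ROWS if row % NUM_AMPS == amp]
    for amp in range(NUM_AMPS)
}


def amp_scan(amp: int):
    """One AMP reads only the rows it owns."""
    return list(amp_rows[amp])


def full_table_scan():
    # Every AMP scans at once; the PE collects whatever comes back.
    with ThreadPoolExecutor(max_workers=NUM_AMPS) as pool:
        per_amp = list(pool.map(amp_scan, range(NUM_AMPS)))
    return [row for rows in per_amp for row in rows]


result = full_table_scan()
print(len(result), "rows returned;", len(amp_rows[0]), "rows read per AMP")
```

Each AMP touches one tenth of the table, which is where the "10 times faster" figure in the example comes from.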

Most FTS bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?" then you are asking the system to read through an entire table. Full Table Scans are fundamental and an important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power, and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads the Best_Friends rows it individually owns
• Each AMP passes its rows to the PE over the BYNET

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
----------  -----------
         6  Mary Gray
        14  Kyle Marx
         8  John Davis
        16  Lyn Jones
         2  Ben Hon
        10  Don Roy
         4  Joe Davis
        12  Sam Mills


In this chapter, we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (This is the hashed version of the value)
• Primary Index Row-ID (This locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. The PE also tells the AMPs to place the hash in a secondary index sub-table, along with the Row-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (smiley face in this example) is revealed, and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!
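For readers who like to see the moving parts, here is a small Python sketch of a USI lookup. It is a toy model, not Teradata's implementation: `toy_hash` is a made-up hash, AMPs are plain dictionaries, and the "pointer" kept in the sub-table is simply the base AMP number plus the Primary Index value.

```python
# Toy model of a Unique Secondary Index (USI) on Friend_Name.

def toy_hash(value) -> int:
    """Hypothetical hash: map any value to one of four AMPs (1-4)."""
    return sum(str(value).encode()) % 4 + 1


base = {amp: {} for amp in (1, 2, 3, 4)}       # base table rows per AMP
usi_sub = {amp: {} for amp in (1, 2, 3, 4)}    # USI sub-table rows per AMP


def insert(friend_num: int, friend_name: str) -> None:
    base_amp = toy_hash(friend_num)            # Primary Index decides the base AMP
    base[base_amp][friend_num] = friend_name
    usi_amp = toy_hash(friend_name)            # index value decides the sub-table AMP
    usi_sub[usi_amp][friend_name] = (base_amp, friend_num)  # pointer to base row


def select_by_name(friend_name: str):
    """AMP one holds the index row; AMP two holds the base row."""
    usi_amp = toy_hash(friend_name)
    pointer = usi_sub[usi_amp].get(friend_name)
    if pointer is None:
        return None
    base_amp, friend_num = pointer
    return {"amps_used": [usi_amp, base_amp],
            "row": (friend_num, base[base_amp][friend_num])}


for num, name in [(6, "Mary Gray"), (14, "Kyle Marx"), (8, "John Davis"),
                  (16, "Lyn Jones"), (2, "Ben Hon"), (10, "Don Roy"),
                  (4, "Joe Davis"), (12, "Sam Mills")]:
    insert(num, name)

found = select_by_name("Ben Hon")
print(found)
```

However many AMPs the system has, the lookup visits one AMP for the index row and one for the base row. A NUSI has no such shortcut, which is why it remains an all-AMP operation.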

Join Indexes

A bend in the road is not the end of the road unless you fail to make the turn

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.


Teradata Databases, Users and Space

Overview

Choose a job you like and you will never have to work a day of your life

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan." The Dictionary directs the PE on topics such as security access rights, table columns, indexes, macros, views, etc.

So, if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us: who is the owner of Tom? The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from him.
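The "who owns Tom?" question is just a walk up the family tree. The Python sketch below models the hierarchy named in the text as a child-to-parent dictionary; the `PARENT` table and the `owners` helper are illustrative names, not anything Teradata exposes.

```python
# Child -> parent, following the hierarchy described in the text.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name: str):
    """Walk up the hierarchy and collect every parent (owner) of `name`."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain


print(owners("Tom"))
```

DBC has no entry of its own in `PARENT`, so the walk always stops there, matching the idea that nothing sits above DBC.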


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA, likewise, gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, views, and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
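The difference between giving away perm and giving away spool is easy to see in a small Python sketch. This is a toy model: the `Owner` class, the gigabyte figures, and the rule that a child's spool limit may not exceed its parent's (our reading of the speed-limit analogy) are all assumptions made for illustration.

```python
class Owner:
    """A user or database with PERM and SPOOL limits (toy model, sizes in GB)."""

    def __init__(self, name: str, perm: int = 0, spool: int = 0):
        self.name = name
        self.perm = perm
        self.spool = spool

    def create_child(self, name: str, perm: int, spool: int) -> "Owner":
        if perm > self.perm:
            raise ValueError(f"{self.name} has only {self.perm} GB of PERM to give")
        if spool > self.spool:
            raise ValueError(f"{self.name} cannot grant more SPOOL than its own limit")
        self.perm -= perm                  # PERM given away is subtracted from the parent
        return Owner(name, perm, spool)    # SPOOL is only a limit: nothing is subtracted


dbc = Owner("DBC", perm=100, spool=100)
sysdba = dbc.create_child("SYSDBA", perm=80, spool=100)
morgan = sysdba.create_child("Morgan", perm=10, spool=100)
analyst = sysdba.create_child("Analyst", perm=0, spool=50)  # hypothetical query-only user

print(dbc.perm, sysdba.perm, morgan.perm, sysdba.spool)
```

After these three calls, DBC is down to 20 GB of perm and SYSDBA to 70 GB, yet every spool limit handed out along the way left the giver's own limit untouched.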

The following picture shows a logical view of a Customer table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user, and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area, there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.
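Here is a minimal Python sketch of that spool behavior. It is a simplification under stated assumptions: the spool limit is counted in rows rather than bytes, a single list stands in for the spool of all AMPs, and the sample rows are borrowed from the employee table shown later in this chapter.

```python
class SpoolExceeded(Exception):
    """Raised when a query needs more spool than the user's limit allows."""


# (Emp, Dept, Lname, Fname, Sal)
EMPLOYEE = [
    (1, 10, "Johnson", "Manny", 100000),
    (2, 20, "Carlsbad", "Jan", 100000),
    (22, 30, "Winter", "Steve", 77000),
    (25, 10, "Lester", "Bonnie", 56000),
    (33, 10, "Samuels", "Todd", 120000),
    (99, 20, "Walter", "Misha", 104000),
]


def run_query(rows, predicate, spool_limit_rows: int):
    spool = []
    for row in rows:                       # read the rows of the table
        if predicate(row):
            if len(spool) >= spool_limit_rows:
                raise SpoolExceeded("query aborted: out of spool space")
            spool.append(row)              # each qualifying row is added to spool
    answer = list(spool)                   # the answer is delivered to the user...
    spool.clear()                          # ...and then the spool is released
    return answer


dept_10 = run_query(EMPLOYEE, lambda row: row[1] == 10, spool_limit_rows=10)
print(dept_10)
```

Running the same query with `spool_limit_rows=2` raises `SpoolExceeded`, the toy equivalent of a query that aborts because the user is out of spool space.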

What is a View

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored again on the disks, so the view does not duplicate data or take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
---  ----  --------  ------  ------
1    10    Johnson   Manny   100000
2    20    Carlsbad  Jan     100000
22   30    Winter    Steve   77000
25   10    Lester    Bonnie  56000
33   10    Samuels   Todd    120000
99   20    Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
---  ----  --------  ------
1    10    Johnson   Manny
2    20    Carlsbad  Jan
22   30    Winter    Steve
25   10    Lester    Bonnie
33   10    Samuels   Todd
99   20    Walter    Misha
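If you don't have a Teradata system handy, the same idea can be tried with Python's built-in sqlite3 module. This is only a stand-in: SQLite is not Teradata, it has no GRANT or REVOKE, and it keeps the view definition in its own catalog rather than in a Data Dictionary. What it does show is that the view stores no rows of its own and never exposes the Sal column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view is just a stored SELECT; the salary column is left out on purpose.
conn.execute("CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee")

cursor = conn.execute("SELECT * FROM Employ_V")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```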


What is a Macro

The axe soon forgets but the tree always remembers

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10
• What employees are in department 20, and
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
SELECT * FROM Employ_v WHERE dept = 10;
SELECT * FROM Employ_v WHERE dept = 20;
SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
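The "all work or none work" behavior can be imitated in Python with the built-in sqlite3 module. This sketch is an analogy, not how Teradata implements macros: the `MACROS` dictionary stands in for definitions kept in the Data Dictionary, `create_macro` and `execute_macro` are hypothetical helpers, and `Bad_mac` exists only to show a rollback.

```python
import sqlite3

MACROS = {}  # hypothetical stand-in for macro definitions kept in the Data Dictionary


def create_macro(name, statements):
    MACROS[name] = list(statements)


def execute_macro(conn, name):
    """Run every statement of the macro as one transaction: all work or none do."""
    results = []
    try:
        with conn:  # commits on success, rolls back if any statement raises
            for sql in MACROS[name]:
                results.append(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp INTEGER, Dept INTEGER, Lname TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, 10, "Johnson"), (2, 20, "Carlsbad"), (25, 10, "Lester")],
)
conn.commit()

create_macro("Emp_mac", [
    "SELECT * FROM Employee WHERE Dept = 10",
    "SELECT * FROM Employee WHERE Dept = 20",
    "SELECT * FROM Employee ORDER BY Lname",
])
reports = execute_macro(conn, "Emp_mac")

create_macro("Bad_mac", [
    "UPDATE Employee SET Dept = 99",
    "INSERT INTO No_Such_Table VALUES (1)",  # fails, so the UPDATE is undone
])
failed = execute_macro(conn, "Bad_mac")
```

`reports` holds three result sets, one per SELECT. `Bad_mac` returns `None`, and the Employee table is left exactly as it was before the macro ran.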

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

Never insult seven men when all you're packing is a six gun.

Wild West Slogan


I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has explicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

Please sleep on it tonight, and if you wake up in the morning, let me know what you think.

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

The afternoon knows what the morning never suspected

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupt. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, then the rows affected can be reverted back to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message
• The entire transaction is rolled back, and any changes made to the database are reversed
• Locks are released, and
• Spool files are discarded

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before they cut a single strand? Then, after he or she cut your hair, asked if you liked it? If you didn't like it, then you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores it to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal." Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it, and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, for the Transient Journal, "Cleanliness is next to Godliness." If it is clean, then things went well.
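The before-picture idea fits in a few lines of Python. The sketch below is a toy model of a single AMP, not Teradata internals: the `Amp` class, its method names, and the salary figures are invented for illustration, and a before-image of `None` simply means "this row did not exist yet."

```python
class Amp:
    """Toy AMP holding rows plus a transient journal of before-images."""

    def __init__(self, rows):
        self.rows = dict(rows)   # row id -> value
        self.journal = []        # (row id, before-image); None means the row was absent

    def write(self, row_id, value):
        # Take the BEFORE picture, then change the row (value None = delete).
        self.journal.append((row_id, self.rows.get(row_id)))
        if value is None:
            self.rows.pop(row_id, None)
        else:
            self.rows[row_id] = value

    def rollback(self):
        # Restore before-images in reverse order, then empty the journal.
        for row_id, before in reversed(self.journal):
            if before is None:
                self.rows.pop(row_id, None)
            else:
                self.rows[row_id] = before
        self.journal.clear()

    def commit(self):
        self.journal.clear()     # success: the journal is wiped clean


amp = Amp({1: 50000, 2: 60000})

# A botched transaction: a 100% raise instead of a 10% raise, plus a stray insert.
amp.write(1, 100000)
amp.write(3, 40000)
amp.rollback()
after_rollback = dict(amp.rows)

# The corrected transaction succeeds and is committed.
amp.write(1, 55000)
amp.commit()
```

After the rollback, the rows look exactly as they did before the botched transaction; after the commit, the journal is empty and the 10% raise stays.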

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

If you're not TRUE to your teeth, they'll be FALSE to you.

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK-protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

You can't step into the same river twice.

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission-critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2,000 AMPs. The chance of losing two AMPs in a 2,000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. The output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
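To make the clustering rule concrete, here is a small Python sketch of how a fallback copy could be kept inside the primary AMP's cluster. The real Fallback Hash Map is not described in this book, so the selection rule below (row hash modulo the number of other AMPs in the cluster) is purely a hypothetical stand-in, as are the function names.

```python
CLUSTER_SIZE = 4  # the most common cluster size according to the text


def cluster_of(amp, cluster_size=CLUSTER_SIZE):
    """Return the AMP numbers (1-based) in the same cluster as `amp`."""
    start = ((amp - 1) // cluster_size) * cluster_size + 1
    return list(range(start, start + cluster_size))


def fallback_amp(primary_amp, row_hash, cluster_size=CLUSTER_SIZE):
    """Pick a different AMP in the same cluster to host the fallback copy.

    Hypothetical rule: the row hash chooses among the other cluster members.
    """
    others = [a for a in cluster_of(primary_amp, cluster_size) if a != primary_amp]
    return others[row_hash % len(others)]


# Where would the fallback copy of a row with hash 7 go, for each primary AMP?
placements = {amp: fallback_amp(amp, row_hash=7) for amp in range(1, 9)}
```

Whatever rule is used, the two properties that matter are the ones the text states: the fallback copy never lands on the primary AMP, and it never leaves the primary AMP's cluster.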

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while an AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low, and if one AMP is lost, the other three share the extra work.
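The trade-off can be put into numbers with a short Python calculation. It assumes, as a simplification, that the failed AMP's work is split evenly among the survivors in its cluster; the function name is made up for this sketch.

```python
def survivor_load(cluster_size):
    """Relative workload per surviving AMP when one AMP in the cluster is down."""
    return cluster_size / (cluster_size - 1)


# Workload multiplier for clusters of 2, 4 and 16 AMPs.
loads = {n: round(survivor_load(n), 2) for n in (2, 4, 16)}
```

Under that assumption, the lone survivor in a two-AMP cluster carries 2.0 times its normal load, each survivor in a four-AMP cluster carries about 1.33 times, and each survivor in a 16-AMP cluster carries about 1.07 times.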

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues with all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be used on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.
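The "while you were sleeping" bookkeeping can be sketched as a toy Python model. The `Cluster` class and its methods are invented for illustration and say nothing about how Teradata stores the journal internally; the sketch only mirrors the three steps above: open a journal on failure, record missed changes, replay and discard on recovery.

```python
class Cluster:
    """Toy cluster: tracks which AMP is down and journals changes meant for it."""

    def __init__(self, amps):
        self.data = {amp: {} for amp in amps}   # amp -> {key: value}
        self.down = None
        self.darj = []                          # changes missed by the down AMP

    def fail(self, amp):
        self.down = amp
        self.darj = []      # the surviving AMPs open a recovery journal

    def write(self, amp, key, value):
        if amp == self.down:
            self.darj.append((key, value))      # remember it for later
        else:
            self.data[amp][key] = value

    def recover(self):
        # Catch the AMP up on everything that happened while it was down.
        for key, value in self.darj:
            self.data[self.down][key] = value
        self.down, self.darj = None, []         # the journal is discarded


cluster = Cluster([1, 2, 3, 4])
cluster.write(1, 2, "Ben Hon")
cluster.fail(1)
cluster.write(1, 10, "Don Roy")   # AMP 1 is down, so this change is journaled
pending = list(cluster.darj)
cluster.recover()
```

Once `recover()` has run, AMP 1 holds both rows and the journal no longer exists.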

In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant.

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The downside of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. We can still keep the shared nothing environment and have four physical disks: only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in the surviving node's memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disks via a different physical cable.

People who come from colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong.

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost-effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE or DELETE command. It keeps track of all new, deleted or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
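The roll-forward recovery just described can be sketched in Python. This is a simplified illustration, not Teradata's restore procedure: the journal is modeled as a list of `(day, key, after_image)` tuples, and an image of `None` standing for a deleted row is an assumption of the sketch.

```python
def roll_forward(backup, after_journal):
    """Rebuild a table from a full backup plus AFTER images captured since then.

    `after_journal` is a list of (day, key, after_image) tuples; an image of
    None stands for a deleted row (an assumption of this sketch).
    """
    table = dict(backup)
    # Reapply the changes in the order they originally happened.
    for _day, key, image in sorted(after_journal, key=lambda entry: entry[0]):
        if image is None:
            table.pop(key, None)
        else:
            table[key] = image
    return table


backup_day1 = {1: "Western", 2: "Eastern"}           # full backup on the 1st
journal = [(3, 2, "Southwest"), (2, 3, "Northern"), (4, 1, None)]
recovered = roll_forward(backup_day1, journal)
```

Starting from the backup taken on the 1st, the journal entries for days 2 through 4 bring the table back to the state it was in just before the failure on the 5th.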

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
(emp        INTEGER,
 dept       INTEGER,
 lname      CHAR(20),
 fname      VARCHAR(20),
 salary     DECIMAL(10,2),
 hire_date  DATE FORMAT)
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

You just obey instructions; we'll take care of the obstructions.

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy, and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
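The four rules above boil down to a small compatibility table, sketched here in Python. The table is read directly off the list (READ shares with READ and ACCESS, WRITE shares only with ACCESS, EXCLUSIVE shares with nothing); the `can_grant` helper is a made-up name, and the sketch ignores queuing order and lock levels (database, table, row).

```python
# For each requested lock mode, the set of already-held modes it can coexist with.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held):
    """True if `requested` can be granted while every lock in `held` is active."""
    return all(h in COMPATIBLE[requested] for h in held)


many_readers = can_grant("READ", ["READ", "READ"])     # readers share
writer_vs_reader = can_grant("WRITE", ["READ"])         # writer must wait
dirty_read = can_grant("ACCESS", ["WRITE"])             # stale read allowed
access_vs_exclusive = can_grant("ACCESS", ["EXCLUSIVE"])
```

In this model a thousand READ locks can coexist, a WRITE request waits behind any READ, an ACCESS request slips past a WRITE, and nothing gets past an EXCLUSIVE lock.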

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood." "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is added to an existing table that contains RI violations, the ALTER TABLE will still proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
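The insert and delete rules of RI can be sketched with two Python sets standing in for the employee and payroll tables from the story above. The `RIError` exception and the helper functions are hypothetical names for this illustration; a real database enforces these rules through constraints declared on the tables.

```python
class RIError(Exception):
    """Raised when a change would break referential integrity."""


employee = {101, 102}   # parent table: employee numbers
payroll = {101}         # child table: employee numbers that get paid


def insert_payroll(emp):
    # A payroll row must point at an employee that exists.
    if emp not in employee:
        raise RIError(f"employee {emp} does not exist")
    payroll.add(emp)


def delete_employee(emp):
    # An employee still referenced by payroll cannot be removed.
    if emp in payroll:
        raise RIError(f"employee {emp} is still on the payroll")
    employee.discard(emp)


def attempt(action, emp):
    """Run an action and report whether RI allowed it."""
    try:
        action(emp)
        return True
    except RIError:
        return False


results = [
    attempt(insert_payroll, 999),    # no such employee
    attempt(delete_employee, 101),   # still referenced by payroll
    attempt(delete_employee, 102),   # not referenced, so allowed
]
```

Only the last change goes through: employee 102 has no payroll row, so deleting that employee leaves nothing dangling.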

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
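The three phases of that load (hand out blocks, redistribute by hash, sort) can be sketched in Python. Several things here are assumptions made only for illustration: blocks are dealt out round-robin, a block holds two rows rather than 64K, and CRC32 from the standard library stands in for Teradata's hashing algorithm and hash map, which it is not.

```python
import zlib

NUM_AMPS = 4


def target_amp(primary_index_value):
    """Stand-in for the hash map lookup; CRC32 is NOT Teradata's algorithm."""
    return zlib.crc32(str(primary_index_value).encode()) % NUM_AMPS


def fastload(rows, block_size=2):
    # Phase 1: hand out blocks of unhashed rows to the AMPs, round-robin.
    received = [[] for _ in range(NUM_AMPS)]
    for i in range(0, len(rows), block_size):
        received[(i // block_size) % NUM_AMPS].extend(rows[i:i + block_size])
    # Phase 2: each AMP hashes its rows and sends them to the owning AMP.
    owned = [[] for _ in range(NUM_AMPS)]
    for block in received:
        for row in block:
            owned[target_amp(row[0])].append(row)
    # Phase 3: each AMP sorts its rows (by hash value, standing in for Row ID).
    for amp_rows in owned:
        amp_rows.sort(key=lambda r: zlib.crc32(str(r[0]).encode()))
    return owned


rows = [(n, f"friend-{n}") for n in (6, 14, 8, 16, 2, 10, 4, 12)]
amps = fastload(rows)
```

The point of the two-step design is that the first phase needs no hashing at all, so data gets onto the system as fast as it can be read, and the expensive redistribution then happens on all AMPs at once.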

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file;
• Only one table may be loaded at a time;
• The table to be loaded must be empty;
• There can be no secondary indexes, referential integrity or triggers;
• It doesn't support Multi-set tables; and
• It locks at the table level.

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind that many systems have hundreds to thousands of AMPs. The load takes place continually, in parallel, as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file;
• Up to 20 INSERTS, UPDATES or DELETES may be executed on up to 5 tables;
• Receiving tables are usually populated;
• There can be no Unique secondary indexes, referential integrity or triggers;
• It doesn't support Multi-set tables; and
• It locks at the table level.

Tpump

The Tpump utility is designed to allow OLTP transactions to load immediately into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

You don't drown by falling into the water; you drown by staying in the water.

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it acts, in theory, like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be preset to automatically load at different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file;
• Processes INSERTS, UPDATES or DELETES;
• Tables are usually populated;
• It can have secondary indexes, triggers and referential integrity;
• It doesn't support Multi-set tables; and
• It locks at the row level.

Conclusion: A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration.

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

even thousands of AMPs. Well, one major telecommunications company copied a 35 billion-row table in just 18 minutes. The 1,900 AMPs in its system helped return results very rapidly. Talk about efficiency.

Most FTS bring traditional databases to their knees, but Teradata was born to be parallel. Teradata was specifically designed for data warehousing. When you ask decision support questions like "Who are my best and worst customers?", you are asking the system to read through an entire table. Full Table Scans are a fundamental and important part of data warehousing. They allow users to literally ask any question about any data at any time. Teradata has the experience, power and architecture to allow Full Table Scans.

An example of a query asking for a Full Table Scan is:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

In this example, the Parsing Engine receives the SQL and checks the syntax and security. If the user passes these tests, the query continues. The PE knows this query asks to return all records. This is a Full Table Scan. Therefore, it passes the AMPs a plan that says, "Retrieve all of your Best_Friends table rows and then pass them to me (PE) over the BYNET." With that in mind:

• Each AMP reads its own Best_Friends rows individually.
• Each AMP passes its rows to the PE over the BYNET.

Let's run through the SQL again and see the result:

SELECT Friend_Num, Friend_Name
FROM Best_Friends;

8 rows returned

Friend_Num  Friend_Name
6           Mary Gray
14          Kyle Marx
8           John Davis
16          Lyn Jones
2           Ben Hon
10          Don Roy
4           Joe Davis
12          Sam Mills


In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option in a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES, or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This action not only takes up space but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It also tells the AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;


The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Num 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of each AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query asks "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds Ben Hon in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once complete, Teradata can see the real row-id and find the base row. In our example, once the 'Ben Hon' Secondary Index Sub-table row is found, the row-id (smiley face in this example) is revealed and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries!
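The two-AMP path of a USI lookup can be sketched in a few lines of Python. Treat this as a rough model under stated assumptions: `amp_for` uses CRC32 purely as a stand-in hash, and the sub-table "pointer" is simplified to a (base AMP, Friend_Num) pair rather than a real row-id.

```python
# Toy model of a USI lookup. The hash and the pointer layout are
# illustrative assumptions, not Teradata internals.
import zlib

NUM_AMPS = 4
ROWS = [(2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
        (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones")]


def amp_for(value) -> int:
    """Hypothetical hash map: any value -> AMP number."""
    return zlib.crc32(str(value).encode()) % NUM_AMPS


# Base table: rows are placed by the hash of the Primary Index (Friend_Num).
base = {amp: {} for amp in range(NUM_AMPS)}
for num, name in ROWS:
    base[amp_for(num)][num] = (num, name)

# USI sub-table: each entry is placed by the hash of the index value
# (Friend_Name) and stores a pointer to the base row.
usi = {amp: {} for amp in range(NUM_AMPS)}
for num, name in ROWS:
    usi[amp_for(name)][name] = (amp_for(num), num)


def usi_lookup(name):
    """Return the base row plus the set of AMPs touched (never more than two)."""
    index_amp = amp_for(name)              # first AMP: holds the sub-table row
    base_amp, num = usi[index_amp][name]   # pointer to the base row
    return base[base_amp][num], {index_amp, base_amp}


row, amps_touched = usi_lookup("Ben Hon")
print(row, "found using", len(amps_touched), "AMP(s)")
```

No matter how many AMPs the system has, the lookup visits one AMP for the sub-table row and one for the base row, which is the point of the two-AMP claim.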

Join Indexes

A bend in the road is not the end of the road unless you fail to make the turn

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.
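The pre-join idea can be pictured with a small Python sketch. Everything named here is hypothetical (the customers and orders tables, the function names), and it only handles inserted orders, so take it as an illustration of the concept rather than of how Teradata maintains a join index.

```python
# Hypothetical base tables, invented for this illustration.
customers = {1: "Acme", 2: "Globex"}            # cust_id -> name
orders = [(100, 1, 250.0), (101, 2, 75.0)]      # (order_id, cust_id, amount)


def direct_join():
    """The join the user writes, computed from the base tables."""
    return [(oid, customers[cid], amt) for oid, cid, amt in orders]


# Built once, when the join index is created.
join_index = direct_join()


def insert_order(order_id, cust_id, amount):
    """A base-table insert; the pre-joined rows are kept current automatically."""
    orders.append((order_id, cust_id, amount))
    join_index.append((order_id, customers[cust_id], amount))


def query_join():
    """The same user join, answered from the pre-joined rows."""
    return list(join_index)


insert_order(102, 1, 40.0)
print(query_join())
```

The user never names `join_index`; the query simply gets the same answer with the joining work already done.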


Teradata Databases, Users and Space

Overview

Choose a job you like and you will never have to work a day of your life

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan." The Dictionary directs the PE on topics such as security access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind that all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT or REVOKE rights from Tom.
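The "who owns Tom?" quiz is just a walk up the family tree. Here is a small Python sketch of that walk. The `PARENT` mapping is an assumption that mirrors the users and databases named in this chapter; a real system keeps this information in the Data Dictionary.

```python
# Each user or database maps to its immediate parent (None for the top dog).
PARENT = {
    "DBC": None,
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """List every parent (owner) above the given user or database."""
    found = []
    parent = PARENT[name]
    while parent is not None:
        found.append(parent)
        parent = PARENT[parent]
    return found


print(owners("Tom"))
```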


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts. Then the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, views, and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
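The difference between giving away perm and giving away spool can be shown with a short Python sketch. The class, the user name Analyst, and the gigabyte figures are all made up for illustration. Capping a child's spool at the parent's limit is an assumption drawn from the speed-limit analogy above.

```python
class Owner:
    """A user or database with a PERM allocation and a SPOOL limit (gigabytes)."""

    def __init__(self, name, perm, spool):
        self.name, self.perm, self.spool = name, perm, spool

    def create_child(self, name, perm, spool):
        if perm > self.perm:
            raise ValueError("not enough PERM space to give away")
        if spool > self.spool:
            raise ValueError("child spool limit cannot exceed the parent's")
        # PERM given away is subtracted from the parent...
        self.perm -= perm
        # ...but SPOOL is only a ceiling, so the parent's limit is unchanged.
        return Owner(name, perm, spool)


sysdba = Owner("SYSDBA", perm=80, spool=20)
morgan = sysdba.create_child("Morgan", perm=10, spool=20)
analyst = sysdba.create_child("Analyst", perm=0, spool=20)   # query-only user

print(sysdba.perm, sysdba.spool)
```

After both children are created, SYSDBA has less perm but exactly the same spool limit, just like the 65 mph speed limit that every driver can share.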

The following picture shows a logical view of a Customer Table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area, there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.
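Here is a rough Python sketch of that query building its answer in spool. The sample rows and the `run_query` helper are assumptions for illustration, and the spool limit is counted in rows to keep things simple, whereas a real limit is a measure of disk space.

```python
class SpoolSpaceExceeded(Exception):
    """Raised when a query needs more spool than the user's upper limit."""


# Sample rows: (Emp, Dept, Lname, Fname, Sal)
EMPLOYEE = [
    (1, 10, "Johnson", "Manny", 100000),
    (2, 20, "Carlsbad", "Jan", 100000),
    (22, 30, "Winter", "Steve", 77000),
    (25, 10, "Lester", "Bonnie", 56000),
]


def run_query(rows, predicate, spool_limit):
    """Build the answer set in spool, then release the spool."""
    spool = []
    try:
        for row in rows:
            if predicate(row):
                if len(spool) >= spool_limit:
                    raise SpoolSpaceExceeded("query aborted: out of spool space")
                spool.append(row)
        return list(spool)      # the answer delivered to the user
    finally:
        spool.clear()           # spool is released, success or failure


answer = run_query(EMPLOYEE, lambda row: row[1] == 10, spool_limit=10)
print(len(answer), "rows returned")
```

Run the same query with `spool_limit=1` and it aborts, because two employees work in department 10.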

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
  1    10  Johnson   Manny   100000
  2    20  Carlsbad  Jan     100000
 22    30  Winter    Steve    77000
 25    10  Lester    Bonnie   56000
 33    10  Samuels   Todd    120000
 99    20  Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.
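If you don't have a Teradata system handy, you can still watch a view hide a column. The sketch below uses SQLite through Python's built-in sqlite3 module as a stand-in. That is an assumption made only so the example runs anywhere. SQLite has no users or access rights, so it shows only the filtering half of the story.

```python
import sqlite3

# In-memory SQLite database standing in for Teradata (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no employee rows are copied.
conn.execute("CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee")

cursor = conn.execute("SELECT * FROM Employ_V")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)
print(len(rows), "rows returned")
```

The salary column simply is not part of what the view exposes, so a user limited to the view has nothing to peek at.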

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
  1    10  Johnson   Manny
  2    20  Carlsbad  Jan
 22    30  Winter    Steve
 25    10  Lester    Bonnie
 33    10  Samuels   Todd
 99    20  Walter    Misha


What is a Macro?

The axe soon forgets but the tree always remembers

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10
• What employees are in department 20
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
  SELECT * FROM Employ_v WHERE dept = 10;
  SELECT * FROM Employ_v WHERE dept = 20;
  SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
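The "either they all work or none of them work" behavior is easier to see with statements that change data. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's sqlite3 module, and the `create_macro` and `execute_macro` helpers are hypothetical names invented here, not Teradata commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp INTEGER PRIMARY KEY, Dept INTEGER)")
conn.commit()

MACROS = {}   # stand-in for macro definitions kept in the Data Dictionary


def create_macro(name, statements):
    """Store a named group of SQL statements."""
    MACROS[name] = list(statements)


def execute_macro(name):
    """Run every statement of the macro as one single transaction."""
    try:
        with conn:                       # commits on success, rolls back on error
            for sql in MACROS[name]:
                conn.execute(sql)
        return True
    except sqlite3.Error:
        return False


create_macro("good_mac", ["INSERT INTO Employee VALUES (1, 10)",
                          "INSERT INTO Employee VALUES (2, 20)"])
# The second statement reuses Emp 1, so it fails and undoes the first one too.
create_macro("bad_mac", ["INSERT INTO Employee VALUES (3, 10)",
                         "INSERT INTO Employee VALUES (1, 30)"])

ok = execute_macro("good_mac")
failed = execute_macro("bad_mac")
count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(ok, failed, count)
```

Employee 3 never shows up in the table, because the macro that tried to add it failed as a whole.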

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

Never insult seven men when all you're packing is a six gun

Wild West Slogan


I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has Implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
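To tie the three kinds of rights together, here is a simplified Python sketch of the check the PE performs. It is a toy under loud assumptions: the hierarchy is the one pictured in this chapter, the function names are invented, automatic rights are reduced to "you have rights on yourself," and an owner is treated as holding every privilege on its children.

```python
PARENT = {"DBC": None, "SYSDBA": "DBC", "MRKT": "SYSDBA", "SALES": "SYSDBA",
          "Morgan": "SYSDBA", "Tom": "Morgan", "Mary": "MRKT"}

EXPLICIT = set()   # explicit rights: (grantee, privilege, object)


def is_owner(user, obj):
    """Implicit rights: is `user` anywhere above `obj` in the hierarchy?"""
    parent = PARENT[obj]
    while parent is not None:
        if parent == user:
            return True
        parent = PARENT[parent]
    return False


def grant(grantor, grantee, privilege, obj):
    """Only the object itself or one of its owners may grant in this sketch."""
    if grantor != obj and not is_owner(grantor, obj):
        raise PermissionError(f"{grantor} may not grant rights on {obj}")
    EXPLICIT.add((grantee, privilege, obj))


def has_right(user, privilege, obj):
    """Automatic (self), implicit (owner) or explicit (granted) right?"""
    return user == obj or is_owner(user, obj) or (user, privilege, obj) in EXPLICIT


before = has_right("Mary", "CREATE TABLE", "Tom")
grant("Tom", "Mary", "CREATE TABLE", "Tom")
after = has_right("Mary", "CREATE TABLE", "Tom")
print(before, after)
```

Mary starts with no rights on Tom's database and gains exactly the one privilege Tom grants her.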

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

Please sleep on it tonight, and if you wake up in the morning, let me know what you think

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

The afternoon knows what the morning never suspected

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, then the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (as in a MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message
• The entire transaction is rolled back and any changes made to the database are reversed
• Locks are released
• Spool files are discarded

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she would ask if you liked it. If you didn't like it, then you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores it to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness." If it is clean, then things went well.
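The BEFORE picture idea can be sketched in Python. This is a minimal model with assumed names (`Amp`, `write`, `rollback`) and salaries stored in a plain dictionary. It shows the before-image technique in general, not Teradata's actual journal format.

```python
class Amp:
    """One AMP with some rows and a transient journal of before-images."""

    def __init__(self, rows):
        self.rows = dict(rows)     # key -> row value
        self.journal = []          # (key, before_image); None means "no row yet"

    def write(self, key, new_value):
        """INSERT/UPDATE (or DELETE when new_value is None), journaled first."""
        self.journal.append((key, self.rows.get(key)))   # take the BEFORE picture
        if new_value is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = new_value

    def commit(self):
        """Transaction succeeded: wipe the journal clean."""
        self.journal.clear()

    def rollback(self):
        """Transaction failed: restore every before-image, newest first."""
        for key, before in reversed(self.journal):
            if before is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
        self.journal.clear()


amp = Amp({"Manny": 100000, "Bonnie": 56000})
amp.write("Manny", 200000)     # oops: a 100% raise instead of 10%
amp.write("Todd", 120000)      # a new row in the same transaction
amp.rollback()                 # the transaction is aborted
print(amp.rows)
```

After the rollback the rows look exactly as they did before the transaction started, and the journal is empty again.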

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

If you're not TRUE to your teeth, they'll be FALSE to you

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.
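A short Python sketch shows why any single AMP can be lost. Both hash functions below are invented for illustration (they are not Teradata's Primary or Fallback hash maps). The only property that matters is that a row's FALLBACK copy never lands on the same AMP as its base row.

```python
NUM_AMPS = 4
ROWS = [(2, "Ben Hon"), (4, "Joe Davis"), (6, "Mary Gray"), (8, "John Davis"),
        (10, "Don Roy"), (12, "Sam Mills"), (14, "Kyle Marx"), (16, "Lyn Jones")]


def primary_amp(num):
    """Hypothetical stand-in for the primary hash map."""
    return (num // 2) % NUM_AMPS


def fallback_amp(num):
    """Hypothetical stand-in for the fallback hash map: always a different AMP."""
    offset = 1 + (num // (2 * NUM_AMPS)) % (NUM_AMPS - 1)   # 1, 2 or 3
    return (primary_amp(num) + offset) % NUM_AMPS


base = {amp: {} for amp in range(NUM_AMPS)}
fallback = {amp: {} for amp in range(NUM_AMPS)}
for num, name in ROWS:
    base[primary_amp(num)][num] = (num, name)
    fallback[fallback_amp(num)][num] = (num, name)


def read_all(failed_amp=None):
    """Full Table Scan that survives the loss of one AMP."""
    answer = []
    for amp in range(NUM_AMPS):
        if amp == failed_amp:
            continue
        answer.extend(base[amp].values())
        # Surviving AMPs also serve FALLBACK copies of the failed AMP's rows.
        answer.extend(row for num, row in fallback[amp].items()
                      if primary_amp(num) == failed_amp)
    return answer


for down in range(NUM_AMPS):
    print("AMP", down, "down:", len(read_all(failed_amp=down)), "rows returned")
```

Whichever AMP is taken away, all eight friends still come back, at the price of storing sixteen rows instead of eight.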

You can't step into the same river twice

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map that knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in a group of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while one AMP is down.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
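The trade-off between small and large clusters comes down to simple arithmetic, sketched below in Python. Two assumptions are baked in: the failed AMP's work is shared evenly by the survivors in its cluster, and a second failure is equally likely to hit any other AMP in the system. The 48-AMP system size is a made-up figure chosen only because it divides evenly by every cluster size shown.

```python
def workload_factor(cluster_size):
    """Work per surviving AMP, relative to normal, when one cluster AMP is down."""
    return cluster_size / (cluster_size - 1)


def second_failure_risk(cluster_size, total_amps):
    """Chance that a second failed AMP lands in the already-wounded cluster."""
    return (cluster_size - 1) / (total_amps - 1)


TOTAL_AMPS = 48   # hypothetical system size for this illustration
for size in (2, 3, 4, 16):
    print(f"cluster of {size:2d}: "
          f"work x{workload_factor(size):.2f}, "
          f"second-failure risk {second_failure_risk(size, TOTAL_AMPS):.1%}")
```

A cluster of two doubles the survivor's work, a cluster of 16 barely adds any but has the most AMPs exposed to a fatal second failure, and a cluster of four sits comfortably in between.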

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues with all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping starring Sandra Bullock told a fascinating love story Ayoung woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded thetrain each day at her station However the man only knew her as the woman in the booth who collected his fareOne day the dashing young man tripped and fell into the path of the train Only quick action by the lovesick tolclerk kept him from certain death Although he avoided death he fell into a coma While he was in a coma themans family fell in love with the toll clerk who visited him in the hospital Because she visited so often themans family actual thought she was the mans fianceacutee The movie continues to tell how the man regainsconsciousness and the events the immediately follow At the end of the movie it turns out that all along thewoman had been telling the man everything that happened While you were sleeping

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ catches the AMP up on everything that occurred while it was "sleeping". Then the DARJ is discarded.

In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is "sleeping", it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
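The journal life cycle described above (open on failure, record missed changes, replay on recovery, discard) can be modeled with a small Python class. This is a sketch, not Teradata's implementation: the `Cluster` class and its methods are hypothetical, and each AMP is reduced to a dictionary of rows.

```python
class Cluster:
    """Toy cluster: each AMP is a dict of row_id -> value."""

    def __init__(self, n_amps=4):
        self.amps = [dict() for _ in range(n_amps)]
        self.down = set()  # AMP numbers that are currently offline
        self.darj = {}     # down AMP number -> list of missed changes

    def fail(self, amp):
        self.down.add(amp)
        self.darj[amp] = []  # the surviving AMPs open a journal

    def write(self, amp, row_id, value):
        if amp in self.down:
            # The AMP is "sleeping": remember the change for later.
            self.darj[amp].append((row_id, value))
        else:
            self.amps[amp][row_id] = value

    def recover(self, amp):
        # Replay everything the AMP missed, then discard the journal.
        for row_id, value in self.darj.pop(amp):
            self.amps[amp][row_id] = value
        self.down.discard(amp)
```

After `recover`, the AMP holds exactly the rows it would have held had it never gone down, and no journal remains.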

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to the hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called "transparent mirroring". Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it is still fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. We can still keep the shared nothing environment and have four physical disks: only the AMP that owns the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
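The rule that two disks in a rank may be lost unless they are a data disk and its own mirror can be checked with a short Python function. The disk names and the pairing below are assumptions made for illustration; the text only says that a rank has four disks and that each disk has a mirror.

```python
def rank_survives(failed_disks, mirror_pairs):
    """Return True if every mirror pair still has at least one working disk."""
    failed = set(failed_disks)
    return all(not (a in failed and b in failed) for a, b in mirror_pairs)


# Hypothetical rank of four disks: d0 is mirrored by d1, d2 is mirrored by d3.
PAIRS = [("d0", "d1"), ("d2", "d3")]
```

Losing `d0` and `d2` together is survivable because each still has a working mirror, while losing `d0` and `d1` together removes both copies of the same data.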

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, and they are accessing their own virtual disks via a different physical cable.
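The migration just described can be sketched as a function that moves a failed node's virtual processors to the surviving nodes of its clique. This is a toy model: the `migrate` function is hypothetical, all nodes passed in are assumed to share disk cabling, and the round-robin spread is an assumption for cliques with more than two nodes.

```python
def migrate(nodes, failed):
    """Move the failed node's vprocs to the surviving nodes of its clique.

    `nodes` maps a node name to its list of vproc names. Every node passed
    in is assumed to belong to the same clique.
    """
    survivors = [n for n in nodes if n != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    result = {n: list(nodes[n]) for n in survivors}
    for i, vproc in enumerate(nodes[failed]):
        # Spread the migrating vprocs round-robin over the survivors.
        result[survivors[i % len(survivors)]].append(vproc)
    return result
```

With the two-node, 32-AMP system from the picture, losing node one leaves node two running all 32 AMPs, which is why performance drops while the data stays reachable.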

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds". Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. This can be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, and it is a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE or DELETE command. It keeps track of all new, deleted or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
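The two scenarios above, rolling back with BEFORE images and rolling forward from a backup with AFTER images, can be sketched in Python. This is an illustrative model only: `JournaledTable`, `checkpoint` and `roll_forward` are hypothetical names, the journals are plain lists, and deletes are left out to keep the sketch short.

```python
class JournaledTable:
    """Toy table that records BEFORE and AFTER images of every change."""

    def __init__(self):
        self.rows = {}
        self.before = []  # (key, image before the change, or None if new)
        self.after = []   # (key, image after the change)

    def put(self, key, value):
        self.before.append((key, self.rows.get(key)))
        self.rows[key] = value
        self.after.append((key, value))

    def checkpoint(self):
        """Mark a point in time to roll back to or roll forward from."""
        return len(self.before)

    def rollback_to(self, mark):
        # Undo changes newest-first using the BEFORE images.
        while len(self.before) > mark:
            key, image = self.before.pop()
            self.after.pop()
            if image is None:
                self.rows.pop(key, None)  # the row did not exist before
            else:
                self.rows[key] = image


def roll_forward(backup_rows, after_images):
    """Rebuild a table from a backup copy plus the later AFTER images."""
    rows = dict(backup_rows)
    for key, image in after_images:
        rows[key] = image
    return rows
```

In the regional-transfer example, `rollback_to` plays the part of the manual rollback; in the hardware-failure example, `roll_forward` applies the journal entries captured since the backup.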

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
(
    emp       INTEGER,
    dept      INTEGER,
    lname     CHAR(20),
    fname     VARCHAR(20),
    salary    DECIMAL(10,2),
    hire_date DATE
)
UNIQUE PRIMARY INDEX (emp);

The example above created the table called employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE or READ locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
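The four rules above amount to a small compatibility table, sketched here in Python. The table is read off the list, with one assumption: the entry for a held ACCESS lock (everything except EXCLUSIVE may proceed) is inferred by symmetry rather than stated in the text. The names `COMPATIBLE` and `can_grant` are hypothetical.

```python
# For each lock already held, the set of requested locks that may proceed.
COMPATIBLE = {
    "EXCLUSIVE": set(),                     # prevents any access, period
    "WRITE": {"ACCESS"},                    # only a dirty read gets through
    "READ": {"READ", "ACCESS"},             # many readers, no writers
    "ACCESS": {"ACCESS", "READ", "WRITE"},  # assumption: blocks only EXCLUSIVE
}


def can_grant(held, requested):
    """Return True if `requested` can be granted while `held` is in place."""
    return requested in COMPATIBLE[held]
```

For example, `can_grant("WRITE", "ACCESS")` is true, which is exactly the stale-read case from item 2, while `can_grant("READ", "WRITE")` is false.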

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood." "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table unless the value in its referencing column also exists in the referenced table within the database. Conversely, a row whose value is still referenced by rows in another table may not be deleted until those referencing rows are removed first.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user then needs to locate the table copy and make corrections to the original table.
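The employee and payroll example can be turned into a minimal Python sketch of the two RI checks, one on insert and one on delete. The `Database` class, its method names and the `RIError` exception are hypothetical; payroll is assumed to be the table that references employee.

```python
class RIError(Exception):
    """Raised when a change would break referential integrity."""


class Database:
    def __init__(self):
        self.employee = set()  # referenced table: employee numbers
        self.payroll = {}      # referencing table: employee number -> salary

    def insert_payroll(self, emp, salary):
        # A payroll row may only point at an employee that exists.
        if emp not in self.employee:
            raise RIError(f"employee {emp} does not exist")
        self.payroll[emp] = salary

    def delete_employee(self, emp):
        # An employee still referenced by payroll may not be deleted.
        if emp in self.payroll:
            raise RIError(f"employee {emp} is still on the payroll")
        self.employee.discard(emp)
```

With these checks in place, the fired employee cannot vanish from the employee table while a payroll row still points at him or her.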

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on: ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel) and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
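The three phases described above (hand out blocks, hash and redistribute, sort) can be sketched as a single Python function. This is a simplified model under stated assumptions: CRC32 of the first column stands in for Teradata's row hash, a block is a handful of rows rather than 64K, and the `fastload` and `row_hash` names are hypothetical.

```python
import zlib


def row_hash(row):
    """Stand-in row hash: CRC32 of the first column (the primary index)."""
    return zlib.crc32(str(row[0]).encode())


def fastload(rows, n_amps=4, block_size=3):
    """Return one list of rows per AMP after the three load phases."""
    # Phase 1: hand out blocks of rows to the AMPs, one block at a time.
    received = [[] for _ in range(n_amps)]
    for start in range(0, len(rows), block_size):
        amp = (start // block_size) % n_amps
        received[amp].extend(rows[start:start + block_size])

    # Phase 2: each AMP hashes its rows and sends them to the owning AMP.
    owned = [[] for _ in range(n_amps)]
    for amp_rows in received:
        for row in amp_rows:
            owned[row_hash(row) % n_amps].append(row)

    # Phase 3: each AMP sorts the rows it owns by row hash.
    for amp_rows in owned:
        amp_rows.sort(key=row_hash)
    return owned
```

The first phase needs no knowledge of where a row belongs, which is what lets the blocks be handed out quickly; the hashing work is then shared by all AMPs.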

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind that many systems have hundreds to thousands of AMPs. The load takes place continually, in parallel, as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis.

Suddenly, companies began to load data weekly. Today most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse "rush hour". It can also be automatically preset to load different levels at certain times during the day, and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

In this chapter we have shown you two opposite approaches to retrieving data. In our first query, we used the Primary Index to retrieve one row. In the next query, we used a Full Table Scan (FTS) to retrieve all the rows. One approach is the fastest way and the other is the slowest way. But are these the only options for retrieving data? No. There is another option: a Secondary Index.

Secondary Indexes

Measure a thousand times and cut once

Turkish Proverb

Secondary Indexes provide an alternate path to the data and should be used on queries that run thousands of times. Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on "KNOWN QUERIES", or queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.

Measure a thousand query times and create a secondary index

Turkish Teradata Certified Professional

Furthermore, there are two types of secondary indexes: Unique Secondary Indexes (USI) and Non-Unique Secondary Indexes (NUSI). A table may have up to 32 secondary indexes.

The good news about secondary indexes is that they speed up queries. The bad news is that every time someone creates a secondary index on a table, Teradata creates and maintains a separate secondary index sub-table. This not only takes up space, but also adds overhead.

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column inside the secondary index table, and the second part is a pointer showing the location of the row in the base table. Teradata brilliantly uses the hash formula and the hash map to build its secondary index sub-tables.

There are three values stored in every secondary index sub-table row:

• Secondary Index data value
• Secondary Index Row-ID (this is the hashed version of the value)
• Primary Index Row-ID (this locates the AMP and the base row)

When a secondary index is created, the Teradata PE tells each AMP to hash the secondary index column value for each of its rows. It then tells the AMP to place the hash in a secondary index sub-table, along with the ROW-ID that points to the base row where the desired value resides.

Let's create a secondary index on our Best_Friends table. The syntax to create a secondary index on the column Friend_Name in the table called Best_Friends is:

CREATE UNIQUE INDEX (Friend_Name) ON Best_Friends;

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of each AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Number 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking WHERE Friend_Name = 'Ben Hon'. The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once that row is found, the real row-id (the smiley face in this example) is revealed, and the matching smiley face can be found in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.
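The two-AMP USI lookup walked through above can be modeled in Python. The sketch makes several assumptions: CRC32 stands in for the hash formula and hash map, the sub-table stores the primary index value as its pointer to the base row, and `amp_for`, `build` and `usi_lookup` are hypothetical names. Only Ben Hon (friend number 2) comes from the text; any other rows are made up.

```python
import zlib

N_AMPS = 4  # the example system in the text has four AMPs


def amp_for(value):
    """Stand-in for the hash formula plus hash map: value -> AMP number."""
    return zlib.crc32(str(value).encode()) % N_AMPS


def build(rows):
    """rows: list of (friend_num, friend_name); primary index is friend_num."""
    base = [dict() for _ in range(N_AMPS)]  # base table rows, per AMP
    usi = [dict() for _ in range(N_AMPS)]   # USI sub-table rows, per AMP
    for num, name in rows:
        base[amp_for(num)][num] = (num, name)
        usi[amp_for(name)][name] = num      # pointer back to the base row
    return base, usi


def usi_lookup(base, usi, name):
    """Return (row, AMPs touched) for WHERE Friend_Name = name."""
    index_amp = amp_for(name)
    touched = [index_amp]              # first AMP: the sub-table row
    num = usi[index_amp].get(name)
    if num is None:
        return None, touched
    base_amp = amp_for(num)
    touched.append(base_amp)           # second AMP: the base row
    return base[base_amp][num], touched
```

No matter how many AMPs the system has, a successful lookup visits at most two of them: one for the sub-table row and one for the base row.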

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries.

Join Indexes

A bend in the road is not the end of the road unless you fail to make the turn

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index. They run their normal joins, and the PE will check to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.

Teradata Databases, Users and Space

Overview

Choose a job you like and you will never have to work a day of your life

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan". The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind that all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20%, which is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space

The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from Tom.
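
The "who owns Tom?" question can be answered mechanically by walking up the family tree. The following is a minimal Python sketch for illustration only (it is not Teradata code); the PARENT table and the owners function are hypothetical names invented for this example, and the hierarchy is the one described in the text.

```python
# Hypothetical parent map mirroring the hierarchy described in the text:
# DBC is at the top, SYSDBA sits under DBC, and so on down to Tom.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """Return every parent (owner) above `name`, nearest first."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain


print(owners("Tom"))
```

Every name that comes back is a parent or owner of Tom. DBC has no entry in PARENT, so the walk stops there.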

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space,
• Spool Space, and
• Temp Space.

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

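Perm space is actually used to store real data, such as the rows of tables. (The definitions of views and macros are kept in the Data Dictionary and need no perm space of their own.) If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.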
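
The bookkeeping behind "give some away, subtract the same amount" can be sketched in a few lines of Python. This is only an illustration, not how Teradata is implemented: the Owner class and give_perm method are hypothetical, the 100 GB system and the 80/20 split come from the text, and the amounts handed to Morgan and Tom are made up.

```python
class Owner:
    """A database or user holding a perm-space allowance, in gigabytes."""

    def __init__(self, name, perm_gb=0):
        self.name = name
        self.perm_gb = perm_gb

    def give_perm(self, child, amount_gb):
        # Perm space given to a child is subtracted from the giver.
        if amount_gb > self.perm_gb:
            raise ValueError(f"{self.name} does not own {amount_gb} GB")
        self.perm_gb -= amount_gb
        child.perm_gb += amount_gb


dbc = Owner("DBC", perm_gb=100)   # in the beginning DBC owns everything
sysdba = Owner("SYSDBA")
morgan = Owner("Morgan")
tom = Owner("Tom")

dbc.give_perm(sysdba, 80)     # SYSDBA receives about 80% of the system
sysdba.give_perm(morgan, 10)  # illustrative amount
morgan.give_perm(tom, 4)      # illustrative amount

total = dbc.perm_gb + sysdba.perm_gb + morgan.perm_gb + tom.perm_gb
print(total)
```

However the space is handed down, the total stays the same: all space is owned by someone.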

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
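
To contrast spool with perm, here is a small Python sketch under stated assumptions: the User class, its method names and the SpoolSpaceExceeded exception are hypothetical, and the limit of 65 simply borrows the speed-limit number from the analogy above. It shows two ideas from the text: handing out a spool limit costs the parent nothing, and a query whose answer set exceeds the limit aborts.

```python
class SpoolSpaceExceeded(Exception):
    """Raised when an answer set does not fit within the spool limit."""


class User:
    def __init__(self, name, spool_limit_mb):
        self.name = name
        self.spool_limit_mb = spool_limit_mb
        self.spool_in_use_mb = 0

    def create_user(self, name, spool_limit_mb):
        # A spool limit is a ceiling, not a withdrawal: nothing is
        # subtracted from the parent when a child receives one.
        return User(name, spool_limit_mb)

    def run_query(self, answer_set_mb):
        if answer_set_mb > self.spool_limit_mb:
            raise SpoolSpaceExceeded(f"{self.name} is out of spool space")
        self.spool_in_use_mb = answer_set_mb  # AMPs build the answer set
        delivered = answer_set_mb             # the answer goes to the user
        self.spool_in_use_mb = 0              # the spool is released
        return delivered


sysdba = User("SYSDBA", spool_limit_mb=65)
morgan = sysdba.create_user("Morgan", spool_limit_mb=65)
tom = sysdba.create_user("Tom", spool_limit_mb=65)
```

After creating two users, SYSDBA still has its full limit, just as every driver on the highway can be allowed the same 65 mph.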

The following picture shows a logical view of a Customer Table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select a view of a table(s), the data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname    Sal
1     10     Johnson    Manny    100000
2     20     Carlsbad   Jan      100000
22    30     Winter     Steve    77000
25    10     Lester     Bonnie   56000
33    10     Samuels    Todd     120000
99    20     Walter     Misha    104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha
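
If you do not have a Teradata system at hand, the same idea can be tried with the SQLite engine that ships with Python. This is a sketch only: SQLite is not Teradata and stores its view definitions differently, but the CREATE VIEW and SELECT statements behave alike for this simple case. The table contents are the employee rows shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no rows are copied.
conn.execute(
    "CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

cursor = conn.execute("SELECT * FROM Employ_V")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)
print(len(rows), "rows returned")
```

The salary column never appears in the result, because the view's definition never selects it.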

What is a Macro?

The axe soon forgets, but the tree always remembers.

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name.

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
  SELECT * FROM Employ_v WHERE dept = 10;
  SELECT * FROM Employ_v WHERE dept = 20;
  SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
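
The idea of "a named group of statements run with one command" can be imitated in Python with the built-in SQLite engine. This is a rough sketch, not Teradata: SQLite has no macros, so the MACROS dictionary and the execute_macro function are hypothetical stand-ins, and a plain table named Employ_v stands in for the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A plain table stands in for the Employ_v view in this sketch.
conn.execute(
    "CREATE TABLE Employ_v (Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT)"
)
conn.executemany(
    "INSERT INTO Employ_v VALUES (?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny"),
        (2, 20, "Carlsbad", "Jan"),
        (22, 30, "Winter", "Steve"),
        (25, 10, "Lester", "Bonnie"),
        (33, 10, "Samuels", "Todd"),
        (99, 20, "Walter", "Misha"),
    ],
)
conn.commit()

# The "dictionary" remembers the statements so the user does not have to.
MACROS = {
    "Emp_mac": [
        "SELECT * FROM Employ_v WHERE Dept = 10",
        "SELECT * FROM Employ_v WHERE Dept = 20",
        "SELECT * FROM Employ_v ORDER BY Lname",
    ],
}


def execute_macro(connection, name):
    """Run every statement of the named macro as one unit of work."""
    results = []
    with connection:  # commits if all succeed, rolls back on an error
        for sql in MACROS[name]:
            results.append(connection.execute(sql).fetchall())
    return results


dept10, dept20, by_name = execute_macro(conn, "Emp_mac")
print(len(dept10), len(dept20), len(by_name))
```

One call returns all three reports, in the order the statements were stored.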

Here is a handy reference chart that compares views with macros:

Views                                           Macros
• We select from views                          • We execute macros
• Uses the keyword AS                           • Uses the keyword AS
• Definition is stored in the Data Dictionary   • Definition is stored in the Data Dictionary
• Accesses certain portions of the data         • Accesses the real data itself
• Is changed using the keyword REPLACE          • Is changed using the keyword REPLACE

Access Rights for Teradata Users

Never insult seven men when all you're packing is a six gun.

Wild West Slogan

I taught in one place that was so rough ... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it ... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
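
The difference between implicit and explicit rights can be pictured with a short Python sketch. It is a simplification written for illustration, not the way Teradata records privileges: the PARENT table, the function names and the explicit_grants set are all hypothetical, and the rule that an object may grant rights on itself is an assumption taken from the Tom and Mary example above.

```python
# Hypothetical hierarchy following the text: Mary sits under MRKT,
# Tom sits under Morgan, and everyone sits under SYSDBA and DBC.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
    "Mary": "MRKT",
}

explicit_grants = set()  # (grantee, privilege, object) triples


def has_implicit_rights(owner, target):
    """True if `owner` sits anywhere above `target` in the hierarchy."""
    node = PARENT.get(target)
    while node is not None:
        if node == owner:
            return True
        node = PARENT.get(node)
    return False


def grant(grantor, grantee, privilege, obj):
    # Assumption: only the object itself or one of its owners may grant.
    if grantor != obj and not has_implicit_rights(grantor, obj):
        raise PermissionError(f"{grantor} may not grant rights on {obj}")
    explicit_grants.add((grantee, privilege, obj))


def revoke(grantor, grantee, privilege, obj):
    if grantor != obj and not has_implicit_rights(grantor, obj):
        raise PermissionError(f"{grantor} may not revoke rights on {obj}")
    explicit_grants.discard((grantee, privilege, obj))


grant("Tom", "Mary", "CREATE TABLE", "Tom")      # an explicit right
mary_can_create = ("Mary", "CREATE TABLE", "Tom") in explicit_grants
revoke("Morgan", "Mary", "CREATE TABLE", "Tom")  # Morgan is Tom's parent
mary_still_can = ("Mary", "CREATE TABLE", "Tom") in explicit_grants
```

Nobody had to give Morgan permission to revoke: being above Tom in the hierarchy is enough.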

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

Please sleep on it tonight, and if you wake up in the morning, let me know what you think.

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

The afternoon knows what the morning never suspected

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupt. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, then the rows affected can be reverted back to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (as in a MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she would ask if you liked it. If you didn't like it, then you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal". Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, for the Transient Journal, "Cleanliness is next to Godliness". If it is clean, then things went well.
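
The BEFORE picture idea is easy to act out in Python. The sketch below is a toy written for this explanation, not Teradata's implementation: the TransientJournal class and its method names are hypothetical, a dictionary stands in for the rows on one AMP, and the failing third step is contrived to trigger the rollback.

```python
class TransientJournal:
    """Keeps a BEFORE image of every row a transaction touches."""

    def __init__(self, table):
        self.table = table          # dict: row key -> row (a dict)
        self.before_images = []     # list of (key, row-or-None)

    def _snapshot(self, key):
        row = self.table.get(key)
        self.before_images.append((key, None if row is None else dict(row)))

    def insert(self, key, row):
        self._snapshot(key)
        self.table[key] = dict(row)

    def update(self, key, **changes):
        self._snapshot(key)
        self.table[key].update(changes)   # KeyError if the row is missing

    def delete(self, key):
        self._snapshot(key)
        del self.table[key]

    def commit(self):
        self.before_images.clear()        # the journal is wiped clean

    def rollback(self):
        # Restore the pictures newest-first so the oldest image wins.
        while self.before_images:
            key, image = self.before_images.pop()
            if image is None:
                self.table.pop(key, None)
            else:
                self.table[key] = image


employees = {
    1: {"Lname": "Johnson", "Sal": 100000},
    25: {"Lname": "Lester", "Sal": 56000},
}
journal = TransientJournal(employees)
try:
    journal.update(1, Sal=200000)   # the accidental 100% raise
    journal.delete(25)
    journal.update(404, Sal=1)      # no such row: the transaction fails
    journal.commit()
except KeyError:
    journal.rollback()
print(employees)
```

Because the third step fails, the raise and the delete are both undone and the journal ends up empty.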

FALLBACK Protection

I asked my dentist if I had to floss all my teeth and he responded, "No, just the ones you want to keep."

If you're not TRUE to your teeth, they'll be FALSE to you.

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.
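
Here is a small Python sketch of the four-AMP picture. Only the placement of Ben Hon and Don Roy comes from the text; the other six rows, the PLACEMENT table and the read_from function are hypothetical and were made up so that every AMP holds two base rows and two FALLBACK rows.

```python
# row -> (AMP holding the base row, AMP holding the FALLBACK copy)
PLACEMENT = {
    "Ben Hon": (1, 2),
    "Don Roy": (1, 4),
    "Row 3": (2, 1),
    "Row 4": (2, 3),
    "Row 5": (3, 4),
    "Row 6": (3, 2),
    "Row 7": (4, 3),
    "Row 8": (4, 1),
}


def read_from(row, down_amps=frozenset()):
    """Return the AMP number that can serve the row right now."""
    primary, fallback = PLACEMENT[row]
    if primary not in down_amps:
        return primary
    if fallback not in down_amps:
        return fallback
    raise LookupError(f"{row} is unavailable")


rows_on_disk = 2 * len(PLACEMENT)   # base rows plus FALLBACK rows
print(read_from("Ben Hon", down_amps={1}))
print(read_from("Don Roy", down_amps={1}))
```

With any single AMP down, every row can still be read; the price is sixteen stored rows instead of eight.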

You can't step into the same river twice.

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is low; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.
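
The path from Primary Index value to hash, to bucket, to AMP can be sketched in Python. Everything specific here is an assumption made for illustration: CRC-32 stands in for Teradata's real hashing algorithm, the map is given 65,536 buckets chosen from the top 16 bits of the hash, and the buckets are dealt out to the eight AMPs in simple rotation.

```python
import zlib
from collections import Counter

NUM_AMPS = 8
NUM_BUCKETS = 65536   # assumption: one bucket per 16-bit value

# The hash map: bucket number -> AMP number (AMPs numbered 1 to 8 here).
HASH_MAP = [bucket % NUM_AMPS + 1 for bucket in range(NUM_BUCKETS)]


def row_hash(primary_index_value):
    """Stand-in 32-bit hash; Teradata's real algorithm is different."""
    return zlib.crc32(str(primary_index_value).encode("utf-8"))


def amp_for(primary_index_value):
    bucket = row_hash(primary_index_value) >> 16   # keep the top 16 bits
    return HASH_MAP[bucket]                        # the row's destination


buckets_per_amp = Counter(HASH_MAP)
print(amp_for("Ben Hon"), amp_for("Don Roy"))
```

The same Primary Index value always lands on the same AMP, and because every AMP owns the same number of buckets, rows tend to spread evenly.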

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
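
The "one AMP in each cluster" rule can be written as a tiny check in Python. This is a sketch with assumed numbering: AMPs 1 to 4 are taken to be cluster one and AMPs 5 to 8 cluster two, and the function names are hypothetical.

```python
from collections import Counter


def cluster_of(amp, cluster_size=4):
    """Cluster number for an AMP, assuming AMPs are numbered from 1."""
    return (amp - 1) // cluster_size + 1


def data_available(down_amps, cluster_size=4):
    """FALLBACK keeps all data reachable while no cluster has lost two AMPs."""
    lost = Counter(cluster_of(amp, cluster_size) for amp in set(down_amps))
    return all(count <= 1 for count in lost.values())


print(data_available([1, 5]))   # one AMP down in each cluster
print(data_available([1, 2]))   # two AMPs down in the same cluster
```

Losing AMPs 1 and 5 is survivable because each cluster still holds a copy of every one of its rows; losing AMPs 1 and 2 is not.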

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will then take twice as long to process.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
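
The trade-off can be put into numbers with a little arithmetic. The Python sketch below rests on two simplifying assumptions that are ours, not Teradata's: the failed AMP's work is shared evenly by the survivors in its cluster, and the "risk" is counted simply as the number of AMP pairs in a cluster whose joint failure would make data unreachable. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def load_per_survivor(cluster_size):
    """Workload of each surviving AMP after one failure (1 = normal)."""
    return Fraction(cluster_size, cluster_size - 1)


def fatal_pairs(cluster_size):
    """Number of two-AMP combinations within one cluster."""
    return comb(cluster_size, 2)


for size in (2, 3, 4, 16):
    print(size, load_per_survivor(size), fatal_pairs(size))
```

A cluster of two doubles the survivor's load but has only one fatal pair. A cluster of 16 barely notices the extra work but has many more ways to lose two AMPs. Four sits comfortably in between.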

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, with all subsequent changes to that AMP's rows applied to their FALLBACK copies.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping, starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.
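
The catch-up can be pictured with a short Python sketch. It is deliberately simplified and is not Teradata's implementation: the Amp class, the write_row and recover functions and the phone numbers are all made up, one dictionary stands in for the FALLBACK copies kept by the other AMPs in the cluster, and only inserts and updates are modeled.

```python
class Amp:
    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.online = True


def write_row(amp, fallback_rows, darj, key, value):
    """Apply a change to a row whose base copy lives on `amp`."""
    fallback_rows[key] = value       # the FALLBACK copy is always updated
    if amp.online:
        amp.rows[key] = value
    else:
        darj.append((key, value))    # remember it for the sleeping AMP


def recover(amp, darj):
    """Replay the journal in order, then discard it."""
    for key, value in darj:
        amp.rows[key] = value
    darj.clear()
    amp.online = True


amp1 = Amp("AMP1")
fallback_rows = {}
darj = []

write_row(amp1, fallback_rows, darj, "Ben Hon", "555-0101")
amp1.online = False                                   # AMP1 goes down
write_row(amp1, fallback_rows, darj, "Ben Hon", "555-0199")
write_row(amp1, fallback_rows, darj, "Don Roy", "555-0123")
entries_while_down = len(darj)
recover(amp1, darj)                                   # AMP1 wakes up
print(amp1.rows == fallback_rows)
```

While AMP1 sleeps, users keep working against the FALLBACK copies; once the journal is replayed, AMP1 is current again and the journal is empty.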

In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). Also, none of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant.

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space: half of the raw disk capacity holds mirror copies.
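
Mirroring is simple enough to mimic in Python. The sketch below is only a model of the idea, not of any real Disk Array Controller: the MirroredPair class and its method names are hypothetical, and each "disk" is just a dictionary of blocks.

```python
class MirroredPair:
    """A primary disk and its mirror, kept identical on every write."""

    def __init__(self):
        self.disks = {"primary": {}, "mirror": {}}
        self.failed = set()

    def write(self, block, data):
        # The dual write is invisible to the caller.
        for name, disk in self.disks.items():
            if name not in self.failed:
                disk[block] = data

    def read(self, block):
        # Either healthy disk can serve the read.
        for name, disk in self.disks.items():
            if name not in self.failed:
                return disk[block]
        raise IOError("both disks have failed")

    def fail(self, name):
        self.failed.add(name)
        self.disks[name] = {}

    def replace(self, name):
        survivor = "mirror" if name == "primary" else "primary"
        if survivor in self.failed:
            raise IOError("no healthy disk to copy from")
        self.disks[name] = dict(self.disks[survivor])   # rebuild the copy
        self.failed.discard(name)


pair = MirroredPair()
pair.write(0, "Ben Hon")
pair.fail("primary")                    # the primary disk crashes
still_readable = pair.read(0)           # served from the mirror
pair.write(1, "Don Roy")                # operations continue
pair.replace("primary")                 # a replacement disk is rebuilt
print(pair.disks["primary"] == pair.disks["mirror"])
```

The caller never learns which disk answered, and after the rebuild the two disks are identical again.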

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced clicks). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced cleeks) in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel Processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in the surviving node's memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.
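
The migration can be sketched in Python using the two-node, 32-AMP configuration from the text. The Node class, the fail_node function and the vdisk naming are hypothetical; the point of the sketch is only that the AMPs change nodes while their virtual disks stay the same.

```python
class Node:
    def __init__(self, name, vprocs):
        self.name = name
        self.vprocs = list(vprocs)
        self.up = True


def fail_node(failed, clique):
    """Move the failed node's vprocs to the surviving nodes of its clique."""
    failed.up = False
    survivors = [node for node in clique if node.up]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    for i, vproc in enumerate(failed.vprocs):
        survivors[i % len(survivors)].vprocs.append(vproc)
    failed.vprocs = []


node1 = Node("node1", [f"AMP{n}" for n in range(1, 17)])
node2 = Node("node2", [f"AMP{n}" for n in range(17, 33)])

# Each AMP keeps its own virtual disk no matter where it runs.
vdisk_of = {f"AMP{n}": f"vdisk{n}" for n in range(1, 33)}

fail_node(node1, [node1, node2])
print(len(node2.vprocs), vdisk_of["AMP16"])
```

After node one fails, node two carries all 32 AMPs, which is why it slows down, yet AMP 16 still works against its own virtual disk.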

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE or DELETE command. It keeps track of all new, deleted or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

To explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southwest Region. Because there is a BEFORE Journal, a programmer can manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. In addition, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
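The rollback and roll-forward ideas above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the table is a dictionary, only updates are journaled, and every name here (`update`, `roll_back`, `roll_forward`) is hypothetical rather than a Teradata API.

```python
# Hypothetical in-memory model of BEFORE and AFTER journal images.
# None of these names are Teradata APIs; they only illustrate the idea.

table = {1: "Western", 2: "Eastern", 3: "Western"}   # rep id -> region
before_journal = []   # (key, old_value) captured before each change
after_journal = []    # (key, new_value) captured after each change


def update(key, new_value):
    """Change one row and record both journal images."""
    before_journal.append((key, table[key]))
    table[key] = new_value
    after_journal.append((key, new_value))


def roll_back(tbl, journal):
    """Undo changes by re-applying BEFORE images, newest first."""
    for key, old_value in reversed(journal):
        tbl[key] = old_value


def roll_forward(tbl, journal):
    """Redo changes on a restored backup by applying AFTER images in order."""
    for key, new_value in journal:
        tbl[key] = new_value


backup = dict(table)              # full backup taken on the 1st
for rep in list(table):           # the faulty update hits every region
    update(rep, "Southwest")

restored = dict(backup)                 # hardware failure: reload the backup,
roll_forward(restored, after_journal)   # then replay the AFTER images
print(restored == table)                # True

roll_back(table, before_journal)  # programming error: undo with BEFORE images
print(table == backup)            # True
```

The two journals answer different questions: the AFTER images rebuild the latest state on top of an old backup, while the BEFORE images return a healthy table to an earlier state.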

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
   (emp        INTEGER
   ,dept       INTEGER
   ,lname      CHAR(20)
   ,fname      VARCHAR(20)
   ,salary     DECIMAL(10,2)
   ,hire_date  DATE  /* the FORMAT string did not survive in this copy */
   )
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE, or READ locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
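The four locking modes can be summarized as a small compatibility table. The Python sketch below is illustrative only: `BLOCKED_BY` and `must_wait` are made-up names, and the entry saying that an EXCLUSIVE request waits behind an ACCESS lock is an assumption taken from the usual Teradata lock matrix rather than from the list above.

```python
# For each requested lock, the set of already-held locks it must wait behind.
# The EXCLUSIVE-behind-ACCESS entry is an assumption, not stated in the text.
BLOCKED_BY = {
    "ACCESS":    {"EXCLUSIVE"},
    "READ":      {"WRITE", "EXCLUSIVE"},
    "WRITE":     {"READ", "WRITE", "EXCLUSIVE"},
    "EXCLUSIVE": {"ACCESS", "READ", "WRITE", "EXCLUSIVE"},
}


def must_wait(requested, held):
    """Return True if `requested` has to queue behind the `held` lock."""
    return held in BLOCKED_BY[requested]


print(must_wait("READ", "READ"))         # False: many readers at once
print(must_wait("ACCESS", "WRITE"))      # False: a dirty read is allowed
print(must_wait("WRITE", "READ"))        # True
print(must_wait("ACCESS", "EXCLUSIVE"))  # True
```

Reading a row of the table answers the question "who is ahead of me in line?" for one kind of request.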

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood." "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table unless the value in its referencing column also exists in the referenced table within the database. Conversely, a row whose value is still referenced by another table may not be deleted until the matching value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. In addition, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
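The two RI rules can be sketched with the employee and payroll tables from the story above. This is a minimal Python illustration, not Teradata syntax; the data and the function names `insert_payroll` and `delete_employee` are hypothetical.

```python
# Hypothetical tables: payroll rows reference employee rows by emp number.
employee = {1, 2, 22}                 # parent keys
payroll = {1: 100000, 22: 77000}      # child rows keyed by emp


def insert_payroll(emp, salary):
    """Reject a child row whose key has no matching parent row."""
    if emp not in employee:
        raise ValueError(f"RI violation: employee {emp} does not exist")
    payroll[emp] = salary


def delete_employee(emp):
    """Reject deleting a parent row that a child row still references."""
    if emp in payroll:
        raise ValueError(f"RI violation: payroll still references {emp}")
    employee.discard(emp)


delete_employee(2)            # allowed: nobody in payroll references 2
try:
    delete_employee(1)        # refused: the payroll row for 1 still exists
except ValueError as err:
    print(err)                # RI violation: payroll still references 1
```

To remove employee 1 cleanly, the payroll row has to be deleted first, which is exactly the ordering RI forces on the forgetful company.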

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on: ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
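The three phases just described (hand out the data, redistribute by hash, sort on each AMP) can be mimicked in Python. This is a rough sketch under loud assumptions: `zlib.crc32` stands in for Teradata's real hashing algorithm and hash map, four AMPs are assumed, and the functions `amp_for` and `load` are invented for illustration.

```python
import zlib

NUM_AMPS = 4  # assumed system size, for illustration only


def amp_for(primary_index_value):
    """Map a primary index value to an AMP with a stand-in hash function."""
    row_hash = zlib.crc32(str(primary_index_value).encode())
    return row_hash, row_hash % NUM_AMPS


def load(rows):
    """Deal rows out to the AMPs, then redistribute each row by its hash."""
    # Phase 1: rows are handed to the AMPs without regard to their content.
    received = [rows[i::NUM_AMPS] for i in range(NUM_AMPS)]
    # Phase 2: every AMP hashes its rows and sends each to the owning AMP.
    owned = [[] for _ in range(NUM_AMPS)]
    for block in received:
        for row in block:
            row_hash, amp = amp_for(row[0])
            owned[amp].append((row_hash, row))
    # Phase 3: each AMP sorts the rows it owns by row hash.
    return [sorted(amp_rows) for amp_rows in owned]


amps = load([(emp, f"name{emp}") for emp in range(1, 21)])
print(sum(len(a) for a in amps))   # 20: every row landed on exactly one AMP
```

In the real system the phases run in parallel on every AMP; the loop here only shows where each row ends up.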

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse "rush hour." It can also be automatically preset to load at different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

The example above shows the theory behind creating a secondary index. There are four AMPs in this system. The base table is the Best_Friends table, seen near the top of the AMP's disk. We created a Unique Secondary Index (USI) on Friend_Name, and Teradata automatically created a secondary index sub-table on each AMP. Next, the AMPs hashed the secondary index values. These values went to the AMP to which they hashed, along with a pointer to the base row.

The design is simple for display purposes. A symbol represents the base row-id. For example, Ben Hon, who is Friend_Number 2, has a smiley face for his symbol. Notice that in the Secondary Index Sub-table (located at the bottom of the AMP's disk) there is also a smiley face. Here is how the design works for retrieval. Let's look at how the following query plays out:

SELECT Friend_Num, Friend_Name
FROM Best_Friends
WHERE Friend_Name = 'Ben Hon';

The Teradata Parsing Engine takes the SQL and checks the syntax and security access rights. If all is well, the PE notices that the WHERE clause of the query is asking "WHERE Friend_Name = 'Ben Hon'". The PE recognizes that Friend_Name is a Unique Secondary Index. The PE will hash 'Ben Hon' and then use the hash map to find the AMP that holds 'Ben Hon' in its secondary index sub-table. As you can see, the AMP involved is number two (notice the smiley face on AMP 2). The PE instructs AMP 2 to retrieve the 'Ben Hon' Secondary Index Sub-table row. Once the 'Ben Hon' sub-table row is found, the real row-id (the smiley face in this example) is revealed, and the PE can find the matching smiley face in the base table.

This approach allows all USI requests in the WHERE clause of SQL to become two-AMP operations.

A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index sub-table to see if they have one or more qualifying rows.
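The two-AMP behavior of a USI lookup can be traced with a short Python model. It is only a sketch: `zlib.crc32` stands in for the real hash map, the friends other than Ben Hon are invented, and Friend_Num is assumed to be the primary index of the base table, which the text does not state.

```python
import zlib

NUM_AMPS = 4  # assumed, matching the four-AMP example


def amp_of(value):
    """Stand-in for the hash map: pick an AMP from a value's hash."""
    return zlib.crc32(str(value).encode()) % NUM_AMPS


base = [dict() for _ in range(NUM_AMPS)]          # base rows, per AMP
usi_subtable = [dict() for _ in range(NUM_AMPS)]  # index rows, per AMP

friends = {1: "Ann Lee", 2: "Ben Hon", 3: "Cy Dunn"}  # hypothetical data
for num, name in friends.items():
    base[amp_of(num)][num] = name            # base row hashed by Friend_Num
    usi_subtable[amp_of(name)][name] = num   # index row hashed by Friend_Name


def lookup_by_name(name):
    """Return the base row and the AMPs touched (at most two)."""
    index_amp = amp_of(name)
    num = usi_subtable[index_amp][name]      # step 1: read the index row
    base_amp = amp_of(num)                   # step 2: go to the base row
    return (num, base[base_amp][num]), [index_amp, base_amp]


row, amps_touched = lookup_by_name("Ben Hon")
print(row)                 # (2, 'Ben Hon')
print(len(amps_touched))   # 2
```

However many AMPs the system has, the lookup visits one AMP for the index row and one for the base row, which is why the cost stays flat as the system grows.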

Create secondary indexes only on columns used repeatedly in the WHERE clause of on-going queries. Secondary indexes take up space and overhead, but boy, can they speed up queries.

Join Indexes

"A bend in the road is not the end of the road, unless you fail to make the turn."

A join is an SQL query that gathers its information from more than one table. Teradata can join up to 64 tables in a single query. Many databases can't handle join processing, so either the database is modeled in a dimensional fashion or summary tables are created. Teradata allows you to travel down a faster and straighter highway.

Because data marts or summary tables can be an administrative nightmare, Teradata enables join access without requiring physical data marts. This is accomplished by creating a join index. When you create a join index, the tables involved are pre-joined. There is an actual table built containing the joined data. The users don't ever query the join index directly. They run their normal joins, and the PE checks to see if the join can be satisfied by the Join Index table. If it can, Teradata will pull the data from the Join Index table. Best of all, once it is created, the data is automatically maintained as the underlying base tables are updated.

Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan." The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.

Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space

The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your "extended family" will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to or REVOKE rights from Tom.
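The "who owns Tom?" question is a walk up the hierarchy, which can be sketched in Python. The parent links below are taken from the example in the text (DBC, SYSDBA, MRKT, SALES, Morgan, Tom); the `PARENT` dictionary and the `owners` function are illustrative names, not anything Teradata provides.

```python
# Parent of each user or database in the example hierarchy.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """List every object above `name`, nearest parent first."""
    result = []
    while name in PARENT:
        name = PARENT[name]
        result.append(name)
    return result


print(owners("Tom"))   # ['Morgan', 'SYSDBA', 'DBC']
```

Every name in the returned list counts as a parent or owner of Tom, not just the first one.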

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see the Data Protection chapter).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts and the user is told that he or she is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. Will a bank that holds onto all of its capital be successful? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data, such as tables, secondary index sub-tables, and permanent journals. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
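The difference between giving away perm space and giving away spool space can be shown with simple arithmetic. The Python sketch below uses invented numbers (in gigabytes) and a hypothetical `Owner` class; treating the parent's spool limit as a ceiling for the child is an assumption drawn from the speed-limit analogy.

```python
class Owner:
    """Hypothetical user/database with perm and spool limits, in GB."""

    def __init__(self, name, perm, spool):
        self.name, self.perm, self.spool = name, perm, spool

    def create_child(self, name, perm, spool):
        """Give perm away (it is subtracted); spool is only a ceiling."""
        if perm > self.perm:
            raise ValueError("not enough perm space to give away")
        if spool > self.spool:
            raise ValueError("a child cannot exceed the parent's spool limit")
        self.perm -= perm
        return Owner(name, perm, spool)


sysdba = Owner("SYSDBA", perm=80, spool=20)
morgan = sysdba.create_child("Morgan", perm=30, spool=20)
tom = morgan.create_child("Tom", perm=10, spool=20)

print(sysdba.perm, sysdba.spool)   # 50 20
print(morgan.perm, morgan.spool)   # 20 20
print(tom.perm, tom.spool)         # 10 20
```

Perm totals always add back up to the original 80, while every owner in the chain keeps the full spool limit.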

The following picture shows a logical view of a Customer table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area, there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.

What is a View

At Christmas time, no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname    Sal
1     10     Johnson    Manny    100000
2     20     Carlsbad   Jan      100000
22    30     Winter     Steve     77000
25    10     Lester     Bonnie    56000
33    10     Samuels    Todd     120000
99    20     Walter     Misha    104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, security is enhanced. With this restriction, no user can actually see the list of employee salaries.

Perm space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha
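The idea that a view stores a definition rather than a copy of the data can be illustrated in Python. In this sketch, which only models the concept, `EMPLOYEE` holds three of the rows from the table above and the generator `employ_v` (a made-up name) plays the role of the view.

```python
EMPLOYEE = [
    {"Emp": 1, "Dept": 10, "Lname": "Johnson", "Fname": "Manny", "Sal": 100000},
    {"Emp": 2, "Dept": 20, "Lname": "Carlsbad", "Fname": "Jan", "Sal": 100000},
    {"Emp": 22, "Dept": 30, "Lname": "Winter", "Fname": "Steve", "Sal": 77000},
]

VIEW_COLUMNS = ("Emp", "Dept", "Lname", "Fname")   # Sal is left out


def employ_v():
    """Evaluate the view on demand; nothing is copied or stored."""
    for row in EMPLOYEE:
        yield {col: row[col] for col in VIEW_COLUMNS}


for row in employ_v():
    print(row)
```

Because the projection is computed each time it is read, a change to the base table shows up in the view immediately, and the salary column never leaves the base table.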

What is a Macro

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
    SELECT * FROM Employ_v WHERE dept = 10;
    SELECT * FROM Employ_v WHERE dept = 20;
    SELECT * FROM Employ_v ORDER BY lname; );

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough ... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it ... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has Implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
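
The idea that implicit rights simply follow the ownership chart can be sketched in a few lines of Python. This is only an illustration: the chart below is an assumed layout of the hierarchy (the original picture is not reproduced here), and the names OWNER, owners_of, and has_implicit_rights are hypothetical, not Teradata objects.

```python
# Hypothetical ownership chart; the real picture is not reproduced here.
# Each entry maps a user or database to its immediate owner (parent).
OWNER = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "Mary": "MRKT",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}

def owners_of(name):
    """Return every owner above `name`, nearest first."""
    chain = []
    while name in OWNER:
        name = OWNER[name]
        chain.append(name)
    return chain

def has_implicit_rights(owner, target):
    """An owner holds implicit rights on anything below it in the chart."""
    return owner in owners_of(target)

print(owners_of("Tom"))                      # ['Morgan', 'SYSDBA', 'DBC']
print(has_implicit_rights("DBC", "Mary"))    # True
print(has_implicit_rights("Tom", "Mary"))    # False
```

In this sketch Tom sits on a different branch from Mary, so he holds no implicit rights on her; anything he gives her would have to be an explicit grant.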

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract: it is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message,
• The entire transaction is rolled back and any changes made to the database are reversed,
• Locks are released, and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores it to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal." Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness." If it is clean, then things went well.
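
To make the BEFORE picture concrete, here is a minimal Python sketch of the idea. It is a toy model only: the class name TransientJournalDemo and its methods are hypothetical, and a real AMP's journal is far more involved.

```python
class TransientJournalDemo:
    """Toy table that keeps before-images until the transaction ends."""

    _MISSING = object()  # marks "row did not exist before the change"

    def __init__(self, rows):
        self.rows = dict(rows)
        self.journal = {}  # row key -> before-image

    def write(self, key, value):
        # Take the BEFORE picture only the first time a row is touched.
        if key not in self.journal:
            self.journal[key] = self.rows.get(key, self._MISSING)
        self.rows[key] = value

    def delete(self, key):
        if key not in self.journal:
            self.journal[key] = self.rows.get(key, self._MISSING)
        self.rows.pop(key, None)

    def commit(self):
        # Success: the journal is wiped clean.
        self.journal.clear()

    def rollback(self):
        # Failure: restore every touched row to its before-image.
        for key, before in self.journal.items():
            if before is self._MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
        self.journal.clear()


table = TransientJournalDemo({"emp1": 50000, "emp2": 60000})
table.write("emp1", 100000)   # the accidental raise
table.write("emp3", 40000)    # a new row in the same transaction
table.rollback()
print(table.rows)             # {'emp1': 50000, 'emp2': 60000}
```

Note that the journal keeps only the first before-image of each row, which is all that is needed to put the row back the way it was.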

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK-protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission-critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
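
The two hash maps can be pictured with a small Python sketch. Everything specific here is an assumption made for the demo: CRC-32 stands in for the real hashing algorithm, the map has 64 buckets, and the rule for choosing the fallback AMP is simply "another AMP in the same cluster." Only the eight AMPs in two clusters of four come from the example above.

```python
import zlib

AMPS = list(range(1, 9))                 # eight AMPs, as in the picture
CLUSTER_SIZE = 4                         # AMPs 1-4 and AMPs 5-8
NUM_BUCKETS = 64                         # assumed size, chosen for the demo

# Primary hash map: bucket number -> AMP that owns the base row.
PRIMARY_MAP = [AMPS[b % len(AMPS)] for b in range(NUM_BUCKETS)]

def row_hash(primary_index_value):
    """Stand-in hashing algorithm (CRC-32); not Teradata's real one."""
    return zlib.crc32(str(primary_index_value).encode("utf-8"))

def cluster_of(amp):
    """Return the list of AMPs that share a cluster with `amp`."""
    start = ((amp - 1) // CLUSTER_SIZE) * CLUSTER_SIZE + 1
    return list(range(start, start + CLUSTER_SIZE))

def primary_amp(primary_index_value):
    """Hash the Primary Index, pick a bucket, read the AMP number from it."""
    bucket = row_hash(primary_index_value) % NUM_BUCKETS
    return PRIMARY_MAP[bucket]

def fallback_amp(primary_index_value):
    """Pick a different AMP in the same cluster for the FALLBACK copy."""
    base = primary_amp(primary_index_value)
    others = [a for a in cluster_of(base) if a != base]
    # Use higher-order hash bits so copies spread over the other AMPs.
    return others[(row_hash(primary_index_value) // NUM_BUCKETS) % len(others)]

for name in ["Ben Hon", "Don Roy"]:
    print(name, primary_amp(name), fallback_amp(name))
```

Whatever the real algorithm looks like, the property that matters is the one this sketch preserves: the FALLBACK copy never lands on the same AMP as the base row, and it never leaves the cluster.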

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is groups of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in groups of two, every complex query will take twice as long to process while an AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share the extra work.
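
The trade-off can be put into numbers with a short Python sketch. It assumes the failed AMP's work is split evenly among the survivors in its cluster, which is a simplification; the function name load_per_surviving_amp is hypothetical.

```python
from fractions import Fraction

def load_per_surviving_amp(cluster_size):
    """Relative workload of each surviving AMP when one AMP in the cluster is down.

    1 means the normal workload; assumes the failed AMP's work is split evenly.
    """
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    return Fraction(cluster_size, cluster_size - 1)

for size in (2, 4, 16):
    load = load_per_surviving_amp(size)
    print(size, load, round(float(load), 3))
```

Under that assumption, a two-AMP cluster doubles the survivor's work, a four-AMP cluster raises it by a third, and a 16-AMP cluster raises it by only one fifteenth.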

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, including all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
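
Here is a minimal Python sketch of the "while you were sleeping" idea. It is a toy model under simplifying assumptions: one journal per cluster, only one AMP down at a time, and hypothetical names (ClusterDemo, fail, write, recover) that do not correspond to any Teradata interface.

```python
class ClusterDemo:
    """Toy cluster: each AMP holds rows; one journal covers a down AMP."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}   # AMP -> {key: value}
        self.down = None                            # id of the failed AMP
        self.darj = []                              # changes the down AMP missed

    def fail(self, amp):
        self.down = amp
        self.darj = []          # surviving AMPs open a fresh journal

    def write(self, amp, key, value):
        if amp == self.down:
            # The AMP is "sleeping": remember the change instead.
            self.darj.append((key, value))
        else:
            self.rows[amp][key] = value

    def recover(self):
        # Replay everything that happened while the AMP was down,
        # then discard the journal.
        for key, value in self.darj:
            self.rows[self.down][key] = value
        self.down = None
        self.darj = []


cluster = ClusterDemo([1, 2, 3, 4])
cluster.write(1, "Ben Hon", "v1")
cluster.fail(1)
cluster.write(1, "Ben Hon", "v2")     # missed by AMP 1, kept in the journal
cluster.recover()
print(cluster.rows[1])                 # {'Ben Hon': 'v2'}
```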

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a set of mirrored information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
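
The rule about which two disks may be lost can be checked with a few lines of Python. The layout is an assumption made for the sketch: a rank of four disks arranged as two mirrored pairs, with hypothetical disk names.

```python
# Assumed layout: a rank of four disks arranged as two mirrored pairs.
MIRROR_PAIRS = [("disk1", "disk2"), ("disk3", "disk4")]

def rank_survives(failed_disks, pairs=MIRROR_PAIRS):
    """True if every mirrored pair still has at least one working disk."""
    failed = set(failed_disks)
    return all(not (a in failed and b in failed) for a, b in pairs)

print(rank_survives(["disk1"]))            # True
print(rank_survives(["disk1", "disk3"]))   # True: different pairs
print(rank_survives(["disk1", "disk2"]))   # False: a disk and its mirror
```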

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disks via a different physical cable.
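
The migration can be sketched in Python as a simple reshuffle of who lives in which node's memory. The function name migrate_on_failure and the round-robin placement are assumptions for illustration; the two-node, 32-AMP layout comes from the example above.

```python
def migrate_on_failure(nodes, failed_node):
    """Move the failed node's vprocs to the surviving nodes of the clique.

    `nodes` maps node name -> list of vproc names (AMPs and PEs).
    Returns a new mapping; disk ownership is unchanged by design.
    """
    survivors = [n for n in nodes if n != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node left in the clique")
    result = {n: list(nodes[n]) for n in survivors}
    # Deal the displaced vprocs out round-robin across the survivors.
    for i, vproc in enumerate(nodes[failed_node]):
        result[survivors[i % len(survivors)]].append(vproc)
    return result

clique = {
    "node1": [f"AMP{i}" for i in range(1, 17)],
    "node2": [f"AMP{i}" for i in range(17, 33)],
}
after = migrate_on_failure(clique, "node1")
print(len(after["node2"]))     # 32
```

The surviving node ends up hosting all 32 AMPs, which is why its performance slows down until the failed node is repaired.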

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds." Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward and is usually used to restore data lost as a result of a hardware problem.
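
Both directions of journaling can be sketched in Python. This is a toy model, not the real journal format: images are assumed to be simple (key, row) pairs, a row of None stands for "no row," and the function names roll_forward and roll_back are hypothetical.

```python
def roll_forward(backup, after_images):
    """Rebuild a table from a backup plus AFTER images, oldest first.

    Each image is (key, row); a row of None records a delete.
    """
    table = dict(backup)
    for key, row in after_images:
        if row is None:
            table.pop(key, None)
        else:
            table[key] = row
    return table

def roll_back(table, before_images):
    """Undo changes by applying BEFORE images, newest first."""
    restored = dict(table)
    for key, row in reversed(before_images):
        if row is None:          # the row did not exist before the change
            restored.pop(key, None)
        else:
            restored[key] = row
    return restored

# Backup taken on the 1st, then changes journaled up to the failure on the 5th.
backup = {"rep1": "Western", "rep2": "Eastern"}
after_journal = [("rep3", "Central"), ("rep1", "Southwest")]
print(roll_forward(backup, after_journal))
# {'rep1': 'Southwest', 'rep2': 'Eastern', 'rep3': 'Central'}
```

The same shape works for the Customer Representative mix-up: calling roll_back on the wrongly updated table with the journaled BEFORE images puts every representative back in the original region.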

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
     BEFORE JOURNAL,
     DUAL AFTER JOURNAL
  (emp        INTEGER
  ,dept       INTEGER
  ,lname      CHAR(20)
  ,fname      VARCHAR(20)
  ,salary     DECIMAL(10,2)
  ,hire_date  DATE FORMAT '...'
  )
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
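
The four modes above can be summarized as a small compatibility table. The Python sketch below encodes only what the list describes, with one added assumption: that compatibility is symmetric, so a WRITE request can proceed while only an ACCESS lock is held. The names COMPATIBLE and can_grant are hypothetical.

```python
# Which requested lock can be granted while another lock is already held,
# as described in the list above (symmetry is an assumption of this sketch).
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}

def can_grant(requested, held):
    """True if `requested` may proceed while `held` is in place."""
    return held in COMPATIBLE[requested]

print(can_grant("READ", "READ"))         # True
print(can_grant("ACCESS", "WRITE"))      # True: a possible dirty read
print(can_grant("WRITE", "READ"))        # False
print(can_grant("ACCESS", "EXCLUSIVE"))  # False
```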

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired … it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the rows holding that common value are first removed from the other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
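
The employee and payroll example can be acted out in a few lines of Python. This is only a sketch of the two RI rules; the table contents, the RIError class, and the function names are all made up for the demo.

```python
class RIError(Exception):
    """Raised when a change would break referential integrity."""

employee = {1001, 1002}            # parent table: employee numbers
payroll = {1001: 52000.00}         # child table: employee number -> salary

def insert_payroll(emp, salary):
    # A child row must point at an existing parent row.
    if emp not in employee:
        raise RIError(f"employee {emp} does not exist")
    payroll[emp] = salary

def delete_employee(emp):
    # A parent row cannot go while a child row still references it.
    if emp in payroll:
        raise RIError(f"employee {emp} is still on the payroll")
    employee.discard(emp)

delete_employee(1002)              # fine: no payroll row references 1002
try:
    delete_employee(1001)          # blocked: payroll still references 1001
except RIError as err:
    print(err)                     # employee 1001 is still on the payroll
```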

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced bodybuilder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sandbags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
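
The three steps just described (hand out blocks, hash and redistribute, sort) can be sketched in Python. Several things are assumptions for the demo: CRC-32 stands in for the real row hash, a block is three rows instead of 64K, and the work is done in a single loop rather than in parallel. The names row_id and fastload are hypothetical and are not the utility's own commands.

```python
import zlib

def row_id(row, key="emp"):
    """Stand-in row hash (CRC-32 of the primary index); an assumption."""
    return zlib.crc32(str(row[key]).encode("utf-8"))

def fastload(rows, num_amps, block_rows=3):
    # Phase 1: cut the flat file into blocks and hand them out in turn.
    blocks = [rows[i:i + block_rows] for i in range(0, len(rows), block_rows)]
    received = [[] for _ in range(num_amps)]
    for i, block in enumerate(blocks):
        received[i % num_amps].extend(block)

    # Phase 2: every AMP hashes its rows and sends each to its owning AMP.
    owned = [[] for _ in range(num_amps)]
    for amp_rows in received:
        for row in amp_rows:
            owned[row_id(row) % num_amps].append(row)

    # Phase 3: each AMP sorts what it now owns by row ID.
    for amp_rows in owned:
        amp_rows.sort(key=row_id)
    return owned

rows = [{"emp": n, "dept": n % 3} for n in range(1, 21)]
amps = fastload(rows, num_amps=4)
print([len(a) for a in amps])
```

The point of the first phase is speed: blocks are dropped on whichever AMP is next, without looking inside them. Only in the second phase does each row find its proper home.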

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file,
• Only one table may be loaded at a time,
• The table to be loaded must be empty,
• There can be no secondary indexes, referential integrity, or triggers,
• It doesn't support Multi-set tables, and
• It locks at the table level.

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis.


Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance - only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


Teradata Databases, Users and Space

Overview

"Choose a job you like and you will never have to work a day of your life."

Confucius

When a Teradata system arrives at your doorstep, it has been carefully configured to provide adequate permanent disk space that will store, manage, and back up your company's data. All of the space that comes with the system belongs to the user called DBC. DBC loves its job because it is the top dog. Every Teradata system that was ever built has a user called DBC. The acronym is derived from the first Teradata machine, called the DBC/1012. DBC stands for Database Computer, and 1012 stands for 10 to the 12th power, or a Terabyte. There is no user with greater privileges than the DBC.

The DBC owns all permanent space in a Teradata system. It also contains system tables that hold information about the entire system. These system tables are known as the Data Dictionary/Directory (DD). The Data Dictionary acts like a Dictionary to users who want to look up system information, and as a Directory to the Parsing Engine (PE). The PE looks in the Directory for help with creating "The Plan." The Dictionary directs the PE on topics such as security, access rights, table columns, indexes, macros, views, etc.

So if your system comes with 100 Gigabytes of permanent disk space, then the DBC owns 100 Gigabytes of PERM space. Teradata is hierarchical in nature, so it is up to DBC to dole out space to other databases or users.

In the beginning, the DBC owns all PERM space. No space is unassigned. As DBC begins to give space to other users/databases, they take ownership. Keep in mind, all space is owned. Space never goes unaccounted for: if the space is not owned by DBC, then it's owned by someone under DBC.


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result, many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us: who is the owner of Tom? The answer is Morgan, SYSDBA, and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT or REVOKE rights from Tom.
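For readers who prefer code to diagrams, the "who owns Tom?" question can be answered by walking up the hierarchy. The Python sketch below is only an illustration: the `PARENT` table and the `owners` function are hypothetical names, and the hierarchy is the one described in the text (DBC, SYSDBA, MRKT, SALES, Morgan, Tom).

```python
# Parent of each user or database, following the hierarchy described in the text.
PARENT = {
    "DBC": None,
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """Return every parent (owner) above `name`, nearest first."""
    chain = []
    parent = PARENT[name]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


print(owners("Tom"))  # ['Morgan', 'SYSDBA', 'DBC']
```

Every name in the returned list is an owner of Tom, which is why each of them could GRANT or REVOKE rights on him.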


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA, likewise, gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

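Perm space is actually used to store real data, such as the rows of tables. (The definitions of views and macros are kept in the Data Dictionary, so creating them does not require perm space of your own.) If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.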
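The subtraction rule for perm space is simple bookkeeping, and a short Python sketch makes it concrete. This is a toy model, not how Teradata stores its accounting: the `Owner` class and `create_child` method are hypothetical, the 100 Gigabyte system and the 80% share for SYSDBA come from the text, and the amounts handed to Morgan and Tom are made-up figures.

```python
class Owner:
    """A user or database holding a perm space limit (toy model)."""

    def __init__(self, name, perm_gb=0.0, parent=None):
        self.name = name
        self.perm_gb = perm_gb
        self.parent = parent

    def create_child(self, name, perm_gb):
        # Giving perm space to a child subtracts the same amount from the giver.
        if perm_gb > self.perm_gb:
            raise ValueError(f"{self.name} has only {self.perm_gb} GB of perm space")
        self.perm_gb -= perm_gb
        return Owner(name, perm_gb, parent=self)


dbc = Owner("DBC", perm_gb=100.0)             # DBC starts out owning everything
sysdba = dbc.create_child("SYSDBA", 80.0)     # about 80% goes to SYSDBA
morgan = sysdba.create_child("Morgan", 10.0)  # assumed amount
tom = morgan.create_child("Tom", 4.0)         # assumed amount

everyone = [dbc, sysdba, morgan, tom]
total = sum(owner.perm_gb for owner in everyone)
for owner in everyone:
    print(owner.name, owner.perm_gb)
print("total", total)
```

However the space is handed down, the total never changes, which matches the rule that all space is owned by someone at all times.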

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
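The contrast with perm space shows up clearly in code. In the hedged Python sketch below, spool is treated as a ceiling rather than a budget: creating a child leaves the parent's limit untouched, and a query that needs more than the limit aborts. The class and method names are hypothetical, the megabyte figures are invented, and the check that a child's limit cannot exceed its parent's is an assumption drawn from the "up to 65 mph" analogy.

```python
class SpoolLimitExceeded(Exception):
    """Raised when a query needs more spool than the user's limit (toy model)."""


class User:
    def __init__(self, name, spool_limit_mb):
        self.name = name
        self.spool_limit_mb = spool_limit_mb

    def create_child(self, name, spool_limit_mb):
        # Assumption from the speed-limit analogy: a child may get up to the
        # parent's own limit, and the parent's limit is not reduced.
        if spool_limit_mb > self.spool_limit_mb:
            raise ValueError("child spool limit exceeds the parent's limit")
        return User(name, spool_limit_mb)

    def run_query(self, answer_set_mb):
        # The answer set is built in spool; too large an answer aborts the query.
        if answer_set_mb > self.spool_limit_mb:
            raise SpoolLimitExceeded(f"{self.name}: query aborted, out of spool")
        return f"{answer_set_mb} MB answer set returned, spool released"


morgan = User("Morgan", spool_limit_mb=500)
tom = morgan.create_child("Tom", spool_limit_mb=500)
print(morgan.spool_limit_mb)   # still 500 after giving Tom 500
print(tom.run_query(100))
```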

The following picture shows a logical view of a Customer table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area, there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. Plus, when the answer is returned, the spool is released.

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table(s), the data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp  Dept  Lname     Fname   Sal
1    10    Johnson   Manny   100000
2    20    Carlsbad  Jan     100000
22   30    Winter    Steve   77000
25   10    Lester    Bonnie  56000
33   10    Samuels   Todd    120000
99   20    Walter    Misha   104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp  Dept  Lname     Fname
1    10    Johnson   Manny
2    20    Carlsbad  Jan
22   30    Winter    Steve
25   10    Lester    Bonnie
33   10    Samuels   Todd
99   20    Walter    Misha
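If you do not have a Teradata system handy, the column-hiding effect of a view can be tried out with Python's built-in sqlite3 module. This is only a stand-in: SQLite is not Teradata, it has no GRANT or REVOKE, and so the sketch shows the filtered picture a view provides rather than the access-rights side of the story. The table contents are the six employees from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no employee rows are copied.
conn.execute("CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee")

cursor = conn.execute("SELECT * FROM Employ_V")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```

The salary column never appears in the output, even though the query asked for every column of the view.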


What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10?
• What employees are in department 20? and
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS
(
SELECT * FROM Employ_v WHERE dept = 10;
SELECT * FROM Employ_v WHERE dept = 20;
SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan


I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners or Parents have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the U.S. Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has Implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupt. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted back to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message
• The entire transaction is rolled back and any changes made to the database are reversed
• Locks are released, and
• Spool files are discarded

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before they cut a single strand? Then, after he or she cut your hair, asked if you liked it? If you didn't like it, then you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores it to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal." Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it, and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness." If it is clean, then things went well.
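The before-picture idea is easy to model. The Python sketch below is a toy illustration of a before-image journal on a single AMP, not Teradata's internal implementation: the `Amp` class, the `run_transaction` helper, and the way a failing statement is signalled are all hypothetical.

```python
class Amp:
    """Toy AMP holding rows by row id, with a before-image journal."""

    def __init__(self):
        self.rows = {}
        self.transient_journal = []   # list of (row_id, before_image)

    def write(self, row_id, new_value):
        # Take the BEFORE picture first; None means the row did not exist yet.
        self.transient_journal.append((row_id, self.rows.get(row_id)))
        if new_value is None:
            self.rows.pop(row_id, None)     # DELETE
        else:
            self.rows[row_id] = new_value   # INSERT or UPDATE

    def commit(self):
        # Success: the journal is wiped clean.
        self.transient_journal.clear()

    def rollback(self):
        # Failure: restore before-images, newest change first.
        for row_id, before in reversed(self.transient_journal):
            if before is None:
                self.rows.pop(row_id, None)
            else:
                self.rows[row_id] = before
        self.transient_journal.clear()


def run_transaction(amp, statements):
    """Apply (row_id, new_value, should_fail) statements as one transaction."""
    try:
        for row_id, new_value, should_fail in statements:
            if should_fail:
                raise RuntimeError("statement failed")
            amp.write(row_id, new_value)
    except RuntimeError:
        amp.rollback()
        return False
    amp.commit()
    return True


amp = Amp()
amp.rows = {1: 50000}
ok = run_transaction(amp, [(1, 100000, False), (2, 60000, False), (3, 1, True)])
print(ok, amp.rows)   # the failed transaction leaves the original row untouched
```

Either every statement in the list takes effect, or the rows look exactly as they did before the transaction started.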

FALLBACK Protection

"I asked my dentist if I had to floss all my teeth, and he responded, 'No, just the ones you want to keep.'"

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight matching FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
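The relationship between the two hash maps can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than Teradata's real layout: the number of buckets, the MD5-based hash, the round-robin primary map, and the rule used to pick a fallback AMP are hypothetical. The only properties taken from the text are that the fallback copy lives on a different AMP in the same cluster and that fallback rows are spread across the cluster.

```python
import hashlib

NUM_AMPS = 8
AMPS_PER_CLUSTER = 4
NUM_BUCKETS = 64   # assumed toy size for the hash map


def bucket_of(primary_index_value):
    """Toy hash of the primary index value to a hash-map bucket."""
    digest = hashlib.md5(str(primary_index_value).encode()).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


# Primary hash map: bucket -> AMP that owns the base row.
PRIMARY_MAP = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]


def fallback_amp(bucket):
    """Pick another AMP in the same cluster as the primary AMP."""
    primary = PRIMARY_MAP[bucket]
    cluster_start = (primary // AMPS_PER_CLUSTER) * AMPS_PER_CLUSTER
    # Offset of 1..3 rotates with the bucket so fallback rows are spread out.
    offset = 1 + (bucket // NUM_AMPS) % (AMPS_PER_CLUSTER - 1)
    return cluster_start + (primary - cluster_start + offset) % AMPS_PER_CLUSTER


# Fallback hash map: bucket -> AMP that hosts the FALLBACK copy.
FALLBACK_MAP = [fallback_amp(bucket) for bucket in range(NUM_BUCKETS)]


def readable(primary_index_value, down_amps):
    """A row is reachable if its base copy or its fallback copy is on a live AMP."""
    bucket = bucket_of(primary_index_value)
    return PRIMARY_MAP[bucket] not in down_amps or FALLBACK_MAP[bucket] not in down_amps


friends = ["Ben Hon", "Don Roy", "Sue Lou", "Al Jon"]
print(all(readable(name, {0, 5}) for name in friends))  # one AMP down per cluster
```

Because a row's two copies always sit in the same cluster, losing one AMP in each cluster can never take out both copies of any row.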

In most systems, AMPs are clustered in a group of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while one AMP is down.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)
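The trade-off between small and large clusters can be put into rough numbers. The Python sketch below rests on simplifying assumptions that are not from the book: work is shared evenly among the surviving AMPs of a cluster, exactly two AMPs fail, and the two failures are equally likely to be any pair of AMPs in a hypothetical 16-AMP system.

```python
def workload_factor(cluster_size):
    """Work per surviving AMP, relative to normal, when one AMP in the cluster is down."""
    return cluster_size / (cluster_size - 1)


def same_cluster_probability(cluster_size, total_amps):
    """Chance that two randomly failed AMPs sit in the same cluster."""
    return (cluster_size - 1) / (total_amps - 1)


TOTAL_AMPS = 16   # assumed system size so that every cluster size divides evenly
for size in (2, 4, 16):
    factor = workload_factor(size)
    risk = same_cluster_probability(size, TOTAL_AMPS)
    print(f"cluster of {size:2d}: {factor:.2f}x work per survivor, "
          f"{risk:.0%} chance a double failure hits one cluster")
```

Under these assumptions a cluster of two doubles the survivor's work but is rarely hit twice, a cluster of 16 barely notices one failure but cannot survive any second one, and a cluster of four sits in between on both measures.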


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are tracked until it returns (see the Down AMP Recovery Journal section).

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.
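The catch-up cycle can be modeled in a few lines of Python. This is a hedged sketch and not Teradata's mechanism: the `ClusterAmp` class and both functions are hypothetical, and the rule for choosing which surviving AMP records a given change (row id modulo the number of survivors) is an assumption made only so the example has something concrete to do.

```python
class ClusterAmp:
    """Toy AMP in a cluster, with a journal kept on behalf of down AMPs."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.rows = {}
        self.darj = []   # entries of (down_amp_name, row_id, value)


def apply_change(cluster, target, row_id, value):
    """Write a row to `target`, or journal the change if `target` is down."""
    if target.online:
        target.rows[row_id] = value
        return
    survivors = [amp for amp in cluster if amp.online]
    if not survivors:
        raise RuntimeError("no AMP in the cluster is online")
    # Assumption: the same surviving AMP always records a given row's changes.
    keeper = survivors[row_id % len(survivors)]
    keeper.darj.append((target.name, row_id, value))


def recover(cluster, target):
    """Bring `target` back online, replay its missed changes, discard the journal."""
    target.online = True
    for amp in cluster:
        remaining = []
        for name, row_id, value in amp.darj:
            if name == target.name:
                target.rows[row_id] = value
            else:
                remaining.append((name, row_id, value))
        amp.darj = remaining


cluster = [ClusterAmp(f"AMP{number}") for number in range(1, 5)]
amp1 = cluster[0]
amp1.online = False                       # AMP one goes down
apply_change(cluster, amp1, 7, "Ben Hon")
apply_change(cluster, amp1, 8, "Don Roy")
pending_before = sum(len(amp.darj) for amp in cluster)
recover(cluster, amp1)                    # everything it missed is replayed
print(pending_before, amp1.rows)
```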


In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it is still fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.
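To make the mirroring behavior concrete, here is a minimal Python sketch of one primary disk and its mirror. It is a conceptual model only: the `MirroredPair` class and its method names are invented for illustration and say nothing about how a real Disk Array Controller is implemented.

```python
class MirroredPair:
    """Toy RAID-1 pair: every write lands on both healthy disks."""
    def __init__(self):
        self.disks = {"primary": {}, "mirror": {}}
        self.healthy = {"primary": True, "mirror": True}

    def write(self, block, data):
        for name in self.disks:
            if self.healthy[name]:
                self.disks[name][block] = data

    def read(self, block, prefer="primary"):
        # Either copy can serve a read; fall back to the other if one is down.
        other = "mirror" if prefer == "primary" else "primary"
        for name in (prefer, other):
            if self.healthy[name]:
                return self.disks[name][block]
        raise RuntimeError("both copies are unavailable")

    def fail(self, name):
        self.healthy[name] = False
        self.disks[name] = {}

    def replace(self, name):
        # A replacement disk is rebuilt by copying from the surviving disk.
        other = "mirror" if name == "primary" else "primary"
        self.disks[name] = dict(self.disks[other])
        self.healthy[name] = True

pair = MirroredPair()
pair.write(0, "row A")
pair.fail("primary")
value_during_outage = pair.read(0)  # still served, from the mirror
pair.write(1, "row B")              # operations continue
pair.replace("primary")
```

Once the replacement is rebuilt, both disks hold identical contents again, including the block written during the outage.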

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. We can still keep the shared nothing environment and have four physical disks, because only the AMP that owns the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in the surviving node's memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, and they are accessing their own virtual disks via a different physical cable.
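The migration can be pictured with a short Python sketch. It is a simplified model under stated assumptions: the `migrate` function, the round-robin placement across survivors and the `vdisk_of` mapping are illustrative inventions, not a description of Teradata's actual restart logic.

```python
def migrate(nodes, failed):
    """Move the failed node's vprocs onto the surviving nodes in the clique."""
    survivors = [name for name in nodes if name != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    after = {name: list(nodes[name]) for name in survivors}
    for i, vproc in enumerate(nodes[failed]):
        after[survivors[i % len(survivors)]].append(vproc)
    return after

# The two-node, 32-AMP system described above.
nodes = {
    "node1": [f"AMP{i}" for i in range(1, 17)],
    "node2": [f"AMP{i}" for i in range(17, 33)],
}
# Each AMP keeps its own virtual disk no matter which node hosts it.
vdisk_of = {amp: f"vdisk{amp[3:]}" for amps in nodes.values() for amp in amps}

after = migrate(nodes, "node1")
```

Node two ends up hosting all 32 AMPs, which is why performance drops, while `vdisk_of` is untouched: AMP 16 still owns the same virtual disk it had before the failure.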

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, and it is a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE or DELETE command. It keeps track of all new, deleted or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southwest Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
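Both recovery directions can be sketched in Python. This is a toy illustration, not Teradata's journal format: the function names, the dictionary standing in for a table and the representative names are hypothetical, and deletes are left out to keep the model small.

```python
def apply_change(table, key, new_value, before_journal, after_journal):
    """Record the before image, make the change, record the after image."""
    before_journal.append((key, table.get(key)))  # None means the row is new
    table[key] = new_value
    after_journal.append((key, new_value))

def roll_back(table, before_journal):
    """Undo changes, newest first, using the before images."""
    for key, old_value in reversed(before_journal):
        if old_value is None:
            table.pop(key, None)
        else:
            table[key] = old_value

def roll_forward(backup, after_journal):
    """Start from a backup and reapply the after images, oldest first."""
    restored = dict(backup)
    for key, new_value in after_journal:
        restored[key] = new_value
    return restored

regions = {"rep1": "Western", "rep2": "Eastern"}
backup = dict(regions)                      # the "full system backup"
before, after = [], []
for rep in list(regions):                   # the faulty update touches every row
    apply_change(regions, rep, "Southwest", before, after)
apply_change(regions, "rep3", "Western", before, after)  # a newly added row

live_copy = dict(regions)
rebuilt = roll_forward(backup, after)       # hardware-failure recovery
roll_back(regions, before)                  # programming-error recovery
```

Rolling forward from the backup reproduces the live table exactly, while rolling back returns the table to its state before the faulty update.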

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

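CREATE TABLE TomC.employee, FALLBACK, BEFORE JOURNAL, DUAL AFTER JOURNAL
    (emp       INTEGER,
     dept      INTEGER,
     lname     CHAR(20),
     fname     VARCHAR(20),
     salary    DECIMAL(10,2),
     hire_date DATE FORMAT 'YYYY-MM-DD')  /* format string lost in this copy; 'YYYY-MM-DD' is an assumed placeholder */
UNIQUE PRIMARY INDEX (emp);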

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables, and it restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
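The four rules above boil down to a small compatibility table. The Python sketch below encodes it exactly as the list describes; the `COMPATIBLE` dictionary and the `can_grant` helper are hypothetical names, and real lock queuing is more involved than this yes/no check.

```python
# For each lock already held, the set of lock requests that may proceed alongside it.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},
    "READ": {"ACCESS", "READ"},
    "WRITE": {"ACCESS"},
    "EXCLUSIVE": set(),
}

def can_grant(requested, held):
    """True if the requested lock is compatible with every lock currently held."""
    return all(requested in COMPATIBLE[lock] for lock in held)

many_readers = can_grant("READ", ["READ", "READ", "READ"])  # readers share
writer_vs_reader = can_grant("WRITE", ["READ"])             # writer must wait
dirty_read = can_grant("ACCESS", ["WRITE"])                 # allowed, may be stale
access_vs_exclusive = can_grant("ACCESS", ["EXCLUSIVE"])    # nothing gets past this
```

Reading the table row by row reproduces each numbered rule: only ACCESS gets past a WRITE, and nothing gets past an EXCLUSIVE.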

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood." "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the matching value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table that already contains RI violations, the ALTER TABLE will still proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
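The employee and payroll story can be turned into a tiny Python model of the two RI checks. Everything here is illustrative: the function names, the exception class and the sample numbers are assumptions, and a real database enforces these rules through declared constraints rather than application code.

```python
class ReferentialIntegrityError(Exception):
    pass

def insert_payroll(employee_ids, payroll, emp, salary):
    """A payroll row may only reference an employee that exists."""
    if emp not in employee_ids:
        raise ReferentialIntegrityError(f"employee {emp} does not exist")
    payroll[emp] = salary

def delete_employee(employee_ids, payroll, emp):
    """An employee may only be deleted once no payroll row references it."""
    if emp in payroll:
        raise ReferentialIntegrityError(f"employee {emp} is still on the payroll")
    employee_ids.discard(emp)

employee_ids = {1, 2}
payroll = {}
insert_payroll(employee_ids, payroll, 1, 100000)

outcomes = []
for action in (
    lambda: insert_payroll(employee_ids, payroll, 99, 50000),  # no such employee
    lambda: delete_employee(employee_ids, payroll, 1),         # still being paid
):
    try:
        action()
        outcomes.append("allowed")
    except ReferentialIntegrityError:
        outcomes.append("rejected")

del payroll[1]                               # remove the payroll row first
delete_employee(employee_ids, payroll, 1)    # now the delete goes through
```

Both risky operations are rejected, and the fired employee can only leave the employee table after the payroll row is gone.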


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, "ASCII" stupid question and get a stupid "ANSI".

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel) and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
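The three steps just described (deal out blocks, redistribute by hash, sort by row ID) can be sketched in Python. Treat this as a rough model only: `zlib.crc32` is a stand-in for Teradata's real hashing algorithm, the block size is counted in rows rather than bytes, and the function names are invented for the example.

```python
import zlib

def row_hash(primary_index_value):
    # Stand-in hash (an assumption); Teradata uses its own hashing algorithm.
    return zlib.crc32(str(primary_index_value).encode())

def amp_for(primary_index_value, n_amps):
    return row_hash(primary_index_value) % n_amps

def fastload(rows, n_amps, block_size=3):
    # Step 1: hand blocks of rows to the AMPs in turn, ignoring ownership.
    received = [[] for _ in range(n_amps)]
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    for i, block in enumerate(blocks):
        received[i % n_amps].extend(block)
    # Step 2: every AMP hashes the rows it received and forwards each row
    # to the AMP that owns it.
    owned = [[] for _ in range(n_amps)]
    for amp_rows in received:
        for row in amp_rows:
            owned[amp_for(row[0], n_amps)].append(row)
    # Step 3: each AMP sorts its own rows by (hash, key), a stand-in for Row ID.
    for amp_rows in owned:
        amp_rows.sort(key=lambda row: (row_hash(row[0]), row[0]))
    return owned

rows = [(emp, f"name{emp}") for emp in range(1, 21)]  # (primary index, payload)
owned = fastload(rows, n_amps=4)
```

No rows are lost or duplicated along the way, and each row ends up on the AMP its hash points to, which is what lets later queries find it without scanning every AMP.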

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually, in parallel, as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse "rush hour". It can also be automatically preset to load different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


Logical Picture of System Space

The DBC is now ready to distribute space, but because DBC is so powerful, this can be dangerous. What if the DBC user forgets the password? What if a disgruntled employee knows the DBC password and is looking for revenge? The DBC password must be protected, and as a result many companies create a new user called SYSDBA. This user owns about 80% of the space, while the DBC owns the remaining 20% that is allocated for the Data Dictionary and the Transient Journal (see the Data Protection chapter). The DBC password can then be locked in a safe, and it is now up to the SYSDBA to distribute space.

Logical Picture of System Space


The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, and he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram, and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from him.
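Finding every owner is just a walk up the family tree. The short Python sketch below uses the names from this chapter's pictures; the `PARENT` dictionary and the `owners` function are illustrative, not a real Data Dictionary query.

```python
# Each user or database points at its immediate parent; DBC sits at the top.
PARENT = {
    "DBC": None,
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}

def owners(name):
    """Return every parent above `name`, nearest first."""
    chain = []
    parent = PARENT[name]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain

toms_owners = owners("Tom")
```

The walk yields Morgan, then SYSDBA, then DBC, which matches the answer given above.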


Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals (see protection features).

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, the AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts and the user is told that he or she is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is used to store real data: tables, secondary index sub-tables and permanent journals. (The definitions of views and macros are kept in the Data Dictionary instead.) If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
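The difference between giving away perm and giving away spool is easy to see in a small Python model. The numbers are made-up units, and the `Owner` class is a hypothetical illustration of the bookkeeping described here, not of how Teradata tracks space.

```python
class Owner:
    """Toy user or database with a perm allocation and a spool limit."""
    def __init__(self, name, perm, spool):
        self.name = name
        self.perm = perm
        self.spool = spool

    def create_child(self, name, perm, spool):
        if perm > self.perm:
            raise ValueError("not enough perm space to give away")
        self.perm -= perm                 # perm is handed over, so the parent loses it
        return Owner(name, perm, spool)   # spool is only a limit; nothing is subtracted

sysdba = Owner("SYSDBA", perm=80, spool=50)
morgan = sysdba.create_child("Morgan", perm=20, spool=50)
tom = morgan.create_child("Tom", perm=5, spool=50)
query_only = sysdba.create_child("Analyst", perm=0, spool=50)  # runs queries, owns no tables
```

After these calls SYSDBA is down to 60 units of perm and Morgan to 15, yet everyone, including the hypothetical query-only user, has the full 50-unit spool limit.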

The following picture shows a logical view of a Customer Table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. Plus, when the answer is returned, the spool is released.
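That read, filter, spool, return and release cycle looks like this as a Python sketch. The rows are borrowed from the Employee table printed in the View section of this book; the `run_query` function and the row-count spool limit are assumptions made for illustration (real spool limits are measured in bytes).

```python
EMPLOYEE = [  # (Emp, Dept, Lname, Fname, Sal)
    (1, 10, "Johnson", "Manny", 100000),
    (2, 20, "Carlsbad", "Jan", 100000),
    (22, 30, "Winter", "Steve", 77000),
    (25, 10, "Lester", "Bonnie", 56000),
    (33, 10, "Samuels", "Todd", 120000),
    (99, 20, "Walter", "Misha", 104000),
]

def run_query(table, predicate, spool_limit):
    spool = []                               # the answer set is built in spool
    for row in table:
        if predicate(row):
            if len(spool) >= spool_limit:
                raise RuntimeError("out of spool space")  # the query aborts
            spool.append(row)
    answer = list(spool)                     # deliver the answer to the user
    spool.clear()                            # then release the spool
    return answer

dept_10 = run_query(EMPLOYEE, lambda row: row[1] == 10, spool_limit=10)

try:
    run_query(EMPLOYEE, lambda row: row[1] == 10, spool_limit=2)
    aborted = False
except RuntimeError:
    aborted = True
```

With a generous limit the three department 10 rows come back; with a limit of two rows the same query aborts, just as a query does when it exceeds its spool limit.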

What is a View

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window, because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.


Another benefit of views is that only their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the view's data is not stored separately on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname    Sal
1     10     Johnson    Manny    100000
2     20     Carlsbad   Jan      100000
22    30     Winter     Steve    77000
25    10     Lester     Bonnie   56000
33    10     Samuels    Todd     120000
99    20     Walter     Misha    104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM   EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha
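One way to picture why a view takes no extra space is to model it as a stored definition that is re-evaluated against the base table each time it is selected. The Python sketch below does just that; the `employ_v` function is an illustrative stand-in for the view, not how Teradata resolves views internally.

```python
COLUMNS = ("Emp", "Dept", "Lname", "Fname", "Sal")
ROWS = [
    (1, 10, "Johnson", "Manny", 100000),
    (2, 20, "Carlsbad", "Jan", 100000),
    (22, 30, "Winter", "Steve", 77000),
    (25, 10, "Lester", "Bonnie", 56000),
    (33, 10, "Samuels", "Todd", 120000),
    (99, 20, "Walter", "Misha", 104000),
]
EMPLOYEE = [dict(zip(COLUMNS, row)) for row in ROWS]

VIEW_COLUMNS = ("Emp", "Dept", "Lname", "Fname")  # the definition: no Sal

def employ_v():
    """Evaluate the view: project the allowed columns from the base table."""
    return [{column: row[column] for column in VIEW_COLUMNS} for row in EMPLOYEE]

answer = employ_v()
```

The answer has the same six rows as the base table, but no row carries a `Sal` value, and nothing was copied until the moment the view was selected.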


What is a Macro

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10
• What employees are in department 20, and
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
    SELECT * FROM Employ_v WHERE dept = 10;
    SELECT * FROM Employ_v WHERE dept = 20;
    SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
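The "either they all work or none of them work" rule is the interesting part of a macro. The Python sketch below illustrates it with a hypothetical macro that changes data, since the three SELECTs above change nothing. The `execute_macro` function and its copy-then-commit approach are assumptions made for illustration, not Teradata's transaction mechanism.

```python
def execute_macro(table, statements):
    """Run every statement against a working copy; commit only if all succeed."""
    working = [dict(row) for row in table]
    results = [statement(working) for statement in statements]  # any error aborts here
    table[:] = working                                           # commit
    return results

def raise_dept_10(rows):
    for row in rows:
        if row["Dept"] == 10:
            row["Sal"] += 1000
    return "raise applied"

def broken_step(rows):
    raise ValueError("simulated failure")

employees = [
    {"Emp": 1, "Dept": 10, "Sal": 100000},
    {"Emp": 2, "Dept": 20, "Sal": 100000},
]

try:
    execute_macro(employees, [raise_dept_10, broken_step])
except ValueError:
    pass
after_failure = [row["Sal"] for row in employees]

execute_macro(employees, [raise_dept_10])
after_success = [row["Sal"] for row in employees]
```

When the second step fails, the raise from the first step never reaches the table; when every step succeeds, the change is committed.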

Here is a handy reference chart that compares views with macros:

Views                                          Macros
---------------------------------------------  ---------------------------------------------
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE
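As the chart notes, both objects are changed with the keyword REPLACE. Here is a minimal sketch, assuming the Emp_mac macro and Employ_V view from this section already exist and that the view's base table is named TomC.Employee (an assumption for illustration).

```sql
-- Redefine the macro so that every report comes back sorted by last name.
REPLACE MACRO Emp_mac AS (
  SELECT * FROM Employ_V WHERE dept = 10 ORDER BY lname;
  SELECT * FROM Employ_V WHERE dept = 20 ORDER BY lname;
  SELECT * FROM Employ_V ORDER BY lname;
);

-- Redefine the view so it exposes only departments 10, 20, and 30.
-- TomC.Employee is an assumed base table name.
REPLACE VIEW Employ_V AS
SELECT emp, dept, lname, fname
FROM   TomC.Employee
WHERE  dept IN (10, 20, 30);

-- The macro is still run the same way.
EXECUTE Emp_mac;
```

REPLACE swaps in the new definition in the Data Dictionary, so there is no need to drop and re-create the object first.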

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database even though Mary works in the marketing (MRKT) department.
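An explicit right like the one in that example is handed out with GRANT and taken back with REVOKE. The sketch below assumes Tom's database is named TomC and that Mary is an existing user; both names are used here only for illustration.

```sql
-- Tom explicitly gives Mary the right to create tables in his database.
GRANT CREATE TABLE ON TomC TO Mary;

-- Rights can be narrower: here Mary may only read one table.
-- TomC.Employee is an assumed table name.
GRANT SELECT ON TomC.Employee TO Mary;

-- What was granted explicitly can be revoked the same way.
REVOKE CREATE TABLE ON TomC FROM Mary;
```

Every one of these statements is itself checked by the Parsing Engine, so the person issuing the GRANT must hold the right to pass that privilege along.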

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the U.S. Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has explicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract: it is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or disk goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupt. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message;
• The entire transaction is rolled back and any changes made to the database are reversed;
• Locks are released; and
• Spool files are discarded.

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she would ask if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, "Transient Journal cleanliness is next to Godliness." If it is clean, then things went well.
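The all-or-nothing behavior described in this section can be seen with an explicit transaction. The sketch below assumes a session running in Teradata mode and a table named TomC.Employee with emp, dept, and salary columns (the table and the values are assumptions for illustration). BT begins the transaction and ET ends it.

```sql
BT;   -- BEGIN TRANSACTION: from here on, before-images go to the Transient Journal

-- Move employee 25 to department 20.
UPDATE TomC.Employee
SET    dept = 20
WHERE  emp  = 25;

-- Give department 20 the intended 10% raise.
UPDATE TomC.Employee
SET    salary = salary * 1.10
WHERE  dept   = 20;

ET;   -- END TRANSACTION: the COMMIT, after which the before-images are discarded
```

If either UPDATE fails before the ET is reached, both changes are rolled back from the before-images, the locks are released, and the user receives a failure message.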

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2,000 AMPs. The chance of losing two AMPs in a 2,000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. The output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
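Teradata SQL includes hashing functions that let you look at both maps for yourself. The query below is a sketch, assuming a FALLBACK-protected table named TomC.Employee whose Primary Index column is emp (both are assumptions for illustration). HASHROW computes the row hash, HASHBUCKET finds the bucket in the hash map, HASHAMP returns the AMP that owns the base row, and HASHBAKAMP returns the AMP that hosts the FALLBACK copy.

```sql
-- For each pair of AMPs, count how many rows have their base copy on
-- primary_amp and their FALLBACK copy on fallback_amp.
SELECT HASHAMP(HASHBUCKET(HASHROW(emp)))    AS primary_amp,
       HASHBAKAMP(HASHBUCKET(HASHROW(emp))) AS fallback_amp,
       COUNT(*)                             AS row_count
FROM   TomC.Employee
GROUP  BY 1, 2
ORDER  BY 1, 2;
```

On a clustered system, the fallback_amp for a row should always be a different AMP in the same cluster as its primary_amp, and the row counts show how evenly the Primary Index spreads the data.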

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is groups of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in groups of two, a complex query can take twice as long to process while an AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)
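Adding or dropping FALLBACK on an existing table takes a single statement. The sketch below assumes a table named TomC.Employee (the name is an assumption for illustration).

```sql
-- Add FALLBACK protection: Teradata builds a second copy of every row
-- on another AMP in the same cluster.
ALTER TABLE TomC.Employee, FALLBACK;

-- Drop FALLBACK protection: the second copies are removed and the
-- disk space they used is returned.
ALTER TABLE TomC.Employee, NO FALLBACK;
```

Because adding FALLBACK has to create a duplicate of every existing row, expect it to take longer on a large table than dropping it does.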

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, with all subsequent changes to that AMP's rows made to their FALLBACK copies.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ catches the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called "transparent mirroring." Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disks via a different physical cable.

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds." Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica – a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward and is usually used to restore data lost as a result of a hardware problem.

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
   BEFORE JOURNAL,
   DUAL AFTER JOURNAL
  (emp        INTEGER,
   dept       INTEGER,
   lname      CHAR(20),
   fname      VARCHAR(20),
   salary     DECIMAL(10,2),
   hire_date  DATE FORMAT)
UNIQUE PRIMARY INDEX (emp);

The example above creates the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO – meaning if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK – one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
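Teradata chooses the lock for you, but a request can ask for a different one with the LOCKING modifier placed in front of the statement. The sketch below assumes a table named TomC.Employee with a dept column (the name is an assumption for illustration).

```sql
-- Default behavior: this SELECT places a READ LOCK and waits in line
-- behind any WRITE LOCK already on the table.
SELECT dept, COUNT(*) AS emp_count
FROM   TomC.Employee
GROUP  BY dept;

-- Same query with an ACCESS LOCK: it does not wait for writers,
-- so the counts may reflect a "dirty read".
LOCKING TABLE TomC.Employee FOR ACCESS
SELECT dept, COUNT(*) AS emp_count
FROM   TomC.Employee
GROUP  BY dept;
```

The ACCESS version suits reports where an approximate answer right now is worth more than an exact answer later.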

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30!" Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is added to an existing table that already contains RI violations, the ALTER TABLE will still proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
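The employee and payroll story can be sketched with a FOREIGN KEY. The example assumes a parent table named TomC.Employee whose emp column is its unique primary index, and it invents a TomC.Payroll table with made-up columns and values; every name and value here is an assumption for illustration.

```sql
-- Child table: every payroll row must point at an existing employee.
CREATE TABLE TomC.Payroll
  (emp        INTEGER NOT NULL,
   pay_period DATE,
   gross_pay  DECIMAL(10,2),
   FOREIGN KEY (emp) REFERENCES TomC.Employee (emp))
PRIMARY INDEX (emp);

-- Rejected by RI if no employee 999 exists in TomC.Employee.
INSERT INTO TomC.Payroll (emp, pay_period, gross_pay)
VALUES (999, DATE '2001-01-31', 1500.00);

-- Rejected by RI while any TomC.Payroll row still references employee 25.
DELETE FROM TomC.Employee
WHERE  emp = 25;

-- The order RI insists on: remove the payroll rows first, then the employee.
DELETE FROM TomC.Payroll  WHERE emp = 25;
DELETE FROM TomC.Employee WHERE emp = 25;
```

With the constraint in place, the "fired but still on the payroll" oversight cannot happen, because the database refuses the first delete until the payroll rows are gone.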

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that – just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table Thisis how a Teradata table is populated the first time I have personally seen Teradata load over one billion largerows in less than 6 hours Plus I have seen Teradata load millions of rows in minutes Teradata has the quickesttime to solution and has the most powerful performance in the data warehousing industry How is Teradatasspeed and performance accomplished Its done through parallel processing

Fastload understands one SQL command - INSERT It inserts rows into an empty table The process is asfollows A flat file is prepared for loading on a mainframe or LAN The FASTLOAD utility needs three piecesof information to process where the flat file located what is its file definition and what table the data should beloaded into in Teradata

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel) and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
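The hash-and-sort steps above can be pictured with a small Python sketch. It is only a toy model: the four-AMP system size, the SHA-256 stand-in hash and the sample rows are assumptions made for illustration, not Teradata's actual hashing algorithm or hash map.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size


def row_hash(primary_index_value):
    """Stand-in hash (SHA-256 prefix); NOT Teradata's real hashing algorithm."""
    digest = hashlib.sha256(str(primary_index_value).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def fastload(rows, num_amps=NUM_AMPS):
    """Send each row to the AMP chosen by its hash, then sort every AMP by row hash."""
    amps = {amp: [] for amp in range(num_amps)}
    for row in rows:
        h = row_hash(row[0])                    # row[0] plays the role of the primary index
        amps[h % num_amps].append((h, row))     # "send the row to the proper AMP"
    for stored in amps.values():
        stored.sort(key=lambda pair: pair[0])   # the final sort step on each AMP
    return amps


rows = [(emp, f"employee-{emp}") for emp in range(1, 1001)]  # made-up sample rows
amps = fastload(rows)
for amp, stored in amps.items():
    print(f"AMP {amp}: {len(stored)} rows")
```

Because the AMP is chosen purely from the hash of the row's key, the same key always lands on the same AMP, and every AMP can sort its own share without talking to the others.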

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind that many systems have hundreds to thousands of AMPs. The load takes place continually, in parallel, as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to load immediately into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis.

Suddenly, companies began to load data weekly. Today most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

You don't drown by falling into the water; you drown by staying in the water.

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be preset to load at different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion: A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

The SYSDBA now owns 80% of the system space. The user does NOT have to be called SYSDBA. It could be called Morgan or Tom or anything. SYSDBA, however, is a standard name that most systems utilize.

As you can see in the following picture, the DBC still owns about 20% of the total space. The user SYSDBA has given some space to a database called MRKT and to another one called SALES. It has also given space to a user called Morgan.

NOTE: Morgan has given some of his space to Tom. Therefore, both Morgan and Tom can now own tables.

Logical Picture of System Space

Remember, either a database or a user can own space. What's the difference between a database and a user? That topic follows.

Databases and Users

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password, so he or she can log on and submit SQL requests.

Both a database and a user can own perm space; therefore, both can actually own tables.

When we stated that relational databases are much like an extended family, we were not kidding. Below is a diagram showing a hierarchy of space ownership in Teradata. Any user or database sitting anywhere above you in the hierarchy is referred to as your parent or owner. Any object below you is a child. Your extended family will grow as you add users and databases.

Take a look at the following diagram and then tell us who is the owner of Tom. The answer is Morgan, SYSDBA and DBC. Each of these is listed above Tom in the hierarchy, so each is a parent or owner. With this hierarchy in effect, parents (or owners) have the ability to GRANT rights to Tom or REVOKE rights from him.
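For readers who think in code, here is a minimal Python sketch of the "who owns Tom?" question. The parent links are taken from the description above (DBC at the top, SYSDBA below it, then MRKT, SALES and Morgan, with Tom under Morgan); the dictionary and the function name are just illustrative choices.

```python
# Immediate parent of each object, as described in the text. DBC has no parent.
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "SALES": "SYSDBA",
    "Morgan": "SYSDBA",
    "Tom": "Morgan",
}


def owners(name):
    """Walk up the hierarchy and collect every parent (owner) of `name`."""
    result = []
    while name in PARENT:
        name = PARENT[name]
        result.append(name)
    return result


print(owners("Tom"))   # ['Morgan', 'SYSDBA', 'DBC']
print(owners("MRKT"))  # ['SYSDBA', 'DBC']
```

Everything returned by owners() sits above the object in the hierarchy, which is exactly the set of parents that can GRANT or REVOKE rights on it.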

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space, and
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables and permanent journals. (See protection features.)

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Volatile Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA, likewise, gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is actually used to store real data such as tables, views and macros. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
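The difference between giving away perm and giving away spool can be sketched in a few lines of Python. This is a toy bookkeeping model, not Teradata code: the Owner class, the numbers, and the rule that a child's spool limit may not exceed its parent's (read off the speed limit analogy) are all assumptions for illustration.

```python
class Owner:
    """A user or database with a perm allotment and a spool limit (arbitrary units)."""

    def __init__(self, name, perm, spool):
        self.name, self.perm, self.spool = name, perm, spool

    def create_child(self, name, perm, spool):
        if perm > self.perm:
            raise ValueError("not enough perm space to give away")
        if spool > self.spool:
            # Assumption drawn from the speed-limit analogy in the text.
            raise ValueError("child spool limit cannot exceed the parent's limit")
        self.perm -= perm                 # perm given away is subtracted from the giver
        return Owner(name, perm, spool)   # spool is only a limit; the giver keeps its own


sysdba = Owner("SYSDBA", perm=80, spool=65)
morgan = sysdba.create_child("Morgan", perm=10, spool=65)
query_only = sysdba.create_child("ReportUser", perm=0, spool=65)  # hypothetical query-only user

print(sysdba.perm, sysdba.spool)   # 70 65
print(morgan.perm, morgan.spool)   # 10 65
```

Notice that SYSDBA's perm shrinks with every gift, while its spool limit is untouched no matter how many children receive the same limit.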

The following picture shows a logical view of a Customer Table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. Then, when the answer is returned, the spool is released.
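The life cycle of spool during such a query can be sketched in Python. The four sample rows, the row-count spool limit and the OutOfSpool exception are made up for illustration; real spool limits are measured in bytes, not rows.

```python
class OutOfSpool(Exception):
    """Raised when a query needs more spool than the user's limit allows."""


def run_query(table, predicate, spool_limit_rows):
    spool = []                            # spool starts empty for the query
    for row in table:                     # the AMPs read the rows of the table
        if predicate(row):
            if len(spool) >= spool_limit_rows:
                raise OutOfSpool("query aborted: spool limit exceeded")
            spool.append(row)             # each qualifying row is added to spool
    answer = list(spool)                  # the answer set is delivered to the user
    spool.clear()                         # then the spool is released
    return answer


# Hypothetical rows: (Emp, Dept, Lname, Fname, Sal)
employee = [
    (1, 10, "Jones", "Ann", 50000),
    (2, 20, "Smith", "Bob", 60000),
    (3, 10, "Brown", "Cal", 55000),
    (4, 30, "Green", "Dee", 65000),
]

dept_10 = run_query(employee, lambda row: row[1] == 10, spool_limit_rows=100)
print(dept_10)
```

Running the same query with spool_limit_rows=1 raises OutOfSpool, which mirrors a query aborting when it exceeds the user's spool limit.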

What is a View

At Christmas time no one cares about the past or the future. All that matters is the present. One year my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants you to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the data is not stored on the disks again, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname    Sal
1     10     Johnson    Manny    100000
2     20     Carlsbad   Jan      100000
22    30     Winter     Steve     77000
25    10     Lester     Bonnie    56000
33    10     Samuels    Todd     120000
99    20     Walter     Misha    104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp,
       Dept,
       Lname,
       Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, security is enhanced. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha
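If you don't have a Teradata system handy, the idea can be tried out with Python's built-in sqlite3 module as a stand-in. This is only a sketch: SQLite is not Teradata, it has no GRANT or REVOKE, and the in-memory database below exists only for the demonstration. The table and view names follow the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Employee (Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no rows are copied.
conn.execute("CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee")

cursor = conn.execute("SELECT * FROM Employ_V")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```

The salary column never appears in the result, even though every row of the base table is still reachable through the view.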

What is a Macro

The axe soon forgets but the tree always remembers

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10
• What employees are in department 20, and
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
  SELECT * FROM Employ_v WHERE dept = 10;
  SELECT * FROM Employ_v WHERE dept = 20;
  SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;
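The "either they all work or none of them work" behavior of a multi-statement macro can be imitated with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite has no CREATE MACRO, so the hypothetical helper run_macro simply runs a list of statements inside one transaction, and the small table and UPDATE statements are made up to make the rollback visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp INTEGER PRIMARY KEY, Dept INTEGER, Sal INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [(1, 10, 100000), (2, 20, 100000)])
conn.commit()


def run_macro(connection, statements):
    """Run all statements as one transaction: commit if all succeed, else roll back."""
    try:
        with connection:                  # commits on success, rolls back on an exception
            for sql in statements:
                connection.execute(sql)
        return True
    except sqlite3.Error:
        return False


# The second statement fails (no such column), so the first raise is undone too.
ok = run_macro(conn, [
    "UPDATE Employee SET Sal = Sal + 1000 WHERE Dept = 10",
    "UPDATE Employee SET NoSuchColumn = 1",
])
salary_after_failure = conn.execute("SELECT Sal FROM Employee WHERE Emp = 1").fetchone()[0]
print(ok, salary_after_failure)
```

When every statement succeeds, run_macro returns True and all of the changes are kept together.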

Here is a handy reference chart that compares views with macros:

Views                                           Macros
• We select from views                          • We execute macros
• Uses the keyword AS                           • Uses the keyword AS
• Definition is stored in the Data Dictionary   • Definition is stored in the Data Dictionary
• Accesses certain portions of the data         • Accesses the real data itself
• Is changed using the keyword REPLACE          • Is changed using the keyword REPLACE

Access Rights for Teradata Users

Never insult seven men when all you're packing is a six gun.

Wild West Slogan

I taught in one place that was so rough... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform the given operation on the specified Teradata object.

Automatic, Implicit and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has explicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
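The three kinds of rights can be told apart with a small Python model. Treat it as a sketch under assumptions: the text does not list the 16 or 20 individual access rights, so the model only tracks which kind of right one party holds on another, and placing Mary directly under MRKT (and letting Tom grant on his own database) is a guess made for the example.

```python
PARENT = {}        # object -> immediate owner; DBC is the root and has no entry
AUTOMATIC = set()  # (holder, object) pairs handed out by the system at creation time
EXPLICIT = set()   # (holder, privilege, object) triples created by GRANT


def create(creator, name):
    PARENT[name] = creator
    AUTOMATIC.add((creator, name))  # the creator receives rights on the new object
    AUTOMATIC.add((name, name))     # the new object receives rights on itself


def is_owner(holder, obj):
    while obj in PARENT:
        obj = PARENT[obj]
        if obj == holder:
            return True
    return False


def grant(grantor, grantee, privilege, obj):
    if not (grantor == obj or is_owner(grantor, obj)):  # simplifying assumption
        raise PermissionError(f"{grantor} may not grant rights on {obj}")
    EXPLICIT.add((grantee, privilege, obj))


def kinds_of_rights(holder, obj):
    kinds = set()
    if is_owner(holder, obj):
        kinds.add("implicit")
    if (holder, obj) in AUTOMATIC:
        kinds.add("automatic")
    if any(h == holder and o == obj for h, _, o in EXPLICIT):
        kinds.add("explicit")
    return kinds


create("DBC", "SYSDBA")
create("SYSDBA", "MRKT")
create("SYSDBA", "Morgan")
create("MRKT", "Mary")      # assumed placement of Mary
create("Morgan", "Tom")
grant("Tom", "Mary", "CREATE TABLE", "Tom")

for holder in ("DBC", "SYSDBA", "Morgan", "Tom", "Mary"):
    print(holder, sorted(kinds_of_rights(holder, "Tom")))
```

Morgan ends up with both implicit and automatic rights on Tom, SYSDBA and DBC with implicit rights only, and Mary with nothing but the explicit right Tom granted her.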

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

Please sleep on it tonight and if you wake up in the morning let me know what you think

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

The afternoon knows what the morning never suspected

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected." Likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message
• The entire transaction is rolled back and any changes made to the database are reversed
• Locks are released, and
• Spool files are discarded

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she would ask if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal". Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness". If it is clean, then things went well.
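The BEFORE picture idea can be sketched in a few lines of Python. This is a toy model of one AMP, not Teradata internals: the Amp class, its method names and the dictionary of rows are assumptions made for illustration.

```python
MISSING = object()  # marks "this row did not exist before the change"


class Amp:
    """Toy AMP holding rows plus a transient journal of before-images."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.transient_journal = []  # list of (key, before_image)

    def write(self, key, value):
        """INSERT or UPDATE: take the BEFORE picture, then change the row."""
        self.transient_journal.append((key, self.rows.get(key, MISSING)))
        self.rows[key] = value

    def delete(self, key):
        self.transient_journal.append((key, self.rows.get(key, MISSING)))
        self.rows.pop(key, None)

    def commit(self):
        self.transient_journal.clear()  # success: the journal is wiped clean

    def rollback(self):
        # Undo the changes newest-first so every row returns to its original state.
        for key, before in reversed(self.transient_journal):
            if before is MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
        self.transient_journal.clear()


amp = Amp({1: 100000, 2: 56000})  # employee number -> salary
amp.write(1, 200000)              # oops: a 100% raise instead of a 10% raise
amp.write(3, 45000)               # a new row inside the same transaction
amp.delete(2)
amp.rollback()                    # the transaction fails; everything is restored
print(amp.rows)                   # {1: 100000, 2: 56000}
```

After rollback the rows look exactly as they did before the transaction started, and the journal is empty again, ready for the next transaction.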

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

If you're not TRUE to your teeth, they'll be FALSE to you.

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight matching FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

You can't step into the same river twice.

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
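The trade-off between small and large clusters is simple arithmetic, sketched below in Python. Two assumptions are made for illustration: the down AMP's work is shared evenly by the survivors in its cluster, and when two AMPs fail they are two distinct AMPs picked at random from a 2000-AMP system.

```python
def extra_work_share(cluster_size):
    """Extra load on each surviving AMP when one AMP in its cluster is down."""
    return 1 / (cluster_size - 1)


def same_cluster_probability(cluster_size, total_amps):
    """Chance that two randomly failed AMPs sit in the same cluster.

    Fix the first failed AMP: (cluster_size - 1) of the other (total_amps - 1)
    AMPs share its cluster.
    """
    return (cluster_size - 1) / (total_amps - 1)


TOTAL_AMPS = 2000  # system size used as an example in the text
for size in (2, 4, 16):
    print(
        f"cluster of {size:2d}: "
        f"+{extra_work_share(size):.0%} work per survivor, "
        f"{same_cluster_probability(size, TOTAL_AMPS):.2%} chance a second failure hits the same cluster"
    )
```

A cluster of two doubles the survivor's workload, which matches the "twice as long" remark above, while a cluster of 16 adds only a sliver of work per AMP but leaves fifteen ways for a second failure to land in the same cluster. Four sits comfortably in between.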

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues with all subsequent changes to that AMP's rows.
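Here is a small Python sketch of the placement rule just reviewed: the second copy always lands on a different AMP in the same cluster. The eight-AMP, two-cluster layout follows the earlier picture; the arithmetic used to pick the fallback AMP is an assumption for illustration and is not Teradata's actual Fallback Hash Map.

```python
CLUSTER_SIZE = 4  # the most common cluster size mentioned in the text
NUM_AMPS = 8      # two clusters, as in the eight-AMP picture


def primary_amp(row_hash):
    return row_hash % NUM_AMPS


def fallback_amp(row_hash):
    """Pick a different AMP in the same cluster (assumed rule, for illustration)."""
    primary = primary_amp(row_hash)
    cluster_start = primary - primary % CLUSTER_SIZE
    offset = 1 + (row_hash // NUM_AMPS) % (CLUSTER_SIZE - 1)  # 1 .. CLUSTER_SIZE - 1
    return cluster_start + (primary - cluster_start + offset) % CLUSTER_SIZE


def readable(row_hash, failed_amps):
    """A row can still be read if either its primary or its fallback AMP is up."""
    return primary_amp(row_hash) not in failed_amps or fallback_amp(row_hash) not in failed_amps


row_hashes = range(1000)  # stand-ins for the hashes of 1000 rows

# One failed AMP in each cluster: every row is still available.
print(all(readable(h, {0, 5}) for h in row_hashes))  # True

# Two failed AMPs in the same cluster: some rows are lost.
print(all(readable(h, {0, 1}) for h in row_hashes))  # False
```

The second check shows why clustering matters: protection holds for one down AMP per cluster, not for two in the same cluster.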

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping", starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping".

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.
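The catch-up behavior can be pictured with a small Python model. It is a deliberately simplified sketch: the Cluster class and its methods are made up for illustration, only one AMP can be down at a time, and the model leaves out the fact that the FALLBACK copies on the surviving AMPs are updated as well.

```python
class Cluster:
    """Toy cluster: while one AMP is down, changes meant for it go into a DARJ."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}
        self.down = None
        self.darj = []  # kept by the surviving AMPs only while an AMP is down

    def fail(self, amp):
        self.down = amp
        self.darj = []  # the rest of the cluster opens a recovery journal

    def write(self, amp, key, value):
        if amp == self.down:
            self.darj.append((key, value))  # remember the change for later
        else:
            self.rows[amp][key] = value

    def recover(self):
        for key, value in self.darj:        # catch the AMP up, in order
            self.rows[self.down][key] = value
        self.down = None
        self.darj = []                      # then the DARJ is discarded


cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "Ben Hon", "v1")
cluster.fail(1)
cluster.write(1, "Ben Hon", "v2")   # happens while AMP 1 is "sleeping"
cluster.write(1, "Don Roy", "v1")
stale = dict(cluster.rows[1])       # AMP 1 still holds the old data
cluster.recover()
print(stale, cluster.rows[1])
```

Before recovery AMP 1 only knows the old version of its rows; after recovery it has every change that was made while it was down, and the journal is gone.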

In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). Also, none of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within the cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

I know that you believe that you understand what you think I said but I am not sure you realize that what youheard is not what I meant

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy, of all its data on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. We can still keep the shared nothing environment and have four physical disks: only the AMP that owns the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10 to 16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in the surviving node's memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, and they are accessing their own virtual disks via a different physical cable.

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southwest Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.
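
A mishap like this usually comes down to a missing WHERE clause. The sketch below contrasts the intended statement with the accidental one; the table name Cust_Rep and the column Region are assumptions made for illustration and do not come from the original example.

```sql
-- Hypothetical table and column names, used only for illustration.

-- Intended change: move only the Western Region representatives.
UPDATE Cust_Rep
SET    Region = 'Southwest'
WHERE  Region = 'Western';

-- Accidental change: with no WHERE clause, every representative in
-- every region is moved. The statement succeeds, so nothing is rolled
-- back automatically; the BEFORE images are what make a manual
-- rollback to the earlier state possible.
UPDATE Cust_Rep
SET    Region = 'Southwest';
```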

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
     BEFORE JOURNAL,
     DUAL AFTER JOURNAL
( emp       INTEGER
 ,dept      INTEGER
 ,lname     CHAR(20)
 ,fname     VARCHAR(20)
 ,salary    DECIMAL(10,2)
 ,hire_date DATE
)
UNIQUE PRIMARY INDEX (emp);

The example above created the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.
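
For contrast, here is a sketch of a table with the defaults spelled out, followed by a later change of mind about FALLBACK. The table name employee_nf is hypothetical, and the column list is shortened for brevity.

```sql
-- Hypothetical table: the protection defaults written out explicitly.
CREATE TABLE TomC.employee_nf, NO FALLBACK,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL
( emp   INTEGER
 ,dept  INTEGER
 ,lname CHAR(20)
)
UNIQUE PRIMARY INDEX (emp);

-- FALLBACK protection can be added after the table exists.
ALTER TABLE TomC.employee_nf, FALLBACK;
```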

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy, and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line, only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE, or READ lock can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
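
As a brief illustration of the difference between a READ lock and an ACCESS lock, the sketch below runs the same query twice against the TomC.employee table created earlier. The first form takes the default READ lock; the second uses the LOCKING request modifier to ask for an ACCESS lock instead. The choice of columns and the department number are illustrative only.

```sql
-- Default behavior: the SELECT places a READ lock and waits behind
-- any WRITE lock that is already in the queue.
SELECT emp, lname
FROM   TomC.employee
WHERE  dept = 10;

-- Requesting an ACCESS lock: the query does not wait for writers,
-- so it may return rows that are in the middle of being changed
-- (a "dirty read").
LOCKING TABLE TomC.employee FOR ACCESS
SELECT emp, lname
FROM   TomC.employee
WHERE  dept = 10;
```

The ACCESS form suits reports where approximate answers are acceptable and waiting on a load or update is not.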

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row whose value is referenced by rows in another table may not be deleted until those referencing rows are removed first.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
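
A minimal sketch of the employee and payroll scenario, assuming a hypothetical TomC.payroll table whose emp column refers back to the TomC.employee table created earlier. The payroll table and its columns are invented for illustration.

```sql
-- Hypothetical child table. The referenced column, employee.emp,
-- is unique because it is the table's UNIQUE PRIMARY INDEX.
CREATE TABLE TomC.payroll
( emp      INTEGER NOT NULL
 ,pay_date DATE
 ,amount   DECIMAL(10,2)
)
PRIMARY INDEX (emp);

-- Enforce RI on the existing table.
ALTER TABLE TomC.payroll
  ADD FOREIGN KEY (emp) REFERENCES TomC.employee (emp);

-- With RI in place, this delete is rejected for as long as payroll
-- rows still reference employee 25.
DELETE FROM TomC.employee WHERE emp = 25;
```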

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to proceed: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
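
Once a load has finished, the result of that hashing step can be inspected from SQL. The sketch below uses Teradata's hash functions to count how many rows of the TomC.employee table landed on each AMP; it assumes the table's primary index is the emp column, as in the earlier CREATE TABLE example.

```sql
-- HASHROW computes the row hash of the primary index value,
-- HASHBUCKET maps that hash to a hash bucket, and
-- HASHAMP maps the bucket to the AMP that owns it.
SELECT HASHAMP(HASHBUCKET(HASHROW(emp))) AS amp_no
      ,COUNT(*)                          AS row_count
FROM   TomC.employee
GROUP  BY 1
ORDER  BY 1;
```

Roughly equal counts across AMPs suggest the rows are spread evenly, which is what lets the AMPs work in parallel.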

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables
• It locks at the row level

Conclusion: A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

Three Types of Teradata Space

There are three types of space with Teradata. They are:

• Perm Space
• Spool Space
• Temp Space

Perm space defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals. (See protection features.)

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts because the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables. These tables will be discussed in another chapter.

The SYSDBA knows that tenaciously holding onto its space will not provide any value to your company. A bank that holds onto all of its capital will not be successful, or will it? If it's destined for success, it will lend out its capital in the form of credit lines or mortgages. These actions will provide the bank with a healthy profit. The SYSDBA likewise gladly gives up space to each new user or database in an effort to make the Teradata system profitable.

SYSDBA gives out two kinds of space: Perm space and Spool space. When you receive a credit card from the bank, you are given an upper limit to your line of credit. In order to spend more than that limit, you must get approval from the bank. In the same way, the SYSDBA gives a new user an upper limit of space to use. When that amount is used up, the user must request an increase. Another way to free up some space is to drop some tables from the database.

Perm space is what actually stores real data: table rows, secondary index sub-tables, and permanent journals. If you give some of your perm space to a child object, then you must subtract that same amount from the total perm space you own.

Spool space is the area where AMPs temporarily place the answer to a query. Once the answer is delivered to the person making the query, the AMPs release that spool space to be used for another query. Unlike perm space, spool space is not lost if it is given away. You can actually give users below you as much spool as you would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries, not create tables. These users will just receive spool.
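
As a sketch of how these limits are handed out, the statements below create a query-only user with spool but no perm space, and then raise that user's spool limit later. The user name Mary, the owning database MRKT, the byte counts, and the password are all placeholder values chosen for illustration.

```sql
-- A user who only runs queries: no perm space, but a spool limit.
-- The sizes (in bytes) and the password are placeholders.
CREATE USER Mary FROM MRKT AS
   PERM     = 0
  ,SPOOL    = 500000000
  ,PASSWORD = ChangeMe1;

-- When the user needs more room for larger answer sets,
-- the limit can be raised.
MODIFY USER Mary AS SPOOL = 1000000000;
```

Because spool is only an upper limit, granting it to Mary does not reduce the amount available to MRKT, whereas any perm space given to her would be subtracted from MRKT's total.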

The following picture shows a logical view of a Customer Table. Note the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user, and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area, there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. Plus, when the answer is returned, the spool is released.
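
The query described for the picture would look something like the following; the exact text shown in the original illustration is not available, so this is a reconstruction from the description.

```sql
-- All columns for employees in department 10.
-- Each qualifying row is copied into spool before being returned.
SELECT *
FROM   Employee
WHERE  Dept = 10;
```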

What is a View

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the result is not stored on the disks, so it does not duplicate data or take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname     Sal
1     10     Johnson    Manny     100000
2     20     Carlsbad   Jan       100000
22    30     Winter     Steve     77000
25    10     Lester     Bonnie    56000
33    10     Samuels    Todd      120000
99    20     Walter     Misha     104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.
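
A view can limit rows as well as columns. The sketch below defines a second view restricted to department 10 and grants read access on it to one user. The view name Employ_Dept10_V and the user Mary are assumptions for illustration.

```sql
-- Hypothetical view: hides the Sal column and shows only department 10.
CREATE VIEW Employ_Dept10_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM   Employee
WHERE  Dept = 10;

-- Give a user access to the view only, not to the underlying table.
GRANT SELECT ON Employ_Dept10_V TO Mary;
```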

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha

What is a Macro

The axe soon forgets but the tree always remembers

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10
• What employees are in department 20
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
   SELECT * FROM Employ_v WHERE dept = 10;
   SELECT * FROM Employ_v WHERE dept = 20;
   SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                          Macros
• We select from views                         • We execute macros
• Uses the keyword AS                          • Uses the keyword AS
• Definition is stored in the Data Dictionary  • Definition is stored in the Data Dictionary
• Accesses certain portions of the data        • Accesses the real data itself
• Is changed using the keyword REPLACE         • Is changed using the keyword REPLACE
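
To illustrate the REPLACE keyword, the sketch below redefines Emp_mac so that a single statement serves any department. The parameter name dept_no is an assumption; inside the macro body a parameter is referenced with a leading colon.

```sql
-- Redefine the existing macro with one parameter instead of
-- hard-coded department numbers.
REPLACE MACRO Emp_mac (dept_no INTEGER) AS (
   SELECT *
   FROM   Employ_v
   WHERE  dept = :dept_no
   ORDER  BY lname;
);

-- Run the same macro for two different departments.
EXECUTE Emp_mac (10);
EXECUTE Emp_mac (20);
```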

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough ... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it ... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on everyone listed below it. MRKT has implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, Morgan automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
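
Explicit rights are handed out and taken back with GRANT and REVOKE. A short sketch, assuming Tom's database is the TomC database used in the earlier examples and that a user named Mary exists:

```sql
-- Tom explicitly lets Mary create tables in his database.
GRANT CREATE TABLE ON TomC TO Mary;

-- He can also grant rights on a single object.
GRANT SELECT, INSERT ON TomC.employee TO Mary;

-- Explicit rights can be taken back at any time.
REVOKE CREATE TABLE ON TomC FROM Mary;
```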

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

Please sleep on it tonight and if you wake up in the morning let me know what you think

Morgans Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract It is not worth the paper it is written onHowever Teradata is always prepared and it will protect your data better than a wild pit bull As a matter of fact the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go

System and user errors are inevitable in any large system For example an associate may accidentally giveeveryone a 100 raise instead of a 10 raise Or what if a million-dollar transaction fails right at the wrongtime Or an AMP or DISK goes down In any of these cases Teradata will have many ways to protect your data Some processes for protection are automatic and some of them are optional

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

The afternoon knows what the morning never suspected

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message;
• The entire transaction is rolled back and any changes made to the database are reversed;
• Locks are released; and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand, and then, after the cut, asked if you liked it? If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data then reverts to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness." If it is clean, then things went well.
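The before-picture idea can be sketched in a few lines of Python. This is a toy model, not Teradata code: the `TransientJournal` class, its method names, and the dictionary standing in for one AMP's rows are all assumptions made for illustration.

```python
class TransientJournal:
    """Toy before-image journal for one AMP (illustrative only)."""

    _MISSING = object()  # marks "row did not exist before the change"

    def __init__(self, rows):
        self.rows = rows          # dict: row key -> row value
        self.before_images = {}   # original value per changed key

    def _snapshot(self, key):
        # Take the BEFORE picture only the first time a row is touched.
        if key not in self.before_images:
            self.before_images[key] = self.rows.get(key, self._MISSING)

    def write(self, key, value):
        """INSERT or UPDATE a row after saving its before-image."""
        self._snapshot(key)
        self.rows[key] = value

    def delete(self, key):
        """DELETE a row after saving its before-image."""
        self._snapshot(key)
        self.rows.pop(key, None)

    def rollback(self):
        """Restore every changed row, then empty the journal."""
        for key, old in self.before_images.items():
            if old is self._MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = old
        self.before_images.clear()

    def commit(self):
        """Success: the before-images are no longer needed."""
        self.before_images.clear()


salaries = {"Tom": 50000, "Mary": 60000}
journal = TransientJournal(salaries)
journal.write("Tom", 100000)   # the accidental 100% raise
journal.write("Morgan", 1)     # a row that did not exist before
journal.delete("Mary")
journal.rollback()             # the transaction fails: undo everything
```

After the rollback, `salaries` holds exactly the two original rows and the journal is empty, which is the "wiped clean" state described above.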

FALLBACK Protection

I asked my dentist if I had to floss all my teeth and he responded, "No, just the ones you want to keep."

If you're not TRUE to your teeth, they'll be FALSE to you.

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

You can't step into the same river twice.

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission-critical tables. It actually allows the user to step into the same table twice, if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2,000 AMPs. Therefore, the chance of losing two AMPs in a 2,000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
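A rough Python sketch may help show why one AMP per cluster can be lost. Everything here is an assumption for illustration: CRC32 stands in for Teradata's hashing algorithm, a simple modulo stands in for the hash map buckets, and the rule used to pick the fallback AMP is invented; only the two properties it guarantees (a different AMP, in the same cluster) come from the text.

```python
import zlib

AMPS_PER_CLUSTER = 4   # assumed cluster size (the common case)
NUM_AMPS = 8           # two clusters of four, as in the eight-AMP picture


def _row_hash(primary_index_value):
    """Stand-in hash; the real hashing algorithm is different."""
    return zlib.crc32(str(primary_index_value).encode("utf-8"))


def primary_amp(primary_index_value):
    """Map the primary index value to the AMP holding the base row."""
    return _row_hash(primary_index_value) % NUM_AMPS


def fallback_amp(primary_index_value):
    """Pick a different AMP inside the same cluster for the copy."""
    home = primary_amp(primary_index_value)
    cluster_start = (home // AMPS_PER_CLUSTER) * AMPS_PER_CLUSTER
    # Offset is 1..AMPS_PER_CLUSTER-1, so the copy never lands on `home`.
    offset = 1 + (_row_hash(primary_index_value) // NUM_AMPS) % (AMPS_PER_CLUSTER - 1)
    return cluster_start + (home - cluster_start + offset) % AMPS_PER_CLUSTER


def readable_after_failure(primary_index_value, down_amps):
    """A row is reachable if its base copy or its fallback copy survives."""
    return (primary_amp(primary_index_value) not in down_amps
            or fallback_amp(primary_index_value) not in down_amps)
```

Because the two copies of a row always sit on different AMPs of the same cluster, taking down one AMP in each cluster (for example AMPs 0 and 5 in this numbering) still leaves at least one copy of every row readable.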

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while an AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
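The trade-off can be put into numbers with a small Python calculation. It assumes, as the discussion above does, that the failed AMP's work is shared evenly among the survivors in its cluster; the function name `surviving_amp_workload` is made up for this sketch.

```python
from fractions import Fraction


def surviving_amp_workload(cluster_size):
    """Relative workload of each surviving AMP after one AMP in the
    cluster fails, assuming the failed AMP's work is shared evenly."""
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    return Fraction(cluster_size, cluster_size - 1)


for size in (2, 3, 4, 16):
    load = surviving_amp_workload(size)
    print(f"cluster of {size:2d}: each survivor does {float(load):.2f}x its normal work")
```

A cluster of two doubles the load on the survivor, a cluster of four raises it by a third, and a cluster of 16 raises it by only one fifteenth, at the price of more AMPs whose second failure would take the cluster down.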

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues with all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
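The "while you were sleeping" behavior can be modeled in Python. This is a simplified sketch under stated assumptions: the `Cluster` class and its methods are hypothetical names, one shared list stands in for the journals kept by the surviving AMPs, and fallback copies are left out so that only the journal-and-replay idea remains.

```python
class Cluster:
    """Toy cluster: each AMP is a dict of rows (names are illustrative)."""

    def __init__(self, amp_ids):
        self.amps = {amp_id: {} for amp_id in amp_ids}
        self.down = set()
        self.recovery_journal = []   # (amp_id, key, value) for down AMPs

    def write(self, amp_id, key, value):
        if amp_id in self.down:
            # The AMP is "sleeping": remember the change for later.
            self.recovery_journal.append((amp_id, key, value))
        else:
            self.amps[amp_id][key] = value

    def fail(self, amp_id):
        self.down.add(amp_id)

    def recover(self, amp_id):
        """Replay the journaled changes for this AMP, then discard them."""
        self.down.discard(amp_id)
        remaining = []
        for target, key, value in self.recovery_journal:
            if target == amp_id:
                self.amps[target][key] = value
            else:
                remaining.append((target, key, value))
        self.recovery_journal = remaining


cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "Ben Hon", "version 1")
cluster.fail(1)                            # AMP 1 goes down
cluster.write(1, "Ben Hon", "version 2")   # journaled, not applied
cluster.write(1, "Don Roy", "version 1")   # journaled, not applied
cluster.recover(1)                         # AMP 1 is caught up
```

Once `recover(1)` has run, AMP 1 holds the latest version of both rows and the journal is empty again.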

Redundant Array of Independent Disks (RAID)

I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant.

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The downside of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. We can still keep the shared nothing environment and have four physical disks: only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disks via a different physical cable.
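The migration can be pictured with a short Python sketch. It is only a model: `migrate_vprocs` is a hypothetical function, nodes are plain lists of vproc names, and spreading the migrating vprocs round-robin over the surviving nodes is an assumption, since the text only describes the two-node case.

```python
def migrate_vprocs(nodes, failed_node):
    """Move the failed node's vprocs to the surviving clique members.

    nodes -- dict of node name -> list of vproc names (AMPs and PEs)
    Returns a new dict that no longer contains the failed node.
    """
    survivors = {name: list(vprocs) for name, vprocs in nodes.items()
                 if name != failed_node}
    if not survivors:
        raise RuntimeError("no surviving node left in the clique")
    targets = sorted(survivors)
    for i, vproc in enumerate(nodes[failed_node]):
        # Assumed policy: spread migrating vprocs round-robin.
        survivors[targets[i % len(targets)]].append(vproc)
    return survivors


clique = {
    "node1": [f"AMP{i}" for i in range(1, 17)],    # AMPs 1-16
    "node2": [f"AMP{i}" for i in range(17, 33)],   # AMPs 17-32
}
after_failure = migrate_vprocs(clique, "node1")
```

With the two-node, 32-AMP configuration described above, the surviving node ends up hosting all 32 AMPs, twice its normal number, which is why performance slows until the failed node returns.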

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica – a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by a restoration that takes a couple of hours to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
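The roll-forward recovery just described (restore the backup, then reapply the AFTER images in order) can be sketched in Python. The function `roll_forward`, the tuple layout of the journal entries, and the sample rows are all assumptions for illustration; they are not the format Teradata uses.

```python
def roll_forward(backup, after_images):
    """Rebuild a table from a full backup plus AFTER images.

    backup       -- dict of row key -> row, as saved by the full backup
    after_images -- list of (day, key, row) captured since that backup,
                    in the order the changes happened
    """
    table = dict(backup)            # start from the restored backup
    for _day, key, row in after_images:
        table[key] = row            # reapply each insert or update
    return table


full_backup_day_1 = {101: "Western", 102: "Eastern"}
after_journal = [
    (2, 103, "Southwest"),   # new row added on day 2
    (4, 101, "Southwest"),   # existing row changed on day 4
]
restored = roll_forward(full_backup_day_1, after_journal)
```

The restored table contains the row added on day 2 and the change made on day 4, so nothing entered between the backup and the failure is lost.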

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK, BEFORE JOURNAL, DUAL AFTER JOURNAL
(emp       INTEGER,
 dept      INTEGER,
 lname     CHAR(20),
 fname     VARCHAR(20),
 salary    DECIMAL(10,2),
 hire_date DATE FORMAT)
UNIQUE PRIMARY INDEX(emp);

The example above creates the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO – meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

You just obey instructions; we'll take care of the obstructions.

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps into action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind, these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK – one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
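The four rules can be summarized as a small compatibility table. The Python sketch below encodes only what the list above states; the `COMPATIBLE` table and the `can_grant` function are illustrative names, not part of any Teradata interface, and treating two ACCESS locks as compatible is an inference from the description of dirty reads.

```python
# For each requested lock: the already-held locks it can coexist with.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held_locks):
    """Return True if `requested` is compatible with every held lock."""
    return all(held in COMPATIBLE[requested] for held in held_locks)
```

For example, `can_grant("READ", ["READ"] * 1000)` is true (a thousand simultaneous readers), `can_grant("WRITE", ["READ"])` is false, and `can_grant("ACCESS", ["WRITE"])` is true, which is the dirty read case.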

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired … it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.
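Both halves of the rule can be shown with a minimal Python sketch based on the employee and payroll example. The exception class, the two functions, and the representation of tables as a set of keys and a list of dictionaries are assumptions for illustration only.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would break the parent/child relationship."""


def insert_child(child_rows, parent_keys, row, fk_column):
    """Insert `row` only if its foreign-key value exists in the parent."""
    if row[fk_column] not in parent_keys:
        raise ReferentialIntegrityError(
            f"{fk_column}={row[fk_column]!r} has no matching parent row")
    child_rows.append(row)


def delete_parent(parent_keys, child_rows, key, fk_column):
    """Delete a parent key only if no child row still refers to it."""
    if any(row[fk_column] == key for row in child_rows):
        raise ReferentialIntegrityError(
            f"key {key!r} is still referenced by a child row")
    parent_keys.discard(key)


employees = {1, 2}   # parent table: employee numbers
payroll = []         # child table: rows that refer to an employee
insert_child(payroll, employees, {"emp": 1, "salary": 50000}, "emp")
```

With these rows in place, deleting employee 1 raises an error until the matching payroll row has been removed, and inserting a payroll row for an unknown employee is rejected.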

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that – just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? Through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
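The steps in this paragraph (hand out blocks, hash and redistribute, then sort) can be imitated in Python. This is a rough sketch under several assumptions: CRC32 stands in for Teradata's hashing algorithm, blocks are handed out round-robin, `block_size` counts rows rather than 64K of data, the row hash stands in for the Row ID, and `fastload_sketch` is a made-up name.

```python
import zlib


def row_hash(primary_index_value):
    """Stand-in hash function (CRC32); the real algorithm is different."""
    return zlib.crc32(str(primary_index_value).encode("utf-8"))


def fastload_sketch(records, num_amps, block_size=3):
    """Mimic the load steps on a list of (pi_value, payload) records."""
    # Step 1: hand out blocks of incoming records to the AMPs in turn,
    # without looking at their content.
    received = [[] for _ in range(num_amps)]
    for start in range(0, len(records), block_size):
        amp = (start // block_size) % num_amps
        received[amp].extend(records[start:start + block_size])

    # Step 2: each AMP hashes its rows and sends them to the owning AMP.
    owned = [[] for _ in range(num_amps)]
    for rows in received:
        for pi_value, payload in rows:
            h = row_hash(pi_value)
            owned[h % num_amps].append((h, pi_value, payload))

    # Step 3: each AMP sorts its rows by row hash.
    for rows in owned:
        rows.sort(key=lambda row: row[0])
    return owned
```

Calling `fastload_sketch([(i, f"row{i}") for i in range(20)], num_amps=4)` returns four lists that together hold all 20 rows, each list sorted and containing only the rows whose hash maps to that AMP.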

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file;
• Only one table may be loaded at a time;
• The table to be loaded must be empty;
• There can be no secondary indexes, referential integrity, or triggers;
• It doesn't support Multi-set tables; and
• It locks at the table level.

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file;
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables;
• Receiving tables are usually populated;
• There can be no Unique secondary indexes, referential integrity, or triggers;
• It doesn't support Multi-set tables; and
• It locks at the table level.

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

You don't drown by falling into the water; you drown by staying in the water.

Edwin Louis Cole

If the data is not flowing a company can drown in it The utility is called Tpump because it theoretically actslike a water faucet Tpump can be set to full throttle to load millions of transactions during off peak hours or turned down to trickle small amounts of data during the data warehouse rush hour It can also beautomatically preset to load different levels at certain times during the day and can be modified at any time

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion—A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance – Only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

would like, yet still have the original amount. Spool is like a speed limit on the highway. If your own speed limit is 65 mph, you can still allow every other driver to drive up to 65 mph. Some users may not receive perm space if their job is just to run queries – not create tables. These users will just receive spool.

The following picture shows a logical view of a Customer Table. Note that the table is stored in PERM space. When a user submits a query against this table, the answer is stored temporarily in SPOOL. When the query is completed, the answer is delivered to the user and then the SPOOL is released.

The next picture shows a logical Teradata system. In the PERM area, there is a table called Employee. This table has five columns: Emp, Dept, Lname, Fname, and Sal. The table has four employees. Notice that the SQL statement at the bottom of the picture is asking to see all columns where the employee's department is equal to 10. To complete the query, the AMPs will read the rows of the table, and each time they find a row where Dept is equal to 10, a row is added to spool. When the answer is returned, the spool is released.

What is a View?

At Christmas time, no one cares about the past or the future. All that matters is the present. One year, my wife and I were in New York City during the holiday season. We had always heard about how wonderful the window displays are in the large department stores. As we window-shopped, we got lots of ideas for gifts. We could see products displayed in the windows, but we could not actually touch them. We only had a pleasant view. Display windows are designed to show shoppers what store management wants them to see. In Teradata, a view is like a department store window because you can see selected portions of a table, yet you aren't able to see sensitive data. Instead, you can view data within your access rights, and you determine what data portions you want others to see.

Views are real sticklers for protecting sensitive data from inquiring eyes. For example, the Human Resources database might contain an employee table. Management can create a view of the table that hides the salary column, yet still allows an administrative associate to view names, phone numbers, and department numbers of employees. In this scenario, the salary column is not shown. As a result, views are the best choice for protecting sensitive data.

Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table (or tables), the view's data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname    Sal
1     10     Johnson    Manny    100000
2     20     Carlsbad   Jan      100000
22    30     Winter     Steve    77000
25    10     Lester     Bonnie   56000
33    10     Samuels    Todd     120000
99    20     Walter     Misha    104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, security is enhanced. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha
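
The view mechanism can be tried out without a Teradata system. The sketch below is only a stand-in: it assumes Python's built-in sqlite3 module in place of Teradata, builds the Employee table and the Employ_V view from the text, and then selects from the view. The variable names are illustrative.

```python
import sqlite3

# SQLite stands in for Teradata here (an assumption made for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(Emp INTEGER, Dept INTEGER, Lname TEXT, Fname TEXT, Sal INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Johnson", "Manny", 100000),
        (2, 20, "Carlsbad", "Jan", 100000),
        (22, 30, "Winter", "Steve", 77000),
        (25, 10, "Lester", "Bonnie", 56000),
        (33, 10, "Samuels", "Todd", 120000),
        (99, 20, "Walter", "Misha", 104000),
    ],
)

# The view stores only a definition; no rows are copied.
conn.execute(
    "CREATE VIEW Employ_V AS SELECT Emp, Dept, Lname, Fname FROM Employee"
)

cursor = conn.execute("SELECT * FROM Employ_V ORDER BY Emp")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)          # the salary column is not part of the view
for row in view_rows:
    print(row)
```

All six employees come back, but the salary column never leaves the base table.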

What is a Macro?

The axe soon forgets, but the tree always remembers.

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10
• What employees are in department 20, and
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
SELECT * FROM Employ_v WHERE dept = 10;
SELECT * FROM Employ_v WHERE dept = 20;
SELECT * FROM Employ_v ORDER BY lname; );

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                           Macros
• We select from views                          • We execute macros
• Uses the keyword AS                           • Uses the keyword AS
• Definition is stored in the Data Dictionary   • Definition is stored in the Data Dictionary
• Accesses certain portions of the data         • Accesses the real data itself
• Is changed using the keyword REPLACE          • Is changed using the keyword REPLACE

Access Rights for Teradata Users

Never insult seven men when all you're packing is a six gun.

Wild West Slogan

I taught in one place that was so rough… security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform the given operation on the specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it… nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has explicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

Please sleep on it tonight, and if you wake up in the morning, let me know what you think.

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract: it is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

The afternoon knows what the morning never suspected

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message
• The entire transaction is rolled back, and any changes made to the database are reversed
• Locks are released, and
• Spool files are discarded

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal." Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it, and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal cleanliness is next to Godliness. If it is clean, then things went well.
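
The before-image idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Teradata's implementation: one AMP is modeled as a dictionary, and the class and function names (TransientJournalDemo, run_transaction) are hypothetical.

```python
class TransientJournalDemo:
    """Toy single-AMP table with a before-image journal (hypothetical model)."""

    def __init__(self, rows):
        self.rows = dict(rows)   # primary key -> row value
        self.journal = []        # list of (key, before_image)

    def write(self, key, new_value):
        # Take the BEFORE picture first; None means the row did not exist yet.
        self.journal.append((key, self.rows.get(key)))
        if new_value is None:
            self.rows.pop(key, None)      # DELETE
        else:
            self.rows[key] = new_value    # INSERT or UPDATE

    def commit(self):
        # Success: the journal is wiped clean.
        self.journal.clear()

    def rollback(self):
        # Failure: restore before-images, newest change first.
        for key, before in reversed(self.journal):
            if before is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
        self.journal.clear()


def run_transaction(table, changes):
    """Apply all changes or none of them; returns True on commit."""
    try:
        for key, value in changes:
            if value == "FAIL":           # simulated mid-transaction failure
                raise RuntimeError("simulated failure")
            table.write(key, value)
    except RuntimeError:
        table.rollback()
        return False
    table.commit()
    return True


amp = TransientJournalDemo({1: 100000, 25: 56000})
failed_ok = run_transaction(amp, [(1, 110000), (33, 120000), (25, "FAIL")])
rows_after_failure = dict(amp.rows)
good_ok = run_transaction(amp, [(1, 110000)])
print(failed_ok, rows_after_failure, good_ok, amp.rows)
```

The failed transaction leaves the rows exactly as they were, and in both cases the journal ends up empty.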

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

If you're not TRUE to your teeth, they'll be FALSE to you.

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.

You can't step into the same river twice.

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. The chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. The output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.
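
The path from Primary Index value to hash to bucket to AMP can be sketched as follows. Everything specific here is an assumption made for illustration: MD5 stands in for Teradata's hashing algorithm, the hash map has 64 buckets, and the names row_hash and amp_for are hypothetical.

```python
import hashlib

NUM_AMPS = 8
NUM_BUCKETS = 64  # assumption for the sketch; the text does not give the real size

# Hash map: each bucket holds the number of the AMP that owns it.
hash_map = [bucket % NUM_AMPS + 1 for bucket in range(NUM_BUCKETS)]


def row_hash(primary_index_value):
    """Stand-in hashing algorithm (MD5), not the one Teradata uses."""
    digest = hashlib.md5(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def amp_for(primary_index_value):
    """Hash the Primary Index value, find its bucket, read the AMP number."""
    bucket = row_hash(primary_index_value) % NUM_BUCKETS
    return hash_map[bucket]


placement = {name: amp_for(name) for name in ["Ben Hon", "Don Roy"]}
print(placement)
```

Because the same value always hashes to the same bucket, any AMP or PE can work out where a row lives without searching for it.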

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in a group of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will then take twice as long to process.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues with all subsequent changes to that AMP's rows.
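
The clustering rule, that a row's FALLBACK copy lives on a different AMP in the same cluster, can be modeled in a short sketch. This is a simplified illustration, not the Fallback Hash Map itself: the way a fallback AMP is picked (row id modulo the other cluster members) and all function names are assumptions.

```python
# Eight AMPs in two clusters of four, as in the picture described above.
CLUSTERS = [[1, 2, 3, 4], [5, 6, 7, 8]]


def cluster_of(amp):
    """Return the list of AMPs that share a cluster with `amp`."""
    return next(cluster for cluster in CLUSTERS if amp in cluster)


def fallback_amp(primary_amp, row_id):
    """Pick another AMP in the same cluster (hypothetical selection rule)."""
    others = [amp for amp in cluster_of(primary_amp) if amp != primary_amp]
    return others[row_id % len(others)]


def place(rows):
    """rows: {row_id: primary_amp} -> {row_id: (primary_amp, fallback_amp)}."""
    return {rid: (amp, fallback_amp(amp, rid)) for rid, amp in rows.items()}


def all_rows_readable(placement, down_amps):
    """A row is readable while at least one of its two copies is on a live AMP."""
    return all(
        primary not in down_amps or fallback not in down_amps
        for primary, fallback in placement.values()
    )


# Sixteen rows spread evenly over the eight AMPs.
placement = place({row_id: row_id % 8 + 1 for row_id in range(16)})

print(all_rows_readable(placement, {1}))      # one AMP down
print(all_rows_readable(placement, {1, 5}))   # one AMP down in each cluster
print(all_rows_readable(placement, {1, 2}))   # two AMPs down in one cluster
```

Losing one AMP per cluster leaves every row reachable; losing two AMPs in the same cluster does not.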

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping, starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
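
The catch-up behavior of the DARJ can be sketched with a toy model. The class below is hypothetical and greatly simplified: it tracks a single down AMP's rows, the fallback copies held by its cluster mates, and the journal those mates keep while it is down.

```python
class ClusterDemo:
    """Toy model of one AMP's rows plus their fallback copies (hypothetical)."""

    def __init__(self):
        self.primary = {}    # row_id -> value on the home AMP
        self.fallback = {}   # row_id -> value held by other AMPs in the cluster
        self.amp_up = True
        self.darj = []       # changes the home AMP missed while it was down

    def update(self, row_id, value):
        self.fallback[row_id] = value           # the fallback copy is always written
        if self.amp_up:
            self.primary[row_id] = value
        else:
            self.darj.append((row_id, value))   # remember it for later

    def fail(self):
        self.amp_up = False

    def recover(self):
        # Replay everything that happened while the AMP was "sleeping".
        for row_id, value in self.darj:
            self.primary[row_id] = value
        self.darj = []                          # then the journal is discarded
        self.amp_up = True


cluster = ClusterDemo()
cluster.update(1, "Ben Hon")
cluster.fail()
cluster.update(1, "Ben Hon Jr")
cluster.update(2, "Don Roy")
journal_size_while_down = len(cluster.darj)
primary_while_down = dict(cluster.primary)
cluster.recover()
print(journal_size_while_down, primary_while_down, cluster.primary)
```

After recovery the home AMP's rows match the fallback copies again, and the journal is empty.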

Redundant Array of Independent Disks (RAID)

I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant.

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The downside of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. We can still keep the shared nothing environment and have four physical disks, because only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
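
A minimal sketch of RAID-1 behavior follows, assuming a rank made of two data disks and their two mirrors. The disk names, the MirroredDisk class, and the rank_survives helper are hypothetical; a real Disk Array Controller does this in hardware.

```python
# Hypothetical disk names: a rank of four disks holding two mirrored pairs.
MIRROR_PAIRS = [("data_1", "mirror_1"), ("data_2", "mirror_2")]


class MirroredDisk:
    """Toy RAID-1 pair: every write lands on both copies."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.primary_failed = False

    def write(self, block, data):
        if not self.primary_failed:
            self.primary[block] = data
        self.mirror[block] = data        # dual write, invisible to the caller

    def read(self, block):
        source = self.mirror if self.primary_failed else self.primary
        return source[block]

    def fail_primary(self):
        self.primary = {}                # contents of the crashed disk are gone
        self.primary_failed = True

    def replace_primary(self):
        # The controller copies the mirror onto the replacement disk.
        self.primary = dict(self.mirror)
        self.primary_failed = False


def rank_survives(failed_disks):
    """True unless some data disk and its own mirror are both lost."""
    return not any(
        data in failed_disks and mirror in failed_disks
        for data, mirror in MIRROR_PAIRS
    )


pair = MirroredDisk()
pair.write(0, "Ben Hon")
pair.fail_primary()
pair.write(1, "Don Roy")                 # operations continue on the mirror
read_while_failed = pair.read(0)
pair.replace_primary()
print(read_while_failed, pair.read(1))
print(rank_survives({"data_1", "mirror_2"}), rank_survives({"data_1", "mirror_1"}))
```

Reads and writes carry on through the failure, and the rank only loses data when both halves of the same pair are gone.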

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel Processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two-to-three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.
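
Node failure inside a clique can be sketched as a simple reassignment. The function name fail_node, the node names, and the round-robin spreading over surviving nodes are assumptions made for illustration; the key point from the text is that each AMP keeps its own virtual disk.

```python
def fail_node(nodes, failed):
    """Return a new node -> AMP list mapping after `failed` goes down.

    The failed node's AMPs are spread round-robin over the surviving nodes
    in the clique (a hypothetical policy for this sketch).
    """
    survivors = [name for name in nodes if name != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    result = {name: list(nodes[name]) for name in survivors}
    for index, amp in enumerate(nodes[failed]):
        result[survivors[index % len(survivors)]].append(amp)
    return result


# Two-node, 32-AMP system from the text.
nodes = {"node1": list(range(1, 17)), "node2": list(range(17, 33))}

# Each AMP owns exactly one virtual disk, wherever the AMP happens to run.
vdisk_of = {amp: f"vdisk{amp}" for amp in range(1, 33)}

after_failure = fail_node(nodes, "node1")
print(len(after_failure["node2"]))   # AMPs now running on node2
print(vdisk_of[16])                  # AMP 16 still uses its own virtual disk
```

The surviving node carries twice the AMPs, which is why performance slows, but the AMP-to-disk ownership never changes.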

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica – a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
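The two journaling scenarios above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: tables are plain dictionaries, a journal is a list of (row key, image) pairs, and the function names roll_back and roll_forward are made up for illustration. It is not how Teradata stores or applies journal images.

```python
def roll_back(table, before_images):
    """Undo changes by restoring BEFORE images, newest first."""
    restored = dict(table)
    for key, image in reversed(before_images):
        if image is None:
            restored.pop(key, None)  # the row did not exist before the change
        else:
            restored[key] = image
    return restored


def roll_forward(backup, after_images):
    """Reapply AFTER images, oldest first, on top of a restored backup."""
    restored = dict(backup)
    for key, image in after_images:
        restored[key] = image
    return restored


# BEFORE journal: a successful but inaccurate update is undone.
reps = {1: "Western", 2: "Northern"}
before_journal = []
for rep_id in list(reps):
    before_journal.append((rep_id, reps[rep_id]))  # save the BEFORE image
    reps[rep_id] = "Southeast"                     # the accidental update
undone = roll_back(reps, before_journal)

# AFTER journal: a full backup from the 1st plus changes captured since then.
full_backup = {1: "Western", 2: "Northern"}
after_journal = [(3, "Eastern"), (1, "Southwest")]
recovered = roll_forward(full_backup, after_journal)
```

In this sketch, undone holds the table as it was before the bad update, and recovered holds the backup with every journaled insert and change reapplied.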

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
    (emp        INTEGER
    ,dept       INTEGER
    ,lname      CHAR(20)
    ,fname      VARCHAR(20)
    ,salary     DECIMAL(10,2)
    ,hire_date  DATE FORMAT)
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy, and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE, or READ lock can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
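The four descriptions above amount to a small compatibility table, which can be written down as a Python sketch. The names BLOCKED_BY and must_wait are hypothetical. One entry is an assumption not stated in the text: that a requested EXCLUSIVE lock waits behind an ACCESS lock that is already held.

```python
# For each lock already held, the set of requested lock modes that must wait.
BLOCKED_BY = {
    "EXCLUSIVE": {"EXCLUSIVE", "WRITE", "READ", "ACCESS"},  # no access, period
    "WRITE": {"EXCLUSIVE", "WRITE", "READ"},                # only ACCESS gets in
    "READ": {"EXCLUSIVE", "WRITE"},                         # other readers are fine
    "ACCESS": {"EXCLUSIVE"},                                # assumption, see lead-in
}


def must_wait(held, requested):
    """Return True if `requested` has to queue behind the `held` lock."""
    return requested in BLOCKED_BY[held]


dirty_read_allowed = not must_wait("WRITE", "ACCESS")
readers_share = not must_wait("READ", "READ")
```

Here dirty_read_allowed and readers_share both come out True, matching the ACCESS LOCK exception and the thousand simultaneous SELECTs described above.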

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30!" Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
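The employee and payroll example can be tried out with Python's built-in sqlite3 module. This is a sketch that uses SQLite as a stand-in, not Teradata: the table layout and the helper try_sql are assumptions made for illustration, and SQLite only enforces foreign keys after the PRAGMA shown below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces RI only when enabled
conn.execute("CREATE TABLE employee (emp INTEGER PRIMARY KEY, lname TEXT)")
conn.execute(
    "CREATE TABLE payroll ("
    " emp INTEGER REFERENCES employee(emp),"
    " salary INTEGER)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Johnson')")
conn.execute("INSERT INTO payroll VALUES (1, 100000)")


def try_sql(statement):
    """Return True if the statement succeeds, False if RI rejects it."""
    try:
        conn.execute(statement)
        return True
    except sqlite3.IntegrityError:
        return False


# A payroll row for an employee who does not exist is rejected.
orphan_insert = try_sql("INSERT INTO payroll VALUES (99, 50000)")
# The employee cannot be deleted while a payroll row still points at him.
early_delete = try_sql("DELETE FROM employee WHERE emp = 1")
# Once the payroll row is gone, the delete goes through.
conn.execute("DELETE FROM payroll WHERE emp = 1")
late_delete = try_sql("DELETE FROM employee WHERE emp = 1")
```

The first two attempts fail and the last one succeeds, which is the "no Bahamas vacation" behavior described above.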


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
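The three steps just described (hand out blocks, hash and redistribute, sort) can be mimicked with a short Python sketch. Everything here is a simplification under stated assumptions: four AMPs, a block of three rows standing in for a 64K block, zlib.crc32 standing in for Teradata's hashing algorithm, and the hash value standing in for the Row ID. The function name fastload is used only for illustration.

```python
import zlib

NUM_AMPS = 4      # assumed system size
BLOCK_ROWS = 3    # stand-in for a 64K block


def row_hash(primary_index_value):
    # crc32 is only a placeholder for the real hashing algorithm.
    return zlib.crc32(str(primary_index_value).encode())


def fastload(rows):
    # Step 1: deal the blocks out to the AMPs in turn.
    received = [[] for _ in range(NUM_AMPS)]
    blocks = [rows[i:i + BLOCK_ROWS] for i in range(0, len(rows), BLOCK_ROWS)]
    for n, block in enumerate(blocks):
        received[n % NUM_AMPS].extend(block)

    # Step 2: each AMP hashes its rows and sends them to the owning AMP.
    owned = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for row in amp_rows:
            h = row_hash(row[0])  # first column plays the Primary Index
            owned[h % NUM_AMPS].append((h, row))

    # Step 3: each AMP sorts what it owns by hash value.
    for amp_rows in owned:
        amp_rows.sort(key=lambda pair: pair[0])
    return owned


rows = [(i, f"name{i}") for i in range(20)]
owned = fastload(rows)
```

After the call, owned[k] holds the rows whose hash maps to AMP k, sorted and ready for business.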

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance (only one DBA needed)
• Ability to load data at lightning speeds from a mainframe or LAN


Another benefit of views is that their definitions are stored in the Data Dictionary. When you select from a view of a table(s), the data is not stored on the disks, so it does not duplicate data and take up more space. In this scenario, you are looking at a filtered picture of the data.

The Employee Table

Emp   Dept   Lname      Fname    Sal
1     10     Johnson    Manny    100000
2     20     Carlsbad   Jan      100000
22    30     Winter     Steve    77000
25    10     Lester     Bonnie   56000
33    10     Samuels    Todd     120000
99    20     Walter     Misha    104000

The previous table shows the employee table. In nearly every company, employees are curious about the salaries of co-workers. Providing access to the employee table above will actually allow users to see everyone else's salary. To avoid disclosing salary information, a view should be created to limit certain columns and rows.

It's simple to create a view:

CREATE VIEW EMPLOY_V AS
SELECT Emp
      ,Dept
      ,Lname
      ,Fname
FROM EMPLOYEE;

In the SQL statement above, salary is not selected. If users are denied access to the employee table but are given access rights to the EMPLOY_V view, there is enhanced security. With this restriction, no user can actually see the list of employee salaries.

Perm Space is required to create a table, but it is not needed to create a view. The creation and definition of a view are both stored in the Data Dictionary and are monitored by the DBC. However, anyone can create a view, provided that person has the proper privileges.

Once a view has been created, users can select data from the view. An example is:

SELECT * FROM Employ_V;

6 rows returned

Emp   Dept   Lname      Fname
1     10     Johnson    Manny
2     20     Carlsbad   Jan
22    30     Winter     Steve
25    10     Lester     Bonnie
33    10     Samuels    Todd
99    20     Walter     Misha


What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10,
• What employees are in department 20, and
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS
(SELECT * FROM Employ_v WHERE dept = 10;
 SELECT * FROM Employ_v WHERE dept = 20;
 SELECT * FROM Employ_v ORDER BY lname;);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                           Macros
• We select from views                          • We execute macros
• Uses the keyword AS                           • Uses the keyword AS
• Definition is stored in the Data Dictionary   • Definition is stored in the Data Dictionary
• Accesses certain portions of the data         • Accesses the real data itself
• Is changed using the keyword REPLACE          • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan


I taught in one place that was so rough ... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it ... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted from someone else. For example, Tom might grant Mary permission to create a table in his database, even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the U.S. Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has explicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.


In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.
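The idea that everyone above you in the hierarchy holds Implicit rights on you can be sketched as a walk up a parent chain. The hierarchy below is a guess at the kind of chart the text refers to (the picture itself is not reproduced here), and the names PARENT, owners_of, and has_implicit_rights are hypothetical. Explicit rights are modeled as a plain set of grants.

```python
# Assumed ownership chart: child -> parent (the real picture may differ).
PARENT = {
    "SYSDBA": "DBC",
    "MRKT": "SYSDBA",
    "Morgan": "SYSDBA",
    "Mary": "MRKT",
    "Tom": "Morgan",
}


def owners_of(name):
    """Everyone above `name` in the chart, nearest owner first."""
    owners = []
    while name in PARENT:
        name = PARENT[name]
        owners.append(name)
    return owners


def has_implicit_rights(owner, target):
    """Owners (parents) implicitly may GRANT or REVOKE on those below them."""
    return owner in owners_of(target)


# An Explicit right is simply a recorded grant: (grantee, privilege, object).
explicit_grants = {("Mary", "CREATE TABLE", "Tom")}

toms_owners = owners_of("Tom")
```

In this sketch, DBC ends up with Implicit rights on everyone, while Tom has none on Mary and can only give her privileges through an explicit grant such as the one recorded above.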

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract: it is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted back to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message
• The entire transaction is rolled back, and any changes made to the database are reversed
• Locks are released, and
• Spool files are discarded

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal." Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it, and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness." If it is clean, then things went well.
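The before-picture behavior can be illustrated with a small Python class. This is a toy model under stated assumptions: one AMP holds its rows in a dictionary, the journal is a list of (key, before-image) pairs, and a negative salary is used as an arbitrary way to make a transaction fail. The names Amp and run_transaction are hypothetical.

```python
class Amp:
    def __init__(self, rows):
        self.rows = dict(rows)
        self.transient_journal = []  # (key, before_image) pairs

    def update(self, key, new_value):
        # Take the BEFORE picture, then change the row.
        self.transient_journal.append((key, self.rows.get(key)))
        self.rows[key] = new_value

    def commit(self):
        # Success: the journal is wiped clean.
        self.transient_journal.clear()

    def rollback(self):
        # Failure: restore before-images, newest first, then clear the journal.
        for key, before in reversed(self.transient_journal):
            if before is None:
                self.rows.pop(key, None)  # the row was new in this transaction
            else:
                self.rows[key] = before
        self.transient_journal.clear()


def run_transaction(amp, changes):
    """Apply all changes or none of them; return True on COMMIT."""
    try:
        for key, value in changes:
            if value < 0:
                raise ValueError("invalid salary")  # stand-in for any failure
            amp.update(key, value)
    except ValueError:
        amp.rollback()
        return False
    amp.commit()
    return True


amp = Amp({1: 100000, 2: 56000})
failed = run_transaction(amp, [(1, 110000), (2, -5)])
rows_after_failure = dict(amp.rows)
succeeded = run_transaction(amp, [(1, 110000)])
```

The first transaction fails halfway and leaves the rows exactly as they started; the second commits, and in both cases the journal ends up empty.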

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly, and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission-critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one, and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map that knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in a group of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. When that happens in a group of two, every complex query will take twice as long to process.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
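A short Python sketch can make the clustering rule concrete: a FALLBACK copy always lands on a different AMP in the same cluster, so data stays reachable as long as no cluster loses two AMPs. The AMP numbering (0 to 7, four per cluster) and the way the fallback AMP is chosen from the row hash are assumptions for illustration; the real Fallback Hash Map is not described in enough detail here to reproduce.

```python
CLUSTER_SIZE = 4  # the most popular configuration, per the text


def cluster_of(amp):
    """AMPs 0-3 form cluster 0, AMPs 4-7 form cluster 1, and so on."""
    return amp // CLUSTER_SIZE


def fallback_amp(primary_amp, row_hash):
    """Pick another AMP in the same cluster to hold the FALLBACK copy."""
    start = cluster_of(primary_amp) * CLUSTER_SIZE
    offset = 1 + row_hash % (CLUSTER_SIZE - 1)  # 1..CLUSTER_SIZE-1, never 0
    return start + (primary_amp - start + offset) % CLUSTER_SIZE


def data_available(failed_amps):
    """True while no cluster has lost two or more AMPs."""
    lost_per_cluster = {}
    for amp in failed_amps:
        c = cluster_of(amp)
        lost_per_cluster[c] = lost_per_cluster.get(c, 0) + 1
    return all(count < 2 for count in lost_per_cluster.values())


one_per_cluster = data_available({0, 5})   # one AMP down in each cluster
two_in_a_cluster = data_available({0, 1})  # two AMPs down in the same cluster
```

With one AMP down in each cluster the check passes; with two down in the same cluster it fails, which is the risk that grows as clusters get larger.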

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, including all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.


In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
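The life cycle of the DARJ (opened on failure, filled while the AMP is down, replayed and discarded on recovery) can be sketched as follows. The Cluster class and its method names are hypothetical, and changes are represented as plain strings; this models only the bookkeeping, not how the surviving AMPs actually share the journal.

```python
class Cluster:
    def __init__(self, amp_ids):
        self.up = set(amp_ids)
        self.darj = {}  # down AMP id -> list of changes it missed

    def fail(self, amp_id):
        # The remaining AMPs in the cluster open a recovery journal.
        self.up.discard(amp_id)
        self.darj[amp_id] = []

    def write(self, target_amp, change):
        # A change aimed at a down AMP is journaled instead of applied.
        if target_amp in self.darj:
            self.darj[target_amp].append(change)
            return "journaled"
        return "applied"

    def recover(self, amp_id):
        # Catch the AMP up on what it missed, then discard the journal.
        missed = self.darj.pop(amp_id)
        self.up.add(amp_id)
        return missed


top_cluster = Cluster([1, 2, 3, 4])
top_cluster.fail(1)
first = top_cluster.write(1, "update row A")   # AMP 1 is asleep
second = top_cluster.write(2, "update row B")  # AMP 2 is awake
caught_up = top_cluster.recover(1)
```

After recovery, caught_up lists everything AMP 1 missed while it was sleeping, and the journal for that AMP no longer exists.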

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it is still fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The downside of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.
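The mirroring behavior described above (dual writes, reads that survive a disk failure, and a rebuild from the mirror) can be sketched in Python. This is a toy model, not how a Disk Array Controller works; the class and method names are invented for illustration, and only a failure of the primary disk is modeled.

```python
# Toy RAID-1 pair; names are hypothetical and the model is deliberately minimal.
class MirroredPair:
    def __init__(self):
        self.primary = {}        # block number -> data
        self.mirror = {}         # exact copy of the primary
        self.primary_ok = True

    def write(self, block, data):
        """Every write lands on both disks; the caller never sees the second write."""
        if self.primary_ok:
            self.primary[block] = data
        self.mirror[block] = data

    def fail_primary(self):
        self.primary = {}
        self.primary_ok = False

    def read(self, block):
        """Read from the primary when it is healthy, otherwise from the mirror."""
        source = self.primary if self.primary_ok else self.mirror
        return source[block]

    def replace_primary(self):
        """Copy the mirror onto a replacement primary disk."""
        self.primary = dict(self.mirror)
        self.primary_ok = True


pair = MirroredPair()
pair.write(0, "Ben Hon")
pair.fail_primary()
pair.write(1, "Don Roy")          # operations continue on the mirror
value_during_failure = pair.read(0)
pair.replace_primary()
```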

In the next example an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. We can still keep the shared nothing environment and have four physical disks, because only the AMP that owns the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school you can walk into the cafeteria and immediately identify the cliques (pronounced clicks). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced cleeks) in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10 to 16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.
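The migration described above can be sketched in Python for the two-node, 32-AMP example. This is only an illustration: the dictionary layout, the `fail_node` helper, and the round-robin placement across surviving nodes are assumptions, not a description of how Teradata actually relocates virtual processors.

```python
# Hypothetical two-node clique; AMP numbering follows the 32-AMP example above.
clique = {
    "node1": list(range(1, 17)),    # AMPs 1-16
    "node2": list(range(17, 33)),   # AMPs 17-32
}

def fail_node(clique, failed):
    """Move the failed node's AMPs onto the surviving nodes of the clique.

    Each AMP keeps its identity (and therefore its own virtual disk);
    only the node whose memory it runs in changes.
    """
    survivors = [node for node in clique if node != failed]
    if not survivors:
        raise RuntimeError("no surviving node left in the clique")
    for i, amp in enumerate(clique[failed]):
        clique[survivors[i % len(survivors)]].append(amp)
    clique[failed] = []

fail_node(clique, "node1")
```

With node one down, node two now runs all 32 AMPs, which is why the text notes that performance slows while the data stays reachable.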

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
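Both journaling scenarios above can be sketched in Python: a BEFORE journal that undoes a mistaken update, and an AFTER journal that rolls a restored backup forward. The table contents, function names, and tuple-based journal entries are assumptions made for illustration, and deletes are not modeled in the roll-forward step.

```python
# Toy model of BEFORE/AFTER journaling; names and data are hypothetical.
table = {1: "Western", 2: "Eastern"}   # representative id -> region
before_journal = []                    # (key, image before the change)
after_journal = []                     # (key, image after the change)

def update(key, new_value):
    """Change a row and record both its before and after images."""
    before_journal.append((key, table.get(key)))
    table[key] = new_value
    after_journal.append((key, new_value))

def roll_back(table, journal):
    """Undo changes newest-first using the before images."""
    for key, old_value in reversed(journal):
        if old_value is None:
            table.pop(key, None)       # the row did not exist before
        else:
            table[key] = old_value

def roll_forward(backup, journal):
    """Start from a backup copy and reapply the after images in order."""
    restored = dict(backup)
    for key, new_value in journal:
        restored[key] = new_value
    return restored

backup = dict(table)                   # stands in for the full system backup
update(1, "Southeast")
update(2, "Southeast")                 # the mistaken mass update

recovered = roll_forward(backup, after_journal)  # rebuild after a hardware loss
roll_back(table, before_journal)                 # undo the programming error
```

The same set of changes is used for both directions here only to keep the sketch short; in the text the two journals solve two different problems.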

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
(
    emp        INTEGER,
    dept       INTEGER,
    lname      CHAR(20),
    fname      VARCHAR(20),
    salary     DECIMAL(10,2),
    hire_date  DATE FORMAT '...'   -- format string missing in the source
)
UNIQUE PRIMARY INDEX (emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
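The four rules above amount to a compatibility table, which can be sketched in Python. The table is derived directly from the descriptions in the list; the `COMPATIBLE` mapping and the `can_grant` helper are hypothetical names, and the sketch ignores queueing order and lock levels (database, table, row).

```python
# For each requested lock mode, the set of already-held modes it can coexist with.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},  # only EXCLUSIVE blocks it
    "READ":      {"ACCESS", "READ"},           # many readers at once
    "WRITE":     {"ACCESS"},                   # tolerates dirty readers only
    "EXCLUSIVE": set(),                        # tolerates nothing
}

def can_grant(requested, held):
    """Return True if `requested` can be granted alongside every lock in `held`."""
    return all(mode in COMPATIBLE[requested] for mode in held)

readers_share = can_grant("READ", ["READ", "READ"])
writer_waits_for_reader = not can_grant("WRITE", ["READ"])
dirty_read_allowed = can_grant("ACCESS", ["WRITE"])
exclusive_blocks_access = not can_grant("ACCESS", ["EXCLUSIVE"])
```

Keeping the rules in one mapping makes the asymmetry easy to see: the ACCESS row is the widest and the EXCLUSIVE row is empty.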

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30!" Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table unless its referencing column value also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
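The two RI rules (no orphan inserts, no deleting a row that is still referenced) can be sketched in Python using the employee and payroll example above. The data, function names, and use of `ValueError` are assumptions for illustration; in a real database these checks are declared as constraints rather than written by hand.

```python
# Toy referential integrity check; tables and names are hypothetical.
employee = {101, 102}          # employee numbers that exist
payroll = [(101, 5000)]        # payroll rows: (employee number, amount)

def insert_payroll(emp, amount):
    """Reject a payroll row whose employee number has no matching employee."""
    if emp not in employee:
        raise ValueError(f"RI violation: employee {emp} does not exist")
    payroll.append((emp, amount))

def delete_employee(emp):
    """Reject deleting an employee who is still referenced by payroll."""
    if any(row[0] == emp for row in payroll):
        raise ValueError(f"RI violation: employee {emp} is still on the payroll")
    employee.discard(emp)

def is_rejected(action, *args):
    """Helper for the demonstration: True if the action raises an RI error."""
    try:
        action(*args)
    except ValueError:
        return True
    return False

orphan_insert_rejected = is_rejected(insert_payroll, 999, 1000)
referenced_delete_rejected = is_rejected(delete_employee, 101)
unreferenced_delete_rejected = is_rejected(delete_employee, 102)
```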

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that, just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
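The three phases described above (deal out blocks, hash and redistribute, sort) can be sketched in Python. This is a simplified illustration under stated assumptions: CRC32 modulo the AMP count stands in for Teradata's hashing algorithm and hash map, a three-row chunk stands in for a 64K block, and all function names are hypothetical.

```python
import zlib

NUM_AMPS = 4
BLOCK_ROWS = 3   # stand-in for a 64K block of rows

def row_hash(primary_index):
    """Stand-in hash; the real system uses its own hashing algorithm."""
    return zlib.crc32(str(primary_index).encode())

def owner_amp(primary_index):
    """Stand-in for the hash map lookup that names the owning AMP."""
    return row_hash(primary_index) % NUM_AMPS

def fastload(rows):
    # Phase 1: hand out blocks to the AMPs in turn, regardless of content.
    received = [[] for _ in range(NUM_AMPS)]
    blocks = [rows[i:i + BLOCK_ROWS] for i in range(0, len(rows), BLOCK_ROWS)]
    for i, block in enumerate(blocks):
        received[i % NUM_AMPS].extend(block)

    # Phase 2: each AMP hashes its rows and sends each one to its owner.
    final = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for row in amp_rows:
            final[owner_amp(row[0])].append(row)

    # Phase 3: each AMP sorts what it now owns (by hash here, Row ID in Teradata).
    for amp_rows in final:
        amp_rows.sort(key=lambda row: row_hash(row[0]))
    return final

rows = [(emp, f"name{emp}") for emp in range(1, 21)]
amps = fastload(rows)
```

In the real system phases 1 and 2 run in parallel across AMPs; the loops here only show where each row ends up.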

Fastload Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be preset to automatically load at different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables
• It locks at the row level

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

What is a Macro?

"The axe soon forgets, but the tree always remembers."

Anonymous

When you run specific queries often, or if you want to ensure you don't forget an SQL step, you should use a macro. The user sometimes forgets, but the macro always remembers. A macro is a group of one or more SQL statements that are given a name and that are executed with a simple command. If there are multiple commands, Teradata treats them as one single transaction. In other words, either they all work or none of them work. Like views, the definition statement for a macro is stored in the Data Dictionary.

If your manager asks you for three reports, he may want to know:

• What employees are in department 10
• What employees are in department 20
• A list of employee names sorted by last name

A macro can easily be created to run all three commands. The syntax would be:

CREATE MACRO Emp_mac AS (
    SELECT * FROM Employ_v WHERE dept = 10;
    SELECT * FROM Employ_v WHERE dept = 20;
    SELECT * FROM Employ_v ORDER BY lname;
);

Once the macro has been created and stored in the Data Dictionary, it's time for a test run. To run this macro, the user merely executes the SQL:

EXECUTE Emp_mac;

Here is a handy reference chart that compares views with macros:

Views                                           Macros
• We select from views                          • We execute macros
• Uses the keyword AS                           • Uses the keyword AS
• Definition is stored in the Data Dictionary   • Definition is stored in the Data Dictionary
• Accesses certain portions of the data         • Accesses the real data itself
• Is changed using the keyword REPLACE          • Is changed using the keyword REPLACE

Access Rights for Teradata Users

"Never insult seven men when all you're packing is a six gun."

Wild West Slogan

I taught in one place that was so rough ... security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it ... nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys!"

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the US Constitution.

In the picture above, DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has Implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message
• The entire transaction is rolled back and any changes made to the database are reversed
• Locks are released
• Spool files are discarded

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after he or she cut your hair, you were asked if you liked it. If you didn't like it, then you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores it to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal." Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, "Transient Journal cleanliness is next to Godliness." If it is clean, then things went well.
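The all-or-nothing behavior described above can be sketched in Python: take a before-image of every row about to change, restore those images if any statement fails, and simply discard them on success. The function name, the `(key, new_value)` statement format, and the use of a missing-row delete as the failure are assumptions made for illustration only.

```python
def run_transaction(table, statements):
    """Apply every statement or none of them.

    Each statement is (key, new_value); a new_value of None deletes the row.
    Returns True on commit and False after a rollback.
    """
    journal = []                                  # before-images, oldest first
    try:
        for key, new_value in statements:
            journal.append((key, table.get(key), key in table))
            if new_value is None:
                del table[key]                    # fails if the row is missing
            else:
                table[key] = new_value
    except KeyError:
        # Rollback: restore before-images newest-first.
        for key, old_value, existed in reversed(journal):
            if existed:
                table[key] = old_value
            else:
                table.pop(key, None)
        return False
    return True                                   # commit: the journal is dropped


salaries = {"Ben Hon": 100}
failed = run_transaction(
    salaries, [("Ben Hon", 200), ("Don Roy", 300), ("Nobody", None)]
)
after_failure = dict(salaries)
committed = run_transaction(salaries, [("Ben Hon", 110)])
```

The first call changes two rows before hitting the bad delete, yet the table ends up exactly as it started; the second call commits normally.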

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP 1, then we can get Ben Hon from AMP 2 and Don Roy from AMP 4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map that knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
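
The two hash maps can be pictured with a small Python sketch. Everything specific in it is an assumption made for illustration, not Teradata's real implementation: the bucket count, the use of zlib.crc32 as the hashing algorithm, and the rule that the FALLBACK copy goes to the next AMP in the same cluster.

```python
import zlib

AMPS = list(range(1, 9))            # eight AMPs, as in the example above
CLUSTERS = [AMPS[:4], AMPS[4:]]     # cluster 1 = AMPs 1-4, cluster 2 = AMPs 5-8
BUCKETS = 64                        # hypothetical number of hash buckets

# Primary hash map: each bucket holds the AMP number for base rows.
primary_map = [AMPS[b % len(AMPS)] for b in range(BUCKETS)]

def fallback_amp(amp):
    """Assumed rule: the next AMP in the same cluster hosts the copy."""
    cluster = next(c for c in CLUSTERS if amp in c)
    return cluster[(cluster.index(amp) + 1) % len(cluster)]

def place_row(primary_index_value):
    """Hash the Primary Index value, look up its bucket, return both AMPs."""
    bucket = zlib.crc32(str(primary_index_value).encode()) % BUCKETS
    base = primary_map[bucket]
    return base, fallback_amp(base)

for name in ["Ben Hon", "Don Roy"]:
    base, fallback = place_row(name)
    print(f"{name}: base row on AMP {base}, FALLBACK row on AMP {fallback}")
```

Whatever the real rule is, the point of the sketch is the same as the text: the copy never lands on the AMP that holds the base row, and it never leaves the cluster.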

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while one AMP is down.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
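
The trade-off between cluster sizes is simple arithmetic, sketched below in Python. The function names are made up for this illustration, and the model assumes the down AMP's work is shared evenly by the survivors in its cluster.

```python
def load_factor_with_one_amp_down(cluster_size):
    """Relative workload of each surviving AMP when one AMP in its cluster is down."""
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    return cluster_size / (cluster_size - 1)

def fatal_partners(cluster_size):
    """How many other AMPs could stop the cluster by failing next."""
    return cluster_size - 1

for size in (2, 3, 4, 16):
    print(f"cluster of {size:2d}: survivors work at "
          f"{load_factor_with_one_amp_down(size):.2f}x, "
          f"{fatal_partners(size)} AMP(s) whose loss would stop the cluster")
```

A cluster of two doubles the survivor's work, a cluster of 16 barely notices the extra work but has 15 AMPs that could deliver the second blow, and a cluster of four sits between the two.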

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are tracked until it returns.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
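
The life cycle of a DARJ (open on failure, record changes, replay on recovery, discard) can be sketched in a few lines of Python. The Cluster class and its method names are invented for this illustration, and a single shared journal stands in for the journals kept by the surviving AMPs.

```python
class Cluster:
    """Toy model of one cluster and its Down AMP Recovery Journal."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}  # rows stored on each AMP
        self.down_amp = None
        self.darj = []                            # journal kept by the survivors

    def fail(self, amp):
        self.down_amp = amp
        self.darj = []                            # survivors open a fresh journal

    def write(self, amp, key, value):
        if amp == self.down_amp:
            self.darj.append((key, value))        # remember it for later
        else:
            self.rows[amp][key] = value

    def recover(self):
        for key, value in self.darj:              # replay in original order
            self.rows[self.down_amp][key] = value
        self.down_amp = None
        self.darj = []                            # the journal is discarded

cluster = Cluster([1, 2, 3, 4])
cluster.fail(1)
cluster.write(1, "Ben Hon", "new phone number")   # AMP 1 is asleep
cluster.recover()                                 # AMP 1 is caught up
print(cluster.rows[1])
```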

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
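
The rule about losing two disks in a rank can be checked with a short Python sketch. The pairing used here (disk 1 mirrored by disk 2, disk 3 mirrored by disk 4) is an assumption for the example; the text only says that the first disk's mirror is the second disk.

```python
# Assumed layout of one rank: two data disks, each with its own mirror.
MIRROR_PAIRS = [(1, 2), (3, 4)]

def rank_has_all_data(failed_disks):
    """True if every mirrored pair still has at least one working disk."""
    failed = set(failed_disks)
    return all(not (a in failed and b in failed) for a, b in MIRROR_PAIRS)

print(rank_has_all_data([1]))      # one disk lost
print(rank_has_all_data([1, 3]))   # two disks lost, from different pairs
print(rank_has_all_data([1, 2]))   # a data disk and its own mirror lost
```

Only the last case loses data, which is exactly the exception the paragraph above describes.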

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disks via a different physical cable.
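
Here is a rough Python sketch of that migration for the two-node, 32-AMP example. The function name, the node names, and the vdisk labels are all invented for the illustration; the one thing it is meant to show is that the AMPs change nodes while the AMP-to-disk ownership stays put.

```python
def fail_node(nodes, failed):
    """Return the AMP-to-node layout after `failed` goes down."""
    survivors = [name for name in nodes if name != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    layout = {name: list(nodes[name]) for name in survivors}
    # Spread the homeless AMPs across the surviving nodes of the clique.
    for i, amp in enumerate(nodes[failed]):
        layout[survivors[i % len(survivors)]].append(amp)
    return layout

nodes = {"node1": list(range(1, 17)), "node2": list(range(17, 33))}
disk_of = {amp: f"vdisk{amp}" for amps in nodes.values() for amp in amps}

after = fail_node(nodes, "node1")
print(len(after["node2"]), "AMPs now run on node2")
print("AMP 16 still owns", disk_of[16])
```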

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica – a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
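
Both stories can be acted out in a small Python sketch. The JournaledTable class, the roll_forward helper, and the representatives Ann and Bob are made up for the illustration, and only updates are modeled (no inserts or deletes).

```python
class JournaledTable:
    """Toy table that keeps BEFORE and AFTER images of every update."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.before = []   # (key, value before the change)
        self.after = []    # (key, value after the change)

    def update(self, key, value):
        self.before.append((key, self.rows[key]))
        self.rows[key] = value
        self.after.append((key, value))

    def roll_back(self):
        """Undo the changes, newest first, using the BEFORE images."""
        while self.before:
            key, old = self.before.pop()
            self.rows[key] = old
        self.after.clear()

def roll_forward(backup_rows, after_images):
    """Rebuild current data from a backup plus the AFTER images."""
    rows = dict(backup_rows)
    for key, value in after_images:
        rows[key] = value
    return rows

reps = JournaledTable({"Ann": "Western", "Bob": "Eastern"})
backup = dict(reps.rows)              # stands in for the full system backup

reps.update("Ann", "Southeast")       # the accidental mass transfer
reps.update("Bob", "Southeast")

rebuilt = roll_forward(backup, reps.after)   # what an AFTER journal recovers
reps.roll_back()                             # what a BEFORE journal undoes
print(rebuilt)
print(reps.rows)
```

Rolling forward reproduces the table as it stood after the changes, while rolling back returns it to the way it looked before them.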

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
    (emp        INTEGER,
     dept       INTEGER,
     lname      CHAR(20),
     fname      VARCHAR(20),
     salary     DECIMAL(10,2),
     hire_date  DATE  /* FORMAT '...' : the format string is missing from this copy */
    )
UNIQUE PRIMARY INDEX (emp);

The example above creates the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO – meaning if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK – one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
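
The four descriptions above boil down to a compatibility table, sketched here in Python. The COMPATIBLE dictionary and the can_grant function are names made up for this illustration; the entries are read straight from the list, and the sketch ignores queue order and which object (database, table, or row) is locked.

```python
# For each lock already held, the set of lock modes that may be granted alongside it.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}

def can_grant(requested, held_locks):
    """True if `requested` can be granted while `held_locks` are in place."""
    return all(requested in COMPATIBLE[held] for held in held_locks)

print(can_grant("READ", ["READ", "READ"]))    # many readers at once
print(can_grant("WRITE", ["READ"]))           # a writer must wait for readers
print(can_grant("ACCESS", ["WRITE"]))         # the dirty read is allowed
print(can_grant("ACCESS", ["EXCLUSIVE"]))     # nothing gets past EXCLUSIVE
```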

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood." "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job and the company deletes you from its employee table, but forgets to delete you from the payroll table. That's not like getting fired … it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is added to an existing table that already has RI violations, the ALTER TABLE will still proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
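
The employee and payroll story can be played out in Python. This is only a sketch of the two RI rules using plain sets; the function names and employee numbers are invented, and a real database enforces these rules itself rather than through application code.

```python
employee = {101, 102}   # parent table: employee numbers
payroll = {101}         # child table: each row refers to an employee number

def insert_payroll(emp):
    """Rule 1: a payroll row needs a matching employee row."""
    if emp not in employee:
        return False
    payroll.add(emp)
    return True

def delete_employee(emp):
    """Rule 2: an employee still on the payroll cannot be deleted."""
    if emp in payroll:
        return False
    employee.discard(emp)
    return True

print(insert_payroll(999))    # refused: there is no employee 999
print(delete_employee(101))   # refused: 101 is still on the payroll
payroll.discard(101)          # remove the payroll row first ...
print(delete_employee(101))   # ... and now the delete goes through
```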

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that – just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
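
The three steps just described (hand out blocks, hash the rows to their proper AMP, sort) can be imitated in Python. This is a toy model under stated assumptions: blocks hold three rows instead of 64K of data, zlib.crc32 stands in for the hashing algorithm, the first column is treated as the Primary Index, and the hash value stands in for the Row ID.

```python
import zlib

def fastload(rows, n_amps, block_size=3):
    """Simulate the load; returns one sorted list of (row_hash, row) per AMP."""
    # Step 1: deal the blocks out to the AMPs like sand bags down a line.
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    received = [[] for _ in range(n_amps)]
    for i, block in enumerate(blocks):
        received[i % n_amps].extend(block)

    # Step 2: each AMP hashes the rows it received and sends each one
    # to the AMP that should own it.
    final = [[] for _ in range(n_amps)]
    for amp_rows in received:
        for row in amp_rows:
            row_hash = zlib.crc32(str(row[0]).encode())
            final[row_hash % n_amps].append((row_hash, row))

    # Step 3: every AMP sorts what it now owns.
    for amp_rows in final:
        amp_rows.sort()
    return final

rows = [(i, f"name{i}") for i in range(1, 11)]
loaded = fastload(rows, n_amps=4)
for amp, amp_rows in enumerate(loaded, start=1):
    print(f"AMP {amp}: {[row for _, row in amp_rows]}")
```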

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day and can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.
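
The faucet idea can be sketched as a preset schedule in Python. The hours and rates below are invented numbers, not Tpump settings, and the estimate assumes the rate stays the same for the whole load.

```python
# Hypothetical preset: (first hour, hour it ends, rows per minute).
SCHEDULE = [
    (0, 6, 50000),    # off-peak: full throttle
    (6, 20, 500),     # data warehouse rush hour: a trickle
    (20, 24, 10000),  # evening: somewhere in between
]

def rows_per_minute(hour):
    """Look up the load rate preset for a given hour of the day (0-23)."""
    for start, end, rate in SCHEDULE:
        if start <= hour < end:
            return rate
    raise ValueError("hour must be between 0 and 23")

def minutes_to_load(pending_rows, hour):
    """Whole minutes needed to load the pending rows at that hour's rate."""
    rate = rows_per_minute(hour)
    return -(-pending_rows // rate)   # ceiling division

print(minutes_to_load(1200, hour=12))    # rush hour
print(minutes_to_load(1200, hour=3))     # off-peak
```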

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance – only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

I taught in one place that was so rough … security actually checked me for weapons. When they found out that I had no weapons, they gave me some.

Actually, on a recent consulting trip, I was signed in each morning by a friendly security guard. This customer site had tons of highly sensitive data. As long as I stayed in my assigned work area, the guard and I got along just fine. However, as soon as I needed to move to a different room, someone had to accompany me and give me access. In Teradata, the Parsing Engine is the vigilant guard who never lets someone get close to data if he or she doesn't have the right permissions.

Every time an SQL request comes to the PE, it checks the SQL syntax for validity first. Its next step, every single time, is to see if the user has permission to perform a given operation on a specified Teradata object.

Automatic, Implicit, and Explicit Rights

Teradata uses three types of privileges, and records of these rights are stored in the DBC. Owners, or Parents, have Implicit Rights. These rights allow the owners (parents) to grant and revoke privileges on any users listed below them in the hierarchy. In real life, parents have these privileges too. Think about it … nearly every teenager has heard the statement, "I'm revoking your privilege to drive the family car until those grades come up. Hand over the keys."

Explicit Rights are any privileges granted by someone else. For example, Tom might grant Mary permission to create a table in his database even though Mary works in the marketing (MRKT) department.

Automatic Rights are system-assigned privileges. When a new user or database is created, it receives 16 different access rights. The creator of the new object gets 20 rights. Similarly, when a baby is born in the United States, he or she is granted some basic rights by the U.S. Constitution.

In the picture above, the DBC has Implicit rights on all databases and users. Plus, SYSDBA has Implicit rights on every person listed below him. MRKT has Implicit rights over Mary, and Morgan has the same rights over Tom. Implicit rights simply means it is implied that those people listed above you (in a hierarchy chart) can GRANT or REVOKE privileges on you.

For example, if Tom or Morgan decides to give certain privileges to Mary, either person could EXPLICITLY give her those permissions.

In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message;
• The entire transaction is rolled back, and any changes made to the database are reversed;
• Locks are released; and
• Spool files are discarded.

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal." Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it, and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal "Cleanliness is next to Godliness." If it is clean, then things went well.
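The before-picture idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class `ToyAmp` and its methods are hypothetical names invented for illustration, and nothing here reflects how Teradata actually stores or applies its journal.

```python
# Toy model of a before-image journal (hypothetical names, not Teradata internals).
class ToyAmp:
    def __init__(self, rows):
        self.rows = dict(rows)        # row_id -> current value
        self.transient_journal = {}   # row_id -> BEFORE image (None = row did not exist)

    def write(self, row_id, value):
        # Take the BEFORE picture once per row, before the first change.
        if row_id not in self.transient_journal:
            self.transient_journal[row_id] = self.rows.get(row_id)
        if value is None:
            self.rows.pop(row_id, None)   # DELETE
        else:
            self.rows[row_id] = value     # INSERT or UPDATE

    def commit(self):
        # Success: the journal is wiped clean.
        self.transient_journal.clear()

    def rollback(self):
        # Failure: put every touched row back the way it was.
        for row_id, before in self.transient_journal.items():
            if before is None:
                self.rows.pop(row_id, None)
            else:
                self.rows[row_id] = before
        self.transient_journal.clear()


amp = ToyAmp({1: "salary=50000"})
amp.write(1, "salary=100000")   # the accidental raise
amp.write(2, "salary=40000")    # a new row in the same transaction
amp.rollback()                  # the transaction fails
```

After the rollback, row 1 holds its original value, row 2 is gone, and the journal is empty, which is the "clean journal" state described above.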

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly, and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.

In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any one AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP or disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs. The chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.

Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one, and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. The output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
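To make the two hash maps concrete, here is a minimal Python sketch. It is an illustration under loud assumptions: Teradata's real hashing algorithm and hash maps are not described in this text, so the sketch substitutes CRC32 for the row hash, a round-robin bucket table for the Primary Hash Map, and an invented "pick another AMP in the same cluster" rule for the Fallback Hash Map. AMPs are numbered 0 to 7 here rather than 1 to 8.

```python
import zlib

NUM_AMPS = 8        # the eight-AMP system from the picture
CLUSTER_SIZE = 4    # two clusters of four AMPs
NUM_BUCKETS = 64    # hypothetical bucket count, kept small for readability

# Stand-in for the Primary Hash Map: bucket number -> AMP number.
primary_map = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]


def row_hash(primary_index_value):
    # CRC32 is only a stand-in for the real hashing algorithm.
    return zlib.crc32(str(primary_index_value).encode())


def primary_amp(primary_index_value):
    # The hash output points to a bucket; the bucket holds the AMP number.
    return primary_map[row_hash(primary_index_value) % NUM_BUCKETS]


def fallback_amp(primary_index_value):
    # Stand-in for the Fallback Hash Map: choose a different AMP
    # that lives in the same cluster as the primary AMP.
    amp = primary_amp(primary_index_value)
    cluster_start = (amp // CLUSTER_SIZE) * CLUSTER_SIZE
    offset = 1 + (row_hash(primary_index_value) // NUM_BUCKETS) % (CLUSTER_SIZE - 1)
    return cluster_start + (amp - cluster_start + offset) % CLUSTER_SIZE


for name in ("Ben Hon", "Don Roy"):
    print(name, "primary AMP:", primary_amp(name), "fallback AMP:", fallback_amp(name))
```

Whatever values are hashed, the sketch keeps the two properties the text relies on: the FALLBACK copy never lands on the same AMP as the base row, and it never leaves the base row's cluster.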

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while an AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
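The trade-off above is simple arithmetic, sketched here in Python. The function name is hypothetical, and the calculation assumes the failed AMP's work is shared evenly among the surviving AMPs in its cluster, which is a simplification.

```python
def extra_work_per_survivor(cluster_size):
    """Fraction of additional work each surviving AMP picks up
    when exactly one AMP in the cluster is down (even sharing assumed)."""
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    return 1 / (cluster_size - 1)


for size in (2, 4, 16):
    print(f"cluster of {size:2d}: each survivor does "
          f"{extra_work_per_survivor(size):.0%} more work")
```

Under that assumption, the survivor in a two-AMP cluster doubles its load, each survivor in a four-AMP cluster takes on about a third more, and each survivor in a 16-AMP cluster takes on about a fifteenth more.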

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL – Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are applied to their FALLBACK copies.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
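The catch-up behavior of the DARJ can be modeled with a short Python sketch. The class `ToyCluster` and its methods are hypothetical, and the model is deliberately simplified: it keeps one shared list of missed changes for the cluster rather than modeling how each surviving AMP actually records them.

```python
class ToyCluster:
    """Toy model of one cluster; not Teradata's real recovery logic."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}  # rows stored on each AMP
        self.down = set()                         # AMPs currently offline
        self.darj = []                            # (amp, row_id, value) missed while down

    def write(self, amp, row_id, value):
        if amp in self.down:
            # The AMP is sleeping: remember the change for later.
            self.darj.append((amp, row_id, value))
        else:
            self.rows[amp][row_id] = value

    def fail(self, amp):
        self.down.add(amp)

    def recover(self, amp):
        # Replay, in order, everything the AMP missed, then drop those entries.
        self.down.discard(amp)
        still_pending = []
        for entry in self.darj:
            target, row_id, value = entry
            if target == amp:
                self.rows[amp][row_id] = value
            else:
                still_pending.append(entry)
        self.darj = still_pending


cluster = ToyCluster([1, 2, 3, 4])
cluster.write(1, "Ben Hon", "v1")
cluster.fail(1)
cluster.write(1, "Ben Hon", "v2")   # missed while AMP 1 is down
cluster.write(1, "Don Roy", "v1")   # also missed
cluster.recover(1)
```

Once `recover(1)` runs, AMP 1 holds the latest version of both rows and the journal is empty, mirroring the "catch up, then discard" sequence described above.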

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. We can still keep the shared nothing environment and have four physical disks, because only the AMP that owns the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE like birds flying south in winter. In a two-node clique, the receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel Processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, if node one is lost, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disks via a different physical cable.
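The migration described above can be sketched as a small Python function. The function name `migrate_vprocs` and the round-robin placement rule are assumptions made for illustration; the text does not say how migrating VPROCs are divided when a clique has more than two nodes.

```python
def migrate_vprocs(clique, failed_node):
    """Return a new node -> VPROC-list mapping after one node in the clique fails.

    Hypothetical rule: migrating VPROCs are dealt round-robin to the survivors.
    Each VPROC keeps its own name, standing in for its own virtual disk.
    """
    survivors = [node for node in clique if node != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node left in the clique")
    result = {node: list(clique[node]) for node in survivors}
    for i, vproc in enumerate(clique[failed_node]):
        result[survivors[i % len(survivors)]].append(vproc)
    return result


# The two-node, 32-AMP system from the picture.
clique = {
    "node1": [f"AMP{i}" for i in range(1, 17)],
    "node2": [f"AMP{i}" for i in range(17, 33)],
}
after_failure = migrate_vprocs(clique, "node1")
print(len(after_failure["node2"]), "AMPs now run on node2")
```

In the two-node case the surviving node ends up hosting all 32 AMPs, which is why its performance slows down until the failed node returns.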

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by a restoration that takes a couple of hours to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, usually to restore data lost as a result of a hardware problem.
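Both scenarios, rolling back with BEFORE images and rolling forward with AFTER images, can be sketched in Python. This is a toy model: the function names, the list-of-tuples journal layout, and the use of `None` to mean "row absent" are all assumptions for illustration and say nothing about how Teradata stores or applies Permanent Journal images.

```python
def roll_forward(backup, after_journal):
    """Rebuild a table from a backup plus AFTER images, oldest first.
    An image of None is a hypothetical marker for a deleted row."""
    table = dict(backup)
    for row_id, after_image in after_journal:
        if after_image is None:
            table.pop(row_id, None)
        else:
            table[row_id] = after_image
    return table


def roll_back(current, before_journal):
    """Undo changes by applying BEFORE images, newest first.
    An image of None means the row did not exist before the change."""
    table = dict(current)
    for row_id, before_image in reversed(before_journal):
        if before_image is None:
            table.pop(row_id, None)
        else:
            table[row_id] = before_image
    return table


# AFTER JOURNAL: restore the backup from the 1st, then reapply later changes.
backup_on_the_1st = {1: "West", 2: "East"}
after_journal = [(2, "South"), (3, "North")]
restored = roll_forward(backup_on_the_1st, after_journal)

# BEFORE JOURNAL: undo the update that moved everyone to the Southeast.
bad_table = {1: "Southeast", 2: "Southeast"}
before_journal = [(1, "West"), (2, "East")]
repaired = roll_back(bad_table, before_journal)
```

Here `restored` contains the backed-up rows plus every change captured since the backup, and `repaired` matches the table as it stood before the inaccurate update.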

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

    CREATE TABLE TomC.employee, FALLBACK,
        BEFORE JOURNAL,
        DUAL AFTER JOURNAL
       (emp        INTEGER,
        dept       INTEGER,
        lname      CHAR(20),
        fname      VARCHAR(20),
        salary     DECIMAL(10,2),
        hire_date  DATE)   -- the original listing adds a FORMAT phrase; its format string is missing here
    UNIQUE PRIMARY INDEX (emp);

The example above creates the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy, and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
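The four locking modes above amount to a compatibility table, which can be written down directly in Python. The table below is derived only from the rules stated in this list; the names `COMPATIBLE` and `can_grant` are hypothetical, and queueing order is not modeled.

```python
# For each lock already HELD, the set of lock modes that may still be granted.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},  # only EXCLUSIVE must wait
    "READ":      {"ACCESS", "READ"},           # readers share; writers wait
    "WRITE":     {"ACCESS"},                   # only dirty reads get through
    "EXCLUSIVE": set(),                        # no access, period
}


def can_grant(requested, held_locks):
    """True if `requested` is compatible with every lock currently held."""
    return all(requested in COMPATIBLE[held] for held in held_locks)


print(can_grant("READ", ["READ", "READ"]))    # many readers at once
print(can_grant("WRITE", ["READ"]))           # a writer waits behind a reader
print(can_grant("ACCESS", ["WRITE"]))         # a dirty read slips past a writer
print(can_grant("ACCESS", ["EXCLUSIVE"]))     # nothing gets past EXCLUSIVE
```

Keeping the rules in one dictionary makes it easy to check any pairing: look up the held lock and see whether the requested mode is in its set.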

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood." "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired … it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was first deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row whose value is referenced by rows in another table may not be deleted unless the common value is first removed from that other table.
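The two RI rules can be sketched with the employee and payroll tables from the story. This Python toy is an assumption-laden illustration: the class `ToyRI` and its methods are hypothetical names, and a real database enforces these checks through declared constraints rather than application code.

```python
class ToyRI:
    """Toy parent/child check: payroll rows must reference an existing employee."""

    def __init__(self):
        self.employee = set()   # parent table: employee numbers
        self.payroll = {}       # child table: payroll row id -> employee number

    def insert_employee(self, emp):
        self.employee.add(emp)

    def insert_payroll(self, row_id, emp):
        # Rule 1: no insert unless the value exists in the other table.
        if emp not in self.employee:
            raise ValueError(f"employee {emp} does not exist")
        self.payroll[row_id] = emp

    def delete_payroll(self, row_id):
        self.payroll.pop(row_id, None)

    def delete_employee(self, emp):
        # Rule 2: no delete while another table still references the value.
        if emp in self.payroll.values():
            raise ValueError(f"employee {emp} is still on the payroll")
        self.employee.discard(emp)


db = ToyRI()
db.insert_employee(101)
db.insert_payroll("p1", 101)
try:
    db.delete_employee(101)       # blocked: still referenced by payroll
except ValueError as err:
    print("rejected:", err)
db.delete_payroll("p1")
db.delete_employee(101)           # allowed once the payroll row is gone
```

The order matters: the payroll row has to go before the employee row can, which is exactly the oversight RI prevents in the firing example.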

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on: ASCII stupid question and get a stupid ANSI.

Seriously, though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
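The load sequence just described (deal out blocks, hash and redistribute, then sort) can be sketched as a single-process Python toy. Everything here is an assumption for illustration: `toy_fastload` is a hypothetical name, CRC32 stands in for the real hashing algorithm, the hash value stands in for the Row ID, blocks are counted in rows rather than bytes, and the parallel steps run one after another.

```python
import zlib


def row_hash(primary_index_value):
    # CRC32 is only a stand-in for the real hashing algorithm.
    return zlib.crc32(str(primary_index_value).encode())


def toy_fastload(rows, num_amps, block_size):
    """rows: list of tuples whose first element is the primary index value."""
    # Step 1: hand out blocks of rows to the AMPs, one block at a time.
    received = [[] for _ in range(num_amps)]
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    for n, block in enumerate(blocks):
        received[n % num_amps].extend(block)

    # Step 2: each AMP hashes the rows it received and sends each row
    # to the AMP that owns it.
    owned = [[] for _ in range(num_amps)]
    for amp_rows in received:
        for row in amp_rows:
            h = row_hash(row[0])
            owned[h % num_amps].append((h, row))

    # Step 3: each AMP sorts its rows by hash value.
    for amp_rows in owned:
        amp_rows.sort(key=lambda pair: pair[0])
    return owned


flat_file = [(i, f"name{i}") for i in range(100)]
loaded = toy_fastload(flat_file, num_amps=4, block_size=10)
print([len(amp_rows) for amp_rows in loaded])
```

The point of the two-step hand-off is that the initial blocks can be accepted by any AMP quickly, and the expensive work of deciding where each row belongs is then spread across all the AMPs.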

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to load immediately into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse "rush hour." It can also be automatically preset to load different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables
• It locks at the row level

ConclusionmdashA Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


In comparison, Automatic Rights means that when Morgan created Tom, he automatically received 20 access rights (on Tom), plus Tom was given 16 access rights on himself.

Data Protection

Overview

As a man was driving down the interstate highway, his cell phone rang. When he answered, he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26." George replied, "I'm on I-26 right now, and it's not just one car. It's hundreds of them!"

How do you protect your data when things go the wrong way? Murphy's law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.

"Please sleep on it tonight, and if you wake up in the morning, let me know what you think."

Morgan's Life Insurance Agent

A database not prepared to defend itself is like an unsigned contract: it is not worth the paper it is written on. However, Teradata is always prepared, and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.

System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or what if a million-dollar transaction fails right at the wrong time? Or an AMP or disk goes down? In any of these cases, Teradata has many ways to protect your data. Some processes for protection are automatic, and some of them are optional.

The protection features we will discuss are:

• Transaction Concept
• Transient Journal
• FALLBACK
• RAID
• Clustering
• Cliques
• Permanent Journaling

Transaction Concept & Transient Journal

"The afternoon knows what the morning never suspected."

Swedish Proverb

At any time, something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected"; likewise, the Transient Journal knows what the transaction never suspected.

What good would it do if you could gather, store, and analyze terabytes of data but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (as in a MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message
• The entire transaction is rolled back and any changes made to the database are reversed
• Locks are released, and
• Spool files are discarded

Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she asked if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a Transient Journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data then reverts to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, Transient Journal cleanliness is next to Godliness: if it is clean, then things went well.
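The before-image idea can be sketched in a few lines of Python. This is only a toy model of one AMP, not Teradata's implementation: the class name, the in-memory dictionary standing in for a table, and the row "Sue Lee" are all assumptions made for illustration.

```python
class TransientJournal:
    """Toy before-image journal for one AMP (illustrative only)."""

    _MISSING = object()  # marks "the row did not exist before the change"

    def __init__(self, table):
        self.table = table        # dict: row key -> row value
        self.before_images = []   # (key, old value) pairs in change order

    def write(self, key, value):
        # Take the BEFORE picture first, then change the row.
        self.before_images.append((key, self.table.get(key, self._MISSING)))
        if value is None:
            self.table.pop(key, None)   # None models a DELETE
        else:
            self.table[key] = value     # INSERT or UPDATE

    def commit(self):
        # Success: the journal is wiped clean.
        self.before_images.clear()

    def rollback(self):
        # Failure: undo the changes in reverse order using the before-images.
        for key, old in reversed(self.before_images):
            if old is self._MISSING:
                self.table.pop(key, None)
            else:
                self.table[key] = old
        self.before_images.clear()


salaries = {"Ben Hon": 50000, "Don Roy": 60000}
journal = TransientJournal(salaries)
journal.write("Ben Hon", 100000)   # UPDATE
journal.write("Don Roy", None)     # DELETE
journal.write("Sue Lee", 40000)    # INSERT (hypothetical row)
journal.rollback()                 # the transaction fails
print(salaries)
```

After the rollback the table holds exactly the two original rows, and the journal is empty again, just as it would be after a COMMIT.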

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind that FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.
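A minimal Python sketch of this guarantee follows. The hash function and the "next AMP over" fallback rule are assumptions chosen to keep the example short; Teradata's real hash maps differ. All row names other than Ben Hon and Don Roy are made up.

```python
NUM_AMPS = 4

def primary_amp(key):
    # Stand-in for the row hash: sum of character codes modulo the AMP count.
    return sum(ord(c) for c in key) % NUM_AMPS

def fallback_amp(key):
    # Assumed placement rule: the next AMP over, so never the primary AMP.
    return (primary_amp(key) + 1) % NUM_AMPS

def load(rows):
    base = {amp: set() for amp in range(NUM_AMPS)}
    fallback = {amp: set() for amp in range(NUM_AMPS)}
    for key in rows:
        base[primary_amp(key)].add(key)
        fallback[fallback_amp(key)].add(key)
    return base, fallback

def readable_rows(base, fallback, down_amp):
    # Rows still reachable with one AMP down: base rows plus fallback copies elsewhere.
    found = set()
    for amp in range(NUM_AMPS):
        if amp != down_amp:
            found |= base[amp] | fallback[amp]
    return found

friends = ["Ben Hon", "Don Roy", "Amy Lin", "Cal Poe",
           "Eve Ray", "Gus Orr", "Ida Fox", "Jon Tye"]
base, fallback = load(friends)

# Losing any single AMP still leaves every row reachable.
survives_any_failure = all(
    readable_rows(base, fallback, down) == set(friends) for down in range(NUM_AMPS)
)
stored = sum(len(s) for s in base.values()) + sum(len(s) for s in fallback.values())
print(survives_any_failure, stored)
```

The `stored` count comes to two copies per row, which is the doubled disk space described above.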

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission-critical tables. It actually allows the user to step into the same table twice if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2000 AMPs, and the chance of losing two AMPs in a 2000-AMP system is much greater. That's why Teradata designed Clustering. Let's look at this next example with a slightly larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. The output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two and one AMP down, every complex query will take twice as long to process.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the same cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
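The trade-off can be put into numbers with a short Python sketch. It rests on two simplifying assumptions that are not from the book: the down AMP's work is spread evenly over the survivors in its cluster, and a double failure hits a uniformly random pair of AMPs. The 48-AMP system size is hypothetical.

```python
from fractions import Fraction

def load_factor(cluster_size):
    # One AMP down: its work is spread evenly over the cluster_size - 1 survivors,
    # so each survivor carries cluster_size / (cluster_size - 1) of its normal load.
    return Fraction(cluster_size, cluster_size - 1)

def fatal_pair_share(total_amps, cluster_size):
    # Share of all possible AMP pairs that sit in the same cluster, i.e. pairs
    # whose simultaneous loss would make data unavailable.
    # Assumes total_amps is a multiple of cluster_size.
    return Fraction(cluster_size - 1, total_amps - 1)

TOTAL_AMPS = 48  # hypothetical system size, divisible by every cluster size below
for size in (2, 3, 4, 16):
    print(size, float(load_factor(size)), float(fatal_pair_share(TOTAL_AMPS, size)))
```

Under these assumptions a cluster of two doubles the survivor's load, a cluster of four raises it by one third, and a cluster of 16 raises it only slightly, while the share of fatal double failures grows with the cluster size.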

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)

Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, including all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in the coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.
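The catch-up behavior can be modeled with a small Python sketch. The `Cluster` class, its method names, and the idea of passing the base and fallback AMP numbers explicitly are all simplifications invented for this illustration.

```python
class Cluster:
    """Toy cluster: each row has a base AMP and a FALLBACK AMP (illustrative)."""

    def __init__(self, amp_ids):
        self.base = {amp: {} for amp in amp_ids}      # base rows per AMP
        self.fallback = {amp: {} for amp in amp_ids}  # fallback copies per AMP
        self.down = None                              # id of the failed AMP, if any
        self.darj = []                                # changes the down AMP missed

    def write(self, key, value, base_amp, fallback_amp):
        for store, amp in ((self.base, base_amp), (self.fallback, fallback_amp)):
            if amp == self.down:
                # The AMP is down: log the change instead of applying it.
                self.darj.append((store is self.base, key, value))
            else:
                store[amp][key] = value

    def fail(self, amp):
        self.down = amp

    def recover(self):
        # Catch the AMP up on everything it missed, then discard the journal.
        for is_base, key, value in self.darj:
            (self.base if is_base else self.fallback)[self.down][key] = value
        self.darj.clear()
        self.down = None


cluster = Cluster([1, 2, 3, 4])
cluster.write("Ben Hon", "v1", base_amp=1, fallback_amp=2)
cluster.fail(1)
cluster.write("Ben Hon", "v2", base_amp=1, fallback_amp=2)
stale = cluster.base[1]["Ben Hon"]   # the sleeping AMP still holds the old value
cluster.recover()
print(stale, cluster.base[1]["Ben Hon"])
```

While AMP 1 is down, readers are served from the fallback copy on AMP 2, which already holds the new value; the journal only exists to bring AMP 1 back in step.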

In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. We can still keep the shared nothing environment and have four physical disks, because only the AMP that owns the virtual disk can access its four disks.

In the picture above, one AMP has one virtual disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). They are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, and they access their own virtual disks via a different physical cable.
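The migration can be sketched in Python as a simple bookkeeping exercise. The `migrate` function, the node and disk names, and the round-robin placement are assumptions for illustration only; the point is that the vproc moves while its virtual disk assignment does not.

```python
def migrate(nodes, failed, clique):
    """Move the failed node's vprocs to the surviving nodes of its clique."""
    survivors = [n for n in clique if n != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    for i, vproc in enumerate(nodes[failed]):
        # Round-robin keeps the extra load as even as possible.
        nodes[survivors[i % len(survivors)]].append(vproc)
    nodes[failed] = []
    return nodes


nodes = {
    "node1": [f"AMP{i}" for i in range(1, 17)],    # AMPs 1-16
    "node2": [f"AMP{i}" for i in range(17, 33)],   # AMPs 17-32
}
# Disk ownership never changes: each AMP keeps its own virtual disk.
vdisk_owner = {amp: f"vdisk_{amp}" for amps in nodes.values() for amp in amps}

migrate(nodes, failed="node1", clique=["node1", "node2"])
print(len(nodes["node2"]), vdisk_owner["AMP16"])
```

After node one fails, node two hosts all 32 AMPs, which is why its performance slows down, yet AMP 16 still maps to the same virtual disk as before.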

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by a restoration that takes a couple of hours to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures, and it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
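The roll-forward procedure can be sketched in Python. This is a simplified model, not a Teradata utility: the function name, the journal layout of (day, key, after-image) entries, the use of `None` for a deleted row, and the sample rows are all assumptions.

```python
def roll_forward(backup, after_journal, backup_day, failure_day):
    """Rebuild a table from a full backup plus AFTER images (illustrative)."""
    table = dict(backup)  # start from the full system backup
    # Replay the journal in day order; sorted() is stable, so same-day order is kept.
    for day, key, after_image in sorted(after_journal, key=lambda entry: entry[0]):
        if backup_day <= day <= failure_day:
            if after_image is None:
                table.pop(key, None)       # assumed marker for a deleted row
            else:
                table[key] = after_image   # INSERT or UPDATE
    return table


# Full backup taken on day 1: representative id -> region.
backup = {101: "Western", 102: "Eastern"}
# AFTER images captured between the backup and the failure on day 5.
journal = [
    (2, 103, "Southwest"),   # new row
    (3, 101, "Southwest"),   # changed row
    (4, 102, None),          # deleted row
]
restored = roll_forward(backup, journal, backup_day=1, failure_day=5)
print(restored)
```

The backup alone would bring back the state as of the 1st; replaying the journal on top of it recovers the changes made up to the failure.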

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
(
    emp        INTEGER,
    dept       INTEGER,
    lname      CHAR(20),
    fname      VARCHAR(20),
    salary     DECIMAL(10,2),
    hire_date  DATE FORMAT   /* format string missing in the source */
)
UNIQUE PRIMARY INDEX (emp);

The example above created the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables, and it restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
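The four rules above can be condensed into a compatibility table. The Python sketch below is one reading of those rules, with the assumption that compatibility is symmetric (for example, a WRITE request is granted while only ACCESS locks are held). The names `COMPATIBLE` and `can_grant` are made up for this illustration.

```python
# For each lock already held, the set of lock requests that may proceed alongside it.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},
    "READ": {"ACCESS", "READ"},
    "WRITE": {"ACCESS"},
    "EXCLUSIVE": set(),
}

def can_grant(requested, held_locks):
    # A request is granted only if it is compatible with every lock already held.
    return all(requested in COMPATIBLE[held] for held in held_locks)


print(can_grant("READ", ["READ"] * 1000))   # many readers at once
print(can_grant("WRITE", ["READ"]))         # a writer must wait behind readers
print(can_grant("ACCESS", ["WRITE"]))       # a dirty read is allowed during a write
print(can_grant("ACCESS", ["EXCLUSIVE"]))   # nothing gets past an EXCLUSIVE lock
```

A request that cannot be granted would wait in the queue until the conflicting locks are released at the end of their requests.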

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the former table.
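Both halves of that rule can be sketched in Python using the employee and payroll tables from the story. The dictionaries, function names, employee numbers, and exception class are hypothetical; a real database enforces the same checks through a foreign key constraint.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would break the employee/payroll relationship."""


employee = {1001: "Tom", 1002: "Morgan"}   # parent table: emp -> name
payroll = {}                                # child table: emp -> salary

def insert_payroll(emp, salary):
    # A payroll row must point at an existing employee row.
    if emp not in employee:
        raise ReferentialIntegrityError(f"emp {emp} not in employee")
    payroll[emp] = salary

def delete_employee(emp):
    # An employee row cannot be deleted while a payroll row still references it.
    if emp in payroll:
        raise ReferentialIntegrityError(f"emp {emp} still in payroll")
    employee.pop(emp, None)


insert_payroll(1001, 50000)
try:
    delete_employee(1001)          # blocked: payroll still references 1001
except ReferentialIntegrityError as err:
    print("rejected:", err)

del payroll[1001]                  # remove the payroll row first
delete_employee(1001)              # now the delete goes through
print(sorted(employee))
```

The forgotten payroll row from the story is exactly the case the first `delete_employee` call rejects.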

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? Through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to proceed: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
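The three phases described above can be sketched in Python. Everything here is a stand-in: CRC32 replaces Teradata's hashing algorithm, the 16-bucket round-robin hash map is invented, blocks are counted in rows rather than 64K bytes, and the phases run one after another instead of in parallel.

```python
import zlib

NUM_AMPS = 4
HASH_BUCKETS = 16
# Assumed hash map: bucket number -> owning AMP, assigned round-robin.
HASH_MAP = [bucket % NUM_AMPS for bucket in range(HASH_BUCKETS)]

def row_hash(primary_index):
    # Stand-in for Teradata's hashing algorithm (CRC32 is for illustration only).
    return zlib.crc32(str(primary_index).encode())

def owning_amp(primary_index):
    return HASH_MAP[row_hash(primary_index) % HASH_BUCKETS]

def fastload(rows, block_size=3):
    # Phase 1: hand blocks of rows to the AMPs in turn, ignoring ownership.
    received = {amp: [] for amp in range(NUM_AMPS)}
    for start in range(0, len(rows), block_size):
        amp = (start // block_size) % NUM_AMPS
        received[amp].extend(rows[start:start + block_size])
    # Phase 2: each AMP hashes its rows and sends them to the owning AMP.
    final = {amp: [] for amp in range(NUM_AMPS)}
    for amp_rows in received.values():
        for row in amp_rows:
            final[owning_amp(row[0])].append(row)
    # Phase 3: each AMP sorts its rows by row hash.
    for amp_rows in final.values():
        amp_rows.sort(key=lambda row: row_hash(row[0]))
    return final


rows = [(emp, f"name{emp}") for emp in range(1, 21)]   # hypothetical flat-file rows
placed = fastload(rows)
for amp, amp_rows in placed.items():
    print(amp, [row[0] for row in amp_rows])
```

The first phase gets data onto the system quickly without caring where it belongs; only the second phase puts each row on the AMP that its Primary Index hashes to.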

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind that many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level Therefore while Multiload is running the table is unavailable

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion—A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance – Only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.

The Transient Journal's job is to ensure that if things do fail, the rows affected can be reverted to their original state. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If all SQL statements cannot be performed successfully, the following happens:

• The user receives immediate feedback in the form of a failure message
• The entire transaction is rolled back and any changes made to the database are reversed
• Locks are released, and
• Spool files are discarded

Wouldn't it be great if, every time you got a haircut, the barber or stylist took a picture of your hairdo before cutting a single strand? Then, after cutting your hair, he or she would ask if you liked it. If you didn't like it, you could ask to have it restored. Well, that is what the Transient Journal does. If a row is going to change because of an INSERT, UPDATE, or DELETE, it takes a BEFORE picture. If the transaction fails, the journal restores the row to the way it was.

The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's "Transient Journal." Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities. In other words, for the Transient Journal, "Cleanliness is next to Godliness." If it is clean, then things went well.
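The before-image idea can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not how an AMP actually stores its journal: the class name `JournaledTable`, its methods, and the in-memory dictionary standing in for a table are all hypothetical.

```python
_MISSING = object()   # marks "this row did not exist before the change"


class JournaledTable:
    """Toy table whose uncommitted changes can be undone from before-images."""

    def __init__(self, rows):
        self.rows = dict(rows)   # primary key -> row value
        self.journal = []        # (key, before_image) entries, oldest first

    def _take_before_picture(self, key):
        self.journal.append((key, self.rows.get(key, _MISSING)))

    def write(self, key, value):
        """INSERT or UPDATE a row after journaling its before-image."""
        self._take_before_picture(key)
        self.rows[key] = value

    def delete(self, key):
        """DELETE a row after journaling its before-image."""
        self._take_before_picture(key)
        self.rows.pop(key, None)

    def commit(self):
        """Success: the journal is wiped clean."""
        self.journal.clear()

    def rollback(self):
        """Failure: restore before-images, newest change first."""
        while self.journal:
            key, before = self.journal.pop()
            if before is _MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
```

Undoing the newest change first matters when one transaction touches the same row more than once, since the oldest before-image is the one that must end up in the table.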

FALLBACK Protection

I asked my dentist if I had to floss all my teeth, and he responded, "No, just the ones you want to keep."

"If you're not TRUE to your teeth, they'll be FALSE to you."

Morgan's Dentist

FALLBACK is a table protection feature used in case an AMP fails. You can use FALLBACK on all tables, some tables, or no tables. When I asked my dentist if I should use FALLBACK on all tables, he responded, "No, just the ones you want to keep running when an AMP fails."

Below is the four-AMP system and the Best_Friends table. In this example, data is spread evenly and the system is ready to run in parallel. It is brilliant, but vulnerable. What happens if we lose AMP one? We can no longer get to the Best_Friends rows containing Ben Hon and Don Roy. FALLBACK, however, will correct this situation.


In the picture below, you can see the Best_Friends table and the FALLBACK protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission-critical tables. It actually allows the user to step into the same table twice, if necessary.

If we can lose any one AMP/disk, what happens if we lose two? Losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.

In most systems, AMPs are clustered in a group of four. The next most popular clustering scheme is a group of three. However, the minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while an AMP is down.

The advantage to clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their work and sharing in the work of the failed AMP. The disadvantage to this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
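The clustering rules above can be sketched as follows. This is a simplified model under stated assumptions: AMPs are numbered from zero, clusters are consecutive groups of `CLUSTER_SIZE` AMPs, and `fallback_amp` uses a made-up placement rule, since the text does not describe how the Fallback Hash Map actually chooses a host.

```python
CLUSTER_SIZE = 4   # the most common configuration described above


def cluster_of(amp, cluster_size=CLUSTER_SIZE):
    """Cluster number of an AMP when clusters are consecutive groups."""
    return amp // cluster_size


def fallback_amp(primary_amp, row_hash, cluster_size=CLUSTER_SIZE):
    """Pick a different AMP in the same cluster (hypothetical placement rule)."""
    base = cluster_of(primary_amp, cluster_size) * cluster_size
    # Offset is 1 .. cluster_size - 1, so the copy never lands on the primary.
    offset = 1 + row_hash % (cluster_size - 1)
    return base + (primary_amp - base + offset) % cluster_size


def data_available(down_amps, cluster_size=CLUSTER_SIZE):
    """Every row stays reachable if no cluster has lost more than one AMP."""
    losses = {}
    for amp in down_amps:
        cluster = cluster_of(amp, cluster_size)
        losses[cluster] = losses.get(cluster, 0) + 1
    return all(count <= 1 for count in losses.values())
```

In an eight-AMP system with two clusters, `data_available({0, 4})` is true because each cluster has lost only one AMP, while `data_available({0, 1})` is false because both failures fall in the same cluster.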

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to "Teradata SQL – Unleash the Power" by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues, and all subsequent changes to that AMP's rows are applied to their FALLBACK copies.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day, the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "While you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP on everything that occurred while it was sleeping. Then the DARJ is discarded.


In the previous picture, there are two clusters, but notice that AMP one has failed. After failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). Also, none of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within the cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk and, similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds." Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica – a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
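The two recovery directions can be sketched with plain Python dictionaries. This is a conceptual illustration only: the function names `roll_forward` and `roll_back`, the journal layout of `(key, image)` pairs, and the use of `None` to mark a row that does not exist are assumptions of the sketch, not the Permanent Journal's real format.

```python
def roll_forward(backup, after_journal):
    """Rebuild a table from a full backup plus AFTER images, oldest first."""
    table = dict(backup)
    for key, after_image in after_journal:
        if after_image is None:       # assumed marker: the row was deleted
            table.pop(key, None)
        else:
            table[key] = after_image
    return table


def roll_back(table, before_journal):
    """Undo changes by applying BEFORE images, newest first."""
    restored = dict(table)
    for key, before_image in reversed(before_journal):
        if before_image is None:      # assumed marker: the row did not exist
            restored.pop(key, None)
        else:
            restored[key] = before_image
    return restored
```

In the scenario above, `roll_forward` corresponds to restoring the backup from the 1st and replaying the AFTER images captured through the 5th, and `roll_back` corresponds to undoing the mistaken region update with the BEFORE images.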

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
   BEFORE JOURNAL,
   DUAL AFTER JOURNAL
(
   emp        INTEGER,
   dept       INTEGER,
   lname      CHAR(20),
   fname      VARCHAR(20),
   salary     DECIMAL(10,2),
   hire_date  DATE FORMAT
)
UNIQUE PRIMARY INDEX(emp);

The example above creates the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO – meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line, only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind, these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK – one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
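The four modes above amount to a compatibility table, which can be written out as a short Python sketch. The table is derived only from the descriptions in the list; the names `COMPATIBLE` and `can_grant` are hypothetical, and the sketch ignores queuing order and lock levels (database, table, row).

```python
# For each requested lock, the set of already-held locks it can coexist with.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held_locks):
    """True if the requested lock can be granted alongside every held lock."""
    return all(held in COMPATIBLE[requested] for held in held_locks)
```

For example, `can_grant("READ", ["READ", "ACCESS"])` is true, which matches the thousand simultaneous SELECTs, while `can_grant("WRITE", ["READ"])` is false because a writer must wait for readers to finish.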

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30!" Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the former table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
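The two RI rules (no child row without a matching parent value, no parent delete while a child still refers to it) can be sketched in Python. This is a toy model, not Teradata's implementation: the class `ParentChildTables`, its methods, and the exception name are hypothetical. In the firing story, the employee table plays the parent and the payroll table plays the child.

```python
class ReferentialIntegrityError(Exception):
    """Raised when an insert or delete would leave a dangling reference."""


class ParentChildTables:
    """Toy parent/child pair linked by one foreign-key column."""

    def __init__(self):
        self.parent_keys = set()   # key values present in the parent table
        self.child_rows = {}       # child row id -> referenced parent key

    def insert_parent(self, key):
        self.parent_keys.add(key)

    def insert_child(self, row_id, parent_key):
        # A child row must reference a value that exists in the parent.
        if parent_key not in self.parent_keys:
            raise ReferentialIntegrityError(f"no parent row {parent_key!r}")
        self.child_rows[row_id] = parent_key

    def delete_child(self, row_id):
        self.child_rows.pop(row_id, None)

    def delete_parent(self, key):
        # A parent row cannot go while any child row still references it.
        if key in self.child_rows.values():
            raise ReferentialIntegrityError(f"row {key!r} is still referenced")
        self.parent_keys.discard(key)
```

Deleting the payroll row first and the employee row second succeeds; attempting the employee delete first raises `ReferentialIntegrityError`, which is the oversight RI is meant to stop.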


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that – just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now, the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.



In the picture below, you can see the Best_Friends table and the FALLBACK-protected rows.

In this picture, the BASE table Best_Friends is illustrated at the top of the disk, and the FALLBACK rows are placed at the bottom of the disk. If we lose AMP1, then we can get Ben Hon from AMP2 and Don Roy from AMP4.

Keep in mind, FALLBACK tables use twice as much disk space as NON-FALLBACK tables. In the picture above, there were eight base rows in the Best_Friends table and eight Best_Friends FALLBACK rows. With FALLBACK, we can lose any AMP and still get to the data.

"You can't step into the same river twice."

Heraclitus

The data in a company's database tables is constantly changing, much like a flowing river. As every footstep really encounters a different river, likewise each update really makes a different table. That is why Fallback protection can be vital for mission-critical tables. It actually allows the user to step into the same table twice, if necessary.

If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is rare; however, some systems have nearly 2000 AMPs. Therefore, the chance of losing two AMPs in a 2000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. Let's look at this next example with a little larger system.


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table Best_Friends (listed at the top of all disks) is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map that knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
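The two hash maps can be pictured with a small Python sketch. This is only an illustration under loud assumptions: `zlib.crc32` stands in for Teradata's real hashing algorithm, the function names `primary_amp` and `fallback_amp` are made up for this example, and the AMPs are numbered from 0 so that AMPs 0-3 form one cluster and AMPs 4-7 the other.

```python
import zlib

NUM_AMPS = 8           # the eight-AMP system from the picture
AMPS_PER_CLUSTER = 4   # assumption: two clusters of four AMPs each


def primary_amp(primary_index_value: str) -> int:
    """Stand-in for the Primary Hash Map: hash the Primary Index to an AMP."""
    row_hash = zlib.crc32(primary_index_value.encode("utf-8"))  # not the real hash
    return row_hash % NUM_AMPS


def fallback_amp(primary_index_value: str) -> int:
    """Stand-in for the Fallback Hash Map: a different AMP in the same cluster."""
    row_hash = zlib.crc32(primary_index_value.encode("utf-8"))
    base = primary_amp(primary_index_value)
    cluster_start = (base // AMPS_PER_CLUSTER) * AMPS_PER_CLUSTER
    # Move 1..(cluster size - 1) positions around the cluster, so the copy
    # can never land on the AMP that already holds the base row.
    offset = 1 + (row_hash // NUM_AMPS) % (AMPS_PER_CLUSTER - 1)
    return cluster_start + (base - cluster_start + offset) % AMPS_PER_CLUSTER


for name in ("Ben Hon", "Don Roy"):
    print(name, "base AMP:", primary_amp(name), "fallback AMP:", fallback_amp(name))
```

Whatever the hash value, the base row and its FALLBACK copy end up on two different AMPs of the same cluster, which is the property that lets each cluster survive the loss of one AMP.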

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. The minimum number of AMPs per cluster is two, and the maximum is 16. Let's look at the two extremes (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its own work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while an AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
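The trade-off can be put into numbers with a short Python sketch. It assumes, as the text suggests, that the work of a down AMP is shared equally by the surviving AMPs in its cluster; the function name `survivor_workload` is invented for this example.

```python
from fractions import Fraction


def survivor_workload(cluster_size: int) -> Fraction:
    """Relative workload on each surviving AMP after one AMP in the cluster fails."""
    if cluster_size < 2:
        raise ValueError("a cluster needs at least two AMPs")
    # Each survivor keeps its own share (1) and picks up an equal slice
    # of the down AMP's share.
    return 1 + Fraction(1, cluster_size - 1)


for size in (2, 4, 16):
    print(f"cluster of {size}: each survivor does {float(survivor_workload(size)):.2f}x its normal work")
```

Under that assumption, a two-AMP cluster doubles the load on the survivor, a four-AMP cluster adds one third, and a 16-AMP cluster adds one fifteenth, which is why the larger cluster feels a failure the least but is the most exposed to a second one.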

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues with all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping, starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes meant for the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.
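The journal's life cycle (opened on failure, filled while the AMP is down, replayed on recovery, then discarded) can be sketched in Python. This is a toy model, not Teradata's implementation: the `Cluster` class and its methods are invented for this example, and each AMP's storage is reduced to a dictionary.

```python
class Cluster:
    """Toy cluster: each AMP holds a dict of row_id -> value; one AMP may be down."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}
        self.down = None
        self.darj = []  # (amp, row_id, value) changes the down AMP has missed

    def fail(self, amp):
        """Mark an AMP as down; the survivors open a fresh recovery journal."""
        self.down = amp
        self.darj = []

    def write(self, amp, row_id, value):
        """Apply a change, or journal it if the target AMP is down."""
        if amp == self.down:
            self.darj.append((amp, row_id, value))
        else:
            self.rows[amp][row_id] = value

    def recover(self):
        """Replay the journal in its original order, then discard it."""
        for amp, row_id, value in self.darj:
            self.rows[amp][row_id] = value
        self.down = None
        self.darj = []


cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "row-7", "old value")
cluster.fail(1)
cluster.write(1, "row-7", "new value")   # journaled, not applied
cluster.recover()                          # AMP 1 catches up
print(cluster.rows[1]["row-7"])
```

The sketch leaves out the FALLBACK copies themselves; it only shows why the journal has to live on the surviving AMPs of the same cluster and why it can be thrown away once the replay is done.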


In the previous picture, there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The down side of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and there is a mirrored copy of the information on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk and, similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disks via a different physical cable.
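The migration can be modeled with a few lines of Python. This is only a sketch: the function `migrate_on_failure`, the node names, and the round-robin placement across survivors are assumptions made for illustration, and the disks are left out because each AMP keeps its own virtual disk wherever it runs.

```python
def migrate_on_failure(nodes: dict, failed: str) -> dict:
    """Return a new node -> list-of-vprocs mapping after `failed` goes down."""
    survivors = [name for name in nodes if name != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    result = {name: list(nodes[name]) for name in survivors}
    # Deal the orphaned vprocs round-robin across the surviving nodes.
    for i, vproc in enumerate(nodes[failed]):
        result[survivors[i % len(survivors)]].append(vproc)
    return result


clique = {
    "node1": [f"AMP{i}" for i in range(1, 17)],    # AMPs 1-16
    "node2": [f"AMP{i}" for i in range(17, 33)],   # AMPs 17-32
}
after = migrate_on_failure(clique, "node1")
print(len(after["node2"]), "AMPs now run on node2")
```

In the two-node clique from the picture, the surviving node ends up hosting all 32 AMPs, which matches the remark that the receiving node carries twice as many VPROCs and slows down until the failed node returns.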

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica - a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
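Both scenarios can be acted out with a toy journal in Python. This is a sketch only: the functions `apply_update`, `roll_back`, and `roll_forward` are invented for this example, a table is reduced to a dictionary, and a single journal list holds both the BEFORE and AFTER image of every change.

```python
def apply_update(table, journal, key, new_value):
    """Change one row and record its BEFORE and AFTER images."""
    journal.append({"key": key, "before": table.get(key), "after": new_value})
    table[key] = new_value


def roll_back(table, journal, to_position):
    """Undo journal entries newest-first until only `to_position` entries remain applied."""
    for entry in reversed(journal[to_position:]):
        if entry["before"] is None:
            table.pop(entry["key"], None)   # the row did not exist before
        else:
            table[entry["key"]] = entry["before"]


def roll_forward(backup, journal):
    """Rebuild the current table from a backup copy plus the AFTER images."""
    table = dict(backup)
    for entry in journal:
        if entry["after"] is None:
            table.pop(entry["key"], None)   # the change was a delete
        else:
            table[entry["key"]] = entry["after"]
    return table


reps = {"Ann": "Western", "Bob": "Eastern"}
backup = dict(reps)            # stands in for the full system backup
journal = []
for name in list(reps):        # the faulty update that moves everyone
    apply_update(reps, journal, name, "Southeast")

restored = roll_forward(backup, journal)   # AFTER images: redo onto the backup
roll_back(reps, journal, 0)                # BEFORE images: undo the mistake
print(reps == backup, restored)
```

The BEFORE images are walked newest-first to undo a change that succeeded but was wrong, while the AFTER images are walked oldest-first on top of a restored backup to redo work that was lost.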

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL, DUAL AFTER JOURNAL
    (emp INTEGER,
     dept INTEGER,
     lname CHAR(20),
     fname VARCHAR(20),
     salary DECIMAL(10,2),
     hire_date DATE)
UNIQUE PRIMARY INDEX(emp);

The example above creates the table called employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO - meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line, only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind, these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK - one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
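The four rules above boil down to a small compatibility table, sketched here in Python. The table is read directly from the descriptions in the list; the names `COMPATIBLE` and `can_grant` are invented for this example and do not correspond to any Teradata interface.

```python
# For each lock already HELD on an object, the set of lock modes
# that a NEW request may be granted without waiting.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},   # only EXCLUSIVE must wait
    "READ":      {"ACCESS", "READ"},            # WRITE and EXCLUSIVE must wait
    "WRITE":     {"ACCESS"},                    # only a dirty read gets through
    "EXCLUSIVE": set(),                         # no access, period
}


def can_grant(held: str, requested: str) -> bool:
    """True if `requested` can run alongside a lock of mode `held`."""
    return requested in COMPATIBLE[held]


for held in COMPATIBLE:
    allowed = sorted(mode for mode in COMPATIBLE if can_grant(held, mode))
    print(f"{held:9} held -> may also grant: {allowed}")
```

Note that the table is symmetric: if a READ request must wait behind a WRITE lock, a WRITE request must equally wait behind a READ lock.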

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.
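The employee and payroll example can be written as two checks in Python. This is a minimal sketch that assumes the payroll table references the employee table by employee number; the names `insert_payroll`, `delete_employee`, and `ReferentialIntegrityError` are hypothetical and only mimic what the database enforces on its own.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would leave a payroll row without its employee."""


employee = set()   # parent table: employee numbers
payroll = {}       # child table: employee number -> salary


def insert_payroll(emp, salary):
    """A payroll row may only be inserted for an existing employee."""
    if emp not in employee:
        raise ReferentialIntegrityError(f"employee {emp} does not exist")
    payroll[emp] = salary


def delete_employee(emp):
    """An employee may only be deleted once no payroll row refers to it."""
    if emp in payroll:
        raise ReferentialIntegrityError(f"employee {emp} still has a payroll row")
    employee.discard(emp)


employee.add(1001)
insert_payroll(1001, 52000)
try:
    delete_employee(1001)          # blocked: payroll still refers to 1001
except ReferentialIntegrityError as err:
    print("rejected:", err)
del payroll[1001]                  # remove the referencing row first
delete_employee(1001)              # now the delete goes through
```

The order of the last two statements is the whole point: the referencing row has to go before the row it refers to.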

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table that has RI violations, the ALTER TABLE will still proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that - just loading data. Some people actually ask, "Are data loads that critical?" Come on! ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command - INSERT. It inserts rows into an empty table. The process is as follows: A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
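The two phases described above (hand out blocks first, then hash, redistribute, and sort) can be sketched in Python. Everything here is a simplification made for illustration: the function name `fastload` is hypothetical, a block is counted in rows rather than as 64K of data, `zlib.crc32` stands in for the real row hash, and the work runs in a plain loop rather than in parallel.

```python
import zlib


def fastload(records, num_amps, block_size=2):
    """Return {amp: list of (row_hash, record)}, sorted by row hash on each AMP."""
    # Phase 1: hand fixed-size blocks to the AMPs round-robin, with no hashing yet.
    received = {amp: [] for amp in range(num_amps)}
    for start in range(0, len(records), block_size):
        target = (start // block_size) % num_amps
        received[target].extend(records[start:start + block_size])

    # Phase 2: each AMP hashes its rows and ships them to the AMP that owns them.
    owned = {amp: [] for amp in range(num_amps)}
    for rows in received.values():
        for key, payload in rows:
            row_hash = zlib.crc32(str(key).encode("utf-8"))  # stand-in hash
            owned[row_hash % num_amps].append((row_hash, (key, payload)))

    # Finally, each AMP sorts what it owns (row hash stands in for the Row ID).
    for rows in owned.values():
        rows.sort(key=lambda item: item[0])
    return owned


records = [(emp, f"employee {emp}") for emp in range(1, 11)]
for amp, rows in fastload(records, num_amps=4).items():
    print(amp, [record[0] for _, record in rows])
```

Splitting the load this way means the first phase never waits on hashing: the data gets onto the system as fast as blocks can be handed out, and the redistribution happens afterwards on all AMPs at once.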

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and can be modified at any time.
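The faucet idea, a load rate that follows a preset timetable, can be sketched in Python. The schedule values, the unit (statements per minute), and the function names `rate_for_hour` and `pause_between_statements` are all assumptions made for this illustration; they are not Tpump settings taken from the book.

```python
# Hypothetical timetable: (starting hour, statements per minute), sorted by hour.
SCHEDULE = [
    (0, 60000),   # overnight: full throttle
    (8, 600),     # business hours: trickle
    (18, 6000),   # evening: moderate
]


def rate_for_hour(hour: int) -> int:
    """Return the load rate in force at the given hour of the day (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    rate = SCHEDULE[0][1]
    for start, per_minute in SCHEDULE:
        if hour >= start:
            rate = per_minute   # the latest entry that has already started wins
    return rate


def pause_between_statements(hour: int) -> float:
    """Seconds a loader would wait between statements to hold that rate."""
    return 60.0 / rate_for_hour(hour)


for hour in (3, 9, 20):
    print(hour, rate_for_hour(hour), pause_between_statements(hour))
```

Changing the timetable changes the flow without touching the loader itself, which is the sense in which the rate "can be modified at any time."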

Also Tpump locks at a row level so users have access to the rest of the rows while the table is being loaded

Tpump Basics

bull Loads data to Teradata from a Mainframe or LAN flat filebull Processes INSERTS UPDATES or DELETESbull Tables are usually populatedbull It can have secondary indexes triggers and referential integritybull It doesnt support Multi-set tables andbull It locks at the row level

ConclusionmdashA Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night That is not surprising because that stupid light wasalways on Teradata developers averaged about 4 hours of sleep because as their brilliance continued to unfoldthe light kept going on Teradata was originally designed to handle large amounts of data back in 1976 Mostother databases were designed to handle On-line Transaction Processing (OLTP) Teradata has been able tocontinually improve on its design for the past 15 years at many of the largest data warehouse sites in the worldAs someone once said Before you can eat the fruit you must climb the tree Teradata has been climbing to thetop for over a decade The fruits of labor have paid off big for both Teradata and Teradata customers Here iswhy Teradata was made for e-business data warehousing

bull Parallel processing for unlimited performance

bull Unlimited scalability of data users and applicationsbull Ability to answer extremely complex queriesbull Ease of setup and maintenance ndash Only one DBA needed

Ability to load data at lightning speeds from a mainframe or LAN


Let's discuss the picture above in detail. This is an eight-AMP system. Four AMPs are in cluster one and four AMPs are in cluster two. The base table, Best_Friends (listed at the top of all disks), is spread evenly across all eight AMPs. Taking the Primary Index and running it through the hashing algorithm completes this allocation. Next, the output of the hashing algorithm points to a bucket in the hash map, and inside that bucket is the AMP number, or the row's destination.

Notice the FALLBACK rows in this example. In the top cluster (cluster 1), FALLBACK rows are backups for the top cluster's base rows. In the bottom cluster (cluster 2), FALLBACK rows are backups for the bottom cluster's base rows.

With this protection, WE CAN AFFORD TO LOSE ONE AMP IN EACH CLUSTER.

The brilliance behind this protection is the Hash Map. There is a Base Row Hash Map used to distribute the base rows. It's called the Primary Hash Map. There is also the Fallback Hash Map, which knows exactly how AMPs are clustered and which AMP should host a FALLBACK row.
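To make the two hash maps concrete, here is a minimal Python sketch of the idea. It is not Teradata's real hashing algorithm: the crc32 hash, the 64-bucket map, the AMP numbering, and every function name are assumptions chosen only for illustration.

```python
import zlib

NUM_BUCKETS = 64                          # illustrative size, not the real hash map size
CLUSTERS = [[0, 1, 2, 3], [4, 5, 6, 7]]   # eight AMPs in two clusters of four
ALL_AMPS = [amp for cluster in CLUSTERS for amp in cluster]

# Primary hash map: each bucket holds the AMP that owns the base row.
PRIMARY_MAP = {b: ALL_AMPS[b % len(ALL_AMPS)] for b in range(NUM_BUCKETS)}


def cluster_of(amp):
    """Return the list of AMPs that share a cluster with `amp`."""
    for cluster in CLUSTERS:
        if amp in cluster:
            return cluster
    raise ValueError(amp)


def bucket_for(primary_index_value):
    """Hash the Primary Index value to a bucket (crc32 is a stand-in hash)."""
    return zlib.crc32(str(primary_index_value).encode()) % NUM_BUCKETS


def primary_amp(primary_index_value):
    """AMP that stores the base row."""
    return PRIMARY_MAP[bucket_for(primary_index_value)]


def fallback_amp(primary_index_value):
    """A different AMP in the same cluster that stores the FALLBACK copy."""
    bucket = bucket_for(primary_index_value)
    home = PRIMARY_MAP[bucket]
    others = [amp for amp in cluster_of(home) if amp != home]
    return others[bucket % len(others)]
```

Calling primary_amp("Tom") and fallback_amp("Tom") always returns two different AMPs from the same cluster, which is why one AMP per cluster can be lost without losing any rows.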

In most systems, AMPs are clustered in groups of four. The next most popular clustering scheme is a group of three. However, the minimum number of AMPs per cluster is two, and the maximum number of AMPs per cluster is 16. Let's look at the extremes of both clusters (two versus 16).

The advantage of clustering in groups of two is that both AMPs would have to fail before the system stopped. The disadvantage is that if one AMP fails, the other must do its work plus the work of the down AMP. With clustering in a group of two, every complex query will take twice as long to process while an AMP is down.

The advantage of clustering in groups of 16 is that if one AMP fails, there are 15 other AMPs doing their own work and sharing in the work of the failed AMP. The disadvantage of this type of clustering is that there is an increased risk of losing two AMPs in the cluster.

This is the reason four-AMP cluster configurations are so popular. The chances of losing two AMPs out of four are quite low. However, if one AMP is lost, the other three will share in the extra work.
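The trade-off between cluster sizes comes down to simple arithmetic. The sketch below assumes the failed AMP's work is spread evenly over the surviving AMPs in its cluster. The function names are hypothetical.

```python
def survivor_load_factor(cluster_size):
    """Relative workload on each surviving AMP after one AMP in the cluster fails."""
    return cluster_size / (cluster_size - 1)


def fatal_partners(cluster_size):
    """Number of other AMPs whose failure would be the second loss in the cluster."""
    return cluster_size - 1


for size in (2, 4, 16):
    print(size, round(survivor_load_factor(size), 2), fatal_partners(size))
```

Under that assumption, a cluster of two doubles the survivor's work, a cluster of four adds about a third, and a cluster of 16 adds about 7 percent but leaves 15 AMPs whose failure would be the second loss.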

FALLBACK is an optional means of protection specified at the database or table level. It may be requested when the table is first created, or you may add or drop FALLBACK at any time by using the ALTER TABLE command. (For more information, refer to Teradata SQL - Unleash the Power by Mike Larkins and Tom Coffing.)


Let's review FALLBACK and clarify related issues. When a new row is inserted into a table, FALLBACK always places a second copy of that row on another AMP in the same group, or cluster. Keep in mind that a cluster usually consists of four AMPs. From that point on, any manipulation of the data in the primary row also happens to the FALLBACK row. FALLBACK rows are distributed evenly across all the AMPs within the same cluster. If one AMP fails, processing continues with all subsequent changes to that AMP's rows.

FALLBACK provides an optional insurance policy for a failed AMP; however, there is a cost for that insurance. FALLBACK requires twice as much disk space to store both the primary and duplicate rows of a table. Another cost that should not be overlooked is that twice the I/O (Input/Output) applies to inserts, updates, and deletes, because there are always two copies to write. However, because Teradata AMPs operate in parallel, both rows are placed on their respective AMPs at nearly the same time.

Although FALLBACK may be created on any, all, or no tables, its extra cost causes most companies to use it only for mission-critical tables. As you might suspect, the Data Dictionary is automatically FALLBACK protected. FALLBACK may not protect your system from all failures, but it certainly is an excellent fault-tolerant solution.

Down AMP Recovery Journal (DARJ)

The blockbuster movie While You Were Sleeping, starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.


In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). Also, none of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
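The catch-up behavior of the DARJ can be pictured with a small Python model. This is only a sketch of the idea described above, not Teradata internals. The Cluster class, its methods, and the in-memory dictionaries are all assumptions made for illustration.

```python
class Cluster:
    """Toy model of one cluster with a Down AMP Recovery Journal."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}   # amp -> {row_key: value}
        self.down = set()                          # AMPs that are currently down
        self.darj = []                             # changes a down AMP has missed

    def fail(self, amp):
        self.down.add(amp)

    def write(self, amp, key, value):
        if amp in self.down:
            # The AMP is "sleeping": remember the change for later.
            self.darj.append((amp, key, value))
        else:
            self.rows[amp][key] = value

    def recover(self, amp):
        """Bring the AMP back online, replay its missed changes, drop them."""
        self.down.discard(amp)
        remaining = []
        for target, key, value in self.darj:
            if target == amp:
                self.rows[amp][key] = value
            else:
                remaining.append((target, key, value))
        self.darj = remaining
```

In this model a write aimed at a down AMP is journaled instead of applied, and recover() replays the journal entries in order before throwing them away.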

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called transparent mirroring. Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it's actually fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The downside of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.
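The dual-write and rebuild behavior of RAID-1 can be sketched in a few lines of Python. This is a toy model under simple assumptions: disks are dictionaries, only the primary disk can fail, and the MirroredDisk class and its method names are made up for illustration. A real Disk Array Controller does this work in hardware.

```python
class MirroredDisk:
    """Toy RAID-1 pair: every write goes to both the primary and the mirror."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.primary_ok = True

    def write(self, block, data):
        # The caller issues one write; the controller performs two.
        if self.primary_ok:
            self.primary[block] = data
        self.mirror[block] = data

    def read(self, block):
        # Either copy will do; fall back to the mirror if the primary is gone.
        source = self.primary if self.primary_ok else self.mirror
        return source[block]

    def fail_primary(self):
        self.primary_ok = False
        self.primary = {}

    def replace_primary(self):
        # Copy the mirror onto the replacement disk while operations continue.
        self.primary = dict(self.mirror)
        self.primary_ok = True
```

After fail_primary(), reads and writes keep working against the mirror, and replace_primary() restores an identical primary copy.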

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a Shared Nothing environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.


In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, so long as they are not a data disk and its mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of that information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.


CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, the AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.
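Here is a small Python sketch of that migration for the two-node, 32-AMP example. It only models which node each AMP lives on. The migrate function, the node names, and the round-robin placement are assumptions for illustration, not how Teradata actually schedules VPROCs.

```python
def migrate(nodes, failed):
    """Return the new AMP placement after node `failed` goes down.

    nodes maps a node name to the list of AMPs running in its memory.
    """
    survivors = [name for name in nodes if name != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    placement = {name: list(nodes[name]) for name in survivors}
    # Spread the homeless AMPs over the surviving nodes in turn.
    for i, amp in enumerate(nodes[failed]):
        placement[survivors[i % len(survivors)]].append(amp)
    return placement


clique = {
    "node1": [f"AMP{i}" for i in range(1, 17)],
    "node2": [f"AMP{i}" for i in range(17, 33)],
}
after = migrate(clique, "node1")
```

With node one down, node two ends up hosting all 32 AMPs, which is why its performance slows while the failed node is being repaired.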

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost-effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward and is usually used to restore data lost as a result of a hardware problem.
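Both scenarios can be acted out in a short Python sketch. It is a toy model of before and after images, not the Permanent Journal's real storage format. The JournaledTable class, the roll_forward helper, and the use of a journal length as the "point in time" are all assumptions for illustration.

```python
_MISSING = object()   # marks "the row did not exist before this change"


class JournaledTable:
    """Toy table that records a before image and an after image per change."""

    def __init__(self):
        self.rows = {}
        self.before = []   # (key, value before the change)
        self.after = []    # (key, value after the change)

    def update(self, key, value):
        self.before.append((key, self.rows.get(key, _MISSING)))
        self.after.append((key, value))
        self.rows[key] = value

    def rollback(self, checkpoint):
        """Undo every change made after `checkpoint` using the before images."""
        while len(self.before) > checkpoint:
            key, old = self.before.pop()
            if old is _MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = old
        del self.after[checkpoint:]


def roll_forward(backup_rows, after_images):
    """Rebuild current data from a backup plus the after images taken since."""
    rows = dict(backup_rows)
    for key, value in after_images:
        rows[key] = value
    return rows
```

For the BEFORE scenario, note the checkpoint (the journal length) before the bad region update and call rollback(checkpoint) afterward. For the AFTER scenario, pass the backup from the 1st and the after images captured since then to roll_forward.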

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
    (emp        INTEGER,
     dept       INTEGER,
     lname      CHAR(20),
     fname      VARCHAR(20),
     salary     DECIMAL(10,2),
     hire_date  DATE FORMAT)
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line, only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind, these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
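The four locking modes boil down to a compatibility table: which locks may be granted while others are already held. The Python sketch below encodes only the rules stated in the list above. The COMPATIBLE table and the can_grant function are hypothetical names, and the sketch ignores queueing order.

```python
# For each requested lock, the set of already-held locks it can coexist with.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},   # blocked only by EXCLUSIVE
    "READ":      {"ACCESS", "READ"},            # many readers at once
    "WRITE":     {"ACCESS"},                    # one writer, dirty readers allowed
    "EXCLUSIVE": set(),                         # no company at all
}


def can_grant(requested, held):
    """True if `requested` can be granted while every lock in `held` is active."""
    return all(lock in COMPATIBLE[requested] for lock in held)
```

For example, can_grant("READ", ["READ", "READ"]) is True, while can_grant("WRITE", ["READ"]) is False, and can_grant("ACCESS", ["WRITE"]) is True because an ACCESS LOCK accepts a dirty read.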

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
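The employee and payroll story can be tried out with Python's built-in sqlite3 module. Note the assumption: this uses SQLite, not Teradata, so the syntax and error behavior are SQLite's. The table layouts and the try_sql helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces RI only when enabled
conn.execute("CREATE TABLE employee (emp INTEGER PRIMARY KEY, lname TEXT)")
conn.execute(
    "CREATE TABLE payroll (emp INTEGER REFERENCES employee(emp), salary REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Jones')")
conn.execute("INSERT INTO payroll VALUES (1, 50000)")


def try_sql(statement):
    """Run a statement and report whether RI allowed it."""
    try:
        conn.execute(statement)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# A payroll row for an employee who does not exist is refused.
orphan_insert = try_sql("INSERT INTO payroll VALUES (99, 10000)")
# The employee cannot be deleted while a payroll row still points at them.
early_delete = try_sql("DELETE FROM employee WHERE emp = 1")
# Once the payroll row is gone, the delete goes through.
conn.execute("DELETE FROM payroll WHERE emp = 1")
late_delete = try_sql("DELETE FROM employee WHERE emp = 1")
```

In this sketch the first two statements are rejected and the last one succeeds, so nobody stays on the payroll after leaving the employee table.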


Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
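The three steps just described (hand out blocks, rehash rows to their proper AMP, sort) can be simulated in Python. This is a single-process sketch with several stand-ins: crc32 replaces the real hashing algorithm, three rows replace a 64K block, four AMPs replace a real system, and the row hash replaces the full Row ID. The fastload and row_hash names are hypothetical.

```python
import zlib

NUM_AMPS = 4
BLOCK_ROWS = 3   # stand-in for one 64K block


def row_hash(row):
    """Hash the first column, treated here as the Primary Index."""
    return zlib.crc32(str(row[0]).encode())


def fastload(rows):
    # Step 1: pass blocks to the AMPs in turn, without looking at the rows.
    received = [[] for _ in range(NUM_AMPS)]
    blocks = [rows[i:i + BLOCK_ROWS] for i in range(0, len(rows), BLOCK_ROWS)]
    for i, block in enumerate(blocks):
        received[i % NUM_AMPS].extend(block)

    # Step 2: each AMP hashes its rows and sends each one to the AMP that owns it.
    owned = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for row in amp_rows:
            owned[row_hash(row) % NUM_AMPS].append(row)

    # Step 3: each AMP sorts the rows it now owns.
    for amp_rows in owned:
        amp_rows.sort(key=row_hash)
    return owned
```

Calling fastload on a list of tuples returns one sorted list per AMP, with every row sitting on the AMP its hash points to.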

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day and can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.
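The faucet idea amounts to a load rate that depends on the time of day. The Python sketch below is not Tpump script syntax. The schedule format, the function names, and every number in it are hypothetical values picked only to show the arithmetic.

```python
# All hours and rates below are hypothetical values for illustration.
FULL_THROTTLE = 100_000       # rows per minute during off-peak hours
SCHEDULE = [(8, 18, 100)]     # (start_hour, end_hour, rows_per_minute); end is exclusive


def rate_for_hour(hour, schedule=SCHEDULE, default_rate=FULL_THROTTLE):
    """Rows per minute allowed during the given hour of the day (0-23)."""
    for start, end, rate in schedule:
        if start <= hour < end:
            return rate
    return default_rate


def rows_per_day(schedule=SCHEDULE, default_rate=FULL_THROTTLE):
    """Total rows that could be loaded in 24 hours under the schedule."""
    return sum(rate_for_hour(h, schedule, default_rate) * 60 for h in range(24))
```

With these made-up numbers, the faucet trickles at 100 rows per minute during the 8:00 to 18:00 rush and opens fully the rest of the day.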

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance - Only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

Page 49: 143109593 Tom Coffing TD Basics

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 4958

Lets review FALLBACK and clarify related issues When a new row is inserted into a table FALLBACK always places a second copy of that row on another AMP in the same group or cluster Keep in mind that acluster usually consists of four AMPs From that point on any manipulation of the data in the primary row alsohappens to the FALLBACK row FALLBACK rows are distributed evenly across all the AMPs within the samecluster If one AMP fails processing continues with all subsequent changes to that AMPs rows

FALLBACK provides an optional insurance policy for a failed AMP however there is a cost for that insuranceFALLBACK requires twice as much disk space to store both the primary and duplicate rows on a table Another

cost that should not be overlooked is twice the IO (InputOutput) applies to inserts updates and deletes becausethere are always two copies to write However because Teradata AMPs operate in parallel both rows are placed on their respective AMPs at nearly the same time

Although FALLBACK may be created on any all or no tables its extra cost causes most companies to use itonly for mission critical tables As you might suspect the Data Dictionary is automatically FALLBACK protected FALLBACK may not protect your system from all failures but it certainly is an excellent faulttolerant solution

Down AMP Recovery Journal (DARJ)

The blockbuster movie "While You Were Sleeping," starring Sandra Bullock, told a fascinating love story. A young woman who collected tolls for the Chicago elevated train system fell in love with a man who boarded the train each day at her station. However, the man only knew her as the woman in the booth who collected his fare. One day the dashing young man tripped and fell into the path of the train. Only quick action by the lovesick toll clerk kept him from certain death. Although he avoided death, he fell into a coma. While he was in a coma, the man's family fell in love with the toll clerk, who visited him in the hospital. Because she visited so often, the man's family actually thought she was the man's fiancée. The movie continues to tell how the man regains consciousness and the events that immediately follow. At the end of the movie, it turns out that all along the woman had been telling the man everything that happened "while you were sleeping."

The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from the DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch the AMP up on everything that occurred while it was sleeping. Then the DARJ is discarded.

In the previous picture there are two clusters, but notice that AMP one has failed. After the failure, the other AMPs in the top cluster open the Down AMP Recovery Journal (DARJ). None of the AMPs in the bottom cluster have the DARJ open. Why? Simply because the FALLBACK rows for the down AMP are housed within its own cluster. If anything happens while the AMP is sleeping, it has three extremely cute ticket takers that will store all information pertaining to the down AMP.
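The DARJ life cycle described above (open on failure, record missed changes, replay on recovery, discard) can be sketched in a few lines of Python. This is a toy model, not Teradata's implementation: the Cluster class and its method names are hypothetical, and the journal is a plain list rather than space taken from DBC's PERM space.

```python
class Cluster:
    """Toy cluster: tracks the rows each AMP holds and a recovery journal."""

    def __init__(self, amp_ids):
        self.rows = {amp: {} for amp in amp_ids}  # amp -> {row_id: value}
        self.down = set()                         # AMPs that are currently failed
        self.darj = []                            # changes missed by down AMPs

    def fail(self, amp):
        self.down.add(amp)

    def write(self, amp, row_id, value):
        if amp in self.down:
            # The surviving AMPs record the change for later replay.
            self.darj.append((amp, row_id, value))
        else:
            self.rows[amp][row_id] = value

    def recover(self, amp):
        self.down.discard(amp)
        remaining = []
        for target, row_id, value in self.darj:
            if target == amp:
                self.rows[amp][row_id] = value    # catch the AMP up, in order
            else:
                remaining.append((target, row_id, value))
        self.darj = remaining                     # replayed entries are discarded


cluster = Cluster([1, 2, 3, 4])
cluster.write(1, "row-a", "original")
cluster.fail(1)
cluster.write(1, "row-a", "changed while down")
cluster.recover(1)
print(cluster.rows[1], cluster.darj)
```

Replaying the entries in the order they were recorded matters: if the same row changed twice while the AMP was down, the last change must win.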

Redundant Array of Independent Disks (RAID)

"I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant."

Sign on Pentagon office wall

RAID never gets confused. It always knows exactly what the disk said, and it mirrors it exactly. The disks in the Disk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer. No doubt you have heard people complain that their hard drive crashed. Well, disk drives crash inside modules that store multiple disks, too. Redundant Array of Independent Disks (RAID) protects against a disk failure. There are many levels of RAID in the data storage industry. The most common level, and one that is used by Teradata, is RAID-1, also called MIRRORING. With RAID-1, each primary disk has a mirror image, or an exact copy of all its data, on another disk. The contents of both disks are identical.

When data is written on the primary disk, it is also written on the mirror disk. However, the dual-write process is invisible to the user. This is the reason RAID-1 is also called "transparent mirroring." Mirrored disks provide a high degree of reliability because when a disk fails, no data is lost; it is still fully accessible on the mirror disk. Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk. The downside of RAID-1, like FALLBACK, is that it requires a 50% overhead of disk space.

Mirroring has typically been provided at the application or operating system level. Teradata's RAID solutions, however, manage mirroring at the Disk Array Controller level because it boosts performance. The AMPs can read data from either the primary disk or its mirror. Plus, the controller decides which read/write assembly (drive actuator) is closest to the requested data.

In the next example, an AMP is shown with its virtual disk. However, this is conceptual. In actuality, each AMP has four physical disks. Since only the AMP illustrated can get to its information, we like to explain this concept as a single virtual disk. This concept is called a "Shared Nothing" environment. However, we can still keep the shared nothing environment and have four physical disks. With that, only the AMP actually owning the virtual disk can access its four disks.

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost so long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.
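A small Python sketch may help show why a rank survives some double failures but not others. It is a simplified model under stated assumptions: the MirroredPair class and rank_is_readable function are hypothetical names, each "disk" is just a dictionary, and the real work done by the Disk Array Controller (choosing the closest actuator, rebuilding in the background) is left out.

```python
class MirroredPair:
    """A data disk (index 0) and its mirror (index 1) under RAID-1."""

    def __init__(self):
        self.disks = [{}, {}]
        self.failed = [False, False]

    def write(self, block, data):
        # The dual write is invisible to the caller: both copies are updated.
        for i in (0, 1):
            if not self.failed[i]:
                self.disks[i][block] = data

    def read(self, block):
        # Either surviving copy can serve the read.
        for i in (0, 1):
            if not self.failed[i]:
                return self.disks[i][block]
        raise RuntimeError("both disks of the pair are lost")

    def fail(self, i):
        self.failed[i] = True
        self.disks[i] = {}

    def replace(self, i):
        # Rebuild the replacement disk from the surviving copy.
        other = 1 - i
        if self.failed[other]:
            raise RuntimeError("no surviving copy to rebuild from")
        self.disks[i] = dict(self.disks[other])
        self.failed[i] = False


def rank_is_readable(pairs):
    """A rank stays readable unless some pair has lost both of its disks."""
    return all(not all(pair.failed) for pair in pairs)


rank = [MirroredPair(), MirroredPair()]
rank[0].write("block-1", "Best_Friends row")
rank[0].fail(0)                       # lose a data disk
rank[1].fail(1)                       # lose the other pair's mirror
print(rank[0].read("block-1"), rank_is_readable(rank))
```

Losing one disk from each pair leaves every block readable; losing a data disk and its own mirror does not.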

Cliques

In high school you can walk into the cafeteria and immediately identify the cliques (pronounced "clicks"). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk and, similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.

People who come from the colder climates to spend their winters in sunny Florida are often called "snowbirds." Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.
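The migration can be pictured with a short Python sketch. The names here (fail_node, the placement dictionary, the vdisk labels) are hypothetical, and the round-robin choice of a surviving node is an assumption made for illustration; the point is only that vprocs change nodes while their virtual disks stay the same.

```python
def fail_node(placement, clique, failed):
    """Move the failed node's vprocs to the surviving nodes of its clique.

    placement: dict mapping node name -> list of vproc names on that node
    clique:    list of node names cabled to one another's disks
    """
    survivors = [node for node in clique if node != failed]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    after = {node: list(vprocs) for node, vprocs in placement.items() if node != failed}
    for i, vproc in enumerate(placement[failed]):
        after[survivors[i % len(survivors)]].append(vproc)
    return after


# Two-node, 32-AMP system from the text: AMPs 1-16 on node one, 17-32 on node two.
placement = {
    "node1": [f"AMP{i}" for i in range(1, 17)],
    "node2": [f"AMP{i}" for i in range(17, 33)],
}
# Each AMP keeps the same virtual disk no matter which node hosts it.
vdisk_of = {f"AMP{i}": f"vdisk{i}" for i in range(1, 33)}

after = fail_node(placement, clique=["node1", "node2"], failed="node1")
print(len(after["node2"]), vdisk_of["AMP16"])
```

After the failure, node two hosts all 32 AMPs, which is why performance slows down, yet the AMP-to-disk mapping is untouched.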

Permanent Journal

"The absent are always in the wrong."

English Proverb

If a system had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly, because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE or DELETE command. It keeps track of all new, deleted or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes, and stores those images on two different AMPs.

In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southwest Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
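The two scenarios above, rolling back a bad update with BEFORE images and rolling forward from a backup with AFTER images, can be sketched in Python. This is an illustration only: the journal here is a plain list of dictionaries, the function names (apply_change, roll_back, roll_forward) are hypothetical, and a single list holds both images for brevity even though Teradata treats BEFORE and AFTER journaling as separate options.

```python
def apply_change(table, journal, key, new_value):
    """Change a row and journal both images (None means the row does not exist)."""
    journal.append({"key": key, "before": table.get(key), "after": new_value})
    if new_value is None:
        table.pop(key, None)
    else:
        table[key] = new_value


def roll_back(table, journal, to_index):
    """Undo the changes journal[to_index:] using BEFORE images, newest first."""
    for entry in reversed(journal[to_index:]):
        if entry["before"] is None:
            table.pop(entry["key"], None)
        else:
            table[entry["key"]] = entry["before"]


def roll_forward(backup, journal, from_index=0):
    """Rebuild current data from a backup copy plus AFTER images, oldest first."""
    table = dict(backup)
    for entry in journal[from_index:]:
        if entry["after"] is None:
            table.pop(entry["key"], None)
        else:
            table[entry["key"]] = entry["after"]
    return table


table = {"rep1": "Western", "rep2": "Eastern"}
backup = dict(table)                               # the full backup on the 1st
journal = []
apply_change(table, journal, "rep1", "Southwest")  # the intended move
apply_change(table, journal, "rep2", "Southwest")  # the accidental move

restored = roll_forward(backup, journal)           # after a hardware failure
roll_back(table, journal, to_index=1)              # undo only the accident
print(restored, table)
```

Rollback walks the journal newest first and roll forward walks it oldest first; reversing either order would leave a row that changed more than once in the wrong state.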

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

    CREATE TABLE TomC.employee, FALLBACK,
        BEFORE JOURNAL,
        DUAL AFTER JOURNAL
    (
        emp        INTEGER,
        dept       INTEGER,
        lname      CHAR(20),
        fname      VARCHAR(20),
        salary     DECIMAL(10,2),
        hire_date  DATE FORMAT    -- format string not given
    )
    UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy, and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time, without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to be waiting for a long time in a line, only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables, and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period. This lock is placed on a table or database.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE or UPDATE. Keep in mind, these commands are writing actions. No other Exclusive, Write or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
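The four rules above amount to a small compatibility table, sketched here in Python. The table is derived from the descriptions in this list, with one assumption made explicit: compatibility is treated as symmetric, so a request for an EXCLUSIVE lock also waits while an ACCESS lock is held. The names COMPATIBLE and can_grant are hypothetical.

```python
# For each lock already held, the set of lock modes that may be granted alongside it.
COMPATIBLE = {
    "EXCLUSIVE": set(),                      # prevents any access, period
    "WRITE": {"ACCESS"},                     # only a dirty read may slip past
    "READ": {"READ", "ACCESS"},              # many readers can share a table
    "ACCESS": {"ACCESS", "READ", "WRITE"},   # assumption: blocks only EXCLUSIVE
}


def can_grant(requested, held):
    """Return True if `requested` is compatible with every lock in `held`."""
    return all(requested in COMPATIBLE[lock] for lock in held)


for requested in ("ACCESS", "READ", "WRITE", "EXCLUSIVE"):
    print(requested, "while WRITE is held:", can_grant(requested, ["WRITE"]))
```

A request that cannot be granted waits in the queue until the conflicting locks are released at the end of their requests.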

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood." "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired ... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from the table that references it.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. Then the user will need to locate the table copy and make corrections to the original table.
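The two halves of the RI rule can be shown with the employee and payroll tables from the story. The Python below is a toy model, not Teradata's mechanism: the Database class, its method names and the ReferentialIntegrityError exception are all hypothetical, and the tables are plain in-memory collections.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would leave a payroll row without its employee."""


class Database:
    def __init__(self):
        self.employee = set()  # referenced table: employee numbers
        self.payroll = {}      # referencing table: payroll_id -> employee number

    def insert_employee(self, emp):
        self.employee.add(emp)

    def insert_payroll(self, payroll_id, emp):
        # A row cannot be inserted unless its value exists in the other table.
        if emp not in self.employee:
            raise ReferentialIntegrityError(f"employee {emp} does not exist")
        self.payroll[payroll_id] = emp

    def delete_payroll(self, payroll_id):
        del self.payroll[payroll_id]

    def delete_employee(self, emp):
        # A referenced row cannot be deleted until the references are removed.
        if emp in self.payroll.values():
            raise ReferentialIntegrityError(f"employee {emp} is still on the payroll")
        self.employee.discard(emp)


db = Database()
db.insert_employee(1001)
db.insert_payroll("P-1", 1001)
try:
    db.delete_employee(1001)       # the oversight from the story
except ReferentialIntegrityError as error:
    print("rejected:", error)
db.delete_payroll("P-1")
db.delete_employee(1001)           # now allowed
```

The order of the last two calls is the whole point: the payroll row has to go first.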

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel) and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
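The two phases just described, handing out blocks and then hashing rows to their owning AMPs, can be sketched in Python. Several things here are assumptions for illustration: zlib.crc32 stands in for Teradata's row hash, a block is a fixed number of rows rather than 64K bytes, blocks are dealt round-robin, and the function names (row_hash, fastload) are hypothetical. The sketch runs sequentially, whereas the AMPs do this work in parallel.

```python
import zlib


def row_hash(primary_index_value):
    """Deterministic stand-in (assumption) for Teradata's row hash."""
    return zlib.crc32(str(primary_index_value).encode())


def fastload(rows, num_amps, block_size=3):
    """rows: list of tuples whose first element is the primary index value."""
    # Phase 1: pass blocks of rows to the AMPs without looking inside them.
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    received = {amp: [] for amp in range(num_amps)}
    for i, block in enumerate(blocks):
        received[i % num_amps].extend(block)

    # Phase 2: each AMP hashes the rows it received and sends each row
    # to the AMP that owns it.
    final = {amp: [] for amp in range(num_amps)}
    for amp_rows in received.values():
        for row in amp_rows:
            final[row_hash(row[0]) % num_amps].append(row)

    # Finally, each AMP sorts its rows by row hash.
    for amp_rows in final.values():
        amp_rows.sort(key=lambda row: row_hash(row[0]))
    return final


rows = [(emp, f"name{emp}") for emp in range(20)]
for amp, amp_rows in fastload(rows, num_amps=4).items():
    print(amp, [row[0] for row in amp_rows])
```

Splitting the work this way keeps the first phase cheap, since no AMP has to inspect a row before accepting a block, and lets every AMP share the hashing in the second phase.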

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.
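The faucet idea can be illustrated with a few lines of Python that split a queue of incoming rows into per-minute batches according to a preset rate. This is only a sketch of the throttling concept: plan_batches and rush_hour_rate are hypothetical names, the rates are made-up numbers, and none of this reflects Tpump's actual script syntax or internals.

```python
def plan_batches(rows, rate_for_minute):
    """Split rows into one batch per minute.

    rate_for_minute(m) returns the maximum number of rows to load in minute m.
    """
    batches, minute, start = [], 0, 0
    while start < len(rows):
        rate = rate_for_minute(minute)
        if rate <= 0:
            raise ValueError("the rate must be positive")
        batches.append(rows[start:start + rate])
        start += rate
        minute += 1
    return batches


def rush_hour_rate(minute):
    """Hypothetical schedule: trickle for the first two minutes, then full throttle."""
    return 2 if minute < 2 else 5


for minute, batch in enumerate(plan_batches(list(range(10)), rush_hour_rate)):
    print(minute, batch)
```

Changing the schedule function is the equivalent of turning the faucet up or down at different times of day.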

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level

Conclusion - A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

Page 50: 143109593 Tom Coffing TD Basics

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5058

In the previous picture there are two clusters but notice that AMP one has failed After failure the other AMPsin the top cluster open the Down AMP Recovery Journal (DARJ) Also none of the AMPs in the bottom clusterhave the DARJ open Why Simply because the FALLBACK rows for the down AMP are housed within thecluster If anything happens while the AMP is sleeping it has three extremely cute ticket takers that will storeall information pertaining to the down AMP

Redundant Array of Independent Disks (RAID)

I know that you believe that you understand what you think I said but I am not sure you realize that what youheard is not what I meant

Sign on Pentagon office wall

RAID never gets confused It always knows exactly what the disk said and it mirrors it exactly The disks in theDisk Array modules accessed by the AMPs are similar to a hard disk drive in a personal computer No doubtyou have heard people complain that their hard drive crashed Well disk drives crash inside modules that storemultiple disks too Redundant Array of Independent Disks (RAID) protects against a disk failure There aremany levels of RAID in the data storage industry The most common level and one that is used by Teradata isRAID-1 also called MIRRORING With RAID-1 each primary disk has a mirror image or an exact copy of all

its data on another disk The contents of both disks are identical

When data is written on the primary disk it is also written on the mirror disk However the dual-write processis invisible to the user This is the reason RAID-1 is also called transparent mirroring Mirrored disks providea high degree of reliability because when a disk fails no data is lost its actually fully accessible on the mirror disk Operations continue while the Disk Array Controller copies the data from the mirror disk to a replacement primary disk The down side of RAID-1 like FALLBACK is that it requires a 50 overhead of disk space

Mirroring has been typically provided at the application or operating system level Teradatas RAID solutionshowever manage mirroring at the Disk Array Controller level because it boosts performance The AMPs canread data from either the primary disk or its mirror Plus

the controller decides which readwrite assembly (drive actuator) is closest to the requested data

In the next example an AMP is shown with its virtual disk However this is conceptual In actuality each AMPhas four physical disks Since only the AMP illustrated can get to its information we like to explain thisconcept as a single virtual disk This concept is called a Shared Nothing environment However we can stillkeep the shared nothing environment and have four physical disks With that only the AMP actually owningthe virtual disk can access its four disks

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5158

In the picture above one AMP has one Virtual Disk but it also has four physical disks Plus each disk has amirror in case of the loss of a disk The four disks together form a Rank of Disks Two disks in a rank may belost so long as they are not comprised of a data disk and its mirror In this example the data from theBest_Friends table is displayed It is on the first disk and there is a set of mirrored the information on the

second disk If a disk goes down the system does not even flinch It sends the operations personnel a messageabout failure and keeps on running

Cliques

In high school you can walk into the cafeteria and immediately identify the cliques (pronounced clicks) Inother words they are groups of students that hang around together because they have formed a common identityand a common bond The cliques in Teradata are similar to yet different from high school cliques

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5258

CLIQUES (pronounced cleeks) in Teradata are a method of system protection against the failure of an entirenode Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks butare also with a dotted line to each others disks This shared disk arrangement forms a CLIQUE If a node failsthen its virtual processors (AMPs and PEPs) migrate to other nodes in its CLIQUE like birds flying south inwinter The receiving node now has twice as many VPROCs so its performance slows down The importantfactor is that the migrated VPROCs can still access their own disks and business continues until the failed nodeis repaired or replaced

The picture above shows two nodes A node can be thought of as a powerful PC with four Intel ProcessorsAMPs and PEs reside inside the nodes memory and there are about 10-16 AMPs per node and two-to-threePEs per node This configuration is a two-node 32 AMP system

Lets focus on AMP16 in node one and AMP 17 in node two (look at the arrows) AMP 16 has its own virtualdisk and similarly AMP 17 has its own virtual disk Remember no other AMP is allowed in another AMPsvirtual disk

What if an entire node is lost Well then AMPs 1-16 cannot access any disks To prevent this lets create aclique in our next picture The idea of a clique is to connect both nodes to one anothers disks That way if

either node goes down the AMPs can migrate over the BYNET and join the other 16 nodes in memoryHowever each AMP will still have a connection to the original virtual disks

In the illustration above cables have been added If node one or node two goes down the AMPs can migrate tothe other node and still have access their own disks The only difference is that the migrating AMPs now residein memory on different node plus they are accessing their own virtual disk via a different physical cable

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirdsDo you know what bird migrates farther than any other bird on the planet It is the Arctic tern This bird leaves

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5358

its Arctic Circle home in August for its winter vacation home in Antarctica ndash a round trip of more than 11000miles

In the same way when a node goes down the software AMPs and PEs migrate over the Bynet to a temporaryhome on another node

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection then it would have five millionFALLBACK rows However this would be quite costly because FALLBACK actually stores a duplicate copyof all the rows on other AMPs within the same cluster FALLBACK is used either because the system ismission critical or the system is not backed up regularly For customers who backup data regularly another option for data restoration is the Permanent Journal When a company is not severely impacted by a couple ofhours for a restoration to be completed this is a very good option The Permanent Journal works in conjunctionwith backup procedures plus its a lot more cost effective than FALLBACK

The Permanent Journal stores only images of rows that have been changed due to an INSERT UPDATE or DELETE command It keeps track of all new deleted or modified data since the last Permanent Journal backupThis option is usually less expensive than storing the additional five million FALLBACK rows

Like FALLBACK the Permanent Journal is optional It may be used on specific tables of your choosing or onno tables at all It provides the flexibility to customize a Journal to meet specific needs The Permanent Journalmust be manually purged from time to time

There are four image options for the Permanent Journal

1 The BEFORE JOURNAL stores an image of a table row before it changes It is used to perform amanual rollback to a specific point in time should there be a programming error

2 The AFTER JOURNAL stores an image of a table row after it changes It is used to manually rollforward from a specific point in time

3 A DUAL BEFORE JOURNAL captures two images of a table row before it changes This type of journal stores the duplicate images on two different AMPs

4 A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores thoseimages on two different AMPs

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5458

To explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
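The two scenarios above can be sketched as a toy model. This is only an illustration of the idea of BEFORE and AFTER images, not Teradata's journal format or recovery utilities; the class JournaledTable and the function roll_forward are hypothetical names invented for this sketch.

```python
class JournaledTable:
    """Toy table that records a before image and an after image per change."""

    def __init__(self, rows):
        self.rows = dict(rows)   # primary key -> row value
        self.journal = []        # entries: (key, before_image, after_image)

    def write(self, key, new_value):
        """INSERT or UPDATE when new_value is given, DELETE when it is None."""
        before = self.rows.get(key)          # None means the row did not exist
        self.journal.append((key, before, new_value))
        if new_value is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = new_value

    def checkpoint(self):
        """Mark a point in time that a later rollback can return to."""
        return len(self.journal)

    def roll_back(self, checkpoint):
        """Undo every change after the checkpoint, newest first (BEFORE images)."""
        while len(self.journal) > checkpoint:
            key, before, _ = self.journal.pop()
            if before is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before


def roll_forward(backup_rows, journal):
    """Rebuild current data from a backup plus AFTER images, oldest first."""
    rows = dict(backup_rows)
    for key, _, after in journal:
        if after is None:
            rows.pop(key, None)
        else:
            rows[key] = after
    return rows
```

In this sketch, a faulty mass update is undone by calling roll_back with a checkpoint taken before the update, while roll_forward applied to the monthly backup and the journal entries written since then reproduces the data as it stood just before the failure.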

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
(
    emp       INTEGER,
    dept      INTEGER,
    lname     CHAR(20),
    fname     VARCHAR(20),
    salary    DECIMAL(10,2),
    hire_date DATE FORMAT
)
UNIQUE PRIMARY INDEX (emp);

The example above creates the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other EXCLUSIVE, WRITE, or READ lock can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a stale or dirty read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
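The four rules above amount to a compatibility table. The following is a minimal sketch of that table as a lookup; the names COMPATIBLE and can_grant are hypothetical, and the model ignores queuing order and lock levels (database, table, row).

```python
# For each requested lock mode, the set of already-held modes it can coexist with,
# following the four rules described in the text.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held):
    """Return True if `requested` can be granted while every lock in `held` is active."""
    return all(mode in COMPATIBLE[requested] for mode in held)
```

For example, can_grant("READ", ["READ", "READ"]) is true because any number of readers may share a table, while can_grant("WRITE", ["READ"]) is false because the writer has to wait for the reader to finish.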

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood!" "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired … it's more like getting fired up for a Bahamas vacation! Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table that has RI violations, the ALTER TABLE will still proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
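The employee and payroll example can be sketched as a toy model of the two RI rules. This is an illustration only, not how Teradata enforces RI; the class Warehouse, its methods, and ReferentialIntegrityError are hypothetical names.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would leave a payroll row without an employee."""


class Warehouse:
    def __init__(self):
        self.employee = set()   # referenced (parent) table: employee numbers
        self.payroll = {}       # referencing (child) table: employee number -> salary

    def insert_employee(self, emp):
        self.employee.add(emp)

    def insert_payroll(self, emp, salary):
        # Rule 1: the value must already exist in the referenced table.
        if emp not in self.employee:
            raise ReferentialIntegrityError(f"employee {emp} does not exist")
        self.payroll[emp] = salary

    def delete_payroll(self, emp):
        self.payroll.pop(emp, None)

    def delete_employee(self, emp):
        # Rule 2: the matching payroll row must be removed first.
        if emp in self.payroll:
            raise ReferentialIntegrityError(f"employee {emp} is still on the payroll")
        self.employee.discard(emp)
```

With this model, deleting an employee who is still on the payroll raises an error, so the "fired but still paid" oversight cannot happen; the payroll row has to go first.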

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on: ASCII stupid question and get a stupid ANSI.

Seriously, though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced bodybuilder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now, the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to proceed: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sandbags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
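The three steps just described (hand out blocks, hash and redistribute, sort) can be sketched as a toy simulation. The assumptions are loud ones: MD5 stands in for Teradata's own row-hash algorithm, the first column of each row is treated as the primary index, blocks hold a few rows rather than 64K, and the names amp_for and fastload are hypothetical.

```python
import hashlib


def amp_for(primary_index_value, amp_count):
    """Return (owning AMP number, row hash) for a primary index value."""
    digest = hashlib.md5(str(primary_index_value).encode()).digest()
    row_hash = int.from_bytes(digest[:4], "big")   # stand-in for the real row hash
    return row_hash % amp_count, row_hash


def fastload(rows, amp_count, block_size=3):
    # Step 1: hand blocks of rows to the AMPs in turn, without looking at the data.
    received = [[] for _ in range(amp_count)]
    for start in range(0, len(rows), block_size):
        target = (start // block_size) % amp_count
        received[target].extend(rows[start:start + block_size])

    # Step 2: each AMP hashes the rows it received and sends each row to its owner.
    owned = [[] for _ in range(amp_count)]
    for amp_rows in received:
        for row in amp_rows:
            owner, row_hash = amp_for(row[0], amp_count)
            owned[owner].append((row_hash, row))

    # Step 3: each AMP sorts the rows it owns by row hash.
    for amp_rows in owned:
        amp_rows.sort(key=lambda pair: pair[0])
    return owned
```

The point of the two-stage design is that the first pass needs no knowledge of the data, so the file can be pushed onto the system as fast as it can be read, and the hashing work is then shared by all AMPs at once.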

Fastload Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support multi-set tables
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind that many systems have hundreds to thousands of AMPs. The load takes place continually in parallel as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no unique secondary indexes, referential integrity, or triggers
• It doesn't support multi-set tables
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to load immediately into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.
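The faucet idea, a load rate preset for different times of day, can be sketched as a simple schedule lookup. The rates and hours below are made-up illustrative numbers, and SCHEDULE, rate_at, and rows_loaded_in_day are hypothetical names rather than Tpump settings.

```python
# Hypothetical throttle schedule: (first hour it applies, rows per minute).
SCHEDULE = [
    (0, 50_000),   # overnight: full throttle
    (8, 500),      # business hours: trickle
    (18, 5_000),   # evening: moderate
]


def rate_at(hour, schedule=SCHEDULE):
    """Return the configured load rate for an hour of the day (0-23)."""
    rate = schedule[0][1]
    for start, rows_per_minute in schedule:
        if hour >= start:
            rate = rows_per_minute
    return rate


def rows_loaded_in_day(schedule=SCHEDULE):
    """Total rows loaded over 24 hours if the schedule is followed exactly."""
    return sum(rate_at(hour, schedule) * 60 for hour in range(24))
```

Changing one entry in the schedule is all it takes to open or close the faucet for part of the day, which is the flexibility the paragraph above describes.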

Tpump Basics

• Loads data to Teradata from a mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support multi-set tables
• It locks at the row level

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN

In the picture above, one AMP has one Virtual Disk, but it also has four physical disks. Plus, each disk has a mirror in case of the loss of a disk. The four disks together form a Rank of Disks. Two disks in a rank may be lost, as long as they are not a data disk and its own mirror. In this example, the data from the Best_Friends table is displayed. It is on the first disk, and a mirrored copy of the information is on the second disk. If a disk goes down, the system does not even flinch. It sends the operations personnel a message about the failure and keeps on running.

Cliques

In high school, you can walk into the cafeteria and immediately identify the cliques (pronounced clicks). In other words, they are groups of students that hang around together because they have formed a common identity and a common bond. The cliques in Teradata are similar to, yet different from, high school cliques.

CLIQUES (pronounced cleeks) in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, then its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk, and similarly AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, plus they are accessing their own virtual disk via a different physical cable.
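The migration just described can be sketched as a small function over the two-node, 32-AMP example. This is a toy model: the function name migrate_on_failure is hypothetical, and spreading the migrating AMPs round-robin over the surviving nodes is an assumption of the sketch, not a documented Teradata policy.

```python
def migrate_on_failure(nodes, failed):
    """Return the node -> AMP list mapping after `failed` goes down.

    `nodes` maps a node name to the AMP numbers running in its memory.
    Each AMP keeps its own number, and therefore its own virtual disk;
    only the node it runs on changes.
    """
    survivors = sorted(name for name in nodes if name != failed)
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    after = {name: list(nodes[name]) for name in survivors}
    for i, amp in enumerate(nodes[failed]):
        after[survivors[i % len(survivors)]].append(amp)
    return after


# Two-node, 32-AMP system from the text: AMPs 1-16 on node one, 17-32 on node two.
clique = {"node1": list(range(1, 17)), "node2": list(range(17, 33))}
after_failure = migrate_on_failure(clique, "node1")
```

After the call, node two carries all 32 AMPs, twice its normal load, which is why performance slows down until node one is repaired.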

People who come from the colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection then it would have five millionFALLBACK rows However this would be quite costly because FALLBACK actually stores a duplicate copyof all the rows on other AMPs within the same cluster FALLBACK is used either because the system ismission critical or the system is not backed up regularly For customers who backup data regularly another option for data restoration is the Permanent Journal When a company is not severely impacted by a couple ofhours for a restoration to be completed this is a very good option The Permanent Journal works in conjunctionwith backup procedures plus its a lot more cost effective than FALLBACK

The Permanent Journal stores only images of rows that have been changed due to an INSERT UPDATE or DELETE command It keeps track of all new deleted or modified data since the last Permanent Journal backupThis option is usually less expensive than storing the additional five million FALLBACK rows

Like FALLBACK the Permanent Journal is optional It may be used on specific tables of your choosing or onno tables at all It provides the flexibility to customize a Journal to meet specific needs The Permanent Journalmust be manually purged from time to time

There are four image options for the Permanent Journal

1 The BEFORE JOURNAL stores an image of a table row before it changes It is used to perform amanual rollback to a specific point in time should there be a programming error

2 The AFTER JOURNAL stores an image of a table row after it changes It is used to manually rollforward from a specific point in time

3 A DUAL BEFORE JOURNAL captures two images of a table row before it changes This type of journal stores the duplicate images on two different AMPs

4 A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores thoseimages on two different AMPs

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5458

In order to explain journaling lets say that the Customer Representative table is created with a BEFOREJournal After its created a programmer is told to move every Customer Representative from the WesternRegion to the newly designated Southwest Region However every representative from every region isaccidentally transferred to the Southeast Region Because there is a BEFORE Journal a programmer has theability to manually rollback the data to the specific point in time BEFORE this update occurred Note that thiswas not a transaction failure The update was successful but it was not accurate The BEFORE Journal saves theday

The AFTER JOURNAL works in the opposite way In this scenario company officials decided not to useFALLBACK on any tables The data was not mission-critical and it could be restored from backup tapes if necessary A FULL SYSTEM BACKUP takes place on the first day of each month Plus an AFTER JOURNAL has been placed on all the tables in the system Every time a new row is added or a change is madeto an existing row Teradata captures the AFTER image Suppose a hardware failure occurs on the 5th day of themonth and data is lost

To recover the data the hardware problem should be fixed and then the data should be reloaded from the FULLSYSTEM BACKUP done on the 1st of the month The AFTER JOURNAL is then used to capture thetransactions that either added or modified data between the 1st and 5th day of the month As you can see an

AFTER JOURNAL is used to roll forward and is usually done to restore data lost as a result of a hardware

problem

The following example shows the use of FALLBACK and the PERMANENT JOURNAL

CREATE TABLE TomCemployee FALLBACK BEFORE JOURNAL DUAL AFTER JOURNAL (emp INTEGER

dept INTEGER

lname CHAR(20)fname VARCHAR(20)salary DECIMAL(102)

hire_date DATE FORMAT)

UNIQUE PRIMARY INDEX(emp)

The example above created the table called Employee in the TomC database and is FALLBACK protected ABEFORE Journal and a DUAL AFTER Journal are specified Remember that both FALLBACK andJOURNALING have defaults of NO ndash meaning if you dont specify this protection at either the table or database level the default is NO FALLBACK and NO JOURNALING

Locking Modes in Teradata

You just obey instructions well take care of the obstructions

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused Not very experienced in landing by instrument he began to panic thinking of the hills trees and buildings below But the local air traffic controller commanded him You just obey instructions well take care of theobstructions Many database systems can become confused when the number of users begins to grow But likea master air traffic controller Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5558

Teradata allows hundreds even thousands of users to access the data warehouse concurrently However therewould be a lot of confusion about which user had access to a table first if it were not for the LOCKINGMODES No one likes to be waiting for a long time in a line only to have someone cut in front of him or herTeradata uses LOCKS to help maintain data integrity Locks are activated on the targeted database table or row while the SQL request is executed Those locks are released upon query completion

There are four modes of locking

1 The EXCLUSIVE LOCK is the mother of all locks Its placed only on databases or tables and restrictsaccess to then whenever a structural change is made EXCLUSIVE LOCKing reminds me of whathappens when there is a structural change being made to a parking garage A construction company willwrap what seems like thousands of yards of bright orange plastic fencing around the garage in order tokeep people out and protecting them from falling debris To this day I have not seen a database or tablefall on top of a user The EXCLUSIVE LOCK prevents any access period This lock is placed on atable or database

2 The WRITE LOCK jumps to action whenever a user asks for an INSERT DELETE or UPDATE Keepin mind these commands are writing actions No other Exclusive Write or Read locks can cut in lineahead of an existing WRITE LOCK The only exception is an ACCESS LOCK ndash one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed

This kind of read is called a stale or dirty read3 Everybody loves the READ LOCK Its placed whenever the SELECT command is used With a READ

LOCK a thousand users can simultaneously SELECT from a table A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue

4 When a user is not concerned with precisely accurate data he or she may request an ACCESS LOCKThis lock can jump in line ahead of either a READ or WRITE LOCK but not an EXCLUSIVE LOCK

Referential Integrity

Just how important is it to protect the integrity of your data This story says it all After reading anadvertisement offering split dry firewood for $60 a cord (including delivery) Jeff decided to place a phone

order Upon delivery Jeff was upset when the deliveryman finished stacking the wood Jeff objected Thatsnot a full cord of wood Well thats what I call a cord the man answered firmly Grudgingly Jeff pulledsome money out of his pocket and thrust it into the mans hands Hey just a minute the man said after counting the money You only gave me $30 Jeff shrugged his shoulders and replied Well thats what I call$60

Imagine getting fired from your job and the company deletes you from its employee table but forgets to deleteyou from the payroll table Thats not like getting fired hellip its more like getting fired up for a Bahamas vacationReferential Integrity would have stopped this oversight RI as it is called would not allow anyone to be deletedfrom the employee table unless he or she was also deleted from the payroll table

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into atable if it does not contain a column value that also exists in another table within the database Conversely arow with a corresponding value in another table may not be deleted unless the common value is first removedfrom the former table

An important function of RI on a newly created table is that it will not allow invalid data values to be enteredinto a column If RI is enforced on an existing table with RI violations the ALTER TABLE will proceed Plusit will copy and store the table and any related RI violations for review and correction Then the user will needto locate the table copy and then make corrections to the original table

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5658

Loading the Data

Overview

One night I said to my son When Abraham Lincoln was your age he studied by candlelight My sonretorted When Abraham Lincoln was your age he was president

Just as Lincoln will go down as one of the greatest presidents in history Teradata will not go down when itloads history Data within a warehouse environment is often historic in nature so the sheer volume of data canoverwhelm many systems But not Teradata

Teradata is so far ahead of the data loading game that other database vendors cant hold a candle to it A datawarehouse brings enormous amounts of data into the system This is an area that most companies overlook when purchasing a data warehouse Most company officials think loading data is simply thatndashjust loading dataSome people actually ask Are data loads that critical Come on ASCII stupid question and get a stupidANSI

Seriously though there are data warehouses in existence today that merely cant load data once it reaches a

certain volume As one Teradata developer said It is not the load that brings them down but the way theycarry it Even an experienced body builder must use a good technique to lift the weight over his head Whilemost database vendors are new to the game Teradata has had 15 years of practice loading the largest datawarehouses in the world Now the combination of Fastload Multiload and Tpump can load millions even billions of records in record time

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table Thisis how a Teradata table is populated the first time I have personally seen Teradata load over one billion largerows in less than 6 hours Plus I have seen Teradata load millions of rows in minutes Teradata has the quickesttime to solution and has the most powerful performance in the data warehousing industry How is Teradatasspeed and performance accomplished Its done through parallel processing

Fastload understands one SQL command - INSERT It inserts rows into an empty table The process is asfollows A flat file is prepared for loading on a mainframe or LAN The FASTLOAD utility needs three piecesof information to process where the flat file located what is its file definition and what table the data should beloaded into in Teradata

When the Fastload utility starts the Parsing Engine comes up with a plan for the AMPs The Parsing Enginethen steps back and lets the AMPs do their work The data is loaded in large 64K blocks Each AMP is given a64K block of rows for loading Like a line of workers trying to pass sand bags to prevent a flood Teradata

passes these blocks from AMP to AMP until all the data is on Teradata Next all AMPs take the blocks theyreceived hash the rows in those blocks (in parallel) and send the rows to the proper AMP over the BYNETOnce this is done each AMP sorts its data by Row ID and the table is ready for business

Fastload Basics

bull Loads data to Teradata from a Mainframe or LAN flat filebull Only one table may be loaded at a timebull The table to be loaded must be emptybull There can be no secondary indexes referential integrity or triggers

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5758

bull It doesnt support Multi-set tables andbull It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS Multiload is meant to process INSERTSUPDATES and DELETES on tables that have existing data Multiload is extremely fast One major Teradatadata warehouse company processes 120 million inserts updates and deletes during its nightly batch

CLIQUES (pronounced "cleeks") in Teradata are a method of system protection against the failure of an entire node. Multiple processing nodes (SMPs) are not only connected with an unbroken line to their own disks, but are also connected with a dotted line to each other's disks. This shared disk arrangement forms a CLIQUE. If a node fails, its virtual processors (AMPs and PEs) migrate to other nodes in its CLIQUE, like birds flying south in winter. The receiving node now has twice as many VPROCs, so its performance slows down. The important factor is that the migrated VPROCs can still access their own disks, and business continues until the failed node is repaired or replaced.

The picture above shows two nodes. A node can be thought of as a powerful PC with four Intel processors. AMPs and PEs reside inside the node's memory, and there are about 10-16 AMPs per node and two to three PEs per node. This configuration is a two-node, 32-AMP system.

Let's focus on AMP 16 in node one and AMP 17 in node two (look at the arrows). AMP 16 has its own virtual disk and, similarly, AMP 17 has its own virtual disk. Remember, no other AMP is allowed in another AMP's virtual disk.

What if an entire node is lost? Well, then AMPs 1-16 cannot access any disks. To prevent this, let's create a clique in our next picture. The idea of a clique is to connect both nodes to one another's disks. That way, if either node goes down, its AMPs can migrate over the BYNET and join the other 16 AMPs in memory. However, each AMP will still have a connection to its original virtual disk.

In the illustration above, cables have been added. If node one or node two goes down, the AMPs can migrate to the other node and still have access to their own disks. The only difference is that the migrating AMPs now reside in memory on a different node, and they access their own virtual disks via a different physical cable.

People who come from colder climates to spend their winters in sunny Florida are often called snowbirds. Do you know what bird migrates farther than any other bird on the planet? It is the Arctic tern. This bird leaves its Arctic Circle home in August for its winter vacation home in Antarctica, a round trip of more than 11,000 miles.

In the same way, when a node goes down, the software AMPs and PEs migrate over the BYNET to a temporary home on another node.
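The migration described above can be pictured with a small Python model. This is only an illustrative sketch, not Teradata code: the `Node` class, the `fail_node` function, and the round-robin placement of migrating AMPs are all assumptions made for the example. The point it shows is that the AMPs change nodes while the AMP-to-virtual-disk assignment stays the same.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of one node in a clique."""
    name: str
    amps: list = field(default_factory=list)  # AMPs running in this node's memory
    up: bool = True


def fail_node(clique, failed_name):
    """Mark a node as down and move its AMPs to the surviving clique nodes."""
    failed = next(n for n in clique if n.name == failed_name)
    survivors = [n for n in clique if n.up and n.name != failed_name]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    failed.up = False
    for i, amp in enumerate(failed.amps):
        # Round-robin placement is an assumption of this sketch.
        survivors[i % len(survivors)].amps.append(amp)
    failed.amps = []


# Two-node, 32-AMP system as in the text: AMPs 1-16 and AMPs 17-32.
node1 = Node("node1", [f"AMP{i}" for i in range(1, 17)])
node2 = Node("node2", [f"AMP{i}" for i in range(17, 33)])
clique = [node1, node2]

# Each AMP owns exactly one virtual disk; this mapping never changes.
vdisk_of = {f"AMP{i}": f"vdisk{i}" for i in range(1, 33)}

fail_node(clique, "node1")
print(len(node2.amps))    # node two now hosts all 32 AMPs
print(vdisk_of["AMP16"])  # AMP 16 still uses its own virtual disk
```

After the simulated failure, node two carries twice its normal number of AMPs, which is why the text warns that performance slows down until the failed node is repaired.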

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection, it would also have five million FALLBACK rows. This can be quite costly, because FALLBACK stores a duplicate copy of every row on another AMP within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by a restoration that takes a couple of hours to complete, this is a very good option. The Permanent Journal works in conjunction with backup procedures, and it is a lot more cost effective than FALLBACK.

The Permanent Journal stores only images of rows that have been changed by an INSERT, UPDATE, or DELETE command. It keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows.

Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time.

There are four image options for the Permanent Journal:

1. The BEFORE JOURNAL stores an image of a table row before it changes. It is used to perform a manual rollback to a specific point in time should there be a programming error.

2. The AFTER JOURNAL stores an image of a table row after it changes. It is used to manually roll forward from a specific point in time.

3. A DUAL BEFORE JOURNAL captures two images of a table row before it changes. This type of journal stores the duplicate images on two different AMPs.

4. A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores those images on two different AMPs.

To explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. In addition, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
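The two recovery directions can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Teradata stores journals: the table is a plain dictionary, the `update`, `roll_back`, and `roll_forward` functions are hypothetical names, and only updates and inserts are modeled (deletes are left out to keep the sketch short).

```python
# Toy table: representative id -> region.
table = {1: "Western", 2: "Eastern"}

before_journal = []  # (key, image of the row BEFORE the change)
after_journal = []   # (key, image of the row AFTER the change)


def update(key, new_value):
    """Apply a change and record both journal images."""
    before_journal.append((key, table.get(key)))  # None means the row was new
    table[key] = new_value
    after_journal.append((key, new_value))


def roll_back(current, journal):
    """Undo changes by replaying BEFORE images, newest first."""
    for key, old_value in reversed(journal):
        if old_value is None:
            current.pop(key, None)   # the row did not exist before the change
        else:
            current[key] = old_value


def roll_forward(backup, journal):
    """Rebuild current data from a backup plus AFTER images, oldest first."""
    restored = dict(backup)
    for key, new_value in journal:
        restored[key] = new_value
    return restored


full_backup = dict(table)            # the backup taken on the 1st

# The faulty update: every representative lands in the Southeast Region.
update(1, "Southeast")
update(2, "Southeast")

# Hardware failure scenario: backup + AFTER journal reproduces the current data.
recovered = roll_forward(full_backup, after_journal)

# Programming error scenario: BEFORE journal returns the table to its old state.
roll_back(table, before_journal)
print(recovered)
print(table)
```

The BEFORE images are replayed newest first and the AFTER images oldest first, which is what makes one a rollback and the other a roll forward.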

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
(
    emp        INTEGER,
    dept       INTEGER,
    lname      CHAR(20),
    fname      VARCHAR(20),
    salary     DECIMAL(10,2),
    hire_date  DATE FORMAT
)
UNIQUE PRIMARY INDEX (emp);

The example above creates the table called Employee in the TomC database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.

Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait for a long time in a line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables and restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when there is a structural change being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an EXCLUSIVE or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
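The four rules above amount to a compatibility table. The Python sketch below encodes that table exactly as the list describes it; the names `COMPATIBLE` and `can_grant` are made up for this illustration, and the sketch ignores queueing order and lock levels (database, table, row).

```python
# For each requested lock, the set of already-held locks it can coexist with,
# taken from the four rules in the list above.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},  # blocked only by EXCLUSIVE
    "READ":      {"ACCESS", "READ"},           # many readers may share
    "WRITE":     {"ACCESS"},                   # only dirty readers get past
    "EXCLUSIVE": set(),                        # nothing else may be held
}


def can_grant(requested, held):
    """Return True if `requested` can be granted while `held` locks exist."""
    return all(lock in COMPATIBLE[requested] for lock in held)


print(can_grant("READ", ["READ", "READ"]))   # True: readers share the table
print(can_grant("WRITE", ["READ"]))          # False: the writer must wait
print(can_grant("ACCESS", ["WRITE"]))        # True: a dirty read is allowed
print(can_grant("ACCESS", ["EXCLUSIVE"]))    # False: exclusive blocks everyone
```

Reading the table row by row shows why the ACCESS LOCK is the most permissive request and the EXCLUSIVE LOCK the least.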

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood." "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired; it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row whose value is referenced by a row in another table may not be deleted until the referencing row has been removed.
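Both halves of that rule can be shown with the employee and payroll example from the story. The Python below is a minimal sketch of the concept, not Teradata's implementation: the `employee` and `payroll` structures and the function names are hypothetical, and the payroll table is assumed to reference the employee table.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would break the employee/payroll relationship."""


employee = {101, 102}   # parent table: employee numbers
payroll = {}            # child table: payroll id -> employee number


def insert_payroll(payroll_id, emp):
    """Reject a payroll row whose employee number has no parent row."""
    if emp not in employee:
        raise ReferentialIntegrityError(f"employee {emp} does not exist")
    payroll[payroll_id] = emp


def delete_employee(emp):
    """Reject the delete while any payroll row still references the employee."""
    if emp in payroll.values():
        raise ReferentialIntegrityError(f"employee {emp} is still on the payroll")
    employee.discard(emp)


insert_payroll(1, 101)            # allowed: employee 101 exists

try:
    insert_payroll(2, 999)        # rejected: no such employee
except ReferentialIntegrityError as err:
    print(err)

try:
    delete_employee(101)          # rejected: payroll row 1 still points to 101
except ReferentialIntegrityError as err:
    print(err)

del payroll[1]                    # remove the referencing row first
delete_employee(101)              # now the delete succeeds
```

The order of the last two statements is the whole point: the referencing payroll row has to go before the employee row can.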

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. In addition, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.

Loading the Data

Overview

One night I said to my son, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."

Just as Lincoln will go down as one of the greatest presidents in history, Teradata will not go down when it loads history. Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata.

Teradata is so far ahead of the data loading game that other database vendors can't hold a candle to it. A data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI.

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use a good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. I have also seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How are Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FASTLOAD utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
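The hash-and-redistribute step can be illustrated with a short Python sketch. Everything here is an assumption made for the example: MD5 stands in for Teradata's own hashing algorithm (which is different), `row_hash`, `amp_for`, and `distribute` are hypothetical names, and sorting by the hash value stands in for the sort by Row ID.

```python
import hashlib


def row_hash(primary_index_value):
    """Stand-in row hash: the first 4 bytes of an MD5 digest as an integer."""
    digest = hashlib.md5(str(primary_index_value).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def amp_for(primary_index_value, num_amps):
    """The same primary index value always maps to the same AMP."""
    return row_hash(primary_index_value) % num_amps


def distribute(rows, num_amps, key):
    """Send each row to its owning AMP, then sort every AMP's rows by hash."""
    amps = [[] for _ in range(num_amps)]
    for row in rows:
        amps[amp_for(row[key], num_amps)].append(row)
    for amp_rows in amps:
        amp_rows.sort(key=lambda r: row_hash(r[key]))  # stand-in for Row ID order
    return amps


rows = [{"emp": n, "dept": n % 5} for n in range(1, 1001)]
amps = distribute(rows, num_amps=4, key="emp")

for number, amp_rows in enumerate(amps):
    print(f"AMP {number}: {len(amp_rows)} rows")
```

Because the owning AMP is computed from the primary index value alone, any AMP can work out where a row belongs without asking the others, which is what lets the redistribution run in parallel.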

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind that many systems have hundreds to thousands of AMPs. The load takes place continually, in parallel, as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be preset to load at different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables
• It locks at the row level

Conclusion: A Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance (only one DBA needed)
• Ability to load data at lightning speeds from a mainframe or LAN

Page 53: 143109593 Tom Coffing TD Basics

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5358

its Arctic Circle home in August for its winter vacation home in Antarctica ndash a round trip of more than 11000miles

In the same way when a node goes down the software AMPs and PEs migrate over the Bynet to a temporaryhome on another node

Permanent Journal

The absent are always in the wrong

English Proverb

If a system had five million rows and used FALLBACK protection then it would have five millionFALLBACK rows However this would be quite costly because FALLBACK actually stores a duplicate copyof all the rows on other AMPs within the same cluster FALLBACK is used either because the system ismission critical or the system is not backed up regularly For customers who backup data regularly another option for data restoration is the Permanent Journal When a company is not severely impacted by a couple ofhours for a restoration to be completed this is a very good option The Permanent Journal works in conjunctionwith backup procedures plus its a lot more cost effective than FALLBACK

The Permanent Journal stores only images of rows that have been changed due to an INSERT UPDATE or DELETE command It keeps track of all new deleted or modified data since the last Permanent Journal backupThis option is usually less expensive than storing the additional five million FALLBACK rows

Like FALLBACK the Permanent Journal is optional It may be used on specific tables of your choosing or onno tables at all It provides the flexibility to customize a Journal to meet specific needs The Permanent Journalmust be manually purged from time to time

There are four image options for the Permanent Journal

1 The BEFORE JOURNAL stores an image of a table row before it changes It is used to perform amanual rollback to a specific point in time should there be a programming error

2 The AFTER JOURNAL stores an image of a table row after it changes It is used to manually rollforward from a specific point in time

3 A DUAL BEFORE JOURNAL captures two images of a table row before it changes This type of journal stores the duplicate images on two different AMPs

4 A DUAL AFTER JOURNAL captures two images of a table row after it changes and stores thoseimages on two different AMPs

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5458

In order to explain journaling lets say that the Customer Representative table is created with a BEFOREJournal After its created a programmer is told to move every Customer Representative from the WesternRegion to the newly designated Southwest Region However every representative from every region isaccidentally transferred to the Southeast Region Because there is a BEFORE Journal a programmer has theability to manually rollback the data to the specific point in time BEFORE this update occurred Note that thiswas not a transaction failure The update was successful but it was not accurate The BEFORE Journal saves theday

The AFTER JOURNAL works in the opposite way In this scenario company officials decided not to useFALLBACK on any tables The data was not mission-critical and it could be restored from backup tapes if necessary A FULL SYSTEM BACKUP takes place on the first day of each month Plus an AFTER JOURNAL has been placed on all the tables in the system Every time a new row is added or a change is madeto an existing row Teradata captures the AFTER image Suppose a hardware failure occurs on the 5th day of themonth and data is lost

To recover the data the hardware problem should be fixed and then the data should be reloaded from the FULLSYSTEM BACKUP done on the 1st of the month The AFTER JOURNAL is then used to capture thetransactions that either added or modified data between the 1st and 5th day of the month As you can see an

AFTER JOURNAL is used to roll forward and is usually done to restore data lost as a result of a hardware

problem

The following example shows the use of FALLBACK and the PERMANENT JOURNAL

CREATE TABLE TomCemployee FALLBACK BEFORE JOURNAL DUAL AFTER JOURNAL (emp INTEGER

dept INTEGER

lname CHAR(20)fname VARCHAR(20)salary DECIMAL(102)

hire_date DATE FORMAT)

UNIQUE PRIMARY INDEX(emp)

The example above created the table called Employee in the TomC database and is FALLBACK protected ABEFORE Journal and a DUAL AFTER Journal are specified Remember that both FALLBACK andJOURNALING have defaults of NO ndash meaning if you dont specify this protection at either the table or database level the default is NO FALLBACK and NO JOURNALING

Locking Modes in Teradata

You just obey instructions well take care of the obstructions

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused Not very experienced in landing by instrument he began to panic thinking of the hills trees and buildings below But the local air traffic controller commanded him You just obey instructions well take care of theobstructions Many database systems can become confused when the number of users begins to grow But likea master air traffic controller Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5558

Teradata allows hundreds even thousands of users to access the data warehouse concurrently However therewould be a lot of confusion about which user had access to a table first if it were not for the LOCKINGMODES No one likes to be waiting for a long time in a line only to have someone cut in front of him or herTeradata uses LOCKS to help maintain data integrity Locks are activated on the targeted database table or row while the SQL request is executed Those locks are released upon query completion

There are four modes of locking

1 The EXCLUSIVE LOCK is the mother of all locks Its placed only on databases or tables and restrictsaccess to then whenever a structural change is made EXCLUSIVE LOCKing reminds me of whathappens when there is a structural change being made to a parking garage A construction company willwrap what seems like thousands of yards of bright orange plastic fencing around the garage in order tokeep people out and protecting them from falling debris To this day I have not seen a database or tablefall on top of a user The EXCLUSIVE LOCK prevents any access period This lock is placed on atable or database

2 The WRITE LOCK jumps to action whenever a user asks for an INSERT DELETE or UPDATE Keepin mind these commands are writing actions No other Exclusive Write or Read locks can cut in lineahead of an existing WRITE LOCK The only exception is an ACCESS LOCK ndash one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed

This kind of read is called a stale or dirty read3 Everybody loves the READ LOCK Its placed whenever the SELECT command is used With a READ

LOCK a thousand users can simultaneously SELECT from a table A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue

4 When a user is not concerned with precisely accurate data he or she may request an ACCESS LOCKThis lock can jump in line ahead of either a READ or WRITE LOCK but not an EXCLUSIVE LOCK

Referential Integrity

Just how important is it to protect the integrity of your data This story says it all After reading anadvertisement offering split dry firewood for $60 a cord (including delivery) Jeff decided to place a phone

order Upon delivery Jeff was upset when the deliveryman finished stacking the wood Jeff objected Thatsnot a full cord of wood Well thats what I call a cord the man answered firmly Grudgingly Jeff pulledsome money out of his pocket and thrust it into the mans hands Hey just a minute the man said after counting the money You only gave me $30 Jeff shrugged his shoulders and replied Well thats what I call$60

Imagine getting fired from your job and the company deletes you from its employee table but forgets to deleteyou from the payroll table Thats not like getting fired hellip its more like getting fired up for a Bahamas vacationReferential Integrity would have stopped this oversight RI as it is called would not allow anyone to be deletedfrom the employee table unless he or she was also deleted from the payroll table

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into atable if it does not contain a column value that also exists in another table within the database Conversely arow with a corresponding value in another table may not be deleted unless the common value is first removedfrom the former table

An important function of RI on a newly created table is that it will not allow invalid data values to be enteredinto a column If RI is enforced on an existing table with RI violations the ALTER TABLE will proceed Plusit will copy and store the table and any related RI violations for review and correction Then the user will needto locate the table copy and then make corrections to the original table

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5658

Loading the Data

Overview

One night I said to my son When Abraham Lincoln was your age he studied by candlelight My sonretorted When Abraham Lincoln was your age he was president

Just as Lincoln will go down as one of the greatest presidents in history Teradata will not go down when itloads history Data within a warehouse environment is often historic in nature so the sheer volume of data canoverwhelm many systems But not Teradata

Teradata is so far ahead of the data loading game that other database vendors cant hold a candle to it A datawarehouse brings enormous amounts of data into the system This is an area that most companies overlook when purchasing a data warehouse Most company officials think loading data is simply thatndashjust loading dataSome people actually ask Are data loads that critical Come on ASCII stupid question and get a stupidANSI

Seriously though there are data warehouses in existence today that merely cant load data once it reaches a

certain volume As one Teradata developer said It is not the load that brings them down but the way theycarry it Even an experienced body builder must use a good technique to lift the weight over his head Whilemost database vendors are new to the game Teradata has had 15 years of practice loading the largest datawarehouses in the world Now the combination of Fastload Multiload and Tpump can load millions even billions of records in record time

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table Thisis how a Teradata table is populated the first time I have personally seen Teradata load over one billion largerows in less than 6 hours Plus I have seen Teradata load millions of rows in minutes Teradata has the quickesttime to solution and has the most powerful performance in the data warehousing industry How is Teradatasspeed and performance accomplished Its done through parallel processing

Fastload understands one SQL command - INSERT It inserts rows into an empty table The process is asfollows A flat file is prepared for loading on a mainframe or LAN The FASTLOAD utility needs three piecesof information to process where the flat file located what is its file definition and what table the data should beloaded into in Teradata

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
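The three phases described above can be pictured with a small toy model. This is only a sketch in Python, not Fastload itself: the AMP count, the tiny block size, the CRC32 hash, and the names `chunk`, `owning_amp`, and `fastload` are all assumptions made for illustration.

```python
from itertools import islice
import zlib

NUM_AMPS = 4          # assumed AMP count for the toy model
ROWS_PER_BLOCK = 3    # stands in for a 64K block of rows


def chunk(rows, size):
    """Split the flat-file rows into fixed-size blocks."""
    it = iter(rows)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def owning_amp(primary_index_value):
    """Map a primary index value to an AMP; CRC32 is a stand-in hash."""
    return zlib.crc32(str(primary_index_value).encode()) % NUM_AMPS


def fastload(rows):
    # Phase 1: deal blocks out to the AMPs in turn, ignoring their content.
    received = [[] for _ in range(NUM_AMPS)]
    for i, block in enumerate(chunk(rows, ROWS_PER_BLOCK)):
        received[i % NUM_AMPS].extend(block)

    # Phase 2: each AMP hashes the rows it received and forwards every row
    # to the AMP that owns it.
    owned = [[] for _ in range(NUM_AMPS)]
    for amp_rows in received:
        for row in amp_rows:
            owned[owning_amp(row[0])].append(row)

    # Phase 3: each AMP sorts its own rows; (hash, key) stands in for Row ID.
    for amp_rows in owned:
        amp_rows.sort(key=lambda r: (zlib.crc32(str(r[0]).encode()), r[0]))
    return owned


rows = [(emp, f"name{emp}") for emp in range(1, 11)]
amps = fastload(rows)
for amp_number, amp_rows in enumerate(amps):
    print(amp_number, amp_rows)
```

The point of the model is that no single component ever handles all the rows: the blocks are dealt out first, and the hashing, forwarding, and sorting all happen per AMP.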

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTs, Multiload is meant to process INSERTs, UPDATEs, and DELETEs on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTs, UPDATEs, and DELETEs are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind that many systems have hundreds to thousands of AMPs. The load takes place continually, in parallel, as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.
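The final "apply" step, where a batch of INSERTs, UPDATEs, and DELETEs lands on a table that already holds data, can be sketched as follows. This is a toy Python model under stated assumptions, not Multiload syntax: the table is a dictionary keyed by primary index value, and the record layout `(operation, key, values)` and the function name `apply_batch` are invented for the example. Operations that do not match an existing row are simply counted as skipped here; how the real utility reports such rows is not covered by this sketch.

```python
def apply_batch(table, batch):
    """Apply (operation, key, values) records to a dict keyed by primary index value."""
    counts = {"INSERT": 0, "UPDATE": 0, "DELETE": 0, "SKIPPED": 0}
    for op, key, values in batch:
        if op == "INSERT" and key not in table:
            table[key] = dict(values)
        elif op == "UPDATE" and key in table:
            table[key].update(values)
        elif op == "DELETE" and key in table:
            del table[key]
        else:
            op = "SKIPPED"   # nothing to act on for this record
        counts[op] += 1
    return counts


# A populated table, as Multiload expects, and one nightly batch.
employee = {
    1: {"dept": 10, "salary": 50000},
    2: {"dept": 20, "salary": 60000},
}
batch = [
    ("INSERT", 3, {"dept": 10, "salary": 45000}),
    ("UPDATE", 1, {"salary": 52000}),
    ("DELETE", 2, None),
    ("UPDATE", 9, {"salary": 1}),   # no such row, so it is skipped
]
counts = apply_batch(employee, batch)
print(counts)
print(employee)
```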

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTs, UPDATEs, or DELETEs may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to load immediately into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours, or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be preset to load automatically at different levels at certain times during the day, and it can be modified at any time.

Also, Tpump locks at the row level, so users have access to the rest of the rows while the table is being loaded.
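The faucet idea, a load rate that depends on the time of day, can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the schedule format, the rates, and the function names `rate_for_hour` and `minutes_to_drain` are assumptions, not Tpump parameters.

```python
def rate_for_hour(hour, schedule, default_rate):
    """Return the statements-per-minute rate in force at the given hour (0-23)."""
    for start, end, rate in schedule:
        if start <= hour < end:
            return rate
    return default_rate


def minutes_to_drain(backlog, hour, schedule, default_rate):
    """Whole minutes needed to load `backlog` statements at the rate in force at `hour`."""
    rate = rate_for_hour(hour, schedule, default_rate)
    return -(-backlog // rate)   # ceiling division


# Hypothetical schedule: trickle during business hours, full throttle otherwise.
SCHEDULE = [(8, 18, 600)]        # 08:00 through 17:59: 600 statements per minute
FULL_THROTTLE = 60000            # assumed off-peak rate

for hour in (2, 10):
    rate = rate_for_hour(hour, SCHEDULE, FULL_THROTTLE)
    minutes = minutes_to_drain(1_000_000, hour, SCHEDULE, FULL_THROTTLE)
    print(f"hour {hour}: {rate} statements/minute, {minutes} minutes for 1,000,000 statements")
```

The same backlog that clears in minutes overnight would take more than a day at the daytime trickle rate, which is the trade-off the faucet lets an administrator choose.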

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTs, UPDATEs, or DELETEs
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables
• It locks at the row level

Conclusion - A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance - only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN


In order to explain journaling, let's say that the Customer Representative table is created with a BEFORE Journal. After it's created, a programmer is told to move every Customer Representative from the Western Region to the newly designated Southwest Region. However, every representative from every region is accidentally transferred to the Southeast Region. Because there is a BEFORE Journal, a programmer has the ability to manually roll back the data to the specific point in time BEFORE this update occurred. Note that this was not a transaction failure. The update was successful, but it was not accurate. The BEFORE Journal saves the day.

The AFTER JOURNAL works in the opposite way. In this scenario, company officials decided not to use FALLBACK on any tables. The data was not mission-critical, and it could be restored from backup tapes if necessary. A FULL SYSTEM BACKUP takes place on the first day of each month. Plus, an AFTER JOURNAL has been placed on all the tables in the system. Every time a new row is added or a change is made to an existing row, Teradata captures the AFTER image. Suppose a hardware failure occurs on the 5th day of the month and data is lost.

To recover the data, the hardware problem should be fixed, and then the data should be reloaded from the FULL SYSTEM BACKUP done on the 1st of the month. The AFTER JOURNAL is then used to reapply the transactions that either added or modified data between the 1st and 5th day of the month. As you can see, an AFTER JOURNAL is used to roll forward, and this is usually done to restore data lost as a result of a hardware problem.
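The two journal types can be contrasted with a small Python model. This is a sketch under simplifying assumptions, not how Teradata stores journals: the class `JournaledTable`, its methods, and the function `roll_forward` are hypothetical, rows are plain values keyed by name, and deletes are left out.

```python
import copy


class JournaledTable:
    """Toy table that records a BEFORE image and an AFTER image for every write."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.before_journal = []   # (key, row image before the change, or None)
        self.after_journal = []    # (key, row image after the change)

    def write(self, key, row):
        self.before_journal.append((key, copy.deepcopy(self.rows.get(key))))
        self.rows[key] = row
        self.after_journal.append((key, copy.deepcopy(row)))

    def rollback(self, checkpoint):
        """Undo every change made after `checkpoint`, newest first, using BEFORE images."""
        while len(self.before_journal) > checkpoint:
            key, image = self.before_journal.pop()
            if image is None:
                self.rows.pop(key, None)   # the row did not exist before
            else:
                self.rows[key] = image
        del self.after_journal[checkpoint:]


def roll_forward(backup_rows, after_journal):
    """Rebuild current data from a backup by reapplying AFTER images in order."""
    rows = copy.deepcopy(backup_rows)
    for key, image in after_journal:
        rows[key] = copy.deepcopy(image)
    return rows


# BEFORE journal: a successful but wrong update is rolled back.
reps = JournaledTable({"ann": "Western", "bob": "Eastern"})
backup = copy.deepcopy(reps.rows)          # the backup taken on the 1st
checkpoint = len(reps.before_journal)
for name in list(reps.rows):
    reps.write(name, "Southeast")          # the accidental update
reps.rollback(checkpoint)
print(reps.rows)

# AFTER journal: data lost on the 5th is rebuilt from the backup.
reps.write("ann", "Southwest")
reps.write("cy", "Northern")
restored = roll_forward(backup, reps.after_journal)
print(restored == reps.rows)
```

Rollback walks the BEFORE images backward from the newest change, while roll forward starts from an older copy and replays the AFTER images in their original order.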

The following example shows the use of FALLBACK and the PERMANENT JOURNAL:

CREATE TABLE TomC.employee, FALLBACK,
    BEFORE JOURNAL,
    DUAL AFTER JOURNAL
   (emp       INTEGER,
    dept      INTEGER,
    lname     CHAR(20),
    fname     VARCHAR(20),
    salary    DECIMAL(10,2),
    hire_date DATE FORMAT)
UNIQUE PRIMARY INDEX(emp);

The example above created the table called Employee in the TomC database, and it is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level, the default is NO FALLBACK and NO JOURNALING.

Locking Modes in Teradata

"You just obey instructions; we'll take care of the obstructions."

David Seamands

A private pilot was flying into a new town when the weather turned suddenly cloudy and he became confused. Not very experienced in landing by instrument, he began to panic, thinking of the hills, trees, and buildings below. But the local air traffic controller commanded him, "You just obey instructions; we'll take care of the obstructions." Many database systems can become confused when the number of users begins to grow. But like a master air traffic controller, Teradata uses a brilliant locking logic that gets each user to the right data at the proper time without conflicting or disastrous results.


Teradata allows hundreds, even thousands, of users to access the data warehouse concurrently. However, there would be a lot of confusion about which user had access to a table first if it were not for the LOCKING MODES. No one likes to wait a long time in line only to have someone cut in front of him or her. Teradata uses LOCKS to help maintain data integrity. Locks are activated on the targeted database, table, or row while the SQL request is executed. Those locks are released upon query completion.

There are four modes of locking:

1. The EXCLUSIVE LOCK is the mother of all locks. It's placed only on databases or tables, and it restricts access to them whenever a structural change is made. EXCLUSIVE LOCKing reminds me of what happens when a structural change is being made to a parking garage. A construction company will wrap what seems like thousands of yards of bright orange plastic fencing around the garage in order to keep people out and protect them from falling debris. To this day, I have not seen a database or table fall on top of a user. The EXCLUSIVE LOCK prevents any access, period.

2. The WRITE LOCK jumps to action whenever a user asks for an INSERT, DELETE, or UPDATE. Keep in mind these commands are writing actions. No other Exclusive, Write, or Read locks can cut in line ahead of an existing WRITE LOCK. The only exception is an ACCESS LOCK, one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed. This kind of read is called a "stale" or "dirty" read.

3. Everybody loves the READ LOCK. It's placed whenever the SELECT command is used. With a READ LOCK, a thousand users can simultaneously SELECT from a table. A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue.

4. When a user is not concerned with precisely accurate data, he or she may request an ACCESS LOCK. This lock can jump in line ahead of either a READ or WRITE LOCK, but not an EXCLUSIVE LOCK.
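The four modes above boil down to a small compatibility table: which lock can be granted while another is already held. The Python sketch below encodes only what the list states; the names `COMPATIBLE` and `can_grant` are made up for the example, and queueing order is not modeled.

```python
# For each requested lock: the set of already-held locks it can coexist with.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},   # may read through a WRITE (dirty read)
    "READ": {"ACCESS", "READ"},              # many readers at once
    "WRITE": {"ACCESS"},                     # blocks readers and other writers
    "EXCLUSIVE": set(),                      # no access, period
}


def can_grant(requested, held):
    """True if `requested` can be granted while every lock in `held` is in place."""
    return all(h in COMPATIBLE[requested] for h in held)


print(can_grant("READ", ["READ", "READ", "ACCESS"]))
print(can_grant("WRITE", ["READ"]))
print(can_grant("ACCESS", ["WRITE"]))
print(can_grant("ACCESS", ["EXCLUSIVE"]))
```

A request that cannot be granted waits in line until the conflicting locks are released.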

Referential Integrity

Just how important is it to protect the integrity of your data? This story says it all. After reading an advertisement offering split, dry firewood for $60 a cord (including delivery), Jeff decided to place a phone order. Upon delivery, Jeff was upset when the deliveryman finished stacking the wood. Jeff objected, "That's not a full cord of wood." "Well, that's what I call a cord," the man answered firmly. Grudgingly, Jeff pulled some money out of his pocket and thrust it into the man's hands. "Hey, just a minute," the man said after counting the money. "You only gave me $30." Jeff shrugged his shoulders and replied, "Well, that's what I call $60."

Imagine getting fired from your job, and the company deletes you from its employee table but forgets to delete you from the payroll table. That's not like getting fired... it's more like getting fired up for a Bahamas vacation. Referential Integrity would have stopped this oversight. RI, as it is called, would not allow anyone to be deleted from the employee table unless he or she was also deleted from the payroll table.

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into a table if it does not contain a column value that also exists in another table within the database. Conversely, a row with a corresponding value in another table may not be deleted unless the common value is first removed from that other table.

An important function of RI on a newly created table is that it will not allow invalid data values to be entered into a column. If RI is enforced on an existing table with RI violations, the ALTER TABLE will proceed. Plus, it will copy and store the table and any related RI violations for review and correction. The user will then need to locate the table copy and make corrections to the original table.
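The employee and payroll story can be turned into a minimal Python sketch of the two RI rules: no payroll row without a matching employee, and no employee deletion while a payroll row still refers to that employee. The class and method names are hypothetical, and the two dictionaries merely stand in for tables.

```python
class ReferentialIntegrityError(Exception):
    """Raised when a change would leave payroll pointing at a missing employee."""


class Database:
    def __init__(self):
        self.employee = {}   # parent table, keyed by emp
        self.payroll = {}    # child table, keyed by emp, references employee

    def insert_payroll(self, emp, salary):
        # Rule 1: the value must already exist in the referenced table.
        if emp not in self.employee:
            raise ReferentialIntegrityError(f"emp {emp} is not in employee")
        self.payroll[emp] = salary

    def delete_employee(self, emp):
        # Rule 2: the referencing row must be removed first.
        if emp in self.payroll:
            raise ReferentialIntegrityError(f"emp {emp} is still in payroll")
        self.employee.pop(emp, None)


db = Database()
db.employee[1] = "Jeff"
db.insert_payroll(1, 60000)

try:
    db.delete_employee(1)          # payroll row still exists
except ReferentialIntegrityError as err:
    print("blocked:", err)

del db.payroll[1]                  # remove the payroll row first
db.delete_employee(1)              # now the delete is allowed
print(db.employee, db.payroll)
```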


Page 55: 143109593 Tom Coffing TD Basics

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5558

Teradata allows hundreds even thousands of users to access the data warehouse concurrently However therewould be a lot of confusion about which user had access to a table first if it were not for the LOCKINGMODES No one likes to be waiting for a long time in a line only to have someone cut in front of him or herTeradata uses LOCKS to help maintain data integrity Locks are activated on the targeted database table or row while the SQL request is executed Those locks are released upon query completion

There are four modes of locking

1 The EXCLUSIVE LOCK is the mother of all locks Its placed only on databases or tables and restrictsaccess to then whenever a structural change is made EXCLUSIVE LOCKing reminds me of whathappens when there is a structural change being made to a parking garage A construction company willwrap what seems like thousands of yards of bright orange plastic fencing around the garage in order tokeep people out and protecting them from falling debris To this day I have not seen a database or tablefall on top of a user The EXCLUSIVE LOCK prevents any access period This lock is placed on atable or database

2 The WRITE LOCK jumps to action whenever a user asks for an INSERT DELETE or UPDATE Keepin mind these commands are writing actions No other Exclusive Write or Read locks can cut in lineahead of an existing WRITE LOCK The only exception is an ACCESS LOCK ndash one that allows a user to read data that may not be totally accurate due to modifications being made at the time it is accessed

This kind of read is called a stale or dirty read3 Everybody loves the READ LOCK Its placed whenever the SELECT command is used With a READ

LOCK a thousand users can simultaneously SELECT from a table A READ LOCK will prevent either an Exclusive or WRITE LOCK from jumping ahead in the queue

4 When a user is not concerned with precisely accurate data he or she may request an ACCESS LOCKThis lock can jump in line ahead of either a READ or WRITE LOCK but not an EXCLUSIVE LOCK

Referential Integrity

Just how important is it to protect the integrity of your data This story says it all After reading anadvertisement offering split dry firewood for $60 a cord (including delivery) Jeff decided to place a phone

order Upon delivery Jeff was upset when the deliveryman finished stacking the wood Jeff objected Thatsnot a full cord of wood Well thats what I call a cord the man answered firmly Grudgingly Jeff pulledsome money out of his pocket and thrust it into the mans hands Hey just a minute the man said after counting the money You only gave me $30 Jeff shrugged his shoulders and replied Well thats what I call$60

Imagine getting fired from your job and the company deletes you from its employee table but forgets to deleteyou from the payroll table Thats not like getting fired hellip its more like getting fired up for a Bahamas vacationReferential Integrity would have stopped this oversight RI as it is called would not allow anyone to be deletedfrom the employee table unless he or she was also deleted from the payroll table

REFERENTIAL INTEGRITY (RI) is the relational concept that mandates that a row cannot be inserted into atable if it does not contain a column value that also exists in another table within the database Conversely arow with a corresponding value in another table may not be deleted unless the common value is first removedfrom the former table

An important function of RI on a newly created table is that it will not allow invalid data values to be enteredinto a column If RI is enforced on an existing table with RI violations the ALTER TABLE will proceed Plusit will copy and store the table and any related RI violations for review and correction Then the user will needto locate the table copy and then make corrections to the original table

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5658

Loading the Data

Overview

One night I said to my son When Abraham Lincoln was your age he studied by candlelight My sonretorted When Abraham Lincoln was your age he was president

Just as Lincoln will go down as one of the greatest presidents in history Teradata will not go down when itloads history Data within a warehouse environment is often historic in nature so the sheer volume of data canoverwhelm many systems But not Teradata

Teradata is so far ahead of the data loading game that other database vendors cant hold a candle to it A datawarehouse brings enormous amounts of data into the system This is an area that most companies overlook when purchasing a data warehouse Most company officials think loading data is simply thatndashjust loading dataSome people actually ask Are data loads that critical Come on ASCII stupid question and get a stupidANSI

Seriously though there are data warehouses in existence today that merely cant load data once it reaches a

certain volume As one Teradata developer said It is not the load that brings them down but the way theycarry it Even an experienced body builder must use a good technique to lift the weight over his head Whilemost database vendors are new to the game Teradata has had 15 years of practice loading the largest datawarehouses in the world Now the combination of Fastload Multiload and Tpump can load millions even billions of records in record time

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table Thisis how a Teradata table is populated the first time I have personally seen Teradata load over one billion largerows in less than 6 hours Plus I have seen Teradata load millions of rows in minutes Teradata has the quickesttime to solution and has the most powerful performance in the data warehousing industry How is Teradatasspeed and performance accomplished Its done through parallel processing

Fastload understands one SQL command - INSERT It inserts rows into an empty table The process is asfollows A flat file is prepared for loading on a mainframe or LAN The FASTLOAD utility needs three piecesof information to process where the flat file located what is its file definition and what table the data should beloaded into in Teradata

When the Fastload utility starts the Parsing Engine comes up with a plan for the AMPs The Parsing Enginethen steps back and lets the AMPs do their work The data is loaded in large 64K blocks Each AMP is given a64K block of rows for loading Like a line of workers trying to pass sand bags to prevent a flood Teradata

passes these blocks from AMP to AMP until all the data is on Teradata Next all AMPs take the blocks theyreceived hash the rows in those blocks (in parallel) and send the rows to the proper AMP over the BYNETOnce this is done each AMP sorts its data by Row ID and the table is ready for business

Fastload Basics

bull Loads data to Teradata from a Mainframe or LAN flat filebull Only one table may be loaded at a timebull The table to be loaded must be emptybull There can be no secondary indexes referential integrity or triggers

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5758

bull It doesnt support Multi-set tables andbull It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS Multiload is meant to process INSERTSUPDATES and DELETES on tables that have existing data Multiload is extremely fast One major Teradatadata warehouse company processes 120 million inserts updates and deletes during its nightly batch

Multiload works similar to Fastload Data originates as a flat file on either a mainframe or LAN When theMultiload utility is executed the Parsing Engine creates a plan for the AMPs to follow The data is then passedto the AMPs in parallel in 64K blocks and the AMPs hash the rows to the proper AMP Last the INSERTSUPDATES and DELETES are applied

In the previous diagram the mainframeLAN is talking to the Parsing Engine The PE passes the data across theBYNET for the AMPs to retrieve Keep in mind many systems have hundreds to thousands of AMPs The loadtakes place continually in parallel when the 64K packets are delivered to the AMPs Multiload has beendesigned for users who have a need for speed

Multiload locks at the table level Therefore while Multiload is running the table is unavailable

Multiload Basics

bull Loads data to Teradata from a Mainframe or LAN flat file

bull Up to 20 INSERTS UPDATES or DELETES may be executed on up to 5 tablesbull Receiving tables are usually populatedbull There can be no Unique secondary indexes referential integrity or triggersbull It doesnt support Multi-set tables andbull It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse When Istarted working with Teradata more than 10 years ago most companies loaded data on a monthly basis

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5858

Suddenly companies began to load data weekly Today most companies load data nightly and industry leadersare loading data hourly Tpump is the beginning step of an Active Data Warehouse (ADW) ADW combinesOLTP transactions with a Decisions Support System (DSS)

You dont drown by falling into the water you drown by staying in the water

Edwin Louis Cole

If the data is not flowing a company can drown in it The utility is called Tpump because it theoretically actslike a water faucet Tpump can be set to full throttle to load millions of transactions during off peak hours or turned down to trickle small amounts of data during the data warehouse rush hour It can also beautomatically preset to load different levels at certain times during the day and can be modified at any time

Also Tpump locks at a row level so users have access to the rest of the rows while the table is being loaded

Tpump Basics

bull Loads data to Teradata from a Mainframe or LAN flat filebull Processes INSERTS UPDATES or DELETESbull Tables are usually populatedbull It can have secondary indexes triggers and referential integritybull It doesnt support Multi-set tables andbull It locks at the row level

ConclusionmdashA Final Thought on Teradata

Genius is one percent inspiration and ninety-nine percent perspiration

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night That is not surprising because that stupid light wasalways on Teradata developers averaged about 4 hours of sleep because as their brilliance continued to unfoldthe light kept going on Teradata was originally designed to handle large amounts of data back in 1976 Mostother databases were designed to handle On-line Transaction Processing (OLTP) Teradata has been able tocontinually improve on its design for the past 15 years at many of the largest data warehouse sites in the worldAs someone once said Before you can eat the fruit you must climb the tree Teradata has been climbing to thetop for over a decade The fruits of labor have paid off big for both Teradata and Teradata customers Here iswhy Teradata was made for e-business data warehousing

bull Parallel processing for unlimited performance

bull Unlimited scalability of data users and applicationsbull Ability to answer extremely complex queriesbull Ease of setup and maintenance ndash Only one DBA needed

Ability to load data at lightning speeds from a mainframe or LAN

Page 56: 143109593 Tom Coffing TD Basics

7162019 143109593 Tom Coffing TD Basics

httpslidepdfcomreaderfull143109593-tom-coffing-td-basics 5658

Loading the Data

Overview

One night I said to my son When Abraham Lincoln was your age he studied by candlelight My sonretorted When Abraham Lincoln was your age he was president

Just as Lincoln will go down as one of the greatest presidents in history Teradata will not go down when itloads history Data within a warehouse environment is often historic in nature so the sheer volume of data canoverwhelm many systems But not Teradata

Teradata is so far ahead of the data loading game that other database vendors cant hold a candle to it A datawarehouse brings enormous amounts of data into the system This is an area that most companies overlook when purchasing a data warehouse Most company officials think loading data is simply thatndashjust loading dataSome people actually ask Are data loads that critical Come on ASCII stupid question and get a stupidANSI

Seriously though, there are data warehouses in existence today that simply can't load data once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use good technique to lift the weight over his head. While most database vendors are new to the game, Teradata has had 15 years of practice loading the largest data warehouses in the world. Now the combination of Fastload, Multiload, and Tpump can load millions, even billions, of records in record time.

Fastload

Fastload is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and has the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.

Fastload understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The Fastload utility needs three pieces of information to process: where the flat file is located, what its file definition is, and what table the data should be loaded into in Teradata.

When the Fastload utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers trying to pass sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID, and the table is ready for business.
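The steps above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not how the Fastload utility is implemented: the number of AMPs, the block size, and the names fastload and owning_amp are made up for the example, blocks are dealt to AMPs round-robin, CRC32 stands in for Teradata's own row hashing, and the row hash stands in for the Row ID.

```python
import zlib
from collections import defaultdict

NUM_AMPS = 4          # hypothetical system size
ROWS_PER_BLOCK = 3    # stand-in for "a 64K block of rows"


def owning_amp(primary_index_value, num_amps=NUM_AMPS):
    """Return (amp_number, row_hash) for a primary index value.

    CRC32 is an assumption used for illustration; it is not
    Teradata's hashing algorithm.
    """
    row_hash = zlib.crc32(str(primary_index_value).encode("utf-8"))
    return row_hash % num_amps, row_hash


def fastload(rows, num_amps=NUM_AMPS, rows_per_block=ROWS_PER_BLOCK):
    """Simulate loading rows (tuples whose first field is the primary index)."""
    # Step 1: cut the flat file into blocks and hand them out to the AMPs.
    # Round-robin is an assumption; any AMP may receive any block.
    received = defaultdict(list)
    blocks = [rows[i:i + rows_per_block]
              for i in range(0, len(rows), rows_per_block)]
    for n, block in enumerate(blocks):
        received[n % num_amps].extend(block)

    # Step 2: every AMP hashes the rows it received and sends each row
    # to the AMP that owns it (the BYNET's job in the real system).
    owned = defaultdict(list)
    for amp in range(num_amps):
        for row in received[amp]:
            target, row_hash = owning_amp(row[0], num_amps)
            owned[target].append((row_hash, row))

    # Step 3: each AMP sorts its own rows by row hash.
    return {
        amp: [row for _, row in sorted(owned[amp], key=lambda pair: pair[0])]
        for amp in range(num_amps)
    }
```

Calling fastload([(1, "Ann"), (2, "Bob"), (3, "Cy")]) returns a dictionary keyed by AMP number, where every row sits on the AMP that its hash points to. Notice that the AMP that first receives a row is usually not the AMP that ends up owning it, which is why the redistribution step exists.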

Fastload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Only one table may be loaded at a time
• The table to be loaded must be empty
• There can be no secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Multiload

Where Fastload is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.

Multiload works similarly to Fastload. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs in parallel in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied.

In the previous diagram, the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually in parallel when the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed.

Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable.
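Here is a small sketch of that flow: changes are routed to the owning AMP by hash, applied there, and the table stays locked until the job ends. It is a toy model under assumptions, not the Multiload utility itself. The Table class, the multiload function, the four-AMP system, and the CRC32 hash are all hypothetical names and choices made for the example.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size


def amp_for(key, num_amps=NUM_AMPS):
    """Pick the owning AMP for a key. CRC32 is a stand-in hash, an assumption."""
    return zlib.crc32(str(key).encode("utf-8")) % num_amps


class Table:
    """A populated table spread across AMPs, with a single table-level lock."""

    def __init__(self, num_amps=NUM_AMPS):
        self.num_amps = num_amps
        self.amps = [dict() for _ in range(num_amps)]
        self.locked = False

    def read(self, key):
        # Table-level lock: no row can be read while a load is running.
        if self.locked:
            raise RuntimeError("table is locked by a load job")
        return self.amps[amp_for(key, self.num_amps)].get(key)


def multiload(table, changes):
    """Apply (operation, key, value) changes to a table under a table lock."""
    table.locked = True
    try:
        # Route every change to the AMP that owns the row.
        work = [[] for _ in range(table.num_amps)]
        for op, key, value in changes:
            work[amp_for(key, table.num_amps)].append((op, key, value))

        # Last, each AMP applies its own inserts, updates, and deletes.
        for amp_rows, ops in zip(table.amps, work):
            for op, key, value in ops:
                if op == "INSERT":
                    amp_rows.setdefault(key, value)
                elif op == "UPDATE":
                    if key in amp_rows:
                        amp_rows[key] = value
                elif op == "DELETE":
                    amp_rows.pop(key, None)
                else:
                    raise ValueError("unknown operation: %r" % (op,))
    finally:
        table.locked = False
```

A nightly batch would then be a single call such as multiload(table, [("UPDATE", 7, "new value"), ("DELETE", 9, None)]). Any read attempted while that call is running would fail, which is the price of the table-level lock.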

Multiload Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables
• Receiving tables are usually populated
• There can be no Unique secondary indexes, referential integrity, or triggers
• It doesn't support Multi-set tables, and
• It locks at the table level

Tpump

The Tpump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today most companies load data nightly, and industry leaders are loading data hourly. Tpump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS).

"You don't drown by falling into the water; you drown by staying in the water."

Edwin Louis Cole

If the data is not flowing, a company can drown in it. The utility is called Tpump because it theoretically acts like a water faucet. Tpump can be set to full throttle to load millions of transactions during off-peak hours or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be automatically preset to load different levels at certain times during the day and can be modified at any time.

Also, Tpump locks at a row level, so users have access to the rest of the rows while the table is being loaded.
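The faucet idea can be pictured with a few lines of code. This is a rough sketch under assumptions, not Tpump's actual controls: the schedule, the rates, and the names rate_for_hour and paced_batches are invented for the example, and the row-level locking is not modeled.

```python
from itertools import islice

# Hypothetical schedule: (start_hour, end_hour, rows_per_minute).
SCHEDULE = [
    (0, 6, 60000),    # off-peak hours: full throttle
    (6, 18, 500),     # data warehouse rush hour: a trickle
    (18, 24, 10000),  # evening: somewhere in between
]


def rate_for_hour(hour, schedule=SCHEDULE):
    """Return the preset load rate (rows per minute) for an hour of the day."""
    for start, end, rate in schedule:
        if start <= hour < end:
            return rate
    raise ValueError("no rate configured for hour %r" % (hour,))


def paced_batches(rows, rows_per_minute):
    """Yield (minute, batch) pairs, never more than rows_per_minute per batch."""
    if rows_per_minute <= 0:
        raise ValueError("rows_per_minute must be positive")
    source = iter(rows)
    minute = 0
    while True:
        batch = list(islice(source, rows_per_minute))
        if not batch:
            return
        yield minute, batch
        minute += 1
```

With this schedule, 1,200 waiting transactions at noon would be released as batches of 500, 500, and 200 over three minutes, while the same 1,200 at 3 a.m. would go in a single batch. Changing a rate in the schedule is the equivalent of turning the faucet handle.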

Tpump Basics

• Loads data to Teradata from a Mainframe or LAN flat file
• Processes INSERTS, UPDATES, or DELETES
• Tables are usually populated
• It can have secondary indexes, triggers, and referential integrity
• It doesn't support Multi-set tables, and
• It locks at the row level
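The three "Basics" lists in this chapter can be boiled down to a simple decision rule. The function below is one possible reading of those lists, not an official selection procedure. The name choose_utility and its parameters are hypothetical, and falling back to Tpump whenever the other two do not fit is an assumption.

```python
def choose_utility(operations, table_is_empty, table_count=1,
                   secondary_indexes="none", has_triggers_or_ri=False,
                   readers_need_access=False):
    """Suggest a load utility from the rules listed in this chapter.

    secondary_indexes is "none", "non-unique", or "unique".
    readers_need_access means users must query the table during the load.
    """
    operations = set(operations)
    if not operations or not operations <= {"INSERT", "UPDATE", "DELETE"}:
        raise ValueError("operations must be INSERT, UPDATE, and/or DELETE")
    if secondary_indexes not in ("none", "non-unique", "unique"):
        raise ValueError("unknown secondary_indexes value")

    # Fastload and Multiload both lock the whole table and allow
    # no triggers or referential integrity.
    batch_ok = not has_triggers_or_ri and not readers_need_access

    # Fastload: INSERT only, one empty table, no secondary indexes at all.
    if (batch_ok and operations == {"INSERT"} and table_is_empty
            and table_count == 1 and secondary_indexes == "none"):
        return "Fastload"

    # Multiload: up to 5 tables, no unique secondary indexes.
    if batch_ok and table_count <= 5 and secondary_indexes != "unique":
        return "Multiload"

    # Tpump: row-level locks; indexes, triggers, and RI are allowed.
    return "Tpump"
```

For example, choose_utility({"INSERT"}, table_is_empty=True) suggests Fastload, while the same load into a table that users must keep querying suggests Tpump.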

Conclusion: A Final Thought on Teradata

"Genius is one percent inspiration and ninety-nine percent perspiration."

Thomas Alva Edison

Thomas Edison only averaged 4 hours of sleep every night. That is not surprising, because that stupid light was always on. Teradata developers averaged about 4 hours of sleep because, as their brilliance continued to unfold, the light kept going on. Teradata was originally designed to handle large amounts of data back in 1976. Most other databases were designed to handle On-line Transaction Processing (OLTP). Teradata has been able to continually improve on its design for the past 15 years at many of the largest data warehouse sites in the world. As someone once said, "Before you can eat the fruit, you must climb the tree." Teradata has been climbing to the top for over a decade. The fruits of labor have paid off big for both Teradata and Teradata customers. Here is why Teradata was made for e-business data warehousing:

• Parallel processing for unlimited performance
• Unlimited scalability of data, users, and applications
• Ability to answer extremely complex queries
• Ease of setup and maintenance: only one DBA needed
• Ability to load data at lightning speeds from a mainframe or LAN
