Teradata Utilities MultiLoad



Table of Contents

Chapter 4: Multiload
  Why it is Called "Multi" Load
    Two MultiLoad Modes: IMPORT and DELETE
    Block and Tackle Approach
    MultiLoad Imposes Limits
  Error Tables, Work Tables and Log Tables
  Supported Input Formats
  MultiLoad Has Five IMPORT Phases
    Phase 1: Preliminary Phase
    Phase 2: DML Transaction Phase
    Phase 3: Acquisition Phase
    Phase 4: Application Phase
    Phase 5: Clean Up Phase
  MultiLoad Commands
    Two Types of Commands
  Parameters for .BEGIN IMPORT MLOAD
  Parameters for .BEGIN DELETE MLOAD
  A Simple Multiload IMPORT Script
  Building our Multiload Script
  Executing Multiload
  Another Simple MultiLoad IMPORT Script
  MultiLoad IMPORT Script
  Error Treatment Options for the .DML LABEL Command
  An IMPORT Script with Error Treatment Options
  A IMPORT Script that Uses Two Input Data Files
  Redefining the INPUT
  A Script that Uses Redefining the Input
  DELETE MLOAD Script Using a Hard Coded Value
  A DELETE MLOAD Script Using a Variable
  An UPSERT Sample Script
  What Happens when MultiLoad Finishes
    MultiLoad Statistics
  Troubleshooting Multiload Errors
  RESTARTing Multiload
  RELEASE MLOAD: When You DON'T Want to Restart MultiLoad
  MultiLoad and INMODs
  How Multiload Compares with FastLoad


Chapter 4: Multiload

"In the end we'll remember not the sound of our enemies, but the silence of our friends." - Martin Luther King Jr.

Why it is Called "Multi" Load

If we were going to be stranded on an island with a Teradata Data Warehouse and we could only take along one Teradata load utility, clearly, MultiLoad would be our choice. MultiLoad has the capability to load multiple tables at one time from either a LAN or Channel environment. This is in stark contrast to its fleet-footed cousin, FastLoad, which can only load one table at a time. And it gets better, yet!

This feature-rich utility can perform multiple types of DML tasks, including INSERT, UPDATE, DELETE and UPSERT on up to five (5) empty or populated target tables at a time. These DML functions may be run either solo or in combinations, against one or more tables. For these reasons, MultiLoad is the utility of choice when it comes to loading populated tables in the batch environment. As the volume of data being loaded or updated in a single block increases, the performance of MultiLoad improves. MultiLoad shines when it can impact more than one row in every data block. In other words, MultiLoad looks at massive amounts of data and says, "Bring it on!"

Leo Tolstoy once said, "All happy families resemble each other." Like happy families, the Teradata load utilities resemble each other, although they may have some differences. You are going to be pleased to find that you do not have to learn all new commands and concepts for each load utility. MultiLoad has many similarities to FastLoad. It has even more commands in common with TPump. The similarities will be evident as you work with them. Where there are some quirky differences, we will point them out for you.

Two MultiLoad Modes: IMPORT and DELETE

MultiLoad provides two types of operations via modes: IMPORT and DELETE. In MultiLoad IMPORT mode, you have the freedom to "mix and match" up to twenty (20) INSERTs, UPDATEs or DELETEs on up to five target tables. The execution of the DML statements is not mandatory for all rows in a table. Instead, their execution hinges upon the conditions contained in the APPLY clause of the script. Once again, MultiLoad demonstrates its user-friendly flexibility. For UPDATEs or DELETEs to be successful in IMPORT mode, they must reference the Primary Index in the WHERE clause.

The MultiLoad DELETE mode is used to perform a global (all AMP) delete on just one table. The reason to use .BEGIN DELETE MLOAD is that it bypasses the Transient Journal (TJ) and can be RESTARTed if an error causes it to terminate prior to finishing. When performing in DELETE mode, the DELETE SQL statement cannot reference the Primary Index in the WHERE clause. This is due to the fact that a primary index access goes to one specific AMP, whereas a DELETE mode task is a global operation.

The other factor that makes a DELETE mode operation so good is that it examines an entire block of rows at a time. Once all the eligible rows have been removed, the block is written one time and a checkpoint is written. So, if a restart is necessary, it simply starts deleting rows from the next block without a checkpoint. This is a smart way to continue. Remember, when using the TJ all deleted rows are put back into the table from the TJ as a rollback. A rollback can take longer to finish than the delete. MultiLoad does not do a rollback; it does a restart.
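The block-at-a-time behavior can be pictured with a small Python sketch. It is only a toy model under stated assumptions: the lists stand in for data blocks, the returned index stands in for the checkpoint, and the names `delete_with_checkpoints` and `fail_at` are hypothetical, not part of MultiLoad.

```python
def delete_with_checkpoints(blocks, is_eligible, checkpoint=0, fail_at=None):
    """Delete eligible rows block by block, resuming from `checkpoint`.

    blocks      -- list of lists of rows (a toy stand-in for data blocks)
    is_eligible -- predicate that selects the rows to delete
    checkpoint  -- index of the first block that has not been processed yet
    fail_at     -- optional block index at which to simulate a failure
    Returns the index of the next unprocessed block (the new checkpoint).
    """
    for i in range(checkpoint, len(blocks)):
        if fail_at is not None and i == fail_at:
            return i  # the job halts; earlier blocks stay deleted
        # Remove every eligible row, then "write" the block one time.
        blocks[i] = [row for row in blocks[i] if not is_eligible(row)]
    return len(blocks)


def is_old(row):
    """Hypothetical delete condition: rows with a value below 10."""
    return row < 10


blocks = [[1, 12, 3], [14, 5, 16], [7, 18, 9]]

cp = delete_with_checkpoints(blocks, is_old, fail_at=2)      # halts at block 2
cp = delete_with_checkpoints(blocks, is_old, checkpoint=cp)  # restart, no rollback
print(cp, blocks)  # prints: 3 [[12], [14, 16], [18]]
```

The restart picks up at the first block without a checkpoint, so nothing that was already deleted has to be put back first.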

Coffing Data Warehousing, Coffing Publishing (c) 2005, Copying Prohibited

In the above diagram, monthly data is being stored in a quarterly table. To keep the contents limited to four months, monthly data is rotated in and out. At the end of every month, the oldest month of data is removed and the new month is added. The cycle is "add a month, delete a month, add a month, delete a month." In our illustration, that means that January data must be deleted to make room for May's data.

Here is a question for you: What if there was another way to accomplish this same goal without consuming all of these extra resources? To illustrate, let's consider the following scenario: Suppose you have TableA that contains 12 billion rows. You want to delete a range of rows based on a date and then load in fresh data to replace these rows. Normally, the process is to perform a MultiLoad DELETE to DELETE FROM TableA WHERE <date-column> < '2002-02-01'. The final step would be to INSERT the new rows for May using MultiLoad IMPORT.

Block and Tackle Approach

MultiLoad never loses sight of the fact that it is designed for functionality, speed, and the ability to restart. It tackles the proverbial I/O bottleneck problem like FastLoad by assembling data rows into 64K blocks and writing them to disk on the AMPs. This is much faster than writing data one row at a time like BTEQ. Fallback table rows are written after the base table has been loaded. This allows users to access the base table immediately upon completion of the MultiLoad while fallback rows are being loaded in the background. The benefit is reduced time to access the data.

Amazingly, MultiLoad has full RESTART capability in all of its five phases of operation. Once again, this demonstrates its tremendous flexibility as a load utility. Is it pure magic? No, but it almost seems so. MultiLoad makes effective use of two error tables to save different types of errors and a LOGTABLE that stores built-in checkpoint information for restarting. This is why MultiLoad does not use the Transient Journal, thus averting time-consuming rollbacks when a job halts prematurely.

Here is a key difference to note between MultiLoad and FastLoad. Sometimes an AMP (Access Module Processor) fails and the system administrators say that the AMP is "down" or "offline." When using FastLoad, you must restart the AMP to restart the job. MultiLoad, however, can continue running when an AMP fails, if the table is fallback protected. At the same time, you can use the AMPCHECK option to make it work like FastLoad if you want.

Teradata Utilities: BTEQ, FastLoad, MultiLoad, TPump, and FastExport, Second Edition


MultiLoad Imposes Limits

Rule #1: Unique Secondary Indexes are not supported on a Target Table. Like FastLoad, MultiLoad does not support Unique Secondary Indexes (USIs). But unlike FastLoad, it does support the use of Non-Unique Secondary Indexes (NUSIs) because the index subtable row is on the same AMP as the data row. MultiLoad uses every AMP independently and in parallel. If two AMPs must communicate, they are not independent. Therefore, a NUSI (same AMP) is fine, but a USI (different AMP) is not.

Rule #2: Referential Integrity is not supported. MultiLoad will not load data into tables that are defined with Referential Integrity (RI). Like a USI, this requires the AMPs to communicate with each other. So, RI constraints must be dropped from the target table prior to using MultiLoad.

Rule #3: Triggers are not supported at load time. Triggers cause actions on related tables based upon what happens in a target table. Again, this is a multi-AMP operation and to a different table. To keep MultiLoad running smoothly, disable all Triggers prior to using it.

Rule #4: No concatenation of input files is allowed. MultiLoad does not want you to do this because it could impact a restart if the files were concatenated in a different sequence or data was deleted between runs.

Rule #5: The host will not process aggregates, arithmetic functions or exponentiation. If you need data conversions or math, you might be better off using an INMOD to prepare the data prior to loading it.

Error Tables, Work Tables and Log Tables

Besides target table(s), MultiLoad requires the use of four special tables in order to function. They consist of two error tables (per target table), one worktable (per target table), and one log table. In essence, the Error Tables will be used to store any conversion, constraint or uniqueness violations during a load. Work Tables are used to receive and sort data and SQL on each AMP prior to storing them permanently to disk. A Log Table (also called, "Logtable") is used to store successful checkpoints during load processing in case a RESTART is needed.

HINT: Sometimes a company wants all of these load support tables to be housed in a particular database. When these tables are to be stored in any database other than the user's own default database, then you must give them a qualified name (<databasename>.<tablename>) in the script or use the DATABASE command to change the current database.

Where will you find these tables in the load script? The Logtable is generally identified immediately prior to the .LOGON command. Worktables and error tables can be named in the BEGIN MLOAD statement. Do not underestimate the value of these tables. They are vital to the operation of MultiLoad. Without them a MultiLoad job cannot run. Now that you have had the "executive summary", let's look at each type of table individually.

Two Error Tables: Here is another place where FastLoad and MultiLoad are similar. Both require the use of two error tables per target table. MultiLoad will automatically create these tables. Rows are inserted into these tables only when errors occur during the load process. The first error table is the acquisition Error Table (ET). It contains all translation and constraint errors that may occur while the data is being acquired from the source(s).

The second is the Uniqueness Violation (UV) table that stores rows with duplicate values for Unique Primary Indexes (UPI). Since a UPI must be unique, MultiLoad can only load one occurrence into a table. Any duplicate value will be stored in the UV error table. For example, you might see a UPI error that shows a second employee number "99." In this case, if the name for employee "99" is Kara Morgan, you will be glad that the row did not load since Kara Morgan is already in the Employee table. However, if the name showed up as David Jackson, then you know that further investigation is needed, because employee numbers must be unique.



Each error table does the following:

• Identifies errors
• Provides some detail about the errors
• Stores the actual offending row for debugging

You have the option to name these tables in the MultiLoad script (shown later). Alternatively, if you do not name them, they default to ET_<target_table_name> and UV_<target_table_name>. In either case, MultiLoad will not accept error table names that are the same as target table names. It does not matter what you name them. It is recommended that you standardize on the naming convention to make it easier for everyone on your team. For more details on how these error tables can help you, see the subsection in this chapter titled, "Troubleshooting MultiLoad Errors."
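The naming rule can be written down as a tiny Python helper. This is a sketch of the rule as described in the text, not MultiLoad's own logic: the function name `error_table_names` is hypothetical, and comparing names case-insensitively is an assumption of the sketch.

```python
def error_table_names(target_table, et_name=None, uv_name=None):
    """Return the (ET, UV) table names, applying the defaults when none are given."""
    et_name = et_name or "ET_" + target_table
    uv_name = uv_name or "UV_" + target_table
    for name in (et_name, uv_name):
        # Case-insensitive comparison is an assumption of this sketch.
        if name.upper() == target_table.upper():
            raise ValueError("error table name matches the target table: " + name)
    return et_name, uv_name


print(error_table_names("Employee_Table"))
# prints: ('ET_Employee_Table', 'UV_Employee_Table')
print(error_table_names("Employee_Table", et_name="Emp_ET", uv_name="Emp_UV"))
# prints: ('Emp_ET', 'Emp_UV')
```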

Log Table: MultiLoad requires a LOGTABLE. This table keeps a record of the results from each phase of the load so that MultiLoad knows the proper point from which to RESTART. There is one LOGTABLE for each run. Since MultiLoad will not resubmit a command that has been run previously, it will use the LOGTABLE to determine the last successfully completed step.

Work Table(s): MultiLoad will automatically create one worktable for each target table. This means that in IMPORT mode you could have one or more worktables. In the DELETE mode, you will only have one worktable since that mode only works on one target table. The purpose of worktables is to hold two things:

1. The Data Manipulation Language (DML) tasks
2. The input data that is ready to APPLY to the AMPs

The worktables are created in a database using PERM space. They can become very large. If the script uses multiple SQL statements for a single data record, the data is sent to the AMP once for each SQL statement. This replication guarantees fast performance and that no SQL statement will ever be done more than once. So, this is very important. However, there is no such thing as a free lunch; the cost is space. Later, you will see that using a FILLER field can help reduce this disk space by not sending unneeded data to an AMP. In other words, the efficiency of the MultiLoad run is in your hands.

Supported Input Formats

Data input files come in a variety of formats but MultiLoad is flexible enough to handle many of them. MultiLoad supports the following five format options: BINARY, FASTLOAD, TEXT, UNFORMAT and VARTEXT.



BINARY: Each record is a 2-byte integer, n, that is followed by n bytes of data. A byte is the smallest unit of storage for Teradata.

FASTLOAD: This format is the same as Binary, plus a marker (X '0A' or X '0D') that specifies the end of the record.

TEXT: Each record has an arbitrary number of bytes and is followed by an end of the record marker.

UNFORMAT: The format for these input records is defined in the LAYOUT statement of the MultiLoad script using the components FIELD, FILLER and TABLE.

VARTEXT: This is variable length text RECORD format separated by delimiters such as a comma. For this format you may only use VARCHAR, LONG VARCHAR (IBM) or VARBYTE data formats in your MultiLoad LAYOUT. Note that two delimiter characters in a row will result in a null value between them.

Figure 5-1
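To make two of these formats concrete, here is a minimal Python sketch of how a BINARY stream and a VARTEXT record could be split on the client side. The function names are hypothetical, and the byte order and unsigned interpretation of the 2-byte length are assumptions, since the figure does not state them.

```python
import struct


def read_binary_records(buf, byteorder="<"):
    """Split a buffer into records: a 2-byte length n, then n bytes of data.

    The byte order and the unsigned interpretation of the length prefix are
    assumptions of this sketch; adjust them for the client platform.
    """
    records, pos = [], 0
    while pos < len(buf):
        (n,) = struct.unpack_from(byteorder + "H", buf, pos)
        pos += 2
        records.append(buf[pos:pos + n])
        pos += n
    return records


def split_vartext(line, delimiter=","):
    """Split a VARTEXT record; an empty field between delimiters becomes None."""
    return [field if field != "" else None for field in line.split(delimiter)]


buf = struct.pack("<H", 3) + b"abc" + struct.pack("<H", 5) + b"hello"
print(read_binary_records(buf))     # prints: [b'abc', b'hello']
print(split_vartext("1001,,50000"))  # prints: ['1001', None, '50000']
```

In the second call, the two commas in a row produce the null value (None) between them, as the VARTEXT description notes.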

MultiLoad Has Five IMPORT Phases

MultiLoad IMPORT has five phases, but don't be fazed by this! Here is the short list:

• Phase 1: Preliminary Phase
• Phase 2: DML Transaction Phase
• Phase 3: Acquisition Phase
• Phase 4: Application Phase
• Phase 5: Cleanup Phase

Let's take a look at each phase and see what it contributes to the overall load process of this magnificent utility. Should you memorize every detail about each phase? Probably not. But it is important to know the essence of each phase because sometimes a load fails. When it does, you need to know in which phase it broke down since the method for fixing the error to RESTART may vary depending on the phase. And if you can picture what MultiLoad actually does in each phase, you will likely write better scripts that run more efficiently.

Phase 1: Preliminary Phase

The ancient oriental proverb says, "Measure one thousand times; Cut once." MultiLoad uses Phase 1 to conduct several preliminary set-up activities whose goal is to provide a smooth and successful climate for running your load. The first task is to be sure that the SQL syntax and MultiLoad commands are valid. After all, why try to run a script when the system will just find out during the load process that the statements are not useable? MultiLoad knows that it is much better to identify any syntax errors, right up front. All the preliminary steps are automated. No user intervention is required in this phase.

Second, all MultiLoad sessions with Teradata need to be established. The default is the number of available AMPs. Teradata will quickly establish this number as a factor of 16 for the basis regarding the number of sessions to create. The general rule of thumb for the number of sessions to use for smaller systems is the following: use the number of AMPs plus two more. For larger systems with hundreds of AMP processors, the SESSIONS option is available to lower the default. Remember, these sessions are running on your poor little computer as well as on Teradata.

Each session loads the data to Teradata across the network or channel. Every AMP plays an essential role in the MultiLoad process. They receive the data blocks, hash each row and send the rows to the correct AMP. When the rows come to an AMP, it stores them in worktable blocks on disk. But, lest we get ahead of ourselves, suffice it to say that there is ample reason for multiple sessions to be established.

What about the extra two sessions? Well, the first one is a control session to handle the SQL and logging. The second is a back up or alternate for logging. You may have to use some trial and error to find what works best on your system configuration. If you specify too few sessions it may impair performance and increase the time it takes to complete load jobs. On the other hand, too many sessions will reduce the resources available for other important database activities.
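The rule of thumb for smaller systems is simple arithmetic, shown here as a short Python sketch. The function name `rule_of_thumb_sessions` is hypothetical, and the result is only the starting point the text suggests, not a tuned value for any particular system.

```python
def rule_of_thumb_sessions(amps):
    """Return (data_sessions, extra_sessions, total) for a smaller system."""
    if amps < 1:
        raise ValueError("a system needs at least one AMP")
    data_sessions = amps   # one session per AMP
    extra_sessions = 2     # control session plus its back up for logging
    return data_sessions, extra_sessions, data_sessions + extra_sessions


print(rule_of_thumb_sessions(16))  # prints: (16, 2, 18)
```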

Third, the required support tables are created. They are the following:

Type of Table: Table Details

ERRORTABLES: MultiLoad requires two error tables per target table. The first error table contains constraint violations, while the second error table stores Unique Primary Index violations.

WORKTABLES: Work Tables hold two things: the DML tasks requested and the input data that is ready to APPLY to the AMPs.

LOGTABLE: The LOGTABLE keeps a record of the results from each phase of the load so that MultiLoad knows the proper point from which to RESTART.

Figure 5-2

The final task of the Preliminary Phase is to apply utility locks to the target tables. Initially, access locks are placed on all target tables, allowing other users to read or write to the table for the time being. However, this lock does prevent the opportunity for a user to request an exclusive lock. Although these locks will still allow the MultiLoad user to drop the table, no one else may DROP or ALTER a target table while it is locked for loading. This leads us to Phase 2.

Phase 2: DML Transaction Phase

In Phase 2, all of the SQL Data Manipulation Language (DML) statements are sent ahead to Teradata. MultiLoad allows the use of multiple DML functions. Teradata's Parsing Engine (PE) parses the DML and generates a step-by-step plan to execute the request. This execution plan is then communicated to each AMP and stored in the appropriate worktable for each target table. In other words, each AMP is going to work off the same page.

Later, during the Acquisition phase the actual input data will also be stored in the worktable so that it may be applied in Phase 4, the Application Phase. Next, a match tag is assigned to each DML request that will match it with the appropriate rows of input data. The match tags will not actually be used until the data has already been acquired and is about to be applied to the worktable. This is somewhat like a student who receives a letter from the university in the summer that lists his courses, professors' names, and classroom locations for the upcoming semester. The letter is a "match tag" for the student to his school schedule, although it will not be used for several months. This matching tag for SQL and data is the reason that the data is replicated for each SQL statement using the same data record.

Phase 3: Acquisition Phase

With the proper set-up complete and the PE's plan stored on each AMP, MultiLoad is now ready to receive the INPUT data. This is where it gets interesting! MultiLoad now acquires the data in large, unsorted 64K blocks from the host and sends it to the AMPs.



At this point, Teradata does not care about which AMP receives the data block. The blocks are simply sent, one after the other, to the next AMP in line. For their part, each AMP begins to deal with the blocks that they have been dealt. It is like a game of cards - you take the cards that you have received and then play the game. You want to keep some and give some away.

Similarly, the AMPs will keep some data rows from the blocks and give some away. The AMP hashes each row on the primary index and sends it over the BYNET to the proper AMP where it will ultimately be used. But the row does not get inserted into its target table, just yet. The receiving AMP must first do some preparation before that happens. Don't you have to get ready before company arrives at your house? The AMP puts all of the hashed rows it has received from other AMPs into the worktables where it assembles them into the SQL. Why? Because once the rows are reblocked, they can be sorted into the proper order for storage in the target table. Now the utility places a load lock on each target table in preparation for the Application Phase. Of course, there is no Acquisition Phase when you perform a MultiLoad DELETE task, since no data is being acquired.
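The deal-then-redistribute flow of the Acquisition Phase can be imitated in a few lines of Python. This is a toy simulation under loud assumptions: CRC32 stands in for Teradata's row hash (it is not the real hashing algorithm), a modulo picks the owning AMP, and the names `row_hash` and `acquire` are hypothetical.

```python
import zlib


def row_hash(primary_index):
    """Toy row hash. CRC32 is a stand-in, not Teradata's hashing algorithm."""
    return zlib.crc32(str(primary_index).encode("utf-8"))


def acquire(blocks, n_amps):
    """Simulate the acquisition phase for one target table.

    blocks -- list of blocks from the host; each block is a list of
              (primary_index, payload) rows in arbitrary order
    Returns (worktables, forwarded): one sorted worktable per AMP and the
    number of rows that had to be sent to a different AMP.
    """
    worktables = [[] for _ in range(n_amps)]
    forwarded = 0
    for i, block in enumerate(blocks):
        receiving_amp = i % n_amps              # blocks go to the next AMP in line
        for row in block:
            owner = row_hash(row[0]) % n_amps   # hash on the primary index
            if owner != receiving_amp:
                forwarded += 1                  # the row is given away to its owner
            worktables[owner].append(row)
    for worktable in worktables:
        worktable.sort(key=lambda row: row_hash(row[0]))  # hash sequence
    return worktables, forwarded


blocks = [[(101, "a"), (102, "b")], [(103, "c"), (104, "d")], [(105, "e")]]
worktables, forwarded = acquire(blocks, n_amps=2)
for amp, worktable in enumerate(worktables):
    print(amp, worktable)
```

Whatever AMP a block lands on first, every row ends up in the worktable of the AMP that owns its hash value, already sorted for the Application Phase.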

Phase 4: Application Phase

The purpose of this phase is to write, or APPLY, the specified changes to both the target tables and NUSI subtables. Once the data is on the AMPs, it is married up to the SQL for execution. To accomplish this substitution of data into SQL, when sending the data, the host has already attached some sequence information and five (5) match tags to each data row. Those match tags are used to join the data with the proper SQL statement based on the SQL statement within a DML label. In addition to associating each row with the correct DML statement, match tags also guarantee that no row will be updated more than once, even when a RESTART occurs.

The following five columns are the matching tags:

MATCHING TAGS

ImportSeq: Sequence number that identifies the IMPORT command where the error occurred

DMLSeq: Sequence number for the DML statement involved with the error

SMTSeq: Sequence number of the DML statement being carried out when the error was discovered

ApplySeq: Sequence number that tells which APPLY clause was running when the error occurred

SourceSeq: The number of the data row in the client file that was being built when the error took place

Figure 5-3

Remember, MultiLoad allows for the existence of NUSI processing during a load. Every hash-sequence sorted block from Phase 3 and each block of the base table is read only once to reduce I/O operations to gain speed. Then, all matching rows in the base block are inserted, updated or deleted before the entire block is written back to disk, one time. This is why the match tags are so important. Changes are made based upon corresponding data and DML (SQL) based on the match tags. They guarantee that the correct operation is performed for the rows and blocks with no duplicate operations, a block at a time. And each time a table block is written to disk successfully, a record is inserted into the LOGTABLE. This permits MultiLoad to avoid starting again from the very beginning if a RESTART is needed.

What happens when several tables are being updated simultaneously? In this case, all of the updates are scripted as a multi-statement request. That means that Teradata views them as a single transaction. If there is a failure at any point of the load process, MultiLoad will merely need to be RESTARTed from the point where it failed. No rollback is required. Any errors will be written to the proper error table.
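One way to picture why a RESTART never applies a change twice is the Python sketch below. It is a simplified illustration, not MultiLoad's internal mechanism: the tag is a hypothetical (ImportSeq, DMLSeq, SourceSeq) tuple, the `applied` set stands in for the restart information that is logged, and `fail_after` simulates a job that halts partway through.

```python
def apply_rows(target, work_rows, applied, fail_after=None):
    """Apply salary changes from worktable rows to the target table.

    target     -- dict of employee number -> salary (toy target table)
    work_rows  -- list of (tag, employee_number, amount) worktable rows
    applied    -- set of tags already applied; survives a restart
    fail_after -- optional count of rows after which to simulate a failure
    Returns True when every row has been applied, False when the job halted.
    """
    done = 0
    for tag, employee_number, amount in work_rows:
        if tag in applied:
            continue  # already applied before the restart; skip it
        if fail_after is not None and done == fail_after:
            return False
        target[employee_number] = target.get(employee_number, 0) + amount
        applied.add(tag)
        done += 1
    return True


salaries = {99: 1000, 100: 2000}
work = [((1, 1, 1), 99, 50), ((1, 1, 2), 100, 50)]
applied = set()

finished = apply_rows(salaries, work, applied, fail_after=1)  # halts after one row
finished = apply_rows(salaries, work, applied)                # restart
print(finished, salaries)  # prints: True {99: 1050, 100: 2050}
```

Employee 99 receives the raise exactly once, even though the same worktable rows are processed again after the simulated failure.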

Phase 5: Clean Up Phase

Those of you reading these paragraphs that have young children or teenagers will certainly appreciate this final phase! MultiLoad actually cleans up after itself. The utility looks at the final Error Code (&SYSRC). MultiLoad believes the adage, "All is well that ends well." If the last error code is zero (0), all of the job steps have ended successfully (i.e., all has certainly ended well). This being the case, all empty error tables, worktables and the log table are dropped. All locks, both Teradata and MultiLoad, are released. The statistics for the job are generated for output (SYSPRINT) and the system count variables are set. After this, each MultiLoad session is logged off. So what happens if the final error code is not zero? Stay tuned. Restarting MultiLoad is a topic that will be covered later in this chapter.

MultiLoad Commands

Two Types of Commands

You may see two types of commands in MultiLoad scripts: tasks and support functions. MultiLoad tasks are commands that are used by the MultiLoad utility for specific individual steps as it processes a load. Support functions are those commands that involve the Teradata utility Support Environment (covered in Chapter 9), are used to set parameters, or are helpful for monitoring a load.

The chart below lists the key commands, their type, and what they do.

MLOAD Command: .BEGIN [IMPORT] MLOAD, .BEGIN DELETE MLOAD
Type: Support
What does the MLOAD Command do? This command communicates directly with Teradata to specify if the MultiLoad mode is going to be IMPORT or DELETE. Note that the word IMPORT is optional in the syntax because it is the DEFAULT, but DELETE is required. We recommend using the word IMPORT to make the coding consistent and easier for others to read. Any parameters for the load, such as error limits or checkpoints will be included under the .BEGIN command, too. It is important to know which commands or parameters are optional since, if you do not include them, MultiLoad may supply defaults that may impact your load.

MLOAD Command: .DML LABEL
Type: Task
What does the MLOAD Command do? The DML LABEL defines treatment options and labels for the application (APPLY) of data for the INSERT, UPDATE, UPSERT and DELETE operations. A LABEL is simply a name for a requested SQL activity. The LABEL is defined first, and then referenced later in the APPLY clause.

MLOAD Command: .END MLOAD
Type: Task
What does the MLOAD Command do? This instructs MultiLoad to finish the APPLY operations with the changes to the designated databases and tables.

MLOAD Command: .FIELD
Type: Task
What does the MLOAD Command do? This defines a column of the data source record that will be sent to the Teradata database via SQL. When writing the script, you must include a FIELD for each data field you need in SQL. This command is used with the LAYOUT command.

MLOAD Command: .FILLER
Type: Task
What does the MLOAD Command do? Do not assume that MultiLoad has somehow uncovered much of what you used in your term papers at the university! FILLER defines a field that is accounted for as part of the data source's row format, but is not sent to the Teradata DBS. It is used with the LAYOUT command.

MLOAD Command: .LAYOUT
Type: Task
What does the MLOAD Command do? LAYOUT defines the format of the INPUT DATA record so Teradata knows what to expect. If one record is not large enough, you can concatenate multiple data records by using the LAYOUT parameter CONTINUEIF to specify the value that triggers the concatenation. Another option is INDICATORS, which is used to represent nulls by using the bitmap (1 bit per field) at the front of the data record.

MLOAD Command: .LOGON
Type: Support
What does the MLOAD Command do? This specifies the username or LOGON string that will establish sessions for MultiLoad with Teradata.

MLOAD Command: .LOGTABLE
Type: Support
What does the MLOAD Command do? This support command names the Restart Log that will be used for storing CHECKPOINT data pertaining to a load. The LOGTABLE is then used to tell MultiLoad where to RESTART, should that be necessary. It is recommended that this command be placed before the .LOGON command.

MLOAD Command: .LOGOFF
Type: Support
What does the MLOAD Command do? This command terminates any sessions established by the LOGON command.

MLOAD Command: .IMPORT
Type: Task
What does the MLOAD Command do? This command defines the INPUT DATA FILE, file type, file usage, the LAYOUT to use and where to APPLY the data to SQL.

MLOAD Command: .SET
Type: Support
What does the MLOAD Command do? Optionally, you can SET utility variables. An example would be {.SET DBName TO 'CDW_Test'}.

MLOAD Command: .SYSTEM
Type: Support
What does the MLOAD Command do? This interrupts the operation of MultiLoad in order to issue commands to the local operating system.

MLOAD Command: .TABLE
Type: Task
What does the MLOAD Command do? This is a command that may be used with the .LAYOUT command. It identifies a table whose columns (both their order and data types) are to be used as the field names and data descriptions of the data source records.

Figure 5-4

Parameters for .BEGIN IMPORT MLOAD

Here is a list of components or parameters that may be used in the .BEGIN IMPORT command. Note: The parameters do not require the usual dot prior to the command since they are actually sub-commands.



PARAMETER: AMPCHECK {NONE|APPLY|ALL}
REQUIRED OR NOT: Optional
WHAT IT DOES: NONE specifies that MLOAD starts even with one down AMP per cluster if all tables are Fallback. APPLY (DEFAULT) specifies MLOAD will not start or finish Phase 4 with a down AMP. ALL specifies not to proceed if any AMPs are down, just like FastLoad.

PARAMETER: AXSMOD
REQUIRED OR NOT: Optional
WHAT IT DOES: Short for Access Module, this command specifies input protocol like OLE-DB or reading a tape from REEL Librarian. This parameter is for network-attached systems only. When used, it must precede the DEFINE command in the script.

PARAMETER: CHECKPOINT
REQUIRED OR NOT: Optional
WHAT IT DOES: You have two options: CHECKPOINT refers to the number of minutes, or frequency, at which you wish a CHECKPOINT to occur if the number is 60 or less. If the number is greater than 60, it designates the number of rows at which you want the CHECKPOINT to occur. This command is NOT valid in DELETE mode.

PARAMETER: ERRLIMIT errcount [errpercent]
REQUIRED OR NOT: Optional
WHAT IT DOES: You may specify the maximum number of errors, or the percentage, that you will tolerate during the processing of a load job.

PARAMETER: ERRORTABLES ET_ERR UV_ERR
REQUIRED OR NOT: Optional
WHAT IT DOES: Names the two error tables, two per target table. Note there is no comma separator.

PARAMETER: NOTIFY {LOW|MEDIUM|HIGH|OFF}
REQUIRED OR NOT: Optional
WHAT IT DOES: If you opt to use NOTIFY for any event during a load, you may designate the priority of that notification: LOW for level events, MEDIUM for important events, HIGH for events at operational decision points, and OFF to eliminate any notification at all for a given phase.

PARAMETER: SESSIONS <MAX> <MIN>
REQUIRED OR NOT: Optional
WHAT IT DOES: This refers to the number of SESSIONS that should be established with Teradata. For MultiLoad, the optimal number of sessions is the number of AMPs in the system, plus two more. You can also use MAX or MIN, which automatically use the maximum or minimum number of sessions to complete the job. If you specify nothing, it will default to MAX.

PARAMETER: SLEEP
REQUIRED OR NOT: Optional
WHAT IT DOES: Tells MultiLoad how frequently, in minutes, to try logging on to the system.

PARAMETER: TABLES Tablename1, Tablename2…, Tablename5
REQUIRED OR NOT: Required
WHAT IT DOES: Names up to 5 target tables.

PARAMETER: TENACITY
REQUIRED OR NOT: Optional
WHAT IT DOES: Tells MultiLoad how many hours to try logging on when its initial effort to do so is rebuffed.

PARAMETER: WORKTABLES Tablename1, Tablename2…, Tablename5
REQUIRED OR NOT: Optional
WHAT IT DOES: Names the worktable(s), one per target table.

Figure 5-5
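The CHECKPOINT and SESSIONS rows describe simple numeric rules, and a small Python sketch can make them concrete. This is only a model of the rules as stated in the table above; the function names `interpret_checkpoint` and `suggested_sessions` are hypothetical helpers and are not part of MultiLoad.

```python
def interpret_checkpoint(value: int) -> tuple[str, int]:
    """Classify a CHECKPOINT value using the 60-or-less rule from the table.

    Hypothetical helper: 60 or less is read as minutes, more than 60 as rows.
    """
    if value <= 0:
        raise ValueError("CHECKPOINT value must be a positive number")
    unit = "minutes" if value <= 60 else "rows"
    return unit, value


def suggested_sessions(amp_count: int) -> int:
    """Return the session count the table calls optimal: AMPs plus two."""
    return amp_count + 2


if __name__ == "__main__":
    print(interpret_checkpoint(15))      # a value of 60 or less is a time interval
    print(interpret_checkpoint(100000))  # a value above 60 is a row count
    print(suggested_sessions(10))
```

The boundary case is worth remembering: under the rule as written, 60 still means minutes and 61 already means rows.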

Parameters for .BEGIN DELETE MLOAD

Here is a list of components or parameters that may be used in the .BEGIN DELETE command. Note: The parameters do not require the usual dot prior to the command since parameters are actually sub-commands.

A Simple Multiload IMPORT Script

"We must use time as a tool, not as a crutch."– John F. Kennedy

Ask Not – What your Multiload can do for you. Ask what you can do for your Multiload. Multiload is a great tool when you're short on time. Multiload can update, insert, delete or upsert on Teradata tables that are already populated. It can even do all four in one script. Our flat file will contain Employee_numbers and Salaries * 2. We are giving a big raise. We're going to create a flat file to use with Multiload, as shown below:

Let's create a flat file for our Multiload

Let's Execute it:

Remember, we'll still use the BTEQ utility to create our flat file.



Building our Multiload Script

"I can accept failure, but I can't accept not trying."- Michael Jordan

Getting these scripts down is a very hard process, so don't be discouraged if you have a couple of mistakes. The next two slides will show you a blank copy of the basic Multiload script, as well as a marked slide illustrating the important parts of the script:

"If you don't know where you're going, any road will take you there." - Lewis Carroll

Creating our Multiload script



Executing Multiload

"Ambition is a dream with a V8 Engine."- Elvis Presley

You will feel like the King after executing your first Multiload script. Multiload is the Elvis Presley of data warehousing because nobody knows how to make more records than Multiload. If you have the ambition to learn, this book will give you what it takes to steer through these utilities. We initialize the Multiload utility like we do with BTEQ, except that the keyword with Multiload is mload. Remember that this Multiload is going to double the salaries of our employees.

Let's execute our Multiload script

Here is a before and after image of our Employee_table02:



Another Simple MultiLoad IMPORT Script

"Those who dance are considered insane by those who cannot hear the music."- George Carlin

MultiLoad can be somewhat intimidating to the new user because there are many commands and phases. In reality, the load scripts are understandable when you think through what the IMPORT mode does:

• Setting up a Logtable
• Logging onto Teradata
• Identifying the Target, Work and Error tables
• Defining the INPUT flat file
• Defining the DML activities to occur
• Naming the IMPORT file
• Telling MultiLoad to use a particular LAYOUT
• Telling the system to start loading
• Finishing loading and logging off of Teradata

This first script example is designed to show MultiLoad IMPORT in its simplest form. It depicts the loading of a three-column Employee table. The actual script is in the left column and our comments are on the right. Below the script is a step-by-step description of how this script works.

Step One: Setting up a Logtable and Logging onto Teradata. MultiLoad requires you to specify a log table right at the outset with the .LOGTABLE command. We have called it CDW_Log. Once you name the Logtable, it will be automatically created for you. The Logtable may be placed in the same database as the target table, or it may be placed in another database. Immediately after this you log onto Teradata using the .LOGON command. The order of these two commands is interchangeable, but it is recommended to define the Logtable first and then to log on second. If you reverse the order, Teradata will give a warning message. Notice that the commands in MultiLoad require a dot in front of the command key word.

Step Two: Identifying the Target, Work and Error tables. In this step of the script you must tell Teradata which tables to use. To do this, you use the .BEGIN IMPORT MLOAD command. Then you will preface the names of these tables with the sub-commands TABLES, WORKTABLES and ERRORTABLES. All you must do is name the tables and specify what database they are in. Work tables and error tables are created automatically for you. Keep in mind that you get to name and locate these tables. If you do not do this, Teradata might supply some defaults of its own!

At the same time, these names are optional. If the WORKTABLES and ERRORTABLES had not specifically been named, the script would still execute and build these tables. They would have been built in the default database for the user. The name of the worktable would be WT_EMPLOYEE_DEPT1 and the two error tables would be called ET_EMPLOYEE_DEPT1 and UV_EMPLOYEE_DEPT1, respectively.
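The default naming pattern described here (a WT_, ET_ or UV_ prefix on the target table name) can be sketched in a few lines of Python. The helper `default_mload_tables` and its dictionary keys are hypothetical and exist only to illustrate the pattern; they do not call MultiLoad or Teradata.

```python
def default_mload_tables(target_table: str) -> dict[str, str]:
    """Build the default work and error table names for one target table.

    Hypothetical helper: strips an optional database qualifier and applies
    the WT_, ET_ and UV_ prefixes described in the text.
    """
    table_name = target_table.split(".")[-1]
    return {
        "worktable": f"WT_{table_name}",
        "error_table_et": f"ET_{table_name}",
        "error_table_uv": f"UV_{table_name}",
    }


if __name__ == "__main__":
    for role, name in default_mload_tables("SQL01.EMPLOYEE_DEPT1").items():
        print(role, name)
```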

Sometimes, large Teradata systems have a work database with a lot of extra PERM space. One customer calls this database CORP_WORK. This is where all of the logtables and worktables are normally created. You can use a DATABASE command to point all table creations to it or qualify the names of these tables individually.

Step Three: Defining the INPUT flat file record structure. MultiLoad is going to need to know the structure of the INPUT flat file. Use the .LAYOUT command to name the layout. Then list the fields and their data types used in your SQL as a .FIELD. Did you notice that an asterisk is placed between the column name and its data type? This means to automatically calculate the next byte in the record. It is used to designate the starting location for this data based on the previous field's length. If you are listing fields in order and need to skip a few bytes in the record, you can either use the .FILLER (as in the script) to position the cursor to the next field, or the "*" on the Dept_No field could have been replaced with the number 132 (CHAR(11)+CHAR(20)+CHAR(100)+1). Then, the .FILLER is not needed. Also, if the input record fields are exactly the same as the table, the .TABLE can be used to automatically define all the .FIELDS for you. The LAYOUT name will be referenced later in the .IMPORT command. If the input file is created with INDICATORS, it is specified in the LAYOUT.
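The arithmetic behind the asterisk can be checked with a short Python sketch. It models only the rule stated above (each "*" field starts on the byte after the previous field ends, counting from byte 1); the `start_positions` function and the tuple layout are illustrative assumptions, not MultiLoad syntax.

```python
# Field name and length in bytes, in the order the fields appear in the
# simple script's LAYOUT (Junk_stuff is the .FILLER).
LAYOUT_FILEIN = [
    ("Employee_No", 11),
    ("Last_Name", 20),
    ("Junk_stuff", 100),
    ("Dept_No", 6),
]


def start_positions(fields: list[tuple[str, int]]) -> dict[str, int]:
    """Compute the 1-based start byte of each field when every field uses '*'."""
    positions = {}
    next_byte = 1
    for name, length in fields:
        positions[name] = next_byte
        next_byte += length  # the next field begins right after this one
    return positions


if __name__ == "__main__":
    for name, start in start_positions(LAYOUT_FILEIN).items():
        print(f"{name} starts at byte {start}")
```

Dept_No comes out at byte 132, which is the 11 + 20 + 100 + 1 from the text.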

Step Four: Defining the DML activities to occur. The .DML LABEL names and defines the SQL that is to execute. It is like setting up executable code in a programming language, but using SQL. In our example, MultiLoad is being told to INSERT a row into the SQL01.Employee_Dept1 table. The VALUES come from the data in each FIELD because it is preceded by a colon (:). Are you allowed to use multiple labels in a script? Sure! But remember this: Every label must be referenced in an APPLY clause of the .IMPORT command.

Step Five: Naming the INPUT file and its format type. This step is vital! Using the .IMPORT command, we have identified the INFILE data as being contained in a file called "CDW_Join_Export.txt". Then we list the FORMAT type as TEXT. Next, we referenced the LAYOUT named FILEIN to describe the fields in the record. Finally, we told MultiLoad to APPLY the DML LABEL called INSERTS, that is, to INSERT the data rows into the target table. This is still a sub-component of the .IMPORT command. If the script is to run on a mainframe, the INFILE name is actually the name of a JCL Data Definition (DD) statement that contains the real name of the file.

Notice that the .IMPORT goes on for 4 lines of information. This is possible because it continues until it finds the semi-colon that defines the end of the command. This is how MultiLoad tells one operation from another. The semi-colon is therefore very important: without it, MultiLoad would have attempted to process the .END MLOAD as part of the IMPORT, and that would not work.
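To see why the semi-colon matters, here is a deliberately naive Python sketch that splits script text into commands at each semi-colon. It is an illustration of the idea only, not how MultiLoad parses scripts; `split_commands` is a hypothetical helper and it ignores comments and quoted strings.

```python
def split_commands(script: str) -> list[str]:
    """Split script text into commands, treating ';' as the terminator."""
    commands = []
    for chunk in script.split(";"):
        text = " ".join(chunk.split())  # collapse line breaks and extra spaces
        if text:
            commands.append(text + ";")
    return commands


GOOD_SCRIPT = """
.IMPORT INFILE CDW_Join_Export.txt
    FORMAT TEXT
    LAYOUT FILEIN
    APPLY INSERTS;
.END MLOAD;
"""

# Same text, but the semi-colon after APPLY INSERTS has been left out.
BAD_SCRIPT = GOOD_SCRIPT.replace("APPLY INSERTS;", "APPLY INSERTS")

if __name__ == "__main__":
    for command in split_commands(GOOD_SCRIPT):
        print(command)
    print(len(split_commands(BAD_SCRIPT)), "command(s) when the semi-colon is missing")
```

With the semi-colon in place the four-line .IMPORT is read as one command and .END MLOAD as another; without it, both run together into a single command.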

Step Six: Finishing loading and logging off of Teradata. This is the closing ceremony for the load. MultiLoad wraps things up, closes the curtains, and logs off of the Teradata system.

Important note: Since the script in Figure 5-7 does not DROP any tables, it is completely capable of being restarted if an error occurs. Compare this to the next script in Figure 5-8. Do you think it is restartable? If you said no, pat yourself on the back.

PARAMETER (REQUIRED OR NOT): WHAT IT DOES

TABLES Tablename1 (Required): Names the Target table.

WORKTABLES Tablename1 (Optional): Names the worktable, one per target table.

ERRORTABLES ET_ERR UV_ERR (Optional): Names the two error tables, two per target table, and there is no comma separator between them.

TENACITY (Optional): Tells MultiLoad how many hours to try establishing sessions when its initial effort to do so is rebuffed.

Figure 5-6

/* Simple Mload script */

.LOGTABLE SQL01.CDW_Log;
.LOGON TDATA/SQL01,SQL0;

Sets up a Logtable and logs on to Teradata.

.BEGIN IMPORT MLOAD TABLES SQL01.Employee_Dept1
    WORKTABLES SQL01.CDW_WT
    ERRORTABLES SQL01.CDW_ET
                SQL01.CDW_UV;

Begins the Load Process by naming the Target Table, Work table and error tables; notice NO comma between the error tables.

.LAYOUT FILEIN;
.FIELD Employee_No * CHAR(11);
.FIELD Last_Name * CHAR(20);
.FILLER Junk_stuff * CHAR(100);
.FIELD Dept_No * CHAR(6);

Names the LAYOUT of the INPUT record and defines its structure; notice the dots before the FIELD and FILLER and the semi-colons after each definition.

.DML LABEL INSERTS;

Names the DML Label.

INSERT INTO SQL01.Employee_Dept1
    (Employee_No
    ,Last_Name
    ,Dept_No )
VALUES
    (:Employee_No
    ,:Last_Name
    ,:Dept_No );

Tells MultiLoad to INSERT a row into the target table and defines the row format.

Lists, in order, the VALUES (each one preceded by a colon) to be INSERTed.

.IMPORT INFILE CDW_Join_Export.txt
    FORMAT TEXT
    LAYOUT FILEIN
    APPLY INSERTS;

Names the Import File and its Format type; cites the LAYOUT to use and tells Mload to APPLY the INSERTs.

.END MLOAD;
.LOGOFF;

Ends MultiLoad and logs off all MultiLoad sessions.

Figure 5-7

MultiLoad IMPORT Script

Let's take a look at a MultiLoad IMPORT script that comes from real life. This sample script will look much more like what you might encounter at your workplace. It is more detailed. The notes to the right are brief and to the point. They will help you grasp the essence of what is happening in the script.

/* !/bin/ksh* */

Load runs from a Shell Script.

/* +++++++++++++++++++++++++++++++++++++*/
/* MultiLoad SCRIPT */
/*This script is designed to change the */
/*EMPLOYEE_DEPT1 table using the data found */
/* in IMPORT INFILE CDW_Join_Export.txt */
/* Version 1.1 */
/* Created by Coffing Data Warehousing */
/* +++++++++++++++++++++++++++++++++++++*/

Any words between /* … */ are comments only and are not processed by Teradata.

Names and describes the purpose of the script; names the author.

.RUN FILE LOGON.TXT;

Secures the logon by storing userid and password in a separate file, then reads it.

/*Drop Error Tables: caution, this script cannot be restarted because these tables would be needed */
DROP TABLE SQL01.CDW_ET;
DROP TABLE SQL01.CDW_UV;

Drops existing error tables and cancels the ability for the script to restart – DON'T ATTEMPT THIS AT HOME! Also, SQL does not use a dot (.)

/* Begin Import and Define Work and Error Tables */
.BEGIN IMPORT MLOAD TABLES SQL01.Employee_Dept1
    WORKTABLES SQL01.CDW_WT
    ERRORTABLES SQL01.CDW_ET SQL01.CDW_UV;

Begins the Load Process by telling us first the names of the target table, Work table and error tables; note NO comma between the names of the error tables.

/* Define Layout of Input File */
.LAYOUT FILEIN;
.FIELD Employee_No * CHAR(11);
.FIELD First_Name * CHAR(14);
.FIELD Last_Name * CHAR(20);
.FIELD Dept_No * CHAR(6);
.FIELD Dept_Name * CHAR(20);

Names the LAYOUT of the INPUT file.

Defines the structure of the INPUT file. Notice the dots before the FIELD command and the semi-colons after each FIELD definition.

/* Begin INSERT Process on Table */
.DML LABEL INSERTS;
INSERT INTO SQL01.Employee_Dept1
    ( Employee_No
    ,First_Name
    ,Last_Name
    ,Dept_No
    ,Dept_Name )
VALUES
    ( :Employee_No
    ,:First_Name
    ,:Last_Name
    ,:Dept_No
    ,:Dept_Name );

Names the DML Label.

Tells MultiLoad to INSERT a row into the target table and defines the row format.

Note that we place comma separators in front of the following column or value for easier debugging.

Lists, in order, the VALUES to be INSERTed.

/* Specify IMPORT File and Apply Parameters */
.IMPORT INFILE CDW_Join_Export.txt
    FORMAT TEXT
    LAYOUT FILEIN
    APPLY INSERTS;

Names the Import File and states its Format type; names the Layout to use and tells MultiLoad to APPLY the INSERTs.

.END MLOAD;
.LOGOFF;

Ends MultiLoad and logs off of Teradata.

Figure 5-8

Error Treatment Options for the .DML LABEL Command

MultiLoad allows you to tailor how it deals with different types of errors that it encounters during the load process, to fit your needs. Here is a summary of the options available to you:



ERROR TREATMENT OPTIONS FOR .DML LABEL

.DML LABEL {labelname}
    {MARK | IGNORE} DUPLICATE [INSERT | UPDATE] ROWS
    {MARK | IGNORE} MISSING [UPDATE | DELETE] ROWS
    DO INSERT FOR [MISSING UPDATE] ROWS ;

Figure 5-9

In IMPORT mode, you may specify as many as five distinct error-treatment options for one .DML statement. For example, if there is more than one instance of a row, do you want MultiLoad to IGNORE the duplicate row, or to MARK it (list it) in an error table?

If you do not specify IGNORE, then MultiLoad will MARK, or record, all of the errors. Imagine you have a standard INSERT load that you know will end up recording about 20,000 duplicate row errors. Using the following syntax "IGNORE DUPLICATE INSERT ROWS;" will keep them out of the error table. By ignoring those errors, you gain three benefits:

1. You do not need to see all the errors.
2. The error table is not filled up needlessly.
3. MultiLoad runs much faster since it is not conducting a duplicate row check.

When doing an UPSERT, there are two rules to remember:

• The default is IGNORE MISSING UPDATE ROWS. MARK is the default for all other operations. When doing an UPSERT, you anticipate that some rows are missing; otherwise, why do an UPSERT? So, this keeps these rows out of your error table.
• The DO INSERT FOR MISSING UPDATE ROWS is mandatory. This tells MultiLoad to insert a row from the data source if that row does not exist in the target table because the update didn't find it.



The table that follows shows you, in more detail, how flexible your options are:

ERROR TREATMENT OPTIONS IN DETAIL

.DML LABEL OPTION: WHAT IT DOES

MARK DUPLICATE INSERT ROWS: This option logs an entry for all duplicate INSERT rows in the UV_ERR table. Use this when you want to know about the duplicates.

IGNORE DUPLICATE INSERT ROWS: This tells MultiLoad to IGNORE duplicate INSERT rows because you do not want to see them.

MARK DUPLICATE UPDATE ROWS: This logs the existence of every duplicate UPDATE row.

IGNORE DUPLICATE UPDATE ROWS: This eliminates the listing of duplicate update row errors.

MARK MISSING UPDATE ROWS: This option ensures a listing of data rows that had to be INSERTed since there was no row to UPDATE.

IGNORE MISSING UPDATE ROWS: This tells MultiLoad NOT to list UPDATE rows as an error. This is a good option when doing an UPSERT since UPSERT will INSERT a new row.

MARK MISSING DELETE ROWS: This option makes a note in the ET_Error Table that a row to be deleted is missing.

IGNORE MISSING DELETE ROWS: This option says, "Do not tell me that a row to be deleted is missing."

DO INSERT FOR MISSING UPDATE ROWS: This is required to accomplish an UPSERT. It tells MultiLoad that if the row to be updated does not exist in the target table, then INSERT the entire row from the data source.

Figure 5-10
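The interplay between DO INSERT FOR MISSING UPDATE ROWS and the MARK or IGNORE choice can be modeled in plain Python. The sketch below is a toy model of the behavior described in the table, using an in-memory dictionary in place of a target table; `apply_upsert`, its parameters and the sample rows are all hypothetical and say nothing about how MultiLoad is implemented.

```python
def apply_upsert(target: dict, source_rows: list[dict], key: str,
                 mark_missing: bool = False) -> list[dict]:
    """Update rows that exist in target, insert those that do not.

    target maps a key value to a row dict and is modified in place.
    Returns the rows that would be listed in an error table when
    MARK MISSING UPDATE ROWS is chosen (empty when IGNORE is chosen).
    """
    error_table = []
    for row in source_rows:
        key_value = row[key]
        if key_value in target:
            target[key_value].update(row)      # the UPDATE found its row
        else:
            target[key_value] = dict(row)      # DO INSERT FOR MISSING UPDATE ROWS
            if mark_missing:
                error_table.append(dict(row))  # MARK: list the row that was inserted
    return error_table


if __name__ == "__main__":
    students = {1: {"Student_ID": 1, "Class_Code": "FR"}}
    new_data = [
        {"Student_ID": 1, "Class_Code": "SO"},  # exists, so it is updated
        {"Student_ID": 2, "Class_Code": "FR"},  # missing, so it is inserted
    ]
    marked = apply_upsert(students, new_data, key="Student_ID", mark_missing=True)
    print(students)
    print(marked)
```

With `mark_missing=False`, which corresponds to the IGNORE default for an UPSERT, the same rows are applied but nothing is listed.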

An IMPORT Script with Error Treatment Options

The command .DML LABEL names any DML options (INSERT, UPDATE or DELETE) that immediately follow it in the script. Each label must be given a name. In IMPORT mode, the label will be referenced for use in the APPLY Phase when certain conditions are met. The following script provides an example of just one such possibility:


/* +++++++++++++++++++++++++++++++++++++*/
/* MultiLoad SCRIPT */
/*This script is designed to change the */
/*EMPLOYEE_DEPT table using the data from */
/* the IMPORT INFILE CDW_Join_Export.txt */
/* Version 1.1 */
/* Created by Coffing Data Warehousing*/
/* +++++++++++++++++++++++++++++++++++++ */

Any words between /* … */ are COMMENTS ONLY and are not processed by Teradata.

Names and describes the purpose of the script; names the author.

/* Setup the MultiLoad Logtables, Logon Statements*/
.LOGTABLE SQL01.CDW_Log;
.LOGON TDATA/SQL01,SQL01;
DATABASE SQL01;

Sets up a Logtable and then logs on to Teradata.

Specifies the database in which to find the target table.

/*Drop Error Tables */
DROP TABLE WORKDB.CDW_ET;
DROP TABLE WORKDB.CDW_UV;

Drops existing error tables in the work database.

/* Begin Import and Define Work and Error Tables */
.BEGIN IMPORT MLOAD TABLES Employee_Dept
    WORKTABLES WORKDB.CDW_WT
    ERRORTABLES WORKDB.CDW_ET WORKDB.CDW_UV;

Begins the Load Process by telling us first the name of the Target Table; the Work table and error tables are in a work database. Note there is no comma between the names of the error tables (pair).

/* Define Layout of Input File */
.LAYOUT FILEIN;
.FIELD Employee_No * CHAR(11);
.FIELD First_Name * CHAR(14);
.FIELD Last_Name * CHAR(20);
.FIELD Dept_No * CHAR(6);
.FIELD Dept_Name * CHAR(20);

Names the LAYOUT of the INPUT file.

Defines the structure of the INPUT file. Notice the dots before the FIELD command and the semi-colons after each FIELD definition.

/* Begin INSERT Process on Table */
.DML LABEL INSERTS
IGNORE DUPLICATE INSERT ROWS;
INSERT INTO SQL01.Employee_Dept
    ( Employee_No
    ,First_Name
    ,Last_Name
    ,Dept_No
    ,Dept_Name)
VALUES
    ( :Employee_No
    ,:First_Name
    ,:Last_Name
    ,:Dept_No
    ,:Dept_Name);

Names the DML Label.

Tells MultiLoad NOT TO LIST duplicate INSERT rows in the error table; notice the option is placed AFTER the LABEL identification and immediately BEFORE the DML function.

Lists, in order, the VALUES to be INSERTed.

/* Specify IMPORT File and Apply Parameters */
.IMPORT INFILE CDW_Join_Export.txt
    FORMAT TEXT
    LAYOUT FILEIN
    APPLY INSERTS;

Names the Import File and states its Format type; names the Layout to use and tells MultiLoad to APPLY the INSERTs.

.END MLOAD;
.LOGOFF;

Ends MultiLoad and logs off of Teradata.

Figure 5-11

An IMPORT Script that Uses Two Input Data Files




/*MultiLoad IMPORT SCRIPT with two INPUT files */
/*This script INSERTs new rows into the */
/* Employee_table and UPDATEs the Dept_Name */
/*in the Department_table. */
/* Version 1.1 */
/* Created by Coffing Data Warehousing */
/* +++++++++++++++++++++++++++++++++++++*/

.LOGTABLE SQL01.EMPDEPT_LOG;
.RUN FILE c:\mydir\logon.txt;

Sets up a Logtable and logs on with .RUN. The logon.txt file contains:

.logon TDATA/SQL01,SQL01;

DROP TABLE SQL01.EMP_WT;
DROP TABLE SQL01.DEPT_WT;
DROP TABLE SQL01.EMP_ET;
DROP TABLE SQL01.EMP_UV;
DROP TABLE SQL01.DEPT_ET;
DROP TABLE SQL01.DEPT_UV;

Drops the worktables and error tables, in case they existed from a prior load. NOTE: Do NOT include these DROPs if you want to RESTART using CHECKPOINT.

/* the following defines 2 tables for loading */
.BEGIN IMPORT MLOAD
    TABLES SQL01.Employee_Table, SQL01.Department_Table
    WORKTABLES SQL01.EMP_WT, SQL01.DEPT_WT
    ERRORTABLES SQL01.EMP_ET SQL01.EMP_UV, SQL01.DEPT_ET SQL01.DEPT_UV;

Identifies the 2 target tables with a comma between them.

Names the worktable and error tables for each target table.

Note there are NO commas between the pair of names, but there is a comma between this pair and the next pair.

/* these next 2 LAYOUTs define 2 different records */
.LAYOUT FILEIN1;
.FIELD Emp_No * INTEGER;
.FIELD LName * CHAR(20);
.FIELD FName * VARCHAR(20);
.FIELD Sal * DECIMAL (10,2);
.FIELD Dept_Num * INTEGER;

Names and defines the LAYOUT of the 1st INPUT file.

.LAYOUT FILEIN2;
.FIELD DeptNo * CHAR(6);
.FIELD DeptName * CHAR(20);

Names and defines the LAYOUT of the 2nd INPUT file.

.DML LABEL EMP_INS
IGNORE DUPLICATE INSERT ROWS;
INSERT INTO SQL01.Employee_Table
VALUES (:Emp_No
    ,:FName
    ,:LName
    ,:Sal
    ,:Dept_Num);

Names the 1st DML Label; tells MultiLoad to IGNORE duplicate INSERT rows because you do not want to see them.

INSERTs a row into the table, but does NOT name the columns. So all VALUES are passed IN THE ORDER they are defined in the Employee table.

.DML LABEL DEPT_UPD;
UPDATE Department_Table
SET Dept_Name = :DeptName
WHERE Dept_No = :DeptNo;

Names the 2nd DML Label.

Tells MultiLoad to UPDATE when it finds DeptNo (record) equal to the Dept_No in the Department_table and change the Dept_Name column with the DeptName from the INPUT file.

.IMPORT INFILE Emp_Data
    LAYOUT FILEIN1
    APPLY EMP_INS;

.IMPORT INFILE Dept_Data
    LAYOUT FILEIN2
    APPLY DEPT_UPD;

Names the TWO Import Files.

Names the TWO Layouts that define the structure of the INPUT DATA files and tells MultiLoad to APPLY the INSERTs to target table 1 and the UPDATEs to target table 2.

.END MLOAD;
.LOGOFF;

Ends MultiLoad and logs off of Teradata.

Figure 5-12

Redefining the INPUT

Sometimes, instead of using two different INPUT DATA files, which require two separate LAYOUTs, you can combine them into one INPUT DATA file. And you can use that one file, with just one LAYOUT, to load more than one table! You see, a flat file may contain more than one type of data record. As long as each record has a unique code to identify it, MultiLoad can check this code and know which fields of the layout apply, using different field names for each record type in the same layout. To do this you will need to REDEFINE the INPUT. You do this by redefining a field's position in the .FIELD or .FILLER section of the LAYOUT. Unlike the asterisk (*), which means that a field simply follows the previous one, redefining cites a number that tells MultiLoad to jump to that byte position in the INPUT record, typically back toward the beginning of the record.

A Script that Uses Redefining the Input

The following script uses the ability to define two record types in the same input data file. It uses a .FILLER to define the code since it is never used in the SQL, only to determine which SQL to run.




/* +++++++++++++++++++++++++++++++++++++*/
/* MultiLoad IMPORT SCRIPT with multiple target */
/*tables and DML labels */
/*This script INSERTs new rows into the */
/* Employee_table and UPDATEs the Dept_Name */
/*in the Department_table */
/* Version 1.1 */
/* Created by Coffing Data Warehousing */
/* +++++++++++++++++++++++++++++++++++++*/

.LOGTABLE SQL01.EmpDept_Log;
.LOGON TDATA/SQL01,SQL01;

Sets up a Logtable and logs on to Teradata; optionally, specifies the database to work in.

/* 2 target tables, 2 work tables, 2 error tables per target table, defined in pairs */
.BEGIN IMPORT MLOAD
    TABLES SQL01.Employee_Table, SQL01.Department_Table
    WORKTABLES SQL01.EMP_WT, SQL01.DEPT_WT
    ERRORTABLES SQL01.EMP_ET SQL01.EMP_UV, SQL01.DEPT_ET SQL01.DEPT_UV;

Identifies the 2 target tables.

Names the worktable and error tables for each target table.

Note there is no comma between the names of the error tables but there is a comma between the pairs of error tables.

.LAYOUT FILEIN;
.FILLER Trans * CHAR (1);
.FIELD Emp_No * INTEGER;
.FIELD Dept_Num * INTEGER;
.FIELD LName * CHAR(20);
.FIELD FName * VARCHAR(20);
.FIELD Sal * DECIMAL (10,2);
.FIELD DeptNo 2 INTEGER;
.FIELD DeptName * CHAR(20);

Names and defines the LAYOUT of the INPUT record. The FILLER is for a field that tells what type of record has been read. Here that field contains an "E" or a "D". The "E" tells MLOAD to use the Employee data and the "D" is for department data.

The definition for DeptNo tells MLOAD to jump backward to byte 2, whereas the * for Emp_No defaulted to byte 2. So, Emp_No and DeptNo both start at byte 2, but in different types of records. When Trans (byte position 1) contains a "D", the APPLY uses the department data and for an "E" the APPLY uses the employee data.

.DML LABEL EMPIN
IGNORE DUPLICATE INSERT ROWS;
INSERT INTO SQL01.Employee_Table
VALUES ( :Emp_No
    ,:FName
    ,:LName
    ,:Sal
    ,:Dept_Num );

Names the 1st DML Label; tells MultiLoad to IGNORE duplicate INSERT rows because you do not want to see them.

Tells MultiLoad to INSERT a row into the 1st target table but optionally does NOT define the target table row format. All the VALUES are passed to the columns of the Employee table IN THE ORDER of that table's row format.

.DML LABEL DEPTIN;
UPDATE Department_Table
SET Dept_Name = :DeptName
WHERE Dept_No = :DeptNo;

Names the 2nd DML Label.

Tells MultiLoad to UPDATE the 2nd target table but optionally does NOT define that table's row format. When the VALUE of the DeptNo equals that of the Dept_No column of the Department table, then update the Dept_Name column with the DeptName from the INPUT file.

.IMPORT INFILE UPLOAD.dat
    LAYOUT FILEIN
    APPLY EMPIN WHERE Trans = 'E'
    APPLY DEPTIN WHERE Trans = 'D' ;

.END MLOAD;
.LOGOFF;

Ends MultiLoad and logs off of Teradata.

Figure 5-13
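The routing idea in this script (one record code in byte 1, two interpretations of the bytes that follow) can be sketched in Python. To keep the example readable it assumes a simplified, all-character record with made-up field widths, unlike the binary INTEGER fields in the script above; `parse_record` and those widths are hypothetical, and only the labels EMPIN and DEPTIN and the codes "E" and "D" come from the script.

```python
def parse_record(record: str) -> tuple[str, dict[str, str]]:
    """Pick a DML label and field set based on the code in byte 1.

    Assumed widths (illustration only): Emp_No is 5 characters and
    DeptNo is 6 characters, both starting at byte 2 of the record.
    """
    trans = record[0]  # byte 1: the record-type code (the FILLER)
    if trans == "E":
        fields = {
            "Emp_No": record[1:6].strip(),   # starts at byte 2
            "LName": record[6:26].strip(),
        }
        return "EMPIN", fields
    if trans == "D":
        fields = {
            "DeptNo": record[1:7].strip(),   # also starts at byte 2 (redefined)
            "DeptName": record[7:27].strip(),
        }
        return "DEPTIN", fields
    raise ValueError(f"unknown record code: {trans!r}")


if __name__ == "__main__":
    print(parse_record("E12345Smith"))
    print(parse_record("D000400Marketing"))
```

The same bytes after the code are read one way for "E" records and another way for "D" records, which is what the redefined start position achieves in the LAYOUT.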

DELETE MLOAD Script Using a Hard Coded Value

The next script demonstrates how to use the MultiLoad DELETE task. In this example, old orders are being removed from the Order_Table, based upon the order date. Any order placed prior to the hard coded date will be removed.



.LOGTABLE RemoveLog;
.LOGON TDATA/SQL01,SQL01;

Identifies the Logtable and logs onto Teradata with a valid logon string.

.BEGIN DELETE MLOAD
    TABLES Order_Table;

Begins MultiLoad in DELETE mode and names the target table.

DELETE FROM Order_Table
WHERE Order_Date < '99/12/31';

SQL DELETE statement does a massive delete of order data for orders placed prior to the hard coded date in the WHERE clause. Notice that this is not the Primary Index. You CANNOT DELETE in DELETE MLOAD mode based upon the Primary Index.

.END MLOAD;
.LOGOFF;

Ends loading and logs off of Teradata.

Figure 5-14

How many differences from a MultiLoad IMPORT script readily jump off of the page at you? Here are a few that we saw:

• At the beginning, you must specify the word "DELETE" in the .BEGIN MLOAD command. You need not specify it in the .END MLOAD command.
• You will readily notice that this mode has no .DML LABEL command. Since it is focused on just one absolute function, no APPLY clause is required so you see no .DML LABEL.
• Notice that the DELETE with a WHERE clause is an SQL function, not a MultiLoad command, so it has no dot prefix.
• Since default names are available for worktables (WT_<target_tablename>) and error tables (ET_<target_tablename> and UV_<target_tablename>), they need not be specifically named, but be sure to define the Logtable.

Do not confuse the DELETE MLOAD task with the SQL delete task that may be part of a MultiLoad IMPORT. The IMPORT delete is used to remove small volumes of data rows based upon the Primary Index. On the other hand, the MultiLoad DELETE does global deletes on tables, bypassing the Transient Journal. Because there is no Transient Journal, there are no rollbacks when the job fails for any reason. Instead, it may be RESTARTed from a CHECKPOINT. Also, the MultiLoad DELETE task is never based upon the Primary Index.

Because we are not importing any data rows, there is neither a need for worktables nor an Acquisition Phase. One DELETE statement is sent to all the AMPs with a match tag parcel. That statement will be applied to every table row. If the condition is met, then the row is deleted. Using the match tags, each target block is read once and the appropriate rows are deleted.

A DELETE MLOAD Script Using a Variable

This illustration demonstrates how passing the values of a data row rather than a hard coded value may be used to help meet the conditions stated in the WHERE clause. When you are passing values, you must add some additional commands that were not used in the DELETE example with hard coded values.



.LOGTABLE RemoveLog;
.LOGON TDATA/SQL01,SQL01;

Identifies the Logtable and logs onto Teradata with a valid logon string.

.BEGIN DELETE MLOAD
    TABLES Order_Table;

Begins the DELETE task and names only one table, but still uses the TABLES option.

.LAYOUT OldMonth;
.FIELD OrdDate * DATE;

Names the LAYOUT and defines the column whose value will be passed as a single row to MultiLoad. In this case, all of the order dates in the Order_Table will be tested against this OrdDate value.

DELETE FROM Order_Table
WHERE Order_Date < :OrdDate;

The condition in the WHERE clause is that the data rows with orders placed prior to the date value (:OrdDate) passed from the LAYOUT OldMonth will be DELETEd from the Order_Table.

.IMPORT INFILE
    LAYOUT OldMonth ;

Note that this time there is no dot in front of LAYOUT in this clause since it is only being referenced.

.END MLOAD;
.LOGOFF;

Ends loading and logs off of Teradata.

Figure 5-15

An UPSERT Sample Script

The following sample script is provided to demonstrate how to do an UPSERT, that is, to update a table and, if a row from the data source does not exist in the target table, to insert a new row. In this instance we are loading the Student_Profile table with new data for the next semester. The clause "DO INSERT FOR MISSING UPDATE ROWS" indicates an UPSERT. The DML statements that follow this option must be in the order of a single UPDATE statement followed by a single INSERT statement.

/* !/bin/ksh* *//* +++++++++++++++++++++++++++++++++++++++++++++++++ *//* MultiLoad UPSERT SCRIPT *//*This script Updates the Student_Profile Table *//* with new data and Inserts a new row into the table *//* if the row to be updated does not exist. *//* Version 1.1 *//* Created by Coffing Data Warehousing *//* ++++++++++++++++++++++++++++++++++++++++++++++++++*/

Load Runs from ashell script; Anywords between /*… */ are commentsonly and are notprocessed byTeradata;

Names anddescribes thepurpose of thescript; names theauthor.

/* Setup Logtable, Logon Statements*/


.LOGON CDW/SQL01,SQL01;

Sets Up a Logtableand then logs onto Teradata.

/* Begin Import and Define Work and Error Tables */

.BEGIN IMPORT MLOAD TABLES SQL01.Student_Profile

WORKTABLES SQL01.SWA_WTERRORTABLES SQL01.SWA_ET

SQL01.SWA_UV;

Begins the LoadProcess by tellingus first the namesof the target table,work table anderror tables.

/* Define Layout of Input File */Names theLAYOUT of theINPUT file;



.LAYOUT FILEIN;.FIELD Student_ID * INTEGER;.FIELD Last_Name * CHAR (20);.FIELD First_Name * VARCHAR (12);.FIELD Class_Code * CHAR (2);.FIELD Grade_Pt * DECIMAL(5,2);

An ALLCHARACTERbased flat file.

Defines thestructure of theINPUT file; Noticethe dots before theFIELD commandand thesemi-colons aftereach FIELDdefinition;

/* Begin INSERT and UPDATE Process on Table */

.DML LABEL UPSERTER
DO INSERT FOR MISSING UPDATE ROWS;
/* Without the above DO, one of these is guaranteed to
fail on this same table. If the UPDATE fails because
a row is missing, it corrects by doing the INSERT */

Names the DML Label and tells MultiLoad to INSERT a row if there is not one to be UPDATED, i.e., UPSERT.

UPDATE SQL01.Student_Profile
SET Last_Name = :Last_Name
,First_Name = :First_Name
,Class_Code = :Class_Code
,Grade_Pt = :Grade_Pt
WHERE Student_ID = :Student_ID;

Defines the UPDATE; the WHERE clause qualifies the UPDATE.

INSERT INTO SQL01.Student_Profile
VALUES (:Student_ID
,:Last_Name
,:First_Name
,:Class_Code
,:Grade_Pt);

Defines the INSERT. We recommend placing comma separators in front of the following column or value for easier debugging.

.IMPORT INFILE CDW_IMPORT.DAT
  LAYOUT FILEIN
  APPLY UPSERTER;

Names the import file, names the layout to use, and tells MultiLoad to APPLY the UPSERTs.

.END MLOAD;

.LOGOFF;

Ends MultiLoad and logs off of Teradata.

Figure 5-16
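The row-level effect of "DO INSERT FOR MISSING UPDATE ROWS" can be modeled outside Teradata. The following is a minimal Python sketch, not a description of how MultiLoad works internally (MultiLoad applies changes in blocks). The dictionary standing in for SQL01.Student_Profile, the function names, and the sample rows are all hypothetical and exist only for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: Last_Name, First_Name, Class_Code, Grade_Pt
Row = Tuple[str, str, str, float]


def upsert(table: Dict[int, Row], student_id: int, row: Row) -> str:
    """Update the row for student_id if it exists; otherwise insert it.

    Returns "UPDATE" or "INSERT" so the caller can tally the outcome.
    """
    action = "UPDATE" if student_id in table else "INSERT"
    table[student_id] = row
    return action


def apply_upserts(table: Dict[int, Row],
                  records: Iterable[Tuple[int, Row]]) -> Dict[str, int]:
    """Apply each (student_id, row) record and count updates and inserts."""
    counts = {"INSERT": 0, "UPDATE": 0}
    for student_id, row in records:
        counts[upsert(table, student_id, row)] += 1
    return counts


# Hypothetical stand-in for the target table, keyed by Student_ID.
student_profile: Dict[int, Row] = {1001: ("Smith", "Ann", "FR", 3.10)}

# Hypothetical input records for the next semester.
incoming = [
    (1001, ("Smith", "Ann", "SO", 3.25)),  # key exists: row is updated
    (1002, ("Jones", "Bob", "FR", 0.00)),  # key missing: row is inserted
]

upsert_counts = apply_upserts(student_profile, incoming)
```

Each input record ends in exactly one of the two outcomes, which is why the script pairs a single UPDATE with a single INSERT under one DML label.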

What Happens when MultiLoad Finishes

MultiLoad Statistics



****08:06:41 UTY1803 Import Processing Statistics
                                       Import 1   Total Thus Far
Candidate Records considered . . .        70000            70000
Apply conditions satisfied . . . .        70000            70000

****08:06:38 UTY0818 Statistics for table Employee_Table
INSERTS: 25000
UPDATES: 25000
DELETES: 0

****08:06:41 UTY0818 Statistics for table Department_Table
INSERTS: 0
UPDATES: 20000
DELETES: 0

Figure 5-17

Troubleshooting MultiLoad Errors

The output statistics in the above example indicate that the load was entirely successful. But that is not always the case. When a load is not clean, we need to troubleshoot in order to identify the errors and correct them, if desired. Earlier on, we noted that MultiLoad generates two error tables, the Acquisition error table and the Application error table. You may select from these tables to discover the problem and research the issues.

For the most part, the Acquisition error table logs errors that occur during that processing phase. The Application error table lists Unique Primary Index violations, field overflow errors on non-PI columns, and constraint errors that occur in the APPLY phase. MultiLoad error tables not only list the errors they encounter, they also have the capability to STORE those errors. Do you remember the MARK and IGNORE parameters? This is where they come into play. MARK will ensure that the error rows, along with some details about the errors, are stored in the error table. IGNORE does neither; it is as if the error never occurred.

THREE COLUMNS SPECIFIC TO THE ACQUISITION ERROR TABLE
ErrorCode | System code that identifies the error.
ErrorField | Name of the column in the target table where the error happened; is left blank if the offending column cannot be identified.
HostData | The data row that contains the error.

Figure 5-19



THREE COLUMNS SPECIFIC TO THE APPLICATION ERROR TABLE
Uniqueness | Contains a certain value that disallows duplicate row errors in this table; can be ignored, if desired.
DBCErrorCode | System code that identifies the error.
DBCErrorField | Name of the column in the target table where the error happened; is left blank if the offending column cannot be identified. NOTE: A copy of the target table column immediately follows this column.

Figure 5-20
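The difference between MARK and IGNORE described above can be sketched as a small model. This is an illustration of the behavior as the text describes it, not MultiLoad's implementation. The record fields are loosely named after the acquisition error table columns, and the function name, error code, and sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ErrorRow:
    error_code: int   # system code that identifies the error
    error_field: str  # blank when the offending column cannot be identified
    host_data: bytes  # the data row that contains the error


def record_error(error_table: List[ErrorRow], mode: str, code: int,
                 field: Optional[str], data: bytes) -> None:
    """Store the failing row when mode is MARK; drop it when mode is IGNORE."""
    if mode == "MARK":
        error_table.append(ErrorRow(code, field or "", data))
    elif mode == "IGNORE":
        return  # nothing is stored; it is as if the error never occurred
    else:
        raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical usage: the code 9999 and the payloads are made up.
marked: List[ErrorRow] = []
record_error(marked, "MARK", 9999, None, b"row-1")
record_error(marked, "IGNORE", 9999, "Grade_Pt", b"row-2")
```

After these two calls only the first row is present in `marked`, with an empty `error_field` because no column name was supplied.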

RESTARTing MultiLoad

Who hasn't experienced a failure at some time when attempting a load? Don't take it personally! Failures can and do occur on the host or Teradata (DBC) for many reasons. MultiLoad has the impressive ability to RESTART from failures in either environment. In fact, it requires almost no effort to continue or resubmit the load job. Here are the factors that determine how it works:

First, MultiLoad will check the Restart Logtable and automatically resume the load process from the last successful CHECKPOINT before the failure occurred. Remember, the Logtable is essential for restarts. MultiLoad uses neither the Transient Journal nor rollbacks during a failure. That is why you must designate a Logtable at the beginning of your script. MultiLoad either restarts by itself or waits for the user to resubmit the job. Then MultiLoad takes over right where it left off.

Second, suppose Teradata experiences a reset while MultiLoad is running. In this case, the host program will restart MultiLoad after Teradata is back up and running. You do not have to do a thing!

Third, if a host mainframe or network client fails during a MultiLoad, or the job is aborted, you may simply resubmit the script without changing a thing. MultiLoad will find out where it stopped and start again from that very spot.

Fourth, if MultiLoad halts during the Application Phase it must be resubmitted and allowed to run until complete.

Fifth, during the Acquisition Phase the CHECKPOINT (n) you stipulated in the .BEGIN MLOAD clause will be enacted. The results are stored in the Logtable. During the Application Phase, CHECKPOINTs are logged each time a data block is successfully written to its target table.

HINT: The default CHECKPOINT is 15 minutes. If you specify a CHECKPOINT of 60 or less, the value is taken as minutes. If you specify a CHECKPOINT of 61 or above, the value is taken as a number of records.
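The rule in the hint above is easy to misread, so here it is as a short Python sketch. The function name is hypothetical, and rejecting values below 1 is an assumption of this sketch; the hint itself only covers the default, values of 60 or less, and values of 61 or above.

```python
from typing import Tuple

DEFAULT_CHECKPOINT = 15  # minutes, per the hint


def interpret_checkpoint(rate: int = DEFAULT_CHECKPOINT) -> Tuple[int, str]:
    """Return (rate, unit) according to the 60-or-less / 61-or-above rule."""
    if rate < 1:
        # Assumption of this sketch: the hint does not cover such values.
        raise ValueError("checkpoint rate must be 1 or greater in this sketch")
    unit = "minutes" if rate <= 60 else "records"
    return rate, unit
```

For example, `interpret_checkpoint(60)` yields minutes while `interpret_checkpoint(61)` yields records, so a value chosen near the boundary changes the unit as well as the magnitude.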

RELEASE MLOAD: When You DON'T Want to Restart MultiLoad

What if a failure occurs but you do not want to RESTART MultiLoad? Since MultiLoad has already updated the table headers, it assumes that it still "owns" them. Therefore, it limits access to the table(s). So what is a user to do? Well, there is good news and bad news. The good news is that you may use the RELEASE MLOAD command to release the locks and roll back the job. The bad news is that if you have been loading multiple millions of rows, the rollback may take a lot of time. For this reason, most customers would rather just go ahead and RESTART.

Before V2R3: In the earlier days of Teradata it was NOT possible to use RELEASE MLOAD if one of the following three conditions was true:

• In IMPORT mode, once MultiLoad had reached the end of the Acquisition Phase you could not use RELEASE MLOAD. This is sometimes referred to as the "point of no return."
• In DELETE mode, the point of no return was when Teradata received the DELETE statement.
• If the job halted in the Apply Phase, you had to RESTART the job.

With and since V2R3: The advent of V2R3 brought new possibilities with regard to using the RELEASE MLOAD command. It can NOW be used in the APPLY Phase, if:

• You are running a Teradata V2R3 or later version
• You use the correct syntax: RELEASE MLOAD <target-table> IN APPLY
• The load script has NOT been modified in any way
• The target tables either:
  ♦ Must be empty, or
  ♦ Must have no Fallback, no NUSIs, no Permanent Journals

You should be very cautious using the RELEASE command. It could potentially leave your table half updated. Therefore, it is handy for a test environment, but please don't become too reliant on it for production runs. They should be allowed to finish to guarantee data integrity.
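The V2R3 conditions listed above can be collected into a pre-flight check. The following Python sketch only restates those conditions; it does not query Teradata. The `TargetTable` class, the function names, and the representation of V2R3 as the tuple (2, 3) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TargetTable:
    name: str
    is_empty: bool
    has_fallback: bool = False
    has_nusi: bool = False
    has_permanent_journal: bool = False


def can_release_in_apply(version: Tuple[int, int], script_modified: bool,
                         table: TargetTable) -> bool:
    """Check the listed conditions for RELEASE MLOAD ... IN APPLY."""
    if version < (2, 3):   # must be V2R3 or later
        return False
    if script_modified:    # the load script must be unchanged
        return False
    no_extras = not (table.has_fallback or table.has_nusi
                     or table.has_permanent_journal)
    return table.is_empty or no_extras


def release_statement(table: TargetTable) -> str:
    """Build the statement using the syntax quoted in the list above."""
    return f"RELEASE MLOAD {table.name} IN APPLY;"


# Hypothetical usage with the sample target table from this chapter.
profile = TargetTable("SQL01.Student_Profile", is_empty=False, has_nusi=True)
allowed = can_release_in_apply((2, 3), script_modified=False, table=profile)
```

Here `allowed` is False, because the table is not empty and carries a NUSI. Even when the check passes, the caution above still applies: the table may be left half updated.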

MultiLoad and INMODs

INMODs, or Input Modules, may be called by MultiLoad in either mainframe or LAN environments, providing the appropriate programming languages are used. INMODs are user-written routines whose purpose is to read data from one or more sources and then convey it to a load utility, here MultiLoad, for loading into Teradata. They allow MultiLoad to focus solely on loading data by doing data validation or data conversion before the data is ever touched by MultiLoad. INMODs replace the normal MVS DDNAME or LAN file name with the following statement:

.IMPORT INMOD=<INMOD-name>

You will find a more detailed discussion on how to write INMODs for MultiLoad in the chapter of this book titled, "INMOD Processing".

How MultiLoad Compares with FastLoad

Function | FastLoad | MultiLoad
Error Tables must be defined | Yes | Optional. 2 Error Tables have to exist for each target table and will automatically be assigned.
Work Tables must be defined | No | Optional. 1 Work Table has to exist for each target table and will automatically be assigned.
Logtable must be defined | No | Yes
Allows Referential Integrity | No | No
Allows Unique Secondary Indexes | No | No
Allows Non-Unique Secondary Indexes | No | Yes
Allows Triggers | No | No
Loads a maximum of n number of tables | One | Five
DML Statements Supported | INSERT | INSERT, UPDATE, DELETE, and "UPSERT"
DDL Statements Supported | CREATE and DROP TABLE | DROP TABLE
Transfers data in 64K blocks | Yes | Yes
Number of Phases | Two | Five
Is RESTARTable | Yes | Yes, in all 5 phases (auto CHECKPOINT)
Stores UPI Violation Rows | Yes | Yes
Allows use of Aggregated, Arithmetic calculations or Conditional Exponentiation | No | Yes
Allows Data Conversion | Yes, 1 per column | Yes
NULLIF function | Yes | Yes

Figure 5-21


