Bteq Fexp Fload Mload
There are other utilities that are faster than BTEQ for importing or exporting data. We will talk
about these in future chapters, but BTEQ is still used for smaller jobs.
Logging on to BTEQ
Before you can use BTEQ, you must have user access rights to the client system and privileges
to the Teradata DBS. Normal system access privileges include a userid and a password. Some
systems may also require additional user identification codes depending on company standards
and operational procedures. Depending on the configuration of your Teradata DBS, you may
need to include an account identifier (acctid) and/or a Teradata Director Program Identifier
(TDPID).
BTEQ execution
The following example shows how to invoke BTEQ from a DOS command. In order for this to
work, the directory called Program Files\NCR\Teradata Client\bin must be established in the
search path.
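For illustration, a minimal invocation might look like the following (the script and output file names are simply the ones used in this example):

C:\> bteq < BatchScript.txt > Output.txt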
Notice that the BTEQ command is immediately followed by '<BatchScript.txt' to tell BTEQ
which file contains the commands to execute. Then, '>Output.txt' names the file where the
output messages are written. Here is an example of the contents of the BatchScript.txt file.
BatchScript.txt File
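Since the original illustration is not reproduced here, the following is a minimal sketch of what such a batch script might contain. The logon string (TDPID, userid and password) and the table name are placeholders; substitute your own.

.LOGON cdw/sql01,mypassword
SELECT * FROM SQL01.Employee_Table;
.QUIT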
The above illustration shows how BTEQ can be manually invoked from a command prompt and
displays how to specify the name and location of the batch script file to be executed.
The previous examples show that when logging onto BTEQ in interactive mode, the user actually
types in a logon string and then Teradata will prompt for a password. However, in batch mode,
Teradata requires both a logon and password to be directly stored as part of the script.
Since putting this sensitive information into a script raises security concerns, inserting the
password directly into a script that is to be processed in batch mode may not be a good idea. It is
generally recommended and a common practice to store the logon and password in a separate
file that can be secured. That way, it is not in the script for anyone to see.
Then, the script should contain the following command instead of a .LOGON, as shown below
and again in the following script: .RUN FILE=mylogon.txt
This command opens and reads the file. It then executes every record in the file.
Field Mode (also called REPORT mode): This is set by .EXPORT REPORT. This is the default
mode for BTEQ and brings the data back as if it were the output of a standard SQL SELECT statement. The
output of this BTEQ export returns the column headers for the fields, white space, and packed or
binary data expanded for humans to read, and it can be viewed with a text editor.
Indicator Mode: This is set by .EXPORT INDICDATA. This mode writes the data in data mode,
but also provides host operating systems with the means of recognizing missing or unknown data
(NULL) fields. This is important if the data is to be loaded into another Relational Database
System (RDBMS).
The issue is that there is no standard character defined to represent either a numeric or character
NULL. So, systems typically use a zero for a numeric NULL and a space or blank for a character
NULL. If this data is simply loaded into another RDBMS, it is no longer a NULL, but a zero or
space.
To remedy this situation, INDICDATA puts a bitmap at the front of every record written to the disk.
This bitmap contains one bit per field/column. When a Teradata column contains a NULL, the bit
for that field is turned on by setting it to a "1". Likewise, if the data is not NULL, the bit remains a
zero. Therefore, the loading utility reads these bits as indicators of NULL data and identifies the
column(s) as NULL when data is loaded back into the table, where appropriate.
Since both DATA and INDICDATA store each column on disk in native format with known lengths
and characteristics, they are the fastest method of transferring data. However, it becomes
imperative that you be consistent. When it is exported as DATA, it must be imported as DATA and
the same is true for INDICDATA.
Again, this internal processing is automatic and potentially important. Yet, on a network-attached
system, being consistent is our only responsibility. However, on a mainframe system, you must
account for these bits when defining the LRECL in the Job Control Language (JCL). Otherwise,
your length is too short and the job will end with an error.
To determine the correct length, the following information is important. As mentioned earlier, one
bit is needed per field output onto disk. However, computers allocate data in bytes, not bits.
Therefore, if even one bit is needed, a minimum of eight bits (one byte) is allocated. As a result, for
every eight fields selected, the LRECL grows by one byte, which must be added to the record length.
In other words, for nine columns selected, 2 bytes are added even though only nine bits are needed.
With this being stated, there is one indicator bit per field selected. INDICDATA mode gives the
Host computer the ability to allocate bits in the form of a byte. Therefore, if one bit is required by
the host system, INDICDATA mode will automatically allocate eight of them. This means that from
one to eight columns being referenced in the SELECT will add one byte to the length of the
record. When selecting nine to sixteen columns, the output record will be two bytes longer.
DIF Mode: Known as Data Interchange Format, which allows users to export data from Teradata
to be directly utilized for spreadsheet applications like Excel, FoxPro and Lotus.
The optional limit is to tell BTEQ to stop returning rows after a specific number (n) of rows. This
might be handy in a test environment to stop BTEQ before the end of transferring rows to the file.
BTEQ EXPORT Example Using Record (DATA) Mode
The following is an example that displays how to utilize the export Record (DATA) option. Notice
the periods (.) at the beginning of some of the script lines. A period starting a line indicates a BTEQ
command. If there is no period, then the command is an SQL command.
When doing an export on a Mainframe or a network-attached (e.g., LAN) computer, there is one
primary difference in the .EXPORT command. The difference is the following:
Mainframe syntax: .EXPORT DATA DDNAME = data definition statement name (JCL)
LAN syntax: .EXPORT DATA FILE = actual file name
The following example uses a Record (DATA) Mode format. The output of the exported data will
be a flat file.
Employee_Table
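The original script illustration is not reproduced here; the following is a minimal sketch of such an export in Record (DATA) mode. The logon string, output file name and the column data types of Employee_Table are assumptions for illustration.

.LOGON cdw/sql01,mypassword
.EXPORT DATA FILE = employee.dat
SELECT Employee_No
      ,Last_name
      ,First_name
      ,Salary
      ,Dept_No
FROM   SQL01.Employee_Table;
.EXPORT RESET
.LOGOFF
.QUIT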
After this script has completed, the following report will be generated on disk.
Employee_No Last_name First_name Salary Dept_No
I remember when my mom and dad purchased my first Lego set. I was so excited about building
my first space station that I ripped the box open, and proceeded to follow the instructions to
complete the station. However, when I was done, I was not satisfied with the design and decided
to make changes. So I built another space ship and constructed another launching station. BTEQ
export works in the same manner: once the basic EXPORT knowledge is acquired, we can
build on that foundation.
With that being said, the following is a more robust example of utilizing
the Field (Report) option. This example will export data in Field (Report) Mode format. The
output of the exported data will appear like a standard output of a SQL SELECT statement. In
addition, aliases and a title have been added to the script.
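The original script illustration is not reproduced here; a sketch of such a Field (Report) mode export follows. The logon string and file name are placeholders, and the TITLE phrases simply produce the multi-word column headings shown in the report below.

.LOGON cdw/sql01,mypassword
.SET WIDTH 80
.SET FORMAT ON
.SET HEADING 'Employee Profiles'
.EXPORT REPORT FILE = employee_rpt.txt
SELECT Employee_No (TITLE 'Employee Number')
      ,Last_name   (TITLE 'Last Name')
      ,First_name  (TITLE 'First Name')
      ,Salary
      ,Dept_No     (TITLE 'Department Number')
FROM   SQL01.Employee_Table
ORDER BY Dept_No;
.EXPORT RESET
.LOGOFF
.QUIT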
After this script has been completed, the following report will be generated on disk.
Employee Profiles
Employee Number Last Name First Name Salary Department Number
From the above example, a number of BTEQ commands were added to the export script. Below
is a review of those commands.
The WIDTH specifies the width of screen displays and printed reports, based on
characters per line.
The FORMAT command allows the ability to enable/inhibit the page-oriented format
option.
The HEADING command specifies a header that will appear at the top of every page of a
report.
BTEQ IMPORT Example
BTEQ can also read a file from the hard disk and incorporate the data into SQL to modify the
contents of one or more tables. In order to do this processing, the name and record description of
the file must be known ahead of time. These will be defined within the script file.
The script below introduces the IMPORT command with the Record (DATA) option. Notice the
periods (.) at the beginning of some of the script lines. A period starting a line indicates a BTEQ
command. If there is no period, then the command is an SQL command.
The SKIP option is used when you wish to bypass the first records in a file. For example, a
mainframe tape may have header records that should not be processed. Other times, maybe the
job started and loaded a few rows into the table with a UPI defined. Loading them again will
cause an error. So, you can skip over them using this option.
The following example will use a Record (DATA) Mode format. The input of the imported data will
populate the Employee_Table.
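The original script illustration is not reproduced here; the following sketch shows the general form. The logon string, input file name and column data types are assumptions for illustration.

.LOGON cdw/sql01,mypassword
.IMPORT DATA FILE = employee.dat
.QUIET ON
.REPEAT *
USING Employee_No (INTEGER)
     ,Last_name   (CHAR(20))
     ,First_name  (VARCHAR(12))
     ,Salary      (DECIMAL(8,2))
     ,Dept_No     (SMALLINT)
INSERT INTO SQL01.Employee_Table
VALUES (:Employee_No, :Last_name, :First_name, :Salary, :Dept_No);
.LOGOFF
.QUIT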
From the above example, a number of BTEQ commands were added to the import script. Below
is a review of those commands.
.QUIET ON limits BTEQ output to reporting only errors and request processing statistics.
Note: Be careful how you spell .QUIET; forgetting the E turns it into .QUIT, and BTEQ will do just that.
.REPEAT * causes BTEQ to read records until EOF, while .REPEAT n reads a specified number of
records. The default is one record. Using .REPEAT 10 would perform the loop 10 times.
The USING defines the input data fields and their associated data types coming from the
host.
The following builds upon the IMPORT Record (DATA) example above. The example below will
still utilize the Record (DATA) Mode format. However, this script will add a CREATE TABLE
statement. In addition, the imported data will populate the newly created Employee_Profile table.
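Again, the original illustration is not reproduced; a sketch of that script follows, with an assumed column layout for the new Employee_Profile table and placeholder logon and file names.

.LOGON cdw/sql01,mypassword

CREATE TABLE SQL01.Employee_Profile
 ( Employee_No  INTEGER
  ,Last_name    CHAR(20)
  ,First_name   VARCHAR(12)
  ,Salary       DECIMAL(8,2)
  ,Dept_No      SMALLINT )
UNIQUE PRIMARY INDEX (Employee_No);

.IMPORT DATA FILE = employee.dat
.QUIET ON
.REPEAT *
USING Employee_No (INTEGER)
     ,Last_name   (CHAR(20))
     ,First_name  (VARCHAR(12))
     ,Salary      (DECIMAL(8,2))
     ,Dept_No     (SMALLINT)
INSERT INTO SQL01.Employee_Profile
VALUES (:Employee_No, :Last_name, :First_name, :Salary, :Dept_No);
.LOGOFF
.QUIT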
All of the scripts have a .LOGOFF and .QUIT. The .LOGOFF is optional because when BTEQ quits,
the session is terminated. A logoff makes it a friendly departure and also allows you to logon with
a different user name and password.
Determining Output Record Lengths
Some hosts, such as IBM mainframes, require the correct LRECL (Logical Record Length)
parameter in the JCL, and will abort if the value is incorrect. The following page will discuss how
to figure out the record lengths.
There are three issues involving record lengths and they are:
Fixed columns
Variable columns
NULL indicators
Fixed Length Columns: For fixed length columns you merely count the length of the column. The
lengths are:
INTEGER 4 bytes
SMALLINT 2 bytes
BYTEINT 1 byte
CHAR(10) 10 bytes
CHAR(4) 4 bytes
DATE 4 bytes
DECIMAL(7,2) 4 bytes (packed data, total digits / 2 +1)
DECIMAL(12,2) 8 bytes
Variable columns: Variable length columns should be calculated as the maximum value plus two.
These two bytes hold the binary length of the field. In reality you can save
much space because trailing blanks are not kept. The logical record will assume the maximum
and add two bytes as a length field per column.
VARCHAR(8) 10 bytes
VARCHAR(10) 12 bytes
Indicator columns: As explained earlier, the indicators utilize a single bit for each field. If your
record has 8 fields (which require 8 bits), then you add one extra byte to the total length of all the
fields. If your record has 9-16 fields, then add two bytes.
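To illustrate the arithmetic with a hypothetical record layout: a SELECT of INTEGER, CHAR(10), VARCHAR(10) and DATE columns exported in INDICDATA mode needs 4 + 10 + (10 + 2) + 4 = 30 bytes for the data, plus 1 indicator byte for the four fields, giving an LRECL of 31 bytes.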
You can override the standard error codes at the time you terminate BTEQ. This might be handy
for debug purposes. The error code or "return code" can be any number you specify using one of
the following:
BTEQ Commands
The BTEQ commands in Teradata are designed for flexibility. These commands are not used
directly on the data inside the tables. However, these 60 different BTEQ commands are utilized in
four areas.
Session Control Commands
File Control Commands
Sequence Control Commands
Format Control Commands
Session Control Commands
In this book, we have already discovered how BTEQ can be utilized to export data from Teradata
in a variety of formats. As the demand to store data increases, so does the requirement for
tools to export massive amounts of data.
This is the reason why FastExport (FEXP) is brilliant by design. A good rule of thumb is that if you
have more than half a million rows of data to export to either a flat file format or with NULL
indicators, then FastExport is the best choice to accomplish this task.
Keep in mind that FastExport is designed as a one-way utility-that is, the sole purpose of
FastExport is to move data out of Teradata. It does this by harnessing the parallelism that
Teradata provides.
FastExport is extremely attractive for exporting data because it takes full advantage of multiple
sessions, which leverages Teradata parallelism. FastExport can also export from multiple tables
during a single operation. In addition, FastExport utilizes the Support Environment, which
provides a job restart capability from a checkpoint if an error occurs during the process of
executing an export job.
If the output data is sorted, FastExport may be required to redistribute the selected data two times
across the AMP processors in order to build the blocks in the correct sequence. Remember, a lot
of rows fit into a 64K block and both the rows and the blocks must be sequenced. While all of this
redistribution is occurring, BTEQ continues to send rows, so FastExport falls behind at first.
However, when FastExport starts sending the rows back a block at a time, it quickly
overtakes and passes BTEQ's row-at-a-time processing.
The other advantage is that if BTEQ terminates abnormally, all of your rows (which are in
SPOOL) are discarded. You must rerun the BTEQ script from the beginning. However, if
FastExport terminates abnormally, all the selected rows are in worktables and it can continue
sending them where it left off. Pretty smart and very fast!
Also, if there is a requirement to manipulate the data before storing it on the computer's hard
drive, an OUTMOD routine can be written to modify the result set after it is sent back to the client
on either the mainframe or LAN. Just like the BASF commercial states, "We don't make the
products you buy, we make the products you buy better". FastExport is designed on the same
premise: it does not make the SQL SELECT statement faster, but it does take the SQL SELECT
statement and process the request with lightning-fast parallel processing!
FastExport Fundamentals
#1: FastExport EXPORTS data from Teradata. The reason they call it FastExport is because it
takes data off of Teradata (Exports Data). FastExport does not import data into Teradata.
Additionally, like BTEQ it can output multiple files in a single run.
#2: FastExport only supports the SELECT statement. The only DML statement that
FastExport understands is SELECT. You SELECT the data you want exported and FastExport will
take care of the rest.
#3: Choose FastExport over BTEQ when Exporting Data of more than half a million+ rows.
When a large amount of data is being exported, FastExport is recommended over BTEQ Export.
The only drawback is the total number of FastLoads, FastExports, and MultiLoads that can run at
the same time, which is limited to 15. BTEQ Export does not have this restriction. Of course,
FastExport will work with less data, but the speed may not be much faster than BTEQ.
#4: FastExport supports multiple SELECT statements and multiple tables in a single run.
You can have multiple SELECT statements with FastExport and each SELECT can join
information up to 64 tables.
#6: FastExport does NOT support error files or error limits. FastExport does not record
particular error types in a table. The FastExport utility will terminate after a certain number of
errors have been encountered.
#7: FastExport supports user-written routines INMODs and OUTMODs. FastExport allows
you to write INMOD and OUTMOD routines so you can select, validate and preprocess the exported
data.
The reason for this limitation is that FastLoad, MultiLoad, and FastExport all use large blocks to
transfer data. If more than 15 simultaneous jobs were supported, a saturation point could be
reached on the availability of resources. In this case, Teradata does an excellent job of protecting
system resources by queuing up additional FastLoad, MultiLoad, and FastExport jobs that are
attempting to connect.
For example, if the maximum number of utilities on the Teradata system is reached and another
job attempts to run, that job does not start. This limitation should be viewed as a safety control
feature. A tip for remembering how the load limit applies is this, "If the name of the load utility
contains either the word 'Fast' or the word 'Load', then there can be only a total of fifteen of them
running at any one time".
BTEQ does not have this load limitation. FastExport is clearly the better choice when exporting
data. However, if too many load jobs are already running, BTEQ is an alternate choice for exporting data.
FastExport Support and Task Commands
FastExport accepts both FastExport commands and a subset of SQL statements. The FastExport
commands can be broken down into support and task activities. The table below highlights the
key FastExport commands and their definitions. These commands provide flexibility and control
during the export process.
Support Environment Commands (see the Support Environment chapter for details)
Task Commands
FastExport Supported SQL Commands
FastExport accepts the following Teradata SQL statements. Each has been placed in alphabetic
order for your convenience.
SQL Commands
FastExport Modes
FastExport has two modes: RECORD or INDICATOR. In the mainframe world, only use RECORD
mode. In the UNIX or LAN environment, RECORD mode is the default, but you can use
INDICATOR mode if desired. The difference between the two modes is INDICATOR mode will set
the indicator bits to 1 for column values containing NULLS.
Both modes return data in a client internal format with variable-length records. Each individual
record has a value for all of the columns specified by the SELECT statement. All variable-length
columns are preceded by a two-byte control value indicating the length of the column data. NULL
columns have a value that is appropriate for the column data type. Remember, INDICATOR mode
will set bit flags that identify the columns that have a null value.
FastExport Formats
FastExport has many possible formats in the UNIX or LAN environment. The FORMAT statement
specifies the format of each record being exported. The format options are:
FASTLOAD
BINARY
TEXT
UNFORMAT
FASTLOAD Format is a two-byte integer, followed by the data, followed by an end-of-record
marker. It is called FASTLOAD because the data is exported in a format ready for FASTLOAD.
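As a point of reference, a minimal FastExport job script in the LAN environment might look like the sketch below. The logtable, logon string, output file and table names are placeholders.

.LOGTABLE SQL01.Export_Log;
.LOGON cdw/sql01,mypassword;
.BEGIN EXPORT SESSIONS 8;
.EXPORT OUTFILE employee.dat
        FORMAT FASTLOAD
        MODE RECORD;
SELECT Employee_No
      ,Last_name
      ,Salary
FROM   SQL01.Employee_Table;
.END EXPORT;
.LOGOFF;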
The way FastLoad works can be illustrated by home construction, of all things! Let's look at three
scenarios from the construction industry to provide an amazing picture of how the data gets
loaded.
Scenario One: Builders prefer to start with an empty lot and construct a house on it, from the
foundation right on up to the roof. There is no pre-existing construction, just a smooth, graded lot.
The fewer barriers there are to deal with, the quicker the new construction can progress. Building
custom or spec houses this way is the fastest way to build them. Similarly, FastLoad likes to start
with an empty table, like an empty lot, and then populate it with rows of data from another source.
Because the target table is empty, this method is typically the fastest way to load data. FastLoad
will never attempt to insert rows into a table that already holds data.
Scenario Two: The second scenario in this analogy is when someone buys the perfect piece of
land on which to build a home, but the lot already has a house on it. In this case, the person may
determine that it is quicker and more advantageous just to demolish the old house and start fresh
from the ground up-allowing for brand new construction. FastLoad also likes this approach to
loading data. It can just 1) drop the existing table, which deletes the rows, 2) replace its structure,
and then 3) populate it with the latest and greatest data. When dealing with huge volumes of new
rows, this process will run much quicker than using MultiLoad to populate the existing table.
Another option is to DELETE all the data rows from a populated target table and reload it. This
requires less updating of the Data Dictionary than dropping and recreating a table. In either case,
the result is a perfectly empty target table that FastLoad requires!
Scenario Three: Sometimes, a customer has a good house already but wants to remodel a
portion of it or to add an additional room. This kind of work takes more time than the work
described in Scenario One. Such work requires some tearing out of existing construction in order
to build the new section. Besides, the builder never knows what he will encounter beneath the
surface of the existing home. So you can easily see that remodeling or additions can take more
time than new construction. In the same way, existing tables with data may need to be updated
by adding new rows of data. To load populated tables quickly with large amounts of data while
maintaining the data currently held in those tables, you would choose MultiLoad instead of
FastLoad. MultiLoad is designed for this task but, like renovating or adding onto an existing
house, it may take more time.
This is different from BTEQ and TPump, which load data at the row level. It has been said, "If you
have it, flaunt it!" FastLoad does not like to brag, but it takes full advantage of Teradata's parallel
architecture. In fact, FastLoad will create a Teradata session for each AMP (Access Module
Processor-the software processor in Teradata responsible for reading and writing data to the
disks) in order to maximize parallel processing. This advantage is passed along to the FastLoad
user in terms of awesome performance. Teradata is the only data warehouse product in the world
that loads data, processes data and backs up data in parallel.
Rule #1: No Secondary Indexes are allowed on the Target Table. For high performance,
FastLoad utilizes only the Primary Index when loading. The reason for this is that Primary (UPI
and NUPI) indexes are used in Teradata to distribute the rows evenly across the AMPs and build
only data rows. A secondary index is stored in a subtable block and many times on a different
AMP from the data row. This would slow FastLoad down and they would have to call it: get ready
now, HalfFastLoad. Therefore, FastLoad does not support them. If Secondary Indexes exist
already, just drop them. You may easily recreate them after completing the load.
Rule #2: No Referential Integrity is allowed. FastLoad cannot load data into tables that are
defined with Referential Integrity (RI). This would require too much system checking to prevent
referential constraints to a different table. FastLoad only does one table. In short, RI constraints
will need to be dropped from the target table prior to the use of FastLoad.
Rule #3: No Triggers are allowed at load time. FastLoad is much too focused on speed to pay
attention to the needs of other tables, which is what Triggers are all about. Additionally, these
require more than one AMP and more than one table. FastLoad does one table only. Simply
ALTER the Triggers to the DISABLED status prior to using FastLoad.
Rule #4: Duplicate Rows (in Multi-Set Tables) are not supported. Multiset tables are tables
that allow duplicate rows, that is, rows in which the values in every column are identical. While
FastLoad can load data into a multi-set table, it will not load duplicate rows into it; when
FastLoad finds duplicate rows, they are discarded!
Rule #5: No AMPs may go down (i.e., go offline) while FastLoad is processing. The down
AMP must be repaired before the load process can be restarted. Other than this, FastLoad can
recover from system glitches and perform restarts. We will discuss Restarts later in this chapter.
Rule #6: No more than one data type conversion is allowed per column during a FastLoad.
Why just one? Data type conversion is a highly resource-intensive job on the system, which
requires a "search and replace" effort. And that takes more time. Enough said!
Log Table: FastLoad needs a place to record information on its progress during a load. It uses
the table called Fastlog in the SYSADMIN database. This table contains one row for every
FastLoad running on the system. In order for your FastLoad to use this table, you need INSERT,
UPDATE and DELETE privileges on that table.
Empty Target Table: We have already mentioned the absolute need for the target table to be
empty. FastLoad does not care how this is accomplished. After an initial load of an empty target
table, you are now looking at a populated table that will likely need to be maintained.
If you require the phenomenal speed of FastLoad, it is usually preferable, both for the sake of
speed and for less interaction with the Data Dictionary, just to delete all the rows from that table
and then reload it with fresh data. The syntax DELETE <databasename>.<tablename> should be
used for this. But sometimes, as in some of our FastLoad sample scripts below,
you may want to drop that table and recreate it rather than using the DELETE option. To do this, FastLoad
has the ability to run the DDL statements DROP TABLE and CREATE TABLE. The problem with
putting DDL in the script is that the script is no longer restartable and you are required to rerun the
FastLoad from the beginning. Therefore, we recommend that you have a script for an initial run
and a different script for a restart.
Two Error Tables: Each FastLoad requires two error tables. These are error tables that will only
be populated should errors occur during the load process. These are required by the FastLoad
utility, which will automatically create them for you; all you must do is to name them. The first error
table is for any translation errors or constraint violations. For example, a row with a column
containing a wrong data type would be reported to the first error table. The second error table is
for errors caused by duplicate values for Unique Primary Indexes (UPI). FastLoad will load just
one occurrence for every UPI. The other occurrences will be stored in this table. However, if the
entire row is a duplicate, FastLoad counts it but does not store the row. These tables may be
analyzed later for troubleshooting should errors occur during the load. For specifics on how you
can troubleshoot, see the section below titled, "What Happens When FastLoad Finishes."
Maximum of 15 Loads
The Teradata RDBMS will only run a maximum number of fifteen FastLoads, MultiLoads, or
FastExports at the same time. This maximum is determined by a value stored in the DBS Control
record. It can be any value from 0 to 15. When Teradata is first installed, this value is set to 5
concurrent jobs.
Since these utilities all use the large blocking of rows, it hits a saturation point where Teradata will
protect the amount of system resources available by queuing up the extra load. For example, if the
maximum number of jobs are currently running on the system and you attempt to run one more,
that job will not be started. You should view this limit as a safety control. Here is a tip for
remembering how the load limit applies: If the name of the load utility contains either the word
"Fast" or the word "Load", then there can be only a total of fifteen of them running at any one
time.
PHASE 1: Acquisition
The primary function of Phase 1 is to transfer data from the host computer to the Access Module
Processors (AMPs) as quickly as possible. For the sake of speed, the Parsing Engine of Teradata
does not take the time to hash each row of data based on the Primary Index. That will
be done later. Instead, it does the following:
When the Parsing Engine (PE) receives the INSERT command, it uses one session to parse the
SQL just once. The PE is the Teradata software processor responsible for parsing syntax and
generating a plan to execute the request. It then opens a Teradata session from the FastLoad
client directly to the AMPs. By default, one session is created for each AMP. Therefore, on large
systems, it is normally a good idea to limit the number of sessions using the SESSIONS
command. This capability is shown below.
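For example (the session count here is arbitrary and would be tuned to your system, and the logon string is a placeholder):

SESSIONS 8;
LOGON cdw/sql01,mypassword;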
Simultaneously, all but one of the client sessions begins loading raw data in 64K blocks for
transfer to an AMP. The first priority of Phase 1 is to get the data onto the AMPs as fast as
possible. To accomplish this, the rows are packed, unhashed, into large blocks and sent to the
AMPs without any concern for which AMP gets the block. The result is that data rows arrive on
AMPs other than those where they would ultimately live, had they been hashed first.
So how do the rows get to the correct AMPs where they will permanently reside? Following the
receipt of every data block, each AMP hashes its rows based on the Primary Index, and
redistributes them to the proper AMP. At this point, the rows are written to a worktable on the AMP
but remain unsorted until Phase 1 is complete.
Phase 1 can be compared loosely to the preferred method of transfer used in the parcel shipping
industry today. How do the key players in this industry handle a parcel? When the shipping
company receives a parcel, that parcel is not immediately sent to its final destination. Instead, for
the sake of speed, it is often sent to a shipping hub in a seemingly unrelated city. Then, from that
hub it is sent to the destination city. FastLoad's Phase 1 uses the AMPs in much the same way
that the shipper uses its hubs. First, all the data blocks in the load get rushed randomly to any
AMP. This just gets them to a "hub" somewhere in Teradata country. Second, each AMP forwards
them to their true destination. This is like the shipping parcel being sent from a hub city to its
destination city!
PHASE 2: Application
Following the scenario described above, the shipping vendor must do more than get a parcel to
the destination city. Once the packages arrive at the destination city, they must then be sorted by
street and zip code, placed onto local trucks and be driven to their final, local destinations.
Similarly, FastLoad's Phase 2 is mission critical for getting every row of data to its final address
(i.e., where it will be stored on disk). In this phase, each AMP sorts the rows in its worktable. Then
it writes the rows into the table space on disks where they will permanently reside. Rows of a
table are stored on the disks in data blocks. The AMP uses the block size as defined when the
target table was created. If the table is Fallback protected, then the Fallback will be loaded after
the Primary table has finished loading. This enables the Primary table to become accessible as
soon as possible. FastLoad is so ingenious, no wonder it is the darling of the Teradata load
utilities!
FastLoad Commands
Here is a table of some key FastLoad commands and their definitions. They are used to provide
flexibility in control of the load process. Consider this your personal ready reference guide! You will
notice that there are only a few SQL commands that may be used with this utility (Create Table,
Drop Table, Delete and Insert). This keeps FastLoad from becoming encumbered with additional
functions that would slow it down.
A FastLoad Example in its Simplest Form
The load utilities often scare people because there are many things that appear complicated. In
actuality, the load scripts are very simple. Think of FastLoad as:
Logging onto Teradata
Defining the Teradata table that you want to load (target table)
Defining the INPUT data file
Telling the system to start loading
This first script example is designed to show FastLoad in its simplest form. The actual script is in
the left column and our comments are on the right.
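Since the original two-column illustration is not reproduced here, the sketch below shows those steps in their simplest form. The table, column, file and logon names are hypothetical.

LOGON cdw/sql01,mypassword;

DEFINE Dept_No   (SMALLINT)
      ,Dept_Name (CHAR(20))
FILE = dept.dat;

BEGIN LOADING SQL01.Department_Table
      ERRORFILES SQL01.Dept_Err1, SQL01.Dept_Err2;

INSERT INTO SQL01.Department_Table
VALUES ( :Dept_No, :Dept_Name );

END LOADING;
LOGOFF;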
Sample FastLoad Script
Let's look at an actual FastLoad script that you might see in the real world. In the script below,
every comment line is placed inside the normal Teradata comment syntax, [/* ... */]. FastLoad and
SQL commands are written in upper case in order to make them stand out. In reality, Teradata
utilities, like Teradata itself, are by default not case sensitive. You will also note that when column
names are listed vertically we recommend placing the comma separator in front of the following
column. Coding this way makes reading or debugging the script easier for everyone. The purpose
of this script is to update the Employee_Profile table in the SQL01 database. The input file used
for the load is named EMPS.TXT. Below the sample script each step will be described in detail.
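The script illustration itself is not reproduced here, so the following sketch reconstructs it from the steps described below. The logon string and the column names and data types of Employee_Profile are assumptions; the database, input file, error table names, and CHECKPOINT value are the ones discussed in the steps.

/* FastLoad script to load SQL01.Employee_Profile from EMPS.TXT */

SESSIONS 8;
LOGON cdw/sql01,mypassword;

SHOW VERSIONS;

DROP TABLE SQL01.Employee_Profile;
DROP TABLE SQL01.Emp_Err1;
DROP TABLE SQL01.Emp_Err2;

CREATE TABLE SQL01.Employee_Profile
 ( Employee_No  INTEGER
  ,Last_name    CHAR(20)
  ,First_name   VARCHAR(12)
  ,Salary       DECIMAL(8,2)
  ,Dept_No      SMALLINT )
UNIQUE PRIMARY INDEX (Employee_No);

SET RECORD VARTEXT ",";

DEFINE Employee_No (VARCHAR(11))
      ,Last_name   (VARCHAR(20))
      ,First_name  (VARCHAR(12))
      ,Salary      (VARCHAR(10))
      ,Dept_No     (VARCHAR(6))
FILE = EMPS.TXT;

BEGIN LOADING SQL01.Employee_Profile
      ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2
      CHECKPOINT 100000;

INSERT INTO SQL01.Employee_Profile
VALUES ( :Employee_No
        ,:Last_name
        ,:First_name
        ,:Salary
        ,:Dept_No );

END LOADING;
LOGOFF;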
Normally it is not a good idea to put the DROP and CREATE statements in a FastLoad script. The
reason is that when any of the tables that FastLoad is using are dropped, the script cannot be
restarted. It can only be rerun from the beginning. Since FastLoad has restart logic built into it, a
restart is normally the better solution if the initial load attempt should fail. However, for purposes
of this example, it shows the table structure and the description of the data being read.
Step One: Before logging onto Teradata, it is important to specify how many sessions
you need. The syntax is [SESSIONS {n}].
Step Two: Next, you LOGON to the Teradata system. You will quickly see that the utility
commands in FastLoad are similar to those in BTEQ. FastLoad commands were designed
from the underlying commands in BTEQ. However, unlike BTEQ, most of the FastLoad
commands do not allow a dot ["."] in front of them and therefore need a semicolon. At this
point we chose to have Teradata tell us which version of FastLoad is being used for the load.
Why would we recommend this? We do because as FastLoad's capabilities get enhanced
with newer versions, the syntax of the scripts may have to be revisited.
Step Three: If the input file is not a FastLoad format, before you describe the INPUT
FILE structure in the DEFINE statement, you must first set the RECORD layout type for the
file being passed by FastLoad. We have used VARTEXT in our example with a comma
delimiter. The options are FASTLOAD, TEXT, UNFORMATTED, or VARTEXT. You need
to know this about your input file ahead of time.
Step Four: Next, comes the DEFINE statement. FastLoad must know the structure and
the name of the flat file to be used as the input FILE, or source file for the load.
Step Five: FastLoad makes no assumptions from the DROP TABLE statements with
regard to what you want loaded. In the BEGIN LOADING statement, the script must name
the target table and the two error tables for the load. Did you notice that there is no CREATE
TABLE statement for the error tables in this script? FastLoad will automatically create them
for you once you name them in the script. In this instance, they are named "Emp_Err1" and
"Emp_Err2". Phase 1 uses "Emp_Err1" because it comes first and Phase 2 uses
"Emp_Err2". The names are arbitrary, of course. You may call them whatever you like. At the
same time, they must be unique within a database, so using a combination of your userid
and target table name helps ensure this uniqueness between multiple FastLoad jobs
occurring in the same database.
In the BEGIN LOADING statement we have also included the optional CHECKPOINT
parameter. We included [CHECKPOINT 100000]. Although not required, this optional
parameter performs a vital task with regard to the load. In the old days, children were always
told to focus on the three 'R's' in grade school ("reading, ‘riting, and ‘rithmatic"). There are
two very different, yet equally important, R's to consider whenever you run FastLoad. They
are RERUN and RESTART. RERUN means that the job is capable of running all the
processing again from the beginning of the load. RESTART means that the job is capable of
running the processing again from the point where it left off when the job was interrupted,
causing it to fail. When CHECKPOINT is requested, it allows FastLoad to resume loading
from the first row following the last successful CHECKPOINT. We will learn more about
CHECKPOINT in the section on Restarting FastLoad.
Step Six: FastLoad focuses on its task of loading data blocks to AMPs like little Yorkshire
terriers do when playing with a ball! It will not stop unless you tell it to stop. Therefore, it will
not proceed to Phase 2 without the END LOADING command.
In reality, this provides a very valuable capability for FastLoad. Since the table must be
empty at the start of the job, a single job cannot keep loading rows as they arrive from different time zones.
To accomplish this processing, simply omit the END LOADING on the load job.
Then, you can run the same FastLoad multiple times and continue loading the worktables
until the last file is received. Then run the last FastLoad job with an END LOADING and you
have partitioned your load jobs into smaller segments instead of one huge job. This makes
FastLoad even faster!
Of course to make this work, FastLoad must be restartable. Therefore, you cannot use the
DROP or CREATE commands within the script. Additionally, every script is exactly the same
with the exception of the last one, which contains the END LOADING causing FastLoad to
proceed to Phase 2. That's a pretty clever way to do a partitioned type of data load.
Step Seven: All that goes up must come down. And all the sessions must LOGOFF. This
will be the last utility command in your script. At this point the table lock is released and if
there are no rows in the error tables, they are dropped automatically. However, if a single row
is in one of them, you are responsible for checking it, taking the appropriate action and dropping the
table manually.
Converting Data Types with FastLoad
Converting data is easy. Just define the input data types in the input file. Then, FastLoad will
compare that to the column definitions in the Data Dictionary and convert the data for you! But the
cardinal rule is that only one data type conversion is allowed per column. In the example below,
notice how the columns in the input file are converted from one data type to another simply by
redefining the data type in the CREATE TABLE statement.
FastLoad allows six kinds of data conversions. Here is a chart that displays them:
When we said that converting data is easy, we meant that it is easy for the user. It is actually quite
resource intensive, thus increasing the amount of time needed for the load. Therefore, if speed is
important, keep the number of columns being converted to a minimum!
A FastLoad Conversion Example
This next script example is designed to show how FastLoad converts data automatically when the
INPUT data type differs from the Target Teradata Table data type. The actual script is in the left
column and our comments are on the right.
Can you tell from the following sample FastLoad script why it is not restartable?
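The original script is not reproduced here; the sketch below illustrates the idea with hypothetical column names, data types, and file names. The input fields are defined as character data, and each is converted (one conversion per column) to the data type of the target column.

SESSIONS 4;
LOGON cdw/sql01,mypassword;

DROP TABLE SQL01.Employee_Profile;
DROP TABLE SQL01.Emp_Err1;
DROP TABLE SQL01.Emp_Err2;

CREATE TABLE SQL01.Employee_Profile
 ( Employee_No  INTEGER
  ,Salary       DECIMAL(8,2)
  ,Dept_No      SMALLINT )
UNIQUE PRIMARY INDEX (Employee_No);

DEFINE Employee_No (CHAR(11))
      ,Salary      (CHAR(10))
      ,Dept_No     (CHAR(6))
FILE = empconv.dat;

BEGIN LOADING SQL01.Employee_Profile
      ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2;

INSERT INTO SQL01.Employee_Profile
VALUES ( :Employee_No, :Salary, :Dept_No );

END LOADING;
LOGOFF;

The DROP TABLE and CREATE TABLE statements are the giveaway: because the script contains DDL, it cannot be restarted, only rerun from the beginning.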
Why might you have to RESTART a FastLoad job, anyway? Perhaps you might experience a
system reset or some glitch that stops the job halfway through. Maybe the mainframe went
down. Well, it is not really a big deal because FastLoad is so lightning-fast that you could
probably just RERUN the job for small data loads.
However, when you are loading a billion rows, this is not a good idea because it wastes time. So
the most common way to deal with these situations is simply to RESTART the job. But what if the
normal load takes 4 hours, and the glitch occurs when you already have two thirds of the data
rows loaded? In that case, you might want to make sure that the job is totally restartable. Let's
see how this is done.
When You Can RESTART FastLoad
If all of the following conditions are true, then FastLoad is ALWAYS restartable:
The Error Tables are NOT DROPPED in the script
The Target Table is NOT DROPPED in the script
The Target Table is NOT CREATED in the script
You have defined a checkpoint
So, if you need to drop or create tables, do it in a separate job using BTEQ. Imagine that you
have a table whose data changes so much that you typically drop it monthly and build it again.
Let's go back to the script we just reviewed above and see how we can break it into the two parts
necessary to make it fully RESTARTABLE. It is broken up below.
STEP ONE: Run the following SQL statements in Queryman or BTEQ before you start
FastLoad:
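The original statements are not shown here; they would look something like this sketch (the table and column definitions are the assumed ones from the sample script above):

DROP TABLE SQL01.Employee_Profile;
DROP TABLE SQL01.Emp_Err1;
DROP TABLE SQL01.Emp_Err2;

CREATE TABLE SQL01.Employee_Profile
 ( Employee_No  INTEGER
  ,Last_name    CHAR(20)
  ,First_name   VARCHAR(12)
  ,Salary       DECIMAL(8,2)
  ,Dept_No      SMALLINT )
UNIQUE PRIMARY INDEX (Employee_No);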
First, you ensure that the target table and error tables, if they existed previously, are blown away.
If there had been no errors in the error tables, they would be automatically dropped. If these
tables did not exist, you have not lost anything. Next, if needed, you create the empty table
structure needed to receive a FastLoad.
If these are true, all you need do is resubmit the FastLoad job and it starts loading data again with
the next record after the last checkpoint. Now, with that said, if you did not request a checkpoint,
the output message will normally indicate how many records were loaded.
You may optionally use the RECORD command to manually restart on the next record after the
one indicated in the message.
Now, if the FastLoad job aborts in Phase 2, you can simply submit a script with only the BEGIN
LOADING and END LOADING. It will then restart right into Phase 2.
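Such a restart script is nothing more than a sketch like this (logon string and table names as assumed above):

LOGON cdw/sql01,mypassword;
BEGIN LOADING SQL01.Employee_Profile
      ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2;
END LOADING;
LOGOFF;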
What Happens When FastLoad Finishes
The locks will not be removed and the error tables will not be dropped without a successful
completion. This is because FastLoad assumes that it will need them for its restart. At the same
time, the lock on the target table will not be released either. When running FastLoad, you
realistically have two choices once it is started. First choice is that you get it to run to a successful
completion, or lastly, rerun it from the beginning. As you can imagine, the best course of action is
normally to get it to finish successfully via a restart.
Note on duplicate rows: Whenever FastLoad experiences a restart, there will normally be
duplicate rows that are counted. This is due to the fact that an error seldom occurs on a checkpoint
(quiet or quiescent point) when nothing is happening within FastLoad. Therefore, some number of
rows will be sent to the AMPs again because the restart starts on the next record after the value
stored in the checkpoint. Hence, when a restart occurs, the first row after the checkpoint and
some of the consecutive rows are sent a second time. These will be caught as duplicate rows
after the sort. This restart logic is the reason that FastLoad will not load duplicate rows into a
MULTISET table. It assumes they are duplicates because of this logic.
As a user, you can select from either error table. To check errors in Errorfile1 you would use this
syntax:
SELECT DISTINCT ErrorCode, Errorfieldname FROM Errortable1;
Corrected rows may be inserted to the target table using another utility that does not require an
empty table.
The definition of the second error table is exactly the same as the target table with all the same
columns and data types.
Restarting FastLoad: A More In-Depth Look
At each CHECKPOINT, the AMPs will all pause and make sure that everything is loading
smoothly. Then FastLoad sends a checkpoint report (entry) to the SYSADMIN.Fastlog table. This
log contains a list of all currently running FastLoad jobs and the last successfully reached
checkpoint for each job. Should an error occur that requires the load to restart, FastLoad will
merely go back to the last successfully reported checkpoint prior to the error. It will then restart
from the record immediately following that checkpoint and start building the next block of data to
load. If such an error occurs in Phase 1, with CHECKPOINT 0, FastLoad will always restart from
the very first row.
If the interruption occurs in Phase 2, the Data Acquisition phase has already completed. We know
that the error is in the Application Phase. In this case, resubmit the FastLoad script with only the
BEGIN and END LOADING Statements. This will restart in Phase 2 with the sort and building of
the target table.
This feature rich utility can perform multiple types of DML tasks, including INSERT, UPDATE,
DELETE and UPSERT on up to five (5) empty or populated target tables at a time. These DML
functions may be run either solo or in combinations, against one or more tables. For these
reasons, MultiLoad is the utility of choice when it comes to loading populated tables in the batch
environment. As the volume of data being loaded or updated in a single block increases, the performance of
MultiLoad improves. MultiLoad shines when it can impact more than one row in every data block.
In other words, MultiLoad looks at massive amounts of data and says, "Bring it on!"
Leo Tolstoy once said, "All happy families resemble each other." Like happy families, the Teradata
load utilities resemble each other, although they may have some differences. You are going to be
pleased to find that you do not have to learn all new commands and concepts for each load utility.
MultiLoad has many similarities to FastLoad. It has even more commands in common with
TPump. The similarities will be evident as you work with them. Where there are some quirky
differences, we will point them out for you.
The MultiLoad DELETE mode is used to perform a global (all AMP) delete on just one table.
The reason to use .BEGIN DELETE MLOAD is that it bypasses the Transient Journal (TJ) and
can be RESTARTed if an error causes it to terminate prior to finishing. When performing in
DELETE mode, the DELETE SQL statement cannot reference the Primary Index in the WHERE
clause. This is due to the fact that a primary index access goes to a specific AMP, whereas this is a global
(all-AMP) operation.
The other factor that makes a DELETE mode operation so good is that it examines an entire
block of rows at a time. Once all the eligible rows have been removed, the block is written one
time and a checkpoint is written. So, if a restart is necessary, it simply starts deleting rows from
the next block after the last checkpoint. This is a smart way to continue. Remember, when using the
TJ, all deleted rows are put back into the table from the TJ as a rollback. A rollback can take
longer to finish than the delete. MultiLoad does not do a rollback; it does a restart.
In the above diagram, monthly data is being stored in a quarterly table. To keep the contents
limited to four months, monthly data is rotated in and out. At the end of every month, the oldest
month of data is removed and the new month is added. The cycle is "add a month, delete a
month, add a month, delete a month." In our illustration, that means that January data must be
deleted to make room for May's data.
Here is a question for you: What if there was another way to accomplish this same goal without
consuming all of these extra resources? To illustrate, let's consider the following scenario:
Suppose you have TableA that contains 12 billion rows. You want to delete a range of rows based
on a date and then load in fresh data to replace these rows. Normally, the process is to perform a
MultiLoad DELETE to DELETE FROM TableA WHERE <date-column> < '2002-02-01'. The final
step would be to INSERT the new rows for May using MultiLoad IMPORT.
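A minimal sketch of that MultiLoad DELETE step follows. The logtable, logon string, and date column name are placeholders; note that the WHERE clause uses a non-Primary-Index column, as DELETE mode requires.

.LOGTABLE SQL01.TableA_Log;
.LOGON cdw/sql01,mypassword;
.BEGIN DELETE MLOAD TABLES TableA;

DELETE FROM TableA WHERE Sale_Date < '2002-02-01';

.END MLOAD;
.LOGOFF;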
Amazingly, MultiLoad has full RESTART capability in all of its five phases of operation. Once
again, this demonstrates its tremendous flexibility as a load utility. Is it pure magic? No, but it
almost seems so. MultiLoad makes effective use of two error tables to save different types of
errors and a LOGTABLE that stores built-in checkpoint information for restarting. This is why
MultiLoad does not use the Transient Journal, thus averting time-consuming rollbacks when a job
halts prematurely.
Here is a key difference to note between MultiLoad and FastLoad. Sometimes an AMP (Access
Module Processor) fails and the system administrators say that the AMP is "down" or "offline."
When using FastLoad, you must restart the AMP to restart the job. MultiLoad, however, can
RESTART when an AMP fails, if the table is fallback protected. At the same time, you can use the
AMPCHECK option to make it work like FastLoad if you want.
MultiLoad Imposes Limits
Rule #1: Unique Secondary Indexes are not supported on a Target Table. Like FastLoad,
MultiLoad does not support Unique Secondary Indexes (USIs). But unlike FastLoad, it does
support the use of Non-Unique Secondary Indexes (NUSIs) because the index subtable row is on
the same AMP as the data row. MultiLoad uses every AMP independently and in parallel. If two
AMPs must communicate, they are not independent. Therefore, a NUSI (same AMP) is fine, but a
USI (different AMP) is not.
Rule #2: Referential Integrity is not supported. MultiLoad will not load data into tables that are
defined with Referential Integrity (RI). Like a USI, this requires the AMPs to communicate with
each other. So, RI constraints must be dropped from the target table prior to using MultiLoad.
Rule #3: Triggers are not supported at load time. Triggers cause actions on related tables
based upon what happens in a target table. Again, this is a multi-AMP operation and to a different
table. To keep MultiLoad running smoothly, disable all Triggers prior to using it.
Rule #4: No concatenation of input files is allowed. MultiLoad does not want you to do this
because it could impact a restart if the files were concatenated in a different sequence or data
was deleted between runs.
Rule #5: The host will not process aggregates, arithmetic functions or exponentiation. If
you need data conversions or math, you might be better off using an INMOD to prepare the data
prior to loading it.
HINT: Sometimes a company wants all of these load support tables to be housed in a particular
database. When these tables are to be stored in any database other than the user's own default
database, then you must give them a qualified name (<databasename>.<tablename>) in the
script or use the DATABASE command to change the current database.
Where will you find these tables in the load script? The Logtable is generally identified
immediately prior to the .LOGON command. Worktables and error tables can be named in the
BEGIN MLOAD statement. Do not underestimate the value of these tables. They are vital to the
operation of MultiLoad. Without them a MultiLoad job cannot run. Now that you have had the
"executive summary", let's look at each type of table individually.
Two Error Tables: Here is another place where FastLoad and MultiLoad are similar. Both require
the use of two error tables per target table. MultiLoad will automatically create these tables. Rows
are inserted into these tables only when errors occur during the load process. The first error table
is the acquisition Error Table (ET). It contains all translation and constraint errors that may occur
while the data is being acquired from the source(s).
The second is the Uniqueness Violation (UV) table that stores rows with duplicate values for
Unique Primary Indexes (UPI). Since a UPI must be unique, MultiLoad can only load one
occurrence into a table. Any duplicate value will be stored in the UV error table. For example, you
might see a UPI error that shows a second employee number "99." In this case, if the name for
employee "99" is Kara Morgan, you will be glad that the row did not load since Kara Morgan is
already in the Employee table. However, if the name showed up as David Jackson, then you
know that further investigation is needed, because employee numbers must be unique.
You have the option to name these tables in the MultiLoad script (shown later). Alternatively, if
you do not name them, they default to ET_<target_table_name> and UV_<target_table_name>.
In either case, MultiLoad will not accept error table names that are the same as target table
names. It does not matter what you name them. It is recommended that you standardize on the
naming convention to make it easier for everyone on your team. For more details on how these
error tables can help you, see the subsection in this chapter titled, "Troubleshooting MultiLoad
Errors."
Log Table: MultiLoad requires a LOGTABLE. This table keeps a record of the results from each
phase of the load so that MultiLoad knows the proper point from which to RESTART. There is one
LOGTABLE for each run. Since MultiLoad will not resubmit a command that has been run
previously, it will use the LOGTABLE to determine the last successfully completed step.
Work Table(s): MultiLoad will automatically create one worktable for each target table. This
means that in IMPORT mode you could have one or more worktables. In the DELETE mode, you
will only have one worktable since that mode only works on one target table. The purpose of
worktables is to hold two things:
1. The Data Manipulation Language (DML) tasks
2. The input data that is ready to APPLY to the AMPs
The worktables are created in a database using PERM space. They can become very large. If the
script uses multiple SQL statements for a single data record, the data is sent to the AMP once for
each SQL statement. This replication guarantees fast performance and that no SQL statement
will ever be done more than once. So, this is very important. However, there is no such thing as a
free lunch; the cost is space. Later, you will see that using a FILLER field can help reduce this
disk space by not sending unneeded data to an AMP. In other words, the efficiency of the
MultiLoad run is in your hands.
Let's take a look at each phase and see what it contributes to the overall load process of this
magnificent utility. Should you memorize every detail about each phase? Probably not. But it is
important to know the essence of each phase because sometimes a load fails. When it does, you
need to know in which phase it broke down since the method for fixing the error to RESTART may
vary depending on the phase. And if you can picture what MultiLoad actually does in each phase,
you will likely write better scripts that run more efficiently.
Second, all MultiLoad sessions with Teradata need to be established. The default is the number
of available AMPs. Teradata will quickly establish this number as a factor of 16 for the basis
regarding the number of sessions to create. The general rule of thumb for the number of sessions
to use for smaller systems is the following: use the number of AMPs plus two more. For larger
systems with hundreds of AMP processors, the SESSIONS option is available to lower the
default. Remember, these sessions are running on your poor little computer as well as on
Teradata.
Each session loads the data to Teradata across the network or channel. Every AMP plays an
essential role in the MultiLoad process. They receive the data blocks, hash each row and send
the rows to the correct AMP. When the rows come to an AMP, it stores them in worktable blocks
on disk. But, lest we get ahead of ourselves, suffice it to say that there is ample reason for
multiple sessions to be established.
What about the extra two sessions? Well, the first one is a control session to handle the SQL and
logging. The second is a back up or alternate for logging. You may have to use some trial and
error to find what works best on your system configuration. If you specify too few sessions it may
impair performance and increase the time it takes to complete load jobs. On the other hand, too
many sessions will reduce the resources available for other important database activities.
Third, the required support tables are created. They are the following:
The final task of the Preliminary Phase is to apply utility locks to the target tables. Initially, access
locks are placed on all target tables, allowing other users to read or write to the table for the time
being. However, this lock does prevent the opportunity for a user to request an exclusive lock.
Although these locks still allow the MultiLoad user to drop the table, no one else may DROP
or ALTER a target table while it is locked for loading. This leads us to Phase 2.
Later, during the Acquisition phase the actual input data will also be stored in the worktable so
that it may be applied in Phase 4, the Application Phase. Next, a match tag is assigned to each
DML request that will match it with the appropriate rows of input data. The match tags will not
actually be used until the data has already been acquired and is about to be applied to the
worktable. This is somewhat like a student who receives a letter from the university in the
summer that lists his courses, professor's names, and classroom locations for the upcoming
semester. The letter is a "match tag" for the student to his school schedule, although it will not be
used for several months. This matching tag for SQL and data is the reason that the data is
replicated for each SQL statement using the same data record.
Phase 3: Acquisition Phase
At this point, Teradata does not care about which AMP receives the data block. The blocks are
simply sent, one after the other, to the next AMP in line. For its part, each AMP begins to deal
with the blocks it has been dealt. It is like a game of cards: you take the cards that you
have received and then play the game. You want to keep some and give some away.
Similarly, the AMPs will keep some data rows from the blocks and give some away. The AMP
hashes each row on the primary index and sends it over the BYNET to the proper AMP where it
will ultimately be used. But the row does not get inserted into its target table, just yet. The
receiving AMP must first do some preparation before that happens. Don't you have to get ready
before company arrives at your house? The AMP puts all of the hashed rows it has received from
other AMPs into the worktables, where it assembles them with their matching SQL. Why? Because
once the rows are reblocked, they can be sorted into the proper order for storage in the target table. Now
the utility places a load lock on each target table in preparation for the Application Phase. Of
course, there is no Acquisition Phase when you perform a MultiLoad DELETE task, since no data
is being acquired.
Phase 4: Application Phase
The purpose of this phase is to write, or APPLY, the specified changes to both the target tables
and NUSI subtables. Once the data is on the AMPs, it is married up to the SQL for execution. To
accomplish this substitution of data into SQL, the host has already attached some sequence
information and five (5) match tags to each data row when sending the data. Those match tags
are used to join the data with the proper SQL statement within a DML label. In addition to
associating each row with the correct DML statement, match tags also guarantee that no row will
be updated more than once, even when a RESTART occurs.
Remember, MultiLoad allows for the existence of NUSI processing during a load. Every hash-
sequence sorted block from Phase 3 and each block of the base table is read only once, to reduce
I/O operations and gain speed. Then, all matching rows in the base block are inserted, updated or
deleted before the entire block is written back to disk, one time. This is why the match tags are so
important. Changes are made by matching the data to its corresponding DML (SQL) through the
match tags. They guarantee that the correct operation is performed for the rows and blocks, with
no duplicate operations, a block at a time. And each time a table block is written to disk
successfully, a record is inserted into the LOGTABLE. This permits MultiLoad to avoid starting
again from the very beginning if a RESTART is needed.
What happens when several tables are being updated simultaneously? In this case, all of the
updates are scripted as a multi-statement request. That means that Teradata views them as a
single transaction. If there is a failure at any point of the load process, MultiLoad will merely need
to be RESTARTed from the point where it failed. No rollback is required. Any errors will be written
to the proper error table.
The chart below lists the key commands, their type, and what they do.
Step One: Setting up a Logtable and Logging onto Teradata- MultiLoad requires that you specify a
log table right at the outset with the .LOGTABLE command. We have called it CDW_Log. Once
you name the Logtable, it will be automatically created for you. The Logtable may be placed in
the same database as the target table, or it may be placed in another database. Immediately after
this you log onto Teradata using the .LOGON command. The order of these two commands is
interchangeable, but it is recommended that you define the Logtable first and then log on
second. If you reverse the order, Teradata will give a warning message. Notice that the
commands in MultiLoad require a dot in front of the command key word.
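As a quick sketch, the opening two commands might look like the lines below. The Logtable
CDW_Log is the one named in this chapter (placed in the SQL01 database for illustration); the
TDPID, userid and password are placeholders you would replace with your own:
    .LOGTABLE SQL01.CDW_Log;
    .LOGON    CDW/loaduser,mypassword;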
Step Two: Identifying the Target, Work and Error tables- In this step of the script you must tell
Teradata which tables to use. To do this, you use the .BEGIN IMPORT MLOAD command. Then
you will preface the names of these tables with the sub-commands TABLES, WORKTABLES
and ERRORTABLES. All you must do is name the tables and specify what database they are in.
Work tables and error tables are created automatically for you. Keep in mind that you get to name
and locate these tables. If you do not do this, Teradata might supply some defaults of its own!
At the same time, these names are optional. If the WORKTABLES and ERRORTABLES had not
specifically been named, the script would still execute and build these tables. They would have
been built in the default database for the user. The name of the worktable would be
WT_EMPLOYEE_DEPT1 and the two error tables would be called ET_EMPLOYEE_DEPT1 and
UV_EMPLOYEE_DEPT1, respectively.
Sometimes, large Teradata systems have a work database with a lot of extra PERM space. One
customer calls this database CORP_WORK. This is where all of the logtables and worktables are
normally created. You can use a DATABASE command to point all table creations to it or qualify
the names of these tables individually.
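Putting these ideas together, a sketch of the .BEGIN IMPORT MLOAD command might look like
this, with the work and error tables pointed at a CORP_WORK-style database (all of the database
qualifiers here are illustrative):
    .BEGIN IMPORT MLOAD
           TABLES      SQL01.Employee_Dept
           WORKTABLES  CORP_WORK.WT_EMPLOYEE_DEPT1
           ERRORTABLES CORP_WORK.ET_EMPLOYEE_DEPT1
                       CORP_WORK.UV_EMPLOYEE_DEPT1;
If you leave the WORKTABLES and ERRORTABLES clauses out entirely, MultiLoad builds the same
tables in your default database using those WT_, ET_ and UV_ names.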
Step Three: Defining the INPUT flat file record structure- MultiLoad is going to need to know
the structure of the INPUT flat file. Use the .LAYOUT command to name the layout. Then list the
fields and their data types used in your SQL as a .FIELD. Did you notice that an asterisk is placed
between the column name and its data type? This means to automatically calculate the next byte
in the record. It is used to designate the starting location for this data based on the previous
field's length. If you are listing fields in order and need to skip a few bytes in the record, you can
either use the .FILLER (like above) to position the cursor to the next field, or the "*" on the Dept_No
field could have been replaced with the number 132 ( CHAR(11)+CHAR(20)+CHAR(100)+1 ).
Then, the .FILLER is not needed. Also, if the input record fields are exactly the same as the table,
the .TABLE can be used to automatically define all the .FIELDS for you. The LAYOUT name will
be referenced later in the .IMPORT command. If the input file is created with INDICATORS, it is
specified in the LAYOUT.
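Here is a sketch of what such a layout might look like. The field names other than Dept_No are
invented for illustration, and the lengths are chosen so that Dept_No starts in byte 132, matching
the arithmetic above:
    .LAYOUT FILEIN;
    .FIELD  Employee_No  *  CHAR(11);
    .FIELD  Last_Name    *  CHAR(20);
    .FIELD  First_Name   *  CHAR(100);
    .FIELD  Dept_No      *  CHAR(6);     /* or:  .FIELD Dept_No 132 CHAR(6); */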
Step Four: Defining the DML activities to occur- The .DML LABEL names and defines the SQL
that is to execute. It is like setting up executable code in a programming language, but using
SQL. In our example, MultiLoad is being told to INSERT a row into the SQL01.Employee_Dept
table. The VALUES come from the data in each FIELD, referenced by placing a colon (:) in front of
the field name. Are you allowed to use multiple labels in a script? Sure! But remember this: every
label must be referenced in an APPLY clause of the .IMPORT command.
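A sketch of this step, reusing the layout fields from the sketch above (the column list is assumed
to match the table's column order), might be:
    .DML LABEL INSERTS;
    INSERT INTO SQL01.Employee_Dept
    VALUES (:Employee_No, :Last_Name, :First_Name, :Dept_No);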
Step Five: Naming the INPUT file and its format type- This step is vital! Using the .IMPORT
command, we have identified the INFILE data as being contained in a file called
"CDW_Join_Export.txt". Then we list the FORMAT type as TEXT. Next, we referenced the
LAYOUT named FILEIN to describe the fields in the record. Finally, we told MultiLoad to APPLY
the DML LABEL called INSERTS - that is, to INSERT the data rows into the target table. This is
still a sub-component of the .IMPORT MLOAD command. If the script is to run on a mainframe,
the INFILE name is actually the name of a JCL Data Definition (DD) statement that contains the
real name of the file.
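Assembling those pieces, the .IMPORT might be sketched like this:
    .IMPORT INFILE CDW_Join_Export.txt
            FORMAT TEXT
            LAYOUT FILEIN
            APPLY  INSERTS;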
Notice that the .IMPORT goes on for 4 lines of information. This is possible because it continues
until it finds the semi-colon that defines the end of the command. This is how MultiLoad tells one
operation from another. Therefore, the semi-colon is very important; without it, MultiLoad would
have attempted to process the .END MLOAD as part of the IMPORT, and that wouldn't work.
Step Six: Finishing loading and logging off of Teradata- This is the closing ceremony for the
load. To wrap things up, MultiLoad ends the load, closes the curtains, and logs off of the Teradata system.
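In script form, the closing ceremony is simply a pair of commands like the following:
    .END MLOAD;
    .LOGOFF;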
Important note: Since the script above does not DROP any tables, it is completely capable of
being restarted if an error occurs. Compare this to the next script. Do you think it is restartable?
If you said no, pat yourself on the back.
MultiLoad IMPORT Script
Let's take a look at a MultiLoad IMPORT script that comes from real life. This sample script will look
much more like what you might encounter at your workplace. It is more detailed. The notes to the
right are brief and to the point. They will help you grasp the essence of what is happening in the
script.
In IMPORT mode, you may specify as many as five distinct error-treatment options for
one .DML statement. For example, if there is more than one instance of a row, do you want
MultiLoad to IGNORE the duplicate row, or to MARK it (list it) in an error table? If you do not
specify IGNORE, then MultiLoad will MARK, or record, all of the errors. Imagine you have a
standard INSERT load that you know will end up recording about 20,000 duplicate row errors.
Using the syntax "IGNORE DUPLICATE INSERT ROWS;" (sketched after the list below) will keep
them out of the error table. By ignoring those errors, you gain three benefits:
1. You do not need to see all the errors.
2. The error table is not filled up needlessly.
3. MultiLoad runs much faster since it is not conducting a duplicate row check.
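As promised above, here is a sketch of how the option attaches to a .DML LABEL; the label name
and the INSERT reuse the earlier illustrative example:
    .DML LABEL INSERTS
         IGNORE DUPLICATE INSERT ROWS;
    INSERT INTO SQL01.Employee_Dept
    VALUES (:Employee_No, :Last_Name, :First_Name, :Dept_No);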
The table that follows shows you, in more detail, how flexible your options are:
You do this by redefining a field's position in the .FIELD or .FILLER section of the LAYOUT. Unlike
the asterisk (*), which means that a field simply follows the previous one, redefining cites a
number that tells MultiLoad to jump back toward the beginning of the record to that position and
take a certain portion of the INPUT file from there.
A Script that Uses Redefining the Input
The following script uses the ability to define two record types in the same input data file. It uses a
.FILLER to define the code since it is never used in the SQL, only to determine which SQL to run.
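Because the full script appears as an illustration, here is a rough sketch of the technique with
invented layout, field and label names. The .FILLER reads the record-type code in byte 1, the
Class_Code field redefines byte 2 by citing that position number, and the APPLY clauses use the
code to decide which DML label to run:
    .LAYOUT TWO_TYPES;
    .FILLER Rec_Type    1  CHAR(1);      /* 'S' = student row, 'C' = class row  */
    .FIELD  Student_ID  *  CHAR(9);
    .FIELD  Last_Name   *  CHAR(20);
    .FIELD  Class_Code  2  CHAR(6);      /* jumps back to byte 2 for class rows */
    .FIELD  Class_Name  *  CHAR(30);

    .DML LABEL STU_INS;
    INSERT INTO SQL01.Student_Table VALUES (:Student_ID, :Last_Name);
    .DML LABEL CLS_INS;
    INSERT INTO SQL01.Class_Table   VALUES (:Class_Code, :Class_Name);

    .IMPORT INFILE two_type_data.txt
            FORMAT TEXT
            LAYOUT TWO_TYPES
            APPLY  STU_INS WHERE Rec_Type = 'S'
            APPLY  CLS_INS WHERE Rec_Type = 'C';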
A DELETE MLOAD Script Using a Hard Coded Value
The next script demonstrates how to use the MultiLoad DELETE task. In this example, students
no longer enrolled in the university are being removed from the Student_Profile table, based upon
the registration date. The profile of any student who enrolled prior to this date will be removed.
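Here is a rough sketch of what such a DELETE task might look like. The column name
Register_Date, the cutoff date and the logon values are all placeholders for this illustration:
    .LOGTABLE SQL01.CDW_Log;
    .LOGON    CDW/loaduser,mypassword;
    .BEGIN DELETE MLOAD TABLES SQL01.Student_Profile;
    DELETE FROM SQL01.Student_Profile
    WHERE  Register_Date < DATE '2001-01-01';
    .END MLOAD;
    .LOGOFF;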
How many differences from a MultiLoad IMPORT script readily jump off of the page at you? Here
are a few that we saw:
At the beginning, you must specify the word "DELETE" in the .BEGIN MLOAD command.
You need not specify it in the .END MLOAD command.
You will readily notice that this mode has no .DML LABEL command. Since it is focused
on just one absolute function, no APPLY clause is required so you see no .DML LABEL.
Notice that the DELETE with a WHERE clause is an SQL function, not a MultiLoad
command, so it has no dot prefix.
Since default names are available for worktables (WT_<target_tablename>) and error
tables (ET_<target_tablename> and UV_<target_tablename>), they need not be specifically
named, but be sure to define the Logtable.
Do not confuse the DELETE MLOAD task with the SQL delete task that may be part of a
MultiLoad IMPORT. The IMPORT delete is used to remove small volumes of data rows based
upon the Primary Index. On the other hand, the MultiLoad DELETE does global deletes on tables,
bypassing the Transient Journal. Because there is no Transient Journal, there are no rollbacks
when the job fails for any reason. Instead, it may be RESTARTed from a CHECKPOINT. Also, the
MultiLoad DELETE task is never based upon the Primary Index.
Because we are not importing any data rows, there is no need for either worktables or an
Acquisition Phase. One DELETE statement is sent to all the AMPs with a match tag parcel. That
statement will be applied to every table row. If the condition is met, then the row is deleted. Using
the match tags, each target block is read once and the appropriate rows are deleted.
A DELETE MLOAD Script Using a Variable
This illustration demonstrates how passing the values of a data row rather than a hard coded
value may be used to help meet the conditions stated in the WHERE clause. When you are
passing values, you must add some additional commands that were not used in the DELETE
example with hard coded values. You will see .LAYOUT and .IMPORT INFILE in this script.
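A rough sketch of the same task driven by a value read from a one-record input file might look like
this; the layout, field and file names are invented for the illustration:
    .LOGTABLE SQL01.CDW_Log;
    .LOGON    CDW/loaduser,mypassword;
    .BEGIN DELETE MLOAD TABLES SQL01.Student_Profile;
    .LAYOUT DATEIN;
    .FIELD  Cutoff_Date  *  CHAR(10);
    DELETE FROM SQL01.Student_Profile
    WHERE  Register_Date < :Cutoff_Date;
    .IMPORT INFILE cutoff_date.txt
            FORMAT TEXT
            LAYOUT DATEIN;
    .END MLOAD;
    .LOGOFF;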
An UPSERT Sample Script
The following sample script is provided to demonstrate how to do an UPSERT, that is, to update a
table and, if a row from the data source does not exist in the target table, to insert a new
row. In this instance we are loading the Student_Profile table with new data for the next semester.
The clause "DO INSERT FOR MISSING UPDATE ROWS" indicates an UPSERT. The DML
statements that follow this option must be in the order of a single UPDATE statement followed by
a single INSERT statement.
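A sketch of the UPSERT label might look like the following; the column names are invented, and
notice that the single UPDATE comes before the single INSERT, as required:
    .DML LABEL UPSERTER
         DO INSERT FOR MISSING UPDATE ROWS;
    UPDATE SQL01.Student_Profile
    SET    Last_Name = :Last_Name,
           Phone_No  = :Phone_No
    WHERE  Student_ID = :Student_ID;
    INSERT INTO SQL01.Student_Profile
    VALUES (:Student_ID, :Last_Name, :First_Name, :Phone_No);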
What Happens when MultiLoad Finishes
MultiLoad Statistics
For the most part, the Acquisition error table logs errors that occur during that processing phase.
The Application error table lists Unique Primary Index violations, field overflow errors on non-PI
columns, and constraint errors that occur in the APPLY phase. MultiLoad error tables not only list
the errors they encounter, they also have the capability to STORE those errors. Do you
remember the MARK and IGNORE parameters? This is where they come into play. MARK will
ensure that the error rows, along with some details about the errors are stored in the error table.
IGNORE does neither; it is as if the error never occurred.
RESTARTing MultiLoad
Who hasn't experienced a failure at some time when attempting a load? Don't take it personally!
Failures can and do occur on the host or Teradata (DBC) for many reasons. MultiLoad has the
impressive ability to RESTART from failures in either environment. In fact, it requires almost no
effort to continue or resubmit the load job. Here are the factors that determine how it works:
First, MultiLoad will check the Restart Logtable and automatically resume the load process from
the last successful CHECKPOINT before the failure occurred. Remember, the Logtable is
essential for restarts. MultiLoad uses neither the Transient Journal nor rollbacks during a failure.
That is why you must designate a Logtable at the beginning of your script. MultiLoad either
restarts by itself or waits for the user to resubmit the job. Then MultiLoad takes over right where it
left off.
Second, suppose Teradata experiences a reset while MultiLoad is running. In this case, the host
program will restart MultiLoad after Teradata is back up and running. You do not have to do a
thing!
Third, if a host mainframe or network client fails during a MultiLoad, or the job is aborted, you
may simply resubmit the script without changing a thing. MultiLoad will find out where it stopped
and start again from that very spot.
Fourth, if MultiLoad halts during the Application Phase it must be resubmitted and allowed to run
until complete.
Fifth, during the Acquisition Phase the CHECKPOINT (n) you stipulated in the .BEGIN MLOAD
clause will be enacted. The results are stored in the Logtable. During the Application Phase,
CHECKPOINTs are logged each time a data block is successfully written to its target table.
HINT: The default number for CHECKPOINT is 15 minutes, but if you specify the CHECKPOINT
as 60 or less, minutes are assumed. If you specify the checkpoint at 61 or above, the number of
records is assumed.
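As a small illustration of the two interpretations (the table name and numbers are only examples):
    .BEGIN IMPORT MLOAD
           TABLES SQL01.Employee_Dept
           CHECKPOINT 30;         /* 60 or below is read as minutes         */

    .BEGIN IMPORT MLOAD
           TABLES SQL01.Employee_Dept
           CHECKPOINT 100000;     /* above 60 is read as a count of records */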
RELEASE MLOAD: When You DON'T Want to Restart MultiLoad
What if a failure occurs but you do not want to RESTART MultiLoad? Since MultiLoad has already
updated the table headers, it assumes that it still "owns" them. Therefore, it limits access to the
table(s). So what is a user to do? Well, there is good news and bad news. The good news is that
you may use the RELEASE MLOAD command to release the locks and roll back the job.
The bad news is that if you have been loading multiple millions of rows, the rollback may take a
lot of time. For this reason, most customers would rather just go ahead and RESTART.
Before V2R3: In the earlier days of Teradata it was NOT possible to use RELEASE MLOAD if one
of the following three conditions was true:
In IMPORT mode, once MultiLoad had reached the end of the Acquisition Phase you
could not use RELEASE MLOAD. This is sometimes referred to as the "point of no return."
In DELETE mode, the point of no return was when Teradata received the DELETE
statement.
If the job halted in the Apply Phase, you had to RESTART the job.
With and since V2R3: The advent of V2R3 brought new possibilities with regard to using the
RELEASE MLOAD command. It can NOW be used in the APPLY Phase, if:
You are running a Teradata V2R3 or later version
You use the correct syntax (a sketch of both forms follows this list):
RELEASE MLOAD <target-table> IN APPLY
The load script has NOT been modified in any way
The target tables either:
o Must be empty, or
o Must have no Fallback, no NUSIs, no Permanent Journals
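To illustrate both forms of the statement, using the target table from earlier in this chapter as an
example, you would submit something like the following from a query tool such as BTEQ:
    RELEASE MLOAD SQL01.Employee_Dept;              /* before the point of no return      */
    RELEASE MLOAD SQL01.Employee_Dept IN APPLY;     /* V2R3 and later, in the Apply Phase */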
You should be very cautious using the RELEASE command. It could potentially leave your table
half updated. Therefore, it is handy for a test environment, but please don't get too reliant on it for
production runs. They should be allowed to finish to guarantee data integrity.
MultiLoad and INMODs
INMODs, or Input Modules, may be called by MultiLoad in either mainframe or LAN
environments, provided the appropriate programming languages are used. INMODs are user-
written routines whose purpose is to read data from one or more sources and then convey it to a
load utility, here MultiLoad, for loading into Teradata. They allow MultiLoad to focus solely on
loading data by doing data validation or data conversion before the data is ever touched by
MultiLoad. INMODs replace the normal MVS DDNAME or LAN file name with the following
statement:
.IMPORT INMOD=<INMOD-NAME>
You will find a more detailed discussion on how to write INMODs for MultiLoad in the chapter of
this book titled, "INMOD Processing".
How MultiLoad Compares with FastLoad