TD Notes
Explain explained :P
Before collecting stats, review the EXPLAIN plan.
SHOW:
HELP SQL SHOW gives the list of objects that can be referenced by the SHOW command.
SET or MULTISET
(Permanent) Journaling references
CHARACTER SET LATIN
NOT CASESPECIFIC
Checksum
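A quick sketch of where the settings above show up — SHOW TABLE returns the full DDL of an object (the object names here are placeholder assumptions):

```sql
-- Display the complete CREATE TABLE DDL, including SET/MULTISET,
-- journaling, character set, case specificity, and checksum settings
SHOW TABLE Emp;

-- SHOW works for other object types too, e.g.:
SHOW VIEW MyView;
SHOW MACRO MyMacro;
```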
Built-in variables:
SEL SESSION, USER, DATABASE, ACCOUNT;  (SEL is Teradata shorthand for SELECT)
HELP SESSION gives more detail about the session: account, logon date, logon time, etc.
Tools > Options > Query (go and view):
Allowing multiple queries.
Tools > Options > Code Editor (go and view):
Converting keywords to uppercase as you type.
Displaying line numbers.
Left Margin changes the background color for line numbers; Background changes the background for SQL code.
Tools > Options > Answers tab (go and view).
Tools > Options > History.
DML - Data Manipulation Language: SQL that manipulates data by inserting, selecting, updating, or deleting rows of a table (and hence columns).
DCL - Data Control Language: mainly used to control privileges (DBA work).
GRANT, REVOKE, GIVE (GIVE transfers database ownership).
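A minimal DCL sketch — the database, table, and user names here are placeholder assumptions:

```sql
-- Grant SELECT and INSERT on a table to a user
GRANT SELECT, INSERT ON SalesDB.Orders TO etl_user;

-- Take a privilege back
REVOKE INSERT ON SalesDB.Orders FROM etl_user;

-- Transfer ownership of a database to another user/database
GIVE SalesDB TO dba_admin;
```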
Column names: ID, Name, Exposure_cnt.
Find what is wrong in this query:
SELECT ID, Name Exposure_cnt ...
Result:
ID, Exposure_cnt
1, Sagar
2, Unknow
Omitting the comma between Name and Exposure_cnt by mistake made Exposure_cnt a column alias, so the Name column came back renamed as Exposure_cnt.
Comparison operators — ANSI standard vs. Teradata extension:

Operation                  ANSI   Teradata
Equal                      =      EQ
Greater than or equal to   >=     GE
Less than or equal to      <=     LE
Greater than               >      GT
Less than                  <      LT
Not equal to               <>     NE
DISTINCT:
DISTINCT removes duplicates by sorting the rows, so the result often comes back ordered as a side effect (only ORDER BY guarantees order).
Without DISTINCT:
SELECT ID, NAME FROM EMP ORDER BY 1,2;
1, SAGAR
1, SAGAR
2, JANU
With DISTINCT:
SELECT DISTINCT ID, NAME FROM EMP;
1, SAGAR
2, JANU
NULL:
In binary logic we have true or false; in a relational DB we have true, false, or unknown. NULL is an unknown, and it is not comparable or equatable.
Let us see one example.
Sample table ('?' denotes NULL):

FIRST_NM      LAST_NM      DEPT_NO
VIDYA SAGAR   SADHANA      401
BARACK        OBAMA        402
SACHIN        TENDULKAR    403
?             DHONI        405
VIRAT         KOHLI        ?

FIRST_NM IN ('VIDYA SAGAR', 'SACHIN') evaluates row by row to:
TRUE, FALSE, TRUE, FALSE, FALSE
(the NULL FIRST_NM compares as unknown, so the DHONI row is not returned)

FIRST_NM NOT IN ('BARACK') evaluates row by row to:
TRUE, FALSE, TRUE, FALSE, TRUE
(again, the NULL FIRST_NM yields unknown, so the DHONI row is not returned)
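A sketch of how NULL behaves in predicates, assuming an EMP table like the one above:

```sql
-- NULL never satisfies a comparison, even with itself:
-- rows where DEPT_NO is NULL are filtered out here, because
-- NULL = NULL evaluates to unknown, not true.
SELECT * FROM EMP WHERE DEPT_NO = DEPT_NO;

-- Use IS NULL / IS NOT NULL to test for NULLs explicitly:
SELECT * FROM EMP WHERE FIRST_NM IS NULL;
SELECT * FROM EMP WHERE DEPT_NO IS NOT NULL;
```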
select
  databasename,
  tablename,
  sum(currentperm)/1024**3 as current_GB
from dbc.allspace
where tablename <> 'All'
  and databasename = 'yourDB'
group by 1,2
order by 1,2;
Recursive Query:
A recursive query consists of two parts: a seeding query and a recursive query. The seeding query provides the initial records over which the data is iterated, while the recursive part performs the iteration.
Recursive Query in Teradata
TD Tips, Nov 04, by Saurabh Gupta (guest post, Teradata Tech Associates)
CREATE VOLATILE TABLE Employee  -- assumption: a volatile table, per ON COMMIT PRESERVE ROWS; the opening CREATE line was lost
(
EMP_ID INTEGER,
MGR_ID INTEGER,
EMP_NAME VARCHAR(100)
)
ON COMMIT PRESERVE ROWS;
The Employee table stores the employee ID in column EMP_ID, the employee's manager ID in column MGR_ID, and the employee name in column EMP_NAME. For example, Zoya is an employee with employee ID 28006 and her direct manager's ID is 28005.
One day Zoya wants to find her first-level, second-level, and higher managers, i.e. all the direct and indirect managers above her in the hierarchy. Since the Employee table stores only the first level of hierarchy (the direct manager, which is employee ID 28005 (Patrick) in Zoya's case), finding the full hierarchy would normally require querying the database multiple times. This can be achieved very easily with a recursive query in a single SQL statement.
The seeding query gives the manager name and manager employee ID for employee ID 28006 (Zoya). So the first-level manager identified for Zoya is Patrick. Data retrieved by this query is stored internally in the derived table MGR_JRS. This is often referred to as the seeding query.
3) The second SELECT joins the Employee table to the MGR_JRS table, which is populated as part of the recursive logic. This is often referred to as the recursive query.
SELECT MGR_JRS.EMP_ID,
       empl.MGR_ID,
       empl1.EMP_NAME,
       MGR_JRS.DEPTH + 1
FROM MGR_JRS
INNER JOIN Employee empl  ON MGR_JRS.MGR_ID = empl.EMP_ID
INNER JOIN Employee empl1 ON empl.MGR_ID = empl1.EMP_ID
Since MGR_JRS has one record with MGR_ID 28005, it joins with the Employee table row where EMP_ID is 28005, picks up that employee's manager (28003), and in turn saves one more row in MGR_JRS.
In this case Zoya has four levels of manager above her in the organization chart. Thus, a complex requirement of
multiple iteration is very easily solved using recursive query.
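Putting the seeding and recursive parts together, the full statement can be sketched like this (the seeding query is reconstructed here from the description above, so treat it as an assumption):

```sql
WITH RECURSIVE MGR_JRS (EMP_ID, MGR_ID, EMP_NAME, DEPTH) AS
(
  -- Seeding query: Zoya's direct manager (one row)
  SELECT empl.EMP_ID, empl.MGR_ID, empl1.EMP_NAME, 1
  FROM Employee empl
  INNER JOIN Employee empl1 ON empl.MGR_ID = empl1.EMP_ID
  WHERE empl.EMP_ID = 28006

  UNION ALL

  -- Recursive query: climb one management level per iteration
  SELECT MGR_JRS.EMP_ID, empl.MGR_ID, empl1.EMP_NAME, MGR_JRS.DEPTH + 1
  FROM MGR_JRS
  INNER JOIN Employee empl  ON MGR_JRS.MGR_ID = empl.EMP_ID
  INNER JOIN Employee empl1 ON empl.MGR_ID = empl1.EMP_ID
)
SELECT EMP_NAME, DEPTH FROM MGR_JRS ORDER BY DEPTH;
```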
Limitations:
The following are not allowed in the recursive portion of the query but are allowed in the seeding portion:
DISTINCT
The TOP N function, cursors, stored procedures, and triggers are not permitted in a recursive query.
Summary :
The recursive query feature enables multiple iterations against hierarchical data in a single SQL statement.
Primary Index in Teradata
TD Basics, Nov 20, by admin
Each table in Teradata is required to have a primary index. Even if you do not define a primary index in the CREATE TABLE statement, Teradata will automatically create one based on the DBSControl field PrimaryIndexDefault (more on this below). The primary index defines where data will reside and which AMP receives each row.
The most important role the primary index plays is:
Data Distribution
In short, the primary index provides the fastest physical path for retrieving data.
Two types of primary indexes:
A Unique Primary Index (UPI) will always spread the rows of the table evenly amongst the AMPs.
A Non-Unique Primary Index (NUPI) will almost never spread the rows of the table evenly amongst the AMPs.
Key point: don't assume that a UPI is always the best choice; sometimes a NUPI is critical. If you need to run a lot of queries with LAST_NAME in the WHERE clause, then a NUPI on LAST_NAME is essential.
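A minimal sketch of both choices (table and column names are assumptions):

```sql
-- UPI: EMP_ID is unique, so rows spread evenly across AMPs
CREATE TABLE Emp_UPI (
  EMP_ID    INTEGER NOT NULL,
  LAST_NAME VARCHAR(50)
) UNIQUE PRIMARY INDEX (EMP_ID);

-- NUPI: duplicate LAST_NAME values hash to the same AMP (some skew),
-- but WHERE LAST_NAME = ... becomes a single-AMP retrieval
CREATE TABLE Emp_NUPI (
  EMP_ID    INTEGER NOT NULL,
  LAST_NAME VARCHAR(50)
) PRIMARY INDEX (LAST_NAME);
```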
Primary Index and the Row Hash
The primary index determines on which AMP a row of the table will reside. When a new row arrives for insert, the following steps occur:
1. The Teradata Parsing Engine (PE) examines the primary index value of the row.
2. Teradata runs that primary index value through the hashing algorithm.
3. The output of the hashing algorithm is the 32-bit Row Hash value.
e.g. a new row comes in with its attribute values.
The Row Hash value determined above points to exactly one bucket in the Hash Map, and that bucket number determines the AMP on which the new row will reside.
e.g.
11110000111100001111000011110000 corresponds to the first row and fourth column of the Hash Map, i.e. AMP no = 4.
So we have the AMP number (4) where the new row will reside.
After this calculation, the PE instructs the BYNET to send the row, along with its Row Hash value, to AMP number 4.
The new row and its Row Hash value are sent to AMP number 4 for storage. In this way data distribution is achieved for every row of the table.
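Teradata exposes this machinery through its hash functions, so you can check where a given primary index value would land (table and column names are assumptions):

```sql
-- Row hash, hash bucket, and destination AMP for each EMP_ID value
SELECT EMP_ID,
       HASHROW(EMP_ID)                      AS row_hash,
       HASHBUCKET(HASHROW(EMP_ID))          AS hash_bucket,
       HASHAMP(HASHBUCKET(HASHROW(EMP_ID))) AS amp_no
FROM Emp;

-- Count rows per AMP to spot skew on a candidate primary index
SELECT HASHAMP(HASHBUCKET(HASHROW(LAST_NAME))) AS amp_no,
       COUNT(*)
FROM Emp
GROUP BY 1
ORDER BY 2 DESC;
```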
Uniqueness value
When the AMP receives the row, it places it into the proper table and checks whether any other row in that table has the same Row Hash value. If this is the first row with that particular Row Hash value, the AMP assigns it a uniqueness value of 1. If it is the second row with the same Row Hash value in the same table, it assigns a uniqueness value of 2, a value of 3 for the third occurrence, and so on for further duplicates.
If you don't get this concept on the first go, don't worry; it is explained again below with examples for more clarity.
An example of a UPI table, where EMP_ID is the UPI.
Note: because the table has a UPI defined on EMP_ID, every row hash in the table is unique, so the AMP assigns a uniqueness value of 1 to each row.
An example of a NUPI table, where LAST_NAME is the NUPI.
Note: because the table has a NUPI defined on LAST_NAME and there are three rows with LAST_NAME = Kumar, the AMP assigns those rows uniqueness values 1 to 3.
Together, the Row Hash and the uniqueness value form the 64-bit Row ID that uniquely identifies each row on a given AMP.
This is the complete reference for the primary index in Teradata; I tried to make it as simple as possible for newbies.
The purpose of the new Teradata 13.0 feature that allows you to create tables with no primary index is
improved performance of FastLoad and Teradata Parallel Data Pump Array INSERT data loading
operations.
Because there is no primary index for the rows of a NoPI table, its rows are not hashed to an AMP
based on their primary index value. Instead, Teradata Database either hashes on the Query ID for a
row, or it uses a different algorithm to assign the row to its home AMP once it reaches the AMP onto
which it has been FastLoaded. The Teradata Database then generates a RowID for each row in a NoPI
table by randomly selecting an arbitrary hash bucket that an AMP owns and using it to generate a
RowID.
The result is faster and more efficient data loading.
NoPI tables work optimally for applications that load data into a staging table whose rows must first undergo conversion to some other form, and then be redistributed, before they are stored in a second staging table.
Using a NoPI table as a staging table for such applications avoids the row redistribution and sorting
required for primary-indexed staging tables. Another advantage of NoPI tables is that you can quickly
load data into them and be finished with the acquisition phase of the utility operation, which frees
client resources for other applications.
NoPI tables are also useful as so-called sandbox tables when an appropriate primary index has not yet
been defined for the primary-indexed table they will eventually populate. This use of a NoPI table
enables you to experiment with several different primary index possibilities before deciding on the
best choice for your particular application workloads.
The NoPI staging table used in the later examples (presumably Sales_Temp1; the opening CREATE line was lost) would be created like:
CREATE MULTISET TABLE Sales_Temp1, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
ItemNbr INTEGER NOT NULL,
SaleDate DATE FORMAT 'MM/DD/YYYY' NOT NULL,
ItemCount INTEGER)
NO PRIMARY INDEX;
When neither PRIMARY INDEX nor NO PRIMARY INDEX is specified in the CREATE TABLE AS
statement as shown in the example below, table Sales_Temp4 will be created based on the setting of
PrimaryIndexDefault. If PrimaryIndexDefault is set to N, Sales_Temp4 will be created as a NoPI table.
In this example, table Sales acting as the source table can be a PI table or a NoPI table.
CREATE MULTISET TABLE Sales_Temp4 AS (SELECT * FROM Sales) WITH DATA;
The new DBSControl General Field, PrimaryIndexDefault, was introduced to control whether a table is
created with or without a primary index when none of PRIMARY INDEX, NO PRIMARY INDEX, PRIMARY
KEY and UNIQUE Constraint is specified. The setting of PrimaryIndexDefault is as follows:
D - The default setting; works the same as the P setting.
P - The first column in the table will be selected as the non-unique primary index. This works the same as in past releases when PRIMARY INDEX was not specified.
N - The table will be created without a primary index (NoPI table).
With the NoPI Table feature, the system default setting essentially remains the same as that in
previous Teradata releases where the first column was selected as the non-unique primary index when
the user did not specify a PRIMARY INDEX or a PRIMARY KEY or a UNIQUE Constraint. Users can
change the default setting for PrimaryIndexDefault to P or N and not rely on the system default
setting which can possibly be changed in a future release.
To change from a NoPI table to PI table, users can run one of the following:
CREATE TABLE AS - The example shown below will create table Sales_Temp_PI with primary index
ItemNbr and all of the data from NoPI table Sales_Temp1.
CREATE MULTISET TABLE Sales_Temp_PI AS (SELECT * FROM Sales_Temp1) WITH DATA PRIMARY
INDEX (ItemNbr);
CREATE TABLE and INSERT-SELECT - The example shown below will create table Sales_Temp_PI
with primary index ItemNbr and then copy over all of the data from NoPI table Sales_Temp1.
CREATE MULTISET TABLE Sales_Temp_PI, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
ItemNbr INTEGER NOT NULL,
SaleDate DATE FORMAT 'MM/DD/YYYY' NOT NULL,
ItemCount INTEGER)
PRIMARY INDEX (ItemNbr);
INSERT INTO Sales_Temp_PI SELECT * FROM Sales_Temp1;
Users can also change from a PI table to a NoPI table using one of the above approaches.
The following features can be used with NoPI tables:
Fallback
Secondary indexes
Join indexes
UNIQUE column constraints
CHECK constraints
PRIMARY KEY and FOREIGN KEY constraints
Triggers
BLOB and CLOB columns.
You can define any of the following table types as NoPI tables:
Ordinary base data tables
Global temporary tables
Volatile tables
You cannot define any of the following table types as NoPI tables:
Queue tables
Error tables
Global temporary trace tables
You can use the following DML statements to manipulate or retrieve NoPI table rows prior to
moving them to their primary-indexed target table:
DELETE
INSERT
SELECT
You cannot use UPDATE or UPDATE (Upsert Form) requests to change data in a NoPI table.
The following is a list of other operations and utilities and how they operate on a NoPI table:
UPSERT - Not supported. This is a composite of UPDATE and INSERT operations such that if the UPDATE fails because the target row does not exist, the INSERT is automatically executed. The UPDATE part of an UPSERT is required to fully specify the primary index; since there is no primary index in a NoPI table, UPSERT is not supported.
MERGE-INTO - Not supported when the target table is NoPI. This is currently composed of three variant forms: UPSERT, UPDATE-only, and INSERT-only. UPSERT is not supported for the reason discussed above. UPDATE is restricted in the initial release, so the UPDATE-only form of MERGE-INTO is not supported. INSERT-only is also not supported, although it is very much like INSERT-SELECT, which is supported.
MultiLoad - Not supported. FastLoad and TPump are utilities that are normally used to stage a table.
Restore/Copy - Supported. Restore/Copy processing on a NoPI table is very similar to that on a PI table. The main difference is that a NoPI table normally has one hash bucket per AMP, like a very skewed NUPI table. So when the data from a NoPI table is restored or copied to a different configuration with more AMPs, some AMPs may not receive any data at all; Restore/Copy can therefore cause data skewing for a NoPI table.
FastExport - Supported. Data in a NoPI table can be FastExported the same way as for a PI table.
CheckTable - Supported. The LEVEL HASH check which is done with either an explicit LEVEL HASH
command or implicitly in a LEVEL THREE command works differently on a PI table and a NoPI table.
On a PI table, this check regenerates the rowhash for each data row based on the primary index and
then compares with the rows on disk. On a NoPI table, this check looks at the rowhash for each data
row and verifies that the hash bucket that is part of the rowhash correctly belongs to the AMP.
Table Rebuild - Supported. Table Rebuild processing on NoPI table is the same as that on a PI table.
The table has to have fallback protection for Table Rebuild to run. Rows in a NoPI table have rowhash
and therefore can be rebuilt the same way as those on a PI table.
Reconfig - Supported. The Reconfig processing on a NoPI table is very similar to that on a PI table.
The main difference is that a NoPI table normally has one hash bucket per AMP. This is like a very
skewed non-unique NUPI table. So when a system is reconfigured to have more AMPs, there may be
some AMPs that will not have any data at all for a NoPI table. So, Reconfig can cause data skewing for
NoPI table.
Starting with Teradata Release 13, tables can be defined without a primary index. Nevertheless, the primary index is still the main mechanism for achieving even data distribution on a Teradata system; by design, the primary index ensures linear scalability.
Hence the question: what is the point of tables without a primary index, how are they implemented, and how do they fit into Teradata's hashing design?
Initially, some words about how data is distributed in the case of a No Primary Index (NOPI) table.
Basically, rows are distributed randomly across the AMPs.
As no hashing takes place, but rows have to be uniquely identifiable, the ROWID is generated
differently than the ROWID of a regular primary index table:
Teradata uses the HASHBUCKET of the responsible AMP and adds a uniqueness value. The
bytes normally occupied by the hash value are being used to increase the range for generating
uniqueness values.
This is how No Primary Index Tables are created:
CREATE TABLE <TABLE> (
PK INTEGER NOT NULL
) NO PRIMARY INDEX ;
Be aware that if neither PRIMARY INDEX nor NO PRIMARY INDEX is specified, the result depends on the DBS Control field PrimaryIndexDefault: the table is created without a primary index (NO PRIMARY INDEX) when PrimaryIndexDefault=N, while the first column of the table definition is used as a non-unique PRIMARY INDEX when PrimaryIndexDefault=P or D.
As No Primary Index tables are distributed randomly across the AMPs, bulk loading of NOPI
tables (Fastloads, TPUMP array insert loads) is faster than loading tables with a Primary Index.
Let me describe the positive performance impact via the technical phases of a FastLoad into a regular PI table:
1. Incoming rows are distributed round-robin across all AMPs.
2. The rows are hashed by their primary index value and forwarded to the responsible AMPs.
3. The responsible AMPs sort the received rows by ROWID.
Now consider a No Primary Index table: after distributing the rows randomly across the AMPs, we are already finished. No hashing and redistribution is needed; no sorting is needed.
Furthermore, as the rows are assigned randomly to the AMPs, our data will always be distributed
evenly across all AMPs and no skewing will occur.
This speeds up the load process, as only the acquisition phase of the loading utilities is executed.
Another advantage of NOPI tables is that rows are always appended to the end of the table's data blocks, so the overhead usually caused by sorting rows by rowhash into the data blocks is avoided. For example, if you INSERT...SELECT a huge number of rows into a NOPI table, this effect reduces the number of I/Os significantly.
No Primary Index Tables and Skewing
Bulk-loaded small tables will typically be skewed: round-robin distribution is done at the block level, not the row level, so some AMPs receive data blocks while the rest receive no data.
NOPI tables are useful in certain situations, but without a primary index, row access is limited to all-AMP full-table scans. To offset this, you could add a unique secondary index (USI) to your NOPI table.
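A sketch of adding a USI to an existing NOPI table (the table and column names are placeholders):

```sql
-- A USI gives (at most) two-AMP access by PK despite the missing primary index
CREATE UNIQUE INDEX (PK) ON MyNopiTable;

-- Queries like this can then avoid a full-table scan:
SELECT * FROM MyNopiTable WHERE PK = 12345;
```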
NOPI tables share common ground with the Teradata Columnar implementation introduced with Teradata 14: column-partitioned tables are always NOPI tables, and although they offer great performance improvements for certain workload types, they likewise lack fast primary index access.
To some extent, this disadvantage of no primary index tables can be compensated for with join indexes or secondary indexes.
No Primary Index tables should not be used as production tables; mainly consider using them during the ETL process, in situations where a full table scan is required anyway.
NOPI tables come with some further restrictions; the most important ones:
Only MULTISET tables can be created
No identity columns can be used
NOPI tables cannot be partitioned with a PPI
No statements with update character are allowed (UPDATE, MERGE INTO, UPSERT); you can still use INSERT, DELETE and SELECT
No permanent journal is possible
Cannot be defined as queue tables
Update triggers cannot update a NOPI table (probably introduced in a later release)
No hash indexes are allowed (use join indexes instead)
The following features can be used as usual together with NOPI tables:
Fallback protection of the table
Secondary Indexes (USI, NUSI)
Join Indexes
CHECK and UNIQUE constraints
Triggers
Collection of statistics
The information about NOPI tables, as for any table type, is stored in DBC.Tables. The tablekind is 'O'.
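A quick sketch for listing NOPI tables via the dictionary (assuming the DBC.TablesV view available in current releases):

```sql
-- All NoPI tables (TableKind 'O') visible to the current user
SELECT DatabaseName, TableName
FROM DBC.TablesV
WHERE TableKind = 'O'
ORDER BY 1, 2;
```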
Teradata Utilities FASTLOAD part 1
TD Utilities, 06, by admin
Introduction
FastLoad is the Teradata utility used to load large amounts of data into an empty table on a Teradata system. The property that makes FastLoad unique is its speed at loading huge amounts of data.
Any other device that can provide properly formatted source data can serve as input.
FastLoad uses multiple sessions and block-level operations to load data into a table, which is why it is much faster than other loading utilities.
TIP: one FastLoad job loads data into one table only. If you want to load data into more than one table, you must submit multiple FastLoad jobs, one for each table.
Three requirements for FastLoad to execute:
LOG TABLE - This table records all progress during execution of the FastLoad. It lives in the SYSADMIN database under the name FASTLOG.
EMPTY TABLE - FastLoad needs the target table to be empty before inserting rows. It does not care how this is achieved, but it requires a table without any prior records; if the table is not empty, FastLoad will pause and report an error.
TWO ERROR TABLES - FastLoad requires two error tables to catch any errors that occur during execution; it creates them itself. Each error table records errors of a specific type.
The first error table records any translation or constraint-violation errors. For example, if a row has data that does not match the data type of its column (say, CHAR data inserted into an INTEGER column), that row is captured in error table 1.
The second error table captures errors related to duplicate values for a Unique Primary Index (UPI). FastLoad keeps only one occurrence of the value and stores the duplicate occurrences in the second error table. However, if the entire row is a duplicate, FastLoad counts it but does not store it.
These tables are analyzed later for error handling.
Data Conversion Capabilities
FastLoad also allows data conversion on a column value before inserting it into the target table. The following conversions are allowed:
Character-to-numeric
Character-to-date
Date-to-character
LOGON begins a session. LOGOFF ends a session. QUIT is the same as LOGOFF.
NOTIFY
The NOTIFY command informs the job that follows that some event has occurred. It is often used for detailed reporting on the FastLoad job's success.
RECORD
Specifies the beginning record number (or, with THRU, the ending record number) of the input data source to be read by FastLoad. Syntactically, this command is placed before the INSERT keyword.
SET RECORD
Used only in the LAN environment, this command states the format of the data in the input file: Fastload, Unformatted, Binary, Text, or Variable Text. The default is the Teradata RDBMS standard, Fastload.
SESSIONS
This command specifies the number of FastLoad sessions to establish with Teradata. It is written in the script just before the logon. The default is 1 session per available AMP.
TENACITY
Suppose FastLoad is not able to obtain all the sessions it requires. In that case only two options are left: terminate the FastLoad, or keep retrying. TENACITY specifies the amount of time, in hours, that FastLoad will retry to obtain the sessions. The default for FastLoad is no tenacity, meaning it will not retry at all.
SLEEP
Working in conjunction with TENACITY, the SLEEP command specifies the amount of time in minutes to wait before retrying to obtain all sessions. For example, if TENACITY were set to 5 and SLEEP to 10, FastLoad would attempt to log on every 10 minutes for up to 5 hours; if there were no success by then, all logon attempts would cease.
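Tying these commands together, a minimal FastLoad script might look like the sketch below. The TDP id, credentials, target table, file name, and column layout are all assumptions:

```sql
SESSIONS 4;
TENACITY 5;          /* retry obtaining sessions for up to 5 hours */
SLEEP 10;            /* ...waiting 10 minutes between attempts     */
LOGON tdpid/loaduser,password;

BEGIN LOADING SalesDB.Sales_Stage
      ERRORFILES SalesDB.Sales_err1, SalesDB.Sales_err2;

SET RECORD VARTEXT ",";   /* comma-delimited input file */

DEFINE ItemNbr   (VARCHAR(10)),
       SaleDate  (VARCHAR(10)),
       ItemCount (VARCHAR(10))
FILE = sales.csv;

INSERT INTO SalesDB.Sales_Stage
VALUES (:ItemNbr, :SaleDate, :ItemCount);

END LOADING;
LOGOFF;
```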