TD Notes

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 24

User vs Database:

Database doesnt have a password, but user has.


CREATE DATABASE TDUSER AS PERM=100000;
CREATE USER SAGAR AS PERM=1000000,PASSWORD=Trustme;
HELP command database or user interchangeable
HELP DATABASE SAGAR
HELP USER SAGAR;
Even after table creation, newly created table doesnt reflect under database in
explored tree, it will not reflect in explorer tree until session is disconnected and
connected again.

Explain explained :P
Before collecting stats explain plan:

Explain Select ID, NAME, REFERRED_BY


From SAMPLES.CUSTOMERS

1) First, we lock a distinct SAMPLES."pseudo table" for read on a


RowHash to prevent global deadlock for SAMPLES.CUSTOMERS.
2) Next, we lock SAMPLES.CUSTOMERS for read.
3) We do an all-AMPs RETRIEVE step from SAMPLES.CUSTOMERS by way of
an all-rows scan with no residual conditions into Spool 1
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with low confidence to be 10 rows (420 bytes).
The estimated time for this step is 0.03 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of

statement 1. The total estimated time is 0.03 seconds.


COLLECT STATS ON SAMPLES.CUSTOMERS COLUMN(ID)
Observation: Collected stats on Name column first, but still showing 10 estimated
rows
Collected stats only on ID(Primary index) this time. Estimated rows are 5 which is
same in the table :D
Explain Select ID, NAME, REFERRED_BY
From SAMPLES.CUSTOMERS
After collecting stats explain plan:
1) First, we lock a distinct SAMPLES."pseudo table" for read on a
RowHash to prevent global deadlock for SAMPLES.CUSTOMERS.
2) Next, we lock SAMPLES.CUSTOMERS for read.
3) We do an all-AMPs RETRIEVE step from SAMPLES.CUSTOMERS by way of
an all-rows scan with no residual conditions into Spool 1
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with high confidence to be 5 rows (210 bytes).
The estimated time for this step is 0.03 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.03 seconds.

HELP SQL HELP gives list of HELP objects.


Default Database:
Usually default database will be their username. This can be overwritten as below
Usual : CREATE USER SAGAR AS PERM=1000000,PASSWORD=Trustme;
CREATE USER SAGAR AS PERM=1000000,PASSWORD=Trustme,DEFAULT
DATABASE=TDUSER;

SHOW:
HELP SQL SHOW gives list of objects that can be referenced by SHOW command.

SHOW command gives below information

Set or Mulitset
(Permanent) Journelling references
CHARACTER SET LATIN
NOT CASE SPECIFIC
Checksum

Built-in variables:
SEL SESSION,USER,DATABASE,ACCOUNT;

HELP SESSION gives more details about account, logon_date, logon_time etc
Tools Option Query (Go and View)
Allowing multiple queries
Tools Option Code Editor(Go and View)
Converting keywords to uppercase as you type.
Display number lines
Left Margin to change the background for line number. Use background to
change the background for SQL code
Tools Option Answers Tab(Go and View)
Tools Option - History

Record queries and metadata in the history database.


Show tooltip when mouse hover over SQL column

We can rearrange column in history window by dragging and moving.


Filter Option in History: Useful
Right click on any of the SQL query in history window.
Summary:

SQL Assistant is an ODBC-based tool used to access any ODBC complaint


database.
SQL Assistant provides GUI access to the Teradata Dictionary
SQL command like HELP and SHOW can also be used to provide dictionary
information.
History can be used to fetch previously executed request to the query
window

One must always be mindful about the default database setting.

SQL: Structured Query Language

A complete data access to Maintenance language


Non-procedural language
Designed for RDBMS.
An industry standard for relational databases.

Three SQL classifications


DDL- Data definition language Create, Alter or Drop It refers to SQL that affect
objects. Such SQL causes write lock to the TD dictionary,While changes are being
made it applies write lock. So usually these are applied during off hours

DML- Data Manipulation language It refers to SQL that manipulates data by either
inserting, Selecting,updating or deleting rows of the table(hence columns)
DCL- Data control language It is mainly used to control privileges (DBA)
GRANT, REVOKE, GIVE(Transfer database ownership)
Column names : ID
Name
Exposure_cnt
SELECT ID, Name Exposure_cnt (Find what is wrong in this query)
ID, Exposure_cnt
1, Sagar
2, Unknow
Omitting the comma from the query by mistake led to Name being renamed as
Exposure_cnt

Equal
Greater than or equal
to
Less than or equal to
Greater than
Less than
Not equal to

ANSI Standard
=

Teradata
Extension
EQ

>=
<=
>
<
<>

GE
LE
GT
LT
NE

DISTINCT:
Distinct automatically orders the result set. This is the way it removes duplicates
WITHOUT DISTINCT OPTION :
SELECT ID,NAME FROM EMP ORDER BY 1,2

1,SAGAR
1,SAGAR
2,JANU

WITH DISTINCT OPTION:


SELECT ID,NAME FROM EMP;

1,SAGAR
2,JANU

It had performed order by 1,2 automatically and then grouped them.


Bulit-in functions:
SESSION,USER,ACCOUNT,DATABASE
DATE,TIME,CURRENT_DATE,CURRENT_TIME(0),CURRENT_TIMESTAMP(6)
The DATE and TIME are keywords are Teradata extensions to the ANSI standard.
The values are considered are numerice as they are right justified.

NULL:
In Binary we have true or false. In relational DB we have true, false or unknown. Null
is an unknown and it is not comparable or equtable
Let us see one example.

SELECT LAST_NM,FRST_NM FROM DEPT


WHERE DEPT_NO<>401
FIRST_NM
VIDYA
SAGAR
BARACK
SACHIN
MS
VIRAT

LAST_NM
SADHANA
OBAMA
TENDULKAR
DHONI
KOHLI

DEPT_
NO
401
402
403
?
?

not equal to 401


FALSE
TRUE
TRUE
FALSE
FALSE

IN OPERATOR AND NULL:


SELECT LAST_NM,FIRST_NM FROM EMPLOYEE WHERE FIRST_NM IN (VIDYA
SAGAR,SACHIN)

FIRST_NM
VIDYA
SAGAR
BARACK
SACHIN
?
VIRAT

LAST_NM
SADHANA
OBAMA
TENDULKAR
DHONI
KOHLI

DEPT_N
O
401
402
403
405
?

FIRST_NM IN
(VIDYA
SAGAR,SACHIN
)

TRUE
FALSE
TRUE
FALSE
FALSE

SELECT LAST_NM,FIRST_NM FROM EMPLOYEE WHERE First_nm not in (BARACK)


FIRST_NM
VIDYA
SAGAR
BARACK
SACHIN
?
VIRAT

LAST_NM
SADHANA
OBAMA
TENDULKAR
DHONI
KOHLI

DEPT_
NO
401
402
403
405
?

First_nm not in
(BARACK)
TRUE
FALSE
TRUE
FALSE
TRUE

calculate the size of a table in teradata


My question is as calculating the weight that can occupy a table in the database knowing that will add
about 120 million records in a table, the insert will be done through the Multiload. What I want is to
know how much space you use database before inserting the information.
1
2
3
4
5
6
7
8
9

select
databasename,
tablename,
sum (currentperm)/1024**3 as current_GB
from dbc.allspace
where tablename <> 'All'
and databasename = 'yourDB'
group by 1,2
order by 1,2

Recursive Query:
Recursive query consist of two parts a seeding query and a recursive query. Seeding query provides the initial
records for which data needs to be iterated whereas Recursive query part helps in data iteration.
P Function with Example
Teradata Tech Associates

Nov

04
Recursive Query in Teradata

TD Tips

by Saurabh Gupta

Guest Post

Author Saurabh Gupta

In this post we will go through recursive query with one example.


The recursive query features enables to query hierarchical data such as an organisation chart or bill of materials
( check the raw materials used to make a product ). Using Recursive query data with in the single table can be
scanned multiple times using a single SQL statement.
A recursive query is useful when relationships between data elements exist in multiple iterations. Recursive query
consist of two parts a seeding query and a recursive query. Seeding query provides the initial records for which
data needs to be iterated whereas Recursive query part helps in data iteration.
We will be using below syntax for recursive query :
Recursive query syntax :

WITH RECURSIVE < recursive_tablename >


(<columnname 1>,<columnname 2>,.<columnname n>)
AS
(
<SELECT * from seed_table>
UNION ALL
<select * from recursive_tablename join seed_table>
)
<SELECT * FROM recursive_tablename>;

1) Recursive_tablename is a derived recursive table defined using WITH RECURSIVE keyword.


2) Column names of recursive table is specified as <columname >. This column names will be finally displayed in the
output.
3) Seeding query ( query which is used as the input for recursive logic ) is mentioned as the first select statement.
4) Second select statement is known as the recursive query and is joined with seed table for multiple iterations.
5) Third select statement finally gives the output of recursive logic.
Lets consider a table, Employee ( ddl shown below ) which stores

CREATE VOLATILE TABLE Employee


(

EMP_ID INTEGER,
MGR_ID INTEGER,
EMP_NAME VARCHAR(100)
)
ON COMMIT PRESERVE ROWS;

Employee Id in column EMP_ID , Employees manager Id in column MGR_ID and Employee Name in column
EMP_NAME e.g. Zoya is an employee with Employee Id 28006 and her direct Manager Id is 28005 .
One day Zoya wants to find out her first level, second level managers in the hierarchy i.e. all the direct and indirect
manager above her in the hierarchy. Since employee table stores only first level of hierarchy (i.e. the direct manager
which is employee id 28005 ( Patrick ) in case of Zoya), to find the hierarchy she needs to query database multiple
times .This can be achieved very easily using a recursive query within a single sql statement.

WITH RECURSIVE MGR_JRS


(EMP_ID,
MGR_ID,
MGR_NAME,
DEPTH) AS
(
SELECT EMPL.EMP_ID, EMPL.MGR_ID, EMPL1.EMP_NAME AS MGR_NAME, 1 AS DEPTH
FROM Employee empl
inner join Employee empl1
on empl.mgr_id = empl1.emp_id
WHERE empl.EMP_ID=28006
UNION ALL
SELECT MGR_JRS.EMP_ID,
EMPL.MGR_ID,
empl1.EMP_NAME,
MGR_JRS.DEPTH + 1
FROM MGR_JRS INNER JOIN Employee empl
ON MGR_JRS.MGR_ID = EMPL.EMP_ID
inner join Employee empl1
on empl.mgr_id = empl1.emp_id
)
SELECT * FROM MGR_JRS;

Below points explains the above sql :


1) WITH RECURSIVE keyword specifies table MGR_JRS is a derived recursive table.
2) The first select statement :

SELECT EMPL.EMP_ID, EMPL.MGR_ID, EMPL1.EMP_NAME AS MGR_NAME, 1AS DEPTH


FROM Employee empl
inner join Employee empl1
on empl.mgr_id = empl1.emp_id
WHERE empl.EMP_ID=28006

gives the Manager name and manager employee Id for Employee ID 28006 ( for Zoya ). So the first level manager for
Zoya identified by above sql is Patrick. Data retrieved using above sql is internally stored in derived table MGR_JRS.
This is often referred as seeding query.
3) Second select joins Employee table to MGR_JRS table which is getting populated as part of recursive logic. This is
often referred as recursive query.

SELECT MGR_JRS.EMP_ID,
EMPL.MGR_ID,
empl1.EMP_NAME,
MGR_JRS.DEPTH + 1
FROM MGR_JRS INNER JOIN Employee empl
ON MGR_JRS.MGR_ID = EMPL.EMP_ID
inner join Employee empl1
on empl.mgr_id = empl1.emp_id
Since MGR_JRS has one record with MGR_ID 28005 , it joins with employee table where EMP_ID is 28003 and in
turn saves one more row in MGR_JRS.

Now Zoya knows her second level manager is Lorena.


4) Each time a new row is added to MGR_JRS, the possibility exists for a new match and thus a higher level
manager will be found in hierarchy.
5) It will recursively execute this join operation until it has found all managers for Zoya.
6) When the WITH table population is completed i.e. Step 5 is completed, the final SELECT will execute producing
the below output.

In this case Zoya has four levels of manager above her in the organization chart. Thus, a complex requirement of
multiple iteration is very easily solved using recursive query.
Limitations :
Following are not allowed in recursive portion of query but allowed in seeding portion :

Subqueries and derived tables

Aggregation and OLAP functions

NOT IN and NOT EXISTS

GROUP BY and extended grouping functions

DISTINCT

TOP N function ,Cursors ,Stored procedures,Triggers are not permitted in recursive query
Summary :
The recursive query feature enables multiple iterations against hierarchical data in a single SQL statement.
ng Stats on Table Level or Column Level ?
Practice Questions 2
Nov

20
Primary Index in Teradata

TD Basics

by admin

Each table in Teradata is required to have a primary index.Even if you did not define any primary index in CREATE
table statement, the Teradata system will automatically create the primary index based on the setting of DBScontrol
setting field . The primary index defines where data will reside and which AMP receives the row.
The three most important roles the primary index does is the following

Data Distribution

Fastest way to retrieve Data

Incredibly important for Joins

In short primary index provides the fastest physical path to retrieving data.
Two types of Primary Indexes are given below

Unique Primary Index (UPI)


A unique primary index means that the value for the selected column must be unique. In the example below
the EMP_ID is the Unique Primary Index.

A unique primary index (UPI) will always spread the rows of the table evenly amongst the AMPs.

Non Unique Primary Index (NUPI)


A Non unique primary index means that the value for the selected column can be non unique. In the example below
the LAST_NAME is the Non Unique Primary Index.

A Non Unique Primary Index (NUPI) will almost never spread the rows of the table evenly amongst the AMPs.
Key Point Dont assume that UPI is always the BEST CHOICE; sometimes the use of NUPI is very critical. If you
need to perform lot of query with LAST_NAME in the WHERE clause then NUPI on last name is essential.
Primary Index and the ROW HASH

The primary index is the key to determine where the ROW of the table will reside on which AMP. When a new row
arrive for insert in Teradata the following steps occur
1.

Teradata Parsing Engine (PE) examines the primary index of the row.

2.

Teradata takes the primary index of the rows and run it through HASHING ALGORITHM.

3.

The output of the Hashing Algorithm is the 32 bit Row Hash value.

e.g
New row coming with following attributes

Hash the PI value -> 55 / HASHING ALGORITHM = 11110000111100001111000011110000(Row Hash)


This value in Binary format is called as the Row Hash Value. And it will always be unique for the given PI. Means
its not possible to get two Row Hash Value for the same EMP_ID (55).
This 32 bit Row Hash value determines in which AMP the row will reside and it always attached along with the
ROW to make it a UNIQUE identification for that ROW.
Now in Teradata we have HASH MAP which contains the different bucket called as Hash Map Buckets. These
buckets contain only the different AMP number which is attached with the Teradata system. Suppose that Teradata
system contains 4 AMP then this Hash Map contains numbers from 1 to 4 in different buckets. see image below

Four Amp Hash Map

The Row hash value which we determine above, is used to point to certain bucket in the Hash Map. This value
points to only one bucket in the Hash Map and that bucket number will determine the AMP number where this new
row will reside.
e.g.
11110000111100001111000011110000 tells that it correspond to first row and fourth column of Hash Map i.e. AMP no
=4
So finally we have the AMP number (4) where the new row will reside.
After doing all this calculation in PE , the PE instruct BYNET to send the row along with its unique Row Hash Value
in the AMP number 4.

The new row along with its Row hash Value is send to AMP number 4 for storing. In this way we achieve the data
distribution for each row of the given table.
Uniqueness value

When the AMP receives the row it will place the row into the proper table, and the AMP itself check that is there any
other row with the same Row- Hash Value. If this is the first Row with the particular Row- Hash value then it will
assign a uniqueness value of 1 along with the row. But if this is the second row with the same Row-Hash value for
the same table then it will assign auniqueness value of 2. Similarly it assign a value of 3 if it finds third occurrence of
the same Row Hash, in this fashion it keeps on adding uniqueness value for the duplicates rows.
If you dont get this concept at the first go dont worry, it is explained again with the suitable examples for more clarity.
An example of UPI table, where EMP_ID is the UPI.

Note as the table has UPI defined on EMP_ID it will always has the unique row in the given table. So each time
AMP will assign the uniqueness value of 1 against each row.
An example of NUPI table, where LAST_NAME is the NUPI.

Note as the table has NUPI defined on LAST_NAME, we have three rows for the same LAST_NAME = Kumar.
Hence the AMP will assign different uniqueness value from 1 t 3 for each row.
Together with Row Hash and the uniqueness value the Teradata make as 64 bit ROW ID to uniquely identify each
row in the given AMP.

This is the complete reference for Primary Index in Teradata. I tried to make it as simple as possible for the newbies,

in case of any correction or clarification feel free to reach me


To read about Secondary index in teradata please go to Secondary Index in Teradata

Say Yes to No Primary Index (No PI)


Tables
Article by smazingo on 09 Apr 2009 13 commentsTags:

primary index

create table

data load

The purpose of the new Teradata 13.0 feature that allows you to create tables with no primary index is
improved performance of FastLoad and Teradata Parallel Data Pump Array INSERT data loading
operations.
Because there is no primary index for the rows of a NoPI table, its rows are not hashed to an AMP
based on their primary index value. Instead, Teradata Database either hashes on the Query ID for a
row, or it uses a different algorithm to assign the row to its home AMP once it reaches the AMP onto
which it has been FastLoaded. The Teradata Database then generates a RowID for each row in a NoPI
table by randomly selecting an arbitrary hash bucket that an AMP owns and using it to generate a
RowID.
The result is faster and more efficient data loading.

Uses for NoPI Tables


NoPI tables are particularly useful as staging tables for bulk data loads. When a table has no primary
index, its rows can be dispatched to any given AMP arbitrarily, so the system can load data into a
staging table faster and more efficiently using FastLoad or Teradata Parallel Data Pump Array INSERT
operations.
By storing bulk loaded rows on any arbitrary AMP, the performance impact for both CPU and I/O is
reduced significantly. After having been received by Teradata Database, all of the rows can be
appended to a NoPI table without being checked for duplicates or needing to be redistributed to their
hash-owning AMPs.
Because there is no requirement for such tables to maintain their rows in any particular order, the
system need not sort them. The performance advantage realized from NoPI tables is achieved

optimally for applications that load data into a staging table, which must first undergo a conversion to
some other form, and then be redistributed before they are stored in a secondary staging table.
Using a NoPI table as a staging table for such applications avoids the row redistribution and sorting
required for primary-indexed staging tables. Another advantage of NoPI tables is that you can quickly
load data into them and be finished with the acquisition phase of the utility operation, which frees
client resources for other applications.
NoPI tables are also useful as so-called sandbox tables when an appropriate primary index has not yet
been defined for the primary-indexed table they will eventually populate. This use of a NoPI table
enables you to experiment with several different primary index possibilities before deciding on the
most optimal choice for your particular application workloads.

Create Table with NoPI Syntax and Default Settings


The syntax for the CREATE TABLE statement has been changed to permit user data tables to be
created without a primary index. Such tables are referred to as NoPI (No Primary Index) Tables. A
NoPI table can be created in several different ways:
When the newly supported NO PRIMARY INDEX clause is explicitly specified in the CREATE TABLE
statement as shown in the example below, table Sales_Temp1 will be created as a NoPI table.
CREATE MULTISET TABLE Sales_Temp1, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
ItemNbr INTEGER NOT NULL,
SaleDate DATE FORMAT 'MM/DD/YYYY' NOT NULL,
ItemCount INTEGER)
NO PRIMARY INDEX;
When the newly supported NO PRIMARY INDEX clause is explicitly specified in the CREATE TABLE AS
statement as shown in the example below, table Sales_Temp2 will be created as a NoPI table. In this
example, table Sales acting as the source table can be a PI table or a NoPI table.
CREATE MULTISET TABLE Sales_Temp2 AS (SELECT * FROM Sales)
WITH DATA NO PRIMARY INDEX;
When none of PRIMARY INDEX, NO PRIMARY INDEX, PRIMARY KEY and UNIQUE Constraint is
specified in the CREATE TABLE statement as shown in the example below, table Sales_Temp3 will be
created based on the setting of the new DBSControl General Field, PrimaryIndexDefault. If
PrimaryIndexDefault is set to N, Sales_Temp3 will be created as a NoPI table.
CREATE MULTISET TABLE Sales_Temp3, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,

CHECKSUM = DEFAULT
(
ItemNbr INTEGER NOT NULL,
SaleDate DATE FORMAT 'MM/DD/YYYY' NOT NULL,
ItemCount INTEGER);

When neither PRIMARY INDEX nor NO PRIMARY INDEX is specified in the CREATE TABLE AS
statement as shown in the example below, table Sales_Temp4 will be created based on the setting of
PrimaryIndexDefault. If PrimaryIndexDefault is set to N, Sales_Temp4 will be created as a NoPI table.
In this example, table Sales acting as the source table can be a PI table or a NoPI table.
CREATE MULTISET TABLE Sales_Temp4 AS (SELECT * FROM Sales) WITH DATA;
The new DBSControl General Field, PrimaryIndexDefault, was introduced to control whether a table is
created with or without a primary index when none of PRIMARY INDEX, NO PRIMARY INDEX, PRIMARY
KEY and UNIQUE Constraint is specified. The setting of PrimaryIndexDefault is as follows:
D - This is the default setting. This setting works the same as the P setting.
P - The first column in the table will be selected as the non-unique primary index. This setting works
the same as that in the past when PRIMARY INDEX was not specified.
N The table will be created without a primary index (NoPI table).
With the NoPI Table feature, the system default setting essentially remains the same as that in
previous Teradata releases where the first column was selected as the non-unique primary index when
the user did not specify a PRIMARY INDEX or a PRIMARY KEY or a UNIQUE Constraint. Users can
change the default setting for PrimaryIndexDefault to P or N and not rely on the system default
setting which can possibly be changed in a future release.
To change from a NoPI table to PI table, users can run one of the following:
CREATE TABLE AS - The example shown below will create table Sales_Temp_PI with primary index
ItemNbr and all of the data from NoPI table Sales_Temp1.
CREATE MULTISET TABLE Sales_Temp_PI AS (SELECT * FROM Sales_Temp1) WITH DATA PRIMARY
INDEX (ItemNbr);
CREATE TABLE and INSERT-SELECT - The example shown below will create table Sales_Temp_PI
with primary index ItemNbr and then copy over all of the data from NoPI table Sales_Temp1.
CREATE MULTISET TABLE Sales_Temp_PI, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
ItemNbr INTEGER NOT NULL,
SaleDate DATE FORMAT 'MM/DD/YYYY' NOT NULL,
ItemCount INTEGER)
PRIMARY INDEX (ItemNbr);
INSERT INTO Sales_Temp_PI SELECT * FROM Sales_Temp1;

Users can also change from a PI table to a NoPI table using one of the above approaches.

Rules and Limitations for NoPI Tables


The rules and limitations for NoPI tables are the same as those for primary-indexed tables with the
following exceptions:
You cannot create a NoPI queue table.
You cannot create a NoPI error table.
You cannot create a NoPI table as a SET table. The default table type for NoPI tables in both
Teradata and ANSI session modes is always MULTISET.
If neither PRIMARY INDEX (column_list) nor NO PRIMARY INDEX is specified explicitly in a CREATE
TABLE request, then whether the table is created with or without a primary index generally depends
on the setting of the DBS Control flag DisableNoPI
NoPI tables cannot specify a permanent journal.
NoPI tables cannot specify an identity column.
Hash indexes cannot be defined on NoPI tables because they inherit the primary index of their
underlying base table, and NoPI tables have no primary index.
SQL UPDATE triggers cannot update a NoPI table.
SQL UPDATE and UPDATE (Upsert Form) requests cannot update a NoPI target table. SQL UPDATE
and UPDATE (Upsert Form) requests can update a primary-indexed target table from a NoPI source
table.
SQL MERGE requests cannot update or insert into a NoPI target table.
SQL MERGE requests can update or insert into a primary-indexed target table from a NoPI source
table.
You cannot load rows into a NoPI table using the MultiLoad utility.
You can define all of the following commonly used table features for NoPI tables:

Fallback
Secondary indexes
Join indexes
UNIQUE column constraints
CHECK constraints
PRIMARY KEY and FOREIGN KEY constraints
Triggers
BLOB and CLOB columns.

You can define any of the following table types as NoPI tables:
Ordinary base data tables
Global temporary tables
Volatile tables
You cannot define any of the following table types as NoPI tables:

Queue tables
Error tables
Global temporary trace tables6

Manipulating NoPI Table Rows


After a NoPI staging table has been populated, you should run one of the following types of DML
request to move the NoPI staging table source rows to a primary-indexed target table:
INSERT SELECT
MERGE
UPDATE FROM
For example, you can populate data into a NoPI table via single-statement INSERT or multi-statement
INSERT. The latter can be done with TPump Array INSERT which is more efficient and also an
alternative to FastLoad for staging a table.
The example below shows a single-statement INSERT into NoPI table Sales_Temp1:
INSERT INTO Sales_Temp1 (100, '11/03/2008', 10);
INSERT-SELECT into a NoPI target table is also allowed. The source table can be a PI table or a NoPI
table.
INSERT INTO Sales_Temp1 SELECT * FROM Sales;

You can use the following DML statements to manipulate or retrieve NoPI table rows prior to
moving them to their primary-indexed target table:
DELETE
INSERT
SELECT
You cannot use UPDATE or UPDATE (Upsert Form) requests to change data in a NoPI table.

The following is a list of other operations and utilities and how they operate on a NoPI table:
Feature Impact

UPSERT
UPDATE
UPDATE
index in

- Not supported. This is a composite of UPDATE and INSERT operations such that if the
fails because the target row does not exist, the INSERT is automatically executed. The
part of an UPSERT is required to fully specify the primary index. Since there is no primary
a NoPI table, UPSERT is not supported.

MERGE-INTO - Not supported when the target table is NoPI. This is currently composed of three

variant forms UPSERT, UPDATE-only and INSERT-only. UPSERT is not supported for the same reason
discussed above. UPDATE is restricted in the initial release and therefore the UPDATE-only form of
MERGE-INTO is not supported. INSERT-only is also not supported but it is very much like INSERTSELECT which is supported.

MultiLoad - Not supported. FastLoad and TPump are utilities that are normally used to stage a table.
Restore/Copy Supported. The Restore/Copy processing on a NoPI table is very similar to that on a PI
table. The main difference is that a NoPI table normally has one hash bucket per AMP. This is like a
very non-unique NUPI table. So when the data from a NoPI table is restored or copied to a different
configuration with more AMPs, there may be some AMPs that will not have any data at all. So,
Restore/Copy can cause data skewing for a NoPI table.

FastExport - Supported. Data in a NoPI table can be FastExported the same way as for a PI table.

CheckTable - Supported. The LEVEL HASH check which is done with either an explicit LEVEL HASH
command or implicitly in a LEVEL THREE command works differently on a PI table and a NoPI table.
On a PI table, this check regenerates the rowhash for each data row based on the primary index and
then compares with the rows on disk. On a NoPI table, this check looks at the rowhash for each data
row and verifies that the hash bucket that is part of the rowhash correctly belongs to the AMP.

Table Rebuild - Supported. Table Rebuild processing on NoPI table is the same as that on a PI table.
The table has to have fallback protection for Table Rebuild to run. Rows in a NoPI table have rowhash
and therefore can be rebuilt the same way as those on a PI table.

Reconfig - Supported. The Reconfig processing on a NoPI table is very similar to that on a PI table.
The main difference is that a NoPI table normally has one hash bucket per AMP. This is like a very
skewed non-unique NUPI table. So when a system is reconfigured to have more AMPs, there may be
some AMPs that will not have any data at all for a NoPI table. So, Reconfig can cause data skewing for
NoPI table.

Home Posts ETL No Primary Index Tables on Teradata NoPi

No Primary Index Tables on Teradata NoPi


OCTOBER 18, 2014 BY ROLAND WENZLOFSKY6 COMMENTS
Last updated:Sunday, September 27, 2015
Our Reader Score
[Total: 17

Average: 4.7/5]

Share the joy

11

15
Shares

What are No Primary Index tables ?

Starting with Teradata Release 13, tables can be defined without having a Primary Index.
Nevertheless, the primary index still is the main idea to achieve an even data distribution on a
Teradata system. By design, the primary index ensures linear scalability.
Hence, the question is: What is the meaning of tables without a primary index, how are they
implemented, and how are they fiting into the hashing design of Teradata?
Initally, some words regarding how data is distributed in case of a No Primary Index (NOPI)
table.
Basically, rows are distributed randomly across the AMPs.
As no hashing takes place, but rows have to be uniquely identifiable, the ROWID is generated
differently than the ROWID of a regular primary index table:
Teradata uses the HASHBUCKET of the responsible AMP and adds a uniqueness value. The
bytes normally occupied by the hash value are being used to increase the range for generating
uniqueness values.
This is how No Primary Index Tables are created:
CREATE TABLE <TABLE> (
PK INTEGER NOT NULL
) NO PRIMARY INDEX ;

Be aware, if neither PRIMARY INDEX nor NO PRIMARY INDEX is defined, depending from
the DBS control field PrimaryIndexDefault, the result may be a table without primary index (NO
PRIMARY INDEX, PrimaryIndexDefault=P,D) or the first not nullable column of the table
definition will be used as the PRIMARY INDEX (in case all columns are nullable, the first
nullable column of the CREATE TABLE statement will be used as the PRIMARY INDEX
(PrimaryIndexDefault=N).

How No Primary Index Tables work

As No Primary Index tables are distributed randomly across the AMPs, bulk loading of NOPI
tables (Fastloads, TPUMP array insert loads) is faster than loading tables with a Primary Index.
Let me describe the positive performance impact on the technical phases of a FastLoad:
1. Incoming rows are distributed in a round-robin fashion randomly across all AMPs
2. The rows are hashed by the primary index value and forwarded to the responsible AMPs
3. The responsible AMPs sort the received rows by ROWID
Let us consider a No Primary Index table: After distributing the rows randomly across the AMPs
we are already finished. No hashing and redistribution is needed. No sorting is needed.
Furthermore, as the rows are assigned randomly to the AMPs, our data will always be distributed
evenly across all AMPs and no skewing will occur.
This speeds up the load process, as only the acquisition phase of the loading utilities is executed.
Another advantage of NOPI tables is, that the rows are always appended to the end of the tables
data blocks, thus any overhead usually caused by sorting the rows by rowhash into the data
blocks is avoided. For example, in case you INSERTSELECT huge amounts of rows into
your NOPI table, above described effect will reduce the number of IOs significantly.
No Primary Index Tables and Skewing

NOPI tables will not become skewed, with a few exceptions:

Bulk loaded small tables will always be skewed. Round robin is done on block
level and not the row level. Some AMPs will receive data blocks and the rest
will receive no data.

If you execute an INSERTSELECT statement, from a primary index table into


a NOPI table, AMP-local copying of the rows will be applied. This means,
all rows stay on their current AMP, and if the primary index table is skewed
you, will end up with a skewed NOPI table.

Skewing can happen, by executing an archive job on a source system, and


a restore of the backup on a different target system.
The Usage of No Primary Index Tables

NOPI tables are useful in certain situations, but without a Primary Index, row access is limited to
All-AMP full tables scans. In order to offset this impact, you could add an unique secondary
index (USI) to your NOPI table.
There are common features used by NOPI tables and the Teradata Columnar implementation,
introduced with Teradata 14: Column partitioned tables are always NOPI tables, and although
offering great performance improvements for certain workload types, they lack as well the fast
primary index access.
To some extend, the mentioned disadvantage of no primary index tables can be compensated
with join indexes or secondary indexes.
No Primary Index tables should not be used as production tables, mainly consider using them
during the ETL-Process, in situations when anyway a full table scan is required.
NOPI tables come with some further restictions, find below the most important ones:
Only MULTISET tables can be created
No identity columns can be used
NoPi tables cannot be partitioned with a PPI
No statements with an update character allowed (UPDATE,MERGE INTO,UPSERT), still you
can use INSERT,DELETE and SELECT
No Permanent Journal possible
Cannot be defined as Queue Tables
No Queue Tables allowed
Update Triggers cannot update a NOPI table (probably introduces with a later release)
No hash indexes are allowed (use join indexes instead)
The following features can be used as usual together with NOPI tables:
Fallback protection of the table
Secondary Indexes (USI, NUSI)
Join Indexes
CHECK and UNIQUE constraints
Triggers
Collection of statistics
The information about NOPI tables, as in the case of any table type, is stored in DBC.Tables.
The tablekind is O:

SELECT * FROM DBC.TABLES WHERE TABLEKIND = O;


No Primary Index Tables on Teradata NoPi written by Roland Wenzlofsky average
rating 4.7/5 - 17 user ratings

ndary Index in Teradata


Teradata Utilities FASTLOAD part 2
Oct

06
Teradata Utilities FASTLOAD part 1

TD Utilities

by admin

Introduction
Fastload is the Teradata utility which can be used to load large amount of data in empty table on a Teradata System.
The property which makes Fastload UNIQUE is its fast speed to load huge amount of data in tables.

You can load data from

Disk or Tape drives from the channel attached client system.

File data on a network attached workstation.

Any other device which can provide the properly formatted source data

Fastload uses multiple sessions and block level operation to load data into table that is why its much faster than any
other loading utilities.
TIP However, one Fastload job loads data into one table only. If you want to load data into more than one
table , you must submit multiple Fastload jobsone for each table.
Three requirements for Fastload to execute

LOG TABLE This table is used to record all the progress during execution of Fastload. This table exists in
the SYSADMIN database under the name of FASTLOG. The structure of the FASTLOG table is as follows

EMPTY TABLE Fastload needs the table to be empty before inserting rows into that table. It does not care how this
is implemented but it requires empty tables without any prior records. If the table is not empty then Fastload will
pause and show the error.
TWO ERROR TABLES Fastload require two error tables to catch any error which occurs during the execution of
the Fastload. It create the error tables itself. Each error tables record error of specific type.
The first error table record any translation or constraint violation error. For e.g. if a row has data which does not
match with the data type of its particular column then this row will be captured by error table 1. Insertion of CHAR
data in INTEGER data type column.
The second error table captured any error which is related to duplication of values for Unique Primary Indexes
(UPI). Fastload will capture only one occurrence of the value and store the duplicate occurrence in the second error
table. However if the entire row is duplicated then Fastload count it but does not store the row.
These tables are analyzed later for error handling.
Data Conversion Capabilities

Fastload also allow the data conversion on the column value before inserting it into target table. Following are the
conversion allowed by Fastload.

Numeric-to-numeric (for example integer-to-decimal)

Character-to-numeric

Character-to-date

Date-to-character

TIP Only one conversion are allowed per column by Fastload.

To know more about Fastload check the below mentioned links


ERRLIMIT
Specifies the maximum number of rejected ROWS allowed in error table 1 (Phase I).
LOGON/LOGOFF
or, QUIT

LOGON begins a session. LOGOFF ends a session. QUIT is the same as LOGOFF.

NOTIFY

NOTIFY command used to inform the job that follows that some event has occurred. It is
often used for detailed reporting on the Fastload jobs success.

RECORD

Specifies the beginning record number (or with THRU, the ending record number) of the
Input data source, to be read by Fastload. Syntactically, This command is placed before the
INSERT keyword.

SET RECORD

Used only in the LAN environment, this command states in what format the data from the
Input file is coming: Fastload, Unformatted, Binary, Text, or Variable Text. The default is the
Teradata RDBMS standard, Fastload.

SESSIONS

This command specifies the number of FastLoad sessions to establish with Teradata. It is
written in the script just before the logon. The default is 1 session per available AMP.

TENACITY

Suppose Fastload is not able to obtain all the session required by it. In that case only two
options are left. Either terminates the Fastload or re try to obtain all the sessions.TENACITY
specifies the amount of time, in hours; Fastload will re try to obtain the sessions. The default
for Fastload is no tenacity, meaning that it will not retry at all.

SLEEP

Working in conjunction with TENACITY, the SLEEP command specifies the amount of time in
minutes to wait before retrying to obtain all sessions.For example, suppose if TENACITY
were set at 5 and SLEEP at 10, then Fastload would attempt to logon every 10 minutes for
up to 5 hours. If there were no success by that time, all efforts to logon would cease.

You might also like