0% found this document useful (0 votes)
21 views

Teradata Vantage SQL Basics 2

Uploaded by

Justin Wang
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views

Teradata Vantage SQL Basics 2

Uploaded by

Justin Wang
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 62

Guaranteeing Uniqueness

When joining a many-to-many relationship, unintended result rows can be projected.


Example: Here the join is based on matching column names, but not on the actual
PK/FK relationship.
Remember: For each row in the first table all matching rows in the second table are
returned, producing an unexpected result!

Notes
One of the most important differences between joins and subqueries is the need for
establishing a one-to-many relationship in a join, something that is automatically provided
when writing a subquery.
This example illustrates what happens when the join relationship is many-to-many. In such
a case, unintended result rows appear in the final result set. In the example, employee 1015
works only in department 501, but since the join is on the manager number, this result set
will show that person working in both department 501 and department 402. Certainly not
the real circumstance. Other employees (employees 1018, 1023, 1006, 1008) share the same
fate. It would be difficult, if not impossible, to view the result set and know who truly works
in which department.

NOT IN vs. Inner Join


● Remember that every IN-Subquery can be rewritten as an Inner Join based on
equality
● But NOT IN is a condition based on non-equality
Notes
There are multiple ways to rewrite NOT IN, the best choice to find rows in table A where
matching data doesn't exist in table B is a NOT EXISTS Correlated Subquery: (Covered in a
later module)

Cross Join
A CROSS JOIN is a join where no join condition is specified. Since no qualification exists, the
database establishes a pseudo-condition of “WHERE 1=1”, which is true for each and every
comparison. An example is illustrated below demonstrating Cross Join.

Notes
CROSS JOIN is a rarely used syntax. In our example, the one on the left is preferred
because it shows the reader that a cross join is intended, while the one on the right may or
may not be (perhaps the writer forgot the join condition).
Since no join condition exists, the database invents one for us, whether we are pleased
with it or not. Explain shows a condition of “1=1”, which always evaluates true. Thus, you
can read the row for employee “Short” as “Project the employee’s number, last name, and
department name for each row in the department table where 1=1 is true.” The result is to
project these column values (from the “Smith” row) for each department row. The same
thing happens all over again for each employee row.

4) We do an all-AMPs RETRIEVE step in TD_MAP1 from EMPLOYEE_SALES.d by


way of an all-rows scan with no residual conditions into Spool 2 (
all_amps), which is duplicated on all AMPs in TD_Map1. The size of
Spool 2 is estimated with high confidence to be 18 rows (270 bytes).
The estimated time for this step is 0.00 seconds.
5) We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by way
of an all-rows scan, which is joined to EMPLOYEE_SALES.e by way of an
all-rows scan with no residual conditions. Spool 2 and
EMPLOYEE_SALES.e are joined using a product join, with a join
condition of ("(1=1)"). The result goes into Spool 1 (group_amps),
which is built locally on the AMPs. The size of Spool 1 is estimated
with high confidence to be 234 rows (15,678 bytes). The estimated
time for this step is 0.01 seconds.
Cartesian Products
A completely unconstrained cross join is called a Cartesian product. It results when a CROSS JOIN is

issued without a WHERE clause. In this case, each row of one table is joined to each row of another

table. The output of a Cartesian product is often not meaningful however they do have useful

application. Any query requiring all elements of one table to be compared to all elements of another is

a candidate for a Cartesian product using the cross join syntax.


Notes
The number of rows in the result set is the number of rows in the left table times the
number of rows in the right table, Explain shows a “Product Join”. An 8,000,000-row table
and a 50,000 row table would yield a 400,000,000,000 row answer set.
Things to Notice:
Cartesian products often result accidentally from an inner join with improper aliasing or
missing join conditions. Explaining the Select can easily reveal this because the optimizer
will do this product join very likely as the final step with a very high estimated number of
rows and runtime.
Real-World Uses for Cartesian Product Joins
One important use is to generate a very large answer set. This can allow you to benchmark
system performance with large data throughputs. Cross joins are also used to create all
possible combinations of values in two tables/columns for further processing, e.g.,
month/product, even if a product was not sold in a specific month
Another use is for an airline to compare all possible combinations of cities in order to
evaluate: flight plan information, passenger rate structures, and mileage awards. For this
you would use a cross join.
Mistakes on Table Aliasing
Be careful! Do not alias a table and then use the table name instead of the alias.
A few examples are illustrated below. In the examples, only the first one will fail due to a syntax error.

All others will cause a three-table join: An Inner Join between dept and emp followed by a Cross Join

to employee!

So, Why a Cross Join?


In order to understand this, you need the execute the following query:

Notes

A table-alias is not really an alias, it replaces the tablename within that query. When using

aliases in writing joins, one must be careful to always use alias names when referencing,

and not use aliased table names. In our examples, the table Employee has been aliased as

“emp” in the FROM, but the join condition references Employee as a table name and does

not reference the alias. In the first example SQL-92 join syntax requires a join condition

from the joined tables and thus the query fails with "3782: Improper column reference in

the search condition of a joined table". But the SQL-89 syntax is interpreted as having three

tables being joined, namely: “Emp,” “Dept,” and “Department.”


To understand what is happening one must realize that the FROM clause is not required in

a Teradata SQL request (but in Standard SQL). Teradata was implemented before there was

Standard SQL, the initial query language was called TEQUEL (TEradata QUEry Language),

whose syntax didn't require to list tables within FROM.

RETRIEVE employee.last_name contains enough information for the Parser to resolve table

& column name and returns

last_name

----------------------------------------

Hopkins

Ratzlaff

Rogers

Rogers

Kanieski

Crane

Stein

Johnson

Short

Brown

...

Planning n-way Joins


If more than two relations are specified in a join, the Optimizer reduces that join to a series of binary

joins and then attempts to determine the most cost effective join order. Because the Optimizer uses

column statistics to choose the least costly join plan from the candidate set it generates, the plans it

generates are extremely dependent on the accuracy of those numbers.

Column projection and row selection are done prior to doing the join.
n-way joins are reduced to a series of binary joins.

Query optimizers use trees to build and analyze optimal join orders, most common are:

● Left-deep Search Tree:

○ Number of join orders = n!

● Bushy Search Tree:

○ Number of join orders = (2n-2)!/(n-1)

Notes

Join Order Search Trees

Query optimizers use trees to build and analyze optimal join orders. The join

search tree types used most frequently by relational database optimizers are the

left-deep tree and the bushy tree.

When a left-deep search tree is used to analyze possible join orders, the number

of possibilities produced is relatively small.

Bushy trees are an optimal method for generating more join order combinations.

At the same time, the number of combinations generated can be prohibitive, so


the Optimizer uses several heuristics to prune the search space. This method

produces an order of magnitude more join possibilities that can be evaluated

than the left-deep tree method. Bushy trees also provide the capability of

performing some joins in parallel.

The Optimizer uses various combinations of join plan search trees, sometimes

mixing left-deep, bushy, and even right-deep branches within the same tree.

Possible Join Orders as a Function of the Number of Relations To Be Joined

The possibilities for ordering binary joins escalate rapidly as the total number of

relations joined increases. The Optimizer is very intelligent about looking for

plans and uses numerous field-proven heuristics to ensure that more costly

plans are eliminated from consideration early in the costing process in order to

optimize the join plan search space.

Summary
● Column values may be projected from any table of a join

● Subqueries and inner joins can both return inner result sets

● Inner joins have both an implicit form and an explicit form

● Inner joins typically involve one-to-many relationships based on equality

● A table may be joined to itself

● Inner joins cannot return outer (NOT IN) result sets as can subqueries

● Incorrect table and column references can cause incorrect result sets

💻 Labs: SQL Inner Joins


Try completing the labs in the AWS lab environment. When you get the desired results, compare your

code with the code under the Solution tabs.

LAB 1
Select all currently active (legacy_flag = 0) jobs with a job_code in the

33x.xxx range with assigned (= exist in hr_payroll) active HELP TABLE br_payroll;

(hire_end_date IS NULL) employees. HELP TABLE br_jobs;

Order the result set by job_code.

SOLUTION
br_jobs.job_code
is unique, but one row per matching employee is returned, resulting in duplicate rows DI

(A subquery is doing this automatically)


LAB 2

Select all active (hire_end_date IS NULL) employees with an


annual_salary less than 80,000 whose job_title contains
HELP TABLE br_payroll;
administrator.
HELP TABLE br_jobs;
Concatenate first_name and last_name and alias it to
fullname.
Order the result set by birthday within year.

SOLUTION
No DISTINCT needed as br_payroll.job_code is non-unique.
LAB 3

Write a query to return those active


employees (active = hire_end_date is
null) with a higher annual_salary than
the manager of their department.
HELP TABLE br_payroll;
Show name and salary for both HELP TABLE br_departments;
employee and manager and calculate
their salary difference.

Order the result set by descending


difference.
SOLUTION
br_payroll is used twice, similar to a self-join.

LAB 4
Select all districts with less than
60,000 inhabitants
(num_inhabitants) having
HELP TABLE fin_district;
accounts with a loan at status
HELP TABLE fin_account;
'D' (= running contract, client in
HELP TABLE fin_loan;
debt).

Order the result set by


district_name.

SOLUTION
Again, DISTINCT needed to remove duplicate rows.
Lesson 12 of 22
Outer Joins
At the end of this lesson, you will be able to:

Distinguish between INNER and OUTER joins


Distinguish between LEFT, RIGHT, and FULL outer join results
Determine the effect of Nulls in the source tables on result rows
Determine the effect of Nulls in the result rows
Use correct terminology when referring to outer join syntax
Identify various ways of returning the same outer join results
Outer join three or more tables together
Rewrite NOT IN using a LEFT JOIN

What is an Outer Join?


Refer to the example below to learn more.
Outer Join Syntax

The outer join is an extension of the inner join. The difference between outer joins and
inner joins is the existence of a nonmatching row set in the outer join result.
Outer joins of two or more tables perform an inner join of those tables according to a
specified join condition and also return rows from the left join table, the right join table, or
both, that do not match the inner join condition, extending the results rows with nulls in the
nonmatching fields.
The non-matching rows are for those rows in the outer table that do not have a match in the
inner table, while the matching rows in the outer join result are the rows that satisfy the join
condition, which is exactly the same as those in the inner join result.
Inner Table and Outer Table
Inner Table
The inner table is the table that contributes only matched rows to the outer join result. For
an inner join, both tables are inner tables.
Making It Real
The outer table is the table that contributes unmatched rows to the outer join result. In this
description, the word "unmatched" refers to the rows in the left or right (or both) table that
are not part of the inner join rows because there are no matching columns, so the rows are
extended with NULLs in the results table.
Example

Below are the two syntax examples, showing Explicit Join syntax only and the way of using
the "Outer" keyword for Outer Join respectively.
EXPLICIT JOIN SYNTAX ONLY
There's no implicit syntax for outer joins.

"OUTER" KEYWORD
Just as the INNER keyword for Inner Joins, the OUTER keyword is optional.

Any additional restriction in the ON clause is applied as a residual condition to the join. Any
restriction in the WHERE clause is applied to the table that is the final result of the outer join.
Left Outer Join
Follow the example below to learn about the Left Outer Join. The table to the LEFT of the JOIN
keyword is the OUTER table.

Using the query shown here, you can observe how Left Outer Join works, in the animation as
below.
Notes
There are several situations that can add complexity to understanding outer join results. Here we
see how introducing a Null value into the inner table can potentially provide a certain level of
confusion.
In the result set a Null is returned for an inner table column for an inner result and does not
indicate an outer result! If the department name column was defined as NOT NULL, then a Null
value for the department name could not exist, and so a Null for this column in the result set can
now be interpreted as an outer result row. Always check the optimizer plan if it actually shows an
Outer Join:

Right Outer Join


Follow the example below to learn about the Right Outer Join. The table to the RIGHT of the JOIN
keyword is the OUTER table.

Using the query shown here, you can observe how Right Outer Join works, in the animation as
below.
Notes
Be aware that even with an explicit RIGHT join Explain might show a LEFT join simply switching
the tables:
Full Outer Join
Follow the example below to learn about the Full Outer Join. Tables to the LEFT and the RIGHT of
the JOIN keyword are both OUTER tables.

Using the query shown here, you can observe how Full Outer Join works, in the animation as
below.

Notes
Join Types and Join Order

Outer Joins might return different results based on the join types and the order of joins:
The logical join order is determined by the sequence of ONs
Each JOIN creates an intermediate result which is subsequently JOINed to the next table
Two tables A and B can be joined in 4 variations:
Each additional join quadruples the number of possible variations, but not all of them
actually provide logically different results.

The optimizer is quite smart simplifying an Outer Join. If it doesn't change the result set:
Full → Left/Right
Left/Right → Inner
Always check if Explain shows the expected join type.

Notes
Temporary Derived Tables
When joining three or more tables, the join process logically creates a temporary derived table that
is defined only for the duration of the SELECT operation. Depending on various factors, the
Optimizer might rewrite a request in such a way that it generates a query plan that is very different
from the 1:1 correspondence between the coded query and its actual implementation that this
simplified description implies. Such a temporary table is also called a joined table. Vantage refers
to this kind of temporary table as a spool.
ON Clause Evaluation Order
The result of an inner join of two tables is not changed if rows from table a are joined to rows from
table b, or if rows from table b are joined to rows from table a. In terms of set algebra, inner joins
are both commutative and associative. This is true no matter how many tables are (inner) joined.
Because inner joins are both commutative and associative, the Optimizer can select the best join
order for inner joins arbitrarily and the end result is always identical. Outer joins, on the other
hand, are rarely either commutative or associative. The Optimizer cannot select an arbitrary best
join order for outer joins because neither commutativity nor associativity can be assumed, nor
does it have any way to know what specific result you intended to produce with the query you
presented to it.
To outer join three tables, you must specify their join order explicitly by placing the ON clause in
an appropriate position within the FROM clause to ensure that the join is evaluated correctly.
The Optimizer follows these rules to generate outer joins.
The first ON clause in the query (reading from left to right) is evaluated first
Any ON clause applies to its immediately preceding join operation
Adding a Third Table
Below is the representation of adding a third table.

Notes
Guidelines for Using Outer Joins

Outer joins, when used properly, provide additional information from a single query
that would otherwise require multiple queries and steps to achieve. However, the
proper use of outer joins requires training and experience because “common sense”
reasoning does not always apply to formulating an outer join query in a manner that is
not only syntactically correct but also returns an answer that is correct for the
business question.

To ensure the correct answers to your business questions using outer joins, refer to
these steps.
1. Understand the question you are trying to answer and know the
demographics of your tables.
Form a hypothesis of the answer set you to expect before you begin
2. Write the query, keeping in mind the proper placement of join and
search conditions

Condition Clause
Join ON
Search condition predicates for inner ON
table
Search condition predicates for outer WHERE
table
3. Always EXPLAIN the query before performing it Look for the words
outer join in the EXPLAIN text. If the EXPLAIN text does not include
outer join, the Optimizer did not produce an outer join plan
4. Perform the query and compare the result with your expectations

Coding Search Conditions for Outer Joins

An example of coding search conditions for outer joins is shown below. In this
example:

 Any restriction in the WHERE clause is applied to the table that is the final result
of the outer join
 Any additional restriction in the ON clause is applied as a residual condition to
the join

Recommendations:

 Search conditions for OUTER table are added to WHERE


 Search conditions for INNER table are added to ON

Notes
Coding ON Clauses With WHERE Clauses for Outer Joins

To limit the number of qualifying rows in the outer table (and therefore the answer
set), the search condition for the outer table must be in the WHERE clause. Note that
the Optimizer logically applies the WHERE clause condition only after a join has been
produced. The actual application of conditions always depends on how the Optimizer
chooses to implement the query.

A search condition in the ON clause of the inner table does not limit the number of
rows in the answer set. It defines the rows that are eligible to take part in the match
to the outer table.

If a search condition on the inner table is placed in the WHERE clause, the join is
logically equivalent to an inner join, even if you explicitly specify the keywords
LEFT/RIGHT/FULL OUTER JOIN in the query. The Optimizer always treats such a join as
an inner join to simplify the query, rewriting it to roll the entire complex process into a
single step.

NOT IN vs. Outer Join


An example of NOT IN vs. Outer Join is shown below.
Notes
While this technique is safer and might perform better than NOT IN if columns are NULLable, the
best choice to find rows in table A where matching data doesn't exist in table B is a NOT EXISTS
Correlated Subquery:

Summary

 Outer joins return both inner and outer result sets


 Teradata uses only the explicit ANSI standard syntax for outer join and requires
the use of an “ON” clause to establish a join condition
 Inner and Outer joins may be used together in a single query
 There are many ways one can write outer joins to achieve the same result
 NOT IN can be rewritten using an Outer Join

💻 Labs: Outer Join


Try completing the labs in the AWS lab environment. When you get the desired
results, compare your code with the code under the Solution tabs.
LAB 1

Select all currently active (legacy_flag = 0) jobs with


a job_code in the 33x.xxx range and assigned employees
HELP
(assigned = exist in br_payroll).
TABLE br_payroll;
Also include jobs without assigned employees.
HELP TABLE br_jobs;
Order the result set by job_code.

SOLUTION

No search condition on the employee table ⇒ no join needed ⇒ a select on the job
table does the job, too. Optimizer removes the Outer Join automatically ⇒ Join
Elimination.

Caution: It will not remove the join if p.job_code is used in any place or the DISTINCT
is missing.

Notes - Lab 1
Eliminating Redundant Joins
Joins are among the most frequently specified operations in SQL queries. They are
also among the most demanding resource consumers in the palette of an SQL coder.
Join elimination removes unnecessary tables from queries.
Join elimination is most commonly used in the following situations:
 Inner joins based on any form of referential integrity between 2 tables
 Left and right outer joins can be eliminated if the join is based on
unique columns from the right table in the join specification

In both cases, a join and a table (the table eliminated is the parent table in inner joins
and the inner table in outer joins) are removed from the query, assuming that no
projections are needed from the eliminated table to keep the query whole.

LAB 2

Write a query to find jobs which are currently active


(legacy_flag = 0) and open, i.e. jobs without any assigned
HELP
employees.
TABLE br_payroll;
HELP TABLE br_jobs;
List the job_code and job_title.
Order the result set by job_title.

SOLUTION
Notes - Lab 2

LAB 3
Select all active (i.e. hire_end_date is NULL) employees HELP
with an annual_salary over 120,000. TABLE br_payroll;
HELP
If the budget_amount of the employee's department is TABLE br_departmen
less than 20,000,000 include the budget and calculate the ts;
percentage each salary represents of the departmental
budget, otherwise return a NULL.
Order by last_name.

Solution:
In decimal calculations the result is rounded after
each step. ⇒ Order of calculation is important ⇒
Multiply first, then divide.

Notes - Lab 3
Lesson 13 of 22

EXISTS Subqueries
🎯 Objectives
At the end of this lesson, you will be able to:

 Use EXISTS and NOT EXISTS with correlated subqueries


 Use EXISTS and NOT EXISTS to check for empty tables
 Contrast subqueries with correlated subqueries

EXISTS Predicate

Let's understand the EXISTS Predicate with an example.


Show jobs with a salary less than 30,000.
The query for this problem statement is shown below with an explanation
comparing IN subquery and EXISTS subquery.

 An IN (shown in 1st query) subquery returns rows where a matching value is


found in the list of values returned by the sub-select
 An EXISTS (shown in 2nd query) subquery returns rows where a row with
a matching value exists in the sub-select
 This is just a different way to phrase the same logic
o IN and EXISTS are always equivalent
o Explain plan is similar to IN subquery
 Typically used as Correlated Subquery
o Added WHERE-condition in the subquery "correlates" the inner and the
outer select via a join-like condition
o It's not possible to execute the subquery only:
3807 Object 'j' does not exist

Notes
EXISTS/NOT EXISTS

Tests a specified table (normally a derived table) for the existence of at least one row
(that is, it tests whether the table in question is non-empty).

If execution of the subquery returns response rows then the where condition is
considered satisfied. Using the NOT qualifier for the EXISTS predicate reverses the
sense of the test. Execution of the subquery does not, in fact, return any response
rows. Instead, it returns a boolean result to indicate whether responses would or
would not have been returned had they been requested.

EXISTS is supported as the predicate of the search condition in a WHERE clause.


The EXISTS subquery will usually contain a reference to a column value from the outer
select. This means the inner query cannot be executed independently.
Explaining EXISTS

EXISTS is rewritten by the optimizer as a join equivalent to

A similar rewrite is done for an IN, as shown below.


Notes
Partial explain for IN-subquery:
NOT EXISTS Predicate
Let's understand the NOT EXISTS Predicate with an example.
Show jobs with a salary not less than 30,000
The query for this problem statement is shown below with an explanation
comparing IN subquery and NOT EXISTS subquery.
 NOT EXISTS and NOT IN are, unlike EXISTS and IN, not equivalent in all cases
 Remember that based on 3-valued logic NOT IN will return no row when the
subquery returns a NULL
o Must add WHERE job_code IS NOT NULL
o But Explain plan might still be overly complex
 In contrast [NOT] EXISTS is based on
2-valued logic that Never returns UNKNOWN
 Logic is not based on returned values but returned rows
o EXISTS returns TRUE if the subquery returns at least one row
o NOT EXISTS returns TRUE if the subquery returns no row

Notes
The result of SQL Exists is a Boolean value TRUE or FALSE, if the subquery returns one
or more records it returns TRUE otherwise it returns FALSE. But EXISTS never returns
UNKNOWN.

EXISTS Predicate Versus NOT IN and Nulls

Use the NOT EXISTS predicate instead of NOT IN if the following conditions are true:
 Some column of the NOT IN condition is defined as nullable
 Any rows from the main query with a null in any column of the NOT IN
condition should always be returned
 Any nulls returned in the select list of the subquery should not prevent
any rows from the main query from being returned

Refresher: Dealing with NULLs


SQL defines a three-valued logic based on TRUE, FALSE and UNKNOWN.

WHERE-conditions must evaluate TRUE to return a row.

The two-valued logic of [NOT] EXISTS simply treats UNKNOWN comparisons


as FALSE.
Notes
The above topic discusses what amounts to Truth Table Logic. With normal binary
logic, we only have to consider TRUE or FALSE evaluations. With relational databases,
there are three possibilities to consider; TRUE, FALSE, and UNKNOWN. Anytime a null
gets involved in a comparison of any kind, the result is unknown. This also means that
null can never equal itself (evaluate true) since an unknown can never equal another
unknown.

In relational databases, to say that a row wasn’t projected because the condition
evaluated FALSE is incorrect. We say that it didn’t project because it didn’t evaluate
TRUE. This is a significant statement! It will cause many a quandary with respect to
writing bug-free SQL.
Explaining NOT IN

NULLable columns result in a complex plan dealing with NULLs.

An example is shown below.


Explaining NOT EXISTS
Independent of NULLable columns NOT EXISTS shows the same plan as EXISTS, just
the join type changes to from inclusion join to exclusion join.

Both exclusion join and inclusion join can't be expressed using SQL join
syntax.

 The exclusion join is the internal implementation of Relational


Algebra's anti-join, returning rows from the first table that do not have
at least one matching row in the second table
 The inclusion join is the internal implementation of Relational
Algebra's semi-join, returning rows from the first table that do have at
least one matching row in the second table

Notes
Definition of the Inclusion Join
An inclusion join is a join where the first right table row that matches the left row is
joined.

Definition of the Exclusion Join


Exclusion join is a join where only the rows that do not satisfy any condition specified
in the request are joined. In other words, exclusion join finds rows in the first table
that do not have a matching row in the second table.

Exclusion join is an implicit form of the outer join.


Performance

There are two types of performances: EXISTS performance and NOT EXISTS
performance.

EXISTS Performance: Always similar plan to IN subquery


NOT EXISTS Performance:

 Similar plan to NOT IN subquery only if columns are defined as NOT NULL
o Adding a WHERE ... IS NOT NULL ensures the correct result, but
the plan might still be complex
 Outperforms NOT IN if a comparison is based on NULLable columns
 Usually at least equal to a LEFT JOIN plus WHERE IS NULL, but might be
way better (depending on the actual data)

Non-Correlated EXISTS

Non-Correlated EXISTS can be used to determine if a table is empty. An example is


shown below.

Commonly used in batch scripts to stop processing, e.g. in BTEQ. An example is


shown below.
.IF activitycount = 1 THEN .LOGOFF;
Notes
The fact that the outer query has no FROM clause is an indication that we are only
interested in satisfying a condition, namely, whether any employee exists. Partial
Explain EXISTS:
SELECT * vs. SELECT 1
The following examples demonstrate the comparison between SELECT * vs. SELECT
1.
The common best practice is to avoid

SELECT * in a production query

 But EXISTS looks for rows instead of columns, ignoring all expressions in the
select list
 Even invalid expressions don't fail

Using * for EXISTS is safe, but still not recommended

 Parser needs to resolve metadata for all columns (before it's discarded again by
the optimizer)
o Minimal overhead
o Might fail if column-level access rights exist
 EXISTS in a view is resolved to the full list of columns and will fail if a column is
dropped from the base table

Notes
According to Standard SQL the select list of an EXISTS subquery is equivalent to an
arbitrary literal value.

You can always see a .SQL-file how SELECT * is expanded to a column list.

Note: This is related to the .SQL file with all the code shown in this module in it.
These files will be distributed as part of the training.
Summary

 EXISTS and NOT EXISTS may be referenced for checking if certain conditions
exist or not
 EXISTS might be correlated or non-correlated subqueries
 NOT EXISTS may outperform NOT IN subqueries

💻 Labs: EXISTS Subqueries


Try completing the labs in the AWS lab environment. When you get the desired
results, compare your code with the code under the Solution tabs.
LAB 1

Select all currently active (legacy_flag = 0) jobs with


a job_code in the 33x.xxx range with assigned active HELP
(hire_end_date is null) employees. TABLE hr_payroll;
HELP TABLE hr_jobs;
Order the result set by job_code.
SOLUTION
Condition on job_code can be written in the inner or the outer select.
Optimizer will automatically apply the condition in both places ⇒ same Explain.

LAB 2

Select all active (hire_end_date IS NULL) employees with


an annual_salary less than 80,000 whose job_title
contains administrator.
HELP
Concatenate first_name and last_name and alias it to TABLE hr_payroll;
fullname. HELP TABLE hr_jobs;

Order the result set by birthday within year.


SOLUTION

LAB 3

Write a query to find jobs which are currently active HELP


(legacy_flag = 0) TABLE hr_payroll;
and open, i.e. jobs without any assigned active HELP TABLE hr_jobs;
(hire_end_date IS NULL) employees.

List the job_code and job_title.


Order the result set by job_title.
SOL

SOLUTION
The Aggregate Functions

Aggregate functions produce an arithmetic summarization of values.

 Operate on a group of rows and return a single numeric value for each group
 Do not provide row-level detail

An example is shown below.

The data on the right shows the rows in the base table before aggregation.
Aggregates ignore NULLs in all computations.

 ANSI mode session returns Warning 2892: Null value eliminated in set function.
 Caution: If NULLs exist the result of SUM(A) + SUM(B) ≠ SUM(A+B).

Notes
Aggregate functions perform operations that summarize information found in tables,
i.e., one can expect that a certain amount of detailed information will be lost when
performing aggregations. In our first example, we see that a single row is returned,
representing the minimum, maximum, sum, and average of all salary amounts for all
employees and a number of rows with a known salary amount. COUNT(*) provides us
with a mechanism that can be used to count actual rows. It doesn’t matter if NULLs
appear for each and every column value in a row, it still gets counted!

Aggregates and Nulls

Aggregates (with the exception of COUNT(*)) ignore nulls in all computations. Ignoring
nulls can result in apparent nontransitive anomalies. For example, if there are nulls in
either column A or column B (or both), then the following expression is virtually
always true:

SUM(A) + SUM(B) <> SUM(A+B)

The only exception to this is the case in which the values for columns A and B are
both null in the same rows because in those cases the entire row is disregarded in the
aggregation. This is a trivial case that does not violate the general rule.

More formally stated, if and only if field A and field B are both null for every
occurrence of a null in either field is the above inequality false.

There are several ways to work around this apparent non-transitivity issue if it
presents a problem. Either solution provides the same consistent results.
 Always define your numeric columns as NOT NULL
 Use a COALESCE expression on every NULLable column within the
aggregate function to convert any nulls to zeros for the computation,
for example

SUM(COALESCE(x, 0) + COALESCE(y, 0))


The GROUP BY Clause

The GROUP BY clause allows groups to be defined.

 Aggregate operations are performed on these group


 GROUP BY may reference columns by name, alias, or by positional value (like
ORDER BY)
 Grouping does not imply ordering, but the result might be sorted (based on the
optimizer plan)
 All NULLs form one group

An example is shown below.

3504 Selected non-aggregate values must be part of the associated group.

 Whenever both aggregate and non-aggregate columns are selected,


GROUP BY is required
 All projected non-aggregate columns must appear in the GROUP BY
clause
 You may group by column(s) that are not projected

Notes
The GROUP BY clause permits the definition of groups within a SELECT statement.
Aggregate functions may then be performed on each group, with a resulting line of
output to represent each group. Detail rows of information are unavailable when using
an aggregate function, either with or without a GROUP BY clause.

The maximum number of columns or expressions you can specify in a GROUP BY is


64. Similar to ORDER BY a column might be referenced by name or its sequential
numeric position within the SELECT list.

Aggregates and GROUP BY

If you use an aggregate function in the select list of an SQL statement, then either all
other columns occurring in the select list must also be referenced by means of
aggregate functions or their column name must appear in a GROUP BY clause. The
reason for this is that aggregates return only one value, while a non-GROUP BY
column reference can return any number of values.
The HAVING Clause

You can use the HAVING clause for qualifying on an aggregated value.

 HAVING is just like WHERE, except that it applies to groups rather than rows

Let's see the difference between the WHERE clause and HAVING clause.

WHERE Clause
The WHERE clause must reference a non-aggregated
value.

 No aggregated data, yet


 Any column allowed
HAVING Clause
The HAVING clause should reference an aggregated value.

 No more details anymore


 After aggregation only columns from GROUP BY and aggregated columns
exist
 Conditions on GROUP BY-columns are possible, but should better be
placed in WHERE

An example is shown below.


Logical Query Processing Order

There are two types of order.

1. Typed Order
2. Logical Order

Let's see the differences between them in detail.


Notes
Logical Query Processing

Each select clause defines what is available to the clauses in subsequent steps and
transforms the data step-by-step from the input tables into the final result set. And
because Standard SQL works on tabular structures, rows, and columns, both the input
and output of each clause are "tables".

But the order in which you type the query clauses is different from the order in which
they get logically interpreted. Each of the following clauses takes a virtual table as
input and produces another virtual table as output:
1. The FROM clause builds a virtual table based on the tables, views, and
joins defined
2. WHERE filters the result of the FROM
3. GROUP BY partitions the data into groups, aggregates all rows of a
group into a single row, and returns a grouped table
4. HAVING filters the grouped table
5. The <WINDOW clause> is part of the Window Function definition
(OVER), but the WINDOW syntax is not implemented in Teradata. It
also partitions the data into groups and applies
aggregates/rankings/etc., but (unlike GROUP BY) without removing
detail rows
6. QUALIFY filters the result of Window Functions
1. DISTINCT, NORMALIZE, and TOP are mutually exclusive
2. The projected columns in the SELECT list are created
7. SAMPLE return one or more samples of rows specified
EXPAND ON creates a regular time series of rows per input row
8. ORDER BY finally sorts rows for output

Because Teradata was implemented before there was Standard SQL Teradata’s parser
doesn’t enforce the correct order of query clauses, in fact, you can use any order as
long as a query starts with SELECT. Teradata will happily execute

SELECT … ORDER BY … QUALIFY … FROM … HAVING … GROUP BY


Please don’t do this, stay with the “official” order, any fellow user trying to understand
your queries or trying to port them to another DBMS will be grateful.

The DISTINCT Option

 You can apply a DISTINCT option in aggregate functions


 DISTINCT removes duplicate values before aggregation
o Typically this is done using COUNT only

A multi-column DISTINCT results in a


3706 Syntax error: expected something between the word
'department_number' and ','.

 Aggregate functions work on a single column only.


 Workaround: Combine columns into one.
Notes
Multiple COUNT DISTINCT Performance

Teradata Database 16.00 has significant improvement in the performance of queries


with multiple COUNT DISTINCT or MINIMUM / MAXIMUM DISTINCT operations.

This happens automatically:


<Refer to the EXPLAIN shown below>

Note that up to 12 COUNT DISTINCT operations can be processed in one aggregation


step. If there are more than 12 COUNT DISTINCT operations in a query block, the
database will split them into groups of 12.

You may see a fallback to pre-Teradata Database 16.00 processing with the following
use cases:
 "mixed aggregates" (for example, COUNT(DISTINCT) and
SUM(DISTINCT) in the same query). SUM(DISTINCT) will not use the
new processing
 complex distinct aggregates, in the presence of extended grouping
constructs (for example, GROUP BY ROLLUP, CUBE, or GROUPING
SETS)

You should collect statistics on any GROUP BY or DISTINCT expressions. This will
enable the optimizer to create the best execution plan. Without statistics, the
optimizer may conservatively use aggregate cache resources, and performance may
not be optimal.

EXPLAIN
Usage Notes
GROUP BY and DISTINCT

For cases when the GROUP BY is semantically equivalent to DISTINCT, the optimizer
makes a cost-based decision to eliminate duplicates either by way of a combined sort
and duplicate elimination or an aggregation step.

An example is shown below.

Aggregation Based on an Empty Set

Examples are shown below.

GROUP BY CLAUSE PRESENT


Returns no row.

GROUP BY CLAUSE NOT PRESENT


Always returns a row.
Datatype Returned by COUNT and Datatype Returned by AVG

Let's understand about Datatype returned by COUNT and Datatype returned by AVG.

DATATYPE RETURNED BY COUNT

DATATYPE RETURNED BY AVG

Notes
In an ANSI mode session, the data type for the result of COUNT is controlled by
the MaxDecimal in the DBS Control utility.

If MaxDecimal is:
 0 or 15, then the result type is DECIMAL(15,0)
 18, then the result type is DECIMAL(18,0)
 38, then the result type is DECIMAL(38,0)

In Teradata mode, the result is an INTEGER.


The DBA can override both defaults based DBSControl General
Field(80), COUNT_mode
 0 ⇒ Data type depends on session mode (see above)
 1 ⇒ Data type is NUMBER
 2 ⇒ Data type is BIGINT

The following query reveals the system's default data type:


Additional Aggregate Functions

Summary

 Aggregation summarizes detailed data into fewer rows


 You can use GROUP BY to summarize by groups
 WHERE conditions eliminate rows prior to performing the aggregation
 GROUP BY can reference by name, alias or numeric position
 HAVING eliminates groups after aggregations are performed
 HAVING must refers to an aggregated column or alias
 GROUP BY may be used to perform a DISTINCT operation

💻 Labs: Aggregation
Try completing the labs in the AWS lab environment. When you get the desired
results, compare your code with the code under the Solution tabs.
LAB 1

Select all currently active (legacy_flag = 0) jobs with


a job_code in the 33x.xxx range.
HELP
Add the number of employees per job_code and the sum TABLE hr_payroll;
of salaries (annual_salary) for each job. HELP TABLE hr_jobs;

Order the result set by job_code.

SOLUTION
Caution: COUNT(*) or COUNT(j.job_code) will return 1 instead of 0 for non-assigned job
codes.
LAB 2

Select the number of transactions


between '2018-01-05' and '2018-01-25'
(trans_date)
and the sum of transaction amounts (amount)
HELP TABLE fin_trans;
for accounts from district_id 1.
HELP TABLE fin_account;
Only return accounts with a sum less than -4000.

Order by sum(trans_amount).

SOLUTION
LAB 3

Modify the previous query to exclude accounts HELP TABLE fin_trans;


which had any deposits (i.e. amount > 0). HELP TABLE fin_account;

SOLUTION
Simply rephrase "No amounts over zero" to "Maximum amount is less than zero".
LAB 4

Report the transactions amounts of trans_type 'C' in


2017 (trans_date) in 10 equal Width_Buckets of 500 $
HELP
each.
TABLE fin_trans;
Add count, minimum, average, median, maximum and
sum aggregates.

SOLUTION

You might also like