Teradata Vantage SQL Basics 2
Teradata Vantage SQL Basics 2
Notes
One of the most important differences between joins and subqueries is the need for
establishing a one-to-many relationship in a join, something that is automatically provided
when writing a subquery.
This example illustrates what happens when the join relationship is many-to-many. In such
a case, unintended result rows appear in the final result set. In the example, employee 1015
works only in department 501, but since the join is on the manager number, this result set
will show that person working in both department 501 and department 402. Certainly not
the real circumstance. Other employees (employees 1018, 1023, 1006, 1008) share the same
fate. It would be difficult, if not impossible, to view the result set and know who truly works
in which department.
Cross Join
A CROSS JOIN is a join where no join condition is specified. Since no qualification exists, the
database establishes a pseudo-condition of “WHERE 1=1”, which is true for each and every
comparison. An example is illustrated below demonstrating Cross Join.
Notes
CROSS JOIN is a rarely used syntax. In our example, the one on the left is preferred
because it shows the reader that a cross join is intended, while the one on the right may or
may not be (perhaps the writer forgot the join condition).
Since no join condition exists, the database invents one for us, whether we are pleased
with it or not. Explain shows a condition of “1=1”, which always evaluates true. Thus, you
can read the row for employee “Short” as “Project the employee’s number, last name, and
department name for each row in the department table where 1=1 is true.” The result is to
project these column values (from the “Smith” row) for each department row. The same
thing happens all over again for each employee row.
issued without a WHERE clause. In this case, each row of one table is joined to each row of another
table. The output of a Cartesian product is often not meaningful however they do have useful
application. Any query requiring all elements of one table to be compared to all elements of another is
All others will cause a three-table join: An Inner Join between dept and emp followed by a Cross Join
to employee!
Notes
–
A table-alias is not really an alias, it replaces the tablename within that query. When using
aliases in writing joins, one must be careful to always use alias names when referencing,
and not use aliased table names. In our examples, the table Employee has been aliased as
“emp” in the FROM, but the join condition references Employee as a table name and does
not reference the alias. In the first example SQL-92 join syntax requires a join condition
from the joined tables and thus the query fails with "3782: Improper column reference in
the search condition of a joined table". But the SQL-89 syntax is interpreted as having three
a Teradata SQL request (but in Standard SQL). Teradata was implemented before there was
Standard SQL, the initial query language was called TEQUEL (TEradata QUEry Language),
RETRIEVE employee.last_name contains enough information for the Parser to resolve table
last_name
----------------------------------------
Hopkins
Ratzlaff
Rogers
Rogers
Kanieski
Crane
Stein
Johnson
Short
Brown
...
joins and then attempts to determine the most cost effective join order. Because the Optimizer uses
column statistics to choose the least costly join plan from the candidate set it generates, the plans it
Column projection and row selection are done prior to doing the join.
n-way joins are reduced to a series of binary joins.
Query optimizers use trees to build and analyze optimal join orders, most common are:
Notes
–
Query optimizers use trees to build and analyze optimal join orders. The join
search tree types used most frequently by relational database optimizers are the
When a left-deep search tree is used to analyze possible join orders, the number
Bushy trees are an optimal method for generating more join order combinations.
than the left-deep tree method. Bushy trees also provide the capability of
The Optimizer uses various combinations of join plan search trees, sometimes
mixing left-deep, bushy, and even right-deep branches within the same tree.
The possibilities for ordering binary joins escalate rapidly as the total number of
relations joined increases. The Optimizer is very intelligent about looking for
plans and uses numerous field-proven heuristics to ensure that more costly
plans are eliminated from consideration early in the costing process in order to
Summary
● Column values may be projected from any table of a join
● Subqueries and inner joins can both return inner result sets
● Inner joins cannot return outer (NOT IN) result sets as can subqueries
● Incorrect table and column references can cause incorrect result sets
LAB 1
Select all currently active (legacy_flag = 0) jobs with a job_code in the
33x.xxx range with assigned (= exist in hr_payroll) active HELP TABLE br_payroll;
SOLUTION
br_jobs.job_code
is unique, but one row per matching employee is returned, resulting in duplicate rows DI
SOLUTION
No DISTINCT needed as br_payroll.job_code is non-unique.
LAB 3
LAB 4
Select all districts with less than
60,000 inhabitants
(num_inhabitants) having
HELP TABLE fin_district;
accounts with a loan at status
HELP TABLE fin_account;
'D' (= running contract, client in
HELP TABLE fin_loan;
debt).
SOLUTION
Again, DISTINCT needed to remove duplicate rows.
Lesson 12 of 22
Outer Joins
At the end of this lesson, you will be able to:
The outer join is an extension of the inner join. The difference between outer joins and
inner joins is the existence of a nonmatching row set in the outer join result.
Outer joins of two or more tables perform an inner join of those tables according to a
specified join condition and also return rows from the left join table, the right join table, or
both, that do not match the inner join condition, extending the results rows with nulls in the
nonmatching fields.
The non-matching rows are for those rows in the outer table that do not have a match in the
inner table, while the matching rows in the outer join result are the rows that satisfy the join
condition, which is exactly the same as those in the inner join result.
Inner Table and Outer Table
Inner Table
The inner table is the table that contributes only matched rows to the outer join result. For
an inner join, both tables are inner tables.
Making It Real
The outer table is the table that contributes unmatched rows to the outer join result. In this
description, the word "unmatched" refers to the rows in the left or right (or both) table that
are not part of the inner join rows because there are no matching columns, so the rows are
extended with NULLs in the results table.
Example
Below are the two syntax examples, showing Explicit Join syntax only and the way of using
the "Outer" keyword for Outer Join respectively.
EXPLICIT JOIN SYNTAX ONLY
There's no implicit syntax for outer joins.
"OUTER" KEYWORD
Just as the INNER keyword for Inner Joins, the OUTER keyword is optional.
Any additional restriction in the ON clause is applied as a residual condition to the join. Any
restriction in the WHERE clause is applied to the table that is the final result of the outer join.
Left Outer Join
Follow the example below to learn about the Left Outer Join. The table to the LEFT of the JOIN
keyword is the OUTER table.
Using the query shown here, you can observe how Left Outer Join works, in the animation as
below.
Notes
There are several situations that can add complexity to understanding outer join results. Here we
see how introducing a Null value into the inner table can potentially provide a certain level of
confusion.
In the result set a Null is returned for an inner table column for an inner result and does not
indicate an outer result! If the department name column was defined as NOT NULL, then a Null
value for the department name could not exist, and so a Null for this column in the result set can
now be interpreted as an outer result row. Always check the optimizer plan if it actually shows an
Outer Join:
Using the query shown here, you can observe how Right Outer Join works, in the animation as
below.
Notes
Be aware that even with an explicit RIGHT join Explain might show a LEFT join simply switching
the tables:
Full Outer Join
Follow the example below to learn about the Full Outer Join. Tables to the LEFT and the RIGHT of
the JOIN keyword are both OUTER tables.
Using the query shown here, you can observe how Full Outer Join works, in the animation as
below.
Notes
Join Types and Join Order
Outer Joins might return different results based on the join types and the order of joins:
The logical join order is determined by the sequence of ONs
Each JOIN creates an intermediate result which is subsequently JOINed to the next table
Two tables A and B can be joined in 4 variations:
Each additional join quadruples the number of possible variations, but not all of them
actually provide logically different results.
The optimizer is quite smart simplifying an Outer Join. If it doesn't change the result set:
Full → Left/Right
Left/Right → Inner
Always check if Explain shows the expected join type.
Notes
Temporary Derived Tables
When joining three or more tables, the join process logically creates a temporary derived table that
is defined only for the duration of the SELECT operation. Depending on various factors, the
Optimizer might rewrite a request in such a way that it generates a query plan that is very different
from the 1:1 correspondence between the coded query and its actual implementation that this
simplified description implies. Such a temporary table is also called a joined table. Vantage refers
to this kind of temporary table as a spool.
ON Clause Evaluation Order
The result of an inner join of two tables is not changed if rows from table a are joined to rows from
table b, or if rows from table b are joined to rows from table a. In terms of set algebra, inner joins
are both commutative and associative. This is true no matter how many tables are (inner) joined.
Because inner joins are both commutative and associative, the Optimizer can select the best join
order for inner joins arbitrarily and the end result is always identical. Outer joins, on the other
hand, are rarely either commutative or associative. The Optimizer cannot select an arbitrary best
join order for outer joins because neither commutativity nor associativity can be assumed, nor
does it have any way to know what specific result you intended to produce with the query you
presented to it.
To outer join three tables, you must specify their join order explicitly by placing the ON clause in
an appropriate position within the FROM clause to ensure that the join is evaluated correctly.
The Optimizer follows these rules to generate outer joins.
The first ON clause in the query (reading from left to right) is evaluated first
Any ON clause applies to its immediately preceding join operation
Adding a Third Table
Below is the representation of adding a third table.
Notes
Guidelines for Using Outer Joins
Outer joins, when used properly, provide additional information from a single query
that would otherwise require multiple queries and steps to achieve. However, the
proper use of outer joins requires training and experience because “common sense”
reasoning does not always apply to formulating an outer join query in a manner that is
not only syntactically correct but also returns an answer that is correct for the
business question.
To ensure the correct answers to your business questions using outer joins, refer to
these steps.
1. Understand the question you are trying to answer and know the
demographics of your tables.
Form a hypothesis of the answer set you to expect before you begin
2. Write the query, keeping in mind the proper placement of join and
search conditions
Condition Clause
Join ON
Search condition predicates for inner ON
table
Search condition predicates for outer WHERE
table
3. Always EXPLAIN the query before performing it Look for the words
outer join in the EXPLAIN text. If the EXPLAIN text does not include
outer join, the Optimizer did not produce an outer join plan
4. Perform the query and compare the result with your expectations
An example of coding search conditions for outer joins is shown below. In this
example:
Any restriction in the WHERE clause is applied to the table that is the final result
of the outer join
Any additional restriction in the ON clause is applied as a residual condition to
the join
Recommendations:
Notes
Coding ON Clauses With WHERE Clauses for Outer Joins
To limit the number of qualifying rows in the outer table (and therefore the answer
set), the search condition for the outer table must be in the WHERE clause. Note that
the Optimizer logically applies the WHERE clause condition only after a join has been
produced. The actual application of conditions always depends on how the Optimizer
chooses to implement the query.
A search condition in the ON clause of the inner table does not limit the number of
rows in the answer set. It defines the rows that are eligible to take part in the match
to the outer table.
If a search condition on the inner table is placed in the WHERE clause, the join is
logically equivalent to an inner join, even if you explicitly specify the keywords
LEFT/RIGHT/FULL OUTER JOIN in the query. The Optimizer always treats such a join as
an inner join to simplify the query, rewriting it to roll the entire complex process into a
single step.
Summary
SOLUTION
No search condition on the employee table ⇒ no join needed ⇒ a select on the job
table does the job, too. Optimizer removes the Outer Join automatically ⇒ Join
Elimination.
Caution: It will not remove the join if p.job_code is used in any place or the DISTINCT
is missing.
Notes - Lab 1
Eliminating Redundant Joins
Joins are among the most frequently specified operations in SQL queries. They are
also among the most demanding resource consumers in the palette of an SQL coder.
Join elimination removes unnecessary tables from queries.
Join elimination is most commonly used in the following situations:
Inner joins based on any form of referential integrity between 2 tables
Left and right outer joins can be eliminated if the join is based on
unique columns from the right table in the join specification
In both cases, a join and a table (the table eliminated is the parent table in inner joins
and the inner table in outer joins) are removed from the query, assuming that no
projections are needed from the eliminated table to keep the query whole.
LAB 2
SOLUTION
Notes - Lab 2
LAB 3
Select all active (i.e. hire_end_date is NULL) employees HELP
with an annual_salary over 120,000. TABLE br_payroll;
HELP
If the budget_amount of the employee's department is TABLE br_departmen
less than 20,000,000 include the budget and calculate the ts;
percentage each salary represents of the departmental
budget, otherwise return a NULL.
Order by last_name.
Solution:
In decimal calculations the result is rounded after
each step. ⇒ Order of calculation is important ⇒
Multiply first, then divide.
Notes - Lab 3
Lesson 13 of 22
EXISTS Subqueries
🎯 Objectives
At the end of this lesson, you will be able to:
EXISTS Predicate
Notes
EXISTS/NOT EXISTS
Tests a specified table (normally a derived table) for the existence of at least one row
(that is, it tests whether the table in question is non-empty).
If execution of the subquery returns response rows then the where condition is
considered satisfied. Using the NOT qualifier for the EXISTS predicate reverses the
sense of the test. Execution of the subquery does not, in fact, return any response
rows. Instead, it returns a boolean result to indicate whether responses would or
would not have been returned had they been requested.
Notes
The result of SQL Exists is a Boolean value TRUE or FALSE, if the subquery returns one
or more records it returns TRUE otherwise it returns FALSE. But EXISTS never returns
UNKNOWN.
Use the NOT EXISTS predicate instead of NOT IN if the following conditions are true:
Some column of the NOT IN condition is defined as nullable
Any rows from the main query with a null in any column of the NOT IN
condition should always be returned
Any nulls returned in the select list of the subquery should not prevent
any rows from the main query from being returned
In relational databases, to say that a row wasn’t projected because the condition
evaluated FALSE is incorrect. We say that it didn’t project because it didn’t evaluate
TRUE. This is a significant statement! It will cause many a quandary with respect to
writing bug-free SQL.
Explaining NOT IN
Both exclusion join and inclusion join can't be expressed using SQL join
syntax.
Notes
Definition of the Inclusion Join
An inclusion join is a join where the first right table row that matches the left row is
joined.
There are two types of performances: EXISTS performance and NOT EXISTS
performance.
Similar plan to NOT IN subquery only if columns are defined as NOT NULL
o Adding a WHERE ... IS NOT NULL ensures the correct result, but
the plan might still be complex
Outperforms NOT IN if a comparison is based on NULLable columns
Usually at least equal to a LEFT JOIN plus WHERE IS NULL, but might be
way better (depending on the actual data)
Non-Correlated EXISTS
But EXISTS looks for rows instead of columns, ignoring all expressions in the
select list
Even invalid expressions don't fail
Parser needs to resolve metadata for all columns (before it's discarded again by
the optimizer)
o Minimal overhead
o Might fail if column-level access rights exist
EXISTS in a view is resolved to the full list of columns and will fail if a column is
dropped from the base table
Notes
According to Standard SQL the select list of an EXISTS subquery is equivalent to an
arbitrary literal value.
You can always see a .SQL-file how SELECT * is expanded to a column list.
Note: This is related to the .SQL file with all the code shown in this module in it.
These files will be distributed as part of the training.
Summary
EXISTS and NOT EXISTS may be referenced for checking if certain conditions
exist or not
EXISTS might be correlated or non-correlated subqueries
NOT EXISTS may outperform NOT IN subqueries
LAB 2
LAB 3
SOLUTION
The Aggregate Functions
Operate on a group of rows and return a single numeric value for each group
Do not provide row-level detail
The data on the right shows the rows in the base table before aggregation.
Aggregates ignore NULLs in all computations.
ANSI mode session returns Warning 2892: Null value eliminated in set function.
Caution: If NULLs exist the result of SUM(A) + SUM(B) ≠ SUM(A+B).
Notes
Aggregate functions perform operations that summarize information found in tables,
i.e., one can expect that a certain amount of detailed information will be lost when
performing aggregations. In our first example, we see that a single row is returned,
representing the minimum, maximum, sum, and average of all salary amounts for all
employees and a number of rows with a known salary amount. COUNT(*) provides us
with a mechanism that can be used to count actual rows. It doesn’t matter if NULLs
appear for each and every column value in a row, it still gets counted!
Aggregates (with the exception of COUNT(*)) ignore nulls in all computations. Ignoring
nulls can result in apparent nontransitive anomalies. For example, if there are nulls in
either column A or column B (or both), then the following expression is virtually
always true:
The only exception to this is the case in which the values for columns A and B are
both null in the same rows because in those cases the entire row is disregarded in the
aggregation. This is a trivial case that does not violate the general rule.
More formally stated, if and only if field A and field B are both null for every
occurrence of a null in either field is the above inequality false.
There are several ways to work around this apparent non-transitivity issue if it
presents a problem. Either solution provides the same consistent results.
Always define your numeric columns as NOT NULL
Use a COALESCE expression on every NULLable column within the
aggregate function to convert any nulls to zeros for the computation,
for example
Notes
The GROUP BY clause permits the definition of groups within a SELECT statement.
Aggregate functions may then be performed on each group, with a resulting line of
output to represent each group. Detail rows of information are unavailable when using
an aggregate function, either with or without a GROUP BY clause.
If you use an aggregate function in the select list of an SQL statement, then either all
other columns occurring in the select list must also be referenced by means of
aggregate functions or their column name must appear in a GROUP BY clause. The
reason for this is that aggregates return only one value, while a non-GROUP BY
column reference can return any number of values.
The HAVING Clause
You can use the HAVING clause for qualifying on an aggregated value.
HAVING is just like WHERE, except that it applies to groups rather than rows
Let's see the difference between the WHERE clause and HAVING clause.
WHERE Clause
The WHERE clause must reference a non-aggregated
value.
1. Typed Order
2. Logical Order
Each select clause defines what is available to the clauses in subsequent steps and
transforms the data step-by-step from the input tables into the final result set. And
because Standard SQL works on tabular structures, rows, and columns, both the input
and output of each clause are "tables".
But the order in which you type the query clauses is different from the order in which
they get logically interpreted. Each of the following clauses takes a virtual table as
input and produces another virtual table as output:
1. The FROM clause builds a virtual table based on the tables, views, and
joins defined
2. WHERE filters the result of the FROM
3. GROUP BY partitions the data into groups, aggregates all rows of a
group into a single row, and returns a grouped table
4. HAVING filters the grouped table
5. The <WINDOW clause> is part of the Window Function definition
(OVER), but the WINDOW syntax is not implemented in Teradata. It
also partitions the data into groups and applies
aggregates/rankings/etc., but (unlike GROUP BY) without removing
detail rows
6. QUALIFY filters the result of Window Functions
1. DISTINCT, NORMALIZE, and TOP are mutually exclusive
2. The projected columns in the SELECT list are created
7. SAMPLE return one or more samples of rows specified
EXPAND ON creates a regular time series of rows per input row
8. ORDER BY finally sorts rows for output
Because Teradata was implemented before there was Standard SQL Teradata’s parser
doesn’t enforce the correct order of query clauses, in fact, you can use any order as
long as a query starts with SELECT. Teradata will happily execute
You may see a fallback to pre-Teradata Database 16.00 processing with the following
use cases:
"mixed aggregates" (for example, COUNT(DISTINCT) and
SUM(DISTINCT) in the same query). SUM(DISTINCT) will not use the
new processing
complex distinct aggregates, in the presence of extended grouping
constructs (for example, GROUP BY ROLLUP, CUBE, or GROUPING
SETS)
You should collect statistics on any GROUP BY or DISTINCT expressions. This will
enable the optimizer to create the best execution plan. Without statistics, the
optimizer may conservatively use aggregate cache resources, and performance may
not be optimal.
EXPLAIN
Usage Notes
GROUP BY and DISTINCT
For cases when the GROUP BY is semantically equivalent to DISTINCT, the optimizer
makes a cost-based decision to eliminate duplicates either by way of a combined sort
and duplicate elimination or an aggregation step.
Let's understand about Datatype returned by COUNT and Datatype returned by AVG.
Notes
In an ANSI mode session, the data type for the result of COUNT is controlled by
the MaxDecimal in the DBS Control utility.
If MaxDecimal is:
0 or 15, then the result type is DECIMAL(15,0)
18, then the result type is DECIMAL(18,0)
38, then the result type is DECIMAL(38,0)
Summary
💻 Labs: Aggregation
Try completing the labs in the AWS lab environment. When you get the desired
results, compare your code with the code under the Solution tabs.
LAB 1
SOLUTION
Caution: COUNT(*) or COUNT(j.job_code) will return 1 instead of 0 for non-assigned job
codes.
LAB 2
Order by sum(trans_amount).
SOLUTION
LAB 3
SOLUTION
Simply rephrase "No amounts over zero" to "Maximum amount is less than zero".
LAB 4
SOLUTION