Snowflake
Snowflake
Categories:
Query syntax
JOIN
A JOIN operation combines rows from two tables (or other table-like sources, such as
views or table functions) to create a new combined row that can be used in the query.
For a conceptual explanation of joins, see Working with Joins.
This topic describes how to use the JOIN construct in the FROM clause. The JOIN
subclause specifies (explicitly or implicitly) how to relate rows in one table to the
corresponding rows in the other table. See also ASOF JOIN, which is used to join time-
series data on timestamp columns when their values closely follow each other,
precede each other, or match exactly.
Although the recommended way to join tables is to use JOIN with the ON subclause of
the FROM clause, an alternative way to join tables is to use the WHERE clause. For
details, see the documentation for the WHERE clause.
Syntax
Use one of the following:
SELECT ...
FROM <object_ref1> [
{
INNER
| { LEFT | RIGHT | FULL } [ OUTER ]
}
]
JOIN <object_ref2>
[ ON <condition> ]
[ ... ]
SELECT *
FROM <object_ref1> [
{
INNER
| { LEFT | RIGHT | FULL } [ OUTER ]
}
]
JOIN <object_ref2>
[ USING( <column_list> ) ]
[ ... ]
SELECT ...
FROM <object_ref1> [
{
| NATURAL [ { LEFT | RIGHT | FULL } [ OUTER ] ]
| CROSS
}
]
JOIN <object_ref2>
[ ... ]
Parameters
<object_ref1> and <object_ref2>
JOIN
Use the JOIN keyword to specify that the tables should be joined. Combine JOIN
with other join-related keywords (e.g. INNER or OUTER ) to specify the type of join.
The semantics of joins are as follows (for brevity, this topic uses <o1> and <o2>
for <object_ref1> and <object_ref2> , respectively).
<o1> For each row of <o1> , a row is produced for each row of <o2> that
INNER matches according to the ON <condition> subclause. (Note that you can
JOIN also use a comma to specify an inner join. For an example, see the
<o2> examples section below.) If you use INNER JOIN without the ON clause
(or if you use comma without a WHERE clause), the result is the same as
using CROSS JOIN : a Cartesian product (every row of <o1> paired with
every row of <o2> ).
<o1> The result of the inner join is augmented with a row for each row of <o1>
LEFT that has no matches in <o2> . The result columns referencing <o2>
OUTER contain null.
JOIN
<o2>
Join Type Semantics
<o1> The result of the inner join is augmented with a row for each row of <o2>
RIGHT that has no matches in <o1> . The result columns referencing <o1>
OUTER contain null.
JOIN
<o2>
<o1> Returns all joined rows, plus one row for each unmatched left side row
FULL (extended with nulls on the right), plus one row for each unmatched right
OUTER side row (extended with nulls on the left).
JOIN
<o2>
<o1> For every possible combination of rows from <o1> and <o2> (i.e.
CROSS Cartesian product), the joined table contains a row consisting of all columns
JOIN in <o1> followed by all columns in <o2> . A CROSS JOIN cannot be
<o2> combined with an ON <condition> clause. However, you can use a
WHERE clause to filter the results.
See also:
LATERAL
ASOF JOIN
Default: INNER JOIN
If the word JOIN is used without specifying INNER or OUTER , then the JOIN is
an inner join.
ON <condition>
A boolean expression that defines the rows from the two sides of the JOIN that
are considered to match, for example:
ON object_ref2.id_number = object_ref1.id_number
The ON clause is unnecessary (and prohibited) for NATURAL JOIN ; the join
columns are implied.
For other joins, the ON clause is optional. However, omitting the ON clause results
in a Cartesian product (every row of <object_ref1> paired with every row of
<object_ref2> ). A Cartesian product can produce a very large volume of output,
almost all of which consists of pairs of rows that aren’t actually related; this
consumes a lot of resources and is often a user error.
USING( <column_list> )
A list of columns in common between the two tables being joined; these columns
are used as the join columns. The columns must have the same name and
meaning in each of the tables being joined.
For example, suppose that the SQL statement contains:
... o1 JOIN o2
USING (key_column)
... o1 JOIN o2
ON o2.key_column = o1.key_column
In the standard JOIN syntax, the projection list (the list of columns and other
expressions after the SELECT keyword) is “*”. This causes the query to return the
key_column exactly once. The columns are returned in the following order:
You cannot specify the ON, USING, or NATURAL JOIN clause in an outer
lateral join to a table function (other than a SQL UDTF).
For example, the following syntax is not allowed:
These restrictions do not apply if you are using a comma, rather than a JOIN
keyword:
Examples
Many of the JOIN examples use two tables, t1 and t2 . The tables and their data are
created as shown below:
Inner join:
This shows a left outer join. Note the NULL value for the row in table t1 that doesn’t
have a matching row in table t2.
This shows a right outer join. Note the NULL value for the row in table t1 that doesn’t
have a matching row in table t2.
This shows a full outer join. Note that because each table has a row that doesn’t have
a matching row in the other table, the output contains two rows with NULL values:
Here is an example of a cross join, which produces a Cartesian product. Note that the
cross join does not have an ON clause.
A cross join can be filtered by a WHERE clause, as shown in the example below:
This is an example of a natural join. This produces the same output as the
corresponding inner join, except that the output doesn’t include a second copy of the
join column:
SELECT *
FROM d1 NATURAL FULL OUTER JOIN d2
ORDER BY ID;
+----+------+-------+
| ID | NAME | VALUE |
|----+------+-------|
| 1 | a | xx |
| 2 | b | yy |
| 4 | c | NULL |
| 5 | NULL | zz |
+----+------+-------+
Joins can be combined in the FROM clause. The following code creates a third table,
then chains together two JOINs in the FROM clause:
In such a query, the results are determined based on the joins taking place from left to
right (though the optimizer might reorder the joins if a different join order will produce
the same result). If the right outer join is meant to take place before the left outer join,
then the query can be written as follows:
The two examples below show standard (ISO 9075) and non-standard usage of the
USING clause. Both are supported by Snowflake.
This first example shows standard usage. Specifically, the projection list contains
exactly “*”. Even though the example query joins two tables, and each table has one
column, and the query asks for all columns, the output contains one column, not two.
WITH
l AS (
SELECT 'a' AS userid
),
r AS (
SELECT 'b' AS userid
)
SELECT *
FROM l LEFT JOIN r USING(userid)
;
+--------+
| USERID |
|--------|
| a |
+--------+
The following example shows non-standard usage. The projection list contains
something other than “*”. The output contains two columns, and the second column
contains either a value from the second table or NULL.
WITH
l AS (
SELECT 'a' AS userid
),
r AS (
SELECT 'b' AS userid
)
SELECT l.userid as UI_L,
r.userid as UI_R
FROM l LEFT JOIN r USING(userid)
;
+------+------+
| UI_L | UI_R |
|------+------|
| a | NULL |
+------+------+