DS 5110 – Lecture 3
SQL Part I
Roi Yehoshua
Agenda
The SQL Query Language
DDL commands
DML commands
Select commands
Set operations
Aggregate functions
Grouping
Join operations
2 Roi Yehoshua, 2024
SQL
Structured Query Language (SQL) is the standard relational databases language
SQL is a set-based, declarative programming language
It can be used to define the database structure, view or change the data
SQL became an ANSI and ISO standard in 1986, called SQL-86
Most commercial systems support SQL-92 features
One of the most demanded skills in Data Science
Source: KDnuggets
3 Roi Yehoshua, 2024
SQL Commands
SQL commands can be grouped into the following categories
SQL Commands
DDL DML DQL DCL TCL
(Data Definition (Data Manipulation (Data Query (Data Control (Transaction Control
Language) Language) Language) Language) Language)
CREATE INSERT SELECT GRANT COMMIT
ALTER UPDATE REVOKE ROLLBACK
DROP DELETE SAVEPOINT
RENAME MERGE SET TRANSACTION
TRUNCATE CALL
EXPLAIN PLAN
LOCK TABLE
4 Roi Yehoshua, 2024
SQL Extensions
Many SQL database programs have their own proprietary extensions that
add procedural programming language functionality, such as flow control constructs
Microsoft’s extension is called T-SQL (Transact SQL)
Oracle’s extension is called PL/SQL (Procedural Language/SQL)
MySQL extension is called SQL/PSM (SQL/Persistent Stored Module)
5 Roi Yehoshua, 2024
Basic SQL Syntax Rules
SQL is case insensitive
e.g., Select SELECT select
However, in MySQL the case makes difference in table names
All SQL statements end with a semicolon (;)
The semicolon is usually optional when executing a single SQL command
Strings are specified in single quotes only
In-line comments start with --
Multi-line comments start with /* and end with */
/* This is a multi-line
comment */
SELECT * FROM customers; -- This is an inline comment
See SQL style guide here
6 Roi Yehoshua, 2024
Data Definition Language (DDL)
SQL DDL commands allows you define the tables in the database including:
The schema of each table
The type of values associated with each attribute
Integrity constraints (e.g., primary and foreign keys)
Indexes
The physical storage structure of each table on disk
7 Roi Yehoshua, 2024
MySQL String Data Types
Data Type Description
CHAR(size) Fixed length string with a user-specified length (can be from 0 to 255)
VARCHAR(maxsize) Variable-length string with user-specified maximum length (can be from 0 to 65,535)
TINYTEXT A string with a maximum length of 255 characters
TEXT A string with a maximum length of 65,535 (~64KB) characters
MEDIUMTEXT A string with a maximum length of 16,777,215 (~16MB) characters
LONGTEXT A string with a maximum length of 4,294,967,295 (~4GB) characters
Differences between VARCHAR and TEXT:
You cannot specify the maximum length of TEXT fields
VARCHAR is stored inline with the table, while TEXT is usually stored off the table
In general, you should use VARCHAR for columns below 65,535 characters
8 Roi Yehoshua, 2024
MySQL Numeric Data Types
Data Type Description
TINYINT(size) An integer between -128 and 127
The maximum number of digits may be specified in parentheses.
BOOLEAN is synonym for TINYINT(1)
SMALLINT(size) An integer between -32768 and 32767
MEDIUMINT(size) An integer between -8388608 and 8388607
INT(size) An integer between -2147483648 and 2147483647
BIGINT(size) An integer between -9223372036854775808 and 9223372036854775807
FLOAT(size, d) A single-precision floating point number
The maximum number of digits may be specified in the size parameter. The maximum
number of digits to the right of the decimal point may be specified in the d parameter.
DOUBLE(size, d) A double-precision floating point number
REAL is synonym for DOUBLE
DECIMAL(size, d) A double stored as a string, allowing for a fixed decimal point
9 Roi Yehoshua, 2024
MySQL Other Data Types
Data Type Description
DATE A date in format: YYYY-MM-DD
DATETIME A date and time combination. Format: YYYY-MM-DD HH:MI:SS
TIMESTAMP A timestamp. Timestamp values are stored as the number of seconds since the Unix
epoch (1970-01-01 00:00:00 UTC). Format: YYYY-MM-DD HH:MI:SS
TIME A time. Format: HH:MI:SS
ENUM(x, y, z, …) Lets you enter a list of possible values. You can list up to 65,535 values in the enum list.
SET Similar to ENUM except that SET may contain up to 64 list items and can store more
than one choice.
BLOB Binary large object. Holds up to 65,535 bytes of data.
Can be used to store images.
LONGBLOB Binary large object. Holds up to 4,294,967,295 bytes of data.
10 Roi Yehoshua, 2024
Creating a Table
A table is defined using the CREATE TABLE command:
CREATE TABLE table_name (
column1 datatype [ constraints ],
column2 datatype [ constraints ],
...
);
Example:
CREATE TABLE employees (
employee_id INT,
first_name VARCHAR(20),
last_name VARCHAR(40),
email VARCHAR(50),
birth_date DATE,
salary DECIMAL(10, 2)
);
11 Roi Yehoshua, 2024
Creating a Table
To run SQL commands in Workbench, click the Create New SQL button in the toolbar
Type your command in the editor
Make sure you are located in the correct schema (it should be displayed in bold)
If it’s not the correct schema, double-click on the schema to change it
Or type USE [name of database] as the first line of your command
Then click Execute
12 Roi Yehoshua, 2024
Integrity Constraints
Ensure that changes to the database don’t result in loss of data or consistency
Declared as part of the CREATE TABLE command
Types of constraints
PRIMARY KEY
NOT NULL
UNIQUE
CHECK
FOREIGN KEY
13 Roi Yehoshua, 2024
Primary Key
The primary key uniquely identifies each record in the table
A table can have only one primary key
The primary key can consist of one or more columns
The columns of the primary key are automatically defined as unique and not null
The primary key can be defined as part of a column declaration or appear on its own
The second option is required if the primary key consists of multiple columns
CREATE TABLE employees ( CREATE TABLE employees (
employee_id INT PRIMARY KEY, employee_id INT,
first_name VARCHAR(20), first_name VARCHAR(20),
last_name VARCHAR(40), last_name VARCHAR(40),
email VARCHAR(50), email VARCHAR(50),
birth_date DATE, birth_date DATE,
salary DECIMAL(10, 2) salary DECIMAL(10, 2),
); PRIMARY KEY (employee_id)
);
14 Roi Yehoshua, 2024
Auto Increment
AUTO_INCREMENT can be used to generate a unique identity for new rows
It is often defined as an attribute of the primary key of the table
CREATE TABLE employees (
employee_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(20),
last_name VARCHAR(40),
email VARCHAR(50),
birth_date DATE,
salary DECIMAL(10, 2)
);
By default, the starting value of the field is 1, and increments by 1 for each new row
You can change the seed value by writing AUTO_INCREMENT=x at the end of CREATE TABLE
15 Roi Yehoshua, 2024
NOT NULL Constraint
By default, every column can hold NULL values
The NOT NULL constraint specifies that a column cannot have NULL values
This means you cannot insert a new row without specifying a value for this column
For example ensures that the first or last name of an employee are not NULL:
CREATE TABLE employees (
employee_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(20) NOT NULL,
last_name VARCHAR(40) NOT NULL,
email VARCHAR(50),
birth_date DATE,
salary DECIMAL(10, 2)
);
16 Roi Yehoshua, 2024
UNIQUE Constraint
Ensures that all the values in a column (or a combination of columns) are unique
A primary key constraint automatically implies a unique constraint
However, you may have many unique constraints per table
Columns declared as unique are permitted to be NULL
Unless they have explicitly been declared to be NOT NULL
For example, we can declare the email address of an employee to be unique
CREATE TABLE employees (
employee_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(20) NOT NULL,
last_name VARCHAR(40) NOT NULL,
email VARCHAR(50) UNIQUE,
birth_date DATE,
salary DECIMAL(10, 2)
);
17 Roi Yehoshua, 2024
CHECK Constraint
Used to limit the values that can be placed in a column or a set of column
May appear as part of the declaration of a column or on its own
For example, to ensure that the salary of an employee is not negative:
CREATE TABLE employees (
employee_id INT PRIMARY KEY AUTO_INCREMENT,
...
salary DECIMAL(10, 2) CHECK (salary >= 0)
);
A more complex check constraint that involves two columns:
CREATE TABLE orders (
order_id INT PRIMARY KEY AUTO_INCREMENT,
customer_id INT NOT NULL,
ship_address VARCHAR(60),
order_date DATETIME,
shipped_date DATETIME,
CHECK (shipped_date >= order_date)
);
18 Roi Yehoshua, 2024
Default Values
The DEFAULT constraint can be used to set a default value for a column
The default value will be added to all new records, if no other value is specified
The default value can use system functions like now() that returns current time
For example, we can set the default value of order_date to be now:
CREATE TABLE orders (
order_id INT PRIMARY KEY AUTO_INCREMENT,
customer_id INT NOT NULL,
ship_address VARCHAR(60),
order_date DATETIME DEFAULT NOW(),
shipped_date DATETIME,
CHECK (shipped_date >= order_date)
);
19 Roi Yehoshua, 2024
Foreign Keys
A foreign key is a column (or set of columns) in one table that refers to a primary key
in another table
The table with the foreign key is called the child table
The table with the primary key is called the referenced or parent table
A foreign key constraint is defined using the following syntax:
FOREIGN KEY (column1, column2, ...) REFERENCES table(column1, column2, ...)
The foreign key columns must match the number and data types of the columns in the parent table
Foreign keys enforce referential integrity of the data
A value in the foreign key column must be one of the values contained in the parent table
20 Roi Yehoshua, 2024
Foreign Keys
For example, let’s define a departments table:
CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(20),
building VARCHAR(15)
);
We can now define a foreign key from the employees to the departments table:
CREATE TABLE employees (
employee_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(20) NOT NULL,
last_name VARCHAR(40) NOT NULL,
email VARCHAR(50) UNIQUE,
birth_date DATE,
salary DECIMAL(10, 2) CHECK (salary >= 0),
department_id INT,
FOREIGN KEY (department_id) REFERENCES departments(department_id)
);
21 Roi Yehoshua, 2024
Cascading Actions in Foreign Keys
By default, when a referential integrity constraint is violated, the action is rejected
e.g., you cannot delete or update a row in the parent table if it has child rows linked to it
However, a foreign key constraint allows you to define CASCADE rules
ON DELETE CASCADE – when a row in the parent table is deleted, automatically delete its
corresponding child rows
ON UPDATE CASCADE – when the primary key column in the parent table is updated,
automatically update the foreign key column in the corresponding child rows
CREATE TABLE employees (
employee_id INT PRIMARY KEY AUTO_INCREMENT,
...
department_id INT,
FOREIGN KEY (department_id) REFERENCES departments(department_id)
ON DELETE CASCADE
ON UPDATE CASCADE
);
22 Roi Yehoshua, 2024
SQL Data Definition of the University Database
23 Roi Yehoshua, 2024
Modifying a Table Definition
ALTER TABLE allows you change the table definition
Add, delete or modify columns
Add or drop various constraints
To add a column to a table, use the following:
ALTER TABLE table_name
ADD column_name datatype;
To delete a column from a table, use the following:
ALTER TABLE table_name
DROP COLUMN column_name;
To change the data type of a column, use the following:
ALTER TABLE table_name
MODIFY COLUMN column_name datatype;
24 Roi Yehoshua, 2024
Dropping Tables
The DROP TABLE statement allows you to drop an existing table
DROP TABLE table_name;
Deletes both the data and the schema of the table
TRUNCATE TABLE deletes only the data in the table, but not the table itself
TRUNCATE TABLE table_name;
It’s faster than DELETE if you want to delete all the records from the table
25 Roi Yehoshua, 2024
Class Exercise
Create a new database called inventory
In this database define the following tables:
Categories table with the following columns:
category_id, category_name, description, picture
Products table with the following columns:
product_id, product_name, category_id, unit_price, units_in_stock
Make any reasonable assumptions about data types
Be sure to declare primary keys, foreign keys and other constraints
26 Roi Yehoshua, 2024
Solution
27 Roi Yehoshua, 2024
DML Commands
SQL DML commands allows you to manipulate the data in the database
Insertion of new rows into a given table
Deletion of rows in a given table
Updating values in a given table
28 Roi Yehoshua, 2024
INSERT INTO
The INSERT INTO statement is used to insert new records into a table
There are two ways to write the INSERT INTO statement
Specify both the column names and the values to be inserted to these columns
INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
Provide values for all the columns in the table, without specifying the column names
INSERT INTO table_name
VALUES (value1, value2, value3, ...);
The order of the values must be in the same order as the columns in the table
You can use NULL to skip a value for a specific column
You can use DEFAULT for an auto-increment field or to use the column’s default value (if defined)
29 Roi Yehoshua, 2024
INSERT INTO Examples
Adding a new employee to the employees table:
INSERT INTO employees
VALUES (DEFAULT, 'King', 'Robert', 'king.b@gmail.com', '1980-10-25', 80000, NULL);
or equivalently
INSERT INTO employees (first_name, last_name, email, birth_date, salary)
VALUES ('King', 'Robert', 'king.b@gmail.com', '1980-10-25', 80000);
You can view the data in the table by right clicking on it and select Rows – Limit 1000
30 Roi Yehoshua, 2024
INSERT INTO
To add multiple rows to a table at once, you can use the following syntax:
INSERT INTO table_name (column1, column2, column3, ...)
VALUES
(value1, value2, value3, ...),
(value1, value2, value3, ...),
...
(value1, value2, value3, ...);
In this syntax, instead of using a single list of values, you use multiple comma-
separated lists of values for insertion
31 Roi Yehoshua, 2024
UPDATE
The UPDATE statement is used to modify existing records in a table
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
The WHERE clause specifies which record(s) should be updated
If you omit the WHERE clause, all the records in the table will be updated!
For example, let’s update the salary of employee no. 1 to 90,000:
UPDATE employees
SET salary = 90000
WHERE employee_id = 1;
And then give him 10% raise:
UPDATE employees
SET salary = salary * 1.1
WHERE employee_id = 1;
32 Roi Yehoshua, 2024
DELETE
The DELETE statement is used to delete records from a table
DELETE FROM table_name
WHERE condition;
The WHERE clause specifies which record(s) should be deleted
If you omit the WHERE clause, all the records in the table will be deleted!
Example: delete employee no. 1
DELETE FROM employees
WHERE employee_id = 1;
33 Roi Yehoshua, 2024
Safe Updates
The MySQL session has by default the safe-updates option set
This means you can’t update or delete records without specifying the primary key in
the WHERE clause
You can opt out this option by specifying
SET SQL_SAFE_UPDATES = 0;
Be sure to set this back with SET SQL_SAFE_UPDATES = 1 when you’re done
34 Roi Yehoshua, 2024
Class Exercise
Insert some records to the products and categories tables
Then delete and update a record
35 Roi Yehoshua, 2024
SELECT Statement
The SELECT statement is used to select data from a table (or a set of tables)
Basic syntax of SELECT is:
SELECT column1, column2, ...
FROM table_name;
To select all the columns in the table, use the following syntax:
SELECT *
FROM table_name;
In general, you should avoid using SELECT *
Produces unnecessary I/O and network traffic between the database server and the application
May expose sensitive information to unauthorized users
The data returned is stored in a result table called the result set
36 Roi Yehoshua, 2024
SELECT Statement
Example: print the name and dept_name of all the instructors
37 Roi Yehoshua, 2024
The SELECT Clause
In addition to columns from tables, you can include in the SELECT clause other things
Literals, such as numbers or strings
Arithmetic expression, such as unit_price * 10
Built-in function calls, such as ROUND(unit_price, 2)
User-defined function calls
38 Roi Yehoshua, 2024
SQL Aliases
SQL aliases are used to rename a table or a column in a specific SQL query
An alias is created using the AS keyword
The syntax for a column alias is:
SELECT column_name AS alias_name
FROM table_name;
The syntax for a table alias is:
SELECT column_name(s)
FROM table_name AS alias_name;
Aliases are useful when:
Functions are used in the query
Column names are too long or not very readable
Two or more columns are combined together
There is more than one table involved in a query
39 Roi Yehoshua, 2024
SQL Aliases
Example:
40 Roi Yehoshua, 2024
DISTINCT
In some cases, a query might return duplicate rows of data
For example, selecting the department names from the instructor table
41 Roi Yehoshua, 2024
DISTINCT
The DISTINCT clause can be used to return only the distinct values from the query
SELECT DISTINCT column1, column2, ...
FROM table_name;
If more than one column is specified after DISTINCT, SQL uses the combination of all the
columns to determine the uniqueness of the row in the result set
42 Roi Yehoshua, 2024
Built-in Functions
MySQL provides many built-in functions
e.g., string, numeric, date functions
Each database server has its own set of built-in functions
These functions can be used in different parts of the query
For a full list of built-in function see the documentation
https://dev.mysql.com/doc/refman/8.0/en/functions.html
43 Roi Yehoshua, 2024
MySQL String Functions
Function Description
CHAR_LENGTH Returns the length of a string (in characters)
CONCAT Concatenates two or more strings together
LOWER (LCASE) Converts a string to lower-case
UPPER (UCASE) Converts a string to upper-case
INSTR (LOCATE) Returns the position of the first occurrence of a string in another string
REPLACE Replaces a sequence of characters in a string with another set of characters
INSERT Inserts a substring within a string at the specified position
REPEAT Repeats a string value a specified number of times
REVERSE Reverses a string and returns the result
LEFT Extracts a substring from a string (starting from left)
RIGHT Extracts a substring from a string (starting from right)
SUBSTRING Extracts a substring from a string (starting at any position)
TRIM Removes leading and trailing spaces from a string
STRCMP Compares two strings
44 Roi Yehoshua, 2024
MySQL String Functions
Example:
45 Roi Yehoshua, 2024
MySQL Numeric Functions
Function Description
ABS Returns the absolute value of a number
AVG Returns the average value of an expression
CEILING Returns the smallest integer value that is greater than or equal to a number
COUNT Returns the count of an expression
COS Returns the cosine of a number
EXP Returns e raised to the power of number
FLOOR Returns the largest integer value that is equal to or less than a number
MOD Returns the remainder of a number divided by another number
LOG Returns the natural logarithm of a number or the logarithm of a number to a specified base
POWER Returns m raised to the nth power
RAND Returns a random number between 0 (inclusive) and 1 (exclusive)
ROUND Returns a number rounded to a certain number of decimal places
46 Roi Yehoshua, 2024
MySQL Numeric Functions
Example:
47 Roi Yehoshua, 2024
MySQL Date Functions
Function Description
ADDDATE Adds a time/date interval to a date and then returns the date
ADDTIME Adds a time interval to a time/datetime and then returns the time/datetime
CURRENT_DATE Returns the current date
CURRENT_TIME Returns the current time
NOW Returns the current date and time
DATEDIFF Returns the number of days between two dates
TIMEDIFF Returns the difference between two time/datetime expressions
EXTRACT Extract a part from a given date
DAY Returns the day of the month for a given date
WEEK Returns the week number for a given date
WEEKDAY Returns the weekday number for a given date
MONTH Returns the month part for a given date (a number from 1 to 12)
YEAR Returns the year part for a given date
48 Roi Yehoshua, 2024
MySQL Date Functions
Example:
49 Roi Yehoshua, 2024
MySQL Advanced Functions
Function Description
CAST (CONVERT) Converts a value (of any type) into a specified data type
COALESCE Returns the first non-null value in a list
IF Returns a value if a condition is TRUE, or another value if a condition is FALSE
IFNULL Returns a specified value if the expression is NULL, otherwise return the expression
ISNULL Returns 1 or 0 depending on whether an expression is NULL
DATABASE Returns the name of the current database
USER Returns the current MySQL user name and host name
VERSION Returns the current version of the MySQL database
50 Roi Yehoshua, 2024
MySQL Advanced Functions
Example:
51 Roi Yehoshua, 2024
CASE Expression
Evaluates a list of conditions and returns a value when the first condition is met
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
WHEN conditionN THEN resultN
ELSE result
END;
If there is no ELSE part and no conditions are true, it returns NULL
Can be used in different parts of the query (SELECT, WHERE, ORDER BY, etc.)
52 Roi Yehoshua, 2024
CASE Expression
Example: print the status of each student according to their grade
53 Roi Yehoshua, 2024
The WHERE Clause
The WHERE clause is used to filter records
It specifies conditions that the result set must satisfy
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Example: Find the names of all instructors in the Comp. Sci. department
54 Roi Yehoshua, 2024
The WHERE Clause
Operators that can be used in WHERE clause:
Operator Description
= Equal
> Greater than
< Less than
>= Greater than or equal
<= Less than or equal
<> Not equal
BETWEEN Between a certain range
IN Test if a value is within a list of values
LIKE Search for a pattern in a string value
55 Roi Yehoshua, 2024
The BETWEEN Operator
The BETWEEN operator selects values within a given range
The values can be numbers, text, or dates
The BETWEEN operator is inclusive: begin and end values are included
Example: The name and salary of instructors with a salary between 80K and 100K
56 Roi Yehoshua, 2024
The IN Operator
The IN operator allows you to specify multiple values in the WHERE clause
Example: Print the title and dept_name of all the courses in the Music, Physics and
Biology departments
57 Roi Yehoshua, 2024
Logical Operators
The WHERE clause can use the logical operators AND, OR and NOT
Example: Print the name and salary of all instructors in the Comp. Sci. department
with a salary greater than $70,000
58 Roi Yehoshua, 2024
The LIKE Operator
The LIKE operator allows you to perform pattern matching on strings
MySQL provides two wildcard characters for constructing patterns:
percentage (%) matches any string of 0 or more characters
underscore (_) matches any single character
Examples:
's%' matches any string beginning with the character 's'
'%Comp%' matches any string containing "Comp" as a substring
'_ _ _' matches any string of exactly three characters
'_ _ _ %' matches any string of at least three characters
To include the special pattern characters, we escape them using a backslash \
e.g., 'ab\%cd%' matches any string beginning with 'ab%cd’
In MySQL the patterns are case insensitive
59 Roi Yehoshua, 2024
The LIKE Operator
Find the names of all the students whose name start with the letter ‘B’ or ‘S’
60 Roi Yehoshua, 2024
NULL Values
A field with a NULL value is a field that has no value
The value may be unknown or doesn’t exist
The result of any arithmetic expression involving NULL is NULL
e.g., 5 + NULL returns NULL
You cannot test for NULL values with comparison operators, such as =, <, or <>
SQL returns an unknown result for each comparison with NULL
You have to use the IS NULL and IS NOT NULL operators instead
61 Roi Yehoshua, 2024
NULL Values
Example: In the sakila database, select all the rentals that were not returned (have
no return_date)
62 Roi Yehoshua, 2024
The ORDER BY Clause
The ORDER BY clause is used to sort the result set
By default it sorts the records in ascending order
To sort the records in descending order, use the DESC keyword
The sort can be performed on multiple columns
SELECT column1, column2, ...
FROM table_name
ORDER BY column1 ASC|DESC, column2 ASC|DESC, ...;
63 Roi Yehoshua, 2024
The ORDER BY Clause
Example: sort the instructors first by their department name and then by their name
64 Roi Yehoshua, 2024
The ORDER BY Clause
You can also use an expression as the sort column
Example: In sakila, sort the rentals table by the month of the return date
65 Roi Yehoshua, 2024
LIMIT
LIMIT n specifies that only the first n rows from the result set should be output
This clause is typically used together with ORDER BY to fetch the top n records
Example: Print the name and salary of the top three earning instructors
66 Roi Yehoshua, 2024
Class Exercise
Show the IDs and names of the three students in Comp. Sci. with the highest number
of credit points
67 Roi Yehoshua, 2024
Solution
68 Roi Yehoshua, 2024
Set Operators
You can combine the result sets of two or more queries using set operators
The three set operations are UNION, INTERSECT and EXCEPT
MySQL support only the UNION operator
When performing set a set operation, the following guidelines must apply:
Every select must have the same number of columns
The corresponding columns must have similar data types
The UNION operator syntax:
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
69 Roi Yehoshua, 2024
Set Operators
Example: Find all the courses taught either in Fall 2017 or Spring 2017
70 Roi Yehoshua, 2024
The UNION ALL Operator
The UNION operator selects only distinct values by default
To allow duplicate values, use UNION ALL:
SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;
71 Roi Yehoshua, 2024
Aggregate Functions
Aggregate functions take a set of values of a column and return a single value
SQL offers 5 built-in aggregate functions:
Function Description
MAX Returns the maximum value within a set
MIN Returns the minimum value within a set
AVG Returns the average value across a set
SUM Returns the sum of the values across a set
COUNT Returns the number of values in a set
72 Roi Yehoshua, 2024
Aggregation with Null Values
All aggregate functions except COUNT(*) ignore NULL values in their input
COUNT(*) returns the number of rows in the table, regardless of NULL values
COUNT(column) returns the number of non-null values in the specified column
73 Roi Yehoshua, 2024
Counting Distinct Values
Sometimes we want to eliminate duplicates before computing an aggregate function
In this case, we can use DISTINCT inside the call to the function
Example: Find the total number of instructors who taught a course in Spring 2017
Each instructor should only count once, regardless of the number of sections they teach
74 Roi Yehoshua, 2024
GROUP BY
The GROUP BY clause groups rows the have the same values in specified column(s)
Often used with an aggregate function that is applied over all the rows in each group
The syntax of GROUP BY is:
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s);
75 Roi Yehoshua, 2024
GROUP BY
Example: Find the average salary of instructors in each department
76 Roi Yehoshua, 2024
GROUP BY
Find the number of different courses (not sections) taught by each instructor
77 Roi Yehoshua, 2024
The Single Value Rule
All non-aggregated attributes in the SELECT should also appear in the GROUP BY
Most databases (except for MySQL) throw an error if this rule is not followed
For example, consider the following query:
SELECT id, dept_name, AVG(salary)
FROM instructor
GROUP BY dept_name;
Each instructor in a particular group (defined by dept_name) can have a different ID
Since only one row is output for each group, there is no unique way of choosing which ID
value to output
78 Roi Yehoshua, 2024
Grouping by Multiple Columns
In some cases, you may want to generate groups based on more than one column
For each section show the course id, section id, year, semester and students number
79 Roi Yehoshua, 2024
Grouping by Expressions
You can also build groups based on values computed by expressions
Example: For each English letter show how many student names start with that letter
80 Roi Yehoshua, 2024
The HAVING Clause
HAVING allows you to specify conditions that filter the groups created by GROUP BY
In contrast to WHERE that defines conditions on the selected rows
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
81 Roi Yehoshua, 2024
The HAVING Clause
Example: print the names of the departments where the average salary > 75K
82 Roi Yehoshua, 2024
The HAVING Clause
All non-aggregated columns in HAVING must appear in the GROUP BY clause
For example, the following SQL statement is invalid:
83 Roi Yehoshua, 2024
The HAVING Clause
The filter on the name instructor should be written in a WHERE clause instead:
84 Roi Yehoshua, 2024
Class Exercise
Show the courses that have been taken by at least 3 students
Show the course ID and number of students
85 Roi Yehoshua, 2024
Solution
86 Roi Yehoshua, 2024
SQL Joins
A join operation is used to combine rows from two or more tables
e.g., find the names of all instructors that teach the course ‘Robotics’
There are different types of joins:
Cross join
Inner join
Natural join
Self join
Outer join
87 Roi Yehoshua, 2024
Cross Join
A cross join returns the Cartesian product of rows from the tables in the join
i.e., it combines each row from the first table with each row from the second table
Two ways to define a cross join:
Use the JOIN keyword between the table names
SELECT column_name(s)
FROM table1
JOIN table2;
List the tables with a comma between them (older syntax)
SELECT column_name(s)
FROM table1, table2;
88 Roi Yehoshua, 2024
Cross Join
Example: cross join the instructor and the teaches tables
89 Roi Yehoshua, 2024
Inner Join
An inner join selects only rows that have matching values in both tables
Two ways to define an inner join:
Using the JOIN keyword with ON clause
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;
The keyword INNER is optional, since the default join type is inner
List the tables with a comma between and use WHERE to define the join condition
SELECT column_name(s)
FROM table1, table2
WHERE table1.column_name = table2.column_name;
The first syntax should be preferred, because it encourages separation of concerns
90 Roi Yehoshua, 2024
Inner Join
Example: show all the instructors and the courses that they have taught
91 Roi Yehoshua, 2024
Renaming Tables
Aliases are often used in JOIN statements to rename the tables
Conventions:
The alias name should be the first letter of each word in the table’s name
If there is already an alias with the same name then append a number (e.g., t1, t2)
Always include the AS keyword – makes it easier to read the query
92 Roi Yehoshua, 2024
Joining More than Two Tables
Example: Print the names of the instructors and names of all the students they have
taught (make sure there are no duplicates in the result)
93 Roi Yehoshua, 2024
Class Exercise
Find the names of all the students that took the course 'Robotics'
94 Roi Yehoshua, 2024
Solution
95 Roi Yehoshua, 2024
Natural Join
A natural join joins two tables based on the common columns in the two tables
Each common column will appear only once in the result
SELECT column_name(s)
FROM table1
NATURAL JOIN table2;
96 Roi Yehoshua, 2024
Natural Join
Example: natural join between instructor and teaches
Beware of unrelated attributes with same name that may get equated incorrectly
97 Roi Yehoshua, 2024
Self-Join
A self-join is a join of a table with itself
It allows you to compare rows within the same table
You must use the rename operator in a self-join
SELECT column_name(s)
FROM table1 AS t1
JOIN table1 AS t2
ON t1.column_name = t2.column_name;
98 Roi Yehoshua, 2024
Self-Join
Example: Find all the instructors whose salary is higher than Katz’s salary
99 Roi Yehoshua, 2024
Outer Join
Outer join allows you to include unmatched rows from one or both of the tables
The unmatched rows will have NULL values for the attributes coming from the other table
There are three forms of outer join
Left outer join includes unmatched rows from the left table
Right outer join includes unmatched rows from the right table
Full outer join includes unmatched rows from both tables
The syntax for outer join:
SELECT column_name(s)
FROM table1
[LEFT/RIGHT/FULL] OUTER JOIN table2
ON table1.column_name = table2.column_name;
100 Roi Yehoshua, 2024
Left Outer Join
For example, let’s say we want to show all instructors and the courses they taught
An inner join won’t show instructors that have not taught any course
If we also want to get these instructors in the result, we can use a left outer join:
101 Roi Yehoshua, 2024
Right Outer Join
We can write a similar query using a right outer join with the tables order reversed:
102 Roi Yehoshua, 2024
Full Outer Join
The full outer join is a combination of the left and right outer join types
Example: display a list of all instructors and their departments
All departments must be displayed, even if they have no instructor assigned to them
All instructors must be displayed, even if they don’t belong to any department
SELECT *
FROM instructor AS i
FULL OUTER JOIN department AS d
ON i.dept_name = d.dept_name;
103 Roi Yehoshua, 2024
Full Outer Join
Some databases like MySQL don’t support full outer join
In such databases, we can emulate full outer join using UNION:
104 Roi Yehoshua, 2024
Class Exercise
Find the ID and names of instructors who have not taught any course
105 Roi Yehoshua, 2024
Solution
106 Roi Yehoshua, 2024
Summary: Join Types
107 Roi Yehoshua, 2024