03 Notes Advancedsql
03 Notes Advancedsql
03 Notes Advancedsql
Relational Languages
• User only needs to specify what they want (Declarative language i.e. SQL)
• Query optimizer figures out the best plan to get the answer
• Data definition language (DDL): How the database looks (i.e. schema)
History
• Edgar Codd published major paper on relational models
• IBM was the biggest party in Databases, so SQL became the standard
1
Lecture 03 15-445/645 Database Systems Advanced SQL
Aggregates
AVG, MIN, MAX, SUM, COUNT
• “Get # of students with a “@cs” login (all these queries are equivalent)
SELECT COUNT(*) FROM student WHERE login LIKE '%@cs'
2
Lecture 03 15-445/645 Database Systems Advanced SQL
• Having: filters output results after aggregation, Like a WHERE clause for a GROUP BY
SELECT AVG(s.gpa) AS avg_gpa, e.cid
FROM enrolled AS e, student AS s
WHERE e.sid = s.sid
GROUP BY e.cid
HAVING avg_gpa > 3.9;
String Operations
• Strings are case sensitive and single quotes only with some exceptions
Output redirection
• For when you want to store query results into another table and run followup queries
SELECT DISTINCT cid INTO CourseIds FROM enrolled
3
Lecture 03 15-445/645 Database Systems Advanced SQL
Output control
• ORDER BY used to order tuples based on column
ORDER BY <column*> [ASC|DESC]
Nested Queries
• Often difficult to optimize
• Scope of outer query is included in inner query (i.e. inner query can access attributes from
outer query)
4
Lecture 03 15-445/645 Database Systems Advanced SQL
Window Functions
• Performs calculation across set of tuples
• Allows you to group calculation into windows
SELECT cid, sid,
ROW_NUMBER() OVER (PARTITION BY cid)
FROM enrolled
ORDER BY cid
• Placing ORDER BY within OVER() makes result deterministic ordering of results even if database
changes internally
SELECT *,
ROW_NUMBER() OVER (ORDER BY cid)
FROM enrolled
ORDER BY cid
• RANK is done after you order, ROW NUMBER before you order
Conclusion
• SQL is not a dead language
• Strive to compute answers in one SQL query