New Subquery Optimizations in MySQL 6
New Subquery Optimizations in MySQL 6
New Subquery Optimizations in MySQL 6
optimizations
in MySQL 6.0
Presented by,
MySQL AB® & O’Reilly Media, Inc.
Sergey Petrunia
sergefp@mysql.com
Background: subquery processing before 6.0
FROM subqueries are pre-materialized (early)
Scalar-context subqueries use straightforward
evaluation
Predicate subqueries
May perform two kinds of rewrites
Then use straightforward evaluation
Properties
No optimization (can get some if you define/use a VIEW equivalent
to subquery)
EXPLAIN command runs the subquery and thus can be very slow
Straightforward subquery evaluation
Used for all kinds of
subqueries other than
FROM:
expr IN (SELECT ...)
EXISTS (SELECT ...)
expr (SELECT.... )
scalar context
subqueries
Subquery is optimized once,
all re-evaluations are done
using the same plan
Uncorrelated subqueries
are evaluated only once
Straightforward subquery evaluation (contd)
SELECT ... FROM outer_tbl1,outer_tbl2
WHERE expr IN (SELECT inner_expr
FROM inner_tbl1, inner_tbl2 WHERE ... )
IN→EXISTS transformation:
OuterExpr IN (SELECT InnerExpr FROM ...
WHERE subq_where)
→
EXISTS (SELECT 1 FROM ...
WHERE subq_where AND
InnerExpr = OuterExpr)
Things to note
Uncorrelated subquery becomes correlated
This is a simplifed description, not counting cases with NULLs
Subquery rewrites: MIN/MAX (2)
“Inform the subquery about which part of its resultset we're
interested in”
MIN/MAX Transformation
OuterExpr > ALL(SELECT InnerExpr FROM ...)
→
OuterExpr > (SELECT MAX(InnerExpr)FROM ... )
handles all similar cases with
OuterExpr (SELECT...)
=> semi-join is like inner join but we need some way to remove
the duplicates
Semi-join strategy #1: Table pullout
If a subquery table is functionally dependent on the parent query
tables, it can be “pulled out” of the subquery
is converted into
SELECT City.Name FROM City, Country
WHERE City.Country = Country.Code AND
Country.SurfaceArea < 2K)
•
If the subquery has several tables, will pull out those tables that don't
generate duplicate matches
Semi-join strategy #1: Table pullout: example
EXPLAIN EXTENDED SELECT City.Name FROM City
WHERE City.Country IN (SELECT Country.Code FROM Country
WHERE Country.SurfaceArea < 10);
SHOW WARNINGS;
In MySQL 4.1/5.x :
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY City ALL NULL NULL NULL NULL 4079 Using where
DEPENDENT PRIMARY,
2 Country unique_subquery PRIMARY 3 func 1 Using where
SUBQUERY SurfaceArea
In MySQL 6.0 :
id select_type table type possible_keys key key_len ref rows Extra
PRIMARY, Using index condition;
1 PRIMARY Country range SurfaceArea 4 NULL 3
SurfaceArea Using MRR
1 PRIMARY City ref Country Country 3 Country.Code 18
select `world`.`City`.`Name` AS `Name` from `world`.`Country` join `world`.`City`
where ((`world`.`City`.`Country` = `world`.`Country`.`Code`) and (`world`.`...
Semi-join strategy #1: Table pullout: summary
In two words, this is subquery-to-join conversion
Properties
It is rule-based, pullout is done whenever possible
It enables the join optimizer to make a cost-based choice
from a greater variety of query plans (including a plan that is
eqivalent to pre-6.0 server strategy)
Applicability
Pullout is done before any other semi-join strategy
considerations
Can handle correlated subqueries (analogous functionality
in PostgreSQL, surprisingly, doesn't)
Can handle arbitarily deep subquery nesting
Semi-join strategy #2: FirstMatch
Short-cut enumeration of subquery tables as soon as we get
first matching row combination
SELECT * FROM ot1,ot2,nt1, ...
WHERE expr(ot1,ot2) IN (SELECT ... FROM it1,it2 ...)
Semi-join strategy #2: FirstMatch: example
EXPLAIN EXTENDED
SELECT Name FROM Country
WHERE
Country.Continent='Europe' AND
Country.Code IN (SELECT City.Country FROM City
WHERE City.ID != Country.Capital AND
Population > 1M)
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY Country ref PRIMARY, Continent Continent 21 const 8 Using index condition;
Using where;
1 PRIMARY City ref Population, Country Country 3 Country.Code 18
FirstMatch(Country)
@@optimizer_switch =
'no_semijoin,no_materialization'
Already seeing a need for hints but no WL entry for this yet
thinking of syntax like
outer_expr IN (SELECT no_materialize ...)
Benchmarking new optimizations
A look at standard benchmarks: DBT-{1,2,3}
DBT-3 has 10 subquery cases
Of which 8 are not covered by new optimizations (2 are
covered)
Query #18: covered (materialization), execution times:
Engine Query time
MySQL 6.0, no new optimizations > 3 hours
MySQL 6.0, materialization 3.76 sec ~1800 times
PostgreSQL 6.52 sec faster now
Smart choice
between
semi-join and
NULL handling materialization
FROM flattening
Semi-join
W
PR OR
O K
G IN
FROM subquery flattening RE
SS
Merge the FROM subquery into the upper join
SELECT ...
FROM (SELECT * FROM inner_tbl WHERE ...) tbl,
...
WHERE tbl.col='foo' AND ...
'bar'
'foo' IN (SELECT col FROM 'baz' ) = NULL
NULL
References
6.0 Subquery optimizations cheatsheet
http://forge.mysql.com/wiki/6.0_Subquery_Optimization_Cheatsheet
Technlical specs: Subquery optimizations: semijoin:
WL#3985 and its subtasks
http://forge.mysql.com/worklog/task.php?id=3985
Technical specs: Subquery optimizations:
materialization: WL#1110
http://forge.mysql.com/worklog/task.php?id=1110
MySQL 6.0 Subquery optimization benchmarks
http://forge.mysql.com/wiki/6.0_Subquery_Optimization_Benchmarks
Observations and news about subquery development
http://s.petrunia.net/blog/
The end
Thank you
Q&A