Query Optimization in a database system is the process of choosing the most efficient way to
execute a database query. It involves evaluating multiple possible query execution plans and
selecting the one with the lowest cost (in terms of time, memory, or CPU usage).
Why is Query Optimization Important?
• To reduce query response time.
• To minimize resource usage (CPU, memory, disk I/O).
• To improve overall database performance, especially for large datasets and complex
queries.
How It Works:
When a query is submitted:
1. Parser checks syntax and semantics.
2. Query Optimizer generates multiple execution strategies (query plans).
3. Cost Estimation: The optimizer estimates the cost of each plan using statistics (e.g.,
table size, index usage).
4. Best Plan Selection: The lowest-cost plan is selected and passed to the execution engine.
Techniques Used in Query Optimization:
1. Use of Indexes – Faster access to rows.
2. Join Optimization – Choosing efficient join order and method (nested loop, hash join,
merge join).
3. Predicate Pushdown – Filtering rows as early as possible.
4. Query Rewriting – Simplifying or transforming queries for better performance.
5. Materialized Views – Using precomputed results.
Example Original Query:
SELECT * FROM orders WHERE customer_id IN (SELECT customer_id FROM customers WHERE
country = 'Germany');
Optimized Version:
SELECT o.*
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.country = 'Germany';
Using a join here may be more efficient, especially if indexes exist on customer_id.
Components of Query Optimizer in Database Systems
The Query Optimizer is a critical part of a Database Management System (DBMS)
responsible for determining the most efficient way to execute a query. Here are the main
components of a query optimizer:
1. Query Parser and Translation
• Role: Converts the SQL query into an internal representation (often a parse tree or
abstract syntax tree).
• Checks: Syntax errors and semantic errors (e.g., verifying table and column names).
• Output: Parse tree used by the optimizer.
2. Transformation
• Role: Transforms the original query into an equivalent but potentially more efficient
version.
• Techniques:
o View expansion
o Subquery flattening
o Predicate pushdown
• Example:
o Converts SELECT * FROM view_name into the actual query definition of the view.
3. Plan Generator
• Role: Converts the parsed query into a logical query plan.
• Logical Plan: Represents operations like selection, projection, joins, etc., without
specifying how they are implemented.
4. Cost Estimator / Optimization
• Role: Estimates the cost (time, memory, I/O) of different query plans.
• Uses:
o Table statistics (e.g., number of rows, distinct values)
o Index availability
o Data distribution
• Cost Factors:
o Disk I/O
o CPU usage
o Memory usage
o Network cost (in distributed systems)
5. Physical Plan Generator / Execution & Feedback
• Role: Transforms logical plans into physical query plans, which specify actual
algorithms and access methods.
• Examples:
o Join methods: Nested Loop, Merge Join, Hash Join
o Access paths: Table scan, Index scan, Index seek
6. Plan Selection
• Role: Chooses the best plan based on cost estimates.
• Output: The final execution plan, which is passed to the query executor.
Apply the Query Optimizer steps:
SELECT * FROM Product WHERE Category = 10;
Step-by-Step Explanation of Query Optimizer Processes
1. Parsing & Translation
• Parsing: The DBMS checks the query for syntax errors and ensures that:
o Table Product exists.
o Column Category exists in the Product table.
o Translation: The query is converted into an internal form.
2. Transformation
• The optimizer simplifies or rewrites the query if needed.
• Since this is a simple SELECT with a WHERE condition, no major transformation is
required.
• In more complex queries, it might rearrange joins or push down selections.
3. Optimization
• The optimizer analyzes:
o How many rows in Product have Category = 10.
o If there is an index on the Category column.
• It chooses between:
o Index Scan (if Category is indexed).
o Table Scan (if there’s no index or if many rows match).
4. Plan Generation
• The DBMS creates execution plans based on possible access methods:
o Full Table Scan (slower for large tables).
o Index Seek or Index Scan (faster if index exists).
Example Plan:
• Index Seek on Category → Fetch matching rows.
5. Plan Selection
• The optimizer estimates the cost of each plan using statistics (like number of rows, data
distribution).
• Chooses the most efficient plan (least I/O, CPU).
6. Execution & Feedback
• The selected plan is executed.
• The engine fetches all records from Product where Category = 10.
• If adaptive optimization is enabled, feedback (e.g., actual row count) may be stored to
improve future queries.
Poor Performance Query (Without Index)
Suppose you want to retrieve all products in the "Electronics" category:
SELECT * FROM products WHERE category = 'Electronics';
What's the Problem?
• If there is no index on the category column:
o The DBMS will scan all 2 million rows (a full table scan) to find matching
rows.
o This is slow, especially if only a few thousand rows match.
Optimized Solution Using Index
You can speed this up by creating an index on the category column:
CREATE INDEX idx_category ON products(category);
Now, when you run:
SELECT * FROM products WHERE category = 'Electronics';
What Happens Now:
• The DBMS uses the index on category to quickly locate the matching rows, skipping
over millions of unrelated rows.
• This avoids a full scan and significantly reduces execution time.
Performance Comparison:
Aspect Without Index With Index
Scan Type Full Table Scan Index Scan
Rows Scanned 2,000,000 Few thousand (matched)
Speed Slow Fast
CPU + I/O Usage High Low
Components of the Optimizer:
1. Query Transformer
• What it does: It rewrites the SQL query into a better or simpler form without changing
its meaning.
• Example: Converts subqueries to joins or removes unnecessary parts.
2. Estimator
• What it does: It guesses how much data will be processed (like how many rows will be
returned).
• How it works: Uses statistics from the Data Dictionary (which stores data about the
tables, indexes, etc.).
• Why it's useful: Helps choose the fastest method for executing the query.
3. Plan Generator
• What it does: Based on the estimates, it creates different possible query plans (ways to
execute the query).
• Then it selects the best one, i.e., the most efficient and fastest.
Data Dictionary
• What it is: A storage of metadata (data about the database objects like tables, indexes,
columns, etc.).
• The Estimator uses this to get useful info like table sizes, indexes available, and row
counts.