DBMS Topics
Query Optimization
Last Updated : 21 May, 2026
Query optimization is the process of selecting the most efficient execution plan from among all possible plans for a given SQL query. The optimizer's goal is to minimize
What is Query Optimization?
Query optimization is the process of selecting the most efficient execution plan from among all possible plans for a given SQL query. The optimizer's goal is to minimize the total cost (usually measured in disk I/O operations).
| Plan 1 | (A ⋈ B) ⋈ C — join A and B first, then with C |
| Plan 2 | A ⋈ (B ⋈ C) — join B and C first, then with A |
| Plan 3 | (A ⋈ C) ⋈ B — join A and C first (if possible), then B |
| (n tables | n! possible join orders, but smart pruning reduces this) |
Types of Query Optimization
1. Rule-Based (Heuristic) Optimization
Apply algebraic transformation rules to rewrite the query into a more efficient form, without calculating actual costs.
Key Heuristic Rules:
| Rule 1 | Push selections as early as possible (reduce rows early) |
| BEFORE | σ_{salary>70000}(Employee ⋈ Department) |
| AFTER | σ_{salary>70000}(Employee) ⋈ Department |
| Rule 2 | Push projections as early as possible (reduce columns) |
| BEFORE | π_{name, dept_name}(Employee ⋈ Department) |
| AFTER | π_{name, dept_id}(Employee) ⋈ π_{dept_id, dept_name}(Department) |
| Rule 3 | Most restrictive selection first |
| Rule 4 | Join order — smaller intermediate results first |
| Rule 5 | Use indexes when available |
| If an index exists on the WHERE predicate column | use it |
2. Cost-Based Optimization
Generate multiple candidate plans and estimate the cost of each. Choose the minimum-cost plan.
Cost Model
Total Cost = Disk I/O Cost + CPU Cost + Memory Cost
Disk I/O dominates for large data → most optimizers minimize I/O
Estimate for each plan
- Number of disk block reads for each operator
- Size (cardinality) of intermediate results
- Available memory for sorting / hashing
Cost Estimation Inputs
The optimizer uses statistics stored in the system catalog:
For each table T
- n(T) = number of tuples (rows)
- b(T) = number of blocks
- bfr(T) = blocking factor (rows per block)
- V(A, T) = number of distinct values of attribute A
- min/max values of each attribute
- Histogram of value distributions
For each index I
- Height of B+ tree (number of levels)
- Number of leaf blocks
Selectivity and Cardinality Estimation
Selectivity of a predicate = fraction of tuples that satisfy it.
Selectivity Formulas
Equality: A = v
sel = 1 / V(A, T) (assume uniform distribution)
Range: A > v
sel = (max(A) - v) / (max(A) - min(A))
Conjunction: P1 AND P2
sel = sel(P1) × sel(P2) (independence assumption)
Disjunction: P1 OR P2
sel = sel(P1) + sel(P2) - sel(P1) × sel(P2)
Expected output cardinality
|σ_P(T)| = sel(P) × n(T)
Join Cardinality Estimation
|R ⋈ S| (natural join on attribute A):
= |R| × |S| / max(V(A,R), V(A,S))
Intuition: each tuple in R matches |S| / V(A,S) tuples in S on average.
For foreign key join:
|Employee ⋈ Department| = |Employee|
(each employee belongs to exactly one department)
Plan Enumeration
For 2-3 Tables: Enumerate All Plans
| Plans | A ⋈ B or B ⋈ A (with different algorithms for each) |
| Plans | (A⋈B)⋈C, (A⋈C)⋈B, (B⋈C)⋈A, A⋈(B⋈C), B⋈(A⋈C), C⋈(A⋈B) |
| For each | choose join algorithm (NL, sort-merge, hash) |
For Many Tables: Dynamic Programming
Execution Plan Output (EXPLAIN)
EXPLAIN SELECT e.name, d.dept_name
FROM Employee e
JOIN Department d ON e.dept_id = d.dept_id
WHERE e.salary > 70000
ORDER BY e.name;
-- Example Output (MySQL):
+----+--------+------------+--------+------+------+--------+
| id | type | table | key | rows | filt | Extra |
+----+--------+------------+--------+------+------+--------+
| 1 | ALL | d | NULL | 10 | 100 | NULL |
| 1 | ref | e | idx_d | 5000 | 33 | Using |
| | | | | | | where; |
| | | | | | | filesort|
+----+--------+------------+--------+------+------+--------+
-- Interpretation:
-- d (Department): full table scan (small table, OK)
-- e (Employee): index ref on dept_id, filter salary > 70000
-- filesort: ORDER BY requires sorting (no covering index)Query Optimization Tips
-- 1. Ensure WHERE columns are indexed
CREATE INDEX idx_salary ON Employee(salary);
-- 2. Use covering indexes (index contains all needed columns)
CREATE INDEX idx_dept_name_sal ON Employee(dept_id, name, salary);
-- Query: SELECT name, salary WHERE dept_id = 1 — uses index only, no heap access
-- 3. Avoid functions on indexed columns in WHERE
-- BAD (index not used):
SELECT * FROM Employee WHERE YEAR(hire_date) = 2020;
-- GOOD (index used):
SELECT * FROM Employee WHERE hire_date BETWEEN '2020-01-01' AND '2020-12-31';
-- 4. Avoid SELECT * — project only needed columns
-- 5. Use LIMIT to avoid fetching unnecessary rows
SELECT name FROM Employee WHERE salary > 70000 LIMIT 10;Exam Focus
Revise definitions, diagrams, examples, and short-answer points for Query Optimization.
Interview Use
Prepare one clear explanation, one practical example, and one common mistake for this DBMS topic.
Search Terms
dbms, database management system, database notes, sql, unit, query, optimization, query optimization
Related DBMS Topics