DBMS Topics
Cost Estimation
Last Updated : 21 May, 2026
Cost estimation is the process of predicting the execution cost of a query plan without actually running it. The query optimizer uses cost estimates to compare alternativ
Overview
Cost estimation is the process of predicting the execution cost of a query plan without actually running it. The query optimizer uses cost estimates to compare alternative plans and select the cheapest one.
The primary cost metric is number of disk block transfers (I/Os), since disk access is orders of magnitude slower than CPU operations.
Notation
| Symbol | Meaning |
|---|---|
| n(R) | Number of tuples in relation R |
| b(R) | Number of blocks occupied by R |
| bfr(R) | Blocking factor — tuples per block |
| V(A, R) | Number of distinct values of attribute A in R |
| h_i | Height of B+ tree index |
| l_i | Number of leaf blocks in B+ tree index |
Cost of Basic Operations
1. Linear (Full) Scan
Reads every block of the relation.
2. Binary Search (on sorted file)
3. Primary Index (B+ Tree) on Key Attribute
4. Secondary Index on Non-Key Attribute
| Worst case: each matching tuple is in a different block | very expensive |
| Example | V(DeptID, Employee) = 10 departments, n(R) = 10,000 |
| If each is in a different block | Cost = 3 + 1,000 = 1,003 block reads |
5. Selection Cost with Simple Predicate
| Using primary B+ tree index | h_i + 1 |
| Using secondary index | h_i + n(R)/V(A,R) (worst case) |
| Full table scan | b(R) / 2 (average for equality, unsorted) |
| Binary search (sorted file) | ⌈log₂(b(R))⌉ + n(R)/(2×bfr(R)×V(A,R)) |
6. Sorting Cost (External Merge Sort)
| Example | b(R) = 1,000, M = 50 buffer pages |
| Runs after Phase 1 | ⌈1000/50⌉ = 20 runs |
| Merge passes | ⌈log₄₉(20)⌉ = 1 pass |
| Total | 2 × 1,000 × 2 = 4,000 block I/Os |
7. Join Cost Estimation
Nested Loop Join
R ⋈_θ S (R as outer, S as inner)
Cost = b(R) + n(R) × b(S) (block-level nested loop)
Better — Block Nested Loop:
Cost = b(R) + ⌈b(R)/(M−2)⌉ × b(S)
Where M = available memory buffer pages
Sort-Merge Join
Hash Join
| Phase 1 (Partition) | 2 × (b(R) + b(S)) |
| Phase 2 (Probe) | b(R) + b(S) |
| Total | 3 × (b(R) + b(S)) |
| Requirement | smaller relation must fit in memory after partitioning |
Cardinality Estimation for Join Result
|R ⋈_{R.A=S.B} S| = n(R) × n(S) / max(V(A,R), V(B,S))
Assumptions:
- Uniform distribution
- Attribute independence
Example:
n(Employee) = 10,000
n(Department) = 100
V(DeptID, Employee) = 100
V(DeptID, Department) = 100 (PK, so = n(Department))
|Employee ⋈ Department| = 10,000 × 100 / max(100, 100)
= 10,000 ✓ (each employee → 1 dept)
Statistics Maintenance
Statistics are stored in the system catalog and updated by
1. Automatic: DBMS updates stats periodically
2. Manual: DBA runs ANALYZE / UPDATE STATISTICS
SQL commands
ANALYZE TABLE Employee; -- MySQL
ANALYZE Employee; -- PostgreSQL
UPDATE STATISTICS Employee; -- SQL Server
Outdated stats → poor plan choices → slow queries!
Best practice: Run ANALYZE after bulk loads or major data changes.
Cost Summary Table
| Operation | Cost (block I/Os) |
|---|---|
| Full table scan | b(R) |
| B+ tree equality (primary) | h_i + 1 |
| B+ tree equality (secondary) | h_i + n(R)/V(A,R) |
| Binary search (sorted) | ⌈log₂(b(R))⌉ |
| External sort | 2 × b(R) × (passes + 1) |
| Hash join | 3 × (b(R) + b(S)) |
| Sort-merge join | ≈ 3 × (b(R) + b(S)) |
| Block nested loop join | b(R) + b(R)/M × b(S) |
Exam Focus
Revise definitions, diagrams, examples, and short-answer points for Cost Estimation.
Interview Use
Prepare one clear explanation, one practical example, and one common mistake for this DBMS topic.
Search Terms
dbms, database management system, database notes, sql, unit, cost, estimation, cost estimation
Related DBMS Topics