Unit 5 Summary — Storage & Indexing | DBMS Notes

Quick Reference

File Organization Comparison

Organization	Insert	Search	Range
Heap	O(1)	O(N)	Poor
Sequential	O(N)	O(log N)	Excellent
Hashed	O(1)	O(1)	Poor
Clustered	O(log N)	O(log N)	Excellent

Index Types

Type	Built On	Density	Order Required
Primary	Ordering key	Sparse	Yes
Clustering	Non-key ordering field	Sparse	Yes
Secondary	Any non-ordering field	Dense	No

Dense vs. Sparse Index

Dense: One entry per RECORD → larger index, faster exact lookup
Sparse: One entry per BLOCK → smaller index, needs intra-block scan
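The difference can be sketched in a few lines of Python. This is a minimal illustration, not how any particular DBMS stores its indexes: the `blocks` list, the sample keys, and the function names are all hypothetical. It assumes the data file is sorted on the key, which a sparse index requires.

```python
from bisect import bisect_right

# Hypothetical sorted data file: each inner list is one disk block.
blocks = [
    [(5, "a"), (12, "b"), (19, "c")],
    [(23, "d"), (31, "e"), (40, "f")],
    [(47, "g"), (52, "h"), (68, "i")],
]

# Sparse index: one entry per block (the first key stored in that block).
sparse_index = [block[0][0] for block in blocks]

# Dense index: one entry per record, mapping key -> (block number, slot).
dense_index = {
    key: (b, s)
    for b, block in enumerate(blocks)
    for s, (key, _) in enumerate(block)
}

def sparse_lookup(key):
    """Pick the last block whose first key is <= key, then scan inside it."""
    b = bisect_right(sparse_index, key) - 1
    if b < 0:
        return None  # key is smaller than every indexed key
    for k, value in blocks[b]:  # intra-block scan
        if k == key:
            return value
    return None

def dense_lookup(key):
    """Jump straight to the record; no scan inside the block is needed."""
    pos = dense_index.get(key)
    if pos is None:
        return None
    b, s = pos
    return blocks[b][s][1]
```

Here the sparse index holds 3 entries and the dense index holds 9, which mirrors the size trade-off stated above.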

B-Tree vs B+ Tree

B-Tree: Data in ALL nodes (internal + leaf); leaves not linked
B+ Tree: Data ONLY in leaf nodes; leaves linked as a sorted list → internal nodes hold only keys → higher fan-out → shorter tree → the default index structure in most relational DBMSs

B+ Tree Operations Cost

Search	O(log_n N) ≈ 3–4 disk reads for millions of records (n = fan-out, N = number of records)
Insert	O(log_n N) + occasional splits
Delete	O(log_n N) + occasional merges
Range	O(log_n N + k) where k = result set size
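The "3–4 disk reads" figure follows from the logarithm base being the fan-out. The sketch below is a rough model only: it assumes every node, including leaves, holds `fan_out` entries and is completely full, and the fan-out of 100 is an illustrative assumption rather than a value from these notes.

```python
def btree_height(num_records, fan_out):
    """Smallest h with fan_out**h >= num_records, i.e. ceil(log_fan_out(N)).

    Integer arithmetic is used to avoid floating-point rounding in log().
    """
    height = 0
    capacity = 1
    while capacity < num_records:
        capacity *= fan_out
        height += 1
    return height

print(btree_height(1_000_000, 100))   # 3
print(btree_height(10_000_000, 100))  # 4
print(btree_height(1_000_000, 2))     # 20 (a binary tree, for comparison)
```

A binary search tree over the same million records would need about 20 levels, which is why high fan-out matters when every level costs a disk read.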

Hashing

Static	Fixed number of buckets B; fast for stable data; overflow chains lengthen as data grows
Extendible	Directory doubles when a full bucket's local depth equals the global depth; O(1) search; directory overhead
Linear	No directory; buckets split one at a time in round-robin order
All hashing	O(1) average-case equality search; POOR range queries
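The extendible scheme is the least obvious of the three, so here is a small in-memory sketch. It is an assumption-laden toy, not a reference implementation: the class names, the bucket capacity of 2, and the choice to index the directory by the low-order bits of Python's built-in `hash()` are all illustrative. It also assumes keys do not all share the same hash bits, since that would make a bucket split forever.

```python
class Bucket:
    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth
        self.capacity = capacity
        self.items = {}

class ExtendibleHash:
    def __init__(self, bucket_capacity=2):
        self.global_depth = 1
        self.capacity = bucket_capacity
        self.directory = [Bucket(1, bucket_capacity), Bucket(1, bucket_capacity)]

    def _slot(self, key):
        # Directory index = low global_depth bits of the hash value.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        # One directory probe plus one bucket probe: O(1) equality search.
        return self.directory[self._slot(key)].items.get(key)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket is full: split it and retry

    def _split(self, bucket):
        # The directory doubles only if the bucket already uses all its bits.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth, self.capacity)
        new_bit = 1 << (bucket.local_depth - 1)
        # Slots that pointed at the old bucket and have the new bit set
        # are redirected to the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling
        # Redistribute the old bucket's records between the two buckets.
        old_items = bucket.items
        bucket.items = {}
        for k, v in old_items.items():
            self.directory[self._slot(k)].items[k] = v

table = ExtendibleHash(bucket_capacity=2)
for k in range(20):
    table.put(k, f"rec{k}")
```

Note that only the overflowing bucket is split; every other bucket stays where it is, and several directory slots may keep pointing at the same bucket.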

Query Processing Steps

SQL → Parse → Translate to Rel. Algebra → Optimize → Execute

Join Algorithm Costs

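Notation: b(R) = blocks in R, n(R) = tuples in R, M = memory buffer blocks, h_i = height of the index on S; R is the outer relation
Algorithm	Cost (block transfers)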
Nested Loop	b(R) + n(R) × b(S)
Block Nested Loop	b(R) + ⌈b(R)/(M−2)⌉ × b(S)
Sort-Merge	≈ 3(b(R) + b(S))
Hash Join	3(b(R) + b(S))
Index NL Join	b(R) + n(R) × (h_i + 1)
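To see how far apart these formulas land, they can be plugged into small Python functions. The relation sizes used below (100 and 400 blocks, 1,000 outer tuples, 12 buffer blocks, index height 3) are made-up numbers chosen for easy arithmetic, and the function names are illustrative.

```python
import math

def nested_loop(b_r, n_r, b_s):
    # Tuple-at-a-time: scan S once for every tuple of R.
    return b_r + n_r * b_s

def block_nested_loop(b_r, b_s, m):
    # Scan S once per chunk of (M - 2) outer blocks.
    return b_r + math.ceil(b_r / (m - 2)) * b_s

def sort_merge(b_r, b_s):
    return 3 * (b_r + b_s)

def hash_join(b_r, b_s):
    return 3 * (b_r + b_s)

def index_nested_loop(b_r, n_r, h_i):
    # One index traversal plus one record fetch per outer tuple.
    return b_r + n_r * (h_i + 1)

b_r, n_r, b_s, m, h_i = 100, 1000, 400, 12, 3
print(nested_loop(b_r, n_r, b_s))        # 400100
print(block_nested_loop(b_r, b_s, m))    # 4100
print(sort_merge(b_r, b_s))              # 1500
print(hash_join(b_r, b_s))               # 1500
print(index_nested_loop(b_r, n_r, h_i))  # 4100
```

With these numbers the tuple-at-a-time nested loop is roughly 100 times more expensive than the block variant, which is the usual argument for never using it on large inputs.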

Key Cost Estimation Formulas

Full scan	b(R)
B+ tree primary	h_i + 1
B+ tree secondary	h_i + n(R)/V(A,R) (equality on a non-key attribute; V(A,R) = number of distinct values of A in R)
Join result size	n(R) × n(S) / max(V(A,R), V(A,S))
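A short worked example of the last two formulas, written as Python so the arithmetic can be checked. The statistics (10,000 and 2,000 tuples, 50 and 200 distinct values, index height 3) are invented for illustration, and the estimates assume values are uniformly distributed.

```python
import math

def secondary_index_cost(h_i, n_r, v_a_r):
    # Index traversal plus one block access per matching record,
    # assuming n(R)/V(A,R) records match and each sits in a different block.
    return h_i + math.ceil(n_r / v_a_r)

def join_result_size(n_r, n_s, v_a_r, v_a_s):
    # Estimated tuples in R join S on attribute A.
    return n_r * n_s // max(v_a_r, v_a_s)

print(secondary_index_cost(h_i=3, n_r=10_000, v_a_r=50))                   # 203
print(join_result_size(n_r=10_000, n_s=2_000, v_a_r=50, v_a_s=200))        # 100000
```

The secondary-index estimate of 203 block accesses beats a full scan only if R occupies more than 203 blocks, which is the kind of comparison the optimizer makes when choosing an access path.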

Optimization Heuristics

Push selections (σ) down the query tree — filter early
Push projections (π) down — drop columns early
Apply the most restrictive selection or join first
Use indexes on WHERE / JOIN columns
Join smaller intermediate results first
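The first heuristic can be demonstrated on toy data. The sketch below uses hypothetical `employees` and `departments` lists and a naive nested-loop join; it counts tuple comparisons as a stand-in for cost, which is a simplification of the block-based costs used elsewhere in these notes.

```python
employees = [{"name": f"e{i}", "dept_id": i % 3} for i in range(6)]
departments = [
    {"id": 0, "city": "Pune"},
    {"id": 1, "city": "Delhi"},
    {"id": 2, "city": "Mumbai"},
]

def join_then_select(emps, depts, city):
    """Join everything first, apply the city filter on the joined pairs."""
    compared = 0
    result = []
    for e in emps:
        for d in depts:
            compared += 1
            if e["dept_id"] == d["id"] and d["city"] == city:
                result.append((e["name"], d["city"]))
    return result, compared

def select_then_join(emps, depts, city):
    """Push the selection down: filter departments, then join."""
    filtered = [d for d in depts if d["city"] == city]
    compared = 0
    result = []
    for e in emps:
        for d in filtered:
            compared += 1
            if e["dept_id"] == d["id"]:
                result.append((e["name"], d["city"]))
    return result, compared

late, late_cost = join_then_select(employees, departments, "Pune")
early, early_cost = select_then_join(employees, departments, "Pune")
print(late == early)          # True
print(late_cost, early_cost)  # 18 6
```

Both plans return the same rows, but filtering first cuts the join work to a third here; on real tables the gap grows with the selectivity of the filter.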
