DBMS - Learn DBMS Online | WohoTech

Unit 5

# B+ Tree ## What is a B+ Tree? A **B+ Tree** is a variation of the B-Tree where: - **All data records** (or pointers to them) are stored **only in leaf nodes** - **Internal nodes** store only **keys** used for routing/navigation - **Leaf nodes** are linked together in a **doubly linked list**, enabling efficient range queries B+ Trees are the most widely used index structure in modern DBMS (MySQL InnoDB, PostgreSQL, Oracle, SQL Server). --- ## B+ Tree Structure ``` B+ Tree of Order 4 (max 3 keys per node): INTERNAL NODES (keys only — routing) [20 | 40] / | \ [10 | 15] [25 | 30] [45 | 50] / | \ / | \ / | \ LEAF NODES (actual data records + linked list): ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │[5][8][10]│→ │[15][18] │→ │[20][25] │→ │[40][45] │→ ... └──────────┘ └──────────┘ └──────────┘ └──────────┘ (leaf nodes linked as a sorted linked list → efficient range scan) ``` --- ## Key Differences from B-Tree ``` B-TREE: Internal node: [P₀ | K₁ | Data₁ | P₁ | K₂ | Data₂ | P₂] ↑ data stored here too (wastes space) B+ TREE: Internal node: [P₀ | K₁ | P₁ | K₂ | P₂] ↑ keys only (more keys fit → wider fan-out) Leaf node: [K₁|Data₁] [K₂|Data₂] [K₃|Data₃] → Next_Leaf_Ptr ↑ all data here, linked to next leaf ``` --- ## B+ Tree Properties (Order n) - Each internal node has at most **n pointers** (children) - Each internal node (except root) has at least **⌈n/2⌉ pointers** - Each internal node with k pointers has **k−1 keys** - Each leaf node has at most **n−1 data pointers** - Each leaf node has at least **⌈(n−1)/2⌉ data pointers** - All leaves are at the **same level** - Leaf nodes form a **sorted linked list** --- ## Search in B+ Tree ``` Search for key K: 1. Start at the root 2. At each internal node, use keys to choose the correct child pointer: - If K < K₁ → follow leftmost pointer - If K₁ ≤ K < K₂ → follow second pointer - Continue until leaf is reached 3. At leaf node, search for K - Found → return associated data record - Not found → K does not exist Example: Search for 25 in B+ Tree above: Root [20|40]: 20 ≤ 25 < 40 → go to middle child [25|30] Leaf [25|30]: found K=25 ✓ Cost: 2 node reads + 1 data record read ``` --- ## Insertion in B+ Tree ``` Insert key K with data pointer D: Step 1: Find the correct leaf node L using search Step 2: If L has space → insert (K, D) in sorted order in L Step 3: If L is FULL (has n−1 entries): a. SPLIT L into L and L' - L keeps ⌈n/2⌉ entries (smaller keys) - L' gets remaining entries (larger keys) - Copy the FIRST KEY of L' up to the parent (copy — unlike B-Tree which promotes from internal nodes) b. Insert a pointer to L' in parent c. If parent is full → split parent too (push up median key) d. If root splits → create new root → height increases by 1 KEY RULE: In B+ Tree splits, the middle key is COPIED to parent (not removed from leaf, unlike B-Tree). All data remains in leaves. ``` ### Insertion Example (Order 4) ``` Initial leaf: [10 | 20 | 30] (full, max 3 keys for order 4) Insert 25: → Leaf is full → SPLIT → Left leaf: [10 | 20] → Right leaf: [25 | 30] → Copy 25 up to parent: parent gets key=25 → Parent now routes: key < 25 → left leaf, key ≥ 25 → right leaf ``` --- ## Deletion in B+ Tree ``` Delete key K: Step 1: Find leaf L containing K Step 2: Remove K from L Case A: L still has ≥ ⌈(n−1)/2⌉ entries → DONE ✓ Case B: L underflows (too few entries): Option 1 — REDISTRIBUTE (borrow from sibling): If left or right sibling has extra keys: Move one key from sibling to L Update the parent separator key Option 2 — MERGE (if no sibling has extra): Merge L with its sibling Delete the separator key from parent If parent underflows → handle parent (recursively) ``` --- ## B+ Tree Range Query ``` Find all records with key between K₁ and K₂: 1. Search for K₁ → find leaf L₁ containing K₁ 2. Scan leaves from left to right using linked list pointers until key > K₂ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │[5][10] │──►│[15][20]│──►│[25][30]│──►│[35][40]│──►... └────────┘ └────────┘ └────────┘ └────────┘ ↑ start here ↑ stop here (K₁=15) (K₂=35) Returns: 15, 20, 25, 30, 35 This is extremely efficient: O(log n + k) where k = number of matching records ``` --- ## B+ Tree Height & Performance ``` Order n, storing N records: Height h = ⌈log_⌈n/2⌉(N)⌉ + 1 For n=200 (typical), N=1,000,000: h = log₁₀₀(1,000,000) ≈ 3 Operations: Search: O(log n) = 3-4 disk reads for millions of records Insert: O(log n) with occasional splits Delete: O(log n) with occasional merges Range query: O(log n + k) where k = result size ``` --- ## B+ Tree Diagram (Complete Example) ``` B+ Tree (order 5, stores: 10,20,30,40,50,60,70,80,90): [40] / \ [20] [60|80] / \ / | \ [10|20] [30|40] [50|60] [70|80] [90] ↓ ↓ ↓ ↓ ↓ data data data data data Leaf linked list: [10→20] ──► [30→40] ──► [50→60] ──► [70→80] ──► [90] ──► NULL Note: Keys 20, 40, 60, 80 appear in both internal nodes AND leaves. Internal nodes are COPIES, not moves (unlike B-Tree). ``` --- ## Why B+ Trees are Preferred Over B-Trees | Reason | Explanation | |--------|-------------| | Higher fan-out | Internal nodes store only keys (no data) → more keys per node → shorter tree | | Efficient range scans | Linked leaf list enables linear scan after reaching start | | Predictable performance | All successful searches reach leaf level → consistent I/O cost | | Better cache usage | Internal (routing) nodes fit better in memory |

Back to Course

Unit 5

Back to Course