# B+ Tree
## What is a B+ Tree?
A **B+ Tree** is a variation of the B-Tree where:
- **All data records** (or pointers to them) are stored **only in leaf nodes**
- **Internal nodes** store only **keys** used for routing/navigation
- **Leaf nodes** are linked together in a **doubly linked list**, enabling efficient range queries
B+ Trees are the most widely used index structure in modern DBMS (MySQL InnoDB, PostgreSQL, Oracle, SQL Server).
---
## B+ Tree Structure
```
B+ Tree of Order 4 (max 3 keys per node):
INTERNAL NODES (keys only — routing)
[20 | 40]
/ | \
[10 | 15] [25 | 30] [45 | 50]
/ | \ / | \ / | \
LEAF NODES (actual data records + linked list):
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│[5][8][10]│→ │[15][18] │→ │[20][25] │→ │[40][45] │→ ...
└──────────┘ └──────────┘ └──────────┘ └──────────┘
(leaf nodes linked as a sorted linked list → efficient range scan)
```
---
## Key Differences from B-Tree
```
B-TREE:
Internal node: [P₀ | K₁ | Data₁ | P₁ | K₂ | Data₂ | P₂]
↑ data stored here too (wastes space)
B+ TREE:
Internal node: [P₀ | K₁ | P₁ | K₂ | P₂]
↑ keys only (more keys fit → wider fan-out)
Leaf node: [K₁|Data₁] [K₂|Data₂] [K₃|Data₃] → Next_Leaf_Ptr
↑ all data here, linked to next leaf
```
---
## B+ Tree Properties (Order n)
- Each internal node has at most **n pointers** (children)
- Each internal node (except root) has at least **⌈n/2⌉ pointers**
- Each internal node with k pointers has **k−1 keys**
- Each leaf node has at most **n−1 data pointers**
- Each leaf node has at least **⌈(n−1)/2⌉ data pointers**
- All leaves are at the **same level**
- Leaf nodes form a **sorted linked list**
---
## Search in B+ Tree
```
Search for key K:
1. Start at the root
2. At each internal node, use keys to choose the correct child pointer:
- If K < K₁ → follow leftmost pointer
- If K₁ ≤ K < K₂ → follow second pointer
- Continue until leaf is reached
3. At leaf node, search for K
- Found → return associated data record
- Not found → K does not exist
Example: Search for 25 in B+ Tree above:
Root [20|40]: 20 ≤ 25 < 40 → go to middle child [25|30]
Leaf [25|30]: found K=25 ✓
Cost: 2 node reads + 1 data record read
```
---
## Insertion in B+ Tree
```
Insert key K with data pointer D:
Step 1: Find the correct leaf node L using search
Step 2: If L has space → insert (K, D) in sorted order in L
Step 3: If L is FULL (has n−1 entries):
a. SPLIT L into L and L'
- L keeps ⌈n/2⌉ entries (smaller keys)
- L' gets remaining entries (larger keys)
- Copy the FIRST KEY of L' up to the parent
(copy — unlike B-Tree which promotes from internal nodes)
b. Insert a pointer to L' in parent
c. If parent is full → split parent too (push up median key)
d. If root splits → create new root → height increases by 1
KEY RULE: In B+ Tree splits, the middle key is COPIED to parent
(not removed from leaf, unlike B-Tree).
All data remains in leaves.
```
### Insertion Example (Order 4)
```
Initial leaf: [10 | 20 | 30] (full, max 3 keys for order 4)
Insert 25:
→ Leaf is full → SPLIT
→ Left leaf: [10 | 20]
→ Right leaf: [25 | 30]
→ Copy 25 up to parent: parent gets key=25
→ Parent now routes: key < 25 → left leaf, key ≥ 25 → right leaf
```
---
## Deletion in B+ Tree
```
Delete key K:
Step 1: Find leaf L containing K
Step 2: Remove K from L
Case A: L still has ≥ ⌈(n−1)/2⌉ entries → DONE ✓
Case B: L underflows (too few entries):
Option 1 — REDISTRIBUTE (borrow from sibling):
If left or right sibling has extra keys:
Move one key from sibling to L
Update the parent separator key
Option 2 — MERGE (if no sibling has extra):
Merge L with its sibling
Delete the separator key from parent
If parent underflows → handle parent (recursively)
```
---
## B+ Tree Range Query
```
Find all records with key between K₁ and K₂:
1. Search for K₁ → find leaf L₁ containing K₁
2. Scan leaves from left to right using linked list pointers
until key > K₂
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│[5][10] │──►│[15][20]│──►│[25][30]│──►│[35][40]│──►...
└────────┘ └────────┘ └────────┘ └────────┘
↑ start here ↑ stop here
(K₁=15) (K₂=35)
Returns: 15, 20, 25, 30, 35
This is extremely efficient: O(log n + k)
where k = number of matching records
```
---
## B+ Tree Height & Performance
```
Order n, storing N records:
Height h = ⌈log_⌈n/2⌉(N)⌉ + 1
For n=200 (typical), N=1,000,000:
h = log₁₀₀(1,000,000) ≈ 3
Operations:
Search: O(log n) = 3-4 disk reads for millions of records
Insert: O(log n) with occasional splits
Delete: O(log n) with occasional merges
Range query: O(log n + k) where k = result size
```
---
## B+ Tree Diagram (Complete Example)
```
B+ Tree (order 5, stores: 10,20,30,40,50,60,70,80,90):
[40]
/ \
[20] [60|80]
/ \ / | \
[10|20] [30|40] [50|60] [70|80] [90]
↓ ↓ ↓ ↓ ↓
data data data data data
Leaf linked list:
[10→20] ──► [30→40] ──► [50→60] ──► [70→80] ──► [90] ──► NULL
Note: Keys 20, 40, 60, 80 appear in both internal nodes AND leaves.
Internal nodes are COPIES, not moves (unlike B-Tree).
```
---
## Why B+ Trees are Preferred Over B-Trees
| Reason | Explanation |
|--------|-------------|
| Higher fan-out | Internal nodes store only keys (no data) → more keys per node → shorter tree |
| Efficient range scans | Linked leaf list enables linear scan after reaching start |
| Predictable performance | All successful searches reach leaf level → consistent I/O cost |
| Better cache usage | Internal (routing) nodes fit better in memory |Back to Course