# Load Balancing
## What is Load Balancing
Load balancing is the process of distributing incoming network traffic across multiple servers or resources to ensure no single server is overwhelmed. When a large number of users access a website or application simultaneously, a single server may not have enough capacity to handle all requests. Load balancing solves this by spreading the work across multiple servers so that each server handles a manageable portion of the total traffic. This improves application performance, increases availability, and ensures reliability.
## Why Load Balancing is Needed
Modern websites and applications may need to handle millions of requests per day, a volume that can exceed the capacity of a single server. Even if a single powerful server could handle the load, relying on one server creates a single point of failure: if that server fails, the entire application goes down. Load balancing distributes traffic across multiple servers and detects server failures, automatically redirecting traffic away from failed servers to healthy ones.
## How Load Balancing Works
A load balancer sits between clients and the pool of backend servers. When a client sends a request, the load balancer receives it and selects a backend server to handle it based on a configured algorithm. The load balancer forwards the request to the selected server and sends the server's response back to the client. The client interacts only with the load balancer and is unaware of the individual backend servers.
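The request flow described above can be sketched in a few lines of Python. This is a minimal in-process illustration, not a network proxy: backends are modelled as plain functions, and the names `LoadBalancer`, `make_backend`, and `pick_random` are hypothetical, chosen only for this example.

```python
import random
from typing import Callable, List

# Assumption: a backend is modelled as a function from request text to response text.
Backend = Callable[[str], str]


def make_backend(name: str) -> Backend:
    """Create a fake backend that tags each response with its own name."""
    def handle(request: str) -> str:
        return f"{name} handled {request}"
    return handle


def pick_random(backends: List[Backend], request: str) -> Backend:
    """Placeholder selection algorithm: choose any backend at random."""
    return random.choice(backends)


class LoadBalancer:
    """Receives a request, selects a backend, and relays the backend's response."""

    def __init__(self, backends: List[Backend],
                 select: Callable[[List[Backend], str], Backend]) -> None:
        self.backends = backends
        self.select = select  # the configured selection algorithm

    def handle(self, request: str) -> str:
        backend = self.select(self.backends, request)
        return backend(request)  # the client only ever sees this return value


lb = LoadBalancer([make_backend("app-1"), make_backend("app-2")], select=pick_random)
response = lb.handle("GET /index.html")
```

The caller talks only to `lb.handle`, which mirrors how a client sees the load balancer rather than the servers behind it. Passing the selection algorithm as a parameter keeps the forwarding logic independent of how the server is chosen.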
## Load Balancing Algorithms
Round robin sends each new request to the next server in sequence, cycling through all servers equally. It is simple but does not account for differences in server capacity or current load. Weighted round robin assigns different weights to servers based on their capacity so that more powerful servers receive more requests. Least connections sends each new request to the server with the fewest active connections, which adapts better than round robin when requests vary in how long they take. Least response time prefers the server with the fewest active connections and the lowest average response time. IP hash uses the client IP address to determine which server handles the request, so the same client keeps reaching the same server as long as the set of servers does not change.
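As a rough illustration, four of these algorithms can each be written as a small Python function. This is a simplified sketch: the server names and connection counts are made up, the weighted variant uses a naive "repeat each server by its weight" expansion (production implementations usually interleave picks more smoothly), and SHA-256 is just one possible hash choice for IP hash.

```python
import hashlib
import itertools
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield servers in order, wrapping around forever."""
    return itertools.cycle(servers)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Naive weighting: each server appears `weight` times per cycle."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server with the fewest active connections."""
    return min(active, key=lambda server: active[server])


def ip_hash(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to a server using a stable hash of the address."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["app-1", "app-2", "app-3"]

rr = round_robin(servers)
rr_picks = [next(rr) for _ in range(4)]    # app-1, app-2, app-3, app-1

wrr = weighted_round_robin({"big": 2, "small": 1})
wrr_picks = [next(wrr) for _ in range(6)]  # big, big, small, big, big, small

lc_pick = least_connections({"app-1": 5, "app-2": 2, "app-3": 7})  # app-2

sticky_a = ip_hash("203.0.113.10", servers)
sticky_b = ip_hash("203.0.113.10", servers)  # same server as sticky_a
```

Note that `ip_hash` depends on `len(servers)`, so adding or removing a server remaps many clients to a different server.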
## Layer 4 and Layer 7 Load Balancing
Layer 4 load balancing operates at the transport layer and makes routing decisions based on IP addresses and TCP or UDP ports without examining the content of the packets. It is fast but cannot make content-based decisions. Layer 7 load balancing operates at the application layer and can examine the content of requests including HTTP headers, URLs, and cookies. This allows sophisticated routing decisions such as routing requests for images to one set of servers and requests for dynamic content to another set.
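The image versus dynamic content example can be sketched as a content-based routing rule. The pool names, the `/images/` prefix, and the list of file suffixes below are assumptions made for illustration; a real Layer 7 load balancer would express this in its own configuration language.

```python
from typing import Dict, List

# Hypothetical server pools for the two kinds of content.
POOLS: Dict[str, List[str]] = {
    "static": ["static-1", "static-2"],
    "dynamic": ["app-1", "app-2"],
}

# Assumption: these suffixes identify image requests.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".webp")


def choose_pool(url: str) -> str:
    """Inspect the request URL (Layer 7 data) and return the name of a pool."""
    path = url.split("?", 1)[0].lower()  # ignore the query string and letter case
    if path.startswith("/images/") or path.endswith(IMAGE_SUFFIXES):
        return "static"
    return "dynamic"


image_pool = choose_pool("/images/logo.png")
api_pool = choose_pool("/api/orders?id=7")
```

A Layer 4 load balancer could not apply this rule, because the URL is part of the request content that it never examines.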
## Health Checks
Load balancers continuously monitor the health of backend servers by periodically sending health check requests. If a server fails to respond or returns an error, the load balancer marks it as unhealthy and stops sending traffic to it. When the server recovers and starts responding correctly, the load balancer adds it back to the pool.
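The bookkeeping behind health checks might look like the following sketch. The `HealthChecker` class and its `probe` callback are hypothetical, and the `fall` and `rise` thresholds (how many consecutive failures or successes are needed before a server changes state) are an added assumption that is not described above; setting both to 1 gives the immediate behaviour from the text.

```python
from typing import Callable, Dict, List


class HealthChecker:
    """Tracks which servers are healthy based on repeated probe results."""

    def __init__(self, servers: List[str], probe: Callable[[str], bool],
                 fall: int = 2, rise: int = 2) -> None:
        self.probe = probe  # returns True if the server answered correctly
        self.fall = fall    # assumption: consecutive failures before removal
        self.rise = rise    # assumption: consecutive successes before re-adding
        self.healthy: Dict[str, bool] = {s: True for s in servers}
        self.streak: Dict[str, int] = {s: 0 for s in servers}

    def run_once(self) -> None:
        """Probe every server once and update its state."""
        for server, is_healthy in list(self.healthy.items()):
            try:
                ok = self.probe(server)
            except Exception:
                ok = False  # no response counts as a failed check
            if ok == is_healthy:
                self.streak[server] = 0  # result agrees with the current state
                continue
            self.streak[server] += 1
            needed = self.rise if ok else self.fall
            if self.streak[server] >= needed:
                self.healthy[server] = ok
                self.streak[server] = 0

    def healthy_servers(self) -> List[str]:
        """Servers that should currently receive traffic."""
        return [s for s, ok in self.healthy.items() if ok]


# Simulated server status stands in for real health check requests.
status = {"app-1": True, "app-2": True}
checker = HealthChecker(list(status), probe=lambda server: status[server])

status["app-2"] = False
checker.run_once()  # first failure: app-2 stays in the pool
checker.run_once()  # second failure: app-2 is marked unhealthy
```

In a real deployment `run_once` would be called on a timer, and `probe` would send an actual request such as an HTTP GET to a health endpoint.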
## Types of Load Balancers
Hardware load balancers are dedicated appliances designed for high-performance traffic distribution. Software load balancers run on standard servers and include products like Nginx, HAProxy, and Apache HTTP Server. Cloud load balancers are managed services provided by cloud platforms, such as AWS Elastic Load Balancing, Azure Load Balancer, and Google Cloud Load Balancing.