What Is Load Balancing & How Does It Work?

Load balancing is the process by which network or application traffic is distributed across multiple servers in a server farm or server pool. The fundamental idea behind load balancers is to avoid overloading compute nodes by routing client requests or traffic to other potentially idle nodes. Network devices or software called load balancers are placed between client devices and backend servers. They are responsible for receiving incoming traffic and routing it to servers that can fulfil the requests.

Load balancing has become an effective way for businesses to keep up with rising demand and ensure that their applications are up and running for users. Today’s businesses receive hundreds or even thousands of client requests per minute to their websites and applications. During peak seasons or hours, this traffic can spike even higher. Servers are under pressure to keep up and respond with high-quality media including photos, videos and other application data. Any interruption or downtime in these applications can result in below-par experiences and turn users away, resulting in lost profits.

What is a Load Balancer?

A load balancer is a device or process in a network that analyzes incoming requests and diverts them to the relevant servers. Load balancers can be physical devices in the network, virtualized instances running on specialized hardware (virtual load balancers) or even a software process. Load balancing can also be incorporated into application delivery controllers (ADCs) – network devices designed to improve the performance and security of applications in general.

Load balancers are usually classified according to the Open Systems Interconnection (OSI) model. OSI is a set of standards that describes the communication functions of a system independently of its underlying internal structure or technology. Within this model, load balancing most commonly takes place at two layers: the transport layer (Layer 4) and the application layer (Layer 7).

Layer 4 (L4) load balancers

These load balancers decide how to route traffic packets based on the TCP or UDP ports that the packets use and on their source and destination IP addresses. L4 load balancers do not inspect the actual contents of the packet; instead, they map the destination IP address to the right server in a process called Network Address Translation (NAT).

Layer 7 (L7) load balancers

L7 load balancers act at the application layer and are capable of inspecting HTTP headers, SSL session IDs and other data to decide which servers to route the incoming requests to and how. Since they need this additional context to understand and process client requests, L7 load balancers are more CPU-intensive than L4 load balancers, but they can make better-informed routing decisions as a result.
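The difference between the two layers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pool addresses, the URL prefixes and the function names `route_l4` and `route_l7` are hypothetical and do not come from any particular product.

```python
import zlib

# Hypothetical backend pools; the addresses are illustrative only.
L4_POOL = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
L7_POOLS = {
    "/images/": ["10.0.1.1", "10.0.1.2"],  # static media servers
    "/api/": ["10.0.2.1", "10.0.2.2"],     # application servers
}
DEFAULT_POOL = ["10.0.3.1"]


def _pick(pool, key):
    """Map a string key to one pool member using a stable checksum."""
    return pool[zlib.crc32(key.encode("utf-8")) % len(pool)]


def route_l4(src_ip, src_port, dst_ip, dst_port):
    """L4: decide from addresses and ports only; the payload is never read."""
    return _pick(L4_POOL, f"{src_ip}:{src_port}->{dst_ip}:{dst_port}")


def route_l7(path, headers):
    """L7: inspect the HTTP request (here, the URL path) to choose a pool."""
    for prefix, pool in L7_POOLS.items():
        if path.startswith(prefix):
            # Use the cookie, when present, to keep a session on one server.
            return _pick(pool, headers.get("Cookie", path))
    return _pick(DEFAULT_POOL, path)
```

The L4 function never sees the request itself, which is why it is cheaper; the L7 function has to parse the request first, but it can then send image requests and API calls to different pools.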

There is another type of load balancing called Global Server Load Balancing. This extends the capabilities of L4 and L7 load balancers across multiple data centers in order to distribute large volumes of traffic without negatively affecting the service for end users. These are also especially useful for handling application requests from cloud data centres distributed across geographies.

A Brief History of Load Balancing

Load balancing came to prominence in the 1990s as hardware appliances distributing traffic across a network. As internet technologies and connectivity improved rapidly, web applications became more complex and their demands exceeded the capabilities of individual servers. There was a need to find better ways to take multiple requests for similar resources and distribute them effectively across servers. This was the genesis of load balancers.

Since load balancing allowed web applications to avoid relying on individual servers, it also helped in scaling these applications easily beyond what a single server could support. Soon, there were other functionalities that evolved, including the ability to provide continuous health checks, intelligent distribution based on the application’s content and other specialized functions.

The rise of ADCs in the early 2000s was a major milestone in the history of application load balancing. ADCs are network devices that were developed with the goal of improving the performance of applications and application load balancing became one of the ways to achieve that. But they would soon evolve to cover more application services including compression, caching, authentication, web application firewalls and other security services.

Load Balancing and Cloud Computing

As cloud computing slowly began to dominate application delivery, ADCs evolved along with it. Having started out as hardware appliances, ADCs also took the form of virtual machines, with the software extracted from legacy hardware, and even of pure software load balancers. Software ADCs perform tasks similar to their hardware counterparts but also provide more functionality and flexibility. They allow organizations to rapidly scale up application services in cloud environments to meet demand spikes, while maintaining security.

How Does Load Balancing Work?

Load balancers can take the form of hardware devices in the network, or they can be purely software-defined processes. No matter which form they come in, they all work by distributing network traffic to different web servers based on various conditions to prevent overloading any one server.

Think of load balancers as traffic cops redirecting heavy traffic to less crowded lanes to avoid congestion. Load balancers manage the flow of information between application servers and an endpoint device such as a PC, laptop or tablet. The servers in question could be on-premises, in a data centre or in the cloud. Without a load balancer, individual servers can get overwhelmed and applications can become unresponsive, leading to delayed responses, poor user experiences and loss of revenue.

The exact mechanism by which load balancers work depends on whether they are hardware appliances or software.

Hardware vs Software Load Balancing

Hardware-based load balancers work by using on-premises hardware and physical devices to distribute network load. These are capable of handling a large volume of network traffic and high-performance applications. Hardware load balancers may also contain built-in virtualization, consolidating many instances in the same device.

Since they use specialized processors to run the software, they offer fast throughput, while the need for physical access to network or application servers increases security. On the downside, hardware load balancers can be costly, as they require the purchase of physical machines and paid consultants to configure, program and maintain the hardware.

How Does Software Load Balancing Work?

Software-based load balancers on the other hand can deliver the same benefits as hardware load balancers while replacing the expensive hardware. They can run on any standard device and thereby save space and hardware costs. Software load balancers offer more flexibility to adjust for changing requirements and can help you scale capacity by adding more software instances. They can also easily be used for load balancing on the cloud in a managed, off-site solution or in a hybrid model with in-house hosting as well.

DNS load balancing is a software-defined approach to load balancing. Here, client requests to a domain within the Domain Name System (DNS) are distributed across various servers. Every time the DNS system responds to a new client request, it returns the list of IP addresses in a different order. This ensures that requests are distributed evenly across the different servers that handle the overall load. When non-responsive servers are automatically removed from the list, DNS load balancing also allows for automatic failover to a working server.
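The rotation described above can be sketched as follows. This is a minimal illustration rather than how any real DNS server is implemented; the class name `RoundRobinDNS` and the addresses (taken from the documentation range 192.0.2.0/24) are assumptions.

```python
from collections import deque


class RoundRobinDNS:
    """Toy resolver that rotates its address list after every answer."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        """Return the current ordering, then rotate for the next client."""
        answer = list(self._addresses)
        self._addresses.rotate(-1)  # move the first address to the end
        return answer

    def remove(self, address):
        """Drop a non-responsive server so it is no longer handed out."""
        self._addresses.remove(address)


dns = RoundRobinDNS(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
first = dns.resolve()   # ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
second = dns.resolve()  # ["192.0.2.2", "192.0.2.3", "192.0.2.1"]
```

Clients usually connect to the first address they receive, so rotating the list spreads new connections across the servers.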

The Different Types of Load Balancing Algorithms

There are several methods or techniques that network load balancers use to manage and distribute load. They differ in the algorithms they use to determine which application server should receive each client request. The five most common load balancing methods are:

Round Robin

In this method, incoming requests are forwarded to the servers one after another on a cyclical basis. When the load balancer reaches the last server, the cycle is repeated beginning with the first one. It is one of the simplest methods to implement but may not be the most efficient, as it assumes that all servers are of similar capacity. There are two other variants of this method – weighted round robin and dynamic round robin – that can adjust for this assumption.
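As a rough sketch, plain and weighted round robin can both be built on Python's `itertools.cycle`. The server names and weights here are made up for illustration, and the weighted variant simply repeats each server according to its weight, which is only one of several possible ways to do it.

```python
from itertools import cycle


def round_robin(servers):
    """Yield servers in order, starting over after the last one."""
    return cycle(servers)


def weighted_round_robin(weights):
    """Yield each server `weight` times per cycle.

    `weights` maps a server name to a positive integer weight.
    """
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


rr = round_robin(["a", "b", "c"])
rr_order = [next(rr) for _ in range(5)]
# rr_order == ["a", "b", "c", "a", "b"]

wrr = weighted_round_robin({"big": 2, "small": 1})
wrr_order = [next(wrr) for _ in range(6)]
# wrr_order == ["big", "big", "small", "big", "big", "small"]
```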

IP Hash

This is a relatively straightforward method of load balancing, where the client’s IP address determines which server receives its request. It uses a hashing algorithm to generate a hash key from the source and destination IP addresses. This key is then used to allocate the client’s requests to a particular server, so the same client is consistently sent to the same server.
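A minimal sketch of the idea, assuming SHA-256 as the hash function and a simple modulo over the server list (both are illustrative choices, and `pick_server` is a hypothetical name):

```python
import hashlib


def pick_server(src_ip, dst_ip, servers):
    """Choose a server from a hash of the source and destination IPs."""
    key = f"{src_ip}|{dst_ip}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and map it onto the pool.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
chosen = pick_server("198.51.100.7", "203.0.113.10", servers)
```

Because the hash is deterministic, repeated requests from the same client map to the same server as long as the server list does not change. With plain modulo mapping, adding or removing a server reshuffles most clients.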

Least Connections

In the Least Connections method, traffic is diverted to the server that has the fewest active connections. This method is ideal for periods of heavy traffic, as it helps distribute the traffic evenly among all available servers.
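The bookkeeping behind this method can be sketched as a counter per server. The class and method names below are hypothetical, and ties are broken by the order in which servers were registered.

```python
class LeastConnectionsBalancer:
    """Track active connections and pick the least-loaded server."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        """Pick the server with the fewest active connections and count one more."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a connection closes so the counter stays accurate."""
        if self.active[server] > 0:
            self.active[server] -= 1


lb = LeastConnectionsBalancer(["a", "b"])
picks = [lb.acquire(), lb.acquire(), lb.acquire()]
# picks == ["a", "b", "a"]; server "a" now has 2 connections and "b" has 1
```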

Least Response Time

In the Least Response Time method, traffic is directed to the server that satisfies two conditions: it should have the fewest active connections and the lowest average response time.
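There is more than one way to combine the two conditions. The sketch below assumes one simple rule: compare active connections first and use average response time as the tie-breaker. The `ServerStats` type and its field names are illustrative, and real products may weigh the two measurements differently.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    name: str
    active_connections: int
    avg_response_ms: float


def pick_least_response_time(stats):
    """Fewest active connections wins; response time breaks ties."""
    best = min(stats, key=lambda s: (s.active_connections, s.avg_response_ms))
    return best.name


stats = [
    ServerStats("a", active_connections=3, avg_response_ms=20.0),
    ServerStats("b", active_connections=1, avg_response_ms=50.0),
    ServerStats("c", active_connections=1, avg_response_ms=30.0),
]
choice = pick_least_response_time(stats)  # "c"
```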

Least Bandwidth

In this method, the load balancer looks at the bandwidth consumption of each server, in Mbps, over the last fourteen seconds. Client requests are then sent to the server that has consumed the least bandwidth.
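A sliding window is one way to keep such a measurement. The following sketch assumes that traffic samples are recorded as (timestamp in seconds, megabits transferred) pairs; the `BandwidthTracker` class and its method names are hypothetical.

```python
from collections import deque

WINDOW_SECONDS = 14  # the measurement window described above


class BandwidthTracker:
    """Keep recent traffic samples per server and pick the quietest one."""

    def __init__(self, servers, window=WINDOW_SECONDS):
        self.window = window
        self.samples = {server: deque() for server in servers}

    def record(self, server, timestamp, megabits):
        """Store how many megabits a server transferred at `timestamp`."""
        self.samples[server].append((timestamp, megabits))

    def mbps(self, server, now):
        """Average Mbps over the window, discarding samples that are too old."""
        queue = self.samples[server]
        while queue and queue[0][0] <= now - self.window:
            queue.popleft()
        return sum(megabits for _, megabits in queue) / self.window

    def pick(self, now):
        """Return the server with the lowest bandwidth use in the window."""
        return min(self.samples, key=lambda server: self.mbps(server, now))


tracker = BandwidthTracker(["a", "b"])
tracker.record("a", timestamp=0, megabits=140)
tracker.record("b", timestamp=10, megabits=28)
quietest = tracker.pick(now=10)  # "b": 2.0 Mbps against 10.0 Mbps for "a"
```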

Load Balancing Benefits 

At the end of the day, load balancing is about helping businesses effectively manage network traffic and application load in order to give end users a reliable, consistent experience. In doing this, load balancers provide the following benefits.

Scalability to meet traffic spikes

Load balancing helps businesses stay on top of traffic fluctuations or spikes and increase or decrease servers to meet the changing needs. This helps businesses capitalize on sudden increases in customer demands to increase revenue. For example, e-commerce websites can expect a spike in network traffic during holiday seasons and during promotions. The ability to scale server capacity to balance their loads could be the difference between a sales boost from new or retained customers and a significant churn due to unhappy customers.

Redundancy to minimize downtime

It is not uncommon for website servers to fail in times of unprecedented traffic spikes. But if you can maintain the website on more than one web server, you can limit the damage that downtime on any single server can cause. Load balancing helps you automatically transfer the network load to a working server if one fails. You can keep one server in active mode to receive traffic while the other remains in passive mode, ready to go online if the active one fails. This arrangement gives businesses assurance that one server will always be available, even in the event of hardware failure.
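The active/passive arrangement can be sketched as a pair of servers plus a health check. Everything here is illustrative: the `ActivePassivePair` class is hypothetical, and `is_healthy` stands in for whatever probe a real deployment would use (for example, a periodic HTTP request).

```python
class ActivePassivePair:
    """Send traffic to the active server and fail over when it is down."""

    def __init__(self, active, passive, is_healthy):
        self.active = active
        self.passive = passive
        self.is_healthy = is_healthy  # callable: server name -> bool

    def route(self):
        """Return the server that should receive the next request."""
        if not self.is_healthy(self.active) and self.is_healthy(self.passive):
            # Promote the standby; the failed server becomes the new standby.
            self.active, self.passive = self.passive, self.active
        return self.active


health = {"s1": True, "s2": True}
pair = ActivePassivePair("s1", "s2", lambda server: health[server])

before = pair.route()  # "s1"
health["s1"] = False   # simulate a failure of the active server
after = pair.route()   # "s2"
```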

Flexibility to perform maintenance

The ability to divert traffic temporarily also gives developers the flexibility to perform maintenance work on servers. You can point all traffic to one server by setting it to active mode on the load balancer. Your IT support team can then apply software updates and patches to the passive server, test it in a production environment and switch it to active once everything works as expected.

Proactive failure detection

Load balancing helps businesses detect server outages and bypass them by distributing resources to unaffected servers. This allows you to manage servers efficiently, especially if they are distributed across multiple data centres and cloud providers. This is especially true in the case of software load balancers which can employ predictive analytics to find potential traffic bottlenecks before they happen.

DDoS attack mitigation

The ability of load balancers to distribute traffic across servers also becomes useful in defending against Distributed Denial of Service (DDoS) attacks. When a single server gets overloaded by a DDoS attack, load balancers help by rerouting traffic to other servers, so that no single server has to absorb the attack alone. In this way, load balancing removes single points of failure and makes your network more resilient against such attacks.

Load balancing as a technology has evolved over time and continues to be an important way for businesses to provide uninterrupted user experiences. Beyond the benefits of reduced downtime and high availability for IT teams, load balancing gives businesses new opportunities to scale their resources, meet customer demands and generate revenue.
