Load-balancing

From Simulace.info
Revision as of 21:57, 17 January 2016 by Xtomp36 (talk | contribs) (Processes)
Jump to: navigation, search
  • Project name: Load-balancing
  • Class: 4IT496 (WS 2015/2016)
  • Author: Bc. Patrik Tomášek
  • Model type: Discrete-event simulation
  • Software used: SimProcess, trial version

Problem definition

A hosting company with its own infrastructure is using so called "load balancing" to distribute the overall load between multiple servers (hardware nodes) and “high-availability” to minimize service down-time.

Load-balancing
In the case of web hosting service it means dividing the incoming request between multiple devices (called nodes) based on a set of rules (priority, weight, etc). The simulation is based on nginx (a webserver software) load balancing.
Nginx load balancing.png
Note: In the picture the load-balacer isn't redundant, therefore HA isn't enabled. The simulation has two load-balances present.

High-avaibility
Ensures that a system or component is operational for desirable time. The solution necessary to provide web hosting consists of many parts, where all of them need to be on-line for the whole to be operational. To enable HA a provider can use failover and backups. Failover is basically a backup piece of hardware, which ensures that when a component goes off-line another takes it's place. After that it's necessary to load the backup on the component that took over. If there is a SAN (storage area network) implemented than there is no need to load a backup, because failover just uses the same data from one central storage, which is used for all the server nodes. In the simulation a SAN is implemented and failover is taken into consideration.

Anti-DoS/DDoS
Denial-of-service (DoS) attack is an incident is witch the targeted service goes down. Distributed denial-of-service means, that more than one system is used to attack a single target. There are more means of possible protection against such attack. Setting up a decent firewall rules might be a good place to start, but it isn't so effective as implementing a device, which can mitigate the attack. A Radware defencePro device is implemented in the simulation.

The goal of the simulation is to find the optimal number of server and other components necessary to enable the mentioned functions (LB, HA, Anti-DoS).

Method

SIMPROCESS

Model

The simulation is divided into 4 main processes:
- New requests - generation of new requests, further information in Entities section.
- DoS mitigation - implementation of anti-dos device.
- Load balancing - main process, witch distributes the generated (incoming) request between resources.
- Served request - simple dispose of the generated requests.
Whole model.jpg

The simulation is set to run in two 9 minutes iterations, using multiple schedules to apply the content changes of a web page (cached/un-cached content).
Generated request are the highest recorded numbers of the given hosting environment (so called peak). Other possibilities doesn't need to be taken into consideration because the system must be able to handle the peak and with owned hardware, there is not much use for downscaling. Besides the peak might come at odd hour, and starting up an off-line server takes a lot of time. It would be better to use the unallocated server usage for some other calculation, which might be beneficial for the provider.
Time unit used: seconds
Number of all requests: 9000 per second is the maximum recorded in the given environment. The number was divided by 100 for the simulation purposes (so it's 90).
Number of replications: 2

Entities

This is a list of all defined and used entities within the simulation.

Requests
The request data are based on real data from an hosting environment, that hosts multiple Magento e-shops (a rather complex system, which uses a lot of hardware).
The incoming request are categorised because Magento uses caching and indexing. It takes less time to server a cached content than un-cached one. Further more there are multiple schedules used to generate the requests. If new data have been added to the e-shop, that they need to be cached first, witch means more large requests.

- Small request
Request of a cached content.

- Standard request
Request of a content on storage server, no need to accesses database.

- Large request
Request of a content on storage server and database.

D-DoS
A large number of requests aiming to cripple the system.

Resources

This is only a brief summary of resources used, further information follows in processes section of this page.
When it comes to resource price, I take only lease into consideration. Of course it is also possible to purchase the given component. Cloud computing is also not considered.

Load-balancer
An important component described earlier on this page.
These devices are quite expensive. In case of lease, the price can be 300 USD monthly for a sufficient component for this simulation requirements.

Web server
There are two types of this resource, each with different capabilities:
Type 1 - a faster server that goes for 150 USD monthly
Type 2 - a slower server that goes for 100 USD monthly

Storage server SAN
In the simulation only SAN is used as a name of this resource. The price of one SAN depends on the configuration (number of HDDs, etc.). A sufficient SAN can be leased for 220 USD monthly.

Database server
Database server is quite fast, so in the simulation only one is necessary, however to achieve HA even with DB server a second one should be implemented. The price of one DB server is about 200 USD monthly.

Radware DefensePro
This is an anti-dos component implemented in the simulation.
The cost of this hardware is extreme, therefore partial lease is used.
The price depends on the internet connection of the provider, is this case it goes around 50 USD monthly.

Processes

New requests

Request generate.jpg

In this process new requests are generated according to the following schedule:
Interval: 1 second

Entity type Most content cached Standard content New content added Sum of normal requests Addition DoS requests (red square)
New small request (green dot) Poi(45) Poi(25) Poi(13.5) Poi(90) Poi(150)
New standard request (orange dot) Poi(31.5) Poi(45) Poi(31.5) Poi(90) Poi(150)
New large request (black dot) Poi(13.5) Poi(20) Poi(45) Poi(90) Poi(150)


DoS mittigation

Dos mittigation.jpg

This is a simple process, which implements anti-dos device and ensures that the system is not effected by the attack.
The "Branch" activity simply lets forward only the correct request and the rest is disposed, thus the attack is mitigated.
Without this device implemented the system would get flooded as later illustrated in results section of this page.

Load balancing

Load balancing.jpg

This is the main process of the simulation. Firstly it is expected that about 1 % of incoming request are lost somewhere on the way, so that is what the "Route" activity does. The lost requests are simply disposed. Than the request arrives at Load-balancer, which divides them based on probability. It would be also possible to use least processing or something similar. The load-balancers are designed to handle extreme amount of requests and are very fast, therefore there is only as little delay as 1 ms in the simulation and event that is probably to high. Basically with the number of generated request in the simulation the LB can't be overloaded even in case of DoS attack. However it is a critical piece of hardware, because if it goes down, no request will reach the host, therefore it must be redundant to enable HA (so two LBs are required).
After this the request reach either Server of type 1 or type 2. Server type one is a delay of Nor(25,3) ms and server type 2 is a delay of Nor(40,3.0) ms, because it is slower. Upon the web server processes the request it is decided where the request goes next based on Entity.type. In case of a small request it goes straight to the next Branch "Response". The standard and large request are send to storage server. The storage server represents another delay an because the system uses SAN it is a common resource for all the web servers. The delay of SAN is defined as Nor(10.0,2.0) ms. Standard request is than send to "Response" activity by "Entity route 2" activity. Large requests are further more send to DB server (the same delay as in the case of storage server) by the same activity. After that they continue to "Response" activity. Branch activity "Response" serves the content to the client, but only 90 % of requests are correct, 7 % are incorrect and client gives up and 3 % are also incorrect but client send a new request.

Requests served
Served requests.jpg

A simple dispose of served requests.

Results

Conclusion

Code

Sources

http://searchnetworking.techtarget.com/definition/load-balancing
http://searchstorage.techtarget.com/definition/failover
http://searchsoftwarequality.techtarget.com/definition/denial-of-service
http://searchsecurity.techtarget.com/definition/distributed-denial-of-service-attack
http://cepa.io/devlog/secure-https-load-balancing-with-nginx