Hybrid load balance based on genetic algorithm in cloud environment

Received Aug 9, 2020; Revised Dec 18, 2020; Accepted Dec 29, 2020

Load balancing is an efficient mechanism to distribute loads over cloud resources in a way that maximizes resource utilization and minimizes response time. Metaheuristic techniques are powerful for solving the load balancing problem. However, these techniques suffer from efficiency degradation in large-scale problems. This paper makes three main contributions toward solving the load balancing problem. First, it proposes a heterogeneous initialized load balancing (HILB) algorithm that performs a good task scheduling process, improves the makespan with either homogeneous or heterogeneous resources, and provides a direction toward the optimal load deviation. Second, it proposes a hybrid load balance based on genetic algorithm (HLBGA) as a combination of HILB and genetic algorithm (GA). Third, a newly formulated fitness function that minimizes the load deviation is used for GA. The proposed algorithm is simulated with both homogeneous and heterogeneous cloud resources. The simulation results show that the proposed hybrid algorithm outperforms the competing algorithms in terms of makespan, resource utilization, and load deviation.


INTRODUCTION
Cloud computing provides a wide range of services to users over the internet using very large, scalable, and virtualized resources. The main objective of the cloud is to provide services all over the world with minimum cost and high performance [1,2]. To allow a huge number of clients all over the world to share cloud resources, and to provide them with high-quality service in a reasonable time, all client requests should be handled efficiently, without wasting time or resources. For that reason, there is a strong need for load balancing techniques, which are key to the success of any cloud service provider. Load balancing tries to keep cloud nodes equally loaded to avoid a situation where some resources are overloaded while others are underloaded, which in turn reduces the response time of the assigned tasks [3][4][5]. It distributes workloads over resources in a way that improves both resource utilization and response time [6].
Traditional algorithms [7][8][9][10][11] are used to solve this problem. However, these algorithms have limitations for complex and large-scale problems. Metaheuristic algorithms such as particle swarm optimization (PSO) [12], ant colony optimization (ACO) [13], artificial bee colony (ABC) [14], and genetic algorithm (GA) [15,16] are popular for solving non-deterministic polynomial-time (NP) complete problems. The convergence behavior and speed of metaheuristic algorithms with a completely random population degrade as the number of jobs increases and the problem becomes more complex. Seeding the initial population of a metaheuristic algorithm with a good solution from an efficient scheduling algorithm exploits the computational power of the metaheuristic and overcomes its drawbacks on complicated, randomly initialized problems [17,18].
Genetic algorithm (GA), an evolutionary algorithm, has become very popular due to its accuracy in solving complicated non-linear problems. GA has been successfully applied to many non-linear and non-smooth optimization challenges such as query optimization [19], medical science [20], agriculture [21], management [22], feature selection [23], power flow management [24], and sensor networks [25]. GA is basically designed for discrete optimization problems, where bits of 0's and 1's are used to encode discrete design variables. In contrast, other bio-inspired algorithms are designed for continuous problems and can choose any value to encode design variables, which makes GA more suitable than those algorithms for the load balancing problem. Choosing a good initial population for GA is an important step toward generating better generations with high-quality solutions in less time [26].
In this paper, a hybrid load balance based on genetic algorithm (HLBGA) is proposed to distribute the loads over all virtual machines (VMs) in an efficient way. HLBGA is implemented in two phases. In the first phase, the heterogeneous initialized load balancing (HILB) algorithm is proposed. It distributes tasks over all VMs in an efficient way to avoid overloaded or underloaded VMs. In the second phase, GA is used to enhance the overall system performance. It is initialized with the output of the HILB algorithm as a good initial population. This phase uses a newly formulated fitness function for GA that helps HLBGA reach the optimal load deviation.
The rest of this paper is organized as follows. Section 2 presents the related load balancing algorithms. In Section 3, the proposed load balancing algorithm is introduced. In Section 4, the performance of the proposed algorithm is evaluated and compared with existing load balancing algorithms. Section 5 presents the main conclusions and future work.

RELATED WORK
A large body of research has addressed the load balancing problem with the aim of finding an optimal assignment. This work can be categorized into three main types of algorithms: traditional, metaheuristic, and hybrid algorithms.

Traditional algorithms
Traditional algorithms rely on known information about resources and tasks to calculate their evaluation parameters. Most of them use execution time to assign tasks to resources in a way that minimizes makespan, load deviation, or both. The Min-Min algorithm is a well-known algorithm in this category and is the base of many scheduling algorithms [8]. In this algorithm, the completion time of every submitted task on every VM is calculated. The task with the minimum completion time is assigned to the corresponding VM. Then the completion times of all other tasks on that machine are updated by adding the execution time of the assigned task on that machine. The assigned task is removed from the list of unassigned tasks, and the procedure is repeated until all tasks are assigned.
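The Min-Min procedure described above can be sketched as follows. This is a minimal illustration rather than the implementation used in [8]: the function name `min_min`, the list-based encoding `assignment[i] = j` (task i runs on VM j), and the sample values are assumptions made here.

```python
from typing import List


def min_min(task_lengths: List[float], vm_speeds: List[float]) -> List[int]:
    """Return assignment[i] = index of the VM chosen for task i."""
    ready = [0.0] * len(vm_speeds)            # time at which each VM becomes free
    assignment = [-1] * len(task_lengths)
    unassigned = set(range(len(task_lengths)))

    while unassigned:
        best = None                           # (completion_time, task, vm)
        for i in sorted(unassigned):
            for j, speed in enumerate(vm_speeds):
                completion = ready[j] + task_lengths[i] / speed
                if best is None or completion < best[0]:
                    best = (completion, i, j)
        completion, i, j = best
        assignment[i] = j
        ready[j] = completion                 # later tasks on VM j now start after task i
        unassigned.remove(i)
    return assignment


# Illustrative values only: lengths in MI, speeds in MIPS.
example = min_min([400.0, 1200.0, 800.0, 2000.0], [100.0, 200.0])
```

Ties are broken in favor of the lowest task index and then the lowest VM index, which is one of several reasonable choices.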
The load balance improved Min-Min (LBIMM) algorithm improves the standard Min-Min algorithm [9]. In the first step, the Min-Min algorithm is executed to give the initial solution for the next step. In the next step, the completion time of the smallest task on the most heavily loaded resource is calculated on all other VMs. The makespan that would result from moving that task to the VM with its minimum completion time is calculated and compared with the makespan produced by Min-Min. If it is smaller, the task is reassigned to that resource, and the ready times of both resources are updated. The process repeats until no further reassignment can reduce the makespan. Thus the heavily loaded resources are relieved and the lightly loaded or idle resources are better utilized. Although traditional algorithms are simple to implement and can improve makespan, some of them do not take the load deviation into consideration, especially when resource speeds differ widely. Also, they cannot find the optimal solution, especially when the problem becomes complex or too large [25].

Metaheuristic algorithms
Metaheuristic algorithms are among the most powerful techniques for optimizing complex nonlinear problems, which is the case for most task scheduling and load balancing issues [26]. Metaheuristic algorithms can be classified into swarm intelligence based algorithms and evolutionary algorithms. Swarm intelligence based algorithms such as PSO, ACO, and ABC optimize a problem by simulating the collective behavior of natural swarms. Evolutionary algorithms such as GA are based on the evolutionary behavior of natural systems. PSO is one of the standard algorithms used in load balancing and in other applications [27,28]. It is a nature-inspired swarm intelligence algorithm for solving nonlinear optimization problems [10], modeled on the behavior of bird flocks. It starts with initial individuals called particles, which represent initial solutions to the problem. No individual is eliminated during the search: all individuals remain alive and try to improve themselves in every generation or iteration. The identity of an individual does not change over the iterations.
GA is an evolutionary optimization algorithm based on the biological concept of population generation [13]. A new population is evolved in every generation based on a predefined fitness function. GA works well for vast and complex search spaces. It relies on three main operations: selection, crossover, and mutation. The strength of GA lies in the parallel nature of its search, and the genetic operators are the main reason for the success of that search. Crossover is the main genetic operator, whereas mutation is used less frequently. Crossover attempts to pass beneficial components on to offspring solutions and to eliminate undesirable ones. By restricting the reproduction of weak offspring, GA eliminates not only that solution but also all of its descendants. This makes the algorithm converge toward high-quality solutions within a few generations. For the crossover and mutation operators to be effective, a good initial population must be chosen for GA [14].
Although metaheuristic algorithms are powerful optimization techniques, they handle the load in cloud computing inefficiently when the initial population is random. They also suffer from increasing computational cost in large-scale problems [29]. Therefore, hybrid algorithms have been introduced to address the problems of both traditional and metaheuristic algorithms and to enhance their performance.

Hybrid algorithms
Hybrid task scheduling algorithms combine two scheduling algorithms to exploit the advantages of both. This section presents some of the most popular hybrid algorithms to motivate the proposed algorithm. The HGA-ACO algorithm [30] combines GA and ACO. A randomly initialized GA is used to produce the initial pheromone for ACO. ACO then iterates to find the best solution. The best two solutions from GA and ACO are merged by crossover to give the global best solution. However, the algorithm focuses on response time, execution time, and throughput; it does not address the load balancing problem. Moreover, GA is not effective at producing an initial solution when it is randomly initialized.
The osmotic hybrid artificial bee and ant colony (OH_BAC) algorithm is presented in [31]. It applies the osmosis technique to provide an energy-efficient cloud environment. In this algorithm, ABC and ACO cooperate to select the appropriate VM to be migrated to the most suitable physical machine. In addition, it activates the most suitable osmotic host among all physical machines in the system to decrease power consumption.
Moreover, integrating machine learning techniques with load balancing algorithms reinforces the learning process and helps to improve the performance and convergence rate of the load balancing process [32]. However, the goal of most of these algorithms is to minimize the overall completion time without considering the overall load deviation. Most previous algorithms choose minimizing makespan as the main scheduling goal; however, this target always favors faster VMs for the assigned tasks. This overloads the VMs with high processing speed and leads to starvation of the VMs with lower processing speed. In addition, the experiments in most related work are limited, as the algorithms were tested on small-scale problems [33]. In this paper, a new hybrid balancing algorithm, HLBGA, is proposed. It combines GA with a newly proposed HILB scheduling algorithm, which helps the GA converge more quickly to a better solution by feeding it a good initial population.

THE PROPOSED HLBGA

Architecture overview
In this section, the proposed HLBGA is presented. The main purpose of the proposed algorithm is to improve the assignment performance for all the submitted tasks on all VMs. It assigns tasks to each VM based on its computing capabilities so that all VMs are used, which ultimately balances the load among them. Load balancing is an optimization problem in which load deviation is the objective function to be minimized. GA is one of the popular algorithms used to solve optimization problems. The proposed algorithm uses GA with a good initial population to reach the optimal solution in less time. The proposed HLBGA is based on two main phases. The first phase applies the proposed HILB algorithm, which distributes tasks over all VMs based on each resource's computing capabilities to ensure that no single VM is either overloaded or underutilized, especially when the resources' computing capabilities differ widely. The second phase uses the output of the HILB algorithm as an initial population for the GA, which optimizes the load deviation objective function to achieve the optimum load distribution.
The proposed HLBGA algorithm introduces a new objective function to improve the performance of the assignment even when the problem becomes complex or too large. It is implemented in different environments: homogeneous, heterogeneous-low, and heterogeneous-high. HLBGA is also tested with different numbers of tasks. It improves resource utilization and decreases both the load deviation and the makespan.

Load balancing problem analysis
Although cloud computing is dynamic, at any particular instant the load balancing problem can be formulated as assigning a set of n tasks to a set of m VMs. Assume that the cloud task scheduler receives n independent tasks $t_1, t_2, t_3, \ldots, t_n$ with different lengths, which are expressed in million instructions (MI) as (1):

$$T = [t_1 \;\; t_2 \;\; t_3 \;\; \cdots \;\; t_n]^{\top} \qquad (1)$$

Also, assume that the cloud task scheduler contains information about the m VMs with different processing speeds, which are expressed in million instructions per second (MIPS) as:

$$V = [v_1 \;\; v_2 \;\; v_3 \;\; \cdots \;\; v_m]^{\top}$$

where $v_j$ is the processor speed of VM $j$ and $j = \{1, 2, \ldots, m\}$. The assignment matrix of tasks over VMs can be represented as:

$$\Theta = [\theta_{ij}]_{n \times m}$$

where $\theta_{ij} = 1$ if task $i$ is assigned to VM $j$, otherwise $\theta_{ij} = 0$. Assume also that at any time there is a load matrix $X = [x_1 \;\; x_2 \;\; x_3 \;\; \cdots \;\; x_m]^{\top}$ that contains information about the current load of the m VMs. The VM loads are defined in the load matrix as:

$$x_j = \sum_{i=1}^{n} \frac{\theta_{ij}\, t_i}{v_j}$$

where $x_j$ is the current load of VM $j$ and $j = \{1, 2, \ldots, m\}$. The performance of the assignment solution can be measured using makespan, load deviation ($\sigma$), and resource utilization (U), which are calculated as in [23].
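The three performance measures can be computed as in the following sketch. It rests on assumptions that are stated here because the defining formulas are not reproduced in the text: the load of a VM is the total length of its tasks divided by its speed, makespan is the largest VM load, load deviation is the population standard deviation of the loads, and utilization is the mean load divided by the makespan. The function names and the compact encoding `assignment[i] = j` (equivalent to $\theta_{ij} = 1$) are illustrative.

```python
import math
from typing import List


def vm_loads(task_lengths: List[float], vm_speeds: List[float],
             assignment: List[int]) -> List[float]:
    """x_j: summed task length on VM j divided by its speed (assumed definition)."""
    loads = [0.0] * len(vm_speeds)
    for i, j in enumerate(assignment):
        loads[j] += task_lengths[i] / vm_speeds[j]
    return loads


def makespan(loads: List[float]) -> float:
    """Largest VM load, i.e. the time at which the last VM finishes."""
    return max(loads)


def load_deviation(loads: List[float]) -> float:
    """Population standard deviation of the VM loads."""
    mu = sum(loads) / len(loads)
    return math.sqrt(sum((x - mu) ** 2 for x in loads) / len(loads))


def utilization(loads: List[float]) -> float:
    """Mean load relative to the makespan (assumed definition); 1.0 is perfectly balanced."""
    return sum(loads) / (len(loads) * max(loads))


# Illustrative values only: lengths in MI, speeds in MIPS.
loads = vm_loads([400.0, 1200.0, 800.0, 2000.0], [100.0, 200.0], [1, 0, 1, 1])
```

For the illustrative assignment above, the two VMs finish after 12 s and 16 s, so a lower deviation and a higher utilization both correspond to moving those two numbers closer together.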

The problem formulation of HLBGA
The goal of the proposed HLBGA algorithm is to optimally assign a set of tasks to a set of VMs in a way that minimizes the load deviation of all VMs. Minimizing the load deviation leads to minimizing makespan and maximizing resource utilization, since it assigns the tasks to all VMs with different [...]

Notation:
n : the number of tasks
m : the number of VMs
$T_{n \times 1}$ : the task length matrix, where $t_i$ is the length of the i-th task in MI
$V_{m \times 1}$ : the processor speed matrix, where $v_j$ is the processor speed of the j-th VM in MIPS
$X_{m \times 1}$ : the load matrix for all VMs, where $x_j$ is the load of the j-th VM
$\sigma^2$ : load variance
$\sigma$ : load deviation
$\mu$ : load mean
$\Theta_{n \times m}$ : assignment matrix, where $\theta_{ij}$ is a binary bit equal to 1 or 0 that represents the assignment state of task i on VM j

The proposed model formulates the objective in terms of the assignment matrix. It tries to find the assignment matrix that provides the solution with minimum load deviation. The load variance can be obtained as:

$$\sigma^2 = \frac{1}{m}\sum_{j=1}^{m}(x_j - \mu)^2 = \frac{1}{m}(X - \mu\mathbf{1})^{\top}(X - \mu\mathbf{1})$$

then, because

$$\mu = \frac{1}{m}\sum_{j=1}^{m} x_j = \frac{1}{m}\mathbf{1}^{\top}X,$$

substituting (13) in (10) gives

$$X - \mu\mathbf{1} = \left(I - \frac{1}{m}\mathbf{1}\mathbf{1}^{\top}\right)X = QX$$

where I is an identity matrix and $\mathbf{1}$ is an $m \times 1$ vector of ones. All the diagonal elements of the Q matrix are $\frac{m-1}{m}$ and its off-diagonal elements are $-\frac{1}{m}$, so Q is an idempotent matrix [34]. The matrix Q is useful in computing sums of squared deviations.
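As a quick numerical check of the idempotent-matrix argument, the sketch below builds the standard centering matrix $Q = I - \frac{1}{m}\mathbf{1}\mathbf{1}^{\top}$ (diagonal $(m-1)/m$, off-diagonal $-1/m$) and compares $\frac{1}{m}X^{\top}QX$ with the variance computed directly. The helper names and sample loads are illustrative assumptions; the identity itself follows from Q being symmetric and idempotent.

```python
from typing import List

Matrix = List[List[float]]


def centering_matrix(m: int) -> Matrix:
    """Q = I - (1/m) * ones * ones^T."""
    return [[(1.0 if r == c else 0.0) - 1.0 / m for c in range(m)]
            for r in range(m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product, enough for small m."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def variance_via_q(loads: List[float]) -> float:
    """(1/m) * X^T Q X."""
    m = len(loads)
    q = centering_matrix(m)
    qx = [sum(q[r][c] * loads[c] for c in range(m)) for r in range(m)]
    return sum(x * y for x, y in zip(loads, qx)) / m


def variance_direct(loads: List[float]) -> float:
    """(1/m) * sum((x_j - mu)^2)."""
    mu = sum(loads) / len(loads)
    return sum((x - mu) ** 2 for x in loads) / len(loads)


sample_loads = [12.0, 16.0, 8.0, 20.0]        # illustrative VM loads
q4 = centering_matrix(4)
q4_squared = matmul(q4, q4)                   # equals q4 up to rounding (idempotent)
```

Writing the variance as a quadratic form in X is what allows the objective to be expressed directly in terms of the assignment matrix.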
By substituting (11) in (9) and using the summation defined in (21), the objective function is obtained by substituting (21), (22), and (23) in (20), which yields (24). Equation (25) is the nonlinear objective function of HLBGA, where t, v, m, and n are constants for each problem that represent the task lengths, the VM processor speeds, the number of VMs, and the number of tasks to be assigned, respectively, while θ contains the assignment variables to be solved for the optimum solution.
This objective function is subject to three constraints, which are formulated in (26), (27), and (28). Constraint (26) means that each task should be assigned to only one VM. In (27), θ is a binary variable that can be 1 or 0, i.e., assigned or not assigned. Constraint (28) states that, for the optimum solution, the completion time of any VM should be less than or equal to the makespan of the initial assignment matrix (Makespaninitial).
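The three constraints can be checked for a candidate assignment matrix as in the sketch below. The function name `is_feasible`, the nested-list representation of θ, and the assumption that a VM's completion time is its summed task length divided by its speed are illustrative choices made here.

```python
from typing import List


def is_feasible(theta: List[List[int]], task_lengths: List[float],
                vm_speeds: List[float], makespan_initial: float) -> bool:
    """Check constraints (26)-(28) for an n x m assignment matrix theta."""
    n, m = len(task_lengths), len(vm_speeds)
    if len(theta) != n or any(len(row) != m for row in theta):
        return False
    # (27): every entry is binary
    if any(value not in (0, 1) for row in theta for value in row):
        return False
    # (26): each task is assigned to exactly one VM
    if any(sum(row) != 1 for row in theta):
        return False
    # (28): no VM finishes later than the makespan of the initial assignment
    for j in range(m):
        completion = sum(theta[i][j] * task_lengths[i] for i in range(n)) / vm_speeds[j]
        if completion > makespan_initial:
            return False
    return True


# Illustrative values only: 4 tasks (MI) on 2 VMs (MIPS).
theta_example = [[0, 1], [1, 0], [0, 1], [0, 1]]
tasks_example = [400.0, 1200.0, 800.0, 2000.0]
speeds_example = [100.0, 200.0]
```

In a GA, such a check could be used to reject or repair offspring; how infeasible solutions are actually treated is not stated in the text.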

The HLBGA phases
The proposed HLBGA algorithm has two phases. In the first phase, the HILB algorithm is proposed as a new traditional algorithm that distributes tasks over all VMs in an efficient way to avoid overloaded or underloaded VMs. The second phase uses the output of HILB as an initial population for GA. Figure 1 shows the main steps of the two phases of the proposed algorithm. These two phases are implemented as follows.

Phase I: Initial population phase
In this phase, the HILB algorithm is proposed in order to balance the load and minimize makespan. The algorithm's strategy is based on moving tasks from heavily loaded machines to the least loaded ones: HILB makes all the available task movements from the current heaviest loaded VM to any one of the remaining VMs, and then repeats these operations on all the available resources. It balances the load over all resources, even very slow ones, in a way that achieves high load balancing and optimum completion time. This algorithm avoids the starvation problem between VMs.
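The general idea of moving tasks off the heaviest loaded VM can be illustrated with the sketch below. It is not the authors' HILB: the exact movement rule and stopping condition are not given in the text, so the acceptance test used here (a move is allowed only if both affected VMs end up below the previous heaviest load), the starting assignment, and all names are assumptions.

```python
from typing import List, Optional, Tuple


def vm_loads(task_lengths: List[float], vm_speeds: List[float],
             assignment: List[int]) -> List[float]:
    """Completion time of each VM for assignment[i] = j."""
    loads = [0.0] * len(vm_speeds)
    for i, j in enumerate(assignment):
        loads[j] += task_lengths[i] / vm_speeds[j]
    return loads


def rebalance_from_heaviest(task_lengths: List[float], vm_speeds: List[float],
                            assignment: List[int]) -> List[int]:
    """Repeatedly move one task off the heaviest VM while that relieves it."""
    current = list(assignment)
    m = len(vm_speeds)
    while True:
        loads = vm_loads(task_lengths, vm_speeds, current)
        heavy = max(range(m), key=loads.__getitem__)
        move: Optional[Tuple[float, int, int]] = None   # (score, task, target VM)
        for i, j_old in enumerate(current):
            if j_old != heavy:
                continue
            for j in range(m):
                if j == heavy:
                    continue
                trial = current[:]
                trial[i] = j
                t = vm_loads(task_lengths, vm_speeds, trial)
                # assumed rule: both affected VMs must end below the old heaviest load
                if t[j] < loads[heavy] and t[heavy] < loads[heavy]:
                    score = max(t[j], t[heavy])
                    if move is None or score < move[0]:
                        move = (score, i, j)
        if move is None:
            return current                              # no relieving move is left
        current[move[1]] = move[2]
```

Slow VMs still receive work under this rule whenever taking a task leaves them below the current heaviest load, which mirrors the starvation-avoidance goal described above.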

Phase II: GA phase
The HLBGA algorithm relies on GA as a powerful solver for nonlinear, NP-complete optimization problems. The GA in this algorithm relies on three main operations: elite, crossover, and mutation. In the elite operation, the algorithm chooses the assignment matrices that give the best fitness values to pass to the next generation. In the crossover and mutation operations, the algorithm reassigns tasks to different VMs to form new solutions in different ways. Crossover recombines two assignment matrices to form two new ones, which in practice means reassigning tasks to form two new solutions. The recombination must be done on a complete-row basis, i.e., complete rows are swapped between matrices. In mutation, random changes are made to a single assignment matrix. Algorithm 1 shows the main processes of the proposed HLBGA.

Algorithm 1: The proposed HLBGA
Evaluate each chromosome using the fitness function
Choose (E × P) chromosomes with the best fitness function as elite for the next generation
Select (C × P) chromosomes for crossover operation
For each crossover operation
    Select two random chromosomes as input for crossover operation
    Perform crossover operation on the selected chromosomes
    Select the two output chromosomes to the next generation
End For
Select (U × P) chromosomes for mutation operation
For each mutation operation
    Select one random chromosome as input for mutation operation
    Perform mutation process on the selected chromosome
    Select the output chromosome to the next generation
End For
Replace the current population by the new generation
End
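The elite, crossover, and mutation steps listed above can be sketched as follows. This is a simplified illustration, not the authors' implementation: a chromosome is encoded as a list where gene i is the VM index of task i (one row of the assignment matrix per gene), crossover is single-point so that whole rows are exchanged, the seed solution is expanded into a population by mutation, the fitness is the load deviation, and the completion-time constraint is not enforced. All names and default parameter values are assumptions.

```python
import math
import random
from typing import List, Tuple


def load_deviation(chrom: List[int], task_lengths: List[float],
                   vm_speeds: List[float]) -> float:
    """Fitness: population standard deviation of VM completion times."""
    loads = [0.0] * len(vm_speeds)
    for i, j in enumerate(chrom):
        loads[j] += task_lengths[i] / vm_speeds[j]
    mu = sum(loads) / len(loads)
    return math.sqrt(sum((x - mu) ** 2 for x in loads) / len(loads))


def crossover(a: List[int], b: List[int],
              rng: random.Random) -> Tuple[List[int], List[int]]:
    """Swap complete rows (genes) after a random cut point; needs >= 2 tasks."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chrom: List[int], m: int, rng: random.Random) -> List[int]:
    """Reassign one randomly chosen task to a random VM."""
    child = chrom[:]
    child[rng.randrange(len(child))] = rng.randrange(m)
    return child


def run_ga(task_lengths: List[float], vm_speeds: List[float], seed: List[int],
           pop_size: int = 20, generations: int = 50, elite: float = 0.2,
           cross: float = 0.6, rng_seed: int = 0) -> List[int]:
    rng = random.Random(rng_seed)
    m = len(vm_speeds)

    def fitness(c: List[int]) -> float:
        return load_deviation(c, task_lengths, vm_speeds)

    # Assumed seeding: the initial solution plus mutated copies of it.
    population = [seed[:]] + [mutate(seed, m, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        nxt = [c[:] for c in ranked[:max(1, int(elite * pop_size))]]   # elite
        for _ in range(int(cross * pop_size) // 2):                    # crossover
            a, b = rng.sample(ranked, 2)
            nxt.extend(crossover(a, b, rng))
        while len(nxt) < pop_size:                                     # mutation
            nxt.append(mutate(rng.choice(ranked), m, rng))
        population = nxt[:pop_size]
    return min(population, key=fitness)
```

Because the elite chromosomes are copied unchanged into every new generation, the best fitness in the population never gets worse than that of the seed solution.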

Complexity of HLBGA
The HLBGA is based on two main phases. In the first phase, it runs the HILB. The time complexity of this phase depends on the number of movements performed to reach the initial population and can be computed as O(n1). In the second phase, the HLBGA runs the GA. The complexity of this phase can be computed as O(G×N) [35]. Comparing the time complexity of the two phases, it was found that n1 << G×N, so the first phase can be neglected. Therefore, the total complexity of the HLBGA algorithm is O(G×N). The initial population used in the proposed algorithm helps the GA reach a better solution with a smaller population size and fewer generations, which decreases the complexity of the algorithm. Table 2 shows the time complexity of the HLBGA and a description of the complexity parameters.

PERFORMANCE EVALUATIONS
In this section, the performance of the proposed HLBGA algorithm is evaluated in different environments and conditions. The proposed algorithm is compared against several techniques: Min-Min [8] and LBIMM [9] as traditional algorithms; PSO [10] with two different objective functions as metaheuristic techniques, where PSO1 is the basic PSO algorithm whose objective is to minimize the makespan and PSO2 is an updated version whose objective is to minimize the load deviation; and GA [13] as the evolutionary algorithm on which the proposed algorithm is based. In addition, the comparison includes the proposed HILB, which provides the initial population of HLBGA. The evaluation is based on the results of simulations run in CloudSim [35].

Simulation overview
CloudSim is a simulation tool that simulates the behavior of load balancing algorithms when run on real data centers. It was used to test the performance of the proposed algorithm and compare the results with the other algorithms in terms of makespan, resource utilization, and load standard deviation [25]. Table 2 shows the CloudSim configuration for the four simulations used to test the behavior of the proposed algorithm in different running conditions. Each simulation was run 10 5 times and the average was considered in the results. The parameters of GA and PSO are shown in Table 3.

Impact of increasing the workloads with fixed resources
In this case, the number of tasks is increased while the number of VMs is fixed, to check the algorithm's behavior under different workloads on the same resources. The simulation parameters of Simulation 1 are shown in Table 3. The number of tasks varies from 10 to 150. The tasks have different lengths, as happens in real-world workloads. They were generated randomly in the range from 200 to 3000 MI. Four VMs were considered for the simulation. The evaluation metrics are makespan, resource utilization, and load deviation. Figure 2 shows the makespan comparison of the proposed HLBGA with the other algorithms. HLBGA achieves a lower makespan than the other algorithms. The makespan improvement of HLBGA over HILB and GA is up to 15.7% and 71%, respectively. Figure 3 shows the load deviation comparison for Simulation 1. The load deviation of the proposed HLBGA is lower than that of the other algorithms. The load deviation improvement of HLBGA over HILB and GA is 28.5% and 96.1%, respectively, in the case of 150 tasks. Figure 4 shows the resource utilization comparison for Simulation 1. The resource utilization of the proposed HLBGA is higher than that of the other algorithms. The increase in utilization of the proposed HLBGA over HILB and GA is 1.8% and 67.4%, respectively, in the case of 150 tasks. The results show that the performance of the metaheuristic algorithms PSO1, PSO2, and GA is much lower than that of the traditional algorithms for a large number of tasks. As the number of tasks increases, HILB performs better than the other traditional algorithms, so it can be used to produce an initial population for GA to form the proposed HLBGA. The proposed HLBGA algorithm, as a hybrid of HILB and GA, outperforms the other algorithms.
The makespan, load deviation, and utilization improvement of HLBGA over HILB and GA are 8% and 48.3%, 34.3% and 85%, and 3.4% and 40%, respectively.

Implementations in homogeneous and heterogeneous environments
In this case, the simulation is run with a fixed number of Cloudlets and VMs, but the speeds of the VMs are changed to test the performance of the algorithms with Homogeneous (Homog), Heterogeneous-high (Het-high), and Heterogeneous-low (Het-low) processors. The simulation parameters of Simulation 2 are shown in Table 3. Three simulations were run with different VM speed environments. In Homogeneous, all the VMs have the same speed. In Heterogeneous-low, the speed variation among VMs is low, with a ratio of 1:2.5 between the lowest and highest speed VM, while in Heterogeneous-high, a high speed variation among all VMs, with a ratio of 1:7, is considered.
The target of this experiment is to test the behavior of the proposed algorithm for workloads with different lengths in varying environments. Figure 5 shows a makespan comparison of the proposed HLBGA algorithm with the LBIMM, HILB, standard GA, and PSO algorithms as the simulation environment varies from homogeneous to heterogeneous. The makespan improvement of HLBGA over HILB and GA is up to 2.6% and 42.5%, respectively. Figure 6 shows a load deviation comparison of Simulation 3. The load deviation of the proposed algorithm is lower than that of the other algorithms. Figure 7 shows the utilization comparison of Simulation 3. The utilization of the proposed algorithm is higher than that of the other algorithms. The results show that GA works better than the other metaheuristic algorithms, and that HILB is more powerful in load balancing than the other traditional algorithms. The proposed HLBGA algorithm performs better than the other algorithms in all cases, especially in Heterogeneous-high, where it gives its best results relative to the other algorithms.

CONCLUSION AND FUTURE WORK
In this paper, the HLBGA algorithm is proposed. It is implemented in two phases. In the first phase, the HILB scheduling algorithm is proposed to perform a good task scheduling process that improves the makespan and produces a good initial population for the second phase. In the second phase, GA, as an evolutionary algorithm, is used with a newly formulated fitness function to reach the optimal load deviation. The proposed algorithm is tested in two simulations. The first simulation tests the effect of increasing the workload on the same number of VMs. The simulation results show that the proposed HLBGA outperforms the other standard and metaheuristic algorithms: Min-Min, LBIMM, GA, and PSO. The second simulation tests the algorithm's behavior when distributing tasks of different lengths on resources that fall into one of three cases: the same speed (Homogeneous), a slight difference in speeds (Heterogeneous-low), and a large variation in speeds (Heterogeneous-high). The simulation results show that the proposed HLBGA outperforms all the other algorithms, especially in the Heterogeneous-high case.
This study focuses on the processor speed of VMs since it is the most influential factor, while other factors such as the memory size and bandwidth of VMs are held constant. In future work, the performance of the proposed algorithm under additional conditions will be investigated. Integrating a machine learning technique with the proposed algorithm may also add value and can be tested.