A Dynamic Load Balancing Algorithm for Distributing Mobile Codes in Multi-Applications and Multi-Hosts Environment
Nevin Vunka Jungum, Nawaz Mohamudally and Nimal Nissanke
School of Innovative Technologies and Engineering, University of Technology Mauritius, La Tour Koenig, Pointe-aux-Sables, Mauritius
School of Computing, Information Systems and Mathematics, London South Bank University, London, UK
Abstract
Code offloading refers to partitioning software and migrating the resulting mobile codes to other computational entities for processing. When a large number of mobile codes must be distributed to many heterogeneous hosts, system performance can easily degrade if one host receives too many mobile codes to process while others are almost idle. To resolve this situation, we propose a load balancing algorithm that ensures fairness in the distribution of the mobile codes. The algorithm is based on the popular Weighted Least-Connections (WLC) scheduling algorithm, but it dynamically recalculates the hosts' weights and takes into account system attributes such as CPU idle rate and memory idle rate, which the WLC algorithm does not consider. Using simulation, varying numbers of mobile codes were distributed to the hosts/servers, and the proposed algorithm outperformed the existing Least-Connections and Weighted Least-Connections scheduling algorithms, thus improving system efficiency.
Keywords: software partitioning, mobile codes, dynamic load balancing, scheduling algorithm
1. Introduction
Most existing research [1][2][3] focuses on software partitioning for offloading one application, running on a smartphone for example, to another device, which could be another smartphone, a desktop computer or a server; that is, a one-to-one case. However, in future real-life scenarios, as computational resources become pervasive in our physical environment, several applications may be offloaded from users' devices to multiple participating computational devices or nodes; hence a many-to-many situation. In such an environment, if the overall distributed system, comprising all hosts/participating devices, cannot efficiently handle a large number of mobile codes being offloaded to multiple devices, there will be a serious scalability issue, which in turn increases the response time of the partitioned mobile codes and hence that of the overall system. Under the strategy developed in [4] to prioritize the list of hosts, which we will at times refer to as participating devices or nodes, multiple devices simultaneously use the same scheme to decide which participating nodes are prioritized. This implies that the resources of participating nodes change dynamically on the arrival of mobile codes, which we will at times refer to as 'tasks', and that tasks may be reallocated to other participating nodes. Hence, a load balancing mechanism is relevant in such a situation to help manage the load of participating nodes. Load balancing is important in such a mobile distributed system to enable quick execution of offloaded mobile codes and to make the best use of the computing resources made available by the participating nodes. The sum of the expected time to compute (ETC) is used to measure the load of a participating node [5][6]. Load imbalance is mostly caused by variations in the arrival and service patterns.
Thus, an offloaded mobile code may at times wait for processing on one participating node while other participating nodes are available and ready to be used [7]. The degree of load imbalance in the mobile distributed computing system is measured by the load imbalance factor. Whenever the load balancing overhead is smaller than the load imbalance factor, a load balancing decision needs to be made. Load balancing techniques attempt to ensure that all participating nodes executing the mobile codes do a similar quantity of work. The load balancing mechanism has to use system resources in such a way that resource utilization, execution time, network bandwidth and task scheduling overhead are optimized. Since the participating nodes are of different types, ranging from powerful, always-powered servers to smartphones, that is, heterogeneous computational nodes, the execution times of offloaded mobile codes will differ. Thus, mobile codes offloaded by the users' devices are distributed among the participating nodes so that the workload is balanced among them at any time. Mobile codes may also be migrated to another participating node if their execution is being delayed, again to keep the workload balanced.
2. Load Balancing Scheduling Algorithms
Generally, load balancing algorithms are classified as either static or dynamic. Static load balancing methods, such as the round-robin (RR) scheduling algorithm and the weighted round-robin (WRR) scheduling algorithm, are based on pre-defined strategies and do not take into consideration the real-time load state of the participating nodes. In contrast, dynamic load balancing algorithms, such as the Least-Connection (LC) algorithm and the Weighted Least-Connection (WLC) algorithm, check the dynamic load condition of the participating nodes while distributing the partitioned mobile codes. Mobile codes offloaded to the participating nodes are distributed to the node with the fewest requests. When multiple mobile codes are offloaded within a specific time period, these algorithms would decrease the load balancing degree. Some commonly used load balancing algorithms are as follows [8]:
Round-Robin (RR): This algorithm assumes that the resources of all participating nodes are the same. Newly arrived tasks are assigned to participating nodes in rotation order. It is a straightforward technique but does not take into account the participating nodes' different computational capabilities.
Weighted Round-Robin (WRR): This algorithm uses weights assigned to the participating nodes to designate their computational capabilities. For example, a mains-powered desktop computer might have a weight of 5, compared to a resource-limited smartphone with a weight of 1. That is, the former is five times as powerful (in terms of processing speed, memory, networking, storage and so on) as the latter. Tasks are distributed in proportion to the nodes' respective weights, so that participating nodes with higher computational capabilities get more tasks to execute.
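As a rough illustration of the two static strategies above, the following Python sketch assigns tasks by plain rotation and by weight-proportional rotation. The node names are hypothetical, the weights 5 and 1 are borrowed from the desktop/smartphone example, and the naive "repeat each node as many times as its weight" expansion is an assumption made for brevity; practical WRR schedulers usually interleave nodes more evenly.

```python
from itertools import cycle, islice

def round_robin(nodes, num_tasks):
    """Assign tasks to nodes in strict rotation order, ignoring capability."""
    return list(islice(cycle(nodes), num_tasks))

def weighted_round_robin(weights, num_tasks):
    """Assign tasks in proportion to each node's positive integer weight.

    `weights` maps a node name to its weight. Each node appears `weight`
    times per rotation (a naive expansion assumed for this sketch).
    """
    rotation = [node for node, w in weights.items() for _ in range(w)]
    return list(islice(cycle(rotation), num_tasks))

rr = round_robin(["desktop", "smartphone"], 6)
wrr = weighted_round_robin({"desktop": 5, "smartphone": 1}, 6)
print(rr.count("desktop"), rr.count("smartphone"))    # 3 3
print(wrr.count("desktop"), wrr.count("smartphone"))  # 5 1
```

With six tasks, plain rotation gives both nodes three tasks each, whereas the weighted variant sends five to the desktop and one to the smartphone. Neither looks at the nodes' current load.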
Least-Connection (LC): This algorithm assumes that the computational capabilities of all participating nodes are identical and thus allocates each task to the node with the fewest connections. However, in a mobile pervasive environment comprising heterogeneous participating nodes, this approach would not result in the ideal task distribution.
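A minimal Python sketch of the least-connection rule is given below. The connection counts are hypothetical, and breaking ties in favour of the lowest index is an arbitrary choice made for this sketch.

```python
def least_connections(connections):
    """Return the index of the node with the fewest active connections.

    `connections[i]` is the current connection count of node i. Ties go to
    the lowest index (an arbitrary choice made for this sketch).
    """
    return min(range(len(connections)), key=lambda i: connections[i])

# Hypothetical snapshot: node 2 has the fewest connections and gets the task.
connections = [4, 7, 2]
chosen = least_connections(connections)
connections[chosen] += 1
print(chosen, connections)  # 2 [4, 7, 3]
```

Because only the counts are compared, a weak node with two connections is preferred over a powerful node with four, which is the limitation noted above.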
Weighted Least-Connection (WLC): In this algorithm [9], a weight is attached to each participating node based on its computational capabilities. The load of a participating node (hence, server) is defined by the number of connections to it. Each time a new task is offloaded, the ratio of each server's current connections to its weight is computed, and the task is allocated to the server with the smallest ratio. The algorithm is relevant in scenarios where the computational capabilities of the servers differ, for example, when the participating nodes are smartphones, laptops, desktop computers and some powerful servers. Let the set of participating nodes/servers be $S = \{S_0, S_1, \ldots, S_{n-1}\}$, and let $W(S_i)$ denote the weight of server $S_i$, with a default value of 1. $C(S_i)$ denotes the number of tasks/connections currently being serviced by server $S_i$, and $C_{sum} = \sum C(S_i)$, where $i \in \{0, 1, \ldots, n-1\}$, denotes the total number of tasks currently being serviced by all participating nodes. A freshly offloaded task is allocated to the participating node $S_m$ that satisfies the condition:

$$\frac{C(S_m)/C_{sum}}{W(S_m)} = \min\left\{\frac{C(S_i)/C_{sum}}{W(S_i)}\right\}, \quad \text{where } W(S_i) > 0$$

Within one round $C_{sum}$ is a constant, so the condition can be further reduced to:

$$\frac{C(S_m)}{W(S_m)} = \min\left\{\frac{C(S_i)}{W(S_i)}\right\}, \quad \text{where } i \in \{0, 1, \ldots, n-1\} \text{ and } W(S_i) > 0$$

Since division incurs more computation overhead than multiplication, the comparison

$$\frac{C(S_m)}{W(S_m)} > \frac{C(S_i)}{W(S_i)}$$

can be rewritten as

$$C(S_m) * W(S_i) > C(S_i) * W(S_m), \quad \text{where } W(S_i) > 0$$

As per the algorithm, any participating node available to host mobile codes must have a weight greater than zero. Below is a description of the algorithm.

Algorithm
WLC Scheduling Algorithm

    FOR (m = 0; m < n; m++)
        IF (W(S_m) > 0)
            FOR (i = m + 1; i < n; i++)
                IF (C(S_m) * W(S_i) > C(S_i) * W(S_m))
                    m = i;
            RETURN S_m;
    RETURN NULL;

As shown earlier, the WLC algorithm leverages the computational capabilities of each participating node through its weight. Thus, a better load balancing degree is achieved than with the LC algorithm. However, the following issues remain unaddressed:
(1) The weight is predefined and set in advance by the server/system administrator. It is not dynamically recalculated to reflect the real-time situation as mobile codes are offloaded to the participating nodes for processing. Thus, in the long run, some participating nodes with higher weights might be overloaded while others are almost idle or processing fewer tasks. This results in an imbalance of the system, decreasing the overall system performance.
(2) Using only the number of connections to a server to determine its load does not necessarily reflect the actual situation. For instance, suppose server A has two tasks performing video analysis, whereas server B has three tasks that each encrypt text in a plain document. Server A is likely consuming far more processing power, memory, storage and bandwidth than server B, but the WLC algorithm fails to identify such a scenario since it considers only the number of connections to a server.

3. An Adaptive Weighted Least Connection (AWLC) Scheduling Algorithm
Improving the WLC algorithm presented in Section 2 should help to achieve a better load balancing degree. Hence the following strategy is used:
(1) Collecting real-time information to dynamically calculate the weight of each server results in a closer estimate of its processing capacity. To keep the computation overhead low, only the CPU idle rate and memory idle rate are considered.
Other features such as the number of CPUs, the types of CPU, network bandwidth, hard drive or SSD speed, system architecture and so on are not taken into consideration. Before a task is allocated to a server for execution, the current CPU idle rate and memory idle rate are collected for each server and the server's weight is calculated from them.
(2) Weights are assigned to tasks based on their complexity. In this work, a four-category approach is adopted for simplicity. A more complex task is assigned a higher weight. The total weight of all tasks on a server is the real-time load of that server. Whenever a task needs to be processed, the real-time load of each server is calculated before task assignment.
(3) Whenever a task needs to be offloaded, the ratio of each participating node's real-time load to its weight is calculated, and the task is allocated to the server with the minimum ratio, to ensure load balancing of the system among the heterogeneous servers.
Consider a group of servers $S = \{S_0, S_1, \ldots, S_{n-1}\}$. The CPU idle rate, memory idle rate and weight of server $S_i$ are represented by $V_c(S_i)$, $V_m(S_i)$ and $W(S_i)$ respectively. A higher server weight implies greater processing capability. Whenever a server goes offline, that is, fails, its weight is set to 0.
The weight of a server $S_i$ is computed as follows:

$$W(S_i) = k_1 * V_c(S_i) + k_2 * V_m(S_i), \quad \text{where } k_1 + k_2 = 1,\ V_c(S_i) \in (\ldots) \text{ and } V_m(S_i) \in (\ldots)$$

In the server weight equation, $k_1$ and $k_2$ denote the level of importance assigned to the CPU idle rate and the memory idle rate respectively. Assuming the memory idle rate is less important than the CPU idle rate, $k_2$ should be smaller than $k_1$. Thus, in this work, we use the ratio $\frac{3}{5} : \frac{2}{5}$ for $k_1 : k_2$. Depending on the context, this ratio can be changed as desired. The server weight is then expressed as follows:

$$W(S_i) = \frac{3}{5} * V_c(S_i) + \frac{2}{5} * V_m(S_i), \quad \text{where } V_c(S_i) \in (\ldots) \text{ and } V_m(S_i) \in (\ldots)$$

Let us assume we have four different types of tasks $M = \{M_1, M_2, M_3, M_4\}$, whose respective weights $P = \{P_1, P_2, P_3, P_4\}$ are assigned based on their level of complexity. Tasks with higher complexity are given larger weights. $C(S_i)$ denotes the number of connections presently connected to the server $S_i$. $C_{ij}$ denotes the number of tasks of type $M_j$ that server $S_i$ is executing. $M$ denotes the new task ready to be allocated.
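The server weight equation can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the idle rates are taken to be fractions between 0 and 1, the function name `server_weight` and the `online` flag are hypothetical, and the example readings are invented.

```python
def server_weight(cpu_idle, mem_idle, k1=3/5, k2=2/5, online=True):
    """Compute W(S_i) = k1 * V_c(S_i) + k2 * V_m(S_i).

    Idle rates are assumed to be fractions between 0 and 1 (an assumption of
    this sketch). An offline server gets weight 0 so it is never scheduled.
    """
    if not online:
        return 0.0
    if abs(k1 + k2 - 1.0) > 1e-9:
        raise ValueError("k1 and k2 must sum to 1")
    return k1 * cpu_idle + k2 * mem_idle

# Hypothetical readings: 50% of the CPU and 25% of the memory are idle.
print(f"{server_weight(0.5, 0.25):.2f}")                # 0.40
print(f"{server_weight(0.5, 0.25, online=False):.2f}")  # 0.00
```

Because the weight is recomputed from fresh idle rates before each allocation, a server that becomes busy sees its weight drop, unlike the fixed administrator-set weights of WLC.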
The total weight of all tasks on a particular server $S_i$ can be computed as $\sum_{j=1}^{4} C_{ij} * P_j$. The probability that the CPU and memory are completely loaded simultaneously is very low. Therefore, we assume that $V_c(S_i), V_m(S_i) \in \mathbb{R}_{\ge 0}$ and $\neg\,(V_c(S_i) = V_m(S_i) = 0)$, that is, the CPU idle rate and memory idle rate cannot both be 0 simultaneously. Whenever a server goes down, its weight is set to 0. For a participating node, a smaller total task weight represents a smaller real-time load, and a higher server weight represents a higher computational capacity. Thus, a newly offloaded mobile code is allocated to the participating node that has the smallest ratio of total task weight to server weight.
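The real-time load of a server can be sketched in Python as below. The task weights 1 to 4 for the four categories are hypothetical values chosen for illustration, as are the per-server task counts; the paper does not state the weights used.

```python
def server_load(task_counts, task_weights):
    """Real-time load of one server: sum over task types j of C_ij * P_j."""
    if len(task_counts) != len(task_weights):
        raise ValueError("one count is required per task type")
    return sum(c * p for c, p in zip(task_counts, task_weights))

# Hypothetical weights P_1..P_4 for the four task categories.
P = [1, 2, 3, 4]

# Server A runs two heavy tasks; server B runs three light tasks.
load_a = server_load([0, 0, 0, 2], P)
load_b = server_load([3, 0, 0, 0], P)
print(load_a, load_b)  # 8 3
```

Counting connections alone would rank server A (two connections) as less loaded than server B (three connections); the weighted sum reverses that ordering.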
In other words, the task is allocated to, say, server $S_m$, which satisfies the condition:

$$\frac{\sum_{j=1}^{4} C_{mj} * P_j}{W(S_m)} = \min\left\{\frac{\sum_{j=1}^{4} C_{ij} * P_j}{W(S_i)}\right\}, \quad \text{where } i \in \{0, 1, \ldots, n-1\}$$

The determination condition is:

$$\frac{\sum_{j=1}^{4} C_{ij} * P_j}{W(S_i)} < \frac{\sum_{j=1}^{4} C_{mj} * P_j}{W(S_m)}, \quad \text{where } i \in \{0, 1, \ldots, n-1\}$$

Since division incurs much more computation overhead than multiplication, and the weight of a server cannot be 0, the condition is optimized to the following:

$$\left(\sum_{j=1}^{4} C_{ij} * P_j\right) * W(S_m) < \left(\sum_{j=1}^{4} C_{mj} * P_j\right) * W(S_i), \quad \text{where } i \in \{0, 1, \ldots, n-1\}$$

Also, the AWLC algorithm needs to make sure that a server is not scheduled when its weight is 0.
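The multiplication-only comparison derived above can be sketched in Python as follows. This is a single-pass restructuring written for illustration rather than a transcription of the authors' listing; the function name `select_server` and the example loads and weights are hypothetical, and the loads and weights are assumed to have been computed beforehand.

```python
def select_server(loads, weights):
    """Return the index m minimising loads[m] / weights[m], or None.

    Servers with weight 0 (offline) are never selected. Ratios are compared
    by cross-multiplication, so no division is performed.
    """
    best = None
    for i, (load, weight) in enumerate(zip(loads, weights)):
        if weight <= 0:
            continue  # offline or unusable server
        if best is None or load * weights[best] < loads[best] * weight:
            best = i
    return best

# Hypothetical snapshot of three servers; the second one is offline.
loads = [8, 0, 3]          # sum_j C_ij * P_j per server
weights = [0.8, 0.0, 0.2]  # W(S_i) per server
print(select_server(loads, weights))  # 0
```

In this snapshot the first server wins despite carrying the larger load, because its ratio (8 / 0.8 = 10) is smaller than that of the third server (3 / 0.2 = 15); the offline server is skipped even though its load is 0.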
The algorithm is as follows:

Algorithm
AWLC Scheduling Algorithm

    FOR (m = 0; m < n; m++)
        IF (W(S_m) > 0)
            FOR (i = m + 1; i < n; i++)
                IF ((Σ_j C_ij * P_j) * W(S_m) < (Σ_j C_mj * P_j) * W(S_i))
                    m = i;
            RETURN S_m;
    RETURN NULL;

Here the sums run over the task types $j = 1, \ldots, 4$, the weight of server $S_i$ is $W(S_i) = \frac{3}{5} * V_c(S_i) + \frac{2}{5} * V_m(S_i)$ and, similarly, the server $S_m$ has weight $W(S_m) = \frac{3}{5} * V_c(S_m) + \frac{2}{5} * V_m(S_m)$.

4. Evaluation of the AWLC Scheduling Algorithm
CloudSim [10], an open-source software package for modeling and simulating cloud computing infrastructure and services, is used to simulate the AWLC algorithm, and its output is compared with that of the LC and WLC scheduling algorithms. The LC, WLC and AWLC scheduling algorithms are simulated in three scenarios with different numbers of tasks to process. The number of tasks is 150, 1500 and 15000 in the three scenarios respectively, and the tasks are randomly generated with varying sizes. We added 15 participating nodes/servers in each scenario. The mean value represents the average amount of time all servers in the scenario take to complete the tasks; hence it reflects how efficient the system is. The load balancing degree of the system is represented by the standard deviation.

Scenario 1: 150 tasks
The three algorithms were simulated on 150 randomly generated tasks. Figure 1 shows their respective performance. We can see that the load balancing degree of the AWLC scheduling algorithm is better than that of the WLC and LC scheduling algorithms. Figure 2 compares the mean and standard deviation of the three algorithms. The AWLC scheduling algorithm shows better efficiency than the LC and WLC scheduling algorithms.
The standard deviation of the AWLC scheduling algorithm is the lowest, indicating that its load balancing degree is superior to that of the WLC scheduling algorithm. The standard deviation of the LC scheduling algorithm is the highest, suggesting a disparity in the allocation of tasks to the servers.

Scenario 2: 1500 tasks
In this scenario, the number of tasks is increased to 1500. The three scheduling algorithms were simulated on 1500 randomly generated tasks. Figure 3 shows their respective performance. The AWLC scheduling algorithm performed better than the LC and WLC scheduling algorithms. The standard deviation of the AWLC scheduling algorithm is the lowest, indicating that its load balancing is the best among the three scheduling algorithms, as shown in Figure 4.

Scenario 3: 15000 tasks
The number of tasks is increased considerably, from 1500 to 15000, in this third scenario, and Figure 5 shows the performance of the three scheduling algorithms. When receiving 15000 randomly generated tasks of diverse weights, servers using the AWLC scheduling algorithm for task allocation have the best load balancing degree. In contrast, the load balancing degree of the LC scheduling algorithm is the poorest. Figure 6 shows a comparison of the mean and standard deviation of the three scheduling algorithms. We can see that the load balancing degree of the AWLC scheduling algorithm is better than that of the WLC and LC scheduling algorithms. The high standard deviation of the LC scheduling algorithm, as in the previous two scenarios, indicates a disparity in the allocation of tasks to the servers. In terms of system efficiency, the AWLC scheduling algorithm also performs better.

In all three scenarios, consisting of 150, 1500 and 15000 tasks, the AWLC scheduling algorithm produced the best performance.
Compared to the WLC and LC scheduling algorithms, the load balancing degree and efficiency of the system using the AWLC scheduling algorithm improved considerably.

Fig. 1 Performance of the three scheduling algorithms for processing 150 tasks
Fig. 2 Mean and standard deviation of the three scheduling algorithms for processing 150 tasks
Fig. 3 Performance of the three scheduling algorithms for processing 1500 tasks
Fig. 4 Mean and standard deviation of the three scheduling algorithms for processing 1500 tasks
Fig. 5 Performance of the three scheduling algorithms for processing 15000 tasks
Fig. 6 Mean and standard deviation of the three scheduling algorithms for processing 15000 tasks

5. Conclusion
A load balancing algorithm is proposed to cope with the disparity in resource utilization among the participating devices that host mobile codes. The algorithm is based on the Weighted Least-Connections scheduling algorithm, while taking into consideration the dynamic recalculation of weights and host attributes such as CPU idle rate and memory idle rate. Using simulation, varying numbers of tasks were distributed to the hosts/servers, and the proposed algorithm outperformed the existing Least-Connections and Weighted Least-Connections scheduling algorithms, thus improving system efficiency.

Acknowledgments
We are grateful to our colleague Dr George Collymore for his contribution in conducting the simulation of the proposed algorithm using the CloudSim [10] open-source software.

References

Some Additional Data
For 150 Tasks (columns: Server, LC Time, WLC Time, AWLC Time; summary rows: Mean, Standard Deviation)
For 1500 Tasks (columns: Server, LC Time, WLC Time, AWLC Time; summary rows: Mean, Standard Deviation)
For 15000 Tasks (columns: Server, LC Time, WLC Time, AWLC Time; summary rows: Mean, Standard Deviation 77.56)
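The Mean and Standard Deviation summary rows used in the evaluation can be computed as in the following Python sketch. The per-server times below are invented for illustration and are not simulation results, and the use of the population standard deviation (`pstdev`) rather than the sample standard deviation is an assumption, since the paper does not say which was used.

```python
from statistics import mean, pstdev

def summarize(times):
    """Return (mean, population standard deviation) of per-server times."""
    return mean(times), pstdev(times)

# Hypothetical per-server times, not taken from the simulation results.
balanced = [10.0, 10.0, 10.0, 10.0]
skewed = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(summarize(balanced))  # (10.0, 0.0)
print(summarize(skewed))    # (5.0, 2.0)
```

A standard deviation of 0 means every server took the same time, which corresponds to a perfectly balanced allocation; larger values indicate a wider spread across servers.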