Thermal aware task assignment for multicore processors using genetic algorithm

ABSTRACT


INTRODUCTION
Thermal management in modern multi-core central processing units (CPUs) has become increasingly important. It is one of the main difficulties faced by electrical and computer designers of integrated circuits (ICs), because IC performance and reliability are both affected by temperature. According to [1], the IC lifetime can be reduced by 50% for every 10 °C to 15 °C increase in the IC peak temperature; hence, much research focuses on this issue. One way of controlling chip temperature is to integrate active cooling (e.g., fan cooling, water circulation, oil cooling, and heat pipes). This technique is not always suitable for embedded systems, especially those with limited size and battery capacity. Another method of chip thermal management is a program management approach, in which the temperature of the CPU is balanced by assigning tasks to the CPU cores according to the cores' thermal hotspots. This method does not share the limitations of active cooling.
Many published articles have studied the reduction of peak or overall core temperatures through program management. Rubio-Anguiano et al. [2] proposed a scheduler consisting of two stages, an offline stage and an online stage. In the offline stage, the minimum clock frequency that fulfills the deadlines and the partitioning scheme is calculated. In the online stage, a fixed-priority zero-laxity policy is applied for global task allocation. The online scheduler accepts or rejects soft real-time aperiodic tasks, selecting the lowest available frequency above the required minimum in order to minimize power consumption while meeting time and thermal constraints. Rodriguez and Yomsi [3] proposed a framework that captures both the temporal and thermal behavior of the system after tasks are scheduled by a fixed-priority algorithm such as Deadline Monotonic or Rate Monotonic.
Many works have tried to control CPU temperature using bio-inspired algorithms. Rupanetti and Salamy [4] proposed a novel strategy for task migration in multiprocessors that combines a modified ant colony optimization (ACO) algorithm with a first-fit task allocation heuristic. The ACO algorithm splits tasks and migrates them to processors with low task utilization to minimize the overall power consumption of the multiprocessor system on a chip (MPSoC); the tasks are then scheduled by the earliest-deadline-first algorithm. Other works likewise use bio-inspired algorithms to control CPU temperature [5], [6]. Regarding CPU frequency, Singh and Thangaraju [7] labeled running processes or jobs as cold or hot and introduced a hybrid frequency scaling governor to reduce the overall power of the platform. According to their experimental results, this reduced the core temperature from 85 °C to 47 °C. Li et al. [8] aimed to minimize both the temperature and the energy consumption of heterogeneous MPSoCs with a two-phase task scheduling approach. In the first phase, tasks are assigned to processors while thermal and power dissipation factors are taken into account. The second phase derives the thermally and energy-optimal speed assignment for the tasks by considering the heterogeneity of both processors and tasks. To control peak temperature, Jayaseelan and Mitra [9] proposed two techniques. First, they analyzed the peak temperature of a repeating task sequence and developed an optimal task sequence that minimizes it. Second, they proposed an iterative algorithm that combines task sequencing with voltage scaling to lower the peak temperature further while satisfying the timing constraints.
Some researchers have attempted to reduce temperature using dynamic voltage and frequency scaling (DVFS). The works in [10]-[13] used DVFS to reduce power consumption by lowering the chip voltage and frequency, which in turn reduces the CPU temperature. Durand and Lesecq [14] analyzed the nonlinearity between power and temperature and used a technique that implements a chopped scheme on top of a robust DVFS approach to prevent the temperature from increasing. Unfortunately, DVFS has a limited impact on temperature and reduces CPU performance, since execution time increases as frequency is lowered [15]. CPU temperature can also be reduced with adaptive supply and body voltage control. Sulaiman et al. [16] used particle swarm optimization (PSO) combined with the Pareto front (PF) to determine the optimal combination of threshold and supply voltage, which yielded a thermal reduction ranging from 8 °C to 12 °C for each body-bias step voltage. In [17], adaptive supply and body voltage control was used to compensate the threshold voltage and clock frequency for ultra-low-power design by supplying optimal supply and NMOS/PMOS body-bias voltages to the microprocessor unit, resulting in a power saving of up to 20% and a thermal reduction of about 8 °C for each body-bias step voltage. CPU temperature can also be reduced through floorplanning: Xie and Hung [18] used a floorplanning technique for peak temperature reduction and thermal balancing on the CPU.
This article presents and analyzes a task assignment algorithm, based on a nature-inspired genetic algorithm, for thermal balancing in multi-core processors. The algorithm reads the temperature of each core and then assigns tasks according to the cores' current temperatures and the expected energy consumption of the tasks. It analyzes the behavior of different core temperature states while considering the energy expended by each task that a core performs, including parallel application tasks. The performance achieved in different simulation scenarios is analyzed and compared with thermally unbalanced approaches. The proposed approach collects information about the actual hotspots of all cores, together with the tasks mapped to and executed by each core, to decide which tasks are assigned to each core next. It dynamically moves light and heavy tasks between cores so that the temperature difference between the cores is reduced. The results validate the effectiveness of the proposed algorithm in managing hotspots and reducing both temperature and energy consumption in multicore processors in high-performance computer systems.
The rest of the paper is organized as follows. Section 2 explains the theoretical background and the relation between power and temperature through algebraic equations. Section 3 explains the system models, which consist of power and task modeling. Section 4 describes the assignment algorithm in detail and gives its pseudo-code at the end of the section. Section 5 presents and discusses the simulation and results for two cases (equal and different initial core temperatures). Finally, the conclusions are given in section 6.

THEORETICAL BACKGROUND
This paper uses thermal RC modeling as an accurate and efficient way to estimate temperature in multi-core processors. If $P_i$ is the average power of Task $i$ and $t_i$ is its execution time, the temperature of a core after executing Task $i$ can be expressed as (1) [15], and the temperature after the next task can be stated as (2) [15], where $P_i$ is the average power of Task $i$, $t_i$ is the execution time of Task $i$, $T_{amb}$ is the ambient temperature, $T_{init}$ is the initial temperature, $R$ is the thermal resistance, and $C$ is the thermal capacitance.
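For orientation, a first-order (lumped) RC thermal model with average power $P_i$, execution time $t_i$, ambient temperature $T_{amb}$, initial temperature $T_{init}$, thermal resistance $R$, and thermal capacitance $C$ is commonly written as below. This is a sketch of the standard textbook solution, assumed here to correspond to (1) and (2); the symbol $T_i$ for the core temperature at the end of Task $i$ is chosen for illustration.

```latex
% Assumed form of (1): temperature at the end of Task i, starting from T_init
T_i = P_i R + T_{amb} + \left(T_{init} - P_i R - T_{amb}\right) e^{-t_i/(RC)}

% Assumed form of (2): temperature at the end of the next task, starting from T_i
T_{i+1} = P_{i+1} R + T_{amb} + \left(T_i - P_{i+1} R - T_{amb}\right) e^{-t_{i+1}/(RC)}
```

In this form, $P_i R + T_{amb}$ is the steady-state temperature the core would reach if Task $i$ ran indefinitely, and the exponential term describes how quickly the core moves from its starting temperature toward that value.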
From the above equations, we can conclude that four factors affect a core's transient temperature: the average power of the task executed on the core, the initial temperature, the execution time, and the number of tasks assigned to the core. Thus, the more tasks a core executes, the more its temperature rises, with each task contributing according to its power and execution time.
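The effect of these four factors can be illustrated with a short sketch that chains a first-order RC model over a sequence of tasks. The model form is the standard lumped RC solution and is an assumption here, as are the function names and the numeric values, which are hypothetical and not taken from the paper.

```python
import math

def temperature_after_task(t_start, power, exec_time, r_th, c_th, t_amb):
    """Core temperature after one task, using an assumed first-order RC model."""
    steady = power * r_th + t_amb                  # temperature the core would settle at
    decay = math.exp(-exec_time / (r_th * c_th))   # fraction of the initial gap that remains
    return steady + (t_start - steady) * decay

def temperature_after_sequence(t_init, tasks, r_th, c_th, t_amb):
    """Chain the model over (power, exec_time) pairs; each task starts where the last ended."""
    temp = t_init
    for power, exec_time in tasks:
        temp = temperature_after_task(temp, power, exec_time, r_th, c_th, t_amb)
    return temp

# Hypothetical values, for illustration only.
R_TH, C_TH, T_AMB = 2.0, 0.5, 45.0   # K/W, J/K, degrees C
one_task = temperature_after_sequence(45.0, [(10.0, 1.0)], R_TH, C_TH, T_AMB)
two_tasks = temperature_after_sequence(45.0, [(10.0, 1.0), (10.0, 1.0)], R_TH, C_TH, T_AMB)
```

With these values, running a second identical task leaves the core hotter than running one, while the temperature stays below the steady-state value of 65 °C.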
Suppose the periodic tasks (Task 1, Task 2, Task 3, …) are to be executed by a multicore CPU (Core 1, Core 2, Core 3, …). The problem is how to assign the tasks so that the temperatures of the cores stay balanced. This can be done with an assignment algorithm that takes the cores' temperatures and the task parameters as input. The goal of the algorithm is that the core with the highest temperature takes over a set of tasks whose total energy is lower than the total energy of the task sets assigned to the other cores, by a factor that is covered in the next sections. The same is then done for the core with the second-highest temperature, and the algorithm continues until the task set of the coldest core has been assigned.

THE PROPOSED SYSTEM MODEL

Power model
In complementary metal-oxide-semiconductor (CMOS) ICs, which are the building blocks of modern processors, power dissipation can be classified into two components: dynamic power and static power. Dynamic power dissipation is caused by the charging and discharging of the transistors' junction capacitances and by the brief short-circuit current that flows while the PMOS and NMOS transistors toggle. Static power, in contrast, is dissipated because of leakage current through the reverse-biased junctions of the transistors [19]. The dynamic power dissipation (Pd) can be expressed as (3) [20], where Vdd is the supply voltage, Cef is the effective capacitance, and f is the clock frequency. The frequency can be represented by (4) [21] in terms of the supply voltage, the threshold voltage, and a hardware constant. The leakage power dissipation is expressed as (5) [21] in terms of the leakage current, which consists of the gate-oxide leakage current and the sub-threshold leakage current. The gate-oxide leakage current is expressed as (6), which depends on two hardware parameters, the gate width, and the oxide thickness. The sub-threshold leakage current is calculated by (7) [22], which depends on further hardware parameters and on a voltage related to the current chip temperature.
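As a rough illustration of how the two power components combine, the sketch below uses the commonly quoted CMOS expressions: dynamic power as effective capacitance times supply voltage squared times frequency, and leakage power as supply voltage times leakage current. These forms are assumed to correspond to (3) and (5); the function names and the operating points are hypothetical.

```python
def dynamic_power(c_ef, v_dd, freq):
    """Dynamic (switching) power: effective capacitance * Vdd^2 * clock frequency."""
    return c_ef * v_dd ** 2 * freq

def leakage_power(v_dd, i_leak):
    """Static power: supply voltage times total leakage current."""
    return v_dd * i_leak

def total_power(c_ef, v_dd, freq, i_gate, i_sub):
    """Dynamic power plus leakage power (gate-oxide + sub-threshold leakage currents)."""
    return dynamic_power(c_ef, v_dd, freq) + leakage_power(v_dd, i_gate + i_sub)

# Hypothetical operating points, for illustration only.
p_full = dynamic_power(c_ef=1e-9, v_dd=1.0, freq=2e9)     # about 2 W
p_scaled = dynamic_power(c_ef=1e-9, v_dd=0.8, freq=1e9)   # lower voltage and frequency
```

Because the supply voltage enters quadratically, lowering voltage and frequency together reduces dynamic power by more than the frequency reduction alone, which is the effect DVFS relies on.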

Task classification and modeling
According to [23], tasks can be classified into three classes, and the class to use depends on the application. For instance, if a process must be monitored by a given sensor, a periodic task should be used; if the monitoring must complete within a specified time, the task should also be a real-time task. The task classes are as follows:
− Independent vs dependent tasks: When the completion of one task depends on that of another task, the task is said to be dependent. The majority of general-purpose programs follow the dependent-task approach. A directed acyclic graph (DAG) is used to depict the dependencies between tasks: tasks are represented by nodes, and the interdependence between tasks is represented by edges.
− Real time vs non-real time tasks: When the CPU is required by the operating system to complete a task before its deadline expires, the task is said to be real time. Real-time tasks have two subclasses. The first is hard real-time tasks, for which the deadline must be met. The second is soft real-time tasks, for which the processor is permitted to finish execution after the deadline by some interval that depends on the application and its criticality.
− Periodic vs aperiodic tasks: A task that arrives at fixed time intervals is called a periodic task. The instant of the first activation is called the phase $\phi$. For a periodic task $\tau$, the activation time of its $k$-th instance can be expressed as $\phi + (k - 1)T$, where $T$ is the period. In real-time systems, the task period is used as the task's deadline in most cases. Aperiodic tasks, on the other hand, are activated irregularly and indefinitely, which makes it difficult to predict accurately when they will arrive.
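The activation rule for periodic tasks can be written as a one-line function. This is a minimal sketch; the function name and the example phase and period are hypothetical.

```python
def activation_time(phase, period, k):
    """Activation time of the k-th instance (k starts at 1) of a periodic task."""
    if k < 1:
        raise ValueError("k must be >= 1")
    return phase + (k - 1) * period

# A task with phase 2 and period 5 (arbitrary time units, for illustration only).
first_four = [activation_time(phase=2, period=5, k=k) for k in range(1, 5)]
# first_four == [2, 7, 12, 17]
```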
This study considers independent, non-real-time periodic tasks. Each task $\tau_i$ is modeled by three parameters: its execution time $t_i$, its average power consumption $P_i$, and the energy $E_i$ it dissipates. The task period depends on the application of the task, whereas the other parameters depend on the CPU configuration and are determined through the simulation setup. The execution time $t_i$ can be measured accurately for the selected CPU configuration with GEM5 [24], $P_i$ can be measured with the multicore power, area, and timing (McPAT) framework [25] using the same CPU configuration as for GEM5, and $E_i$ is obtained by multiplying $P_i$ and $t_i$, as explained by the block diagram in Figure 1.
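A task record along these lines might look like the sketch below. The class name, field names, and numbers are hypothetical; in the flow described above, the execution time would come from the GEM5 run and the average power from McPAT.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    exec_time_s: float   # execution time, e.g. taken from the GEM5 statistics
    avg_power_w: float   # average power, e.g. taken from the McPAT output

    @property
    def energy_j(self) -> float:
        """Energy dissipated by the task: average power times execution time."""
        return self.avg_power_w * self.exec_time_s

# Hypothetical numbers, for illustration only (not measured values).
tasks = [Task("task_a", 0.5, 4.0), Task("task_b", 2.0, 1.5)]
total_energy = sum(t.energy_j for t in tasks)
```

Deriving the energy from the other two fields, rather than storing it, keeps the three parameters consistent if a task is re-measured.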

THE PROPOSED TASK ASSIGNMENT ALGORITHM
Consider a set of independent, heterogeneous (i.e., with different parameters), periodic tasks. The assignment algorithm assigns the tasks for each hyper-period based on the decision factors $(F_1, F_2, \ldots)$, which depend on the cores' temperatures and the tasks' energy, as illustrated in Figure 2. The factor $F_j$ gives the fraction of the total energy that should be consumed by core $j$. Hence, the energy that each core should consume to keep the CPU temperature balanced is computed by multiplying its factor by the total energy of the tasks, as expressed in (8):

$$E_{core_j} = F_j \sum_{i=1}^{n} E_i \qquad (8)$$

where $E_{core_j}$ is the total energy that should be executed by core $j$ during a hyper-period, $n$ is the total number of tasks, and $E_i$ is the energy of Task $i$.

The factor $F_j$ is computed on the principle of assigning load according to core temperature. If the cores' temperatures are equal, the factors of all cores are equal. If they differ, the factor is highest for the coolest core, which therefore executes the most energy, and lowest for the hottest core, which executes the least. This load distribution can be performed using a square (quadratic) curve, as shown in Figure 3. The temperature of each core is substituted into (12) to determine its corresponding Y-axis value on the curve in Figure 3, and the decision factor $F_j$ then follows from these values.

After the decision factors have been determined, the task assignment is performed by a genetic algorithm (geno-type) [26] such that each core $j$ executes the fraction $F_j$ of the total task energy, where $0 \le F_j \le 1$. This is done by treating each gene in a chromosome as a task, as shown in Figure 4: Gene 1 represents Task 1, Gene 2 represents Task 2, and so on. At the beginning of the algorithm, a number of chromosome vectors, each with a length equal to the length $n$ of the task vector, is generated randomly. Together they form the initial population matrix, with one row per chromosome and $n$ columns. The genes of the chromosomes are then changed in each iteration according to the genetic algorithm (expressed in Algorithm 1) so as to achieve the best fitness, which is expressed in (13). The chromosome with the best fitness is mapped to the task vector in such a way that the tasks whose corresponding genes are equal to "1" are assigned to the core for the hyper-period. This ensures that the energy consumed by the core is approximately equal to its share $F_j$ of the total task energy. See Figure 5.
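The gene-per-task encoding can be sketched as a small genetic algorithm that searches for a 0/1 chromosome whose selected task energy is close to a core's target energy. This is a minimal sketch under stated assumptions, not the authors' implementation: the fitness used here (the absolute gap to the target), the one-point crossover, the bit-flip mutation, the parameter defaults, and all names are assumptions chosen for illustration.

```python
import random

def fitness(chromosome, energies, target):
    """Absolute gap between the energy of the selected tasks and the target (lower is better)."""
    selected = sum(e for gene, e in zip(chromosome, energies) if gene)
    return abs(target - selected)

def select_tasks(energies, target, pop_size=30, generations=200, p_mut=0.1, seed=0):
    """Evolve a 0/1 chromosome (one gene per task) whose selected energy is close to target."""
    rng = random.Random(seed)
    n = len(energies)
    # Initial population matrix: pop_size random chromosomes of length n.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            a, b = rng.sample(population, 2)                  # pick two distinct parents
            cut = rng.randint(1, n - 1) if n > 1 else 0       # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            offspring.append(child)
        # Keep the best pop_size individuals among parents and offspring.
        merged = population + offspring
        merged.sort(key=lambda c: fitness(c, energies, target))
        population = merged[:pop_size]
    return population[0]

# Hypothetical task energies in joules; the core is asked to take 40% of the total.
energies = [4.0, 3.0, 2.0, 1.0, 5.0, 6.0]
target = 0.4 * sum(energies)
best = select_tasks(energies, target)
```

The genes of `best` that equal 1 mark the tasks assigned to the core. Because parents and offspring compete together for survival, the best fitness never gets worse from one generation to the next.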
After that, the assigned tasks are extracted from the task vector, and the remaining tasks become candidates for assignment to the next core. This process continues until all the tasks have been assigned to the CPU cores. It is important to note that the genetic algorithm is not applied to the last core: that core receives all the remaining tasks, even if their energy only approximately matches its share of the total energy, to guarantee that every task is executed. The task assignment algorithm is given in Algorithm 1.

Int J Elec & Comp Eng
ISSN: 2088-8708  Thermal aware task assignment for multicore processors using genetic algorithm (Mohammed Parwez)

Figure 5. Illustration of using genetic algorithm in task assignment

    Combine Population and Offspring to perform (μ + λ) selection
  End
  Map the chromosome of the best fitness to the task set so that each "1" in the chromosome is an assigned task to Core (j)
  Extract the assigned tasks from the task set
  n = n − number of extracted tasks
End
Assign the remaining tasks to the last core.
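The outer loop of the assignment (hottest core first, extraction of the assigned tasks, and the last core taking whatever remains) can be sketched as follows. To keep the example short and deterministic, a simple greedy selector stands in for the genetic algorithm; that substitution, the function names, the decision factors, and all numeric values are assumptions for illustration only.

```python
def greedy_select(energies, target):
    """Stand-in selector (not the GA): pick tasks, largest first, while staying within target."""
    chosen, total = [], 0.0
    for idx in sorted(range(len(energies)), key=lambda i: energies[i], reverse=True):
        if total + energies[idx] <= target:
            chosen.append(idx)
            total += energies[idx]
    return chosen

def assign_tasks(core_temps, factors, energies, select=greedy_select):
    """Assign task indices to cores, hottest core first; the last core takes what is left."""
    total_energy = sum(energies)
    remaining = list(range(len(energies)))               # indices of unassigned tasks
    order = sorted(range(len(core_temps)), key=lambda j: core_temps[j], reverse=True)
    assignment = {j: [] for j in order}
    for j in order[:-1]:
        budget = factors[j] * total_energy                # energy share of core j
        picked = select([energies[i] for i in remaining], budget)
        assignment[j] = [remaining[p] for p in picked]
        remaining = [i for i in remaining if i not in assignment[j]]
    assignment[order[-1]] = remaining                     # guarantees every task is assigned
    return assignment

# Hypothetical inputs: four cores, decision factors that sum to 1 (hotter core, smaller share).
core_temps = [60.0, 50.0, 55.0, 45.0]
factors = [0.1, 0.3, 0.2, 0.4]
energies = [4.0, 3.0, 2.0, 1.0, 5.0, 6.0, 2.0, 7.0]
result = assign_tasks(core_temps, factors, energies)
```

The `select` argument takes the energies of the unassigned tasks and a target, and returns the positions it picked, so a genetic selector that returns the positions of the "1" genes could be passed in its place.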
Once all the tasks have been assigned to the cores according to their temperatures, the assignment is simulated with the HotSpot simulator to obtain the temperature response of the processor. The steady-state temperatures produced by the simulator become the input to the assignment algorithm for the next hyper-period, so the algorithm always assigns tasks according to the core temperatures at the end of the previous hyper-period.

Simulation setup
The architecture used in this paper is a quad-core out-of-order (O3) processor with an I-cache and D-cache capacity of 32 KB and a dedicated L2 cache of 1 MB. The RAM capacity is 64 MB with a DDRx1 memory controller, and the clock frequency is set to 2 GHz. The complete architecture is shown in Figure 6.

Task benchmarks and results
To test the efficiency of the algorithm, standard benchmarks are used. The benchmarks used for the simulation are taken from SPEC2006 [27], MiBench [28], and MediaBench [29]. The details of each benchmark are given in Table 1.

Simulation results
To evaluate the assignment algorithm, the simulation was executed for 10 hyper-periods. In each hyper-period, all the tasks are assigned to and executed by the processor, and the final temperature of each hyper-period becomes the initial temperature of the next one. The simulation is performed in two parts. In the first part, the initial temperatures of the cores are set to equal values to test whether the assignment algorithm can maintain the temperature balance between the cores. In the second part, each core's initial temperature is set to a different value to test whether the assignment algorithm can re-balance the temperatures of the processor cores. Figure 7 shows the temperature trend of each core and the tasks executed by each core on the same time axis, with the initial temperature of the cores set to 45 °C and the ambient temperature set to 45 °C. From the results, it can be seen that the temperature of each core is approximately equal to that of the other cores. This shows that the algorithm maintains the temperature balance of the processor cores: the maximum temperature difference between cores is nearly 9 °C in each hyper-period, as shown in Figure 8.
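The balance metric used here, the maximum temperature difference between cores in each hyper-period, can be computed as in the sketch below. The function name and the sample temperatures are hypothetical and are not results from the paper.

```python
def max_core_difference(core_temps_per_period):
    """For each hyper-period, return the gap between the hottest and the coolest core."""
    return [max(temps) - min(temps) for temps in core_temps_per_period]

# Hypothetical end-of-period core temperatures in degrees C (not results from the paper).
samples = [[60.0, 58.0, 55.0, 52.0], [61.0, 59.0, 57.0, 56.0]]
gaps = max_core_difference(samples)   # [8.0, 5.0]
```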

Different initial temperature
In the same way, the simulation was repeated with the same architecture, algorithm, and test benchmarks. The only change is the processor cores' initial temperatures, which were set to a different value for each core. According to the results shown in Figure 9, the temperatures become balanced after a few seconds of task execution. This shows that the algorithm can balance the processor cores' temperatures even when their initial temperatures differ. Figure 10 shows that the maximum temperature difference between cores in each hyper-period is approximately 8 °C, except in the first hyper-period, where the cores' initial temperatures are different.

Figure 9. Cores temperature trend for non-equal initial temperature

Figure 10. Maximum temperature difference between each core for 10 hyper-periods for the case of non-equal initial temperature

CONCLUSION
In this paper, an assignment algorithm has been proposed to keep the temperature balanced between the cores. The simulation was performed on a quad-core platform with two levels of cache and a dedicated L2 cache. The task benchmarks were simulated using GEM5 to measure the task execution times on the platform, and the McPAT simulator was used to measure the power of each task from the GEM5 statistics output. The energy of each task was then obtained from the GEM5 and McPAT outputs. Once the parameters of each task were obtained, the tasks were assigned to the cores according to the cores' current temperatures and the energy of each task. The algorithm assigns the most energy to the coolest core and the least energy to the hottest core, and it uses genetic optimization to select the tasks. The simulation results showed that the highest temperature difference between the cores is 8 °C for approximately 14 seconds. These results validate the effectiveness of the proposed task assignment algorithm in managing hotspots and reducing both temperature and energy consumption in multicore processors. From Figures 7 and 9, it is noticeable that a large temperature peak occurs on the cores during each hyper-period. This peak could be mitigated by applying a proper task partitioning algorithm after the tasks are assigned. Hence, our future work is to integrate a task partitioning algorithm with the proposed assignment algorithm to reduce the peak temperature of the cores.