Multi-objective Pareto front and particle swarm optimization algorithms for power dissipation reduction in microprocessors

Received Mar 25, 2020; Revised May 31, 2020; Accepted Jun 16, 2020

Progress in microelectronics is enabling higher integration densities and considerable development of on-board systems, but this growth runs up against power dissipation as a limiting factor. Higher power dissipation causes the generated heat to spread immediately, which leads to thermal problems. Consequently, the system's total consumed energy increases as the system temperature increases. High temperatures in microprocessors and the large thermal energy of computer systems create serious problems for system reliability, performance, and cooling cost. The power consumed by processors is mainly due to the increase in the number of cores and in the clock frequency; it is dissipated as heat and poses thermal challenges for chip designers. As microprocessor performance has increased remarkably in nanometer technology, power dissipation has become non-negligible. To address this problem, this article tackles power dissipation reduction for high-performance processors using multi-objective Pareto front (PF) and particle swarm optimization (PSO) algorithms, treating power dissipation as a prior computation that reduces the real delay of a target microprocessor unit. Simulation verifies the conceptual fundamentals and the optimization of joint body and supply voltages (Vth-VDD), with satisfactory findings.


INTRODUCTION
The performance growth of digital systems and the massive development of nanometer technologies stem from the extensive introduction of electronics and portable devices into daily life. Today's complex chips combine high power dissipation with significant generated heat, and both must be kept in line with the shrinking dimensions of microelectronic and digital devices. Commercial microprocessor circuits are now available with CMOS transistors whose lateral size is in the nanometer range; such miniaturization, with billions of transistors on a chip, has led to enormous power and temperature challenges [1,2]. Higher power dissipation raises chip temperature, which compromises the lifetime of microprocessors as new features and performance are added. This trend runs counter to integration in microelectronics, which aims to be as compact and as autonomous as possible for portable applications. The introduction of high performance and new features will therefore only be possible with significant cooling technology or fundamental design changes [3].
Moreover, the delay of the microprocessor chip must meet the target performance at the lowest power dissipation. Multi-objective particle swarm optimization (PSO) is an efficient approach for reaching Pareto front (PF) decisions for high power dissipation problems in microprocessors [4].

A rigorous survey of related work on microprocessor power consumption shows that numerous articles and studies have been published on design and optimization techniques. A number of recent works are highlighted below.
Sheikh et al. proposed a multi-objective mutative algorithm based on a task scheduling approach for finding Pareto optimal solutions (POS) with simultaneous optimization of temperature, performance, and energy. In addition, they presented a methodology for choosing a single solution from the PF given the user's preference. The proposed task scheduling algorithm achieves three-way optimization with fast turnaround time, and is advantageous because it reduces energy and temperature together rather than in isolation [5]. Vanapalli introduced a PSO algorithm for VLSI systems that reduces leakage power dissipation based on the leakage vector. In that paper, the genetic algorithm (GA) is presented briefly and implemented to search for the minimum leakage vector (MLV), and it is compared with PSO in terms of time delay and number of iterations; the proposed approach is simulated and verified on a number of circuits as a case study [6].
Attia et al. analyzed the basic theories of multi-core processing and trending research areas for multi-core microprocessors, and then focused on energy management issues in multi-core architectures. Moreover, they discussed different power management techniques and, based on that survey, proposed a specific technique for power management in multi-core processors [7]. Sulaiman et al. proposed an optimal concurrent joint set of supply and threshold voltage scaling (Vth-VDD) for minimizing the power dissipation of modern high-speed digital systems. To validate minimum power dissipation, they tested various Vth-VDD sets based on PF and PSO algorithms. Their results were verified on a high-performance processor for minimum power dissipation levels, minimum temperature levels, and multiple workload conditions [8]. A. Kumar and R. K. Nagaria proposed and developed a new leakage-tolerant high-speed domino gate with higher noise immunity, lower power dissipation, and less sensitivity to process variations for wide fan-in OR logic gates. Furthermore, NMOS transistors are stacked to reduce leakage power consumption and total current transfer in cascade fashion, so that the gate can be operated in deep submicron process technology [9].
Singh et al. presented a 10T static random-access memory (SRAM) cell that reduces leakage power dissipation with improved cell stability. The proposed SRAM cell is used to design a 6-input look-up table for an FPGA and a 2 kb SRAM macroblock. They achieved superior results in terms of write and read static noise margins and lower leakage power dissipation [10]. Y. Wang et al. proposed hypotheses for the underlying causes of power variations in processors and validated them under specifically controlled environmental factors. Their test findings indicate that, as the number of transistors increases, the variance of temperature characteristics within processors becomes higher, which contributes significantly to the change in power dissipation of present processors [11]. E. Angel et al. studied the problem of scheduling a set of jobs with times, timelines, and processing demands so as to reduce the total power dissipation on a parallel scalable processor. They formulated the problem as a convex program and presented a combinatorial polynomial-time approach based on finding maximum flows [12].
This paper presents PF and PSO optimization algorithms to ensure the efficient operation of dynamic voltage scaling (DVS) and body bias voltage scaling (BBVS) for power dissipation minimization in microprocessors. The combined DVS and BBVS technique dynamically alters the processor's throughput for energy efficiency. A number of parameters are considered for possible power dissipation improvement by scaling the supply voltage, the frequency, and the threshold voltage (Vth) of a high-performance portable processor as a case study. Simulation results are used to validate the theoretical basics and the PF-PSO optimization of the threshold-supply voltage scaling (Vth-VDD) approach, and they show satisfactory results. In addition, the study could be applied to system-level power estimation for various types of high-performance portable systems.

DYNAMIC VOLTAGE SCALING (DVS) AND BODY BIAS VOLTAGE SCALING (BBVS)
The growth of high-performance portable processors runs up against power consumption as a key design obstacle, while heat dissipation is becoming a constraint on the total energy consumption of processors. This is a major issue in digital and portable applications and should be considered at every level of circuit design. The electrical power consumed in CMOS circuits is mainly divided into two components: a dynamic power due to the switching activity of the transistors, and a static power due to leakage currents. Modern CMOS technology scaling causes an exponential growth of both static and dynamic power dissipation. Thus, both components must be considered when optimizing power dissipation. The supply and threshold voltages tend to decrease in order to maintain the highest performance and lowest power requirements. The total power (Ptot), dynamic power (Pdynamic), and static power (Pstatic) dissipation in a CMOS circuit are given by [13,14], where α is the node transition activity factor, CL is the total load capacitance, VDD is the supply voltage, and fclk is the clock frequency. µ is the carrier mobility, W is the transistor channel width, Cox is the oxide capacitance per unit area, VT is the thermal voltage, Vgs is the gate-to-source voltage, Vth is the threshold voltage, m is the subthreshold swing coefficient, and Vds is the drain-to-source voltage [15]. The threshold voltage of a CMOS transistor can be given by, where, Vtho, K1, and K are constants. The threshold voltage (Vth) therefore has a linear relation with Vbs and VDD. The voltage Vbs that shifts the threshold voltage from its nominal value Vtho is called the body bias voltage (VBB); VBBN is a negative voltage typically used for NMOS transistors, and VBBP is a positive voltage used for PMOS transistors. In principle, PMOS and NMOS transistors are designed to have balanced characteristics.
Therefore, the body bias voltages are given by [17], where VBBN and VBBP are the NMOS and PMOS body bias voltages, respectively. When high speed and low latency are desired, Vth is reduced using forward body biasing (FBB). For lower workloads, this scheme slows down the circuit by increasing Vth through reverse body biasing (RBB), as shown in Figure 1 [18]. It is important to point out that the only way to change Vth at the circuit level is by changing VBB; this is known as the body bias voltage scaling (BBVS) technique and is used to minimize the static power dissipation in microprocessors and CMOS devices. Dynamic voltage scaling (DVS) is another efficient technique; it minimizes the dynamic power consumption of processors by scaling down the supply voltage, and the frequency with it, when peak performance is not needed. Lowering the supply voltage and frequency accordingly can save a significant amount of energy. Figure 2 shows the power savings achieved using DVS. Figure 3 shows the dynamic and static power as functions of VDD for a 16-bit adder in 32 nm CMOS technology [19]. However, as the supply voltage of microprocessors is lowered, the static power increases rapidly, so a joint DVS and BBVS technique is crucial for effective reduction of power and temperature in high-performance processors. Hence, this technique is desired to allow a microprocessor core to deliver optimal performance, lower power dissipation, and an optimal clock frequency, since the clock frequency depends on the supply and threshold voltages [20], where  is the velocity saturation coefficient 12, and K2 is a technology-specific constant. In summary, dynamic power consumption is reduced by scaling down the supply voltage (DVS), and static power consumption is reduced by increasing the threshold voltage through scaling of the body bias voltage (BBVS).
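For reference, the power expressions cited from [13,14] are commonly written in the following textbook form. This is a sketch consistent with the symbols listed above, not a transcription of the cited equations: the channel length L, the leakage current name I_sub, and the prefactor (m-1)V_T^2 of the subthreshold term are assumptions taken from the usual subthreshold conduction model.

```latex
\begin{align}
P_{tot} &= P_{dynamic} + P_{static} \\
P_{dynamic} &= \alpha \, C_L \, V_{DD}^{2} \, f_{clk} \\
P_{static} &= I_{sub} \, V_{DD}, \qquad
I_{sub} = \mu \, C_{ox} \, \frac{W}{L} \, (m-1) \, V_T^{2} \,
  e^{(V_{gs} - V_{th})/(m V_T)} \left( 1 - e^{-V_{ds}/V_T} \right)
\end{align}
```

In this form the dynamic term falls quadratically with VDD, while the static term falls exponentially as Vth rises, which is why the two voltages are treated as joint decision variables.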
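The interplay between DVS and BBVS described above can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: the constants, the linear threshold model with its signs, the alpha-power-law frequency estimate, and the lumped leakage prefactor are hypothetical stand-ins, not the models or values used in the paper.

```python
import math

# Every constant here is an illustrative assumption, not a value from the paper.
ACTIVITY = 0.2        # node transition activity factor (alpha)
C_LOAD = 1e-9         # total load capacitance CL, farads
VTH0 = 0.40           # nominal threshold voltage Vtho, volts
K1 = 0.05             # assumed sensitivity of Vth to VDD
K_BODY = 0.20         # assumed sensitivity of Vth to the body bias Vbs
VEL_SAT = 1.3         # velocity saturation exponent
K_FREQ = 2.5e-10      # technology constant in the frequency model
I0 = 100.0            # lumped leakage prefactor, amperes
M_SWING = 1.5         # subthreshold swing coefficient m
V_THERMAL = 0.026     # thermal voltage VT near room temperature, volts


def threshold_voltage(vdd, vbs):
    """Assumed linear model: reverse bias (vbs < 0) raises Vth, forward bias lowers it."""
    return VTH0 - K1 * vdd - K_BODY * vbs


def clock_frequency(vdd, vth):
    """Alpha-power-law style estimate: frequency grows with the overdrive (VDD - Vth)."""
    if vdd <= vth:
        return 0.0  # no overdrive, the circuit does not switch
    return (vdd - vth) ** VEL_SAT / (K_FREQ * vdd)


def dynamic_power(vdd, f_clk):
    """Switching power: alpha * CL * VDD^2 * fclk."""
    return ACTIVITY * C_LOAD * vdd ** 2 * f_clk


def static_power(vdd, vth):
    """Leakage power with Vgs = 0 and Vds much larger than VT."""
    return vdd * I0 * math.exp(-vth / (M_SWING * V_THERMAL))


def total_power(vdd, vbs):
    """Total power at the highest frequency the (VDD, Vbs) pair supports."""
    vth = threshold_voltage(vdd, vbs)
    return dynamic_power(vdd, clock_frequency(vdd, vth)) + static_power(vdd, vth)


if __name__ == "__main__":
    for vdd, vbs in [(1.2, 0.0), (1.0, 0.0), (1.0, -0.5)]:
        vth = threshold_voltage(vdd, vbs)
        print(f"VDD={vdd:.2f} V, Vbs={vbs:+.2f} V, Vth={vth:.3f} V, "
              f"f={clock_frequency(vdd, vth) / 1e9:.2f} GHz, "
              f"P={total_power(vdd, vbs):.3f} W")
```

Under these assumptions, reverse body bias cuts leakage but also lowers the attainable frequency, so the pair (Vth, VDD) has to be chosen jointly rather than one voltage at a time.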

PARETO FRONT (PF) AND PARTICLE SWARM OPTIMIZATION (PSO) ALGORITHMS
Multi-objective optimization algorithms deal with problems in which multiple, often contradictory, criteria must be optimized simultaneously. Whereas the optimum is clearly identified for problems with only one objective, it has to be formalized for multi-objective optimization problems. In fact, for a problem with two or more contradictory objectives, the optimal solutions form a set of points corresponding to the best possible compromises that solve the problem. A multi-objective optimization problem can be given mathematically by [21].
where f(x) is the objective function, x is the decision variable, n is the number of objective functions, and m is the number of decision variables. Typically, it is infeasible to find decision variables x that minimize all objective functions f1(x), f2(x), ..., fn(x) at the same time. The PF algorithm was initiated by Goldberg in 1989 [22]; it uses the concept of dominance to select optimal solutions that drive the population toward a set of solutions called the Pareto optimal set (POS). The method has proven highly effective. Nowadays, the majority of algorithms use a Pareto approach to deal with multi-objective problems and to determine a compact compromise set of solutions that involves a tradeoff between the objectives. For each element in the POS, none of the objective functions can be improved further without degrading at least one of the remaining objectives.
The Pareto optimal solutions are known as the Pareto front (PF) in the objective space and as the Pareto optimal set (POS) in the decision space. Any solution in this set carries the same significance and is a good compromise among the conflicting objectives. Figure 4 shows the decision variable space with two objective functions and the Pareto front (PF). Optimal power dissipation is a multi-criteria optimization problem; commonly, the problem is converted into a parametric auxiliary single-objective problem whose solution provides a Pareto-optimal point by determining the optimal body bias, threshold, and supply voltages (VBB/Vth-VDD) that ensure optimal power dissipation reduction. To reduce the computation time as well as the search region, the PF dominance principles are used together with the PSO approach. The POS is applied to identify all feasible sets of joint Vth/VBB-VDD, which yields a multitude of possible good choices. For this joint process of decision variables and dominance sets, none of the Vth/VBB and VDD values can be improved without worsening some other objective variable. The PF of the non-dominated set of solutions is estimated by a free-running process over the design region that contains all joint minimization sets, and the search is then narrowed to a restricted set with new constraints and limits. The weighting factors are obtained from the multi-objective PF in order to normalize power dissipation and temperature as objective functions, which has a very good influence on convergence [23].
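A generic statement of such a problem, written here as a sketch in the standard minimization form using the symbols f, x, n, and m, is:

```latex
\begin{equation}
\min_{x} \; F(x) = \left[ f_1(x), \, f_2(x), \, \ldots, \, f_n(x) \right],
\qquad x = (x_1, x_2, \ldots, x_m)
\end{equation}
```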
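The dominance test behind the Pareto front can be sketched in a few lines. The example below is a minimal illustration for minimization problems; the function names and the sample (power, delay) pairs are hypothetical and are not taken from the paper.

```python
def dominates(a, b):
    """Return True if objective vector a dominates b (minimization).

    a dominates b when a is no worse than b in every objective
    and strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates as (power, delay) pairs in arbitrary units.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.5)]
front = pareto_front(candidates)
print(front)
```

Here (3.0, 4.0) and (2.5, 3.5) are dropped because (2.0, 3.0) is better in both objectives, while the three remaining points are mutually non-dominated.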
The PSO algorithm was proposed by Russell Eberhart (electrical engineer) and James Kennedy (socio-psychologist) in 1995 [24]. PSO is a stochastic optimization algorithm in which a mathematical equation guides the "particles" during the displacement process. The motion of a particle is affected by three components: a social component, an inertia component, and a cognitive component. PSO is well suited to power dissipation problems in microprocessors and VLSI designs, since it requires no user-defined modification of the structure of the algorithm. The particles of the swarm are feasible solutions to the optimization problem that move over the search space to determine the global optimum [25,26]. The particle displacement strategy is illustrated in Figure 5. In a search space of dimension D, particle i of the swarm is modeled by its position vector $\vec{x}_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$ and by its velocity vector $\vec{v}_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$. The quality of a position is determined by the value of the objective function at that point. Each particle keeps in memory the best position it has passed through, expressed by $\vec{P}_{best,i} = (p_{i1}, p_{i2}, \ldots, p_{iD})$. The best position reached by the swarm is denoted gbest. ω is the coefficient of inertia (constant); c1 and c2 are constants known as acceleration coefficients; r1 and r2 are random numbers drawn uniformly in [0, 1] at each iteration t and for each dimension j. Once the particles have moved, the new positions are evaluated and the two vectors Pbest and gbest are updated accordingly; n is the number of particles in the swarm [27,28].
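In the canonical form of PSO, the velocity and position updates that use these quantities read as follows. This is the standard textbook formulation, given as a sketch; the index conventions (particle i, dimension j, iteration t) are assumed to match those of the text.

```latex
\begin{align}
v_{ij}^{t+1} &= \omega \, v_{ij}^{t}
  + c_1 r_1 \left( P_{best,ij}^{t} - x_{ij}^{t} \right)
  + c_2 r_2 \left( g_{best,j}^{t} - x_{ij}^{t} \right) \\
x_{ij}^{t+1} &= x_{ij}^{t} + v_{ij}^{t+1}
\end{align}
```

The three terms of the velocity update correspond to the inertia, cognitive, and social components, respectively.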
Many problems involve optimal combination sets that make the evaluation of optimal points complex and difficult. Because the objective functions conflict with each other, there is not a single optimum solution but a set of optimal solutions. The objective function of minimum power dissipation (Popt) spans a multidimensional space of body and supply voltages (VBB/Vth, VDD), together with other state and technology parameters that are taken into consideration as fitting parameters. For each solution, the temperature, VBB/Vth, VDD, load capacitance (C), and fitting parameters are decision variables rather than constants. We try to determine the feasible region of Popt as a Pareto optimal set or Pareto optimal solutions. Over all body bias and supply voltages, the POS algorithm can retrieve thousands of possibilities to determine all Vth-VDD joint sets; none of the VBB/Vth and VDD values can be improved without modifying the values of some other objective variables. A complex code arrangement was used to evaluate the POS of non-dominated solutions. The POS of non-dominated solutions is estimated through a stochastic evaluation of the design for different temperature levels and different body bias and supply voltages.
The procedure is therefore an optimization of conflicting variables and objectives. One can use one objective function at a time or link all objectives through common weights, so none of the sets is definitely preferable to the others. This is called the "non-dominated" set of solutions, meaning that none of the solution sets is dominated. All Pareto-optimal solutions are non-dominated. Thus, it is crucial to bring the solutions as close as possible to the POS using a combination of numerical weights for the objectives [29]. Different methods are used in practice; one is to use a PSO algorithm to locate points along the Pareto optimal solution over several iterations, and then to rank and evaluate the quality of the trade-offs based on the particular application being modeled. PSO is therefore used to determine the best variable values in the POS that meet the optimal power dissipation.
The presented PF and PSO algorithms are crucial for determining the optimal amount of power dissipation reduction, which means that the problem is one of minimization. Minimizing a multivariable objective function with bound constraints requires defining the objective functions, setting bounds on the variables, and calling a particle swarm routine (or custom code) to minimize the function. The optimization ends when the relative change in the objective value falls below a tolerance. However, instead of the full design space, the Pareto sets stored in XLS files are used as the new search space bounds. Each individual flying through the PSO search space has a velocity that is adjusted dynamically. At every step, the PSO algorithm changes the velocity of each particle toward its personal-best (pbest) and global-best (gbest) locations. The acceleration toward the pbest and gbest locations is weighted by separately generated random numbers. The modified position and velocity of each particle can be calculated using the current velocity and the distances from the current position to pbest and to gbest [30].
The power dissipation function takes an array of inputs and produces a single output. The objective is to find the input that results in the lowest possible output of the power dissipation function. Since the function was not differentiable and the range of inputs was quite small, PSO could search the entire input space to find the best output.
The results were obtained in a CPU environment, i.e., just one solution within the search space was evaluated at any given moment. The PSO optimization results confirm the microprocessor's performance gain. The method gives combinations of parameters that enhance the overall optimization performance, and its conceptual simplicity makes it straightforward to apply.
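One way to turn numerical weights into a single choice from a non-dominated set is a normalized weighted sum. The sketch below is an illustration under stated assumptions: the min-max normalization, the function names, and the sample (power, temperature) points are hypothetical and are not the weighting scheme of the paper.

```python
def normalize(values):
    """Min-max scale a list of values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def pick_by_weights(front, weights):
    """Return the point of the front with the lowest weighted, normalized score."""
    columns = [normalize(col) for col in zip(*front)]
    scores = [
        sum(w * col[i] for w, col in zip(weights, columns))
        for i in range(len(front))
    ]
    best_index = min(range(len(front)), key=scores.__getitem__)
    return front[best_index]


# Hypothetical non-dominated points as (power in W, temperature in deg C).
front = [(10.0, 70.0), (12.0, 55.0), (18.0, 40.0)]
balanced = pick_by_weights(front, (0.5, 0.5))
power_first = pick_by_weights(front, (1.0, 0.0))
cool_first = pick_by_weights(front, (0.0, 1.0))
print(balanced, power_first, cool_first)
```

Changing the weights moves the selection along the front, which matches the observation that no single non-dominated point is preferable until the application fixes the trade-off.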
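The steps listed above (define the objective, set bounds, run the swarm) can be sketched with a small self-contained PSO. This is a minimal illustration, not the authors' implementation: the quadratic objective is a hypothetical stand-in for the power model, the Vth bounds and the swarm parameters are assumed values, and a fixed iteration count replaces the relative-change stopping rule.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(x) inside box bounds with a basic global-best PSO."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # inertia + cognitive pull toward pbest + social pull toward gbest
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                # move, then clamp the coordinate back into its bounds
                pos[i][j] = min(max(pos[i][j] + vel[i][j], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_power(x):
    """Hypothetical smooth stand-in with its minimum at VDD = 0.9 V, Vth = 0.35 V."""
    vdd, vth = x
    return (vdd - 0.9) ** 2 + (vth - 0.35) ** 2


# VDD bounds follow the 0.800 V to 1.375 V identification range; Vth bounds are assumed.
BOUNDS = [(0.800, 1.375), (0.20, 0.50)]
best_x, best_val = pso_minimize(toy_power, BOUNDS)
print(best_x, best_val)
```

In the workflow described in the text, the bounds would come from the Pareto sets rather than from fixed ranges, which shrinks the region the swarm has to explore.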
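When the input range is small, an exhaustive scan of a voltage grid is also feasible and can serve as a cross-check on a swarm result. The sketch below assumes a 25 mV grid and a hypothetical non-differentiable objective; neither comes from the paper. Voltages are kept as integer millivolts to avoid floating-point grid artifacts.

```python
def exhaustive_search(objective, vdd_grid_mv, vth_grid_mv):
    """Scan every (VDD, Vth) pair and return (best_value, vdd_mv, vth_mv)."""
    best = None
    for vdd in vdd_grid_mv:
        for vth in vth_grid_mv:
            value = objective(vdd, vth)
            if best is None or value < best[0]:
                best = (value, vdd, vth)
    return best


def toy_objective(vdd_mv, vth_mv):
    """Hypothetical non-differentiable stand-in for the power function."""
    return abs(vdd_mv - 1000) + abs(vth_mv - 300)


VDD_GRID_MV = range(800, 1400, 25)  # 800 mV to 1375 mV in 25 mV steps
VTH_GRID_MV = range(200, 501, 25)   # assumed threshold range, 200 mV to 500 mV
result = exhaustive_search(toy_objective, VDD_GRID_MV, VTH_GRID_MV)
print(result)
```

The scan evaluates 24 x 13 = 312 points here; the cost grows multiplicatively with each added decision variable, which is where a swarm search becomes the more practical choice.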

RESULTS AND DISCUSSION
To check the operation, functionality, and integrity of the PF-PSO approach, a device under test is used. Its electrical responses are studied to determine how effective the design is in reducing the power dissipation of the chip. A 22 nm lithography is used, which meets the specification of the new processor families and is compatible with the latest SPICE simulation software. Spice-XVII and ORCAD 17.2 are used for the specification of Intel® Core™ i7 Processors. The temperature range (30-70) °C of these processors is classified into idle, normal, and maximum temperatures. The voltage identification range is 0.800 V-1.375 V and the core voltage (Vcore) range is 0.930-1.205 V. The model is evaluated and the results are recorded for each degree of temperature and for typical workloads of different benchmark programs such as TZ00, TZ01, TPIN5, TPIN6, and TPIN3. Based on these evaluations, a step of 1 °C is selected as the temperature width for all of the processor's operation modes.
To confirm the results in finding optimal Vth-VDD sets, the Pareto optimal solution (POS) and particle swarm optimization (PSO) algorithms are used to find the optimal level of power dissipation minimization for multiple temperature levels and workload conditions. First, the PF of non-dominated solutions is evaluated for different body and supply voltages; then PSO determines the optimal variable values. Table 1 shows the optimal simulation results of the PF-PSO algorithm. Figure 6 shows the power minimization percentages. The power consumption is clearly a convex function of the body and supply voltages. This gives a direct way to minimize the power consumption of a microprocessor device subject to given body and supply voltage constraints. Implementation and optimization results are presented to establish the impact and usefulness of the approach.
These validation results indicate that our joint threshold and supply voltage approach was successful: dynamically tuning the threshold and supply voltages through reverse body bias and voltage scaling is effective in reducing high temperatures and power dissipation while keeping a fixed speed in the active and idle operational modes of microprocessors that experience significant power and temperature variations.
These results verify that, for dynamic operational workload or temperature time limits, the processor has a minimum average power consumption when adaptive Vth-VDD is applied and directed to high-performance estimation. They also verify that the adaptive joint Vth-VDD technique supports power- and temperature-aware design and helps processor designers in their verification efforts, given the parallel nature of the workloads and the idle/active states of portable microprocessors. Significant power savings can be obtained by optimal selection of Vth and VDD for many circuit architectures and operating environments, and by adaptively tuning these values at runtime according to workload variations.
Simulation results for PF-PSO power dissipation reduction in the microprocessor under test, using joint Vth-VDD, confirmed satisfactory achievements: the approach efficiently tolerates workload variations in voltage, frequency, and temperature with a minimal penalty in performance. Therefore, when the processor can operate at a lower supply voltage under the same environmental variation, this helps reduce the clock frequency margin for lower power dissipation requirements. The technique allows the overall system to operate over a wide voltage and frequency domain while maintaining the chip power dissipation.

CONCLUSION
The results showed that the microprocessor device dissipates an optimal amount of power when joint threshold-supply scaling (DVS-BBVS) is applied to a dynamic computation workload at different execution times to attain high-performance estimation. Consequently, they verified the use of the PF-PSO approach for joint Vth-VDD scaling in power- and temperature-aware design, which is very efficient for processor architects and designers in formalizing their verifications because of the parallel nature of the workload and the active-idle modes of processors. Subsequently, the PF-PSO approach estimated optimal energy savings when correct body and supply voltages are applied for different operating domains and different circuit designs, depending on the dynamic workload variations during real operation. Finally, the PF-PSO results showed that joint Vth-VDD scaling can play a large role in achieving optimal power, and its effect increases as technology is scaled down to the deep nanometer process. A potential reduction of the dissipated power in the range of 2.954% to 15.478% was confirmed, along with a temperature reduction of 2 °C for each step in the body and supply voltage variation, which represents a substantial improvement in power dissipation compared with previous studies.