Dynamic Frequency Scaling Regarding Memory for Energy Efficiency of Embedded Systems

ABSTRACT


INTRODUCTION
Dynamic Voltage/Frequency Scaling (DVFS) has long been used to reduce the power consumption of computing systems. DVFS raises or lowers the operating frequency of CMOS circuits and adjusts the supply voltage accordingly. CMOS circuits dissipate both static and dynamic power, and dynamic power is the dominant component [1]. Most research on DVFS has focused on the CPU [2], [3] because the CPU is the most power-consuming device when a computer system is actively running. Many contemporary operating systems support CPU DVFS; the cpufreq subsystem of Linux [4] is one example.
Frequency scaling is also supported by hardware devices other than the CPU, such as the GPU or the memory bus. In such cases the operating frequency of the device can be managed by the user. For example, Linux provides a subsystem called devfreq to support frequency scaling of devices other than the CPU [5]. The Nexus 6 smartphone is a commercial mobile device that supports device frequency scaling: it allows the clock speed of the memory bus, which determines the memory bandwidth, to be adjusted. Changing the memory access frequency therefore gives us another option for managing the power consumption of embedded systems.
Attempts to manage the power consumption of memory access have been made recently. The authors of [6] proposed a DVFS method for DRAM based on memory bandwidth utilization. They devised a bandwidth-based frequency selection policy from their experimental finding that memory latency is not significantly affected by the memory frequency at low bandwidth. Because memory hardware with DVFS support was not available, they emulated frequency scaling using timing delays. DRAM does not support DVFS so far because scaling the I/O voltage of DRAM affects stability and requires significant hardware changes, but DFS (Dynamic Frequency Scaling) is possible. In [6], a low-power mode of DRAM based on DFS was introduced to reduce energy consumption with limited hardware changes. The results of [7] were further extended to consider both CPU and DRAM power consumption in a server [8]. In [9], power management of DRAM using both DFS and low-power states is modeled and studied by simulation. Joint scaling of CPU and DRAM frequencies was also studied in [10] for server systems. Low-power design of SRAM can also be considered, as in [11], but in this paper we focus on DRAM power management.
Previous works require hardware changes for memory frequency scaling to manage DRAM power consumption. In this paper, we propose a power management method that combines DVFS of the CPU with DFS of the memory bus. We show that the CPU and the memory are closely related in terms of energy efficiency, and that the relationship depends on the number of memory accesses per instruction. From this relationship, we find an optimal ratio between the CPU frequency and the memory frequency.
The study was performed on a real device. The target device is a commercial smartphone, the Nexus 6, which has a Snapdragon 805 CPU and 3 GB of LPDDR3 SDRAM. The CPU frequency can be set to one of 18 levels (300, 422.4, ...).
This paper is organized as follows. Section 2 explains the power model of the CPU and the memory, which is used in Section 3 to analyze the relationship between the CPU and memory frequencies. The analysis in Section 3 shows that the CPU frequency and the memory frequency are closely related in terms of energy efficiency. Based on the analysis, Section 3 also presents a method for selecting the frequencies of both the CPU and the memory. Section 4 presents experimental results on a commercial smartphone on which our frequency selection method is implemented. Finally, Section 5 concludes our work.

POWER MODEL OF CPU AND MEMORY
The power consumption of the CPU in embedded systems is usually divided into dynamic and static power [12]. It can be modeled as:
$$P_{cpu} = \alpha V^{2} f_c + V I_{leak} \tag{1}$$
where $\alpha$ is a coefficient combining the switching activity and the effective capacitance, $V$ is the operating voltage, $f_c$ is the CPU frequency, and $I_{leak}$ is the leakage current. Reducing the operating voltage decreases the dynamic power consumption but increases the circuit delay. The relation between the operating voltage and the CPU frequency is given by:
$$f_c \propto \frac{(V - V_t)^{2}}{V} \tag{2}$$
where $V_t$ is the threshold voltage, which is much smaller than the operating voltage [13], [14]. Equation (2) can therefore be rewritten as $V \propto f_c$, from which we can reformulate Equation (1), absorbing the proportionality constants into the coefficients, as:
$$P_{cpu} = \alpha f_c^{3} + \beta f_c \tag{3}$$
where $\alpha$ is a variable depending on the switching activity and $\beta$ is a hardware-dependent constant. For multicore CPUs, the power consumption is the sum of the power of each core, that is, $P_{cpu} = \sum_{i} (\alpha_i f_c^{3} + \beta f_c)$, where $i$ is the core number. The switching activities $\alpha_i$ may differ from core to core.
The power consumption of the DRAM system, on the other hand, can be divided into operation power and background power [6], [9]. The operation power is the power required to execute memory reads and writes. The background power accounts for all power consumed when there is no memory access. Lowering the memory access frequency lowers the background power linearly [7]. The operation power is not affected by the memory frequency, but the energy required for a memory access increases because the access time becomes longer. For DDR-series DRAMs, the background power is the major component of the total DRAM power consumption [6]. We therefore assume that the operation power can be ignored in our model, and the power consumption of the main memory is modeled as:
$$P_{mem} = \gamma f_m \tag{4}$$
where $f_m$ is the memory device frequency and $\gamma$ is a hardware-dependent constant. Combining Equations (3) and (4), we obtain the power consumption estimate:
$$P = \alpha f_c^{3} + \beta f_c + \gamma f_m \tag{5}$$
assuming the CPU and the memory are the dominant power-consuming devices.
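The power model of this section can be sketched in a few lines of Python. This is only an illustration: it assumes a cubic-plus-linear CPU term per core and a linear memory term, and the function names and coefficient values (`ALPHA`, `BETA`, `GAMMA`) are hypothetical placeholders, not the values measured on the target device.

```python
def cpu_power(f_c_ghz, alphas, beta):
    """CPU power as a sum over active cores of alpha_i * f^3 + beta * f.

    `alphas` holds one switching-activity coefficient per active core.
    """
    return sum(a * f_c_ghz ** 3 + beta * f_c_ghz for a in alphas)


def memory_power(f_m_ghz, gamma):
    """Memory (background) power, assumed linear in the bus frequency."""
    return gamma * f_m_ghz


def total_power(f_c_ghz, f_m_ghz, alphas, beta, gamma):
    """Total power, assuming CPU and memory dominate the consumption."""
    return cpu_power(f_c_ghz, alphas, beta) + memory_power(f_m_ghz, gamma)


# Hypothetical coefficients chosen only to make the example concrete.
ALPHA, BETA, GAMMA = 0.1, 0.2, 0.5

if __name__ == "__main__":
    # One active core at 1.0 GHz with the memory bus at 0.8 GHz.
    print(total_power(1.0, 0.8, [ALPHA], BETA, GAMMA))
```

Passing a list of per-core coefficients keeps the multicore case explicit: cores with different switching activity contribute different cubic terms, while the linear term is the same for every core.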
We measured the power consumption of the target device while varying the bus frequency with the system in the idle state. The results are shown in Figure 1. Because the operating frequency of the SDRAM ranges from 166 MHz to 800 MHz, the power consumption is almost unchanged below 200 MHz and above 796 MHz. Changing the CPU frequency from the lowest level to the highest level in the idle state does not affect the power consumption. The power consumption is about 0.305 W at the lowest bus frequency and about 0.621 W at the highest. The difference between the maximum and the minimum is about 0.316 W, from which the memory coefficient $\gamma$ of Equation (4) is obtained with the memory frequency $f_m$ expressed in GHz.
To obtain the CPU parameters $\alpha$ and $\beta$ of Equation (3), we used cpubomb, included in the Isolation Benchmark Suite [15], which fully utilizes the CPU and does not access memory. Figure 2 shows the power consumption of cpubomb at different CPU frequencies when only one core is used for the benchmark. The memory bus frequency was fixed at the lowest level, so we assume that 0.305 W is consumed by the memory device. Regression analysis gives values of $\alpha$ and $\beta$ whose estimates are very close to the measured power. The hardware-dependent parameters are used to estimate the power of a multicore application through the relationship $\sum_{i=1}^{n} (\alpha_i f_c^{3} + \beta f_c) = \left(\sum_{i=1}^{n} \alpha_i\right) f_c^{3} + n \beta f_c$, where $n$ is the number of cores used and $f_c$ is the CPU frequency. The comparison between the estimated values and the measured values is shown in Figure 3. The estimation error is accumulated as the
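The regression step can be illustrated with a short least-squares fit. The sketch below assumes the CPU model has the form P_cpu = alpha * f^3 + beta * f and that the 0.305 W memory baseline is subtracted from each measurement first. The frequencies and power readings are synthetic, generated from hypothetical coefficients, and the function name `fit_cpu_model` is our own; none of this is measured data.

```python
MEMORY_BASELINE_W = 0.305  # power attributed to memory at the lowest bus frequency


def fit_cpu_model(freqs_ghz, cpu_power_w):
    """Least-squares fit of P = alpha * f^3 + beta * f (no intercept).

    Solves the 2x2 normal equations directly, so no external library is needed.
    """
    s66 = sum(f ** 6 for f in freqs_ghz)
    s44 = sum(f ** 4 for f in freqs_ghz)
    s22 = sum(f ** 2 for f in freqs_ghz)
    b3 = sum(f ** 3 * p for f, p in zip(freqs_ghz, cpu_power_w))
    b1 = sum(f * p for f, p in zip(freqs_ghz, cpu_power_w))
    det = s66 * s22 - s44 ** 2
    if det == 0:
        raise ValueError("need at least two distinct non-zero frequencies")
    alpha = (b3 * s22 - b1 * s44) / det
    beta = (s66 * b1 - s44 * b3) / det
    return alpha, beta


# Synthetic single-core "measurements" built from hypothetical coefficients.
TRUE_ALPHA, TRUE_BETA = 0.1, 0.2
FREQS_GHZ = [0.3, 0.7, 1.1, 1.5, 1.9, 2.3]
MEASURED_W = [MEMORY_BASELINE_W + TRUE_ALPHA * f ** 3 + TRUE_BETA * f
              for f in FREQS_GHZ]

if __name__ == "__main__":
    # Remove the memory baseline before fitting the CPU-only model.
    cpu_only = [p - MEMORY_BASELINE_W for p in MEASURED_W]
    print(fit_cpu_model(FREQS_GHZ, cpu_only))
```

With real measurements the fit would not be exact; comparing the fitted curve against the measured points, as in Figure 2, shows how well the cubic-plus-linear form holds on a given device.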

CPU AND MEMORY FREQUENCY SELECTION FOR ENERGY EFFICIENCY
We use the energy-delay product (EDP) [16] as the measure of energy efficiency so that both turnaround time and energy consumption are considered. EDP is widely used as a metric that couples energy consumption with performance. It is the product of the delay (the execution time until the end of the program) and the energy consumed during the execution of the program. Because the energy consumed by an instruction is the product of the power consumption and the execution time of the instruction, the EDP of executing an instruction is modeled by Equation (6) in terms of the cycles per instruction $CPI$ and the number of cores used $n$. $CPI$ is affected by the memory frequency $f_m$ if there is a memory access. For a RISC CPU such as an ARM processor, if we let $CPI_0$ be the CPI when there is no memory access, $CPI$ can be estimated by Equation (7) using an indicator $m$ that is 1 if there is a memory access and 0 otherwise. Assuming $CPI_0$ is constant, minimizing the EDP is equivalent to minimizing the expression in Equation (8).
If $m = 0$, we have Equation (9), and the optimal EDP is obtained when $f_m$ is at its minimum. When $m = 1$, because the harmonic mean is not larger than the geometric mean, we have Equation (10), where the power $P$ is given in Equation (5). In this case the optimal EDP is found when $f_m$ is at its maximum. Thus, when $r$ is the rate of memory accesses per instruction, the expected EDP is optimized by minimizing Equation (11). Substituting the ratio between the CPU frequency $f_c$ and the memory frequency $f_m$ into Equation (11) gives Equation (12). With the CPU frequency given, the frequency ratio minimizing Equation (12) is calculated by Equation (13). Depending on the condition in Equation (13), either $f_m$ is set to its minimum value, or the ratio is calculated for the given $f_c$ and the corresponding value of $f_m$ follows from it.
With a limited number of CPU frequency levels, we can calculate the value of $f_m$ for each $f_c$ from the given utilization and the average memory access rate per instruction. We then compare the corresponding energy consumption using Equation (11) to determine the pair of $f_c$ and $f_m$ that gives the minimum value.
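The final selection step, comparing candidate frequency pairs over a limited set of levels, can be sketched as an exhaustive search. The sketch below is not the closed-form ratio derived above: it assumes a simplified per-instruction time of (CPI_0 + r * f_c / f_m) / f_c, a power model of the form alpha * f_c^3 + beta * f_c + gamma * f_m, and EDP = power * time^2. The function names, the frequency levels, and the coefficients are all hypothetical.

```python
from itertools import product


def expected_edp(f_c, f_m, r, alpha, beta, gamma, cpi0=1.0):
    """Expected EDP per instruction under the assumed model.

    Assumption: each memory access (rate r per instruction) stalls the CPU
    for one memory-bus cycle, i.e. f_c / f_m CPU cycles.
    """
    power = alpha * f_c ** 3 + beta * f_c + gamma * f_m
    time_per_instruction = (cpi0 + r * f_c / f_m) / f_c
    return power * time_per_instruction ** 2


def select_frequencies(cpu_levels, mem_levels, r, alpha, beta, gamma):
    """Return the (f_c, f_m) pair with the smallest expected EDP."""
    return min(
        product(cpu_levels, mem_levels),
        key=lambda pair: expected_edp(pair[0], pair[1], r, alpha, beta, gamma),
    )


# Hypothetical frequency levels (GHz) and model coefficients.
CPU_LEVELS = [0.3, 1.0, 2.0]
MEM_LEVELS = [0.2, 0.4, 0.8]
ALPHA, BETA, GAMMA = 0.1, 0.2, 0.5

if __name__ == "__main__":
    for rate in (0.0, 0.5):
        print(rate, select_frequencies(CPU_LEVELS, MEM_LEVELS, rate,
                                       ALPHA, BETA, GAMMA))
```

Under these assumptions the search reproduces the qualitative behavior described in the text: with no memory access the lowest memory frequency is chosen, and a high access rate pushes the memory frequency up while the CPU frequency comes down.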
As an example, the values of the CPU frequency $f_c$ and the memory frequency $f_m$ obtained for a single-core application are shown in Figure 4 (the memory access rate ranges from 0.1% to 99.9% in steps of 0.1%). The highest memory bus frequency is used as a bound to indicate that the optimal $f_m$ is higher than 800 MHz. The results show that if the memory access rate is less than 0.3%, the memory access frequency does not need to be raised from its lowest level. With a memory access rate of less than 3.5%, the frequency should be kept below its highest level.

APPLICATION TO A REAL TARGET
We measured the energy consumption and performance of applications on a real target (Nexus 6) to validate our analysis. To measure the power and energy consumption, we disassembled the battery part of the Nexus 6 and connected the charging port to a digital power meter (an ODROID Smart Power), which supports a 10 Hz sampling rate. We tested three benchmarks: cpubomb, ramsmp [17], and STREAM [18]. ramsmp and STREAM have four kinds of operations: copy, scale, add, and triad. Copy moves data from one array to another. Scale multiplies the data of one array by a value and stores the result in another array. Add sums the data of two arrays and stores the result in a third array. Triad combines scale and add. The operations of ramsmp were tested separately, but those of STREAM were tested all together for comparison. The ranges of memory access rate
We compared the energy efficiency of different Linux governors with that of the presented method. Our method was implemented using the governor interface of Linux, and the sampling rate of our policy is the same as that of the other governors. Linux supports three dynamic policies for CPU DVFS: conservative, ondemand, and interactive. The default CPU DVFS policy of the Nexus 6 is the interactive governor, which is typical for Android devices. The governor for the memory bus is cpubw_hwmon; it monitors memory reads and writes and adjusts the bus frequency according to the memory bandwidth. Note that our method is an integrated governor that performs CPU DVFS and memory bus frequency control simultaneously. Figure 5 compares the EDP of the benchmarks.

Figure 5. EDP of benchmarks
With ramsmp, frequency scaling of the CPU and the memory based on our analysis yields the lowest EDP in this experiment. Compared with the default governor, the EDP improves by about 8.6% for the copy operation and by about 3.3% for the triad operation. Over all operations of ramsmp, the energy efficiency improves by about 3.4% over the interactive governor and by 9.6% over the conservative governor. The test with the STREAM benchmark shows a similar result: an improvement of 7.6% over the interactive governor and 11.7% over the conservative governor. When memory is barely accessed, as in the results with cpubomb, the proposed method does not degrade performance.
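For reference, an EDP value of the kind compared above can be computed from a sampled power trace as follows. This is a minimal sketch that assumes equally spaced samples at the 10 Hz rate of the power meter and a simple rectangle-rule integration. The function names and the sample values are hypothetical and do not reproduce the measured results.

```python
def energy_joules(samples_w, rate_hz=10.0):
    """Energy as the sum of power samples times the sampling interval."""
    return sum(samples_w) / rate_hz


def edp(samples_w, rate_hz=10.0):
    """Energy-delay product: energy times the duration of the trace."""
    duration_s = len(samples_w) / rate_hz
    return energy_joules(samples_w, rate_hz) * duration_s


def improvement_percent(baseline_edp, candidate_edp):
    """Relative EDP reduction of a candidate policy over a baseline."""
    return (baseline_edp - candidate_edp) / baseline_edp * 100.0


if __name__ == "__main__":
    # Hypothetical traces: 2.0 s at 1.0 W versus 1.5 s at 1.2 W.
    baseline_trace = [1.0] * 20
    candidate_trace = [1.2] * 15
    print(edp(baseline_trace), edp(candidate_trace))
    print(improvement_percent(edp(baseline_trace), edp(candidate_trace)))
```

Because EDP multiplies energy by execution time, a policy that draws more power can still score better if it shortens the run enough, which is why the metric suits a joint CPU and memory frequency policy.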

CONCLUSION
Although the CPU is the most power-consuming device in a computer system, memory also has a significant effect on both power consumption and performance. Because of its impact on performance, memory is especially important in terms of energy efficiency. Selecting the CPU frequency without considering memory access can therefore fail to optimize the energy efficiency of the system. In this paper, we have analyzed the relationship between the CPU and memory frequencies in terms of energy efficiency. For CPU-intensive applications, lowering the memory access frequency can reduce the power consumption of the system. For applications with considerable memory access, a proper selection of both the CPU and memory frequencies is needed. We presented a model for this selection and tested it on a real target (a Nexus 6 smartphone). The results show that frequency assignments based on our analysis enhance energy efficiency.