Hardware simulation for exponential blind equal throughput algorithm using system generator

ABSTRACT


INTRODUCTION
A scheduling algorithm is the method used to allocate radio resources to user equipment (UE) [1]. A UE, for example a mobile phone, can transmit different flows such as web browsing and video streaming at the same time. The scheduling mechanism is based on scheduling algorithms implemented at the Long Term Evolution (LTE) base station, the Evolved Node B, and the scheduling process is performed in the Medium Access Control (MAC) layer. Since the implementation of the scheduling algorithm is an open issue in LTE, many scheduling algorithms have been proposed by researchers [2], [3]. Various scheduling algorithms offering different techniques for allocating resources to users have been developed, such as the Frame Level Scheduler (FLS) [4], Modified Largest Weighted Delay First (MLWDF) [5] and Proportional Fairness (PF) [6]. Many researchers have also suggested packet schedulers that allocate resources to UEs by considering the channel quality conditions, such as Best Channel Quality Indicator (BCQI) [7] and Maximum Rate [8]. The scheduling algorithm is one of the important features of LTE, since it determines which packet is given the first priority to be scheduled.
Int J Elec & Comp Eng ISSN: 2088-8708 Hardware simulation for exponential blind equal throughput algorithm using system…(Yusmardiah Yusuf)
However, none of them proposed a scheduling algorithm that considers both real-time flows, such as video streaming and online gaming, and non-real-time flows, such as web browsing and email. The study in [9] proposed the EXP-BET algorithm, which considers real-time and non-real-time flows simultaneously. Based on the simulation results, the EXP-BET algorithm performed better than the FLS and EXP-PF algorithms for real-time services. For non-real-time services, EXP-BET showed an improvement in fairness index of 17.72% compared to FLS and 7.52% compared to EXP-PF. The authors conclude that scheduling could be recommended as one of the methods to solve the problem of cell-edge users, since the EXP-BET algorithm gave a fair share of the system resources to users considering multiple services.
The Field Programmable Gate Array (FPGA) was introduced by Xilinx. It was developed based on programmable logic devices (PLDs) and the logic cell array (LCA) concept. By providing a two-dimensional array of configurable logic blocks (CLBs) and programmable interconnections between the configurable resources, an FPGA can implement a wide range of arithmetic and logic functions [10], [11]. The architecture is a reconfigurable logic device made up of an array of small logic blocks and allocated interconnection resources. The FPGA has advantages in terms of performance, cost, reliability, flexibility and time-to-market [12] compared to other popular IC technologies such as application-specific integrated circuits (ASICs) and digital signal processors (DSPs).
In terms of FPGA implementation, no researchers have implemented the EXP-BET scheduling algorithm on a hardware platform. In 2015, the authors of [13] focused on the implementation of various algorithms for an arbiter with low port density (8-bit) using an FPGA platform. A round-robin arbiter, which provides strong fairness, was selected; it works on the principle that a request that was just served should have the lowest priority in the next round of arbitration.
Over the past few years, new software tools have been released by Xilinx for FPGA development. One of these is the System Generator, an add-on to Simulink that allows hardware circuits to be designed within the Simulink environment. Furthermore, the combination of Xilinx System Generator and the Simulink environment provides a simple technique for hardware design through the use of existing System Generator blocks and subsystems. This saves both design time and hardware implementation resources. Hence, the proposed algorithm can be brought toward commercialization more quickly, as an FPGA offers a shorter time-to-market. In an FPGA, no layout, masks or other fabrication steps are needed, and it is simpler to design than an ASIC [14]. Hardware implementation is important for designers of high-performance Digital Signal Processing (DSP) systems such as wireless networks. Hence, verification on hardware is needed to validate the theoretical and simulation work.
Therefore, this study aims to implement and verify the hardware simulation of the EXP-BET algorithm using Xilinx System Generator (XSG). The algorithm is modelled using MATLAB Simulink, which is configured with XSG. The paper is organized as follows: Section 2 describes the research method, Section 3 presents the results and discussion, and Section 4 draws the conclusion.

RESEARCH METHOD
The proposed packet scheduling algorithm for the downlink transmission of LTE is the Exponential Rule and Blind Equal Throughput (EXP-BET) algorithm. The flowchart for the design of the EXP-BET algorithm is presented in Figure 1. The EXP Rule algorithm schedules the real-time services, while the BET algorithm takes care of the non-real-time services; users are served based on the metrics in equations (1) and (2).

Exponential (EXP) rule
The main idea behind the EXP Rule algorithm is to strike a fair balance between the throughput, fairness and delay requirements of a scheduling algorithm. The EXP Rule gives higher priority to the user with the highest transmission delay or the user that has more packets in its buffer. It is a channel-aware scheduling algorithm that considers the CQI metric in the scheduling decision [15] and has been shown to be the most promising approach for delay-sensitive real-time applications such as video and VoIP. This is described by the metric in (1), where $\alpha_i$ is the tuneable parameter, equal to $5/(0.99\tau_i)$, $\tau_i$ is the tolerable time interval within which the packet must be received, $D_{HOL}$ is the delay of the first packet to be transmitted by the i-th user, and $AverageD_{HOL}$ is equal to
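As an illustration of how an EXP Rule metric can be evaluated, the following Python sketch uses the general form of the EXP rule found in the LTE scheduling literature: an exponential of the weighted head-of-line delay divided by one plus the square root of the average head-of-line delay, multiplied by a channel-dependent term. This form, the averaging over the real-time flows, the `channel_term` argument and all function and parameter names are assumptions made for illustration; they are not taken from equation (1) of this paper. Only the definition of the tuneable parameter, 5/(0.99 τ), follows the text.

```python
import math


def exp_rule_metric(d_hol, all_d_hol, tau, channel_term=1.0):
    """Sketch of an EXP-rule style metric for one real-time flow.

    d_hol        -- head-of-line delay of this user's first packet
    all_d_hol    -- head-of-line delays of all real-time flows (assumed input)
    tau          -- tolerable delay of the flow
    channel_term -- channel-dependent weight (hypothetical placeholder)
    """
    alpha = 5.0 / (0.99 * tau)  # tuneable parameter as defined in the text
    # Assumption: the average delay is the arithmetic mean over the flows.
    average_d_hol = sum(all_d_hol) / len(all_d_hol)
    exponent = alpha * d_hol / (1.0 + math.sqrt(average_d_hol))
    return math.exp(exponent) * channel_term


# A flow with a longer head-of-line delay receives a larger metric.
delays = [0.01, 0.04]
low = exp_rule_metric(0.01, delays, tau=0.1)
high = exp_rule_metric(0.04, delays, tau=0.1)
```

The division, square root and exponential in this sketch correspond to the kinds of operations (divide, square root and CORDIC blocks) that appear later in the timing analysis.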

Blind equal throughput
Fairness can be achieved with Blind Equal Throughput (BET), which stores the past average throughput achieved by each user. The metric for the i-th user is calculated as in (2), where $R_i(t) = \beta R_i(t-1) + (1-\beta)\, r_i(t)$, $\beta$ is the weight factor of the moving average ($0 \le \beta \le 1$), $R_i(t-1)$ is the past average throughput of the user at time $t-1$, and $r_i(t)$ is the achievable data rate of the i-th user at time $t$.
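The moving-average update described above can be sketched in a few lines of Python. The update rule follows the definition of R_i(t) in the text. The metric itself is written here as the reciprocal of the past average throughput, which is the form usually given for BET in the scheduling literature; this is an assumption, since equation (2) is not reproduced in this text. The function names are hypothetical.

```python
def update_average_throughput(past_average, achievable_rate, beta):
    """Moving average R_i(t) = beta * R_i(t-1) + (1 - beta) * r_i(t)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * past_average + (1.0 - beta) * achievable_rate


def bet_metric(past_average):
    """Assumed BET metric: reciprocal of the past average throughput."""
    return 1.0 / past_average


# A user that has received less throughput in the past gets a higher metric.
starved_user = bet_metric(2.0)
served_user = bet_metric(4.0)
```

With this form, the scheduler favours users whose past average throughput is low, which is how BET equalizes throughput across users without looking at the channel.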

Figure 1. EXP-BET design flow
The EXP-BET algorithm is modelled using the Xilinx Blockset. The Xilinx Blockset library contains all the basic blocks, such as adders, multipliers, registers and memories, needed for the design. Models are created for all the mathematical operations in the computation of the EXP-BET metrics using the library provided by the Xilinx Blockset.
To implement the EXP-BET algorithm on an FPGA, MATLAB Simulink [16] and the Xilinx System Generator tools need to be configured. In the Simulink environment, the FPGA boundary is defined by the Gateway In and Gateway Out blocks: the input to the FPGA is fed into the Gateway In port and the output is produced at the Gateway Out port. These ports interface the Simulink double data type with the FPGA fixed-point environment. In the Gateway In block, the Simulink floating-point input is converted to a fixed-point format with the chosen saturation and rounding modes; these parameters are defined by the designer. The Gateway Out port converts the FPGA fixed-point output back to the Simulink double-precision floating-point format. The system is then simulated, tested and verified by examining the results shown on a Display block from the Simulink library. To validate the designed model in Simulink, timing analysis is used. Timing analysis is represented by a delay parameter and is used to verify the design in the Simulink environment. This verifies the functionality of the system model generated using XSG and Simulink.
The next step is to set up the System Generator for hardware co-simulation, which is one of the techniques provided by the System Generator to transform the model built in the Simulink environment into hardware. XSG can be used with different types of FPGA boards and provides options for clock speed, compilation type and analysis. The FPGA used for the implementation of the EXP-BET algorithm is the Virtex-6 xc6vlx240t-1ff1156. Lastly, the FPGA is configured using the bitstream programming file (BIT) that is automatically generated by the System Generator during hardware co-simulation.
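The floating-point to fixed-point conversion performed at the FPGA boundary can be illustrated with a small Python sketch. It shows one possible combination of rounding and saturation; the word length, the number of fractional bits and the function name are hypothetical and do not reflect the settings used in this design.

```python
import math


def to_fixed_point(value, word_bits=16, frac_bits=8, signed=True):
    """Quantize a float to fixed point with rounding and saturation.

    word_bits and frac_bits are illustrative defaults, not the paper's values.
    Returns the real value represented by the saturated fixed-point word.
    """
    scale = 1 << frac_bits
    raw = math.floor(value * scale + 0.5)  # round to nearest, ties upward
    if signed:
        lowest, highest = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    else:
        lowest, highest = 0, (1 << word_bits) - 1
    raw = max(lowest, min(highest, raw))  # saturate instead of wrapping
    return raw / scale


quantized = to_fixed_point(0.1053)   # nearest multiple of 1/256
clipped = to_fixed_point(1000.0)     # saturates at the largest positive word
```

The sketch makes the trade-off visible: more fractional bits reduce the quantization error, while more integer bits extend the range before saturation occurs.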
After the generated bit file is downloaded onto the FPGA, the input to the device is fed from Simulink source blocks and the device output is received back in Simulink sink blocks. This enables wide-ranging testing, as the data from the FPGA can be transferred directly to the MATLAB environment. After the hardware co-simulation is completed, the results can be seen on Display sink blocks from the Simulink library. If the output is similar to the output of the Simulink environment, the algorithm is confirmed to be successfully prototyped. The Xilinx blockset used in the design is presented in Figure 2.

RESULTS AND ANALYSIS
This section discusses the results of simulating the EXP-BET metric equations in the System Generator. The results obtained are then verified using hardware co-simulation.

Simulating the EXP-BET algorithm using system generator
First, the design of EXP-BET is verified through rate and type propagation using the System Generator block. If a signal carrying floating-point data is connected to a port of a System Generator block that does not support the floating-point data type, an error is detected. The rate and type propagation for the EXP and BET algorithms is illustrated in Figure 3 and Figure 4.

Timing analysis
Timing is very important when the designer is working with a hardware description language. Hardware description languages involve the simultaneous execution of processes, which means that they run in a parallel manner. The System Generator provides a timing analysis tool, the timing analyzer, to assist the timing analysis of the hardware design. The timing analyzer provides a report on slow paths and clearly displays the paths that failed on hardware. The System Generator block gives three options of clock frequency, 100 MHz, 50 MHz or 33.3 MHz [15], for the Xilinx ML605 board. To start off, a clock frequency of 50 MHz is selected, which means that the system should operate within an FPGA clock period of 20 ns. The formula for the calculation of the clock period is $T = 1/f$, where $T$ is the clock period and $f$ is the frequency.
It is observed that the EXP system fails to generate the hardware co-simulation: the total path delay is 112.64 ns, which is much higher than the 20 ns clock period, as shown in Figure 5. The timing analyzer in Figure 6 details the failed paths of the EXP system and automatically highlights the corresponding blocks of the EXP system, as shown in Figure 7, when the cursor is pointed at one of the listings in Figure 6. The failing paths show that timing violations have occurred: the output of one synchronous stage does not reach the input of the next stage within the time required by the system design. As observed in Figure 7, timing failed for the paths of the divide, square root and CORDIC 4.0 blocks. Hence, the failing paths need to be optimized. The slow path of each block is optimized using pipelining, since the hardware operates in a parallel manner, so the calculation is split up into multiple cycles. For example, the addition operation needs to wait for the division operation, which takes many iterations to produce its output; latency is therefore added to the addition operation so that it waits for the division operation. This is done by adding register or delay stages during synthesis and then attempting to generate the hardware co-simulation again until the timing requirement is met.
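The idea of adding latency so that one operand waits for another can be sketched as a simple balancing calculation. In the Python sketch below, each branch feeding an operation has a latency in clock cycles, and the function returns how many extra delay stages each branch needs so that all operands arrive in the same cycle. The branch names and latency values are hypothetical and are not the values of Table 1.

```python
def balancing_delays(branch_latencies):
    """Return the extra delay stages per branch that align all operands.

    branch_latencies maps a branch name to its latency in clock cycles.
    """
    longest = max(branch_latencies.values())
    return {name: longest - latency
            for name, latency in branch_latencies.items()}


# Hypothetical example: a pipelined divider feeding an adder together
# with an operand that is available immediately.
extra = balancing_delays({"divide_path": 10, "direct_path": 0})
# The direct operand must be delayed by 10 cycles; the divider path by none.
```

Balancing the branches in this way keeps the result correct after pipelining, at the cost of a longer overall latency from input to output.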
In this research, latency is added to the individual blocks throughout the design, as tabulated in Table 1. Latency is the number of clock cycles required for a value to propagate from the input to the output. For example, if a design requires 10 cycles to propagate from input to output, its latency is 10. In addition, to address the problems shown in Figure 6 to Figure 8, the clock frequency is set to the minimum option, 33.333 MHz. At a slower clock frequency, the timing constraint is much easier to meet. Table 2 shows the frequency and FPGA clock period for the EXP-BET system before and after the optimization process. The optimized EXP-BET system is simulated once again and meets all the timing constraints. The EXP-BET system is successfully verified in hardware co-simulation when the bitstream is successfully generated after the compilation stage; the hardware co-simulation is considered to have failed when the timing constraint is violated. Figure 8 and Figure 9 illustrate the histograms of the EXP-BET path delays after the system is optimized. The histogram charts of the delay distribution of 150 paths are generated by the Xilinx timing analyzer targeting the Virtex-6 FPGA board. Each histogram chart is a useful means of analyzing the FPGA implementation of EXP-BET, grouping the 150 paths into clusters of roughly normal distribution that arise from different portions of the System Generator architecture or from different timing clock region constraints. The numbers at the top of the bins indicate the number of slow paths. The improved parameterized FPGA implementation can be adjusted so that all signals are completely routed and all timing constraints are met. The histogram charts in Figure 8 and Figure 9 show that the BET and EXP Rule path delays are within the 30 ns clock period (33.333 MHz) and meet the timing constraints.
As illustrated in Figure 8 and Figure 9, the majority of the slow paths for BET occur at 25.06 ns, whereas for EXP the slowest path is observed at 29.65 ns. Therefore, it can be concluded that the EXP-BET system is able to run on the FPGA board within a clock period of 30 ns.
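The timing check used throughout this subsection can be reproduced with a short Python sketch. It converts a clock frequency to a clock period and tests whether a reported path delay fits within that period. The path delays and frequencies are the ones quoted in the text; the function names are hypothetical, and the check is a simplification that ignores setup time and clock uncertainty.

```python
def clock_period_ns(frequency_mhz):
    """Clock period in nanoseconds for a frequency given in MHz (T = 1/f)."""
    return 1000.0 / frequency_mhz


def meets_timing(path_delay_ns, frequency_mhz):
    """True if the path delay fits within one clock period."""
    return path_delay_ns <= clock_period_ns(frequency_mhz)


# Before optimization: 112.64 ns path delay against a 50 MHz clock.
before = meets_timing(112.64, 50.0)
# After optimization: slowest EXP and BET paths against a 33.333 MHz clock.
exp_after = meets_timing(29.65, 33.333)
bet_after = meets_timing(25.06, 33.333)
```

Under these simplifying assumptions, the unoptimized EXP path fails the 50 MHz constraint, while both optimized paths fit within the period of roughly 30 ns at 33.333 MHz.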

Power analysis
Xilinx continues to innovate so that the power challenges associated with shrinking process technologies can be overcome, since FPGA power consumption is one of the biggest concerns of FPGA users. The Xilinx power tools help to perform power estimation and analysis for a given design. Power estimation and analysis become even more important as FPGAs increase in logic capacity and performance by migrating to smaller process geometries [18]. The Xilinx Power Analyzer (XPA) is used to analyze the power consumption of the design, which depends on the device family used and on the clock, logic, signal, I/O and leakage power. Table 3 shows the estimated power consumption for the EXP-BET system. The designed architecture uses a total power of 3.472 W and 3.437 W for the EXP and BET systems, respectively. In conclusion, these figures indicate low power consumption on the Virtex-6 FPGA. Current FPGA technology such as the Virtex-6 has been shown to give low power consumption while operating at maximum performance [19].

Design summary for device utilization
The EXP-BET system was implemented in an XC6VLX240 FPGA. The flexibility of the Virtex-6 FPGA is realized in the slice resources. Each Virtex-6 slice contains four 6-input look-up tables (LUTs), eight flip-flops (FFs) and associated logic. The slices are laid out in an array-like structure and can be reconfigured to form larger, more complex systems. FPGA logic design is controlled at the bit level, giving the user the power to decide what resources to use, the placement of the design in hardware and the maximum sustainable clock frequency. Table 4 shows the device utilization summary for the EXP-BET system. The maximum operating frequency and power utilization, along with the resource utilization before and after the optimization stage in the critical path, are included. The FPGA framework is the fundamental structure of the logic device, which consists of FFs, LUTs and slices. The hard IP cores are DSP48E1 slices [20]. Only some slices can use their LUTs as distributed RAM. Each slice has one set of clock, clock enable and set/reset signals that is common to its storage elements. According to the simulation reports (see Appendix), the BET system requires just 3% of the logic resources in the FPGA: LUTs (1%), FFs (1%) and slices (1%). The EXP Rule system requires 10% of the logic resources in the FPGA: LUTs (4%), FFs (1%) and slices (5%). A LUT flip-flop pair for this architecture represents one LUT paired with one flip-flop within a slice. The Virtex-6 family supports a clock rate of 600 MHz, which is high enough to drive the whole system.
According to the simulation results, the BET system took 0.209 ns to finalize the generation of the output, and the EXP system took 0.246 ns to completely calculate the output. Since the latency is small, the EXP-BET system can generate output continuously because of its pipelined design. Moreover, the pipelined design makes the delay of the clock net very small, about 0.2 ns, which improves the system performance. Using the Xilinx Power Analyzer as a power estimation tool, the total power is estimated depending on the device utilization, clock rate and device data model.

Hardware co-simulation
The final verification was completed by implementing the hardware co-simulation of the system, which allows a system simulation to be run completely on the FPGA while showing the results in Simulink. By selecting the point-to-point Ethernet interface, a new hardware co-simulation block is automatically generated. This is the process of generating the equivalent hardware for the EXP-BET system. The Virtex-6 (xc6vlx240t-1ff1156) is used and, with the help of XSG and Xilinx XFLOW, the programmable bit file for the equivalent hardware is generated, as shown in Figure 10 and Figure 11. Table 5 shows the metric values of the EXP-BET algorithm generated by the co-simulation method using fixed input values.
Figure 10. BET hardware co-simulation model
Figure 11. EXP rule hardware co-simulation model

The port names on the hardware co-simulation block, Gateway In1 to Gateway In5, match the port names on the original subsystem. The port types and rates also match the original design. When a value is written to one of the block's Gateway In ports, the block sends the corresponding data to the appropriate location in hardware. The controller output (Gateway Out) from the hardware is read back into the Simulink model over the Ethernet interface; the output port converts the fixed-point data type into the Simulink format and feeds it into the model.
The EXP-BET system has been run in hardware co-simulation and successfully implemented on the FPGA. The output values for the EXP and BET systems are 10.16 and 0.1053, respectively, representing the metric values of the LTE scheduling algorithm. The EXP-BET system is verified, since the calculation of the metric values in the Simulink environment produces results similar to those of the hardware co-simulation. The chosen device for prototyping is the Virtex-6 FPGA, and the hardware description language is Verilog. A system is then generated for Integrated System Environment (ISE), which includes the files for the structural description of the system.
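The comparison between the Simulink result and the hardware co-simulation result can be expressed as a tolerance check. The Python sketch below is only an illustration of that acceptance criterion: the relative tolerance and the function name are hypothetical, and the text does not state how close the two outputs must be to count as similar.

```python
import math


def outputs_match(simulink_value, cosim_value, rel_tol=1e-3):
    """True if the two values agree within an assumed relative tolerance."""
    return math.isclose(simulink_value, cosim_value, rel_tol=rel_tol)


# The metric values reported in the text, compared against themselves,
# and a deliberately mismatched pair for contrast.
same_exp = outputs_match(10.16, 10.16)
same_bet = outputs_match(0.1053, 0.1053)
mismatch = outputs_match(10.16, 10.5)
```

In practice the tolerance would be chosen from the fixed-point precision at the Gateway Out port, since quantization alone can introduce a small difference between the two results.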

CONCLUSION
The implementation of the EXP-BET scheduling algorithm on an FPGA was presented in this paper. EXP-BET is an algorithm that consists of the Exponential Rule (EXP Rule) and Blind Equal Throughput (BET). The work presented was designed and simulated using the Xilinx System Generator, the Xilinx ISE Design Suite and MATLAB Simulink. This resulted in a mathematical model of the EXP-BET metric equations built from System Generator blocks. The path delay must not exceed 30 ns, which means that the system is expected to run at a clock rate of 33.333 MHz. Otherwise, the system will not meet the timing constraint and cannot run on the FPGA. The final verification of the design is conducted using the hardware co-simulation approach. Hardware co-simulation is the process of generating the equivalent hardware, in the form of a bitstream, for the EXP-BET algorithm. The System Generator then generates the bit file, which is downloaded to the Virtex-6 FPGA.
This study provides the design and implementation process of an FPGA-based system using the System Generator for a scheduling algorithm, namely the EXP-BET algorithm. It can be used as a basis for future work towards applications in LTE/LTE-A. In addition, a practical system could be implemented once the whole transmitting and receiving chain of the physical layer is established. The limitation of this research is that no input signal can be injected into the EXP-BET system on the FPGA, since the scheduling algorithm is located at the LTE MAC layer and its input is transmitted from the physical layer. Hence, the implementation must start from the physical layer to generate the input for the scheduler. Further study should therefore concentrate on the hardware implementation of the whole system, starting from the physical layer protocol. The results of the implemented EXP-BET algorithm could then be analysed and validated in terms of QoS requirements such as throughput, delay and packet loss rate.