Q-learning vertical handover scheme in two-tier LTE-A networks

Global mobile communication demands greater capacity and reliable quality assurance for services. To meet these requirements, long term evolution (LTE) network operators have deployed small cells intensively alongside the conventional base station structure to provide customers with better service and capacity coverage. Achieving seamless handover between the macrocell layer (first tier) and the femtocell layer (second tier) is one of the key challenges in attaining the QoS requirements. Because gathering handover-related information becomes very difficult in highly dense femtocell networks, effective handover decision techniques are essential to minimize unnecessary handovers and avoid the ping-pong effect. In this work, we propose and implement an efficient handover decision procedure based on user profiles, using the Q-learning technique in LTE-A macrocell-femtocell networks. New multi-criterion handover decision parameters are proposed for typical/dense femtocell deployments within macrocells to estimate the target cell for handover. The proposed handover algorithms are validated using the LTE-Sim simulator under an urban environment. The simulation results show a noteworthy reduction in the average number of handovers.


Q-LEARNING ALGORITHM
Q-learning is a machine learning technique in which an agent attempts to find an optimal policy from its history of interaction with a dynamic environment [18]. In Q-learning, the agent learns optimal actions through trial-and-error interaction with its surroundings. At each step, the agent picks an action that modifies the state of the environment via a transition, after which it receives a reward indicating how good or bad the action was. The agent's objective is to maximize the cumulative reward by computing the optimal policy and picking the best action for each state of the environment. In short, the goal of Q-learning is to learn a policy that tells the agent which action to take under which conditions.
Definition: In the Q-learning procedure, an agent attempts to discover the policy that maximizes the Q-value function, which gives the expected utility of choosing an action $a$ in an existing state $s$.
Formulation: The objective of a Q-learning process is to discover the optimal policy $\Pi_{opt}$ that maximizes the cumulative expected reward (over many trials) in the learning process ($N$ is the number of trials):

$\Pi_{opt} = \arg\max_{\Pi} \; E\left[\sum_{n=0}^{N} \gamma^{n} \, r_{t+n}\right]$ (1)

where $\gamma$ $(0 \le \gamma \le 1)$ represents a discount factor. At learning trial $t$, with action $a_t$ taken in state $s_t$, the received reward is represented as $r_t$. For $\gamma = 0$, upcoming rewards have no effect on the state value, whereas for $\gamma$ close to 1, upcoming rewards are considered as important as the immediate reward. A Q-function is defined for a given policy $\Pi$ as:

$Q^{\Pi}(s, a) = R(s, a) + \gamma \sum_{\upsilon} P_{s,\upsilon}(a) \, Q^{\Pi}(\upsilon, \Pi(\upsilon))$ (2)

where $R(s, a)$ is the expected reward of the current state-action pair, i.e., of action $a$ taken in state $s$; $P_{s,\upsilon}(a)$ is the probability of transition from the current state $s$ to the next state $\upsilon$ as an outcome of action $a$; and $Q^{\Pi}(\upsilon, \Pi(\upsilon))$ is the Q-function value of the next state-action pair.
To ensure that there is at least one optimal policy $\Pi^{*}$ in a single-agent environment, we apply Bellman's optimality principle [19]. The maximum Q-function value, which indicates the optimal action for every possible next state-action pair $(\upsilon, b)$, is denoted $Q^{*}(s, a)$:

$Q^{*}(s, a) = R(s, a) + \gamma \sum_{\upsilon} P_{s,\upsilon}(a) \, \max_{b} Q^{*}(\upsilon, b)$ (3)

In an iterative procedure, Q-learning determines the optimal $Q^{*}(s, a)$. At each stage during the learning procedure, the Q-value function is updated using (4):

$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{b} Q(\upsilon, b) - Q(s, a) \right]$ (4)

where $\alpha$ represents the learning rate.
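As an illustration only, not the paper's implementation, the following minimal Python sketch shows the tabular Q-learning update of (4). The state/action space sizes, hyperparameter values, and the placeholder environment function `env_step` are assumptions for demonstration:

```python
import numpy as np

# Minimal tabular Q-learning sketch of update rule (4).
# N_STATES, N_ACTIONS, and env_step() are illustrative placeholders.
N_STATES, N_ACTIONS = 10, 4
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))
rng = np.random.default_rng()

def env_step(state, action):
    """Hypothetical environment transition: returns (next_state, reward)."""
    next_state = int(rng.integers(N_STATES))
    reward = float(rng.random())
    return next_state, reward

state = 0
for trial in range(1000):
    # epsilon-greedy action selection
    if rng.random() < EPSILON:
        action = int(rng.integers(N_ACTIONS))
    else:
        action = int(np.argmax(Q[state]))
    next_state, reward = env_step(state, action)
    # Bellman update, equation (4): Q <- Q + alpha * (r + gamma * max_b Q(v, b) - Q)
    Q[state, action] += ALPHA * (reward + GAMMA * np.max(Q[next_state]) - Q[state, action])
    state = next_state
```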

RESEARCH METHOD
All parameters of the handover decision phase based on the Q-learning technique are defined as follows:
a. Environment: involves all components besides the agent. In our framework, it contains the macrocell eNB and all femtocell HeNBs in the UEeNB's neighboring cell list (NCL). We consider the environment to be a discrete-time, finite-state, stochastic dynamic system.
b. Agent: the decision maker. In our case, the agent is the macrocell mobile user UEeNB executing a handover process from its serving cell to another neighboring cell that provides better performance.
c. State: the environment's current state. In our framework, it involves the current UEeNB serving cell, which is initially the macrocell eNB. The state set $S$ is defined as $S = \{s_i \mid i = 1, 2, \ldots, N_{NCL} + 1\}$, where $N_{NCL}$ is the number of neighboring femtocells; $s_1$ refers to the initial state, in which the mobile user UEeNB is connected to the macrocell eNB. To select the target cell in a short time we short-list the neighboring femtocells; to optimize the candidate neighboring cell list we propose a Distance and moving Direction Q-learning based technique (D2Q technique). The UE direction assists the handover decision by avoiding signaling measurement controls with neighbor cells that are not ahead of the UE trajectory, as well as by selecting the neighbor cell that fits as the target cell. The distance between the UE and the target cell, which should not exceed the cell radius, is also important so that cells far away from the mobile user are not included in the candidate neighboring list.
The locations of neighbor cells and the position of each user equipment (UE) are determined using GPS [20]. $|\mp\theta_{th}°|$ is the angular range within which all nominee cells should be situated ahead of the UE's direction, and each cell located inside this zone has priority to be included in the candidate cell list [20]. Assume that a UE is moving from location $P_1$ to location $P_2$ as shown in Figure 2, and $P_3$ is the neighbor cell location. Every neighbor cell of the user equipment is tested by calculating the angle $\theta = \angle P_2 P_1 P_3$ as in (5):

$\theta = \arccos\left(\frac{(P_2 - P_1)\cdot(P_3 - P_1)}{\|P_2 - P_1\|\,\|P_3 - P_1\|}\right)$ (5)

where $P_1$, $P_2$ and $P_3$ are $P_1(x_1, y_1)$, $P_2(x_2, y_2)$ and $P_3(x_3, y_3)$, respectively. The distance between the user equipment and the neighbor cell is also applied; it should not exceed the neighbor cell radius, so that cells far away from the user equipment are not included in the candidate cell list [21-23]. The distance between the user equipment at position $P_2$ and the cell at location $P_3$ is calculated by (6):

$d_{P_3, P_2} = \sqrt{(x_3 - x_2)^2 + (y_3 - y_2)^2}$ (6)

For a UE moving from position $P_2$ towards a neighbor cell located at $P_3$, we consider the neighbor cell to be a candidate cell if $\theta \le |\mp\theta_{th}°|$ and $d_{P_3, P_2} \le d_{th}$ (the neighbor cell radius). A sketch of this candidate-list filtering, together with the target-cell selection described next, is given below.
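The following Python sketch is illustrative only, not a reproduction of the paper's Algorithms 1 and 2. The threshold values `THETA_TH` and `D_TH`, and the equal weighting of the normalized angle and distance in `select_target`, are assumptions:

```python
import math

THETA_TH = 25.0   # angle threshold |±θ_th| in degrees (value assumed)
D_TH = 30.0       # neighbor cell radius / transmission range d_th in meters (assumed)

def angle_deg(p1, p2, p3):
    """Angle θ = ∠P2P1P3 in degrees, per equation (5), from 2-D GPS coordinates."""
    v1 = (p2[0] - p1[0], p2[1] - p1[1])
    v2 = (p3[0] - p1[0], p3[1] - p1[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def distance(p_a, p_b):
    """Euclidean distance, per equation (6)."""
    return math.hypot(p_b[0] - p_a[0], p_b[1] - p_a[1])

def candidate_cells(p1, p2, neighbor_cells):
    """Filter the NCL: keep cells ahead of the UE trajectory and within range."""
    candidates = []
    for p3 in neighbor_cells:
        theta = angle_deg(p1, p2, p3)
        d = distance(p2, p3)
        if theta <= THETA_TH and d <= D_TH:
            candidates.append((p3, theta, d))
    return candidates

def select_target(candidates):
    """Weight-adjusted selection in the spirit of Algorithm 1: normalize the
    angle by θ_th and the distance by d_th (as explained next), then pick the
    candidate with the narrowest angle and shortest distance combined."""
    return min(candidates, key=lambda c: c[1] / THETA_TH + c[2] / D_TH)
```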
The next stage selects the target cell from the nominee candidate list by utilizing the Weight Adjustment algorithm [20], shown in Algorithm 1: in our work, the candidate with the shortest distance to the user equipment's current position and the narrowest $\theta$ is the most appropriate target cell. Normalization is implemented for both distance and angle so that both contribute on a common scale. The angle normalization uses $\theta_{th}$ as its bound, since all angles of the candidate cells are less than or equal to $|\mp\theta_{th}°|$.
The distance normalization uses the cell transmission range $d_{th}$, which enhances the priority of the angle value, since the distance of every cell in the nominee list is less than or equal to $d_{th}$. These methodologies for choosing the candidate cell list and selecting the target cell are illustrated in Algorithm 2.
d. Action: In our framework, the action refers to the handover decision results: the UEeNB may keep its connection with the serving macrocell eNB (action 1) or select one of the femtocell HeNBs from its NCL (action 2, ..., action $N_{NCL} + 1$). In our proposed algorithm, we use the ϵ-greedy technique with an adaptive ϵ scheme, introducing RSRQ-dependent exploration instead of a fixed or hand-tuned ϵ parameter (the RSRQ Q-learning based technique, or Q2 technique) [24, 25]. Unlike the traditional ϵ-greedy method, which uses a fixed ϵ, the Q2 technique makes the agent more explorative in circumstances where information about the environment is unclear. The Q2 technique is shown in Algorithm 3; ϵ is adapted as follows:
- if $RSRQ_{t-1} < RSRQ_t$, then ϵ = ϵ - Δϵ
- else ϵ = ϵ + Δϵ
A sketch of this adaptive scheme is given after the next item.
e. Reward: indicates the quality or goodness of action $a$ in state $s$; it is considered a utility function and denoted $R$. In our framework, the reward is the capacity earned after connecting to the target cell (eNB or HeNB). Our objective is to maintain and maximize the capacity of the UEeNB connecting to a new cell after a handover process (the Capacity Q-learning based technique, or CQ technique). Thus, if the UEeNB selects the macrocell eNB as its serving cell, the utility function $R$, the perceived reward (capacity) of the target cell, is given by (7); otherwise, if the UEeNB connects to one of the femtocell HeNBs in its NCL, $R$ is given by (8) [26, 27].
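As a minimal sketch of the Q2 adaptive ϵ-greedy scheme described above, not the paper's Algorithm 3 itself, the following Python fragment adjusts ϵ from the RSRQ trend. The step size `DELTA_EPS` and the clamping bounds are assumptions:

```python
import random

DELTA_EPS = 0.05          # Δϵ step size (assumed for illustration)
EPS_MIN, EPS_MAX = 0.01, 1.0

def update_epsilon(eps, rsrq_prev, rsrq_curr):
    """RSRQ-dependent exploration (Q2): exploit more when RSRQ improves,
    explore more when it degrades."""
    if rsrq_prev < rsrq_curr:
        eps -= DELTA_EPS   # RSRQ improved: reduce exploration
    else:
        eps += DELTA_EPS   # RSRQ degraded: increase exploration
    return min(EPS_MAX, max(EPS_MIN, eps))

def choose_action(q_row, eps):
    """ϵ-greedy over the current state's actions (stay on the eNB, or hand
    over to one of the N_NCL HeNBs in the candidate list)."""
    if random.random() < eps:
        return random.randrange(len(q_row))                    # explore
    return max(range(len(q_row)), key=q_row.__getitem__)       # exploit
```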
Let $P_M$ be the power transmitted by the macrocell eNB and $h_{M,k}$ the gain of the channel between the macrocell eNB and its serving $k$th macrocell user UEeNB. Similarly, $h_{i,j}$ represents the gain of the channel between the $i$th femtocell HeNB and the $j$th femtocell user UEHeNB, and $P_i$ represents the transmit power of the $i$th femtocell HeNB. Additive white Gaussian noise (AWGN) with power $\sigma^2$ is considered at the macrocell user UEeNB. The capacity of macrocell user UEeNB $k$ from its serving macrocell eNB is calculated by (7):

$C_k = B \log_2\left(1 + \frac{P_M |h_{M,k}|^2}{\sigma^2 + I_F}\right)$ (7)

where $B$ is the available bandwidth, $I_F = \sum_{i=1}^{F} P_i |h_{i,k}|^2$ is the interference from neighboring femtocell HeNBs, and $F$ is the number of neighboring femtocell HeNBs. We consider that the bandwidth is equally allocated to all users (UEeNB and UEHeNB). The capacity at femtocell user $j$ (UEHeNB) from femtocell HeNB $i$ is given by (8):

$C_j = B \log_2\left(1 + \frac{P_i |h_{i,j}|^2}{\sigma^2 + I_M + I_F'}\right)$ (8)

where $I_M = P_M |h_{M,j}|^2$ is the interference from the macrocell eNB and $h_{M,j}$ is the gain of the channel between the macrocell eNB and user $j$. Also, $I_F' = \sum_{l \ne i} P_l |h_{l,j}|^2$ is the interference from the other femtocell HeNBs, where $h_{l,j}$ is the gain of the channel between HeNB $l$, transmitting with power $P_l$, and user $j$.
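A minimal sketch of the reward computation, assuming the Shannon-capacity forms of (7) and (8) reconstructed above; the function names and argument layout are illustrative, not the paper's implementation:

```python
import math

def macro_capacity(bandwidth, p_m, h_mk, sigma2, femto_powers, femto_gains):
    """Equation (7): capacity of macrocell user k served by the eNB.
    I_F = sum_i P_i * |h_{i,k}|^2 is the interference from neighboring HeNBs."""
    i_femto = sum(p * abs(h) ** 2 for p, h in zip(femto_powers, femto_gains))
    sinr = p_m * abs(h_mk) ** 2 / (sigma2 + i_femto)
    return bandwidth * math.log2(1 + sinr)

def femto_capacity(bandwidth, p_i, h_ij, sigma2, p_m, h_mj,
                   other_powers, other_gains):
    """Equation (8): capacity of femtocell user j served by HeNB i.
    I_M = P_M * |h_{M,j}|^2 is the macrocell interference;
    I_F' = sum_{l != i} P_l * |h_{l,j}|^2 is the other-femtocell interference."""
    i_macro = p_m * abs(h_mj) ** 2
    i_femto = sum(p * abs(h) ** 2 for p, h in zip(other_powers, other_gains))
    sinr = p_i * abs(h_ij) ** 2 / (sigma2 + i_macro + i_femto)
    return bandwidth * math.log2(1 + sinr)
```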

RESULTS AND DISCUSSION
The LTE-Sim simulator [28] is used to evaluate the performance of the proposed algorithm in terms of the number of handovers, compared to the algorithm introduced by Suman [17]. The topology consists of two macrocells (eNBs) with a radius of 1 km each and varying femtocell (HeNB) densities; the number of femtocells is configured as 30, 50, 70 and 90 in each macrocell, and all femtocells use the open access type to allow the user equipment (UE) to hand over to any femtocell. Each femtocell has a radius of 30 meters. The number of UEs is configured as 15, 30, 45 and 60. The UEs are distributed randomly in each macrocell coverage area, and each UE starts moving from the center of its serving eNB following a random mobility model.
The handover decision in the proposed topology covers three vertical handover types, hand-in, hand-between and hand-out, based on the availability of each type. Each femtocell is randomly located between 50 meters and 1000 meters from the macrocell location in three separate placement scenarios: close, middle and at the edge. In each scenario, the femtocells are distributed in four different group sizes: 30, 50, 70 and 90. Figure 3 presents the average number of handovers for the proposed algorithm in each scenario for 30 UEs. As shown in Figure 3, the relationship between the average number of handovers and femtocell density is positive: the average number of handovers increases as the femtocell density increases, in all distribution scenarios. The average is lowest when the femtocells are distributed at the edge, because the mobile users start moving from the location of the macrocell tower.
Furthermore, the average number of handovers for the proposed algorithm and the Suman handover algorithm was compared for femtocells distributed in groups of 30, 50, 70 and 90 per macrocell, and for two groups of UEs (15 and 30), as presented in Figure 4. The results show that as the number of femtocells increases, both algorithms exhibit an increase in the average number of handovers, because mobile users make additional handovers as they move within each mobile user group.
The results emphasize that the best performance was achieved by our algorithm across all femtocell distributions and densities. This is due to the Q-learning methodology, which allows the mobile user to learn from its previous history, together with the supporting methodologies that allow the mobile user to connect not merely to nearby femtocells, but to those located in front of, or approximately ahead of, the mobile user's current trajectory, thereby avoiding redundant handovers.
The user equipment only nominates a femtocell whose tower lies within $|\pm 25°|$ of the UE's moving direction and whose distance from the UE is less than or equal to 28 meters. By contrast, in the Suman handover decision, the handover procedure is triggered whenever the RSS between the UE and a neighbor femtocell is higher than the RSS between the UE and its serving cell, without any consideration of how long the target femtocell will serve the UE or whether the handover is useful at all. Finally, regarding the total average number of handovers for each UE group, the proposed algorithm reduces it by 55.63% compared to the Suman handover algorithm across all femtocell densities when the number of UEs is 15, and by 41.74% when the number of UEs is 30.

CONCLUSION AND FUTURE WORK
The simulation results show that the proposed algorithm performs well in enhancing the handover decision in LTE-A networks. The proposed algorithm was examined for femtocells of the open access type in order to enhance target femtocell selection in the vertical handover decision. The selection of suitable parameters to improve the handover decision remains a wide research area. Therefore, the recommendations for further research in this field are as follows. First, investigate additional user performance parameters in light of handover and load balancing over horizontal and vertical networks. Second, investigate these parameters on the other femtocell access types: closed and hybrid. Finally, regarding implementation, UE velocity should be taken into account as a main input to the handover decision. Thus, by monitoring the three main behaviors at the UE, namely mobility, acceleration, and deceleration (as in frequent lane changes), the suitability of the proposed algorithm for the UE's behavior can be ensured.