Smoothing-aided long-short term memory neural network-based LTE network traffic forecasting

ABSTRACT


INTRODUCTION
Internet traffic has grown tremendously in the past decade due to vast network deployments in various domains and new emerging technologies with application-centric services. This growth was primarily driven by the surge in mobile traffic resulting from the substantial increase in mobile users and the emergence of new data-greedy technologies such as 5G, cloud services, the internet of things (IoT), and artificial intelligence (AI) based applications. Consequently, this has created a scalability issue for the service providers because network elements and resources must scale up spontaneously and dynamically to cope with the rising network resource demands.
Moreover, network management has become a complex task due to strong dependencies between the different service layers at which congestion may develop and spread horizontally and vertically. Network congestion caused by poor management will eventually affect the quality of service (QoS) and service levels. Therefore, proactive approaches to bandwidth and network resource management have become crucial. Such approaches necessarily entail forecasting future network demands and planning accordingly to ensure a dynamic and timely response. Hence, the accuracy of predictive approaches has become a significant factor in applying predictive frameworks in production environments.
Preprocessing has also become crucial in data science, signal processing, and machine learning (ML) because of the self-similarity, strong long-range dependence, and burstiness of network traffic. Data preprocessing is also important for addressing incomplete and inconsistent data (containing errors or outlier values) and the varying noise patterns embedded in collected data, which eventually leave service providers incapable of satisfying the minimum QoS requirements of dynamic bandwidth allocation. Therefore, preprocessing is required before applying network forecasting techniques to enhance data quality [1]-[5].
The significance of noise processing or removal has been addressed previously in [6]-[8], where the authors decomposed time series using techniques such as Loess, the Hilbert-Huang transform (HHT), and wavelet processing to eliminate data fluctuation, seasonality, and noise components. However, noise removal must be performed carefully after assessing the statistical properties of the data under investigation because the process can eliminate a significant portion of the data. Generally, two broad approaches have been used for developing bandwidth forecasting algorithms: i) statistical analysis models and ii) supervised ML models. Statistical analysis models are based on the generalized autoregressive integrated moving average (ARIMA) model, while most traffic forecasting models are based on supervised ML, in particular, artificial neural networks (ANNs) [1], [2]. Unlike ANNs, ARIMA-based models fall short when dealing with nonlinear and non-stationary data because ARIMA requires stationarity to be imposed [6]-[16]. This paper presents the results of a study on the effect of combining a long short-term memory (LSTM) neural network with local smoothing techniques.

RELATED WORK
The literature review reveals that several models have been applied for time series analysis and bandwidth forecasting. The study in [17] analyzed the performance of various machine learning techniques, such as neural networks, decision trees, and support vector machines (SVM), for forecasting the performance of video over the internet. It concluded that time series modeling is more suitable and produces more stable results. The study also revealed that ANN outperformed the other machine learning techniques used as benchmarks.
Alawe et al. [18] proposed a novel mechanism to scale out the access management functions (AMF) in a 5G virtualized environment. The mechanism, which is based on forecasting mobile traffic using LSTM neural network to estimate the user attach request rate, makes it possible to predict the exact number of AMF instances required to process the upcoming user traffic. By being proactive, the proposed solution allows the deployment latency, which may degrade network performance, to be avoided when scaling up resources. Simulation results confirmed the efficiency of the LSTM-based solution compared with a threshold-based solution. The proposed approach applied the LSTM directly on the request rate data without any preprocessing, which may lead to forecast accuracy degradation as discussed and demonstrated in [7], [8].
Dyllon et al. [19] developed a nonlinear autoregressive exogenous (NARX) neural network model for time series network traffic analysis. The study implemented a neural network model to predict the future trends of the London South Bank University (LSBU) bandwidth data traffic. The dataset was collected using the Paessler Router Traffic Grapher (PRTG) tool. The results showed that the NARX neural network is a good method for predicting time series data.
In another study, Dalgkitsis et al. [20] compared LSTM performance for 4G traffic forecast against seasonal ARIMA (SARIMA) and support vector regression (SVR). The dataset was collected for 122 days and divided into two subsets: training and testing. The study found that LSTM performance was superior to SARIMA and SVR.
A deep traffic predictor (DeepTP) model to forecast long-period cellular network traffic was proposed in [21]. The results showed that the DeepTP model outperformed other traffic forecast models by more than 12.3%. The proposed approach applied a neural network directly to the dataset without any preprocessing. This may lead to forecast performance degradation due to data inconsistency, burstiness of the network traffic, and noise fluctuations, affecting QoS, network management, and security. In this study, we extend the reported research by studying the effects of the hybrid LSTM neural network and local smoothing techniques.
To resolve noise issues through preprocessing before applying traffic and other time series analysis, Yoo and Sim [6] proposed a hybrid Loess-ARIMA-based forecast model. The authors claimed that such a model has the potential to enhance the efficiency of resource utilization, especially in high-speed networks, to accommodate the rapidly rising demands of scientific data applications. Seasonal decomposition of time series by Loess (STL) and ARIMA were applied to simple network management protocol (SNMP) data. The results revealed that the proposed forecast model was resilient against abrupt changes in network usage provided that the multistep forecast was used as the primary scenario. Afolabi et al. [7] discussed the significance of the interference-less machine learning approach in time series forecasting as a crucial component of prediction performance, especially when forecasting multiple steps ahead, and used HHT as the noise elimination technique. The results were then compared with conventional and state-of-the-art approaches. Joo and Kim [8] discussed a wavelet-based prediction method that analyzes the time series in the time and frequency domains. The study presented several scenarios, and the results showed that the proposed method outperformed the other approaches. However, based on our analysis, these hybrid models are static and do not react to the dynamic nature of traffic loads because the underlying functions work on local scales, which makes it difficult to capture rapidly varying noise fluctuations. Moreover, some techniques such as the wavelet transform can aggressively eliminate parts of the original data if not implemented carefully.

METHOD
The proposed research method is depicted in Figure 1. The data were collected from a premier internet service provider (ISP) and represent an aggregated LTE (4G) bandwidth slice. The dataset is divided into two subsets: the first was used to train the forecasting models, while the second was used for testing. An LSTM neural network was used as the forecasting technique because of its superiority in finding correlations between current and previous states due to its unique cell-based structure, unlike other shallow or deep multilayer perceptron networks. Therefore, it is considered one of the most suitable candidates for time series forecasting, although noise smoothing is not part of its feature extraction capability. The bandwidth slice, as time series data, contains an inherent temporal correlation (non-zero temporal autocorrelation), and LSTM is well suited to learning temporal dependencies in sequential data [22]. The effectiveness of the LSTM neural network for resource forecasting in communication networks has already been analyzed in [18], [20]-[23]. In this work, two forecast time scales were used: one day and one week.
A hybrid forecasting model that combines the LSTM neural network with various local smoothing techniques is used to enhance forecast accuracy. Local smoothing techniques remove noise and fluctuations at short scales. Compared with wavelet-based techniques, local smoothing techniques react more dynamically to noise levels and short-term variations because flexible window sizes can be applied throughout the dataset. A similar approach was used in [24], where Li et al. investigated the superior nonlinear approximation capacity of SVM compared with "classical" local smoothing techniques such as the moving average, Gaussian smoothing, and the Savitzky-Golay filter. Their results showed that the proposed model outperformed the state-of-the-art model, i.e., logistic regression, and the effectiveness of the method was verified on available real network traffic datasets.
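The data preparation described in this section can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the helper names `chronological_split` and `make_windows`, the `lookback` and `horizon` parameters, and the default 50% training fraction (taken from the equal split reported in the results section) are assumptions introduced here. The split is chronological so that the testing subset always follows the training subset in time.

```python
def chronological_split(series, train_fraction=0.5):
    """Split a series into training and testing parts without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def make_windows(series, lookback, horizon):
    """Build (input window, target window) pairs for multi-step forecasting."""
    pairs = []
    for start in range(len(series) - lookback - horizon + 1):
        inputs = series[start:start + lookback]
        targets = series[start + lookback:start + lookback + horizon]
        pairs.append((inputs, targets))
    return pairs


if __name__ == "__main__":
    # Placeholder values only; these are not the LTE measurements of the study.
    traffic = [float(v) for v in range(20)]
    train, test = chronological_split(traffic)
    windows = make_windows(train, lookback=4, horizon=2)
    print(len(train), len(test), len(windows))
```

In a hybrid setting, the smoothing step would be applied to the series before `make_windows` is called, so that the forecaster is trained on the smoothed signal.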

Moving average
Moving averages are usually calculated to identify trends. As discussed in the related work section above, ARIMA has been used extensively in time series and network traffic forecasting. The moving average is a type of real-time filter that removes high frequencies from data. In signal processing, moving averages are also called "low pass filters" [25]. These filters have coefficients that are equal to the reciprocal of the span or bandwidth.
The moving average is closely related to "exponential smoothing", which replaces the equal weights with exponentially decaying ones. Let $x_i$ be the throughput at time $i$, and let $X = \{x_i\}$, $i = 1, \ldots, p$, be the time series, where $p$ is the time series length. The moving average of period $q$ at time $l$ can then be calculated as depicted in (1) [25]. The computational complexity follows from the form of (1).
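As an illustration, a common trailing form of the moving average of period $q$ at time $l$ is $\frac{1}{q}\sum_{i=l-q+1}^{l} x_i$, in which every sample in the window receives the weight $1/q$ (the reciprocal of the span). The sketch below implements this form; whether the study used a trailing or a centered window is not stated, so the trailing choice and the function name `moving_average` are assumptions. A running sum keeps the cost linear in the series length.

```python
def moving_average(series, q):
    """Trailing moving average of period q; returns len(series) - q + 1 values."""
    if q < 1 or q > len(series):
        raise ValueError("q must be between 1 and the series length")
    window_sum = sum(series[:q])
    smoothed = [window_sum / q]
    for l in range(q, len(series)):
        # Slide the window: add the newest sample and drop the oldest one.
        window_sum += series[l] - series[l - q]
        smoothed.append(window_sum / q)
    return smoothed
```

For example, `moving_average([1, 2, 3, 4, 5], 3)` averages the windows (1, 2, 3), (2, 3, 4), and (3, 4, 5).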

Local regression technique (Loess)
The Loess method [26] is based on fitting simple models to localized data subsets to form a curve that approximates the original data. Loess can be used as the sole forecasting method for traffic modeling, as in [27], or as a preprocessing technique for seasonal decomposition in time series forecasting with ARIMA [13]. However, it is not suitable for long-range forecasting because of the dilemma of selecting an optimal window size, as explained below. The observations $(x_i, y_i)$ are assigned neighborhood weights using the tricube weight function shown in (2). Let $\Delta_i(x) = |x_i - x|$ be the distance from $x$ to $x_i$, and let $\Delta_{(i)}(x)$ be these distances ordered from smallest to largest. Then, the neighborhood weight for the observation $(x_i, y_i)$ is defined by the function $w_i(x)$:
$$w_i(x) = T\left(\frac{\Delta_i(x)}{\Delta_{(q)}(x)}\right), \qquad T(u) = \begin{cases} (1 - u^3)^3, & 0 \le u < 1 \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
for $x_i$ such that $\Delta_i(x) < \Delta_{(q)}(x)$, where $q$ is called the bandwidth and represents the number of observations in the subset of data localized around $x$. In the proposed algorithm, this approach is applied to fit a trend polynomial to the last $k$ observations of the resource utilization. Accordingly, for each new observation, a new trend line $\hat{g}(x) = \hat{a} + \hat{b}x$ is found. This trend line is used to estimate the next observation $\hat{g}(x_{k+1})$. The new observation can be considered as the bandwidth slice utilization [27].
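The local regression step can be sketched as follows, assuming the standard tricube function $T(u) = (1 - u^3)^3$ for $0 \le u < 1$ and a first-degree (linear) local fit. The function names, the choice of centering the weights on the most recent observation, and the use of all $k$ recent points as the neighborhood ($q = k$) are assumptions made for illustration, not details confirmed by the study.

```python
def tricube(u):
    """Tricube weight T(u) = (1 - u^3)^3 on [0, 1), zero elsewhere."""
    return (1.0 - u ** 3) ** 3 if 0.0 <= u < 1.0 else 0.0


def loess_weights(xs, x, q):
    """Neighborhood weights around x, scaled by the q-th smallest distance."""
    distances = [abs(xi - x) for xi in xs]
    scale = sorted(distances)[q - 1]
    if scale == 0.0:
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    return [tricube(d / scale) for d in distances]


def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predict_next(ys, k):
    """Fit a local trend line to the last k observations and extrapolate one step."""
    if k < 3 or k > len(ys):
        raise ValueError("k must be at least 3 and at most the series length")
    recent = ys[-k:]
    xs = list(range(1, k + 1))
    # Assumption: weights are centered on the newest point x_k, with q = k.
    ws = loess_weights(xs, x=k, q=k)
    a, b = weighted_line_fit(xs, recent, ws)
    return a + b * (k + 1)
```

Because the farthest of the $k$ points receives a weight of zero, at least three observations are required for the line to be determined.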

Smoothing windows selection
The smoothing window in the smoothing process is represented by $q$. If $q$ is too small, insufficient data will fall within the smoothing window and, as a result, a noisy fit will be produced. On the other hand, if $q$ is too large, a substantial amount of data will be eliminated. Therefore, $q$ is selected to produce the least mean squared error (MSE). For the moving average, $q$ is found to be 0.003, and the MSE is 2.4155e+07. For Loess, $q$ is found to be 0.002, and the MSE is 6.4096e+04. For comparison purposes, wavelet decomposition was also performed at level one, and its MSE was found to be 2.155e+07. Table 1 illustrates the steps of the local smoothing process. The computational complexity of the algorithm is $O(n^2)$, while the computational complexity of Loess and moving average is $O(n)$.
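The following sketch shows how the smoothing MSE could be tabulated for several candidate windows. It only illustrates the bookkeeping: the conversion of a fractional span into an odd window of at least three samples, the truncation of the window at the series edges, and all function names are assumptions, and the rule by which the final $q$ was chosen from such a table is not reproduced here.

```python
def span_to_window(span, n):
    """Convert a fractional span into an odd window length (assumed rule)."""
    window = max(3, int(round(span * n)))
    return window if window % 2 == 1 else window + 1


def centered_moving_average(series, window):
    """Same-length moving average; the window is truncated near the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def smoothing_mse(series, smoothed):
    """Mean squared difference between the original and the smoothed series."""
    return sum((s - o) ** 2 for s, o in zip(smoothed, series)) / len(series)


def mse_by_span(series, spans):
    """Return a dict mapping each candidate span to its smoothing MSE."""
    results = {}
    for span in spans:
        window = span_to_window(span, len(series))
        results[span] = smoothing_mse(series, centered_moving_average(series, window))
    return results
```

A constant series yields an MSE of zero for every span, whereas a rapidly alternating series yields a positive MSE because smoothing removes part of the signal.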

LONG-SHORT TERM MEMORY (LSTM)
LSTM is a neural network with modified structural components and is composed of chained units called cells, which are the most basic units of LSTM [28], [29]. Each cell is made up of three gates, namely an input gate, an output gate, and a forget gate. The input gate saves or memorizes the current state, the output gate produces the output, and the forget gate dismisses some information from the past. The gates are related through sigmoid, dot product, and tanh functions. Figure 2 depicts the architecture of an LSTM cell and the functional relationships between the different gates. In Figure 2, $f_t$ represents the forget gate, $i_t$ represents the input gate, and $o_t$ is the output gate; $x_t$ and $h_t$ represent the input and the output, respectively, while $W$ and $b$ are the weights and bias, respectively. Similar to the approach used in [23], hyperparameter selection was made by grid search.
To enhance forecast accuracy, the hybrid LSTM and local smoothing forecast model is proposed. The mathematical representation of the hybrid LSTM and moving average model is given in (3) and (4).
The mathematical expressions of the hybrid LSTM and Loess model may be represented as (5) and (6).
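To make the gate relationships concrete, the sketch below performs one forward step of a standard LSTM cell using only the Python standard library. It follows the widely used formulation (sigmoid gates, a tanh candidate state, and element-wise products) and is not the hybrid model of (3) to (6); the parameter layout, the key names `'f'`, `'i'`, `'o'`, `'g'`, and the helper names are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vector):
    """Compute W @ vector + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps 'f', 'i', 'o', 'g' to (W, b) pairs."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(v) for v in affine(*params["f"], z)]    # forget gate
    i_t = [sigmoid(v) for v in affine(*params["i"], z)]    # input gate
    o_t = [sigmoid(v) for v in affine(*params["o"], z)]    # output gate
    g_t = [math.tanh(v) for v in affine(*params["g"], z)]  # candidate state
    # New cell state: keep part of the old state and add gated new information.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # New hidden state (the cell output).
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

Each weight matrix has one row per hidden unit and one column per element of the concatenated vector, so a cell with two hidden units and a scalar input uses 2 x 3 matrices.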

RESULTS AND DISCUSSION
To enhance the time series forecast models, the augmented Dickey-Fuller (ADF) test [31] was used to check the stationarity of the time series, while LSTM was used to model the non-stationary time series, as mentioned in the method section. Previous studies have recommended examining the stationarity of regression models because non-stationarity could lead to misleading results [32]. A real-time live trace was used as the dataset and modeled as a time series problem. Loess and moving average were applied successfully as preprocessing techniques with the minimum smoothing window $q$. The higher the value of $q$, the stronger the smoothing and the larger the amount of data loss. In today's data-centric world, losing even small amounts of data could lead to QoS degradation. The dataset was divided equally into two subsets, 50% for training and 50% for testing with cross-validation, and Adam was used as the learning optimizer. Figures 3(a) to 3(d) show the performance comparison of the hybrid LSTM-moving average, LSTM-Loess, and LSTM-wavelet models for the 50-time-step-ahead and 300-time-step-ahead forecasts, with LSTM-wavelet used as a benchmark. In addition, to validate the effectiveness of our hybrid model, well-known hybrid techniques such as SARIMA with Loess, moving average, and wavelet were used as benchmarks as well. The results show that the hybrid LSTM techniques outperform the non-hybrid techniques. The performance-improvement percentages were 79%, 71%, and 70% for the 50-time-step-ahead forecast and 76%, 80%, and 77% for the 300-time-step-ahead forecast. Overall, it is evident that the moving average-based hybrid forecast technique outperforms the other techniques due to the high smoothing capability that the moving average can provide.
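A performance comparison of this kind can be computed as sketched below. The exact definition of the performance-improvement percentage is not given in the text; the sketch assumes it is the relative reduction in RMSE with respect to a benchmark, and the function names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def improvement_percent(rmse_benchmark, rmse_model):
    """Relative RMSE reduction of the model over the benchmark, in percent."""
    return 100.0 * (rmse_benchmark - rmse_model) / rmse_benchmark
```

Under this assumed definition, a model whose RMSE is one quarter of the benchmark RMSE corresponds to a 75% improvement.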
Taking into consideration that Algorithm 1 is used to minimize unnecessary data loss, the moving average-based hybrid forecast technique produced the highest smoothing MSE of 2.4155e+07, compared with 6.4096e+04 for Loess and 2.155e+07 for wavelet smoothing. An explanation for this is that the smoothing window $q$ in the moving average has the greatest impact on the smoothing process, as depicted in (1), where $q$ is constant for all $x_i$. This can be contrasted with Loess, where $q$ is embedded in the tricube weighting function, which reduces its direct impact on the smoothing process; as a result, less smoothing is applied, as depicted in (2). In contrast, the wavelet-based hybrid smoothing technique is influenced more by the selected mother wavelet coefficients than by the smoothing window.
The Diebold-Mariano test [33] was then applied to check the statistical significance of the obtained results. The findings support the superiority of the LSTM hybrid techniques: based on the RMSE results, the LSTM hybrid techniques are better than, and statistically different from, the SARIMA hybrid techniques. These findings coincide with the previous findings of [23], except that SVM was used in that study. Moreover, these results are considered an enhancement of the previous work in [6], [8], [18]. Figures 4(a) and 4(b) show the LTE bandwidth forecast using LSTM-Loess for 50 time steps ahead and 300 time steps ahead. As can be seen from Figure 4, our model is consistent with the actual series. The results confirm the hybrid LSTM's capability to capture seasonality and data fluctuations with minimal data loss (minimum MSE) arising from the smoothing process.
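A basic form of the Diebold-Mariano statistic can be sketched as follows. The loss function and the handling of the forecast horizon used in the study are not stated; this sketch assumes squared-error loss (consistent with the RMSE comparison), a long-run variance built from autocovariances up to lag `horizon - 1`, and a standard normal reference distribution. The function name and signature are hypothetical.

```python
import math


def diebold_mariano(errors_a, errors_b, horizon=1):
    """Return the DM statistic and two-sided p-value under squared-error loss."""
    # Loss differential between the two competing forecasts.
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n

    def autocovariance(lag):
        return sum((d[t] - mean_d) * (d[t - lag] - mean_d)
                   for t in range(lag, n)) / n

    # Long-run variance estimate using autocovariances up to lag horizon - 1.
    long_run_var = autocovariance(0) + 2.0 * sum(
        autocovariance(k) for k in range(1, horizon))
    if long_run_var <= 0.0:
        raise ValueError("non-positive long-run variance estimate")
    statistic = mean_d / math.sqrt(long_run_var / n)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(statistic) / math.sqrt(2.0))
    return statistic, p_value
```

A negative statistic indicates that the first forecast has the smaller average squared error, and a small p-value indicates that the difference in accuracy is statistically significant.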

CONCLUSION
In this paper, hybrid local smoothing and LSTM modeling approaches were used to forecast LTE bandwidth slice utilization. Three smoothing techniques, namely moving average, Loess, and wavelet, were investigated in this study. The results reveal that the hybrid LSTM and moving average model showed better forecast performance in terms of RMSE. The significant performance advantages are attributed mainly to the smoothing mechanism applied before forecasting, which is controlled by the time window and can alleviate the high variability of local noise patterns. The results were verified by statistical significance tests and by a comparison with other similar state-of-the-art approaches. We believe that the proposed method can be used for slice traffic forecasting in 4G/5G slice resource management. Further, this work can be extended and applied in an automatic resource allocation algorithm as part of a slice allocator or orchestrator in 5G and beyond.

Mohammad Rava is a researcher and software engineer specializing in machine learning and metaheuristic algorithms. He completed his PhD in Software Engineering at Universiti Teknologi Malaysia. His current research interests are machine learning solutions, metaheuristic algorithms, and pattern recognition and analysis. He can be contacted at email: morava.ir@gmail.com.