A comprehensive review of hybrid network traffic prediction models

ABSTRACT


INTRODUCTION
With the internet driving the rapid development of society, computer networks have become an important technical foundation of the information society. To guarantee the quality of network services such as video conferencing and online gaming, networks must carry ever-increasing volumes of traffic [1,2]. Because of the large volume of traffic flowing into and out of networks, data is leaked or disclosed every day [3][4][5]. It is hard to detect abnormalities and to propose preventive remedies in advance that minimize security risks [6,7]. Consequently, network failures caused by potential malicious intrusions and virus invasions have become a serious concern for network management and monitoring teams [8][9][10].
Network traffic is an important parameter for evaluating the running state of a network. It is a nonlinear time series [11] characterized by time variability, long-term correlation, self-similarity, burstiness and chaos [12]. A more accurate and faster-responding traffic prediction model is therefore needed to keep networks safe and healthy. According to Joshi [13], network traffic prediction is a reliable method for securing network communication in a network management and monitoring system. It analyses the characteristics of past and present traffic, derives the rules governing its internal structure, and then constructs a model to predict the characteristics and trends of future traffic.

A hybrid model was first proposed by Bates and Granger, who integrated the merits of individual models [14]. Dickinson later confirmed that the error variance of a hybrid prediction model is lower than that of any of the single models [15]. Yang et al. similarly contended that the prediction error of a hybrid model is lower than that of a single nonlinear model, which makes hybrid models a feasible approach to network traffic prediction [16]. Since these early works, a considerable body of literature has been published on hybrid network traffic prediction models. These studies generally apply optimization and decomposition techniques within the hybrid model to improve prediction accuracy.
In this paper, recent hybrid models are comprehensively reviewed. The paper is structured as follows. Section 2 reviews the application of optimization techniques in hybrid models, Section 3 investigates the use of decomposition techniques in hybrid models, and the final section concludes with a summary.
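The variance argument behind hybrid forecasts can be illustrated numerically. The sketch below assumes two unbiased forecasts with synthetic, hypothetical errors and computes the weight w that minimizes the sample variance of the combined error w*e1 + (1 - w)*e2, in the spirit of Bates and Granger's forecast combination. The function names and data are illustrative, not taken from the cited works.

```python
import numpy as np

def combination_weight(e1, e2):
    """Weight on forecast 1 that minimizes the variance of w*e1 + (1 - w)*e2."""
    cov = np.cov(np.asarray(e1, float), np.asarray(e2, float))  # 2x2 sample covariance
    s11, s22, s12 = cov[0, 0], cov[1, 1], cov[0, 1]
    return (s22 - s12) / (s11 + s22 - 2.0 * s12)

def combine(f1, f2, w):
    """Linear combination of two forecasts."""
    return w * np.asarray(f1, float) + (1.0 - w) * np.asarray(f2, float)

# Synthetic example (hypothetical data): two noisy forecasts of the same series.
rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0.0, 20.0, 500))
e1 = rng.normal(0.0, 0.3, truth.size)   # errors of "model 1"
e2 = rng.normal(0.0, 0.5, truth.size)   # errors of "model 2"
f1, f2 = truth + e1, truth + e2

w = combination_weight(e1, e2)
combined = combine(f1, f2, w)
var1, var2 = np.var(f1 - truth), np.var(f2 - truth)
var_c = np.var(combined - truth)
print(f"w={w:.3f} var1={var1:.4f} var2={var2:.4f} combined={var_c:.4f}")
```

Since w = 0 and w = 1 are both admissible, the variance-minimizing weight cannot do worse in-sample than the better single forecast; out of sample, w has to be estimated from past errors.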

OPTIMIZATION TECHNIQUES-BASED HYBRID MODEL
To better describe traffic characteristics and improve the accuracy of prediction models, researchers have recently combined various methods into a single prediction model and optimized it with techniques such as particle swarm optimization (PSO) [17] and quantum genetic (QG) algorithms [18,19], among others.

PSO-based hybrid model
Randomly determining the input weights and hidden biases [20] of an extreme learning machine (ELM) can lead to an ill-conditioned problem and low prediction accuracy. Fei Han chose the PSO algorithm for its simple principle and proposed the APSO-ELM hybrid model to address these drawbacks of ELM. He used an adaptive PSO algorithm to select the input weights and hidden biases, and then used the Moore-Penrose (MP) generalized inverse to determine the output weights of ELM analytically. To obtain optimal ELM parameters, the improved PSO optimizes the input weights and hidden biases so that the model attains both an optimal root mean square error (RMSE) and an optimal output-weight norm. This directly addresses the problems caused by the "randomness" of ELM and consequently improves the accuracy of the prediction model. Note, however, that the work focuses only on the ELM parameters that affect model accuracy and ignores PSO's tendency to converge to local optima [21].
PSO, with its simple principle and few parameters, can shorten the training time of a neural network and thus speed up the convergence of the model. Based on this idea, Yi Yang combined three algorithms into a new hybrid method, SPLSSVM. She used PSO to optimize the two parameters of the least squares support vector machine (LSSVM) and, by combining seasonal adjustment (SA) with LSSVM, reduced the seasonal interference on the traffic components when predicting network traffic [17]. However, the hybrid model considers only the seasonal characteristics of traffic and ignores PSO's local optimum problem.
Following the same design idea, Weijie Zhang also used PSO to optimize the structural parameters of an RBF neural network. By adjusting the inertia weight and learning factors to improve the global search ability, he kept PSO from becoming trapped in local optima. He then optimized four parameters of the RBF network to improve the accuracy of the prediction model [22]. Unfortunately, the RBF network has many parameters to optimize in the search for a global optimum, which enlarges the computational scale, lengthens training time and slows convergence.
Because PSO has a simple principle but is prone to local optima, He et al. (2016) introduced a quantum NOT gate to implement the mutation operation, using particle flight-path information to dynamically update the state of the quantum bits and thereby avoid PSO's local optima. They then used this improved PSO (IPSO) to optimize the weights, widths and center positions of a radial basis function (RBF) network, yielding a self-adaptive PSO-RBF hybrid model. With this model, network traffic data with nonlinear characteristics can be predicted and the difficulty of prediction is significantly reduced [23]. Although this addresses the local optimum problem of PSO, it neglects the computational complexity of the quantum algorithm. Because the global optimum remains difficult to approach, the prediction accuracy is affected.
As discussed above, these authors used different methods to address PSO's local optimum problem. However, they all overlooked the fact that the RBF network has too many parameters to optimize and that training takes too long to approximate the optimal solution. In effect, the problem of reaching the optimal solution with PSO was not fully solved, which affects both convergence speed and accuracy.
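A basic ELM can be sketched in a few lines of NumPy, assuming a tanh activation and a toy regression target. It shows the two steps described above: randomly drawn input weights and hidden biases (the values APSO-ELM would instead search with adaptive PSO), and output weights computed with the Moore-Penrose pseudo-inverse. The RMSE and output-weight norm it reports are the two quantities APSO-ELM is described as optimizing; the PSO search itself is omitted, and the data are hypothetical.

```python
import numpy as np

def elm_fit(X, y, n_hidden, rng):
    """Train an ELM: random hidden layer, analytic output weights."""
    # Random input weights and hidden biases: the "randomness" APSO-ELM targets.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)            # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y      # output weights via Moore-Penrose inverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy data (hypothetical): a smooth nonlinear function of three inputs.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(X).sum(axis=1)

W, b, beta = elm_fit(X, y, n_hidden=40, rng=rng)
pred = elm_predict(X, W, b, beta)
rmse = np.sqrt(np.mean((pred - y) ** 2))
norm = np.linalg.norm(beta)           # output-weight norm
print(f"training RMSE={rmse:.5f}, ||beta||={norm:.2f}")
```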
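The two LSSVM parameters are not named above; assuming the common choice of a Gaussian (RBF) kernel, they are the regularization constant gamma and the kernel width sigma. The sketch below solves the standard LSSVM regression dual as a single linear system for fixed values of both; in an SPLSSVM-style hybrid, an outer PSO loop would search over (gamma, sigma) to minimize validation error. The lag length and data are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma, sigma):
    """Solve the LSSVM regression dual system
         [0  1^T        ] [b    ]   [0]
         [1  K + I/gamma] [alpha] = [y]."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]            # bias b, dual coefficients alpha

def lssvm_predict(X_train, alpha, b, X_new, sigma):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Hypothetical usage: one-step-ahead prediction from 4 lagged values.
series = np.sin(0.2 * np.arange(120))
lags = 4
X = np.stack([series[i:i + lags] for i in range(series.size - lags)])
y = series[lags:]
gamma, sigma = 100.0, 1.0             # the two hyperparameters a PSO search would tune
b, alpha = lssvm_fit(X, y, gamma, sigma)
pred = lssvm_predict(X, alpha, b, X, sigma)
```

The first row of the system enforces sum(alpha) = 0, and the remaining rows imply that each training residual equals alpha_i / gamma, which is why gamma trades data fit against smoothness.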
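One common way to adjust the inertia weight is to decrease it linearly over the iterations, which favours global exploration early and local refinement late. The following is a minimal PSO sketch under that assumption (inertia decreasing from 0.9 to 0.4, fixed learning factors c1 = c2 = 2, velocity clamping); it is not necessarily the exact scheme used by Weijie Zhang, and the test objective is hypothetical. In an RBF hybrid, f would evaluate the prediction error of a network built from a particle's parameter vector.

```python
import numpy as np

def pso(f, lo, hi, n_particles=30, iters=200, w_max=0.9, w_min=0.4,
        c1=2.0, c2=2.0, seed=0):
    """Minimize f over the box [lo, hi] with a linearly decreasing inertia weight."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    dim = lo.size
    x = rng.uniform(lo, hi, size=(n_particles, dim))
    v = np.zeros_like(x)
    v_max = 0.2 * (hi - lo)                          # velocity clamp
    pbest, pbest_val = x.copy(), np.apply_along_axis(f, 1, x)
    g = pbest[pbest_val.argmin()].copy()             # global best position
    for t in range(iters):
        w = w_max - (w_max - w_min) * t / (iters - 1)   # inertia: 0.9 -> 0.4
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        v = np.clip(v, -v_max, v_max)
        x = np.clip(x + v, lo, hi)
        vals = np.apply_along_axis(f, 1, x)
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
    return g, float(pbest_val.min())

def shifted_sphere(p):
    """Hypothetical test objective with its minimum at (1, -2)."""
    return float(np.sum((p - np.array([1.0, -2.0])) ** 2))

best, best_val = pso(shifted_sphere, lo=[-5.0, -5.0], hi=[5.0, 5.0])
```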

QG-based hybrid model
The limitations of traditional network traffic time series prediction models, and the tendency of back propagation (BP) neural networks to become trapped in local solutions, have been widely acknowledged in the literature. In this light, Kun Zhang combined the quantum algorithm, which has strong global optimization ability, with the PSO algorithm, which has a simple principle. He applied the combined algorithm to the gradient explosion problem of BP and proposed the QPSO-BP hybrid model. When QPSO is applied in the training stage of the prediction model, a set of weights that minimizes the error function can be obtained in competitive time. The weights are updated gradually until the convergence criterion is satisfied, with the prediction error function as the objective to be minimized, which improves the prediction accuracy of the model [24]. However, the quantum algorithm is computationally very heavy during iterative training [25], so it remains difficult to approach the global optimum, and the search may still settle in a local optimum.
Because of the shortcomings of BP neural networks, the prediction error and jitter of such models tend to be large. Hui Tian used the quantum genetic algorithm (QGA), which has an efficient global search capability, to optimize the structure of the BP network. He then applied the wavelet transform (WT) to decompose the traffic into low-frequency and high-frequency data and proposed the WT-QGA-BP hybrid model to predict chaotic network traffic. However, the model ignores two important facts: the wavelet decomposition retains noise together with the signal, and the computational scale of the quantum algorithm is very large. Both issues increase the complexity of the hybrid model and result in poor generalization performance [26].
As optimization algorithms evolved, researchers found that although the fruit fly optimization algorithm (FOA) easily falls into local optima, it has the strengths of simple calculation and convenient coding. Ying Han used quantum mechanics theory to improve FOA, then used the resulting QFOA to optimize five important parameters of an echo state network (ESN), proposing the QFOA-ESN hybrid model to improve model accuracy. The model works as follows: first, phase-space reconstruction is used to reconstruct the original network traffic series; next, the ESN method is used to build the prediction model, with its parameters optimized by QFOA; finally, the optimized ESN model performs multi-step prediction of network traffic [27]. However, the model has many parameters to optimize and the computational scale of the quantum algorithm can grow large, which affects the convergence rate and accuracy of the hybrid model during training. These studies evidently addressed the limitations of neural networks. Yet they all overlooked the fact that, although the quantum algorithm improves global optimization, its large amount of computation and complex calculation increase the training time of the model. This makes the optimal solution difficult to obtain, which in turn affects the accuracy and convergence rate of the prediction model.
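For reference, quantum-behaved PSO (QPSO) as commonly formulated drops the velocity vector and samples each new position around a local attractor p = phi*pbest + (1 - phi)*gbest, with a spread proportional to the distance from the mean best position mbest. The sketch below assumes that standard formulation with a contraction-expansion coefficient decreasing from 1.0 to 0.5; the exact variant in [24] may differ. When training a BP network, f would map a flattened weight vector to the prediction error; here a hypothetical test function is used.

```python
import numpy as np

def qpso(f, lo, hi, n_particles=30, iters=300, beta_max=1.0, beta_min=0.5, seed=0):
    """Minimize f over [lo, hi] with quantum-behaved PSO (no velocity vector)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    shape = (n_particles, lo.size)
    x = rng.uniform(lo, hi, size=shape)
    pbest, pbest_val = x.copy(), np.apply_along_axis(f, 1, x)
    g = pbest[pbest_val.argmin()].copy()
    for t in range(iters):
        beta = beta_max - (beta_max - beta_min) * t / (iters - 1)  # contraction-expansion
        mbest = pbest.mean(axis=0)                    # mean of personal bests
        phi = rng.random(shape)
        attractor = phi * pbest + (1.0 - phi) * g     # local attractor per dimension
        u = 1.0 - rng.random(shape)                   # in (0, 1], so log(1/u) is finite
        sign = np.where(rng.random(shape) < 0.5, -1.0, 1.0)
        x = attractor + sign * beta * np.abs(mbest - x) * np.log(1.0 / u)
        x = np.clip(x, lo, hi)
        vals = np.apply_along_axis(f, 1, x)
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
    return g, float(pbest_val.min())

def shifted_sphere(p):
    """Hypothetical test objective with its minimum at (1, -2)."""
    return float(np.sum((p - np.array([1.0, -2.0])) ** 2))

best, best_val = qpso(shifted_sphere, lo=[-5.0, -5.0], hi=[5.0, 5.0])
```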
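Phase-space reconstruction is typically implemented as time-delay embedding: each sample becomes a vector of m values spaced tau steps apart. The sketch below assumes m and tau are already chosen (in practice they are often selected with methods such as mutual information and false nearest neighbours, which are not shown) and pairs each delay vector with the value h steps ahead, the input/target format an ESN or other predictor would consume. The function names are hypothetical.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Rows are delay vectors [x(t), x(t + tau), ..., x(t + (m - 1) tau)]."""
    x = np.asarray(x, dtype=float)
    n = x.size - (m - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this m and tau")
    return np.stack([x[i * tau: i * tau + n] for i in range(m)], axis=1)

def embed_with_targets(x, m, tau, horizon=1):
    """Pair each delay vector with the value `horizon` steps after its last entry."""
    x = np.asarray(x, dtype=float)
    V = delay_embed(x, m, tau)
    last = (m - 1) * tau                  # offset of the last coordinate in each row
    n = V.shape[0] - horizon
    return V[:n], x[last + horizon: last + horizon + n]

X, y = embed_with_targets(np.arange(10), m=3, tau=2)
print(X[0], y[0])   # [0. 2. 4.] 5.0
```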

Other hybrid model
The RBF neural network can approximate nonlinear functions globally, so it can predict network traffic data with nonlinear characteristics. Building on this, Dengfeng Wei introduced the gravity search algorithm (GSA) to optimize the RBF network structure and improve the convergence rate of the prediction model. On the one hand, the method optimizes parameters such as the centers c_i of the hidden-unit basis functions, the widths r_i and the connection weights w_kj of the RBF network. On the other, it makes better use of the fitting and nonlinear approximation ability of the RBF network to obtain an optimal neural network prediction model. However, the RBF network has too many parameters to optimize during iteration, so the search is prone to local optima [11].
To address the gradient explosion of BP neural networks and the local optimum problem of long short-term memory (LSTM) networks, Azzouni (2017) proposed an LSTM-RNN (recurrent neural network) hybrid framework to predict traffic matrices. Validating the framework on real-world data from the GEANT network, he used a sliding learning window to address the limitations of the LSTM network and combined it with an RNN to extract the dynamic characteristics of network traffic and predict future traffic. Although the sliding learning window addressed LSTM's limitation, the total number of time slots became very large, resulting in high computational complexity [28].
In the same year, motivated by the same LSTM problem, Qinzheng Zhuo proposed a neural network model that combines LSTM with deep neural networks (DNN) to predict network traffic as an autocorrelated nonlinear time series. An autocorrelation coefficient is added to the model to improve its accuracy, and by taking autocorrelation into account the model achieves higher precision than other traditional models [29].
Given that LSTM's drawbacks can bias the prediction results, Duan (2018) focused on filtering noise from the traffic data to mitigate these deficiencies. Based on the idea of decomposition, he proposed a prediction model combining seasonal-trend decomposition using LOESS (ST) with LSTM. The method handles periodic traffic data by decomposing out the trend and eliminating random noise. However, the hybrid model is limited in that it addresses only the periodic characteristics of traffic, not its nonlinear multi-scale characteristics [30].
Different researchers focus on different optimization algorithms. To address the premature convergence of FOA, Han improved FOA with an opposition-based learning mechanism and used it to optimize an ESN neural network for multi-step prediction of network traffic. He first used phase-space reconstruction to reconstruct the original network flow time series and built the model with the ESN method. He then optimized the model parameters with the opposition-based learning fruit fly optimization algorithm, and finally used the optimized model for multi-step prediction. However, four ESN parameters must be optimized, which makes the search space too large to approximate the optimal solution and thus affects accuracy [31].
Wenquan Xu posited that existing traffic models depend on finding parameters such as the connection weights between nodes of the neural network; if appropriate values cannot be found, the parameter search remains in a local optimum and model precision is compromised. The author therefore used an auto-regressive (AR) model to fit the original data and obtained the residuals between the original data and the AR predictions. These residuals are treated as the nonlinear component and fed into a deep belief network (DBN) model, and the AR prediction and the DBN output are combined to give the final forecast of the time series. However, because the coefficients must be adjusted repeatedly during extensive training on the residuals, the computational scale and time increase [32]. To clarify the strengths and limitations of optimization techniques in hybrid models, a comparison is presented in Table 1 in the Appendix.
In short, researchers have used different optimization algorithms to construct better-performing traffic models. Although they have considered the drawbacks of single neural networks, some limitations remain [33,34]. To build more accurate models, researchers continue to try new techniques and methods, and decomposition-based ideas have gradually been introduced into traffic time series prediction.
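To make the parameter set concrete, the sketch below assumes Gaussian basis functions phi_i(x) = exp(-||x - c_i||^2 / (2 r_i^2)) and a linear output layer with weights w_kj. It also shows how a population-based optimizer such as GSA would typically see the network: as one flat vector of h*d + h + h*k numbers, which is why the search space grows quickly with the number of hidden units. The helper names are hypothetical.

```python
import numpy as np

def rbf_forward(X, centers, widths, weights):
    """X: (n, d); centers: (h, d); widths: (h,); weights: (h, k) -> outputs (n, k)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # squared distances (n, h)
    phi = np.exp(-d2 / (2.0 * widths[None, :] ** 2))               # hidden activations
    return phi @ weights

def param_count(h, d, k):
    """Centres (h*d) + widths (h) + output weights (h*k)."""
    return h * d + h + h * k

def unpack(theta, h, d, k):
    """Split a flat parameter vector, as an optimizer would propose it, into RBF parameters."""
    theta = np.asarray(theta, dtype=float)
    centers = theta[: h * d].reshape(h, d)
    widths = np.abs(theta[h * d: h * d + h]) + 1e-6   # keep widths strictly positive
    weights = theta[h * d + h:].reshape(h, k)
    return centers, widths, weights

# A 10-unit network on 4 lagged inputs with one output already has 60 free parameters.
print(param_count(10, 4, 1))   # 60
```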
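A sliding learning window for traffic-matrix prediction can be sketched as follows, assuming a sequence of T traffic matrices of size N x N: each training input is the W most recent matrices (flattened), and the target is the next matrix. The function name and shapes are illustrative assumptions, not the cited framework's code. The number of samples, and hence the training cost, grows with the number of time slots T.

```python
import numpy as np

def sliding_windows(tm, window):
    """tm: (T, N, N) traffic matrices -> X: (T - window, window, N*N), y: (T - window, N*N)."""
    tm = np.asarray(tm, dtype=float)
    T = tm.shape[0]
    flat = tm.reshape(T, -1)                          # one flattened matrix per time slot
    X = np.stack([flat[t: t + window] for t in range(T - window)])
    y = flat[window:]                                 # the matrix right after each window
    return X, y

tm = np.arange(5 * 2 * 2).reshape(5, 2, 2)            # hypothetical: 5 slots, 2x2 matrices
X, y = sliding_windows(tm, window=2)
print(X.shape, y.shape)   # (3, 2, 4) (3, 4)
```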
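The sample autocorrelation coefficient at lag k is the normalized covariance between the series and a copy shifted by k steps. How exactly the coefficient enters the LSTM-DNN model is not detailed above; the sketch below only computes it and, as one hypothetical use, picks the lags with the largest absolute autocorrelation as candidate inputs.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (r_0 = 1)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[: x.size - k], xc[k:]) / denom for k in range(max_lag + 1)])

def top_lags(x, max_lag, k):
    """Hypothetical input selection: the k lags (>= 1) with the largest |r_lag|."""
    r = acf(x, max_lag)
    return np.argsort(-np.abs(r[1:]))[:k] + 1

x = np.tile([1.0, -1.0], 50)      # alternating toy series
print(acf(x, 2))                  # [ 1.   -0.99  0.98]
```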
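The seasonal-trend idea can be illustrated with a classical additive decomposition x = trend + seasonal + residual. Note that this sketch substitutes a centred moving average for the LOESS smoothing used by STL, so it is a simpler stand-in rather than the method in [30]; full STL implementations exist in libraries such as statsmodels. The period is assumed known, and the test series is hypothetical.

```python
import numpy as np

def classical_decompose(x, period):
    """Additive decomposition with a centred moving-average trend (NaN at the edges)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if period % 2 == 0:   # 2 x period moving average for even periods
        kernel = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        kernel = np.ones(period) / period
    half = kernel.size // 2
    trend = np.full(n, np.nan)
    trend[half: n - half] = np.convolve(x, kernel, mode="valid")
    detrended = x - trend
    # Average the detrended values at each seasonal position, ignoring NaN edges.
    season = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    season -= season.mean()                     # seasonal effects sum to zero
    seasonal = np.tile(season, n // period + 1)[:n]
    resid = x - trend - seasonal
    return trend, seasonal, resid

t = np.arange(120)
x = 0.1 * t + 2.0 * np.sin(2.0 * np.pi * t / 12)   # linear trend + period-12 cycle
trend, seasonal, resid = classical_decompose(x, period=12)
```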
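Opposition-based learning rests on a simple rule: for a candidate x in the box [lo, hi], its opposite point is lo + hi - x, and evaluating both increases the chance of starting near a good region. The sketch below applies the rule only at population initialization, a common use; whether [31] also applies it during the FOA iterations is not stated above, and the function names and objective are hypothetical.

```python
import numpy as np

def obl_init(f, lo, hi, n, rng):
    """Opposition-based initialization for minimizing f over the box [lo, hi].
    Draws n random points, adds their opposites lo + hi - x, keeps the n best."""
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    x = rng.uniform(lo, hi, size=(n, lo.size))
    pool = np.vstack([x, lo + hi - x])          # candidates and their opposite points
    vals = np.apply_along_axis(f, 1, pool)
    keep = np.argsort(vals)[:n]                 # fittest half of the 2n points
    return pool[keep], vals[keep]

def sphere(p):
    """Hypothetical objective with its minimum at (3, 3)."""
    return float(np.sum((p - np.array([3.0, 3.0])) ** 2))

pop, vals = obl_init(sphere, [-5.0, -5.0], [5.0, 5.0], n=10, rng=np.random.default_rng(0))
```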
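The AR-plus-residual structure can be sketched with ordinary least squares. The code below fits an AR(p) model with intercept and returns the residuals that would be passed to the nonlinear (DBN) stage; assuming the common additive combination, the final forecast is the AR forecast plus the residual model's forecast. The DBN itself is not implemented (any residual predictor could fill that slot), and the helper names and simulated data are hypothetical.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares AR(p) with intercept: x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p} + e_t.
    Returns coefficients [c, a_1, ..., a_p], fitted values and residuals."""
    x = np.asarray(x, dtype=float)
    n = x.size
    lagged = [x[p - i: n - i] for i in range(1, p + 1)]   # column i holds x_{t-i}
    X = np.column_stack([np.ones(n - p)] + lagged)
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    return coef, fitted, y - fitted        # residuals go to the nonlinear stage

def ar_forecast(x, coef):
    """One-step-ahead AR forecast from the most recent p observations."""
    p = coef.size - 1
    recent = np.asarray(x, dtype=float)[::-1][:p]        # x_T, x_{T-1}, ...
    return coef[0] + np.dot(coef[1:], recent)

def hybrid_forecast(linear_part, residual_part):
    """Additive combination (assumed): AR forecast plus predicted residual."""
    return linear_part + residual_part

# Simulated AR(2) series (hypothetical data).
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(2, x.size):
    x[t] = 0.5 + 0.6 * x[t - 1] - 0.2 * x[t - 2] + rng.normal(0.0, 0.1)
coef, fitted, resid = fit_ar(x, p=2)
```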

DECOMPOSITION TECHNIQUE-BASED HYBRID MODEL
In recent hybrid models, researchers have introduced time-frequency analysis into the study of traffic patterns and applied signal analysis theory to traffic time series. Decomposition techniques are therefore now widely used in hybrid models, mainly the wavelet transform (WT) [35] and mode decomposition (MD) methods such as empirical mode decomposition (EMD) [36], ensemble empirical mode decomposition (EEMD) [37] and variational mode decomposition (VMD) [38]. These hybrid traffic time series models are fast becoming a popular research topic.

WT-based hybrid model
Network traffic exhibits long-range dependence and multifractal characteristics, which makes a single neural network model an inadequate prediction tool. Laisen Nie introduced the idea of decomposition, using the discrete wavelet transform (DWT) to divide the signal into low-pass and high-pass components. A Gaussian model (GM), whose parameters are estimated by maximum likelihood, predicts the high-pass components, and a deep belief network (DBN) predicts the low-pass components; together they form the DWT-DBN-GM hybrid model [39]. Based on the same notion, Laisen Nie also used DWT to decompose the signal, this time predicting the high-pass components with a spatiotemporal compressive sensing (SCS) method and the low-pass components with a DBN, giving the DWT-DBN-SCS hybrid model [40]. This model overcomes the defect of a single neural network, but it overlooks the difficulty of choosing the decomposition scale of the wavelet transform [41], which can affect the accuracy of the hybrid model.
In another hybrid model, the traffic is decomposed by WT into an approximation sequence, which contains the trend and cyclical features of the traffic, and detail sequences, which contain detailed information at multiple scales. The approximation sequence is used to train an LSTM network, while the detail sequences are used to construct empirical detail sequences. Future traffic is obtained by reconstructing the sequence from the predicted approximation sequence and the empirical detail sequences. However, the limitations of WT are again ignored, and the local optimum problem of LSTM remains unsolved [42].
Given that choosing the WT decomposition scale is difficult, Madan et al. used the discrete wavelet transform to decompose the traffic into detail and approximation components and reconstructed each of them into a new time series through the inverse discrete wavelet transform (iDWT). An ARIMA model then predicts the low-frequency component and an RNN predicts the high-frequency component. The resulting hybrid time series model can be used to predict future traffic trends in a computer network and largely resolves the difficulty of selecting the WT decomposition scale. However, the computational complexity of the RNN is ignored, so the accuracy of the prediction model remains questionable [43].
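The low-pass/high-pass split performed by the DWT can be shown with the Haar wavelet, the simplest case; the cited works may use other wavelets, and libraries such as PyWavelets provide general implementations. Each level halves the length of the approximation (low-pass) sequence and produces one detail (high-pass) sequence; the number of levels is the decomposition scale whose choice is discussed above. The sketch assumes signal lengths divisible by 2^level.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_idwt(approx, detail):
    """Invert one Haar level."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def haar_wavedec(x, level):
    """Multi-level decomposition; details are returned coarsest first."""
    a, details = np.asarray(x, dtype=float), []
    for _ in range(level):
        a, d = haar_dwt(a)
        details.append(d)
    return a, details[::-1]

def haar_waverec(approx, details):
    """Rebuild the signal from the approximation and the (coarsest-first) details."""
    a = approx
    for d in details:
        a = haar_idwt(a, d)
    return a

a, ds = haar_wavedec(np.arange(8.0), level=2)
print(a.size, [d.size for d in ds])   # 2 [2, 4]
```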

MD-based hybrid model
The constraints of WT and single neural networks include a tendency to fall into local minima and to overfit, and the choice of network structure depends too heavily on experience. These limitations directly reduce the reliability of neural networks for time series modelling and prediction. Tian Zhongda first proposed decomposing traffic into stationary signals at different characteristic scales using empirical mode decomposition (EMD). The decomposed components weaken the long-range correlation and expose the distinct local characteristics of the time series, which reduces its non-stationarity. He then incorporated an ELM neural network to form the EMD-ELM hybrid model [44]. Unfortunately, EMD is prone to mode aliasing and endpoint effects [45] during decomposition, which can compromise prediction accuracy.
To capture both long- and short-range correlations in traffic, Chen introduced the EMD-PSO-SVM hybrid model based on empirical mode decomposition, particle swarm optimization and a support vector machine. First, EMD is used to eliminate the influence of noise in the traffic signal. Then PSO optimizes the parameters of the SVM, and finally the SVM is trained to fit the traffic model. The effectiveness of the method was examined by comparing it with other methods, including a basic SVM and EMD [46]. While this model improves the accuracy of network traffic prediction, it ignores the mode aliasing and endpoint effects of EMD, which reduces the reliability of its predictions.
To address the limitations of EMD, Wanwei Huang combined ensemble empirical mode decomposition (EEMD) with a quantum neural network (QNN) to construct the QNN-EEMD hybrid model. EEMD decomposes the time series into intrinsic mode functions (IMFs) to remove mode aliasing and redundancy. The QNN then processes the decomposed IMFs and optimizes the model parameters, which improves the convergence speed of the hybrid model [47]. However, the model ignores the heavy computational cost of the quantum algorithm. In addition, EEMD's dependence on the added-noise amplitude and the number of ensemble trials, which are usually set from experience [48], affects prediction accuracy.
Lina Pan noted that an ESN is easily influenced by its initial random weights and that EMD and EEMD have the deficiencies described above. She therefore introduced variational mode decomposition (VMD) to overcome the problems of EMD and EEMD and decompose the traffic effectively, and used the bat algorithm (BA) to optimize the ESN parameters, proposing the VMD-BA-ESN hybrid network traffic prediction model. During decomposition, VMD splits the original internet traffic series into several band-limited intrinsic mode functions (BLIMFs). However, the number of decomposition layers remains an important factor in determining prediction accuracy [49].
Because chaotic time series are strongly non-stationary and highly complex, they are difficult to analyse and predict with a single model. Xinghan Xu therefore applied a two-layer decomposition approach combined with an optimized BP neural network. To capture comprehensive information about the chaotic time series, the hybrid model combines complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and VMD: VMD further decomposes the high-frequency subsequence obtained by CEEMDAN, which significantly improves prediction performance. A BP neural network optimized by the firefly algorithm (FA) then performs the prediction. The hybrid model fully considers the importance of the decomposed signals. However, it ignores that VMD's performance depends on the number of decomposition layers, which can cause over- or under-decomposition and reduce the accuracy of the model [50].
Given the wide application of decomposition techniques in network traffic prediction, and to improve the prediction accuracy for nonlinear, non-stationary traffic data, Ying Han et al. proposed an IFOA-ESN combined prediction model. This model is based on VMD, a Lévy flight function and a cloud generator. First, VMD decomposes the original network traffic data into several subsets. Then, multiple sub-reservoirs are built after performing phase space reconstruction (PSR) of each data subset. Finally, the training set is used to train the prediction model. This mechanism addresses VMD's need for a pre-set number of modes and iteration factors, which cannot be reliably determined by subjective experience. Unfortunately, optimizing inside and outside the multiple sub-reservoirs simultaneously requires long computation times, which lengthens training, slows convergence and degrades the performance of the model. The large computational scale therefore remains an unsolved problem [51]. The strengths and limitations of decomposition techniques in hybrid network traffic prediction models are summarized in Table 2 in the Appendix.
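The core of EMD is sifting: estimate upper and lower envelopes through the local maxima and minima, subtract their mean, and repeat until an intrinsic mode function (IMF) remains. The sketch below is deliberately simplified: it uses piecewise-linear envelopes instead of the usual cubic splines, a fixed number of sifting passes instead of a stopping criterion, and no special boundary handling, which is exactly where the endpoint effect mentioned above arises. It is an illustration on hypothetical data, not a production EMD.

```python
import numpy as np

def local_extrema(h):
    """Indices of interior local maxima and minima (plateaus take their first sample)."""
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def sift(x, n_sift=10):
    """Extract one IMF candidate with a fixed number of sifting passes.
    Envelopes are piecewise linear (np.interp), held constant beyond the outermost
    extrema: a crude end treatment where EMD's endpoint effect shows up."""
    h = np.asarray(x, dtype=float).copy()
    t = np.arange(h.size)
    for _ in range(n_sift):
        mx, mn = local_extrema(h)
        if mx.size < 2 or mn.size < 2:
            break
        upper = np.interp(t, mx, h[mx])
        lower = np.interp(t, mn, h[mn])
        h -= 0.5 * (upper + lower)            # remove the local mean
    return h

def emd(x, max_imfs=6):
    """Decompose x into IMFs plus a residue so that x == imfs.sum(0) + residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        mx, mn = local_extrema(residue)
        if mx.size < 2 or mn.size < 2:        # too few extrema: treat as the trend
            break
        imf = sift(residue)
        imfs.append(imf)
        residue = residue - imf
    return np.array(imfs), residue

t = np.arange(500)
fast = np.sin(2.0 * np.pi * t / 10)
x = fast + 0.5 * np.sin(2.0 * np.pi * t / 50) + 0.01 * t
imfs, res = emd(x)
```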
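An ESN's dependence on random initial weights is easiest to see in code: the input and reservoir matrices are drawn at random and never trained, and only a linear readout is fitted. The sketch below assumes a tanh reservoir rescaled to a chosen spectral radius and a ridge-regression readout. Reservoir size, spectral radius, input scaling and the ridge constant are the kind of hyperparameters a bat-algorithm search could tune, although the exact parameter set in [49] is not given above; the data are a hypothetical sine series.

```python
import numpy as np

def make_esn(n_in, n_res, spectral_radius, input_scale, rng):
    """Random input and reservoir weights; the reservoir is rescaled to the target radius."""
    W_in = input_scale * rng.uniform(-1.0, 1.0, size=(n_res, n_in))
    W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W

def reservoir_states(U, W_in, W):
    """Drive the reservoir with inputs U (T, n_in) and collect its states (T, n_res)."""
    states = np.zeros((U.shape[0], W.shape[0]))
    s = np.zeros(W.shape[0])
    for k in range(U.shape[0]):
        s = np.tanh(W_in @ U[k] + W @ s)
        states[k] = s
    return states

def ridge_readout(S, y, lam):
    """Only trained part of the ESN: linear readout by ridge regression."""
    return np.linalg.solve(S.T @ S + lam * np.eye(S.shape[1]), S.T @ y)

rng = np.random.default_rng(42)
series = np.sin(0.2 * np.arange(600))
U, y = series[:-1, None], series[1:]          # one-step-ahead targets
W_in, W = make_esn(1, 100, spectral_radius=0.9, input_scale=0.5, rng=rng)
S = reservoir_states(U, W_in, W)
washout = 100                                 # discard the initial transient
W_out = ridge_readout(S[washout:], y[washout:], lam=1e-6)
pred = S[washout:] @ W_out
rmse = np.sqrt(np.mean((pred - y[washout:]) ** 2))
```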
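The firefly algorithm moves each firefly toward every brighter one with attractiveness beta0*exp(-gamma*r^2), plus a small random step. The sketch below minimizes a hypothetical test function and assumes distances normalized by the search range and a geometrically decaying random step, both common but optional choices. In the CEEMDAN-VMD-FA-BPNN model, f would evaluate the BP network's error for a candidate set of weights and thresholds; that wiring is not shown.

```python
import numpy as np

def firefly(f, lo, hi, n=20, iters=100, beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimize f over [lo, hi]; a lower objective value means a brighter firefly."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    scale = hi - lo
    x = rng.uniform(lo, hi, size=(n, lo.size))
    light = np.apply_along_axis(f, 1, x)
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if light[j] < light[i]:                        # j is brighter: i moves toward j
                    r2 = np.sum(((x[i] - x[j]) / scale) ** 2)  # normalized squared distance
                    attract = beta0 * np.exp(-gamma * r2)
                    step = alpha * (rng.random(lo.size) - 0.5) * scale
                    x[i] = np.clip(x[i] + attract * (x[j] - x[i]) + step, lo, hi)
                    light[i] = f(x[i])
        alpha *= 0.97                                          # shrink the random step
    best = int(np.argmin(light))
    return x[best].copy(), float(light[best])

def shifted_sphere(p):
    """Hypothetical test objective with its minimum at (1, -2)."""
    return float(np.sum((p - np.array([1.0, -2.0])) ** 2))

best_x, best_val = firefly(shifted_sphere, [-5.0, -5.0], [5.0, 5.0])
```

Because the brightest firefly has no brighter neighbour, it never moves, so the best objective value found can only improve from one iteration to the next.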
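Lévy-flight steps are commonly generated with Mantegna's algorithm, which produces occasional long jumps that help an optimizer escape local optima. The sketch below implements that generator for stability index beta (1.5 is a frequent default) and shows a hypothetical cuckoo-search-style perturbation around a current best solution. How [51] couples Lévy flight with FOA and the cloud generator is not described above, so that coupling is not reproduced.

```python
import numpy as np
from math import gamma, pi, sin

def mantegna_sigma(beta):
    """Scale of u in Mantegna's algorithm for a Levy-stable step with index beta."""
    num = gamma(1.0 + beta) * sin(pi * beta / 2.0)
    den = gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)

def levy_steps(size, beta=1.5, rng=None):
    """Heavy-tailed steps u / |v|^(1/beta), u ~ N(0, sigma^2), v ~ N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.normal(0.0, mantegna_sigma(beta), size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1.0 / beta)

# Hypothetical use: perturb a candidate relative to the current best solution.
rng = np.random.default_rng(0)
best = np.array([1.0, -2.0])
candidate = np.array([0.5, 0.5])
new_candidate = candidate + 0.01 * levy_steps(2, rng=rng) * (candidate - best)
```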

CONCLUSION
In conclusion, optimization and decomposition are two important processes in hybrid network traffic prediction models for achieving higher prediction accuracy and faster convergence. This review found that PSO and other optimization algorithms can generally model network traffic time series better, given their strengths of simple principle, small computational scale and fast convergence. It also confirmed that decomposition is an effective way to deal with the non-linearity and non-stationarity of traffic data, as it provides a modelling approach based on time-frequency analysis. In particular, VMD can overcome the multiresolution and decomposition-scale problems of WT and the mode aliasing and white-noise amplitude problems of EMD and EEMD. This review has, to some extent, clarified the importance of optimization and decomposition techniques in hybrid network traffic prediction models. Parameter optimization of both the decomposition technique and the optimization algorithm is the key process that determines prediction accuracy and convergence rate. Future research should therefore investigate how to simplify optimization algorithms to use fewer parameters, speed up convergence and improve decomposition, so as to further enhance network traffic prediction accuracy.

APPENDIX
Table 1. Optimization technique-based hybrid models
1. [21], APSO-ELM (particle swarm optimization; extreme learning machine). Strengths: optimizes the input weights and biases of ELM with an adaptive PSO algorithm. Limitations: the local optimal solution problem of PSO is still unsolved and may affect model accuracy.
2. 2013, Kun Zhang, et al. [24], QPSO-BP (particle swarm optimization; quantum algorithm; BP neural network). Strengths: solves the gradient explosion of BP using the quantum algorithm and PSO. Limitations: the quantum algorithm is computationally difficult; the model easily falls into local optima.
3. 2014, Yi Yang, et al. [17], SA-PSO-LSSVM (seasonal transform; particle swarm optimization; least squares support vector machine). Strengths: seasonal adjustment (SA) reduces seasonal interference on the components, and PSO optimizes two LSSVM parameters. Limitations: considers only the seasonal characteristics of traffic and ignores PSO's local optimum problem.
4. 2016, Deng Feng Wei [11], IGSA-RBF (gravity search algorithm). Strengths: improves the speed selection formula of GSA and optimizes three RBF parameters. Limitations: GSA lacks theoretical guidance; the RBF network has too many parameters to optimize; the model easily falls into local optima.
Fruit fly optimization algorithm. Limitations: too many parameters to optimize; heavy computation and a complex structure slow convergence.
10. 2018, Weijie Zhang, et al. [22], IPSO-RBF (radial basis function). Strengths: improves PSO with a mutation method to avoid local minima, then optimizes three RBF parameters. Limitations: too many parameters make it difficult to approach the global optimal solution when IPSO optimizes RBF.

Table 2. Decomposition technique-based hybrid models
[47], QNN-EEMD (ensemble empirical mode decomposition; artificial neural network; quantum). Strengths: decomposes the time series into IMFs with EEMD to remove mode aliasing and optimizes the QNN parameters to avoid local optima. Limitations: focuses on signal decomposition, but how to optimize EEMD's dependence on noise amplitude and empirical settings remains a challenge.
8. 2019, Wenbo Chen, et al. [46], EMD-PSO-SVM (empirical mode decomposition; support vector machine; particle swarm optimization). Strengths: removes noise from the data with EMD and optimizes the SVM with PSO. Limitations: ignores the influence of mode aliasing in EMD on model accuracy.
9. 2019, Xinghan Xu, et al. [50], CEEMD-VMD-FA-BPNN (BP neural network). Strengths: improves on EEMD by adding adaptive white noise and forms a two-stage decomposition with VMD; FA optimizes the thresholds and weights of BP neurons to improve the network's function approximation ability. Limitations: ignores the number of VMD decomposition layers, which influences the decomposition effect and reduces prediction accuracy.