Efficiency of recurrent neural networks for seasonal trended time series modelling

ABSTRACT


INTRODUCTION
Time series analysis is a growing source of knowledge and information: technical and technological development generates ever more data, which multiplies the fields of application of the discipline. In this field, researchers propose models that describe the underlying relationship of the generating process and use them to forecast the series [1]. Seasonal and trend components are characteristic of many time series arising from economic phenomena. Seasonality is a periodic, recurring pattern, while the trend characterizes the long-term evolution of the series. Accurate forecasting of seasonal time series with trends is crucial in areas such as marketing, inventory control and many other business sectors.
Traditional methods of time series analysis proceed in two main steps: decomposition, then reconstitution of the series to carry out the forecast [2]. This approach assumes that the structure of a time series can be decomposed into modellable elements [3]. There are three main components: the trend Tt, which describes the long-term evolution of the phenomenon; the seasonal component St, which characterizes repetition over time; and the residual component Rt, which represents the noise [4].
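The decomposition step described above can be sketched as follows for the additive case. This is a minimal illustration in plain Python, not the procedure of the cited references: the function names are hypothetical, the trend is estimated with a centered moving average over one period, and the seasonal component is taken as the per-phase mean of the detrended values.

```python
def centered_moving_average(series, window):
    """Centered moving average; None where the window does not fit."""
    half = window // 2
    out = [None] * len(series)
    for i in range(half, len(series) - half):
        if window % 2:  # odd window: plain mean
            out[i] = sum(series[i - half:i + half + 1]) / window
        else:  # even window: half weight on the two end points
            ends = 0.5 * (series[i - half] + series[i + half])
            out[i] = (sum(series[i - half + 1:i + half]) + ends) / window
    return out


def decompose_additive(series, period):
    """Split a series into trend Tt, seasonal St and residual Rt (additive)."""
    trend = centered_moving_average(series, period)
    # Average the detrended values phase by phase to estimate the pattern.
    sums, counts = [0.0] * period, [0] * period
    for i, (y, t) in enumerate(zip(series, trend)):
        if t is not None:
            sums[i % period] += y - t
            counts[i % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    offset = sum(means) / period  # centre the pattern so it sums to zero
    seasonal = [means[i % period] - offset for i in range(len(series))]
    residual = [None if t is None else y - t - s
                for y, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual
```

On a noise-free series built from a linear trend plus a zero-sum periodic pattern, the three returned lists recover the trend, the pattern and a residual close to zero wherever the moving-average window fits.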
In the 1970s, Box and Jenkins introduced another perspective on time series modelling [5]. The Box and Jenkins methodology is based on Wold's representation theorem [6]-[9]: once a process is (weakly) stationary, it can be written as a weighted sum of past shocks. Stationarity thus becomes fundamental to the analysis [10]. However, a time series with seasonality and trend is non-stationary and usually has to be made stationary, using a seasonal adjustment method [11], before most modelling and forecasting can take place.

Moreover, neural networks (NN) offer perspectives for modelling time series [12]-[14] that differ from traditional seasonal autoregressive integrated moving average (SARIMA) models [15], [16]. The learning mechanism makes it possible to build a neural architecture from parameters such as the size of the input vector and the number of hidden layers. NN have been widely applied in many fields thanks to the flexibility of their network structure [17]. The fully connected NN (FNN) is the basic neural network structure. Qi and Zhang [18] applied it to seasonal time series with trends and compared it with the autoregressive integrated moving average (ARIMA) model. They report that an FNN cannot model seasonality directly: a preprocessing step involving seasonal and trend adjustments is needed for proper modelling. Liu et al. [19] also compare FNN and ARIMA on the same type of simulated time series and conclude that the FNN performs well when the rectified linear unit (ReLU) or the linear activation function is combined with the Adam optimizer.
This work was motivated by the studies of Qi and Zhang [18] and Liu et al. [19], in which the authors compare the performance of SARIMA to an FNN and a convolutional NN. However, their experiments and conclusions are inadequate for our purposes. In this paper, we use recurrent neural networks (RNN), in particular the long short-term memory (LSTM) and bidirectional long short-term memory (Bi-LSTM) extensions.
The aim of our study is to find a modelling method that spares users the preprocessing of time series. The initial motivation of this paper is therefore to develop a machine learning tool that predicts time series data without manual intervention, using recurrent neural networks. The main problem is to find a general-purpose modelling method or algorithm that can handle seasonality, trends and autocorrelations in time series data. The initial question concerned the choice of the network parameters, in particular the size of the input vector and the number of hidden layers, for additive and multiplicative signals.

METHOD
2.1. Principle of LSTM and Bi-LSTM structures
Extensions of recurrent neural networks (RNNs) such as LSTMs are among the most practical solutions because they address the vanishing gradient problem by managing both short- and long-term memory. They make predictions from the salient characteristics of the dataset, and they can selectively remember or forget information. Data collected over successive time steps form a time series on which predictions can be made, and LSTMs are a stable methodology for this task. In this type of design, the model passes the state computed at the previous step on to the next step. Since plain RNNs can store only a limited amount of information over long horizons, LSTM cells are used within RNNs [20]. They overcome the vanishing and exploding gradient difficulties and support long-term dependencies by replacing the hidden layers of the RNN with memory cells. The LSTM block contains three gates [21], and each gate corresponds to a processing step. Standard recurrent architectures, including the LSTM, process the inputs in one direction only and ignore the information available about the future. The bidirectional LSTM (Bi-LSTM) model addresses this limitation [22].
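To make the role of the three gates concrete, here is a minimal sketch of one step of an LSTM cell, reduced to scalar input and state for readability. It follows the standard textbook formulation (sigmoid gates, tanh candidate), which may differ in detail from the networks trained in this study; the parameter dictionary and its key names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps a gate name to its (input weight, recurrent weight, bias);
    the names "forget", "input", "output", "candidate" are illustrative.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    forget_gate = sigmoid(pre_activation("forget"))   # what to keep from c_prev
    input_gate = sigmoid(pre_activation("input"))     # how much new content to write
    output_gate = sigmoid(pre_activation("output"))   # how much of the cell to expose
    candidate = math.tanh(pre_activation("candidate"))

    c = forget_gate * c_prev + input_gate * candidate  # long-term memory
    h = output_gate * math.tanh(c)                     # short-term state / output
    return h, c
```

With a large positive forget bias and a large negative input bias, the cell state is carried over almost unchanged from one step to the next, which is the mechanism that lets the gradient survive over long sequences.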
In the Bi-LSTM topology [23]-[26], information flows in two directions, as illustrated in Figure 1, so that the temporal dynamics of both past and future input vectors are taken into account. The hidden neurons of a standard RNN are split into a forward part and a backward part. The basic operation of a Bi-LSTM [27] unfolds in three steps: forward pass, backward pass, and weight update.
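The two-direction flow can be illustrated with a deliberately simplified sketch. A plain tanh recurrence stands in for the LSTM cell, and the weights `w_x` and `w_h` are arbitrary illustrative values; only the forward and backward passes are shown, not the weight update.

```python
import math


def rnn_scan(inputs, w_x=0.5, w_h=0.5):
    """Run a scalar tanh recurrence over the inputs and return every state."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def bidirectional_scan(inputs):
    """Pair, for each time step, a state built from the past with one built from the future."""
    forward = rnn_scan(inputs)
    backward = rnn_scan(inputs[::-1])[::-1]  # scan the reversed sequence, then realign
    return list(zip(forward, backward))
```

For the input `[0.0, 0.0, 1.0]`, the forward state at the first step is zero because nothing has been seen yet, while the backward state at that same step is already non-zero because it summarizes the later observation.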

Processing strategy
The questions at the origin of this study concerned the capacity of RNNs to model the regularities of a signal, specifically seasonality and trend. While developing a neural model, we realized that several parameters come into play, namely the size of the input vector and the number of hidden layers. This work empirically highlights a correlation between the period of the time series and the size of the input vector required for stable and relatively successful learning. The study is conducted on twenty-six time series, including two derived from real phenomena. The first describes the number of passengers in an international airport, a classic signal in the literature [4] and the first signal to which George Box and Gwilym Jenkins applied their methodology. The second describes the CO2 concentration in the air measured from 1974 through 1987 [28]. The time series belong to two basic classes of models, additive and multiplicative. This allows us to analyze the robustness of the RNNs both to changes in the seasonal fluctuation and to the impact of white noise, and then to draw conclusions on the stability of the established systems.
The learning is carried out on a part of the signal denoted Train, which represents nearly 80% of the signal. The remaining 20%, denoted Test, allows us to measure the performance of the learning via the mean absolute percentage error (MAPE) given in (1). The next step is to predict 100 future observations in order to assess whether, and under what conditions, the system has captured the regularities of the signal (Tt and St). We apply a low-pass filter (the moving average) with a varying window l to determine the period of the predicted signal. Figure 2 displays the whole layout of the proposed model.
$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y(t)-\hat{y}(t)}{y(t)}\right| \times 100 \qquad (1)$$
where y(t) is the observed value, \hat{y}(t) the predicted value and n the number of observations in the Test part.
Figure 2. Layout of proposed method
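The chronological split and the MAPE measure can be sketched as follows. The function names are hypothetical, and the MAPE is the standard definition, which assumes that no observed value is zero.

```python
def train_test_split(series, train_fraction=0.8):
    """Chronological split: the first part is Train, the rest is Test."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

For a series of 360 observations, the split yields 288 training points and 72 test points. The split is not shuffled, since the order of the observations carries the trend and the seasonality.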

Simulated data
This empirical analysis relies on several time series with different periods and different variances σ² of the white noise. We generate the time series via (2) and (3).

$$y(t) = SI(t) + T(t) + E(t) \qquad (2)$$
Equations (2) and (3) characterize the additive model (AM) and the multiplicative model (MM), respectively. SI(t) is the seasonality index (Table 1 shows the values adopted for MM and AM), T(t) is the linear trend, and E(t) is the error term, which follows the normal distribution N(0, σ²). For each given SI, we assign σ² three values, with σ = 1, 5 and 12. Controlling the seasonality index allows us to fix the period of the seasonality and then observe how the established systems react to these changes. Changing σ, in turn, allows us to test the robustness of the established assumptions with respect to an increase of the white noise energy in the signal. Figure 3 shows an example of the time series that we generated with a seasonality index SI given in Table 1, for an MM in Figure 3(a) and for an AM in Figure 3(b). The trend is given by T(t) = 0.8t + 150 for any t ∈ [0, 359], i.e., this series, like all the others generated from both MM and AM, has 360 observations. The white noise variance σ² of this time series is given by (
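As an illustration of the additive model (2), the sketch below generates a series of 360 observations with the trend T(t) = 0.8t + 150 and Gaussian noise. The seasonal index passed to the function is a placeholder: the values of Table 1 are not reproduced here, so any list used with this sketch is hypothetical. The multiplicative model (3) is not sketched.

```python
import random


def simulate_additive(seasonal_index, sigma, n_obs=360, seed=0):
    """Generate y(t) = SI(t) + T(t) + E(t) with T(t) = 0.8 t + 150.

    seasonal_index: one value per position in the period (hypothetical values).
    sigma: standard deviation of the Gaussian noise E(t).
    """
    rng = random.Random(seed)  # seeded so that the series is reproducible
    period = len(seasonal_index)
    series = []
    for t in range(n_obs):
        trend = 0.8 * t + 150
        noise = rng.gauss(0.0, sigma)
        series.append(seasonal_index[t % period] + trend + noise)
    return series


# Hypothetical seasonal index with period 12, for illustration only.
example_si = [12, 5, -3, -9, -14, -6, 0, 7, 15, 4, -2, -9]
example_series = simulate_additive(example_si, sigma=5)
```

The length of the seasonal index fixes the period p of the signal, and `sigma` controls the amount of white noise, which mirrors the two quantities varied in the experiments.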

Modeling strategy
The datasets handled in this study are univariate time series. We implemented two recurrent neural network models, LSTM and Bi-LSTM, using libraries such as NumPy [29], Pandas [30], Keras and TensorFlow [31]. For a given signal with fixed period p and white noise variance σ², we trained on the Train part while varying the size of the input vector ve over the values 3, 4, 9 and 12. In other words, we performed four tests for each signal, which allows us to observe the correlations between the different parameters of the system. Figure 5 shows the neural structure adopted for the two models, LSTM and Bi-LSTM. The structure is the same for both: an input layer with ve inputs is connected to the 256 neurons of the first hidden layer, and the network has six hidden layers. The choice of the number of neurons is based on the remark of Moolayil [32]. The number of hidden layers is one of the questions of this study: how can it be chosen intelligently, in relation to the particularities of the signal, to guarantee the performance of the learning? We conducted experiments in this direction, but they did not lead to consistent results. Table 2 shows the parameters of the neural architectures established. For the LSTM and its extension Bi-LSTM, the activation function is ReLU, the optimization algorithm is Adam, and the cost function is the mean squared error (MSE). The learning algorithm tries to minimize the cost function, which measures the distance between real and predicted values, by adjusting the weights and biases of the system. Because the starting point of these parameters affects the learning performance, we initialize the weights and biases of the neural architecture to the same values for all tests.
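The size of the input vector ve determines how the univariate series is turned into supervised training pairs. A minimal sketch of this windowing step, assuming one-step-ahead prediction (the function name is hypothetical):

```python
def make_windows(series, ve):
    """Build (input vector, next value) pairs from a univariate series.

    Each input holds ve consecutive observations; the target is the
    observation that immediately follows them.
    """
    inputs, targets = [], []
    for start in range(len(series) - ve):
        inputs.append(series[start:start + ve])
        targets.append(series[start + ve])
    return inputs, targets
```

With ve equal to the period p, every input vector covers exactly one full seasonal cycle, which is one plausible reading of why this choice helps the learning. Recurrent layers in Keras usually expect three-dimensional input, so the list of windows would typically be converted to an array of shape (samples, ve, 1) before training.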

Simulated data results and discussion
We built LSTM and Bi-LSTM models to answer the question raised in section 2, using the parameters indicated in Table 2 of section 2 for all the neural architectures. This study does not compare optimization algorithms such as Adam, the adaptive gradient algorithm (AdaGrad) and stochastic gradient descent (SGD), nor the different existing cost functions. Table 3 shows the results of the tests carried out on the time series generated with the methodology of section 2, according to AM and MM, characterized respectively by (2) and (3). We make two remarks. Firstly, there is a correlation between the period p of the signal and the size of the input vector ve: to guarantee the relative performance of the learning, it is more appropriate to choose ve = p, for both LSTM and Bi-LSTM. Secondly, the white noise affects the learning performance.
Figure 6 shows the learning results for AM: Figures 6(a) and 6(b) illustrate the performance of the LSTM and Bi-LSTM models, respectively. The MAPE, as shown in Table 3, is of order 0.53 and 0.49, respectively. Figure 7 shows the learning results for MM: Figures 7(a) and 7(b) illustrate the performance of the LSTM and Bi-LSTM models, respectively. The MAPE, as shown in Table 3, is of order 0.17 and 0.09, respectively.
Recall that comparing the LSTM and Bi-LSTM models is not the goal of this work. The initial question focused on the stability of learning and on obtaining, via RNN models, an adequate model for signals characterized by seasonality and trend. Two interesting results emerge: first, a significant correlation exists between the size of the input vector ve and the period of the signal; second, the noise has an impact on the learning.

Real data results and discussion
We used the same parameters as before, shown in Table 2, and the same RNNs on the two real time series, one additive and one multiplicative, as described in subsection 3.3. We applied a low-pass filter (the moving average) to determine the period, and we conducted tests by systematically changing the size of the input vector. The learning results on the real data show the potential of neural networks to model this class of time series when appropriate parameters are chosen, in particular the size of the input vector. Figure 8(a) illustrates the performance of the LSTM model, for which the MAPE is of order 0.04, while Figure 8(b) shows the prediction of the two neural systems for 100 future observations. Moreover, Figure 9(a) shows the performance of the Bi-LSTM model, for which the MAPE is of order 0.05.
The learning performance depends on the size of the input vector ve, which corroborates the conclusion drawn from the simulated data: when ve is varied, the system becomes efficient for ve=12 compared to the other values of ve. The first signal, multiplicative, is the monthly evolution of passengers and has period p=12 [5]. For the LSTM model, ve equal to 3, 6 and 9 gives a MAPE of order 12.35, 4.21 and 13.93, respectively, whereas ve=12 gives the best performance, MAPE=0.04. The second signal, additive, has period p=12 [28]. For the Bi-LSTM model, ve equal to 3, 6, 9 and 12 gives a MAPE of order 10.26, 6.74, 8.31 and 0.05, respectively. The prediction of 100 future observations clearly shows that the system was able to learn the features of the multiplicative and additive signals, such as the variation of the seasonal fluctuations over time.
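One way to determine the period with a moving average is sketched below. The text does not state the selection criterion, so the one used here is an assumption: a centered moving average whose window equals the period cancels the seasonal pattern, leaving a smooth curve, so the candidate window that minimizes the mean squared second difference of the smoothed series is retained. The function names are hypothetical.

```python
def centered_ma(series, window):
    """Centered moving average over the positions where the window fits."""
    half = window // 2
    out = []
    for i in range(half, len(series) - half):
        if window % 2:  # odd window: plain mean
            out.append(sum(series[i - half:i + half + 1]) / window)
        else:  # even window: half weight on the two end points
            ends = 0.5 * (series[i - half] + series[i + half])
            out.append((sum(series[i - half + 1:i + half]) + ends) / window)
    return out


def roughness(series):
    """Mean squared second difference; zero for a perfectly linear series."""
    diffs = [series[i + 1] - 2 * series[i] + series[i - 1]
             for i in range(1, len(series) - 1)]
    return sum(d * d for d in diffs) / len(diffs)


def estimate_period(series, candidates):
    """Return the candidate window l whose moving average is the smoothest."""
    return min(candidates, key=lambda l: roughness(centered_ma(series, l)))
```

Multiples of the true period also cancel the seasonality, and wider windows smooth noise more strongly, so the candidate windows should stay below twice the largest period considered plausible.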
Liu et al. [19] adopt models such as the convolutional neural network (CNN), the FNN and a non-pooling CNN. They also compare several optimizers (Adadelta, AdaGrad, Adam and SGD) and several activation functions (ReLU, Tanh and linear). They conclude that the choice of system parameters affects learning performance, and that choosing the ReLU or linear activation function together with the Adam optimizer increases performance. In this paper, we focus on other parameters of the system, specifically the size of the input vector, using RNN models.

To evaluate the performance of the neural network system, we compared it with the autoregressive moving average (ARMA) model. ARMA requires additional preprocessing to make both time series stationary: we applied a 12-lag difference to remove the seasonality and then a 2-lag difference to remove the trend. The final model for the total passengers in an international airport data is an ARMA with p=12 and q=1, selected using the autocorrelation function, the partial autocorrelation function and the Bayesian information criterion (BIC). For this model, the MAPE is of order 1.39. The LSTM thus obtains a much better MAPE than the ARMA.
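The differencing used to make the series stationary before fitting the ARMA model can be sketched as follows. The function name is hypothetical, and reading the "2-lag difference" literally as a difference at lag 2 is an assumption.

```python
def difference(series, lag=1):
    """Lagged difference: y(t) - y(t - lag), dropping the first `lag` points."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def make_stationary(series, seasonal_lag=12, trend_lag=2):
    """Remove the seasonality with a seasonal difference, then the remaining trend."""
    deseasonalized = difference(series, lag=seasonal_lag)
    return difference(deseasonalized, lag=trend_lag)
```

On a noise-free additive series with period 12 and a linear trend, the seasonal difference leaves a constant and the second difference reduces it to zero. The result is 14 observations shorter than the input, since each difference drops as many points as its lag.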

CONCLUSION
Analyzing and modelling time series allows the extraction of knowledge. In the present study, we have addressed the modelling of seasonal time series with a trend via a supervised learning technique, namely the RNN method. To this end, we established both LSTM and Bi-LSTM models in order to propose an approach for constructing neural systems that allow relatively efficient modelling. We conducted tests on real and simulated time series, and we simulated both the additive and the multiplicative classes in order to test the ability of the established systems to detect changes in the fluctuation of the seasonal component over time.
Based on 80% of the data, the two RNN extensions were able to predict the rest of the series, which was then validated against the remaining 20%. Tests were performed by varying the period p and σ² (the variance of the noise component), and we noted a significant correlation between the input vector size ve and the period p. To ensure relatively efficient learning, we therefore recommend choosing the input vector size ve equal to the signal period p. We have also concluded that noise has an impact on learning performance, as the MAPE increases with the noise component.
Int J Elec & Comp Eng ISSN: 2088-8708  Efficiency of recurrent neural networks for seasonal trended time series modelling (Rida El Abassi)

Figure 3. Plot of the simulated time series (a) through the multiplicative model and (b) through the additive model

Figure 6. Predicted values versus true values on the training data (ve = 12) for AM: (a) the LSTM model results and (b) the Bi-LSTM model results

Figure 8. LSTM model results for total passengers in an international airport data: (a) predicted values versus true values on the training data and (b) prediction of 100 future observations

Table 2. Parameter settings of the studied approaches

Table 3. Simulation result for neural networks