A short-term hybrid forecasting model for time series electrical-load data using random forest and bidirectional long short-term memory

Received Apr 17, 2020; Revised Aug 11, 2020; Accepted Aug 28, 2020

In the deregulated electric industry, load forecasting is in greater demand than ever to support applications such as energy generation, pricing decisions, resource procurement, and infrastructure development. This paper presents a hybrid machine learning model for short-term load forecasting (STLF) that combines random forest and bidirectional long short-term memory to obtain the benefits of both methods. In the experimental evaluation, we used a Bangladeshi electricity consumption dataset covering 36 months. The paper provides a comparative study between the proposed hybrid model and state-of-the-art models using performance metrics, loss analysis, and prediction plots. Empirical results demonstrate that the hybrid model performs better than the standard long short-term memory and bidirectional long short-term memory models, producing more accurate forecasts.


INTRODUCTION
Short-term load forecasting (STLF) is a crucial instrument for ensuring the supply of high-quality electricity to customers in a systematic and economical manner. Electricity forecasts serve as the basis of many operating decisions taken by electric utilities, such as resource acquisition, dispatch scheduling of generating capacity, plant maintenance scheduling, reliability analysis, and efficient power plant design [1]. Reliable load forecasting is more necessary than ever, as most energy markets around the world are becoming deregulated [2]. In deregulated electricity markets, energy transactions and the price of electricity depend critically on forecast electric loads. The operational cost may rise markedly as a result of underestimating the energy demand [3].
Load forecasting is a multivariate time-series prediction problem and is generally divided into three categories: short-term, medium-term, and long-term forecasts. STLF serves as the basis for the safe and economical operation of power systems by providing a valuable estimate of the load from the next half hour up to the next two weeks. Earlier, several forecasting approaches such as statistical analysis, time-series analysis, Box-Jenkins [4], and regression analysis were used to produce noteworthy forecasting results. The regression-based approach is one of the earliest and most widely used statistical techniques. In statistical methods such as Holt-Winters exponential smoothing (ES) [5] and auto-regressive integrated moving average (ARIMA) [6], the dominant variables are identified through correlation analysis with the load. As statistical approaches struggled to achieve good results on highly non-linear load data, researchers gradually moved towards machine learning (ML), artificial intelligence (AI), and hybrids of statistical and AI-based learning models [7] for forecasting. In recent years, researchers have intensively explored ML/AI-based algorithms such as random forest (RF) [8], support vector regression (SVR) [9], boosting algorithms [10], fuzzy logic [11], and artificial neural networks (ANN) [12]. The introduction of adaptive pattern recognition and self-organizing techniques for STLF gained significant attention [13] and later developed into an adaptive neural network for short-term forecasting [14]. Moon et al. [15] categorized the load data using a decision tree and later built a hybrid model using RF and a multilayer perceptron. Among all the forecasting methods, ANN received particular recognition, as it enabled advanced prediction systems by feeding patterns of information through the input units.
The recurrent neural network (RNN), although widely used for forecasting, struggles with time-series problems because of vanishing and exploding gradients. Researchers began exploring the long short-term memory (LSTM) network to overcome this challenge and successfully created several load forecasting models based on it. Marino et al. [16] presented an advanced load forecasting methodology using two different LSTM architectures, with promising results. Bouktif et al. [17] proposed an LSTM model using feature selection and a genetic algorithm. However, these deep learning methods can be improved further by extending the body of work to a wider variety of forecasting setups [18,19].
Recently, both RF and LSTM have shown considerable potential in forecasting time-series data. However, RF has a significant drawback: beyond a certain point, increasing the number of samples does not improve its prediction accuracy. In addition, the LSTM tends to overfit, and its execution time grows, as the number of parameters increases. This paper presents a hybrid model that takes advantage of RF's ability to single out the features that are vital for a forecast by isolating the variables of least significance. By omitting the trivial features that do not help the forecast, the hybrid model reduces the number of parameters and makes the bi-LSTM model simpler and faster to fit and predict.

RESEARCH METHOD
This section explains the details of the proposed model, including the data analysis and the simulation of the proposed hybrid model. Figure 1 presents the overall architecture of the proposed hybrid model. In the first layer of the model, the RF is used to identify the prediction strength of the features in the electricity consumption dataset. To locate the dominant variables, we measure the total decrease in node impurity using the mean decrease impurity (MDI) method. MDI counts the times a feature is used to split a node, weighted by the number of samples it splits, across all decision trees of the ensemble. In the second layer, we use the bi-LSTM architecture. A memory block in the LSTM consists of four parts: an input gate $i_t$, a forget gate $f_t$, an output gate $o_t$, and self-connected memory cells $c_t$.

Figure 1. Causal structure of the proposed RF-bi-LSTM hybrid model
In (1)-(5), the variables have the following meanings: $f_t$ = forget gate, $i_t$ = input gate, $o_t$ = output gate, $c'_t$ = intermediate cell state, and $c_t$ = cell state. These are the equations of the gates and cell states of the LSTM.
The LSTM has the distinctive trait of preserving information that previously passed through it by means of its hidden units [20,21]. In (6), $h_t$ represents the hidden state of the LSTM architecture. The bi-LSTM model consists of two separate LSTM networks, one of which accesses information in the forward direction and the other in the reverse direction [22]. This allows the model to store knowledge from both past and future inputs at any point in time using two combined hidden states, and to generate accurate output from the context of both the past and the future data. Lastly, to generate the forecast, we feed the dominant features identified in the previous layer as the input to the bi-LSTM layer.
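For reference, the gate, cell-state, and hidden-state relations referred to as (1)-(6) can be written in the standard LSTM form below. This is a sketch of the textbook formulation, assumed here to match the paper's equations; the input $x_t$, the weight matrices $W$ and $U$, the biases $b$, the logistic sigmoid $\sigma$, and the element-wise product $\odot$ are notational assumptions.

```latex
\begin{align}
f_t  &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t  &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t  &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
c'_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t  &= f_t \odot c_{t-1} + i_t \odot c'_t \\
h_t  &= o_t \odot \tanh\left(c_t\right)
\end{align}
```

In a bidirectional LSTM, one network produces a forward hidden state $\overrightarrow{h}_t$ and the other a backward hidden state $\overleftarrow{h}_t$; a common choice, assumed here, is to concatenate them as $[\overrightarrow{h}_t ; \overleftarrow{h}_t]$ before the output layer.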

Dataset generation process
To evaluate the performance of the proposed model, we used a dataset of Bangladeshi electricity consumption data available at https://www.bpdb.gov.bd. To prepare the dataset, we used Algorithm 1, which converts the vital information of the historical electrical load data from PDF files to CSV format.

Algorithm 1: Data Extraction and Dataset Generation
Input: Start and end date of the PDF reports
Output: A CSV file with extracted data from all the PDF files between start and end date
for each date between the start date and end date do
    download the PDF of this date and store into a specific folder
for each PDF report from the downloaded reports do
    extracted_row ← set of extracted requisite data using regular expressions
    append the extracted_row into extracted_data
save extracted_data as a CSV file

The automated tool that generates the historical electricity consumption data consists of two subunits. The first is a data extractor program that uses regular expressions (REs) to parse the PDF files and extract specific information. The second is a CSV loader program that transfers the extracted data, appended to a list, into a CSV file. The extracted data cover the period from June 1, 2015, to June 30, 2018; the resulting 1126 rows hold the essential information from the daily electricity consumption reports of the 36 months mentioned.
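The two subunits described above can be sketched as follows. This is a minimal illustration rather than the authors' tool: it assumes the text of each PDF report has already been extracted into a string, and the field labels, regular expressions, and sample reports are hypothetical, since the actual report layout is not given here.

```python
import csv
import io
import re

# Hypothetical report layout: these labels and patterns are illustrative
# assumptions, not the actual format of the BPDB daily reports.
FIELD_PATTERNS = {
    "max_demand_gen": re.compile(r"Maximum Demand\s*:\s*([\d.]+)\s*MW"),
    "highest_gen": re.compile(r"Highest Generation\s*:\s*([\d.]+)\s*MW"),
    "min_gen": re.compile(r"Minimum Generation\s*:\s*([\d.]+)\s*MW"),
}


def extract_row(report_date, report_text):
    """Data extractor: pull the requested fields out of one report's text."""
    row = {"date": report_date}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(report_text)
        # A field that is absent from the report is stored as None.
        row[name] = float(match.group(1)) if match else None
    return row


def rows_to_csv(rows):
    """CSV loader: serialise the list of extracted rows as CSV text."""
    fieldnames = ["date", *FIELD_PATTERNS]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty cells
    return buffer.getvalue()


# Made-up report texts standing in for text extracted from the PDFs.
sample_reports = {
    "2015-06-01": "Maximum Demand : 7500 MW\n"
                  "Highest Generation : 7350.5 MW\n"
                  "Minimum Generation : 4800 MW",
    "2015-06-02": "Maximum Demand : 7600 MW\n"
                  "Highest Generation : 7400 MW",
}

extracted_data = [extract_row(d, text) for d, text in sample_reports.items()]
csv_text = rows_to_csv(extracted_data)
```

Keeping the patterns in a single dictionary means a new field can be added without touching the extraction loop, and a missing field yields an empty CSV cell rather than an exception.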

Preliminary data analysis
To understand the characteristics of the time series, we analyzed the data for the effects of properties such as seasonality, auto-correlation, and stationarity. Figure 2 shows the decomposition of the energy consumption data, which reflects irregular trends and seasonality in our multivariate time-series data. We observed that electricity consumption has steadily increased over the years owing to stable economic growth as well as the rapid increase in population [23]. The seasons in Bangladesh fall mainly into three categories: summer, rainy, and winter. Power consumption naturally tends to be lower during the winter season, which corresponds to the first and fourth quarters of the year. During the summer and rainy seasons, in the second and third quarters, electricity consumption rises steeply over the years. The rate declines drastically again as the winter season approaches. These observations indicate the presence of repetitive short-term seasonal cycles throughout our dataset.
Forecasts based on non-stationary time-series data may lead to spurious conclusions by suggesting false relationships among variables. A time series is called stationary when its finite-dimensional distributions are invariant under shifts of the time argument [24]. We employed two tests, summary statistics and the augmented Dickey-Fuller (ADF) test, to check the stationarity of the time-series data. In the summary statistics test, we split our data into two partitions and compared the mean and variance of the two segments. As shown in Table 1, the mean and variance are not uniform over time, indicating that our dataset is non-stationary. For the ADF test, we assessed the result using the p-value. From Table 1, it is apparent that the series is non-stationary: the p-value exceeds the 5% significance level and the ADF statistic is greater than the corresponding critical value, so we fail to reject the null hypothesis that a unit root exists. From the preliminary data analysis, we conclude that the changing trend, seasonality, and non-stationarity of our data are complex traits that can produce unstable, misleading, and therefore unreliable forecast results.
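The summary statistics check described above can be sketched as follows. The series is synthetic and purely illustrative, the function names are assumptions, and the ADF test itself is not implemented here. The sketch also shows first-order differencing, a common way to remove a trend, to illustrate how the two halves become comparable afterwards.

```python
from statistics import mean, pvariance


def split_half_summary(series):
    """Return (mean, variance) of the first and of the second half of a series."""
    middle = len(series) // 2
    first, second = series[:middle], series[middle:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))


def difference(series, lag=1):
    """First-order differencing: y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic, steadily rising series (illustrative values, not the paper's data).
load = [100 + 2 * t for t in range(20)]

# The means of the two halves differ, which hints at non-stationarity.
(m1, v1), (m2, v2) = split_half_summary(load)

# After differencing, both halves have the same mean and variance.
(dm1, dv1), (dm2, dv2) = split_half_summary(difference(load))
```

For real load data the two halves will rarely match exactly, so the comparison is a rough diagnostic that is best read alongside a formal unit-root test.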

Proposed hybrid model simulation
The proposed RF-bi-LSTM model consists of two layers. To simulate the first layer of the hybrid model, we constructed an RF forecasting model that finds the variables with the highest importance for the forecast. The initial input to the RF model is the complete set of features, comprising the time-related and electricity-generation-related factors listed in Table 2; Table 3 describes the quantitative analysis of the power consumption dataset. We had to convert our input into a stationary dataset using differencing. Table 4 shows the parameter values used to train the deep learning models. Through extensive testing, we empirically chose the hyperparameter values that gave the most stable model performance. We first fit the model with default parameters to obtain a performance baseline, and then settled on the best hyperparameter values by increasing or decreasing them.

Table 2. Features of the power consumption dataset

| No. | Feature | Description |
|---|---|---|
| 1 | … | An integer value between 0 and 6 |
| 2 | day_of_month | An integer value between 1 and 31 |
| 3 | month | An integer value between 1 and 12 |
| 4 | season | An integer value between 0 and 2 |
| 5 | is_weekend | An integer value between 0 and 1 |
| 6 | max_demand_gen | Maximum demand of power at generation end (in MW) |
| 7 | highest_gen | The highest amount of power generated at a specific hour in a day (in MW) |
| 8 | min_gen | The lowest amount of power generated at a specific hour in a day (in MW) |
| 9 | day_peak_gen | Total power generated at peak hour during daytime (in MW) |
| 10 | eve_peak_gen | Total power generated at peak hour during night time (in MW) |
| 11 | eve_peak_load_shedding | The amount of load-shedding at peak hour during night time (in MW) |
| 12 | max_temp | Maximum temperature of the particular day (°C) |
| 13 | total_gas | The amount of gas used to produce electricity in a day (in MMCFD) |
| 14 | total_energy | The total amount of energy consumed on a particular day (generated locally + imported from India) (in MKWh) |

To identify the features with high importance, we estimated the decrease in variance impurity and considered the mean decrease in accuracy over all trees of the ensemble. This importance measure can also be broken down by outcome class, and the important features retrieved from the model are plotted in sorted order of their values. The importance is a measure of by how much removing a variable decreases impurity, and vice versa. If a variable has almost no predictive power, permuting its values may lead to a slight increase in accuracy because of random noise. This, in turn, can give rise to negative importance scores, which can be regarded as equivalent to zero importance.
We decided on the conditions based on impurity, which in the case of regression trees is variance. If we examine the decision tree depicted in Figure 3 to understand the internal mechanism of the RF model, we observe that only a few variables assist in the forecast prediction. The tree in Figure 3 shows the underlying mechanism of the random forest to reveal how each feature adds to the end value.
We learn that the value of the prediction changes along the prediction path within the decision tree, together with the information about which features caused each split. The presented tree also shows that a leaf node with a higher value moves to the right branch of the decision tree. As a result, we find that several features, such as the maximum temperature and the total amount of gas, are of little use for forecasting.
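The variance impurity criterion discussed above can be illustrated for a single split. This is a minimal sketch under stated assumptions: the data are made up, the function name is hypothetical, and a full MDI score would accumulate such decreases, weighted by the share of samples reaching each node, over every node and every tree of the forest.

```python
from statistics import pvariance


def variance_reduction(feature, target, threshold):
    """Weighted decrease in variance impurity when splitting on feature <= threshold."""
    left = [y for x, y in zip(feature, target) if x <= threshold]
    right = [y for x, y in zip(feature, target) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    n = len(target)
    child_impurity = (len(left) / n) * pvariance(left) \
        + (len(right) / n) * pvariance(right)
    return pvariance(target) - child_impurity


# Toy data (made up): the target follows highest_gen but not max_temp.
total_energy = [10.0, 10.0, 20.0, 20.0]
highest_gen = [1.0, 2.0, 3.0, 4.0]
max_temp = [30.0, 20.0, 30.0, 20.0]

gain_gen = variance_reduction(highest_gen, total_energy, threshold=2.5)
gain_temp = variance_reduction(max_temp, total_energy, threshold=25.0)
```

In this toy case the split on highest_gen removes all of the target variance, while the split on max_temp removes none, which is the kind of contrast that separates dominant features from trivial ones.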
In the second layer of the hybrid model, we simulated the bidirectional LSTM model by taking only the identified essential features from the first layer as input. These five features are the minimum generation (generation end), day-peak generation (generation end), highest generation (generation end), max. demand (generation end), and evening peak generation (generation end). We predict the total_energy (Generation + India Import) variable among the fourteen properties in Table 2 for power demand forecasting. Total_energy corresponds to the inclusive power consumption on a particular day.
Before training the model, we scaled the data using the RobustScaler, which relies on the interquartile range and is therefore robust to outliers. Table 4 shows the parameters used to initialize the LSTM model for forecasting. We chose the hyperparameter values by trial and error to identify those that best suit our model. During training, the data pass through the LSTM, dropout, and dense layers [25], and the fitted model then forecasts the electricity load on unseen data.

Table 4. Parameter values used to train the deep learning models
Models: LSTM, Bi-LSTM, RF-bi-LSTM
Parameters: activation = relu, weight optimization = adam, batch size = 512, number of epochs = 700, validation split = 0.1, learning rate = 0.01

Figure 3. A random tree up to three levels
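The scaling step can be sketched in plain Python. By default, scikit-learn's RobustScaler subtracts the median and divides by the interquartile range; the functions below reproduce that idea for a single column and are an illustrative re-implementation, not the library code. The sample values are made up.

```python
from statistics import median


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of pre-sorted data."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def robust_scale(values):
    """Centre on the median and scale by the interquartile range (IQR)."""
    ordered = sorted(values)
    center = median(ordered)
    iqr = percentile(ordered, 75) - percentile(ordered, 25)
    if iqr == 0:
        iqr = 1.0  # guard against division by zero for constant columns
    return [(v - center) / iqr for v in values]


# Made-up column with one extreme value.
daily_load = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = robust_scale(daily_load)
```

Because the median and the quartiles ignore the extreme value, the four typical points keep a sensible spread after scaling, whereas min-max scaling would squeeze them close together.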

RESULTS AND DISCUSSIONS
This section discusses the experimental results with a detailed performance analysis of the proposed RF-bi-LSTM hybrid model.

Performance comparison and result analysis
To assess the performance of the proposed model, we conducted a comparative analysis of our hybrid model against the standard LSTM and bi-LSTM models. We adopted cross-validation (CV) for performance evaluation by splitting our data into training and test sets. In the experimental evaluation, we use a total of 13 features to train the LSTM and the bi-LSTM models. We then train the proposed RF-bi-LSTM hybrid model using only the five most dominant features, those with positive values of prediction strength. In our proposed model, a window of 10 days is the input to the bi-LSTM layer, which predicts the electricity consumption of the next day. As performance metrics, we use the mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).
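The 10-day input window and the four error metrics can be sketched as follows. This is a minimal illustration with made-up numbers; the function names are assumptions, and the metrics use their usual textbook definitions.

```python
from math import sqrt


def make_windows(rows, targets, window=10):
    """Pair each run of `window` consecutive days with the next day's target."""
    inputs, outputs = [], []
    for start in range(len(rows) - window):
        inputs.append(rows[start:start + window])
        outputs.append(targets[start + window])
    return inputs, outputs


def error_metrics(actual, predicted):
    """MSE, RMSE, MAE and MAPE (in percent); MAPE needs non-zero actual values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": sqrt(mse), "MAE": mae, "MAPE": mape}


# Toy daily data: one feature per day, 15 days in total.
features = [[float(day)] for day in range(15)]
energy = [float(day) for day in range(15)]

# Each sample holds days t .. t+9 and is paired with the target of day t+10.
X, y = make_windows(features, energy, window=10)

metrics = error_metrics(actual=[100.0, 200.0], predicted=[110.0, 180.0])
```

With 15 days of data and a window of 10, the sketch yields five training samples, because the first ten days can only serve as input.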
Table 5 summarizes the experimental results in terms of these performance metrics. It shows that the proposed RF-bi-LSTM model forecasts the electrical load more accurately than the standard LSTM and bi-LSTM models trained with all 13 features. From the error metric values in Table 5, we can establish that the proposed method has a competitive advantage over other deep learning methods for power consumption prediction. To illustrate the comparison better, we visualized the predictions. In Figure 4, we see a distinct difference between the forecasts of the standard bi-LSTM and those of the proposed hybrid model. From Figure 4 (b), it is apparent that our model exhibits a smaller error at most data points. Figure 4 also shows that the hybrid model predicts the peak points in the energy consumption data more precisely than the bi-LSTM model. The time-series decomposition in Figure 2 shows that our data exhibit a changing trend, seasonality, and non-stationarity, which make electricity consumption quite challenging to forecast, given that it is a multivariate time series. Figure 4 (b) reflects that our proposed model has modelled these complex traits effectively and outperforms the other deep learning models.

Learning loss analysis of proposed RF-bi-LSTM model
When we trained the LSTM and the bi-LSTM models with all features of the power consumption dataset, we found that both models tend to overfit as they become more specialized to the training data. As a consequence of overfitting, the validation loss begins to rise after a certain point while the training loss continues to fall as the models gain experience. Furthermore, the models fail to predict accurately new observations that are not part of the original training set, and the generalization error rises.
In contrast, our proposed model, trained with only the significant features, strikes a balance between overfitting and underfitting, resulting in a good fit. Figure 5 shows the learning loss graph of our proposed RF-bi-LSTM model. The loss graph shows that our model delivers a good fit: the training and validation losses both drop to a point of stability, with a modest generalization gap between the two curves. The model exhibits stable learning characteristics, which indicates its ability to produce better forecasts on unseen data. From the comparison of the learning losses of the different models, it is apparent that our proposed model is more efficient than the other state-of-the-art models.
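The distinction between an overfit and a good fit drawn above can be expressed as a simple check on the two loss curves. This is a rough heuristic sketch, not the procedure used in the paper; the function names are assumptions and the loss values are made up.

```python
def best_epoch(val_loss):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_loss)), key=val_loss.__getitem__)


def shows_overfitting(train_loss, val_loss, tolerance=0.0):
    """True if validation loss rose after its minimum while training loss kept falling."""
    best = best_epoch(val_loss)
    val_rose = val_loss[-1] > val_loss[best] + tolerance
    train_fell = train_loss[-1] < train_loss[best]
    return val_rose and train_fell


# Made-up loss curves, one value per epoch.
overfit_train = [1.0, 0.6, 0.4, 0.3, 0.2]
overfit_val = [1.1, 0.8, 0.7, 0.9, 1.2]
good_train = [1.0, 0.6, 0.4, 0.35, 0.34]
good_val = [1.1, 0.7, 0.5, 0.45, 0.44]

overfit_flag = shows_overfitting(overfit_train, overfit_val)
good_flag = shows_overfitting(good_train, good_val)

# Generalization gap at the final epoch of the well-fitted run.
final_gap = good_val[-1] - good_train[-1]
```

A small positive tolerance can be passed to ignore minor fluctuations of the validation loss around its minimum.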

CONCLUSION
This paper presents a hybrid model for short-term load forecasting using random forest and bidirectional long short-term memory. To validate the model, we use a three-year electric power consumption dataset. To prepare the dataset, we collected daily PDF reports from the official website of BPDB and converted them into CSV format. Among the 13 features of the dataset, we identified the five most prominent ones. In the experimental evaluation, using these five dominant features, the performance of the proposed model is compared with that of state-of-the-art models in terms of MSE, RMSE, MAE, and MAPE. Experimental results demonstrate that the proposed hybrid model outperforms the compared models, yielding lower error values.