Applying of Double Seasonal ARIMA Model for Electrical Power Demand Forecasting at PT. PLN Gresik Indonesia

Received Jan 11, 2018 Revised Jul 8, 2018 Accepted Jul 29, 2018 The prediction of the use of electric power is very important to maintain a balance between the supply and demand of electric power in the power generation system. Due to a fluctuating of electrical power demand in the electricity load center, an accurate forecasting method is required to maintain the efficiency and reliability of power generation system continuously. Such conditions greatly affect the dynamic stability of power generation systems. The objective of this research is to propose Double Seasonal Autoregressive Integrated Moving Average (DSARIMA) to predict electricity load. Half hourly load data for of three years period at PT. PLN Gresik Indonesia power plant unit are used as case study. The parameters of DSARIMA model are estimated by using least squares method. The result shows that the best model to predict these data is subset DSARIMA with order ([1,2,7,16,18,35,46], 1, [1,3,13,21,27,46])(1,1,1)(0,0,1) with MAPE about 2.06%. Thus, future research could be done by using these predictive results as models of optimal control parameters on the power system side. Keyword:


INTRODUCTION
Prediction of electrical power demand is an important first step in planning of power plant system operation [1]. Planning of power plant system operation is about building a plan for the preparation of power plant system operation for a certain time period. Based on the issues to be addressed, the power plant system operation plan is divided into several types of time period plan, i.e. annual plan, quarterly plan, monthly plan, weekly plan and daily plan. The operation plan of the power plant system also includes managerial, maintenance, and operations in systems and equipment to ensure good economic value of financing, system reliability, and service quality.
Referring to the problems solving in power system operation, power prediction is classified into three categories, i.e. long-term, medium-term and short-term predictions. Long term electrical power predictionsarerequired for peak load capacity planning and system maintenance schedule [2], medium-term predictions are required for planning and operation of the power plant system [3], and short-term predictions are needed for controlling and scheduling the power plant system [4].
One of the concerns in the power plant system operation is the quality of electrical power. The electric power generated shall be always equal to the electric power consumed by the electric power user. If the power that is distributed is greater than required, then there will be wastage of energy. And if the power produced is smaller than required, it will occur over load which will affect the occurrence of power outages. In fact, the use of electrical energy tends to change at any time in accordance with the needs of consumers. Therefore, it is necessary to predict the use of electrical power that is able to maintain a balance between the supply and consumption of electrical power in the power plant system. Fluctuations in loads mean small disturbances in the dynamical stability studies of power generation systems. This problem occurs after the first swing when control equipment such as governor and excitation systems have worked. Dynamic stability studies in plant systems are still being developed [5], [6]. Analysis of small signal stability of power systems with probabilistic uncertainty on renewable energy generation equipment becomes the topic of the current study [7].
The time series prediction model is an accurate choice and growing continuously to this day for power prediction and forecasting [8]- [10]. The study of time series-based forecasting of electrical power consumption evolved into two parts, i.e. prediction models based on statistical mathematical models such as moving average, exponential smoothing, regression, and ARIMA Box-Jenkins; and artificial intelligencebased prediction models such as neural networks, genetic algorithms, simulated annealing, genetic programming, classification, and hybrid. Some time series research based on statistical mathematics as in [8], [11]- [15]. The application of the Box-Jenkins ARIMA model is developed through a seasonal pattern [16]- [22].For models based on artificial intelligence has also become the attention of researchers as in [23]- [27]. The hybrid model has also been developed to obtain the best data in electrical load prediction study as in [28]- [36].
This study proposes a Double Seasonal ARIMA (DSARIMA) method for predicting or forecasting power demand model in PT. PLN Gresik Indonesia based on three years load training and testing data (daily data every half hour). The prediction results are used as reference parameters for optimum control in improving the stability of electrically generated systems in our research. Time series method based on statistical mathematical model is chosen because of its superiority to process data which is not stationary and not linear. Statistical mathematical models are also capable of generating data that is not included in the training process.

ARIMA Model
ARIMA method or commonly referred to as Box-Jenkins method is a model intensively developed by George Box and Gwilyn Jenkins in 1970. This forecasting model still dominates many areas of research to date. Unfortunately, ARIMA model can only be applied for stationary time-series data. If the data is not stationary, then to make the data becomes stationary it is necessary to do the differentiation process [37].The autoregressive (AR) model indicates a connection between a value at the present time ( ) with a value in the previous time ( − ), plus a random value. While the moving average (MA) model shows the dependence of the current time value ( ) with the residual value at the previous time ( − ) with = 1,2, … .
The ARIMA model ( , , ) is a combination of AR( ) and MA( ) models, with d th -order differentiation process when the data pattern is not stationary. General form of the ARIMA model ( , , ) is as follows: where B is the backshift operator and is the random process values, and Generalization of the ARIMA model for data that has a seasonal pattern is expressed by ARIMA( , , )( , , ) and formulated as follows [37]: where s is the seasonal period, and Short-term power consumption data by consumers has a double seasonal pattern, i.e. daily and weekly season. The ARIMA model with a double seasonal pattern is expressed by ARIMA ( , , )( 1 , 1 , 1 ) 1 ( 2 , 2 , 2 ) 2 and has the following general form [38]: where 1 and 2 are different seasonal periods.

ARIMA Box-Jenkins Model Procedure
The prediction procedure of ARIMA Box-Jenkins model through five stages of iteration, as follows: i. Preparation of data, including checking of data stationary. ii. Identification of ARIMA model through autocorrelation function and partial autocorrelation function.
iii. Estimation of ARIMA model parameters: p, d, and q. iv. Determination of ARIMA model equations. v. Prediction.

Least Squares Estimation
One method that can be used to estimate ARIMA model parameters is the least squares method [39]. For AR (1), carried out by including non-zero mean parameter, μ. This parameter is further estimated by least squares. Consider the first-order case where − = ∅( −1 − ) + : The equation is a regression model with −1 as the predictor variable and as the response variable. Least squares estimation then processed by minimizing the sum of squares of differences Since only 1 , 2 , … , are observed, then we can only sum from = 2 to = . Let it This equation is called the conditional sum squared function. According to the basic principle of the least squares method, we can estimate ∅ and by the respective values through parameter values 1 , 2 , … , . Consider the equation or, simplifying and solving for , Now, for large n, Thus, regardless of the value of ∅, equation (7) reduces to Setting this equal to zero and solving for ∅ yields Except for one term missing in the denominator ( − ̅ ) 2 , this is the same as 1 . The lone missing term is negligible for stationary prosses, and thus the least squares and method-of-moments estimators are nearly identical, especially for large samples.
For the general AR(p) process, the methods used to obtain equations (7) and (8) can easily be extended to yield the same result, namely ̂= ̅ . To generalize the estimation ∅'s, we can be considered through a second-order equation. So, we replace by ̅ in the conditional sum-of-squares function, Setting Which we can rewrite as The sum of the lagged products by is very nearly the numerator of 1 . We can divide both sides of the equation by ∑ ( − ̅ ) 2 =3 then, excepts for end effects, which are negligible under the stationary assumption, we obtain With the same approach for differentiation equations ∅ 2 ⁄ = 0, obtained Then the least squares can be done by selecting a value that minimizes.
It is obvious that the least squares problem in equation (17) is nonlinear in the parameter. We will not be able to minimize ( ) by taking a derivative with respect to θ, setting it to zero, and solving. To address these issues, consider evaluating ( ) for a single given value of . Rewrite first-order equation as Using this equation, 1 , 2 , … , can be calculated recursively if we have the initial value 0 . A common approximation is to set 0 = 0.We can obtain For higher-order moving average models, we can compute = ( 1 , 2 , … , ) recursively from With 0 = −1 = ⋯ = − = 0. The sum of squares is minimized jointly in 1 , 2 , … , using a multivariate numerical method. For a general ARMA model: the conditional sum ofsquares can be defined much as in the MA case, we consider = (∅, ) and wish to minimize (∅, ) = ∑ 2 , so To obtain 1 , we now have an additional problem, namely 0 . One approach is to set 0 = 0 or to ̅ if our model contains a nonzero mean. However, a better approach is to begin the recursion at = 2, thus avoiding 0 altogether, and simply minimize, (∅, ) = ∑ 2 =2 . For general ARMA( , ) model, we compute

Measuring Accuracy Level of Prediction Results
Basically, measuring the accuracy of prediction results can be done by various statistical analysis methods; such as the root mean of square error (RMSE), the mean of absolute error (MAE) value and the mean of absolute percentage error (MAPE). In this research, we used MAPE as standard measurement of prediction results accuracy. MAPE is defined as follows [40]: where and ̂ are the actual value and prediction value, while n is the number of prediction values.

Data Set
This study uses electrical power demand in the load center data (taken every half hour) at power plant unit of PT. PLN Gresik Indonesia on January 1, 2009 -December 31, 2011. Where, data from January 1, 2009 -December 24, 2011 is used for forecasting and data from December 25 -31, 2011 is used for testing.

Model Fitting and Identification Parameter
Exploration of electrical power consumption data is done through time series plot on January 1, 2009 -December 24, 2011. The data pattern is very fluctuate as shown in Figure 1. This condition may be influenced by the integrated power distribution system in Java-Madura-Bali interconnection system in Indonesia. From the picture it shown that the data has not been stationary.  Figure 2 shows that the ACF coefficient is significantly different from zero and the PACF coefficient is close to zero after the first lag. Based on these two points, it shows that the average data has not been stationary and there is a seasonal patterned trend. Therefore, it is necessary to process 1 st level differencing ( = 1).

Figure 2. The ACF and PACF of
The plots of ACF and PACF in Figure 3 have through a differentiation process so that the nonseasonal conditions have been stationary in the mean value. Based on the ACF plot it shows that the autocorrelation values of the stationary data go down to zero after the second lag and the third lag, while for the seasonal data it is still not stationary in the mean value. The ACF plot also shows the existence of another seasonal pattern which is a weekly seasonal pattern on lag 336, 672 and so on. The ACF plot of the load data pattern in Figure 5 has been stationary in the mean value after going through a second order differentiation process (with 2 = 1). For non-seasonal patterns based on ACF and PACF plots, the load tends to decrease gradually especially after the first lag so that the non-seasonal model assumption is the ARMA(1,1) model. While the daily seasonal pattern ( 1 = 48) shows that the plot of the ACF data pattern tends to be interrupted after the lag 48 and the PACF pattern indicates the data pattern tends to decrease gradually, then the assumption of the daily seasonal model is MA(1) 48 model.For the weekly seasonal pattern ( 2 = 336) it shows that ACF and PACF plots along lag 336, 672, and so on tend to fall gradually, then the assumption of the weekly seasonal model is MA(1,1) 336 model. Based on the identification of ACF and PACF patterns, the assumption of an appropriate ARIMA model is a double seasonal ARIMA model (1,1,1)(0,1,1) 48 (0,0,1) 336 .However, if white noise is detected in the test data, it is necessary to add or substitute the order of differentiation process. In this study, Statistical Analysis System (SAS) programming tools are used to analyze load data of double seasonal ARIMA models.

DSARIMA Model Parameter Estimation
The AR and MA coefficients in the DSARIMA model are estimated by the least squares method. The initial estimate obtained has been used as the initial value of the iterative estimation method. Through the double seasonal ARIMA model (1,1,1)(0,1,1) 48 (0,0,1) 336 , the initial data of the AR and MA coefficients were obtained as follows: Based on Table 1, to meet the white noise criterion, p-value must be greater than fault tolerance = 5%, with alpha significance level less than 0.0001.In addition, the model has an improvement pattern with 3 MA parameters i.e. MA(1,1), MA(2,1) dan MA (3,1), so that these three parameters should be included in the model estimation. While the residual assumption test that includes the assumption of white noise must meet the criteria of independent and normal distribution (0, 2 ).
with a fault tolerance of 5% then 0 is rejected if p-value < , which means the residual does not meet the white noise assumption. Based on the final result of AR and MA coefficient parameter estimation in Table 2, it can be plotted the residual normal probability to determine whether residual has fulfilled white noise assumption with limit of ± 1.96 √ ⁄ ≈ ±0.009. By iterating the addition of AR and MA parameters, the best iteration value has been obtained which has fulfilled white noise assumption, i.e. double seasonal ARIMA ( [1,2,7,16,18,35,46

Model Testing and Measuring of Forecasting Level Accuracy
For accuracy measurement, testing uses the MAPE procedure. Based on the comparison of prediction load data with actual load data, the average data accuracy of 2,06% is obtained.

Prediction
Based on the final result of parameter estimation in Table 1

CONCLUSION
Statistical analysis based on DSARIMA model is suitable with electrical power characteristics with continuous and fluctuating load patterns. The load changes are always unexpected at any time depending on electrical power demand in the load center. With the statistical analysis model, predictions are able to generate data that is not included in the data training process. Through the best model assumption, the model in this study was able to predict with the average accuracy of MAPE of 2.06%. Further research that can be developed is the pattern of electrical power demand on a large-scale area such as Java-Madura-Bali, Indonesia electricity network interconnection.