Short-term load forecasting with using multiple linear regression

Received Jun 25, 2019; Revised Feb 25, 2020; Accepted Mar 3, 2020

In this paper, short-term load forecasting (STLF) is performed using multiple linear regression (MLR) to obtain a day-ahead load forecast. The regression coefficients are found by least squares estimation. Load in an electrical power system depends on temperature, dew point, and season, and is also correlated with previous load consumption (historical data). The input variables are therefore temperature, dew point, hour, load of the prior day, and load of the prior week. To validate the model, the mean absolute percentage error (MAPE) and R-squared are checked, as shown in the results section. A weekly forecast is also obtained from the day-ahead forecasts.


INTRODUCTION
In today's world, continuity of electrical supply is necessary, because electricity is an essential need in daily life. Electrical utilities try to provide a continuous power supply to their customers, and load forecasting is an important part of ensuring this. Load forecasting estimates future consumption from the available data and from consumer behavior, and it plays an effective role in the economic operation of power utilities. Short-term load forecasting refers to forecasting the load an hour, a day, or a week ahead. Many tools and techniques are available for load forecasting, such as simple time series, regression, and neural networks. Load forecasting means forecasting the average load in kW or the total energy in kWh over blocks of 15 minutes, 30 minutes, 1 hour, a day, a week, a month, or a year, for daily, weekly, monthly, or yearly forecasts.
Many factors [1] influence the accuracy of load forecasting, such as weather variables, holidays, festivals or events, tariff structures, the available historical data, the time of the year, the day of the week, and the hour of the day. Weather variables include temperature, humidity, rain, and wind. Temperature and humidity have a considerable effect on power consumption: air conditioners are turned on as the temperature rises and heaters are turned on when the temperature is low, and both increase electricity demand. During a festival, electricity demand rises due to lighting.
The data used for STLF are mostly historical hourly load and weather parameters. Hourly load data are subject to errors made by instruments and/or operators, and to unwanted or unplanned changes in the power system. In [2], five different cases of days are considered, and the absolute normalized residual method is used to identify and modify improper data. Good data quality is essential in a power system, so data must be cleaned and filtered before the operator takes any decision from it; otherwise hazardous conditions may result, because poor-quality data affects decision making without the operator's knowledge. Data smoothing is necessary for some applications, for example load forecasting in power systems. The techniques used for this are statistical, such as OWA (optimally weighted average) and MA (moving average). In [3], data smoothing and filtering are applied to smart meter data with a 15-minute measurement interval from the DONWOD region of New York.
In [4], the short-term load forecast does not separately account for the contribution of agricultural load, which is season dependent and is therefore assumed to be constant over a span of a few days. A triangular membership function is used, whose support is decided on the basis of the collected data. The relationship of load with the weather variables temperature and humidity is shown. These parameters are modeled in the fuzzy domain to obtain a realistic value of the forecasted load when compared with the actual load. In [5], an ANN (artificial neural network) technique is used for STLF. Three models are considered: (1) prediction of the next-hour load, (2) prediction of the next-day load profile, and (3) prediction of the next-day peak load. The ANN models use the load profile and weather situation as the input layer and produce the forecasted next-hour load, next-day load profile, and next-day peak load. The mean absolute error is used to evaluate the forecasting performance.
In [6], multiple linear regression analysis is used to forecast damping in the Nordic power system. A static MLR model is developed to explain the variability of the damping of the 0.35-Hz inter-area mode in the Nordic system. In [7], five short-term load forecasting techniques are discussed: (i) multiple linear regression, (ii) stochastic time series, (iii) general exponential smoothing, (iv) state space and Kalman filter, and (v) a knowledge-based approach. The algorithms and necessary equations for these techniques are given. Other parameters, methods of short-term load forecasting, controlling factors, and related topics are discussed in [8]-[18]. In comparison to [19]-[29], the proposed method computes the short-term load forecast using multiple linear regression and evaluates the random error (about 5%), which reflects the difference between the observed values and the fitted model. The trained model is therefore about 95% accurate. The effect of the explanatory variables is also included.
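As a rough illustration of the moving-average (MA) smoothing mentioned above, the following Python sketch applies a centered moving average to a short series. It is not the procedure of [3]: the function name, the window length, and the sample readings are assumptions chosen only for illustration.

```python
def moving_average(values, window=3):
    """Centered moving average; near the edges the window is truncated."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Hypothetical 15-minute load readings (MW) with a spike at index 3.
raw_load = [100.0, 102.0, 101.0, 160.0, 103.0, 104.0]
smooth_load = moving_average(raw_load, window=3)
```

A wider window removes more of the spike but also flattens genuine peaks, so the window length is a trade-off rather than a fixed choice.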

RESEARCH METHOD
The method used for short-term load forecasting is multiple linear regression. Regression analysis describes the relationship between load and the variables affecting load consumption, such as weather variables, seasonal effects, and previous load consumption. The method is based on a statistical approach: it estimates the unknown parameters from a given data set (the available historical data). The calculation of the unknown parameters can be explained with the simple linear regression model:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \dots, n \tag{1}$$

Here, $y$ is the electrical load, $\beta_0$ and $\beta_1$ are the parameters (regression coefficients), $x$ is the explanatory variable, and $\varepsilon$ is the random error. In this model, the relationship between load $y$ and variable $x$ is expressed through the parameters. If the values of the parameters $\beta$ are known, the load $y$ can be obtained for any given value of $x$.
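For the simple model above, the least squares estimates have a closed form: the slope is the ratio of the sample covariance of x and y to the sample variance of x, and the intercept follows from the means. The sketch below is a minimal Python illustration. The function name and the temperature and load values are hypothetical, and the data are constructed to lie exactly on the line y = 50 + 2x so that the result is easy to check.

```python
def fit_simple_regression(x, y):
    """Least squares estimates (beta0, beta1) for y = beta0 + beta1 * x + error."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    beta1 = s_xy / s_xx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Hypothetical data: temperature (deg C) and load (MW) on the line y = 50 + 2x.
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
load = [70.0, 80.0, 90.0, 100.0, 110.0]

beta0, beta1 = fit_simple_regression(temperature, load)
predicted = beta0 + beta1 * 22.0  # load predicted at 22 deg C
```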
ISSN: 2088-8708  Short-term load forecasting with using multiple linear regression (Bhatti Dhaval)

$\varepsilon$ is the random error, which reflects the difference between the observed values and the fitted model. The model above has only one explanatory variable; when there is more than one explanatory variable, the multiple linear regression model is used. In multiple linear regression, more than one factor is considered to affect the response (load):

$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, 2, \dots, n \tag{2}$$

This representation can be written in matrix form, with the dimensions listed below:

$$Y = X\beta + \varepsilon \tag{3}$$

$Y$: $n \times 1$ vector of observations on the study (response) variable
$X$: $n \times k$ matrix of $n$ observations on each of the $k$ independent variables
$\beta = (\beta_1, \beta_2, \dots, \beta_k)^T$: $k \times 1$ vector of regression coefficients associated with $x_1, x_2, \dots, x_k$
$\varepsilon = (\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n)^T$: $n \times 1$ vector of random errors

The intercept term is included by taking the first column of $X$ to be $(1, 1, \dots, 1)^T$. The model assumes:
- $\operatorname{rank}(X) = k$, that is, $X$ has full column rank
- $\varepsilon \sim N(0, \sigma^2 I_n)$

If the vector of regression coefficients is known, the load can be predicted for given observations. The regression coefficients are found by the method of least squares, which is based on the principle of maxima and minima: here, the sum of squared errors is minimized.
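From (3), the sum of squared errors can be written as

$$S(\beta) = \varepsilon^T \varepsilon = (Y - X\beta)^T (Y - X\beta)$$

$S(\beta)$ is a real-valued, convex, differentiable function, so a minimum always exists. According to the principle of minima,

$$\frac{\partial S(\beta)}{\partial \beta} = -2X^T Y + 2X^T X\beta = 0 \quad\Rightarrow\quad \hat{\beta} = (X^T X)^{-1} X^T Y$$

So the values of the parameters can be found from this equation. The variance of the residuals also needs to be considered:

$$\operatorname{Var}(\varepsilon_i) = \sigma^2, \qquad \hat{\sigma}^2 = \frac{SS_{res}}{n - k} = MS_{res}$$

where $SS_{res}$ is the sum of squared errors due to residuals and $MS_{res}$ is the mean squared error. Once the values of the parameters are known, the fitted model can be obtained:

$$\hat{Y} = X\hat{\beta} = HY, \qquad H = X(X^T X)^{-1} X^T$$

The residuals $e$ are obtained from the difference between the observed and fitted values:

$$e = Y - \hat{Y} = (I - H)Y$$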
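The least squares solution described above can be sketched in plain Python by forming X^T X and X^T Y and solving the resulting linear system, rather than inverting X^T X explicitly. This is an illustrative sketch, not the MATLAB implementation used in this work. All function names and the small data set are assumptions, and the data are noise-free (Y = 10 + 2 x1 + 0.5 x2), so the recovered coefficients can be checked by hand.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    b_t = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in b_t] for row in a]


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: X lacks full column rank")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_mlr(X, y):
    """Return beta_hat solving the normal equations (X^T X) beta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


# Hypothetical design matrix: first column of ones is the intercept term.
X = [[1, 1, 2], [1, 2, 1], [1, 3, 5], [1, 4, 3], [1, 5, 8]]
y = [13.0, 14.5, 18.5, 19.5, 24.0]  # y = 10 + 2*x1 + 0.5*x2, no noise

beta_hat = fit_mlr(X, y)
fitted = [sum(b * x for b, x in zip(beta_hat, row)) for row in X]
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sigma2_hat = ss_res / (len(y) - len(beta_hat))  # SS_res / (n - k)
```

Solving the system directly avoids forming the inverse of X^T X; the singularity check corresponds to the full column rank assumption on X.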
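The total deviation and the sum of squared residuals are given as

$$y_i - \bar{y} = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y}), \qquad SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $\bar{y}$ is the average value and $\hat{y}_i$ is the estimated value.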
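Squaring and summing the total deviation gives the total sum of squares. For a least squares fit with an intercept, the cross term vanishes, so the total sum of squares splits into a regression part and a residual part. R-squared is then one minus the ratio of the residual to the total sum of squares. The Python sketch below computes R-squared and adjusted R-squared under that definition. The function name and the observed and fitted values are hypothetical, and k counts the intercept column, matching the n - k convention used above.

```python
def r_squared(y, y_hat, k):
    """Return (R^2, adjusted R^2); k is the number of coefficients incl. intercept."""
    n = len(y)
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return r2, adj_r2


# Hypothetical observed and fitted loads.
observed = [3.0, 5.0, 7.0, 9.0]
fitted_values = [3.5, 4.5, 7.5, 8.5]

r2, adj_r2 = r_squared(observed, fitted_values, k=2)
```

When n is much larger than k, the factor (n - 1)/(n - k) is close to one, so R-squared and adjusted R-squared nearly coincide.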

RESULTS AND ANALYSIS
Nine years of historical data for New York City, from May 2007 to April 2016, are used to train the model. The data have a measurement interval of one hour, so there are more than seventy-eight thousand observations. To account for the effect of weather variables, dry-bulb temperature and dew point are considered. A large data set is used because a model built on more data generally predicts more accurately. The raw data contain bad data such as missing values, measurement errors, and sudden spikes, whereas load forecasting needs clean, fine-grained data. The data therefore need to be cleaned or filtered before they are used to make predictions. This was done in our previous work [3], and the filtered data are used here for load forecasting.
To forecast the load, a regression model first needs to be built from the available information, which is the previous years' load consumption and temperature data. The software used to train the model and produce the forecast is MATLAB 2018b. To build an accurate model, useful predictors are needed. A common technique with temporal predictors is to break them into separate parts so that they can be varied independently of each other. Three such predictors are created: the hour, the weekday, and whether the day is a weekend or not. The load data itself can also be used as a predictor, which can be verified from its autocorrelation. The autocorrelation of the load peaks at lags of 24 and 168 hours, so lagged predictors of 1 and 7 days can be used. Two predictors are therefore created: the load of the prior day and the load of the prior week. The response variable is the load, since load is to be forecasted, and the explanatory variables are the hour, the day of the week, temperature, dew point, whether it is a weekend or not, the load of the prior day, and the load of the prior week.
As shown in the research method section, the model is trained using multiple linear regression and least squares estimation. Once the model is trained, its performance is checked on data it has not seen: the data for the year 2012 are removed from the data set, and the load forecast for 2012 is obtained from the trained model. The mean absolute percentage error (MAPE) is used to check the effectiveness of the forecasting method.
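The temporal and lagged predictors described above can be sketched in Python as follows. This is not the MATLAB code used in this work. The function name, the dictionary keys, and the synthetic load series are assumptions, and temperature and dew point are omitted to keep the example short.

```python
from datetime import datetime, timedelta


def build_predictors(timestamps, load):
    """One row per hour that has both a prior-day and a prior-week load value."""
    rows = []
    for t in range(168, len(load)):
        ts = timestamps[t]
        rows.append({
            "hour": ts.hour,
            "weekday": ts.weekday(),            # Monday = 0 ... Sunday = 6
            "is_weekend": ts.weekday() >= 5,
            "prior_day_load": load[t - 24],     # lag of 24 hours
            "prior_week_load": load[t - 168],   # lag of 168 hours
            "load": load[t],                    # response variable
        })
    return rows


# Synthetic hourly series covering 8 days (hypothetical values in MW).
start = datetime(2007, 5, 1, 0)
timestamps = [start + timedelta(hours=h) for h in range(8 * 24)]
load = [5000.0 + h for h in range(8 * 24)]

rows = build_predictors(timestamps, load)
```

The first 168 hours produce no rows because the prior-week load is not yet available for them.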
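MAPE is the mean of the absolute forecast errors, each expressed as a percentage of the actual load. A minimal Python sketch is given below. The function name and the sample values are hypothetical, and the actual load is assumed to be non-zero at every hour.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical hourly loads in MW.
actual_load = [5000.0, 5200.0, 4800.0, 5100.0]
forecast_load = [4750.0, 5460.0, 4800.0, 5355.0]

error_pct = mape(actual_load, forecast_load)
```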
With weather data, the MAPE is 5.11% for the training data and 5.15% for the test data, that is, data the model had not seen before. Without weather data, the MAPE is 5.05% for the training data and 5.06% for the test data. The trained model is therefore nearly 95% accurate in predicting the day-ahead load. The MAPE decreases when weather data are excluded because the day-ahead load is not strongly dependent on the weather variables, so removing temperature and dew point as explanatory variables lowers the MAPE slightly.

For further validation of the model, $R^2$ can be checked. It is 0.89, and the adjusted $R^2$ is also 0.89, which means the model explains nearly 90% of the variation in load. The t-statistics and p-values are shown in Table 1. A variable is acceptable if its t-statistic is greater than 2 or less than -2. In the table, all t-statistics are greater than 2 or less than -2, meaning that all the variables are statistically significant. The p-value tests the null hypothesis that the population parameter is equal to zero; a p-value below 0.1 indicates a significant regressor. As shown in the table, most p-values are zero and the remaining one is also below 0.1, so all the regressors are significant. The trained model is therefore good and can be used to make predictions.

The day-ahead load forecast for 25th June is obtained on an hourly basis using the trained model, as given in Table 2. With the help of the day-ahead forecast, the week-ahead load forecast is also obtained: the day-ahead forecast is treated as actual load data and used to forecast the next day, and this process is repeated for the whole week. The week-ahead forecast is shown in Figure 1, which displays the days from 25th June to 1st July on the x-axis and the load, from 1000 to 8000 MW, on the y-axis.
The weekly forecast cannot be obtained directly because the forecast model depends on the data of the previous day. So the day-ahead load forecast is obtained first and used as data; the process is then repeated to obtain the forecast for each following day (Figure 1).
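The recursive procedure described above can be sketched in Python as follows. Each forecasted hour is appended to the series so that it can serve as the prior-day load for the next day. This is only an illustration: `predict_hour` is a hypothetical stand-in for the trained regression model, its coefficients are invented, weather inputs are omitted, and the history is assumed to start at hour 0 of a day.

```python
def forecast_week(history, predict_hour, days=7):
    """Recursive forecast: each predicted hour is fed back as input for later hours."""
    if len(history) < 168:
        raise ValueError("need at least one week (168 hours) of history")
    series = list(history)
    forecasts = []
    for _ in range(days * 24):
        t = len(series)
        value = predict_hour(series[t - 24], series[t - 168], t % 24)
        series.append(value)       # treat the forecast as actual load data
        forecasts.append(value)
    return forecasts


def predict_hour(prior_day, prior_week, hour):
    # Stand-in model with invented coefficients; the hour is unused here.
    return 0.6 * prior_day + 0.4 * prior_week


# Hypothetical history: one week of hourly load with a repeating daily ramp (MW).
history = [4000.0 + 100.0 * (h % 24) for h in range(168)]
week_forecast = forecast_week(history, predict_hour)
```

From the second forecast day onward the prior-day input is itself a forecast, so errors can accumulate over the week.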

CONCLUSION
A day-ahead short-term load forecast is obtained using multiple linear regression. Multiple linear regression is used because load depends on many variables, such as historical load and weather variables. A multiple linear regression model for short-term load forecasting is easy to develop, and MATLAB is used to develop the model. For the day-ahead load forecast, temperature and dew point have little effect on load, since the mean absolute percentage error with and without the weather variables is nearly the same. The model is about 95% accurate in prediction. Day-ahead load forecasting is thus done with the developed model, and the week-ahead load forecast is obtained from the day-ahead forecast data.