The quality of data and the accuracy of energy generation forecast by artificial neural networks

Received Oct 3, 2019; Revised Mar 6, 2020; Accepted Mar 15, 2020

The paper presents issues related to predicting the amount of energy generated by a particular wind power plant comprising five generators located in south-eastern Poland. The location of the wind power plant, the distribution and type of the generators, and the topographical conditions are given, and the correlation between selected weather parameters and the volume of energy generation is discussed. The primary objective of the paper was to select learning data and perform forecasts using artificial neural networks. For comparison, forecasts made with the conservative method are also presented. The results show that artificial neural networks are more universal than the conservative method. However, their forecast accuracy strongly depends on the selection of explanatory data.


INTRODUCTION
Electrical energy is one of the most commonly used forms of usable energy. Its generation has mainly been based on natural resources such as coal, oil, gas, or radioactive elements. Diminishing resources of oil and gas, the burden placed on the natural environment by fossil-fuel power stations, and waste from nuclear power stations have led to a greater interest in renewable energy sources. Recent years have seen very dynamic growth of both wind and solar power stations. The generation of energy in both cases is highly variable [1][2][3], as it depends directly on atmospheric conditions. The stability of the energy system requires a balance between supply and demand of electrical energy, which in turn involves estimating the amount of energy that will be produced by renewables at a given moment. Forecasting the energy from both sources is therefore necessary. High accuracy of wind energy forecasts increases economic benefits by reducing energy generation costs and improves the security of the energy system.
The issue of predicting the energy generation of wind turbines is broadly discussed in the literature. The authors apply various forecasting methods, starting from the conservative method [4,5], through statistical methods [5,6], econometric methods [4], physical models, and pseudo-intelligent methods (artificial neural networks, fuzzy logic) [4,7,8,9], and ending with hybrid methods [10][11][12][13]. The presented forecasts concern the energy produced by individual turbines [4,5,10] as well as by whole wind farms. Short-term forecasts are burdened with smaller error than long-term forecasts, which results in greater interest in them. The current paper uses feedforward multilayer networks for forecasting and compares their accuracy with the conservative (naïve) model. The forecasts presented in the paper are short-term.

FORECASTING METHODS
The paper presents short-term forecasting results obtained with the conservative method and with artificial neural networks. To forecast with the conservative method, it is sufficient to know the historical amount of energy generation. The nature of neural networks allows input data (explanatory variables) for predictions to be selected from any area. However, to obtain a satisfactory effect, the forecast quantity (energy generation) must depend on the explanatory values, so the selection of proper data is crucial.

Conservative (naive) method
The conservative model is the simplest way of predicting wind generation. It assumes that the power generated at the moment $i + k$ equals the power generated at the moment $i$ (of forecast generation) [16]:

$$\hat{P}_{i+k} = P_i \quad (1)$$

Such a relation, despite its simplicity, works very well for forecasts prepared in the ultra-short (hourly) term. This is because wind is relatively persistent over such horizons. In addition, a wind farm which comprises a substantial number of wind stations occupies a substantial area, which is the reason why individual units do not react at the same time to a change of wind speed. The forecasts are usually conducted for $k = 1$, in which case the method takes the form of the naïve method.
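The conservative rule can be illustrated with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `persistence_forecast`, the parameter `horizon` (the number of steps ahead) and the sample values are assumptions made for illustration.

```python
def persistence_forecast(generation, horizon=1):
    """Return (forecast, actual) pairs for the rule: forecast[i + k] = generation[i].

    generation: measured energy generation per hour (hypothetical sample data)
    horizon:    k, the number of steps ahead (1 gives the naive method)
    """
    if horizon < 1:
        raise ValueError("horizon must be a positive number of steps")
    forecast = generation[:-horizon]   # value known when the forecast is made
    actual = generation[horizon:]      # value observed k steps later
    return list(zip(forecast, actual))


# Hypothetical hourly generation in MW, not data from the studied wind farm.
hourly_mw = [1.0, 1.2, 1.1, 2.0, 2.4, 2.3, 1.8]
pairs_1h = persistence_forecast(hourly_mw, horizon=1)
pairs_3h = persistence_forecast(hourly_mw, horizon=3)
```

Each pair holds the forecast and the value actually observed, so the pairs can be passed directly to any error measure. A longer horizon simply shifts the series further, which is why the error grows when the wind changes within the horizon.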

Neural networks
Artificial Neural Networks (ANN), both in structure and operation, imitate the nervous system of living creatures. In order to use them for forecasts, it is necessary to conduct a training process, which can be supervised or unsupervised. The forecasts in the paper used feedforward neural networks trained with supervised methods. The training set can then be described by the dependence:

$$U = \{\, u_i = (\mathbf{x}_i, \mathbf{y}_i) : \mathbf{y}_i = f(\mathbf{x}_i),\ i = 1, \dots, N \,\} \quad (2)$$

where: $U$: training set, $N$: number of objects in the training set, $u_i$: i-th object of the training set (description of the object), $\mathbf{x}_i$: feature vector of the i-th object (input vector for the ANN), $\mathbf{y}_i$: vector of the ANN's answers, $f$: classification function.
In the case of supervised training methods, the purpose of the learner is to find a weight matrix that allows the conversion of the input signals which describe the object ($\mathbf{x}_i$) into the expected answers ($\mathbf{y}_i$). Thus, according to formula (2), the training set $U$ comprises $N$ representative pairs: a network input (describing the object's features) and the corresponding answer. For non-trivial cases it is usually impossible to find a separating function which classifies all objects properly. That is why the training process allows some error tolerance, so that learning brings the best achievable results [17,18].
The process of training an artificial neural network is stochastic and therefore, to some extent, unpredictable. This weakness is at the same time the ANN's greatest strength: the network can solve problems that people are unable to solve or cannot describe using the classic mathematical apparatus. The whole knowledge of a neural network is stored in the weights of particular neurons, and the training process itself consists in modifying the weight values according to a set algorithm [19,20].
Depending on the algorithm by which synaptic weights are modified in the training process, a number of training rules can be distinguished. They describe the dependences according to which the neuron's weight values are modified. Some of the rules relate to training with a teacher, and others to training without a teacher [18,21]. The neuron output value can be written as:

$$y = f(U) = f\left(\sum_{j=0}^{n} w_j x_j\right) = f(\mathbf{w}^{T}\mathbf{x}) \quad (3)$$

where: $\mathbf{x} = [-1, x_1, x_2, \dots, x_n] = [x_0, x_1, x_2, \dots, x_n]$: vector of input signals; $\mathbf{w} = [\theta, w_1, w_2, \dots, w_n] = [w_0, w_1, w_2, \dots, w_n]$: vector of synaptic weights; $U$: neuron membrane potential; $f$: transition function.
Analysis of formula (3) leads to the conclusion that an artificial neuron realizes a function of two vector variables which converts the signal from an n-dimensional space into a one-dimensional space [22]. The transition function determines the neuron's "behaviour". The most commonly used functions are: hyperbolic tangent, hyperbolic sine, and linear function. Other types of functions are used more rarely.
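The neuron model of formula (3) can be sketched in Python as follows. This is an illustrative example only: the function name `neuron_output` and all numeric values are assumptions, and the threshold is treated as the weight of a constant input equal to -1, as in the input vector definition.

```python
import math


def neuron_output(inputs, weights, threshold, transfer=math.tanh):
    """Output of a single neuron: y = f(U), where U = sum_j w_j * x_j.

    The threshold is handled as weight w_0 acting on a constant input x_0 = -1.
    """
    x = [-1.0] + list(inputs)          # x = [-1, x_1, ..., x_n]
    w = [threshold] + list(weights)    # w = [threshold, w_1, ..., w_n]
    potential = sum(wj * xj for wj, xj in zip(w, x))  # membrane potential U
    return transfer(potential)


# Hypothetical values chosen only for illustration.
y_tanh = neuron_output([0.5, 2.0], [1.0, 0.25], threshold=0.5)
y_linear = neuron_output([0.5, 2.0], [1.0, 0.25], threshold=0.5,
                         transfer=lambda u: u)
```

With these values the membrane potential is 0.5, so the linear neuron returns 0.5 and the hyperbolic tangent neuron returns tanh(0.5). Swapping the `transfer` argument is all that distinguishes the two neuron types.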

IMPACT OF WEATHER CONDITIONS ON WIND FARM OPERATION
The power generated by a wind farm is the total power of its generators.
The power of an individual turbine ($P_i$) depends directly on the wind strength:

$$P_i = \frac{1}{2}\,\eta_i\,\rho_i\,A_i\,v_i^{3}, \qquad \rho_i = \frac{p_i}{R\,T_i} \quad (4)$$

where $\eta_i$ is the turbine efficiency, $\rho_i$ the air density, $A_i$ the rotor swept area, $v_i$ the wind speed, $p_i$ the air pressure, $T_i$ the air temperature, and $R$ the specific gas constant of air. The formula for wind farm power is as follows:

$$P_F = \sum_{i=1}^{n} P_i \quad (5)$$

Most often, a single farm has identical generators. Then $A_i = A$. It can usually be assumed with a small error that air temperature and pressure are the same within the farm: $T_i = T$, $p_i = p$. Some variation may occur in the case of wind speed. The factors mentioned above are atmospheric and measurable. It is significantly more difficult to determine the momentary efficiency of a turbine, and its changes directly influence the generator's characteristic, which is the reason why the real characteristic differs from the ideal one (Figure 4). Efficiency can be influenced by multiple factors, such as wind gradient, temperature gradient, wind speed profile, disturbance caused by neighbouring turbines, changes in wind direction, and the turbine's own energy consumption.
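The dependence of farm power on wind speed, air temperature and pressure can be illustrated with the standard wind power relation (one half times efficiency, air density, rotor area and the cube of wind speed), with air density taken from the ideal gas law. This is a sketch under stated assumptions, not the authors' model: all function names and input values are hypothetical, efficiency is treated as a constant, and cut-in speed and rated power limits are ignored.

```python
R_DRY_AIR = 287.05  # specific gas constant of dry air, J/(kg*K)


def air_density(pressure_pa, temperature_k):
    """Ideal-gas air density in kg/m^3."""
    return pressure_pa / (R_DRY_AIR * temperature_k)


def turbine_power(wind_speed, rotor_area, efficiency, pressure_pa, temperature_k):
    """Power of one turbine in W: 0.5 * efficiency * density * area * speed^3."""
    rho = air_density(pressure_pa, temperature_k)
    return 0.5 * efficiency * rho * rotor_area * wind_speed ** 3


def farm_power(wind_speeds, rotor_area, efficiency, pressure_pa, temperature_k):
    """Farm power as the sum over identical turbines sharing pressure and temperature."""
    return sum(
        turbine_power(v, rotor_area, efficiency, pressure_pa, temperature_k)
        for v in wind_speeds
    )


# Hypothetical inputs: five identical turbines with slightly different wind speeds.
speeds = [7.8, 8.0, 8.1, 7.9, 8.2]  # m/s
total_w = farm_power(speeds, rotor_area=6362.0, efficiency=0.4,
                     pressure_pa=101325.0, temperature_k=288.15)
```

Because power scales with the cube of wind speed, doubling the speed gives eight times the power, while typical changes in temperature and pressure shift the density by only a few percent. This is consistent with wind speed dominating the other weather factors.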

EXPLANATORY DATA SELECTION FOR ANN
The strength of neural networks lies in the automatic search for dependences between input and output data, as a result of a process called training. For effective training it is necessary to select proper input data, that is, data which have an impact on the output data (in our case, on the forecast energy generation). The explanatory (determinant) character of the input data and its representativeness are equally important. Owing to their ability to learn automatically, artificial neural networks can find information hidden in data that is hardly accessible to other methods. As mentioned above, proper selection of input data is the only condition.
The paper [7] presents an analysis of the physical influence of wind speed and direction, air temperature and atmospheric pressure on the power generated by turbines. Based on these analyses, four neural network forecast models were created and tested. The analyses showed that the model which considers wind speed and temperature was the most accurate. The paper [4] selects data based on the correlation between particular factors and daily energy generation, and on the mutual correlation between the data. On this basis, the authors selected the following explanatory data: wind speed forecast, atmospheric pressure forecast, wind speed on the previous day, energy generation on the previous day, and average monthly energy generation. The analysis of correlation between explanatory data and energy generation is also used in papers on forecasting with artificial neural networks [9] and with other forecasting methods [5].

Weather factors
In the present paper, the selection of explanatory variables begins with the analysis of correlation between selected atmospheric factors and the energy generation volume. The studies covered the whole wind power plant. The analyses involved a set of external weather factors; the values of the correlation coefficient for the analysed factors between 2014 and 2017 are shown in Table 1. The data analysis shows the strongest dependency of energy generation volume on the wind speed [23]. The correlation with the other variables is significantly lower. Moreover, its value varies greatly between particular years; therefore, it is necessary to analyse the reason for this variability before a final decision about the selection of explanatory data is made.
The energy generation values shown in Figure 6 for wind speeds of 0 m/s, 2.4 m/s and 4.6 m/s are hard to explain other than as incorrect measurements of wind speed or energy generation volume. A similar situation can be noticed in the diagrams of wind speed against energy generation volume for the years 2016 and 2017, but the anomalies appear at different speeds in each case. Thus, the data were cleared of values for which the energy generation volume was significantly too high for the particular wind strength. After elimination of the undoubtedly incorrect measurements, the coefficients of correlation between energy generation volume and wind speed at different heights and wind direction changed (Table 2).
The analysis of the coefficients of correlation between weather parameters and energy generation volume indicates a high interdependence between energy generation volume and wind speed. The correlation of the other parameters with energy generation volume is low. It was therefore decided that neural network training would be conducted using two different sets of explanatory variables:
- Wind speed at a height of 100 m;
- Wind speed at a height of 100 m and wind speed at a height of 50 m.
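The correlation analysis and the removal of implausible measurements described in this section can be sketched as follows. The example is illustrative only: the data are invented, the Pearson coefficient is assumed as the correlation measure, and the plausibility bound `bound` is a hypothetical rule, since the exact cleaning criterion is not given in the text.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def drop_implausible(speeds, energies, max_energy_for_speed):
    """Keep only samples whose energy does not exceed a plausibility bound."""
    kept = [(v, e) for v, e in zip(speeds, energies)
            if e <= max_energy_for_speed(v)]
    return [v for v, _ in kept], [e for _, e in kept]


def bound(v):
    """Assumed upper bound on plausible energy at wind speed v (illustrative only)."""
    return 1.0 + 1.0 * v


# Hypothetical measurements; the sample (0 m/s, 9 MWh) mimics a faulty record.
speed = [0.0, 0.0, 3.0, 5.0, 7.0, 9.0, 11.0]
energy = [0.0, 9.0, 0.5, 2.0, 4.5, 7.5, 9.5]

r_raw = pearson(speed, energy)
clean_speed, clean_energy = drop_implausible(speed, energy, bound)
r_clean = pearson(clean_speed, clean_energy)
```

In this invented data set a single faulty record roughly halves the correlation coefficient, which illustrates why the coefficients can change noticeably once incorrect measurements are removed.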

RESULTS AND ANALYSIS
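For the purpose of evaluating and comparing the quality of the forecasts, the following indices were introduced:
- Mean absolute error: $MAE = \frac{1}{N}\sum_{i=1}^{N} \left| P_i - \hat{P}_i \right|$, where $P_i$ is the real generation, $\hat{P}_i$ the forecast generation, and $N$ the number of forecasts.
The results below additionally report the maximum absolute error (MaxE), the root mean square error, and the share of forecasts that deviate from real values by no more than a given percentage.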
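The accuracy indices used in the results can be computed as in the sketch below. It is a hedged illustration rather than the authors' procedure: the function name `forecast_indices` and the sample values are assumptions, the root mean square error is left unnormalized because the normalization used for the percentage values is not stated, and the share within tolerance is assumed to be measured relative to the real value.

```python
import math


def forecast_indices(actual, forecast, tolerance=0.10):
    """Return MAE, MaxE, RMSE and the share of forecasts within a tolerance.

    The share counts forecasts whose absolute error does not exceed
    `tolerance` times the real value; this definition is an assumption.
    """
    errors = [abs(a - f) for a, f in zip(actual, forecast)]
    n = len(errors)
    mae = sum(errors) / n
    max_e = max(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    within = sum(1 for a, e in zip(actual, errors) if e <= tolerance * abs(a))
    return {"MAE": mae, "MaxE": max_e, "RMSE": rmse, "share_within": within / n}


# Hypothetical generation values in MW.
real = [2.0, 4.0, 6.0, 8.0]
pred = [2.0, 5.0, 6.0, 6.0]
indices = forecast_indices(real, pred, tolerance=0.10)
```

For the sample data the absolute errors are 0, 1, 0 and 2 MW, so a low mean error (0.75 MW) coexists with a much larger maximum error (2 MW). The same pattern of low MAE and high MaxE is what the results discuss for the conservative forecasts.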

Forecasts with conservative method
In the conservative method, two forecast horizons were applied:
- 1 h: forecast based on the generation one hour before,
- 6 h: forecast based on the generation six hours before.
Figure 7 shows the forecast results for the 1 h horizon. The very short horizon gives the visual impression that the waveforms practically coincide. However, the values of the indices presented in Table 3, which estimate forecast accuracy, show that the error is not so small. Attention is drawn to the low value of the mean absolute error (MAE) and the relatively high value of the maximum absolute error (MaxE). Figure 8 shows the forecast results for the 6 h horizon, and Table 4 includes the values of the indices for estimating forecast accuracy. Extending the horizon significantly worsened forecast accuracy: the mean error increased by more than 100%, while the maximum absolute error reached a value close to the rated power of the power plant (Table 4).

Forecasts with neural network
Artificial neural network training was conducted using the data presented in Section 5. The trained network was used to forecast, based on current weather, the volume of energy produced by the wind generators. The kind of problem and the structure of the data indicate that the selection of the ANN architecture should begin with multilayer feedforward networks. The literature indicates that a feedforward network comprising 3 layers (input, hidden and output) is able to solve every problem. The remaining problem is the number of neurons in particular layers. In the case of the input and output layers the answers are obvious:
- The number of neurons in the input layer must be equal to the length of the input vector. In our case, it is 1 neuron when the training is based only on the wind speed at a height of 100 m, or 2 neurons when the training uses the wind speed at heights of 100 m and 50 m.
- The number of neurons in the output layer must be equal to the number of forecast values. In our case, this is the amount of generated energy, so 1 neuron is sufficient.
Determining the size of the hidden layer, in turn, is not so obvious. Although some formulas are used for this purpose, the literature indicates that the calculated values should be treated as minimum values, and that the selection should be performed by trial and error. In the discussed case, the tests were conducted for hidden layers comprising 2, 3, 4 and 5 neurons. It should be emphasised that an excessive increase in the size of the hidden layer may lead to overtraining, which results in very good training results and a significant drop in network performance on data outside the training set.
According to the assumptions of the feedforward network architecture, all neurons in a layer have the same transition function. It was decided that a linear function would be used in the input and output layers, and a hyperbolic tangent function in the hidden layer. The network was trained with the Levenberg-Marquardt method. Network training and tests were carried out in the MATLAB environment, equipped with the Neural Network Toolbox, which offers a wide range of architectures and procedures used for training ANNs [24,25].
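The architecture described above (linear input layer, hyperbolic tangent hidden layer, one linear output neuron) can be sketched as a forward pass in Python. This is only an illustration under stated assumptions: the authors worked in MATLAB, the function names and input values here are hypothetical, the weights are random rather than trained, biases are written as additive terms, and Levenberg-Marquardt training itself is not shown.

```python
import math
import random


def init_network(n_inputs, n_hidden, seed=0):
    """Random initial weights for an n_inputs - n_hidden - 1 feedforward network."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]  # +1 for bias
              for _ in range(n_hidden)]
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]   # +1 for bias
    return hidden, output


def predict(network, inputs):
    """Forward pass: linear input layer, tanh hidden layer, linear output neuron."""
    hidden, output = network
    activations = [
        math.tanh(w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs)))
        for w in hidden
    ]
    return output[0] + sum(wo * a for wo, a in zip(output[1:], activations))


def count_weights(n_inputs, n_hidden):
    """Number of trainable parameters, including biases."""
    return n_hidden * (n_inputs + 1) + (n_hidden + 1)


# Candidate architectures from the text: 1 or 2 inputs, 2 to 5 hidden neurons.
sizes = {(n_in, n_h): count_weights(n_in, n_h)
         for n_in in (1, 2) for n_h in (2, 3, 4, 5)}
net = init_network(n_inputs=2, n_hidden=3)
estimate = predict(net, [8.0, 6.5])  # hypothetical wind speeds at 100 m and 50 m
```

The candidate networks are very small, from 7 parameters (1 input, 2 hidden neurons) to 21 parameters (2 inputs, 5 hidden neurons), so trying each hidden-layer size by trial and error is cheap.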
Visually similar results were obtained for the annual forecast in both analysed cases: forecasts based on wind speed at a height of 100 m, and forecasts based on wind speed at heights of 100 m and 50 m, as shown in Figures 9 and 10. The tested accuracy coefficients have similar values, as shown in Tables 5 and 6. The mean absolute error is 0.72 MW for the first case and 0.76 MW for the second. Allowing a forecast deviation of 20% from real values, the network effectiveness is 93.7% for the first case and 92.2% for the second. With a forecast deviation of 10%, it is 72.7% and 72%, respectively. The obtained values are comparable, but slightly better for the forecasts that use only wind speed at a height of 100 m.
In the case of monthly forecasts, due to the cyclical character of atmospheric conditions, the data can be more coherent, which should result in greater prediction accuracy. The obtained forecast results confirm this assumption, as shown in Figure 11. All prediction accuracy indices improved (Table 7). Importantly, the maximum forecast error decreased to about 3.85 MW, which is below 40% of the rated power of the power plant. The results for February 2015 were particularly good: the mean absolute error was 0.34 MW, the root mean square error was 5.4%, and 92.8% of forecasts did not differ by more than 10% from real values.

CONCLUSION
The paper discussed 4 models: conservative with a 1 h horizon, conservative with a 6 h horizon, a neural network that used wind speed at a height of 100 m, and a neural network that used wind speed at heights of 100 m and 50 m. The highest accuracy was obtained for the neural network that used wind speed at a height of 100 m with monthly forecasts. The studies confirmed the finding known from the literature that artificial neural networks can be used effectively in forecasting energy generation. The comparison of forecast results obtained using the ANN with those of the conservative method indicates that the networks are more universal. The forecast accuracy of the conservative method decreases as the forecast horizon is extended.
The accuracy of forecasts made with artificial neural networks strongly depends on the selection of explanatory data. In the analysed case, the influence of 9 different variables was tested, and it turned out that only wind speed has a real impact on energy generation volume. Of course, this does not mean that data other than those analysed have no influence on the forecast value. At the same time, it cannot be ruled out that a different coding of the input data would improve their correlation with energy production.
The quality of explanatory data (measurements) is crucial for forecast accuracy. In the case of artificial neural networks, erroneous data hamper the training and, regardless of the prediction method, prevent correct evaluation of the forecasts. In the case of the ANN, reducing the range of data to one month had a positive impact on forecast accuracy. Training performed on data from the whole year and then forecasting the generation for the whole year gives worse results than training with data from one month and forecasting for the same month in the following years. The changes in turbine operation resulting from weather and operating conditions hinder the neural network training process; it seems that they could be partly identified by introducing additional explanatory variables.