Optimal artificial neural network configurations for hourly solar irradiation estimation

ABSTRACT


INTRODUCTION
The rapid development of the global economy has led to an unprecedented increase in fossil fuel consumption. This surge has simultaneously raised the costs and highlighted the finite nature of fossil fuel resources, prompting a search for alternative solutions [1]. Among these alternatives, renewable energy sources have emerged as a promising option. The sun, as Earth's primary source of energy [2], plays a crucial role in various forms of renewable energy, including solar photovoltaic, solar thermal, wind, geothermal, and hydroelectric power. Even hydrocarbon-generated energy can be considered a solar-derived energy, as it is ultimately the product of photosynthesis [3].
Solar irradiation provides an abundant, pollution-free energy source that has the potential to decrease our reliance on fossil fuels [4]. To implement solar energy systems effectively, solar irradiation levels must be determined accurately. Two primary approaches are used to acquire these data: ground-level measurements taken by meteorological networks [5] and radiometric stations across the globe [6], and mathematical models that estimate the data when direct measurements are unavailable [7]. These models use a variety of environmental and astronomical parameters, including ambient temperature, relative humidity, sunshine duration, solar declination, the length of the day, the solar constant, variations in the Earth-sun distance, and the average daily extraterrestrial irradiation on a horizontal plane at the top of the atmosphere [8], [9]. Using this broad set of parameters supports reliable data for solar energy applications. In general, these models can be classified into three families [10]: semi-empirical models [11], spectral models [12], and meteorological models [13]. Semi-empirical models have a local character and allow the direct, diffuse, and global components to be calculated. They take as input meteorological variables (such as air temperature, sunshine duration, and relative humidity) and geographical parameters (latitude, longitude, and altitude) [14]. They are based on regression relationships that can be used to interpolate, and therefore reconstruct, solar irradiation data at locations with no measurements. Their limitation is that they are applicable only in clear-sky conditions. Meteorological models, by contrast, make it possible to calculate the global irradiation whatever the state of the sky, using direct solar data collected at weather stations. They have the advantage of generating solar irradiation data for different inclined surfaces.
Spectral models are primarily aimed at calculating the spectral components of solar irradiation at ground level. They are based on determining the transmission coefficients after attenuation by the various atmospheric constituents. They give accurate results provided that the characteristics of certain atmospheric constituents, such as aerosols and clouds, are known.
In addition, physical models [10] process satellite images collected from space to estimate solar irradiation data. By analyzing these images, they allow the amount of solar irradiation to be calculated at any point in the world, for different sky conditions, with high accuracy. However, these models depend strongly on heavy mathematical modeling that requires a prior understanding of the dynamic behavior and of the parameters used in each model. Artificial intelligence models have therefore been proposed to overcome this problem. Kosovic et al. [15] briefly discuss the performance these models exhibit when applied to environmental data for the purpose of calculating solar irradiation. Recently, these models have played a major role in estimating solar irradiation, notably neural network models as shown in Al-Ghussain et al. [16], where the results indicate that the developed models had better regression coefficients than fuzzy logic or metaheuristic optimization algorithms. They have proved to be a powerful tool for providing solar irradiation data [17].
In this paper, we aim to evaluate various artificial neural network (ANN) models for estimating hourly solar irradiation data. ANNs are widely employed for tasks such as identification, classification, function approximation, and automatic control, and they are increasingly used in data analysis as an effective alternative to conventional methods in numerous scientific areas, particularly meteorology and solar energy. These models require only a limited set of measured data, including temperature, solar altitude, wind speed, and other parameters.
The primary contributions of this study are twofold: first, we establish a benchmark of diverse neural network models suitable for estimation purposes, and second, we examine the impact of measured radiometric parameters during the estimation process to determine the influence of each parameter on the overall estimation model. To achieve this, we have experimented with various neural network configurations and architectures by adjusting the number of neurons and layers. Additionally, we employ error metrics to assess the accuracy and robustness of the optimal configurations for estimating solar irradiation data.

METHOD
This paper's primary goal is to explore various neural network types and configurations to estimate solar irradiation levels using diverse meteorological input data. The methodology employed in our study is depicted in Figure 1. A comprehensive explanation of each component within the proposed methodology is provided in the subsequent sections. By examining different neural network structures and configurations, we aim to determine the most effective approach to estimate solar irradiation based on the available meteorological inputs. This investigation will contribute to the ongoing efforts to harness solar energy more efficiently and effectively, supporting the transition to renewable energy sources.

Feedforward neural network
Feedforward neural networks are biologically inspired models that contain numerous neuron-like processing units arranged in layers. Each unit in a layer is connected to the units in the previous layer, and the weights of these connections encode the network's knowledge. These units are also known as nodes. During standard operation, data flow from the inputs through the layers to the outputs without feedback, hence the name "feedforward." Figure 2 illustrates a two-layer network with an output layer containing one unit and a hidden layer with two units. The network also has four input units [18].
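As an illustration, the forward pass of a network of the size described above (four inputs, one hidden layer with two units, one output unit) can be sketched as follows. This is a minimal sketch and not the implementation used in this study: the tanh hidden activation, the linear output unit, the random initial weights, and all function names are assumptions made for the example.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Return (weights, biases) of a fully connected layer, filled with small random values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases

def dense(x, weights, biases, activation):
    """For each unit: weighted sum of the inputs plus a bias, passed through the activation."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def feedforward(x, hidden, output):
    """Forward pass without feedback: inputs -> hidden layer (tanh) -> output layer (linear)."""
    h = dense(x, *hidden, activation=math.tanh)
    return dense(h, *output, activation=lambda z: z)

rng = random.Random(0)            # fixed seed so the example is reproducible
hidden = init_layer(4, 2, rng)    # 4 inputs -> 2 hidden units
output = init_layer(2, 1, rng)    # 2 hidden units -> 1 output unit
y = feedforward([0.2, -0.1, 0.7, 0.5], hidden, output)   # list with one estimated value
```

In practice the weights would be obtained by training on measured data rather than drawn at random; only the direction of the data flow is illustrated here.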

Cascade forward neural network
Cascade forward neural networks resemble feedforward networks, but include connections from the input and from every previous layer to the subsequent layers. Figure 2 shows a three-layer network in which the output layer is directly connected to both the input layer and the hidden layer. Like feedforward networks, cascade networks with two or more layers can learn any finite input-output relationship arbitrarily well, given enough hidden neurons. Applicable to any input-output mapping, these networks have the key advantage of preserving the linear relationship between inputs and outputs while also capturing the non-linear one [19].
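The direct input-to-output connection can be made concrete with a short sketch. The code below assumes one tanh hidden layer and a linear output unit; the weights and function names are hypothetical and chosen only for illustration. The output layer receives the raw inputs concatenated with the hidden activations, which is what allows a purely linear input-output relationship to pass through unchanged.

```python
import math

def affine(x, weights, biases):
    """Weighted sum of the inputs plus a bias for each unit (no activation)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def cascade_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Cascade forward pass: the output layer sees both the inputs and the hidden layer."""
    h = [math.tanh(z) for z in affine(x, hidden_w, hidden_b)]
    return affine(list(x) + h, out_w, out_b)   # skip connection: inputs + hidden outputs

# Hypothetical weights: 2 inputs, 2 hidden units, 1 output unit.
hidden_w = [[0.5, -0.3], [0.1, 0.8]]
hidden_b = [0.0, 0.1]
# Each output row has 4 weights: 2 for the raw inputs, then 2 for the hidden units.
out_w = [[2.0, -1.0, 0.0, 0.0]]
out_b = [0.5]
y = cascade_forward([1.0, 3.0], hidden_w, hidden_b, out_w, out_b)
```

With the hidden-to-output weights set to zero, as in this example, the network reduces to the linear map 2*x1 - x2 + 0.5; non-zero hidden weights would add a non-linear correction on top of it.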

Fitting neural network
Fitting is the process of constructing a curve, or a mathematical function, that best fits a collection of previously collected data points. Curve fitting may involve either interpolation, where the curve is required to pass exactly through the data points, or smoothing, where a smooth function is constructed to approximate the data. The fitted curves can be used to help visualize the data, to predict the value of a function where no data are available, and to summarize the relationship between two or more variables [20]. [21] website and photovoltaic geographical information system (PVGIS) [22], which provide solar energy resource information and photovoltaic energy calculations for various regions.
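To make the distinction between interpolation and smoothing in curve fitting concrete, the following sketch contrasts a least-squares straight line (a smoothing fit that approximates the points) with piecewise-linear interpolation (which passes exactly through them). It is only an illustration of the general idea: the sample values are made up, and a fitting neural network would replace the straight line with a more flexible function.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a*x + b; it approximates the points (smoothing)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def interpolate(xs, ys, x):
    """Piecewise-linear interpolation; it passes exactly through every data point."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the range of the data")

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.9, 4.1]        # made-up noisy samples of roughly y = x
a, b = fit_line(xs, ys)               # slope and intercept of the smoothing fit
y_smooth = a * 2.5 + b                # prediction where no data point exists
y_interp = interpolate(xs, ys, 2.5)   # value on the segment between neighbours
```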

SIMULATION RESULT AND DISCUSSION
The main objective is to determine which neural network type and architecture best estimate the amount of solar irradiation from measured data such as clear-sky solar irradiation, temperature, and sun height. To this end, simulations were carried out to test the performance of each architecture, and error metrics were used to judge it; the best configuration is the one that minimizes the error. The accuracy of the considered models was tested by calculating the normalized mean squared error (NMSE), the root mean square error (RMSE), the normalized root mean squared error (NRMSE), the coefficient of correlation (R), and the coefficient of determination (R²). Networks with two and with three inputs were then simulated using different architectures and configurations, and the error metrics were used to make a full comparison between them. The time series was divided into training and testing sets: the testing set consists of 36 days selected at random from the year, and the remaining data, about 11 months, were used as the training set.
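For reference, the error metrics named above can be computed as in the following sketch. The normalizations are assumptions, since several conventions exist and the one used here is not stated: NRMSE is taken as RMSE divided by the mean of the measured values, and NMSE as the mean squared error divided by the variance of the measured values. The function name and the sample values are hypothetical.

```python
import math

def error_metrics(measured, estimated):
    """Return RMSE, NRMSE, NMSE, R and R2 for two equally long sequences."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    sse = sum((m - e) ** 2 for m, e in zip(measured, estimated))   # squared errors
    sst = sum((m - mean_m) ** 2 for m in measured)                 # spread of measured
    see = sum((e - mean_e) ** 2 for e in estimated)                # spread of estimated
    cov = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    mse = sse / n
    rmse = math.sqrt(mse)
    return {
        "RMSE": rmse,
        "NRMSE": rmse / mean_m,            # assumed: normalized by the measured mean
        "NMSE": mse / (sst / n),           # assumed: normalized by the measured variance
        "R": cov / math.sqrt(sst * see),   # Pearson correlation coefficient
        "R2": 1.0 - sse / sst,             # coefficient of determination
    }

# Made-up hourly values in W/m^2, for illustration only.
measured = [0.0, 120.0, 450.0, 780.0, 610.0, 90.0]
estimated = [5.0, 110.0, 470.0, 760.0, 640.0, 80.0]
metrics = error_metrics(measured, estimated)
```

Under these particular conventions R2 equals 1 minus NMSE, so the two carry the same information; a different normalization would break that equivalence.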

Case of 2 inputs
In this case, we selected the clear-sky and the top-of-atmosphere solar irradiation as inputs and the global solar irradiation as the output. The results are shown in Figure 3. Moreover, a comparison of the error metrics between estimated and measured data is presented in Table 1.
From these results, we can clearly see that the measured and estimated data are almost the same. Moreover, in the training phase, a single hidden layer with multiple neurons gives the highest R² values for all the network types used. The same holds in the testing phase: a single layer with multiple neurons gives better results than configurations with multiple layers.
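The top-of-atmosphere input used in this case depends only on astronomical quantities, so it can be computed rather than measured. The sketch below uses standard textbook relations (Cooper's approximation for the solar declination and a cosine correction for the Earth-sun distance); it is an assumption that comparable relations underlie the data used here. The solar constant of 1367 W/m², the latitude of about 33.8° N taken for Laghouat, and the function names are likewise assumptions made for the example.

```python
import math

SOLAR_CONSTANT = 1367.0   # W/m^2, commonly used value (assumption)

def declination(day_of_year):
    """Solar declination in radians (Cooper's approximation)."""
    return math.radians(23.45) * math.sin(2 * math.pi * (284 + day_of_year) / 365)

def toa_horizontal_irradiance(day_of_year, solar_hour, latitude_deg):
    """Extraterrestrial irradiance on a horizontal plane in W/m^2 (0 when the sun is down)."""
    phi = math.radians(latitude_deg)
    delta = declination(day_of_year)
    omega = math.radians(15.0 * (solar_hour - 12.0))    # hour angle, 15 degrees per hour
    cos_zenith = (math.sin(phi) * math.sin(delta)
                  + math.cos(phi) * math.cos(delta) * math.cos(omega))
    # Correction for the varying Earth-sun distance over the year.
    eccentricity = 1.0 + 0.033 * math.cos(2 * math.pi * day_of_year / 365)
    return max(0.0, SOLAR_CONSTANT * eccentricity * cos_zenith)

# Hourly profile for day 172 (around the June solstice) at an assumed latitude of 33.8 N.
profile = [toa_horizontal_irradiance(172, hour, 33.8) for hour in range(24)]
```

The hour is expressed in solar time, so a conversion from clock time would be needed before pairing these values with measured data.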

Case of 3 inputs
In the same manner, we selected three inputs, namely temperature, sun height, and wind speed. The results obtained are summarized in Table 2. These results confirm what we found in the case of two inputs: the choice of a single layer with multiple neurons appears to give the best results compared with multiple layers of neurons.

Comparison
In order to evaluate the goodness of the proposed ANN models for hourly solar irradiation estimation, a comparison is needed between the existing models in the literature and the models proposed in this paper. To this end, 14 models have been tested against our model; they are summarized in Table 3. The main objective is to compare their R² values with the one obtained using our model.
These models used MLP, SVR, ANFIS, and RBFNN for the estimation of hourly solar irradiation; in more detail, single models in [27]-[35], clustering and ANFIS in [29], [35], [36], and hybrid models with algorithm optimization in [27]-[29]. The R² value was selected as the metric for the goodness of each model. The comparison clearly shows the robustness and goodness of the proposed model in estimating the solar irradiation time series, with an R² of 97.24%, which is high compared with the other models. From the results of the two- and three-input cases, we can conclude that the optimal configuration for hourly solar radiation estimation depends on several factors. Moreover, for a suitable selection of layers and nodes, the following points should be considered:
a) Experimentation is often used to find what works best for a specific dataset. Generally, the number of layers and neurons cannot be calculated analytically for real-world applications, since the model depends on several parameters.
b) Intuition can also be used to estimate the best configuration. A hierarchy can be built up by increasing the number of layers and the number of neurons in each layer step by step. This deep hierarchical modeling is used to solve the prediction problem. Generally, intuition is combined with experience in order to enhance the obtained results.
c) Another approach is to use deep learning to overcome the problem of choosing the best configuration for neural networks. This learning can be a heuristic approach using random forest and stochastic gradient boosting to upper-bound the neural network architecture used.
d) Automated search can be used to determine a suitable number of layers and neurons. This can be done using several strategies, such as i) trying random configurations of layers and neurons per layer and ii) using metaheuristic algorithms such as genetic algorithms, or Bayesian optimization, to find the optimal number of layers and neurons by optimizing predefined criteria. For big data, several methods can be hybridized and combined in order to obtain the optimal number of layers and neurons.
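Strategy i) of the automated search can be sketched as a simple random search over layer and neuron counts. The code is a minimal illustration under stated assumptions: `evaluate` stands for "train a network with this configuration and return its test error", and the `toy_error` function used here is an arbitrary placeholder with no connection to the results of this study.

```python
import random

def random_search(evaluate, n_trials=20, max_layers=3, max_neurons=20, seed=0):
    """Try random (neurons per layer) configurations and keep the lowest-error one."""
    rng = random.Random(seed)
    best_config, best_error = None, float("inf")
    for _ in range(n_trials):
        n_layers = rng.randint(1, max_layers)
        config = tuple(rng.randint(1, max_neurons) for _ in range(n_layers))
        error = evaluate(config)          # in practice: train, then score on held-out data
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error

def toy_error(config):
    """Arbitrary placeholder for the test error of a trained network (not real data)."""
    return sum((n - 8) ** 2 for n in config) + len(config)

best_config, best_error = random_search(toy_error, n_trials=50)
```

A genetic algorithm or Bayesian optimization, as in strategy ii), would replace the uniform sampling with a rule that proposes new configurations based on those already evaluated.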

CONCLUSION
This study focuses on using ANNs to develop artificial intelligence models for predicting global solar irradiation. Astronomical and meteorological parameters are used to estimate solar irradiation in Laghouat, Algeria, and the best model is chosen based on its predictive accuracy. We examined the possibility of estimating hourly global solar irradiation from several models by feeding astronomical and meteorological parameters to different neural networks, and we tried several combinations of input data and network configurations. We found that the combination of two inputs (the clear-sky and the top-of-atmosphere solar irradiation) with 12 neurons in the hidden layer gives the best results; for this combination, the correlation coefficient between the measured and estimated global solar irradiation is 97.24% for the test data. We conclude that this model may be preferred for estimating solar irradiation intensities at the studied site and at other places with similar climatic conditions. Future work can test other types of neural networks, namely hybrid combinations of the models. Optimisation techniques can also be implemented to optimise the number of neurons and layers for optimal results.