Prediction of atmospheric pollution of fine particles using a neural network model in the locality of Kennedy in Bogotá

Received Feb 17, 2020. Revised Jun 1, 2020. Accepted Jun 14, 2020.

This work presents an application based on neural networks for the prediction of air pollution, specifically particulate matter with a diameter of 2.5 micrometers (PM2.5). This application is considered of great importance due to its impact on human health, an impact amplified by the agglomeration of people in cities. The implementation uses data captured from several devices that can be installed at specific locations in a particular geographical environment, in this case the locality of Kennedy in Bogotá. The model obtained can be used for the design of public policies that control air quality.


INTRODUCTION
Air pollution is an acute threat and a phenomenon with a particular impact on human health. The changes that occur in the chemical composition of the atmosphere can alter the climate, produce acid rain or destroy the ozone layer, all phenomena of great global importance. The World Health Organization (WHO) considers air pollution one of the most important global priorities [1].
The use of non-renewable resources in the production of energy, such as oil or coal, generates important emissions of sulfur dioxide (SO2), carbon monoxide (CO), among others. On the other hand, the means of transport used in everyday life constitute another alarming source of contamination. A large part of these pollutants emitted into the environment is generated by automobiles [2].
Consequently, three main steps are proposed to address air pollution problems: the definition of air quality parameters that need to be controlled; the monitoring of these parameters through the use of specific hardware and software; and the adoption of solutions aimed at reducing the concentration of harmful substances and ensuring clean air in the long term [3]. The air quality parameters are determined by the environmental policies outlined by the city government. The monitoring, for this experiment, is carried out by the surveillance stations installed at different points of the city. In the case of Bogotá, despite the efforts made by environmental authorities, academic institutions and citizens in general, the capital city of Colombia is today one of the most polluted cities in Latin America. One of the pollutants of greatest concern in the city is particulate matter, since its levels frequently exceed the air quality standards [2].
The Secretaria Distrital del Ambiente of Bogotá operates the "Red de Monitoreo de Calidad del Aire de Bogotá -RMCAB", which comprises 13 fixed monitoring stations and a mobile station operating in different parts of the city, equipped with cutting-edge technology that allows continuous monitoring of the concentrations of particulate matter (PM10, PST, PM2.5), pollutant gases (SO2, NO2, CO, O3) and weather variables (precipitation, wind speed and direction, temperature, solar radiation, relative humidity, and barometric pressure) [2]. Figure 1 compares a human hair with PM10 and PM2.5 particles to illustrate the scale of particulate matter.
At the global level, different sources of information have been used to mitigate and predict levels of air pollution, and one of the most promising tools is artificial neural networks, since they allow analyzing large amounts of data and building predictive models that forecast future pollution levels. In this work, an artificial neural network was chosen for the prediction of statistical data, and the result is the analysis of a prototype machine learning model for the prediction of particulate matter (PM) pollutants in an area of Bogotá (Kennedy).

RESEARCH METHOD
This work employs neural networks since they are widely used in forecasting processes in different areas, as can be seen in [4][5][6]. To design this artificial neural network (NN), the following methodological steps were followed:
- Selection of input variables.
- Normalization.
- Architecture selection: number of neurons per hidden layer.
- Selection of the activation function.
- Selection of the learning algorithm.
- Training and validation of the NN.

Selection of input variables
Data provided by the Kennedy air quality monitoring station, as shown in Table 1, was used for the selection of input variables; this station is administered by the Secretaría Distrital del Ambiente of Bogotá [7] through the air quality monitoring network of Bogotá, RMCAB, see Figure 2. In particular, this paper studies the data taken from the monitoring station in Kennedy (delimited in red in Figure 2), considering the value of the IBOCA (Índice Bogotano de Calidad de Aire).
The information from this station was compiled from the monthly reports published by the Bogotá Air Quality Monitoring Network, which specify the main atmospheric pollutants that the monitoring station is capable of measuring and the average data for each pollutant, see Table 2; this paper is particularly focused on PM2.5 particulate matter. The public information available from the Kennedy monitoring station was consolidated up to 2018. One of the first steps taken with these data was to clean them, removing records with empty fields so that the model would not present errors.
According to the available data of Table 2, the input variables are: month, year, maximum value of PM2.5 [µg/m³], number of exceedances of the 24-hour standard (24H), the percentage of valid data, the percentages of favorable, moderate and regular IBOCA (the sum of these three must equal 100%), and finally the number of anomalies. The output variable corresponds to the average value of PM2.5. Table 3 presents the general statistics of the training data.
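As an illustration of this preparation step, the following sketch shows how such records could be loaded and cleaned of empty fields in Python using pandas; the file name and column names are hypothetical placeholders for the fields listed above, not the identifiers actually used in the project.

# Illustrative sketch only: the file name and the column names are
# hypothetical stand-ins for the fields of the RMCAB monthly reports.
import pandas as pd

# Load the consolidated Kennedy station records (hypothetical CSV export)
data = pd.read_csv("kennedy_pm25_monthly.csv")

# Input variables as listed above; "pm25_avg" is the output (target) variable
features = ["month", "year", "pm25_max", "exceedances_24h", "valid_data_pct",
            "iboca_favorable", "iboca_moderate", "iboca_regular", "anomalies"]
target = "pm25_avg"

# Cleaning step mentioned in the text: drop records with empty fields
# so the model does not fail during training
data = data.dropna(subset=features + [target])

X = data[features]
y = data[target]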

Normalization
There are different scales and ranges within the collected data; although the model could converge with the information as it is, this would make training more difficult and make the model dependent on the units used for the input [8]. Therefore, the values are normalized using the standard normal distribution method shown in (1), which reduces any normal distribution to the standard normal form, whose arithmetic mean and standard deviation are zero and one, respectively [9].

z = (x − μ) / σ    (1)

where x corresponds to the value of the variable to be standardized, μ represents the arithmetic mean and σ the standard deviation of the variable; this ensures that the standardized values have zero mean and unit standard deviation.
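A minimal sketch of this standardization, assuming the data has already been loaded and cleaned as in the previous sketch, is shown below; it illustrates (1) and is not the exact code used in the project.

# Standard normal (z-score) normalization of the input variables, as in (1)
train_stats = X.describe().transpose()  # per-column mean and standard deviation

def standardize(df, stats):
    # z = (x - mean) / std, giving zero mean and unit standard deviation
    return (df - stats["mean"]) / stats["std"]

X_norm = standardize(X, train_stats)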

Architecture selection
Selecting the number of neurons in the hidden layers (if there is more than one) is an unclear issue when designing a neural network (NN): too many units can lead to low generalization capacity, while too few units can leave the NN without sufficient capacity to solve the problem [10]. Regarding the network topology, determining the number of layers and the number of hidden neurons in each layer is a complex task that directly affects the generalization capacity of the model. Since every neural network necessarily has an input layer that receives external stimuli, the problem is limited to establishing the number and size of the hidden layers [11].
Although it has been demonstrated that the universal approximation property of MLP (Multilayer Perceptron) networks requires at most two hidden layers, in most cases a single hidden layer is sufficient to achieve optimal results. Lippmann [12] considers that networks with a single hidden layer are sufficient to solve arbitrarily complex problems, provided that the hidden layer includes at least three times the number of input nodes. Meanwhile, Hecht-Nielsen and Lippmann apply an extension of Kolmogorov's theorem to show that a network with a single hidden layer of 2n + 1 neurons, where n is the number of input variables, and continuous, non-linear and monotonically increasing transfer functions is sufficient to compute any continuous function of the input variables [13].
Usually, ad hoc rules are used to determine the number of hidden neurons in each layer. Although they are not mathematically justifiable, they have shown good behavior in various practical applications. Masters [14] proposed a method that he called the geometric pyramid rule, based on the assumption that the number of neurons in the hidden layer must be less than the total number of input variables but greater than the number of output variables. The number of neurons in each layer is considered to follow a geometric progression, such that for a network with a single hidden layer the number of intermediate neurons must be close to √(n · m), where n is the number of input variables and m the number of output neurons. This project takes a total of 9 input variables and one output; thus, 9 neurons were defined in the hidden layer according to [10].

Selection of the activation function
In both artificial and biological neural networks, a neuron does not simply transmit the input it receives. There is an additional step, an activation function, which is analogous to the action potential rate [15]. There are many activation functions; this project uses the Rectified Linear Unit, abbreviated as ReLU, defined in (2) as f(x) = max(0, x) and represented in Figure 3. The superiority of ReLU is based on empirical research, probably because it has a more useful range of response capacity [16]. In other words, ReLU allows all positive values to pass unchanged, but maps all negative values to 0. Although there are more recent activation functions, most current neural networks use ReLU or one of its variants [15]. In fact, any mathematical function can be used as an activation function. Assuming that Figure 3 represents the activation function (ReLU, sigmoid or any other), to define the activation function of the neural network model it is enough to use the one provided by the TensorFlow library [16].
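As a brief illustration, the following sketch contrasts the explicit definition in (2) with the ReLU function provided by the TensorFlow library; the numeric values are arbitrary examples.

import numpy as np
import tensorflow as tf

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0], dtype=np.float32)

# Explicit definition of (2): f(x) = max(0, x)
relu_manual = np.maximum(0.0, x)

# The same function as provided by TensorFlow
relu_tf = tf.nn.relu(x).numpy()

print(relu_manual)  # [0.  0.  0.  1.5 3. ]
print(relu_tf)      # identical result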

Selection of the learning algorithm
For modeling the artificial neural network, the Keras library [17] was used; it is written in Python, works together with TensorFlow [18] and allows the modeling of artificial neural networks. Keras offers two types of models; for this project, the sequential type was chosen, defined as a pipeline in which the raw data enters at the bottom and the predictions come out at the top. This is a useful conception in Keras, since operations traditionally associated with a layer (such as the activation function) can also be split out and added as separate layers [19]. Compilation requires that the parameters of the network be specified, as well as the optimization algorithm used to train the network and the function used to evaluate it [19]. Figure 4 shows a fragment of the neural network modeling code using, as mentioned above, the Keras and TensorFlow libraries together.
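Since the exact fragment is only shown in Figure 4, the following is a plausible reconstruction of the model definition under the choices described above (a sequential model with 9 input variables, one hidden layer of 9 ReLU neurons and a single linear output); the details may differ from the original code.

from tensorflow import keras
from tensorflow.keras import layers

def build_model(num_inputs=9):
    # Sequential model: 9 normalized input variables, one hidden layer of
    # 9 ReLU neurons (architecture-selection step), and a single linear
    # output for the average PM2.5 concentration
    model = keras.Sequential([
        layers.Dense(9, activation="relu", input_shape=(num_inputs,)),
        layers.Dense(1)
    ])
    return model

model = build_model()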
For this case, the RMSprop (Root Mean Square Propagation) optimizer was used, an algorithm suited to full-batch optimization [20]. RMSprop tries to solve the problem of the possible wide variation in the magnitude of the gradients: some gradients can be small and others huge, which makes it very difficult to find a single global learning rate for the algorithm. This adjustment is useful on flat segments of the error surface, since sufficiently large steps are taken even with small gradients; the step size adapts individually over time, so that learning is accelerated in the required direction [21].
Another parameter assigned during compilation is the loss function, which indicates the value of the prediction error, consisting of the sum of all the errors obtained; the metrics value was also defined, which consists of the list of metrics to be evaluated by the model during training and testing. Figure 5 shows a code fragment with the configuration of the aforementioned parameters. With the respective configurations of the neural network, the report shown in Figure 6 is obtained, displaying the layers, the number of neurons in each layer and the total number of training parameters.
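The compilation step of Figure 5 and the report of Figure 6 can be approximated with the sketch below; the learning rate and the specific list of metrics are assumptions, since they are not stated in the text.

from tensorflow import keras

# RMSprop optimizer, as chosen above; the learning rate is an assumed value
optimizer = keras.optimizers.RMSprop(learning_rate=0.001)

# model comes from the previous sketch
model.compile(
    loss="mse",             # loss function: mean squared error
    optimizer=optimizer,
    metrics=["mae", "mse"]  # metrics evaluated during training and testing
)

# Prints the layer-by-layer report with the number of trainable
# parameters, analogous to Figure 6
model.summary()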

Training and validation
In the context of neural networks, learning can be seen as the process of adjusting the free parameters of the network [22, 23]. Starting from a set of random synaptic weights, the learning process looks for a set of weights that allows the network to correctly perform a certain task. The learning process is iterative, refining the solution until a sufficiently good level of operation is reached [24]. The goal of any training algorithm is to minimize the mean squared error, MSE (or any other loss function), but experience has shown that networks tend to overfit the data [25]. For this reason, the collected data were divided 80-20: 80% of the data was randomly selected to be used as training data for the neural network and the remaining 20% for the testing or validation stages.
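A minimal sketch of this 80-20 split is shown below, using scikit-learn for illustration; the library actually used for the split is not stated in the text.

from sklearn.model_selection import train_test_split

# X_norm and y come from the earlier preprocessing sketches;
# 80% of the records are used for training, 20% for testing/validation
X_train, X_test, y_train, y_test = train_test_split(
    X_norm, y, test_size=0.2, random_state=42
)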

RESULTS
Two different machines were used for modeling the neural network: the first is a Windows desktop computer with 8 GB of RAM and an AMD Ryzen 2600 processor with six cores at 3.40 GHz; the Python 3 development environment and Jupyter were installed on this machine. The other development environment was a Google Colaboratory instance, where resources are assigned according to availability in a virtual machine running Jupyter. The resulting error on both machines was almost the same: the first machine reached an MSE of 3.56 and the second an MSE of 3.67.
During the training of the model, it is good practice to monitor the iterations so that they can be stopped early (early stopping), since this prevents the model from being trained more than necessary. Initially, 5000 epochs were defined for the training process; Figure 7 shows that after approximately 3000 epochs the error no longer decreases and instead remains stable [16]. In addition, for the first iterations the validation error is lower than the training error, while for the last iterations the opposite occurs.
In the result of Figure 7, the training process only stops after completing the total number of epochs. To accelerate the training process and to avoid overfitting, an "early stop" callback is used to check whether the error has continued to decrease over the last 50 epochs; if it has not, training is stopped, otherwise it continues. The result of using this option is presented in Figure 8, where it can be observed that at the end of the training process the validation and training errors tend to be the same. Finally, Figure 9 shows the result of the prediction of PM2.5 particulate matter after removing outliers. This result shows that the prediction is achieved only over a limited range of data, which may be related to the small amount of data available for training.
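The training loop with early stopping described here can be sketched as follows; the monitored quantity and the epoch budget follow the description above, while the remaining settings are assumptions.

from tensorflow import keras

# Stop training if the validation error has not improved in the last 50 epochs
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=50)

# model, X_train and y_train come from the earlier sketches
history = model.fit(
    X_train, y_train,
    epochs=5000,           # upper bound; training usually stops earlier
    validation_split=0.2,  # portion of the training data used for validation
    callbacks=[early_stop],
    verbose=0
)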

CONCLUSION
Currently, there is a suitable number of libraries that allow the design and modeling of artificial neural networks, and it has been shown that machine learning techniques are a suitable tool for modeling prediction and classification problems. In the first stage of the project, the collected information was cleaned of empty fields to avoid errors and then normalized; this aspect was important for achieving a satisfactory result. In this regard, at the development stage the standard normal distribution method was used to normalize the input data.
As mentioned above, the data was divided in such a way that records were kept for both training and testing; after adjusting the training of the model it was possible to reduce the final MSE from 30 to 3.56 in the prediction of the average concentration of PM2.5 pollutants. The design of the neural network was based on the geometric pyramid rule for the hidden layers, obtaining a suitable MSE. When there is not much training data, according to the literature, one alternative is to use a small network with few hidden layers to avoid overfitting. Another useful alternative to avoid overfitting is early stopping, which was considered in this work. The prototype of the neural network applied to predict PM2.5 pollutants is an example of the application of machine learning methods to the different aspects that affect human beings. The result of this work could be improved to achieve a better prediction with a larger amount of data.