Comparative study on machine learning algorithms for early fire forest detection system using geodata

Received Feb 27, 2020; Revised Apr 25, 2020; Accepted May 10, 2020

Forest fires have become a major risk for many countries. Scientific methods have emerged to minimize their impact and prevent this phenomenon, notably machine learning algorithms and decision-making geographical information systems (GIS). A competitive spatial prediction model for an early forest fire detection system using geodata can therefore be proposed. This model can help researchers predict forest fires and identify risk zones. A system applying machine learning algorithms to geodata will be able to notify interested parties and authorities in real time, providing alerts and map-based presentations built on geographical processing for a more efficient analysis of the situation. This research extends the application of machine learning algorithms for early forest fire prediction to detection and representation in GIS maps.

Using machine learning algorithms, a predictive model based on geographical and meteorological explanatory variables can predict the fire propagation area from a localized fire. Several machine learning algorithms are used in our approach and compared following a regression on data collected from control zones. Over the past years, several studies have been conducted on fire detection. An adaptive flame recognition and segmentation algorithm was used to detect fire in large buildings [10]. An image processing method was tested to detect smoke in videos [11]. The field of optical remote sensing has seen much progress, and object detection from images has become more accessible [12]. In this work we use different machine learning algorithms for forest fire prediction. The first is the support vector machine, a supervised classification model. The regression method is considered more efficient and more suitable for forest fires, given the division into clusters of all the areas likely to be affected [13].
We also compare with decision trees and neural networks, which are widely used in our context and whose performance relative to other methods has been shown in several recent studies [14]. Hybrid methods are suitable for forest fires because they combine the concepts of classification and regression: naive Bayes and decision trees are used together, which increases the precision of the method. A mobile agent in a wireless sensor network could be used to predict forest fires during surveillance [15]. Finally, we discuss event detection, which requires a different method combining clustering and the support vector machine (SVM) to model the propagation of a forest fire once it has started [16]. Recently, several hectares of forest have been threatened by forest fires. This is due to several factors; we focus especially on negligence by forest users, pollution, global warming and other environmental factors [17][18][19]. Modelling this type of phenomenon is not always easy. Given the particularities and the diversity of these factors, the causes enter the model as non-linear variables [20].
Several disciplines can come into play in treating this kind of problem. The intersection of computer science, geography, geology, physics and statistics is a means of optimizing the results obtained [21][22][23][24][25]. In particular, given the complex spatiotemporal nature of forest fires, machine learning algorithms prove to be the most judicious means [26]. The literature contains cases using artificial neural networks [21,27,28], random forests (RF) [29][30][31], support vector machines (SVM) [32], the multilayer perceptron neural network (MLP) [28,33], kernel logistic regression (KLR) [34,35] and naive Bayes [36,37]. Overview studies have also been conducted to show the potential of each of the methods [20,31][38][39][40]. It is therefore clear that the methods mentioned above are among the most suitable for solving forest fire problems, in particular given the possibility of analysing the pixels of images [41].
In addition, without any feature extraction, the classifiers use the input data directly, which has a direct and positive effect on classification accuracy. For much more complex problems, system performance can be improved by using deep learning (DL), which can obtain impressive results [42,43]. Deep learning goes beyond imagery to the recognition of objects and sounds, which will clearly help optimize the prediction for our forest fire problem [43]. The convolutional neural network (CNN) is one of the most powerful deep learning algorithms for forest fires [44,45], characterized by better classification of remotely sensed images [41,46] as well as by its use in landslide susceptibility mapping [47]. Unfortunately, none of these studies has evaluated CNN performance in predicting forest fires. The first law of geography [48] focuses on the pixels; for forest fires, each fire pixel is a spark, and within a span of time a pixel can ignite adjacent pixels [41]. The performance of the proposed model is tested using the MathWorks ToolsBox, an environment for the construction and evaluation of machine learning algorithms.

METHOD
This study combined GIS and machine learning algorithms to detect or predict the spatiotemporal dynamics of forest fire vulnerability in the northeastern region of Portugal. Northern Portugal is the most populous region in Portugal, ahead of Lisbon, and the third most extensive by area. A cartographic representation by fuzzy surfaces was developed for a forest region and evaluated against ground truth for two forest parameters: the basal area and the population. The representation, based on the Voronoi algorithm and fuzzy surfaces, estimates these variables better than a conventional thematic map, as shown in Figure 1. Since the representation based on fuzzy surfaces makes it possible to highlight local variations and to represent the borders between forest types as transition zones, it appears to be a more realistic and more useful representation of the real world than conventional thematic maps. As a result, system users will be able to analyze data that is closer to actual field conditions.

Classification is one of the major problems that researchers face when working on common problems in all sectors. In this article, we compare three major techniques among many: random forest, SVM and K-NN.
Random forest is a collection of decision trees applied to avoid the instability and the risk of overtraining that can occur with a single tree. It consists of pruning decision nodes without reducing the overall precision of the tree [49]. It requires tuning only two parameters, the number of trees and the set of attributes considered during the construction of each node, which simplifies the generation of decision forests [50].

Support vector machines are a classification method that maps a problem that is not linearly separable into a higher-dimensional space. They handle non-linear decision boundaries, and the use of boundary cases allows them to handle missing data [51]. For a binary classification of the data, a classification hyperplane is sought for the samples $(x_1, y_1), (x_2, y_2), (x_3, y_3), \dots, (x_n, y_n)$, where $x_i \in \mathbb{R}^d$, $y_i \in \{-1, +1\}$, and the vector $x_i$ is built directly from features of the sample [35]. The key of the SVM algorithm is to find a function $F$ such that, after training on the samples, any $x$ outside the sample obtains its corresponding $y$ through $F$. Training thus yields a hyperplane, described by $F$, that divides the learning samples into positive and negative categories and then separates other $x$ outside the sample. If the data is not linearly separable, the algorithm maps the data to a higher-dimensional feature space using a non-linear kernel mapping $\Phi(x)$, and an optimized hyperplane is then produced in that space. The algorithm is formulated as an optimization problem in which $w^{T}\Phi(x) + b = 0$ defines the separating hyperplane, $w$ is the normal vector of the hyperplane (its weight coefficients), $b$ is the offset of the hyperplane, and $C > 0$ is the penalty parameter of the error term.
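The optimization problem behind the SVM description above is not reproduced in the text. As a hedged reconstruction, the standard soft-margin formulation is consistent with the symbols the text names ($w$, $b$, $C$, $\Phi$); the slack variables $\xi_i$ and the sample count $n$ are assumptions introduced here and do not appear in the source.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\, w^{T} w + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i \left( w^{T} \Phi(x_i) + b \right) \ge 1 - \xi_i, \\
& \xi_i \ge 0, \qquad i = 1, \dots, n .
\end{aligned}
```

Under this formulation a new sample $x$ would be classified as $F(x) = \operatorname{sign}\left(w^{T}\Phi(x) + b\right)$. A larger $C$ penalizes training errors more heavily, while a smaller $C$ favours a wider margin.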
The K-NN algorithm finds the k training samples closest to the target object being classified. It determines the dominant class among these k samples and assigns that class to the target object, where k is the number of neighbouring training samples considered. The basic premise of K-NN is that samples close to one another in feature space share the same properties and belong to the same class, this class being that of the k closest samples [52]. Here $X_u$ belongs to the category of (1).

The machine learning methods defined above are applied according to the model in Figure 2. The models are developed and tested using the MathWorks ToolsBox, an environment for building and evaluating machine learning algorithms.
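The K-NN voting rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MathWorks implementation: the function name `knn_predict`, the Euclidean distance, and the toy feature values (temperature, wind speed) are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every training sample
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest samples
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The dominant class among the neighbours is assigned to the query
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (temperature in deg C, wind in km/h)
# Class 0 = small fire, class 1 = large fire
train_points = [(18.0, 5.0), (20.0, 7.0), (19.0, 6.0),
                (31.0, 16.0), (33.0, 15.0), (30.0, 18.0)]
train_labels = [0, 0, 0, 1, 1, 1]

hot_windy = knn_predict(train_points, train_labels, (32.0, 17.0), k=3)
cool_calm = knn_predict(train_points, train_labels, (19.0, 6.5), k=3)
```

With an odd k and two classes the vote cannot tie. In practice the features would normally be scaled first, since the distance is otherwise dominated by the attribute with the largest numeric range.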

RESEARCH METHOD
Our data source is the Fire Weather Index (FWI) system record of the areas burned during fires between 2000 and 2003 in Portugal. The data contain a clear description of the climatic conditions. Such data are difficult to collect from the local sensors available in Portugal, given the number of stations. They also contain additional temporal and spatial values such as days, months, and coordinates of burned areas. The index values calculated by the FWI system are a direct indicator of fire intensity. Examining the data shows, for example, that the risk of fire is high when the wind blows at around 15 km/h.
Our method is based mainly on dividing the data into several classes of equal size. Each data item is treated separately, so we can use the nearest-neighbour method or the average of the values to complete the task. The output variable is the burned area. Its distribution has a positive skew: the majority of area values are zero, which reflects the fact that most forest fires are small. This asymmetry is also observed in other countries [53]. The constraint is to increase precision while decreasing asymmetry. We add a class column as the response variable, with two values: 0 for fire areas of less than 50 ha and 1 for areas greater than 50 ha. The correlation matrix is used to find the meaningful attributes. We note that the attributes DC and area have the most positive correlation with the response variable, while RH has the most negative correlation with it.
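The labelling and correlation steps above can be illustrated as follows. This is a sketch under stated assumptions: the records are invented values rather than rows of the real dataset, the helper names `label_fire` and `pearson` are hypothetical, and a fire of exactly 50 ha is placed in class 0 because the text does not specify that boundary case.

```python
import math

def label_fire(area_ha, threshold_ha=50.0):
    """Return 1 for a burned area above the threshold, otherwise 0.

    Assumption: an area of exactly 50 ha falls in class 0.
    """
    return 1 if area_ha > threshold_ha else 0

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical records: DC (drought code), RH (relative humidity, %), area (ha)
records = [
    {"DC": 90.0,  "RH": 70.0, "area": 0.0},
    {"DC": 300.0, "RH": 55.0, "area": 0.0},
    {"DC": 450.0, "RH": 48.0, "area": 3.5},
    {"DC": 700.0, "RH": 30.0, "area": 75.0},
    {"DC": 820.0, "RH": 22.0, "area": 210.0},
]

# Add the binary response variable described in the text
classes = [label_fire(r["area"]) for r in records]

# One column of the correlation matrix: each attribute against the class
corr_with_class = {
    name: pearson([r[name] for r in records], classes)
    for name in ("DC", "RH", "area")
}
```

On these toy values DC and area correlate positively with the class and RH negatively, mirroring the sign pattern reported in the text. The magnitudes carry no meaning because the data are made up.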

RESULTS AND DISCUSSION
In this step, we must choose the best predictive model to use. The basic comparison parameter is accuracy. To assess the predictive machine learning models, we start with the confusion matrix, which helps us calculate the accuracy of each model using the formula $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$, where $TP$, $TN$, $FP$ and $FN$ are the numbers of true positives, true negatives, false positives and false negatives. The confusion matrix is a table of actual versus predicted values. It is one of the easiest ways to find accuracy, and it also helps to avoid overfitting. Figure 3 presents the confusion matrix values for each ML model. The RF model produces a positive predictive value of 100%: the prediction rate for both small (Class=0) and large (Class=1) fires is 100%, while the false discovery rate is 0%. For SVM and K-NN, the error rates are respectively 35% and 45% for small fires and 29% and 45% for large fires. Consequently, the classification performance of the SVM and K-NN models is lower. The prediction accuracy of random forest is notable; hence, it reduces the noise in the dataset. The overall accuracy of RF is 100%. This shows that RF has the best prediction results compared with SVM and K-NN, which obtain 67.7% and 54.9% respectively (UCI).

The receiver operating characteristic (ROC) curve summarizes the performance of the model by assessing the trade-off between sensitivity and specificity. We must always consider p > 0.5 when drawing the ROC curve, because we are concerned with the success rate. The area under the curve (AUC), or concordance index, is a metric that summarizes the performance shown by the ROC curve. The accuracy results of the three methods were obtained by simulating three machine learning algorithms on data from the Montesinho Park in Portugal.
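The accuracy, positive predictive value and false discovery rate discussed above all derive from the four cells of a binary confusion matrix. The sketch below shows that computation; the function names and the actual and predicted labels are hypothetical and do not reproduce the values in Figure 3.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted large fire, was large
            else:
                fp += 1  # predicted large fire, was small
        else:
            if a == positive:
                fn += 1  # predicted small fire, was large
            else:
                tn += 1  # predicted small fire, was small
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """(TP + TN) / (TP + TN + FP + FN), as in the formula in the text."""
    return (tp + tn) / (tp + tn + fp + fn)

def positive_predictive_value(tp, fp):
    """Share of positive predictions that are correct."""
    return tp / (tp + fp)

def false_discovery_rate(tp, fp):
    """Share of positive predictions that are wrong (1 - PPV)."""
    return fp / (tp + fp)

# Hypothetical labels: 0 = small fire, 1 = large fire
actual    = [0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 0, 1, 1, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
acc = accuracy(tp, tn, fp, fn)
ppv = positive_predictive_value(tp, fp)
fdr = false_discovery_rate(tp, fp)
```

A model with a positive predictive value of 100% necessarily has a false discovery rate of 0%, since the two quantities sum to one.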
According to Table 1, the classification performance of the three methods is as follows: random forest reaches 100%, while SVM provides 74% and K-NN offers 58%, which represents a limit in terms of classification.
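For the ROC analysis mentioned in this section, the AUC can be computed directly as a concordance index: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The sketch below assumes the classifier outputs a score per sample; the function name `auc` and the example scores are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve computed as a concordance index.

    labels: 1 for the positive class (large fire), 0 for the negative class.
    scores: classifier scores, higher meaning more likely positive.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0   # positive ranked above negative
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(positives) * len(negatives))

# Hypothetical scores for four samples
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
example_auc = auc(labels, scores)
```

An AUC of 1.0 means every positive sample is ranked above every negative one, while 0.5 corresponds to random ranking. The pairwise loop is quadratic in the number of samples, which is acceptable for a dataset of a few hundred fires.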

CONCLUSION
The field of data science is booming, which pushes researchers to develop increasingly complex problem-solving methods. Our approach, based mainly on extracting data from existing databases, used three different machine learning algorithms. Among the K-NN, SVM and RF algorithms, we have shown that RF has the best accuracy. This algorithm has a set of data detection and recognition assets that make spatial manipulation much easier for the detection of at-risk or burnt areas. The choice of the algorithm with the highest precision is justified by the simulation of the different algorithms and the comparison of the experimental results obtained. In addition, the system, using data-collecting sensors, generates large volumes of data, which are analyzed by the machine learning methods cited and compared in this study to predict with high accuracy the amount of land burned in a forest in the northeast region of Portugal. The use of technology is thus the strength of this prediction system. Geographic information systems and machine learning can help decision makers minimize the natural and human damage caused by forest fires. The use of these methods increasingly optimizes the treatment of this phenomenon and of others like it. The actors in the sectors concerned are therefore invited to join forces in fighting against late interventions.