Predictive fertilization models for potato crops using machine learning techniques in Moroccan Gharb region

ABSTRACT


INTRODUCTION
The two most important and fundamental resources for life on Earth are soil and water. Morocco's soils and rivers are becoming increasingly degraded, and this degradation is accelerating. Because of its rich soils and easy access to water, the Gharb region of Morocco is widely known for its intensive agriculture. However, after extensive use of these resources, the quality of its soils and rivers needs to be evaluated. The Atlantic Ocean has a significant influence on the Gharb's climate, which is characterized by a sub-humid bioclimatic zone with high air humidity in winter and high temperatures in summer.
Various factors can affect fertilization for optimal tuber yield, including the type and quality of the soil [1], [2], organic fertilizers [3], [4], previous crops [5]-[9], weather [10], irrigation [11], the timing and placement of the applied fertilizer [12], pests and diseases [13], and genetic factors. For instance, soil with a high organic matter content tends to retain nutrients better and provides a more suitable environment for microbial activity, which aids nutrient availability. Organic fertilizers also enhance soil quality, improve plant nutrient uptake, and reduce environmental impacts compared with synthetic fertilizers. Furthermore, previous crops, weather patterns, and irrigation practices influence nutrient cycling and availability in the soil, ultimately affecting crop growth and development.

Int J Elec & Comp Eng
ISSN: 2088-8708  Predictive fertilization models for potato crops using machine learning techniques in … (Said Tkatek) 5943

In addition to these factors, various other factors can influence crop growth and development, including day length, photoperiod, water availability, intercepted radiation, air temperature, precipitation, root development, and crop management. These factors interact in complex ways, making it challenging to optimize crop growth and development for optimal yield [14], [15]. However, understanding the various factors that influence crop growth and development is critical for designing effective fertilization and crop management strategies [1], [2], [16]-[19].
Growers frequently over-fertilize because of the potential financial loss from under-fertilizing [20]. While phosphorus (P) and nitrogen (N) contribute to surface water eutrophication [21] and nitrate pollution [22], respectively, potassium (K) has no documented negative effects on freshwater or drinking water quality. There have been attempts to combine fertilizer trial findings using multilevel modeling that incorporates soil, climatic indices, and management factors [23], or meta-analysis to determine the optimal N rate for specific soil texture and pH groups [24]. Meta-analysis is a statistical technique that pools data from multiple studies to draw conclusions about a specific research question. Even when field trials were able to locate nutritional maxima, these maxima cannot be extrapolated beyond the conditions of the specific studies [25].
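As an illustration of the pooling step, the sketch below computes a fixed-effect, inverse-variance weighted pooled estimate, which is one common meta-analysis model. The studies cited above may have used a different model (for example, a random-effects model); the function name `fixed_effect_pool` and all numbers are hypothetical.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Pool per-study estimates with inverse-variance (fixed-effect) weights.

    Returns (pooled_estimate, pooled_standard_error).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error for each effect estimate")
    # Each study is weighted by the inverse of its variance, w_i = 1 / SE_i^2,
    # so more precise studies count more in the pooled value.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical optimal N rates (kg/ha) and standard errors from three trials;
# these numbers are illustrative only, not data from this study.
estimate, se = fixed_effect_pool([150.0, 170.0, 160.0], [10.0, 20.0, 10.0])
print(f"{estimate:.2f} +/- {se:.2f}")  # 156.67 +/- 6.67
```

The least precise trial (SE = 20) pulls the pooled value only slightly toward 170, which is the intended effect of inverse-variance weighting.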
According to El-Aziz et al. [26] and Cao et al. [27], fertilizers are a primary driver of plant development; they are applied to the soil to enhance natural growth. Each of the three components of NPK (nitrogen, phosphorus, and potassium) is crucial for plant growth. Smart agriculture applications can use estimates of the proportion of ground cover to treat crops efficiently [26], [27]. Table 1 lists these three major macronutrients, which are considered crucial for plant survival and development, together with their roles. Although the quantity and quality of experimental data are continually increasing, researchers still struggle to integrate and evaluate these data and to draw well-founded conclusions from them. Machine learning, a more recent technique, can help find patterns and rules in massive amounts of data. Instead of explicitly describing the intermediate processes, as a mechanistic model would, it produces predictions based only on input data [28].
In this study, we propose that the primary factors influencing fertilizer requirements for potatoes are genetics, environment, and local land management practices. To predict the economically and agronomically optimal fertilizer doses, we used several machine learning algorithms: k-nearest neighbor (KNN), linear support vector machine (SVM), naive Bayes (NB) classifier, decision tree (DT) regressor, random forest (RF) regressor, and eXtreme gradient boosting (XGBoost). We developed these models and evaluated their performance in order to determine which one most effectively forecasts the N, P, and K requirements of potatoes, which is the main focus of this study.

METHOD
2.1. Data set
Data collection is typically the first and most important stage of any study, since it serves as the foundation for everything that follows. Gathering data requires identifying an appropriate source, which may be existing files or the internet, where a web scraping tool can efficiently extract large amounts of data. For this research, we obtained data both from the web and from the original database owner, Kaggle, a subsidiary of Google LLC. Table 2 displays the databases we gathered for our research.

Summarizing data
A correlation matrix is a table that shows the correlation coefficients between different variables; each cell represents the relationship between two variables. A correlation matrix can be used to summarize data, as an input to a more complex analysis, or as a diagnostic for further research. Figure 1 displays the correlation coefficients among the six features. Google Colab was used for our research.
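The following is a minimal plain-Python sketch of how such a matrix of Pearson correlation coefficients can be computed. The feature names and values are hypothetical stand-ins, not the six features or data behind Figure 1; in a Colab notebook the same table is usually obtained from a dataframe library.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / math.sqrt(ss_x * ss_y)


def correlation_matrix(columns):
    """Return {feature: {feature: r}} for a dict mapping feature -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical readings for three illustrative features.
features = {
    "temperature": [20.0, 22.0, 25.0, 27.0, 30.0],
    "humidity": [80.0, 75.0, 70.0, 62.0, 55.0],
    "soil_moisture": [30.0, 28.0, 31.0, 27.0, 29.0],
}
matrix = correlation_matrix(features)
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

The diagonal is always 1 and the matrix is symmetric; as noted in the text, a coefficient near -1 or 1 only indicates a strong linear relationship.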

Training models
Before developing the models, we carried out a correlation study between the variables. The correlation coefficients shown in Figure 1 describe the relationships among the six independent variables (features). Some features are visibly strongly associated with others. However, this is only a linear correlation analysis, which may not capture how the features interact. More complex prediction models are therefore needed, and several machine learning models are covered in the sections that follow. Six machine learning models were trained to identify the optimal one: KNN, linear SVM, NB classifier, DT regressor, RF regressor, and XGBoost.
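KNN is the simplest of the six models to write out by hand. Below is a minimal sketch of k-nearest-neighbor regression in plain Python, assuming Euclidean distance and an unweighted mean of the neighbors' targets. The paper does not specify its implementation, so this is not the study's code; the function name `knn_regress` and the data are hypothetical.

```python
import math


def knn_regress(train_X, train_y, query, k=3):
    """k-nearest-neighbor regression: average the targets of the k closest samples."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # Pair every training target with its Euclidean distance to the query,
    # then sort by distance only.
    by_distance = sorted(
        zip((math.dist(x, query) for x in train_X), train_y),
        key=lambda pair: pair[0],
    )
    nearest_targets = [y for _, y in by_distance[:k]]
    return sum(nearest_targets) / k


# Hypothetical (feature vector -> fertilizer dose) pairs, for illustration only.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = [10.0, 12.0, 11.0, 50.0, 52.0]
print(knn_regress(train_X, train_y, [0.2, 0.2], k=3))  # 11.0
```

Because KNN relies on raw distances, features measured on very different scales (for example, temperature versus nutrient concentration) would normally be standardized first.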

XGBOOST algorithm
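XGBoost, a scalable tree boosting method introduced by Chen and Guestrin [29], was used extensively in Kaggle's Higgs boson signal identification challenge. It has recently drawn a lot of attention because of its efficiency and high prediction accuracy. XGBoost is an improved version of the gradient-boosted decision tree (GBDT) [30], a classification and regression algorithm that consists of multiple decision trees. However, XGBoost differs from GBDT in a few ways. First, whereas GBDT uses a first-order Taylor expansion of the loss function, XGBoost uses a second-order Taylor expansion and adds a regularization term [31] to the objective function to limit model complexity and prevent overfitting. Second, whereas gradient boosting (as in GBDT) performs gradient descent in function space, XGBoost's second-order Taylor approximation of the loss function links it to the Newton-Raphson method.

The general XGBoost formulation is as follows. Assume a dataset $D = \{(x_i, y_i)\}$ $(i = 1, 2, \dots, n)$ and a model with $K$ trees. The model prediction $\hat{y}_i$ is:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F} \tag{1}$$

where $f_k$ is a regression tree and $\mathcal{F}$ is the hypothesis space:

$$\mathcal{F} = \left\{ f(x) = w_{q(x)} \right\}, \qquad q: \mathbb{R}^m \to \{1, \dots, T\},\ w \in \mathbb{R}^T \tag{2}$$

Here $q(x)$ is the leaf node to which sample $x$ is assigned in (2), and $w_{q(x)}$ is the score of that leaf. The prediction at the $t$-th iteration is:

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i) \tag{3}$$

Therefore, the objective function is:

$$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} L\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \tag{4}$$

$$\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \tag{5}$$

where $L$ is the loss function and $\Omega(f_t)$ represents the complexity of the model; $w_j$ is the score of leaf $j$, $T$ is the number of leaf nodes, and $\gamma$ and $\lambda$ are regularization parameters.

Applying a second-order Taylor expansion to (4), with $g_i = \partial_{\hat{y}^{(t-1)}} L(y_i, \hat{y}_i^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} L(y_i, \hat{y}_i^{(t-1)})$, and dropping the terms that do not depend on $f_t$, gives:

$$\mathrm{Obj}^{(t)} \simeq \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \tag{6}$$

Grouping the samples by leaf, with $I_j = \{ i \mid q(x_i) = j \}$, $G_j = \sum_{i \in I_j} g_i$, and $H_j = \sum_{i \in I_j} h_i$, the final objective function is:

$$\mathrm{Obj}^{(t)} = \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2} (H_j + \lambda) w_j^2 \right] + \gamma T \tag{7}$$

After optimizing the objective function, the best result is:

$$w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad \mathrm{Obj}^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \tag{8}$$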
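For a fixed tree structure, the regularized XGBoost objective has a closed-form optimum: each leaf $j$ receives the weight $w_j^* = -G_j/(H_j+\lambda)$ and contributes $-\frac{1}{2}G_j^2/(H_j+\lambda)$ to the objective, plus a penalty $\gamma T$ for $T$ leaves, where $G_j$ and $H_j$ are the sums of the first and second derivatives of the loss over the samples in leaf $j$. The sketch below evaluates this for squared-error loss $L = \frac{1}{2}(y - \hat{y})^2$, for which $g_i = \hat{y}_i^{(t-1)} - y_i$ and $h_i = 1$. It compares one leaf against one given split only; the real XGBoost library also applies a learning rate and searches over many candidate splits. The function names, data, and $\lambda$, $\gamma$ values are hypothetical.

```python
def leaf_objective(grads, hessians, lam):
    """Optimal leaf weight w* = -G/(H + lam) and its objective term -G^2 / (2(H + lam))."""
    G, H = sum(grads), sum(hessians)
    w_star = -G / (H + lam)
    contribution = -0.5 * G * G / (H + lam)
    return w_star, contribution


def structure_score(leaves, lam, gamma):
    """Optimal objective of a tree: sum of leaf terms plus gamma * (number of leaves)."""
    return sum(leaf_objective(g, h, lam)[1] for g, h in leaves) + gamma * len(leaves)


# Squared-error loss L = (y - y_hat)^2 / 2 gives g_i = y_hat_i - y_i and h_i = 1.
y = [1.0, 1.2, 3.0, 3.4]
y_prev = [2.0, 2.0, 2.0, 2.0]  # predictions from the previous iteration
g = [p - t for p, t in zip(y_prev, y)]
h = [1.0] * len(y)

lam, gamma = 1.0, 0.1  # hypothetical regularization settings
one_leaf = structure_score([(g, h)], lam, gamma)
two_leaves = structure_score([(g[:2], h[:2]), (g[2:], h[2:])], lam, gamma)
print(round(one_leaf, 3), round(two_leaves, 3))  # 0.064 -1.3
w_left, _ = leaf_objective(g[:2], h[:2], lam)
print(round(w_left, 3))  # -0.6
```

Splitting the four samples into two leaves lowers the objective from 0.064 to -1.3, so this split would be kept despite the extra $\gamma$ penalty. The left leaf moves its predictions from 2.0 toward the targets 1.0 and 1.2 by -0.6, rather than by the full mean residual, because $\lambda$ shrinks the weight.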

Evaluation of model performance
The coefficient of determination (R²), mean absolute error (MAE), and root-mean-square error (RMSE) were used to evaluate the models' predictive ability. The models are briefly described in Table 3. Several accuracy measures from machine learning and statistics can be used to evaluate the error rate of a prediction model. Accuracy evaluation in regression analysis rests on comparing the actual target with the predicted one and describing the model's errors and predictive capacity with metrics such as MAE, MSE, RMSE, and R².
Regression analysis frequently uses the MSE, MAE, RMSE, and R² metrics to evaluate model performance and prediction error. MAE is the average, over the data set, of the absolute differences between the original and predicted values. The mean squared error (MSE) is the average of the squared differences between the original and predicted values. The root mean squared error (RMSE) is the square root of the MSE. Finally, the coefficient of determination R² measures how well the predicted values match the observed values; it usually lies between 0 and 1 and is often expressed as a percentage, and a higher value indicates a better model. These metrics are calculated as follows, where $y_i$ is the observed value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the observed values, and $n$ the number of samples:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

Table 4 displays our results (MSE, R², MAE, and RMSE) for the six machine learning algorithms (KNN, SVM, NBC, DT, RF, and XGBoost). In this section, we compare and discuss these results in order to choose the best algorithm. We then tested our approach in the field with the aid of the internet of things (IoT). The effectiveness of a predictive model is evaluated on a set of unseen data. The term "goodness of fit" describes how closely the model's predicted values match the actual or observed values. Overfit models perform well during training but poorly during testing, whereas underfit models perform poorly during both training and testing.

Table 4 shows the MAE, MSE, RMSE, and R² results. Three of the models (DT, RF, and XGBoost) clearly outperform the others on all four metrics, particularly XGBoost (MAE=0.90, MSE=0.93, RMSE=0.96, and R²=0.97). By contrast, the accuracy of the other three models (KNN, SVM, and NBC) is weak on all metrics. This is also consistent with the project's current condition.
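The four metrics described above can be computed directly from paired observed and predicted values. The plain-Python sketch below is an illustration only; the function name `regression_metrics` and the sample values are hypothetical and are not taken from Table 4.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for paired observed/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical observed vs. predicted doses, not results from Table 4.
scores = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 10.0])
print({name: round(value, 3) for name, value in scores.items()})
# {'MAE': 0.5, 'MSE': 0.375, 'RMSE': 0.612, 'R2': 0.925}
```

Computed on held-out test data, R² can fall below 0 when a model predicts worse than simply using the mean of the observed values, which is one symptom of the overfitting described above.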
According to these results, the XGBoost model has the best R² value as well as the best MSE, MAE, and RMSE values. Overall, XGBoost outperforms the other machine learning models, so it was chosen as the machine learning algorithm to be deployed on the Raspberry Pi 3. In precision agriculture, machine learning models may replace statistical models as enormous amounts of data are compiled into observational data sets and used to make fertilizer recommendations. Since reliable weather data for the coming growing season are not available in advance, using historical weather data was a successful technique for evaluating model performance under real-world conditions. Additionally, we concentrated on readily accessible data obtained from routine investigations as predictors, rather than on models of fundamental processes. Our model might also be used to optimize factors other than fertilizer, such as planting density or growing season length.

Hardware implementation
The conventional approach, in which the soil is tested in a laboratory before farmers are told to start fertilizing the field, has clear limitations. This study therefore proposes an IoT system that monitors the nutrients present in the soil and notifies the farmer. Figure 2 depicts the automatic fertilization process used by our system. Managing the fertilization program may be challenging when the predicted N, P, or K doses are extremely low, because farmers frequently believe that the cost of over-fertilization is negligible compared with the cost of under-fertilization.

Figure 2. Schema of the IoT implementation
The sensors measure temperature, humidity, soil moisture, nitrogen, phosphorus, and potassium. These data are sent to a Raspberry Pi 3, where our XGBoost model analyzes them to decide the name and exact quantity of fertilizer to apply; a notification is then sent to the farmer. Integrating this application with an internet connection required several new technologies, such as sensors and Arduino boards; it would also be crucial to create an application that allows the system to be operated remotely. We aim to improve our results by combining methods, including genetic algorithms [32]-[34]. The versatility of drones makes them a valuable tool for agricultural purposes, particularly in areas where infection risks are high, as they allow efficient and safe remote interventions [35].
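The decision step on the Raspberry Pi can be sketched as a small function that turns one set of sensor readings into the message sent to the farmer. This is a hypothetical sketch: the class and function names, the feature order, and the kg/ha unit are assumptions, the trained XGBoost model is replaced by any callable returning (N, P, K) doses, and the message transport (for example, SMS) is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class SensorReading:
    """One set of field measurements (hypothetical field names and units)."""
    temperature_c: float
    humidity_pct: float
    soil_moisture_pct: float
    nitrogen: float
    phosphorus: float
    potassium: float

    def as_features(self) -> List[float]:
        # The feature order must match the order used when the model was trained.
        return [self.temperature_c, self.humidity_pct, self.soil_moisture_pct,
                self.nitrogen, self.phosphorus, self.potassium]


# Any callable mapping a feature vector to (N, P, K) doses, e.g. a wrapper
# around the trained model running on the Raspberry Pi.
Predictor = Callable[[List[float]], Sequence[float]]


def build_notification(reading: SensorReading, predict: Predictor,
                       min_dose: float = 0.0) -> str:
    """Turn a model prediction into the text message sent to the farmer."""
    doses = predict(reading.as_features())
    needed = [(name, dose) for name, dose in zip(("N", "P", "K"), doses)
              if dose > min_dose]
    if not needed:
        return "No fertilizer needed at this time."
    return "Apply " + ", ".join(f"{name} {dose:.1f} kg/ha" for name, dose in needed)


# Usage with a stand-in predictor; a real deployment would call the trained model.
reading = SensorReading(24.0, 70.0, 28.0, 35.0, 60.0, 40.0)
print(build_notification(reading, lambda features: (40.0, 0.0, 12.5)))
# Apply N 40.0 kg/ha, K 12.5 kg/ha
```

Passing the predictor in as a parameter keeps the messaging logic independent of how the model is loaded, which also makes it easy to test without the sensors attached. The `min_dose` threshold is one way to suppress notifications for the very low predicted doses mentioned above.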
In the future, our work could be extended in several directions, such as a smart urban agriculture service concept based on an open IoT platform [36], [37]; an open-source IoT platform built on NodeMCU, Node-RED, and MQTT (Message Queuing Telemetry Transport) [38]; or an automated instrument for monitoring water availability that can help the farmer monitor the farm [39]. There is also low-cost wireless sensor network (WSN) technology for detecting soil, environmental, and crop characteristics; when properly analyzed and combined with weather forecasts, these data can be used to plan future agricultural operations based on agronomic models built into the software platform [40]. We could also use the waterfall model to develop an application for automatic schedule-based irrigation distribution and monitoring to reduce water loss [41]. IoT combined with a fiber capillary irrigation system that calculates climatic water demand from weather conditions may also provide precise irrigation [42]. Using IoT and machine learning in irrigation is very important for minimizing water loss [43].

CONCLUSION
Regular assessment of soil nutrients in agricultural fields is challenging because it relies on manual laboratory testing. As a result, farmers may neglect the nutrient levels in their soil and apply fertilizer at the wrong time and in the wrong quantity. The proposed system uses the developed NPK sensor and machine learning to inform farmers by SMS when essential soil nutrients, such as nitrogen, phosphorus, and potassium, are insufficient. Experiments were carried out to assess the functionality of the developed IoT system and verify that it serves its intended purpose. Based on the experimental results, the proposed system is a low-cost, accurate, and intelligent IoT system that automatically informs the farmer, through messages, which fertilizer to apply and in what quantity at the appropriate time.
Many food producers are trying to manage agricultural hazards such as diseases and pests, which are exacerbated by climate change, monocropping, and increased pesticide use. Detecting these problems as early as possible is critical. With the help of artificial intelligence (AI), diseases and pests can be detected before they become visible to inspection, which helps increase production. This study evaluated machine learning approaches as an alternative to the statistical models or meta-analyses often used at the regional level, in order to make potato fertilizer recommendations at the local level. An extensive dataset of field trials was used to tailor the machine learning models, with cultivar traits, soil properties, weather indicators, and the amounts of nitrogen, phosphorus, and potassium fertilizer applied as predictive variables.