Comparing machine learning and deep learning classifiers for enhancing agricultural productivity: case study in Larache Province, Northern Morocco

ABSTRACT


INTRODUCTION
Agriculture is the most common occupation in Morocco, accounting for nearly 15% of the Moroccan gross domestic product (GDP). According to the Moroccan Ministry of Agriculture, agriculture is also the primary source of income for approximately 15 million Moroccan farmers. As Morocco's population grows, so does the demand for plant and crop products. The quantity and quality of food must therefore be increased, in part by protecting crops from plant diseases [1], which makes precise plant disease detection a critical process for farmers [2]. To this end, the Moroccan government launched the Green Morocco Plan (GMP) to modernize the agricultural sector and enhance its productivity through modern technologies. The biological Golden Goji farm in the Larache Province (Northern Morocco) is the case study of this article. The farm practices organic agriculture, producing its crops without agricultural chemicals. Its main crop is mint, and two issues limit improvements in mint productivity on this farm. The first is climate change, which has recently affected the Larache Province and harms crop productivity. The second is the presence of biotic factors, such as snails and other insects, on the mint plants, as shown in Figure 1. Both issues cause disease in the mint plant. For this reason, the farmers of the biological Golden Goji farm must inspect the plants daily to detect mint diseases and biotic factors. This traditional technique is time-consuming and gives unsatisfactory results, leading to significant losses in mint productivity. Early disease detection is therefore critical, and it is one of the most challenging tasks in precision agriculture. Artificial intelligence (AI) makes early disease detection feasible because AI models can learn to recognize disease symptoms automatically.
In this study, our objective is to use AI techniques to detect mint plant disease at an early stage and to minimize the risk of widespread disease on the biological Golden Goji farm. Machine learning (ML) and deep learning (DL) classifiers are the most popular algorithms for identifying plant diseases. ML is an important subset of AI comprising mathematical algorithms that learn from and analyze data in order to make adequate decisions [3]. DL is in turn a subset of ML based on deep (multi-layer) neural networks [4], and it has achieved significant results in various applications, particularly image processing. Accordingly, previous works have applied ML and DL to plant disease detection. In [5], the authors applied five machine learning algorithms, namely random forest (RF), support vector machine (SVM), decision trees (DT), naive Bayes (NB), and k-nearest neighbors (KNN), to detect tomato leaf diseases; in their experiments, RF performed best, with an accuracy of 89%. Noola and Basavaraju [6] proposed an enhanced k-nearest neighbor (EKNN) classifier to detect corn leaf diseases and reached an accuracy of 99.86%. Prottasha and Reza [7] developed a depthwise separable convolutional neural network to identify rice plant diseases, and their approach outperformed other convolutional neural network (CNN) architectures. Too et al. [8] employed various deep learning architectures to identify plant diseases and found that DenseNet performed best, with an accuracy of 99.75%.
In this article, we compare machine learning and deep learning classifiers to find the best classifier for identifying mint plant disease at an early stage, so that farmers can take appropriate measures before the plant is more severely affected. Based on previous works, we selected DenseNet 201, VGG 16, MobileNet, Xception, and Inception-V3 as DL classifiers and SVM, RF, KNN, logistic regression (LoR), and DT as ML classifiers. Standard performance measures, namely the confusion matrix, accuracy, precision, recall, and F1 score, were used to evaluate the effectiveness of each classifier. The main contributions of this study are as follows:
− This study compares five machine learning models and five deep learning models for classifying mint plant diseases in order to find the best classifier for disease detection.
− Various works in the literature found that deep learning performs well in plant disease detection [9], [10]. This study supports those findings, showing that DL is a better choice than ML for detecting serious plant diseases, since DL can perform feature extraction on its own. Our results can therefore guide future researchers in this area.
− The best classifier in our experiments is DenseNet 201. Building on it, we propose an automated system for increasing mint productivity in this province. The system alerts the farmer when a disease appears on the plant or when biotic factors are detected on the plant's leaves. It therefore reduces the number of laborers needed and allows farmers to monitor the health of their mint plants easily and quickly and take the necessary steps.
− Furthermore, this work can help the farmers of this province increase their yields and reduce losses by detecting mint plant diseases at an early stage, which may in turn contribute to Moroccan GDP.

The article is organized as follows. Section 2 describes the mint plant dataset and explains all the proposed deep learning and machine learning classifiers. Section 3 presents the results achieved by the proposed classifiers and discusses the experimental tests. Finally, Section 4 gives the conclusion and future work.

METHOD
2.1. Study area and data acquisition
The Larache Province belongs to the Tangier-Tetouan-Al-Hoceima Region (Northern Morocco) and ranks first in agricultural productivity within this region thanks to its large farming areas and the Loukkos perimeter located within it. The Larache Province therefore provides a high percentage of national agricultural production [11]. Mint is a valuable crop that is essential to Moroccan daily life and is consumed in large quantities. For this reason, we propose an automated early detection system for mint plants to help the farmers of our region take appropriate measures to reduce mint productivity losses. In this study, the mint dataset was collected from the biological Golden Goji farm in collaboration with the Al-Amal association of the Larache Province (Northern Morocco). The images were gathered manually with an iPhone 7, as illustrated in Figure 2, and include both unhealthy mint, as shown in Figure 2(a), and healthy mint, as shown in Figure 2(b). After cleaning the data, 337 images remained; we split them randomly into a training set (80%) and a test set (20%).
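The random 80/20 split can be sketched with scikit-learn's `train_test_split`. This is a minimal sketch, not the authors' code: the image arrays are random placeholders, and the 64×64 image size, the label encoding (0 = unhealthy, 1 = healthy), and `random_state=42` are assumptions. Because `train_test_split` rounds the test share up, 20% of 337 images gives 68 test images and 269 training images.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 337 cleaned images (the 64x64 size is illustrative)
# and binary labels with a hypothetical encoding 0 = unhealthy, 1 = healthy.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(337, 64, 64, 3), dtype=np.uint8)
labels = rng.integers(0, 2, size=337)

# Random 80/20 split; the test size is rounded up, so ceil(0.2 * 337) = 68.
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, random_state=42, shuffle=True
)

print(len(X_train), len(X_test))  # 269 68
```

Passing `stratify=labels` would keep the healthy/unhealthy ratio equal in both sets; the study only states that the split was random.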
We then applied several data augmentation methods to increase the number of images, reduce overfitting, and improve the generalization ability of the DL models [12]. In addition, we applied transfer learning, starting from DL classifiers that were already pre-trained. Both techniques can improve the performance of DL classifiers, particularly on small datasets such as ours.
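The study does not list its augmentation methods, so the sketch below uses simple geometric transforms (horizontal flip, vertical flip, and a 90-degree rotation) as assumed examples, implemented with NumPy. The helper names `augment` and `augment_labels` are hypothetical. Augmentation is applied to the training set only, so the test images stay untouched.

```python
import numpy as np

def augment(images: np.ndarray) -> np.ndarray:
    """Return the original images plus three transformed copies.

    `images` has shape (N, H, W, C) with H == W, so the 90-degree
    rotation keeps the shape. These transforms are assumed examples,
    not the study's documented augmentation pipeline.
    """
    h_flip = images[:, :, ::-1, :]               # mirror left-right
    v_flip = images[:, ::-1, :, :]               # mirror top-bottom
    rot90 = np.rot90(images, k=1, axes=(1, 2))   # rotate each image by 90 degrees
    return np.concatenate([images, h_flip, v_flip, rot90], axis=0)

def augment_labels(labels: np.ndarray, copies: int = 4) -> np.ndarray:
    """Repeat labels in the same block order that `augment` uses."""
    return np.tile(labels, copies)

# Demo on placeholder training data (269 images, as in an 80/20 split of 337).
rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(269, 64, 64, 3), dtype=np.uint8)
y_train = rng.integers(0, 2, size=269)
X_aug, y_aug = augment(X_train), augment_labels(y_train)
print(X_aug.shape, y_aug.shape)  # (1076, 64, 64, 3) (1076,)
```

In Keras, the preprocessing layers `RandomFlip`, `RandomRotation`, and `RandomZoom` apply similar transforms randomly on the fly during training instead of materializing fixed copies.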

Machine learning and deep learning classifiers
In this section, we present the popular machine learning and deep learning classifiers that we use to detect mint leaf disease and to find the best classifier for disease detection. Machine learning is a subset of artificial intelligence methods that allows a system to analyze data and learn from it. For this reason, machine learning algorithms are widely used for complex problems (e.g., classification, diagnosis, and prediction), where they often outperform traditional techniques [13], [14]. Deep learning is a subfield of machine learning, also referred to as deep neural networks (DNN). A key strength of DL is its ability to extract features and knowledge directly from images to reach a decision. As a result, DL has been highly successful in plant disease detection and diagnosis [15], [16]. Popular DL architectures include the CNN and the recurrent neural network (RNN). In this study, we use five machine learning algorithms (SVM, KNN, LoR, DT, and RF) and five deep learning models based on CNN architectures (DenseNet 201, VGG 16, MobileNet, Xception, and Inception-V3) to classify the mint data and detect its diseases. The following paragraphs briefly explain each algorithm.

Support vector machine
Support vector machine (SVM) is a well-known supervised machine learning algorithm for binary classification problems. Extensions of the SVM have since improved its ability to handle multiclass classification [17]. The SVM can solve both linear and non-linear problems: it seeks the best hyperplane (a line in two dimensions) separating the data classes in an n-dimensional space, and kernel functions allow it to handle classes that are not linearly separable.
ISSN: 2088-8708
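To illustrate the hyperplane search, the sketch below fits scikit-learn's `SVC` with a linear kernel on a hypothetical toy 2-D dataset and checks that the learned `coef_` (w) and `intercept_` (b) define the decision function w·x + b, whose sign gives the class. An RBF kernel is shown for the non-linear case; the data and hyperparameters are illustrative only.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2-D data (illustrative only).
X = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0],
              [3.0, 3.0], [4.0, 3.5], [3.5, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel searches for the maximum-margin hyperplane w.x + b = 0.
linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
w, b = linear_svm.coef_[0], linear_svm.intercept_[0]

# The sign of w.x + b decides the class; this matches decision_function.
scores = X @ w + b
assert np.allclose(scores, linear_svm.decision_function(X))
print(linear_svm.predict([[0.2, 0.3], [3.8, 3.2]]))  # [0 1]

# For non-linear boundaries, a kernel such as RBF replaces the dot product.
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X, y)
```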

K-nearest neighbors
K-nearest neighbors (KNN) is one of the simplest supervised machine learning algorithms. KNN can solve both classification and regression problems, so it is used in a broad range of applications, including data mining, pattern recognition, and plant disease detection [5], [18]. The KNN algorithm computes the distance between a test sample and the training samples and then returns the K nearest neighbors according to that distance [19]; for classification, the test sample is assigned the majority class among these neighbors. The Euclidean distance is by far the most common choice of distance.
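The KNN procedure above fits in a few lines of NumPy. This is a minimal sketch for illustration: `knn_predict` is a hypothetical helper, and the toy points are invented.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test sample by majority vote among its k nearest
    training samples, using the Euclidean distance."""
    predictions = []
    for x in X_test:
        # Euclidean distance from x to every training sample.
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k closest training samples.
        nearest = np.argsort(distances)[:k]
        # Majority vote over the labels of those neighbors.
        vote = Counter(y_train[nearest].tolist()).most_common(1)[0][0]
        predictions.append(vote)
    return np.array(predictions)

# Toy example (illustrative data only).
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([[0.5, 0.5], [5.5, 5.5]]), k=3))  # [0 1]
```

scikit-learn's `KNeighborsClassifier(n_neighbors=k)` does the same with efficient neighbor search; its default Minkowski metric with `p=2` is the Euclidean distance.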

Logistic regression
Logistic regression (LoR) is a well-established supervised algorithm for classification tasks. LoR has been widely used in various fields, especially computer vision [20]. LoR models the relationship between a categorical dependent variable and one or more independent variables by passing a linear combination of the inputs through the logistic (sigmoid) function, which yields the probability of each class. Logistic regression natively solves binary classification problems and can be extended to multiclass problems.
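The sketch below makes the sigmoid step concrete: it fits scikit-learn's `LogisticRegression` on invented 1-D toy data and checks that the predicted probability of class 1 equals the sigmoid of the linear score w·x + b. The data are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data (illustrative only): one feature, binary label.
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# P(y = 1 | x) is the sigmoid of the linear score w.x + b.
z = X @ model.coef_[0] + model.intercept_[0]
manual_proba = sigmoid(z)
assert np.allclose(manual_proba, model.predict_proba(X)[:, 1])

# Class 1 is predicted when the probability exceeds 0.5 (i.e., z > 0).
print((manual_proba > 0.5).astype(int))
```

For more than two classes, scikit-learn's `LogisticRegression` fits one weight vector per class and normalizes the scores with a softmax instead of a single sigmoid.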

Decision trees
Decision trees (DT) are among the earliest classifiers used in machine learning. DT is a supervised algorithm applied to classification problems and data analysis. It has a tree structure made of decision nodes, edges, and leaf nodes: each decision node tests a feature, each edge corresponds to an outcome of that test, and each leaf node holds a class label, which makes the model easy to interpret [21].
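A small scikit-learn example shows the tree structure described above: `export_text` prints each decision node as a feature-threshold test and each leaf as a class. The two numeric features and the label encoding are hypothetical toy values, not features from the study, whose classifiers work on resized images.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical two-feature toy data; 1 = healthy, 0 = unhealthy.
X = np.array([[0.05, 0.80], [0.10, 0.75], [0.08, 0.70],
              [0.40, 0.45], [0.55, 0.40], [0.35, 0.50]])
y = np.array([1, 1, 1, 0, 0, 0])

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Decision nodes test one feature against a threshold; leaves store a class.
print(export_text(tree, feature_names=["feature_0", "feature_1"]))
print(tree.predict([[0.07, 0.78], [0.50, 0.42]]))  # [1 0]
```

Because a single threshold already separates the toy classes, the fitted tree has one decision node and two leaves.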

Random forest
Random forest (RF) is a supervised algorithm that trains an ensemble of decision trees with randomization during the training phase [22]. Each tree in the forest is trained on a random part of the dataset, so each tree produces its own output, and the final prediction is the class chosen by majority voting.
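The sketch below spells out the two sources of randomness and the majority vote, using scikit-learn decision trees as building blocks. The helpers `fit_forest` and `predict_majority` and the toy Gaussian data are hypothetical; 25 trees and `max_features="sqrt"` are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)
    and a random subset of features at each split (max_features="sqrt")."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1_000_000))
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_majority(trees, X):
    """Each tree votes; the most frequent class wins (hard voting).
    Assumes non-negative integer class labels."""
    votes = np.stack([t.predict(X) for t in trees]).astype(int)  # (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Toy data: two well-separated Gaussian clusters in 4 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(3, 1, (30, 4))])
y = np.array([0] * 30 + [1] * 30)
forest = fit_forest(X, y)
print((predict_majority(forest, X) == y).mean())
```

Note that scikit-learn's `RandomForestClassifier` combines its trees by averaging their predicted class probabilities (soft voting) rather than by the hard majority vote shown here; both usually agree on clear-cut samples.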

VGGNet
VGGNet is named after the Visual Geometry Group at the University of Oxford, where its two authors proposed it, and it ranked second in the classification task of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014 [23]. The VGG architecture builds a deeper convolutional neural network by stacking convolutional layers with small (3×3) filters. VGGNet has two common variants, VGG 16 and VGG 19, which share the same design but differ in the number of weight layers (16 and 19, respectively). In this study, we use VGG 16.
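Why small 3×3 filters help can be shown with a little arithmetic: stacked stride-1 3×3 layers cover the same receptive field as one larger filter while using fewer weights. The sketch below assumes C input and output channels per layer and ignores biases; `C = 256` and the helper names are illustrative.

```python
import math

def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out

def stacked_receptive_field(kernel: int, layers: int) -> int:
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)

C = 256  # channels kept constant across layers (illustrative)

# Two 3x3 layers see a 5x5 region; three see a 7x7 region...
assert stacked_receptive_field(3, 2) == 5
assert stacked_receptive_field(3, 3) == 7

# ...but use fewer weights than a single 5x5 or 7x7 layer.
print(2 * conv_params(3, C, C), "vs", conv_params(5, C, C))  # 1179648 vs 1638400 (18C^2 vs 25C^2)
print(3 * conv_params(3, C, C), "vs", conv_params(7, C, C))  # 1769472 vs 3211264 (27C^2 vs 49C^2)
assert math.isclose(3 * conv_params(3, C, C) / conv_params(7, C, C), 27 / 49)
```

The stacked version also inserts a non-linearity after every 3×3 layer, which makes the decision function more discriminative than a single large convolution.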

MobileNet
MobileNet is a CNN-based deep model that performs well in many applications, such as computer and mobile vision. To build a lightweight deep CNN (DCNN), MobileNet replaces classical convolutions with depthwise separable convolutions: each block consists of a depthwise convolutional layer (3×3) followed by a pointwise convolutional layer (1×1) [24].
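The saving from depthwise separable convolutions is easy to quantify. The sketch compares weight counts for one layer (biases ignored) with illustrative sizes `k = 3`, `c_in = 128`, `c_out = 256`; the ratio always equals 1/c_out + 1/k².

```python
import math

def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a regular k x k convolution."""
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

k, c_in, c_out = 3, 128, 256  # illustrative layer sizes
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(sep / std, 3))  # 294912 33920 0.115

# The reduction factor depends only on the kernel size and output channels.
assert math.isclose(sep / std, 1 / c_out + 1 / k**2)
```

With 3×3 kernels, the separable layer therefore needs roughly 8 to 9 times fewer weights, which is what keeps MobileNet small enough for mobile devices.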

InceptionNet
InceptionNet is a CNN model that is available pre-trained. Like other CNN architectures, Inception has been widely used in image classification for tasks such as prediction, diagnosis, and recognition. The architecture aims to increase the network's width and depth in order to achieve high accuracy [25]. We use Inception-V3 in this study.
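One way Inception-V3 adds depth without a matching growth in parameters, as described by its authors, is to factorize an n×n convolution into a 1×n convolution followed by an n×1 convolution. The sketch counts weights for both options; `C = 192` channels and the helper name are illustrative assumptions, and biases are ignored.

```python
import math

def conv_weights(kh: int, kw: int, c_in: int, c_out: int) -> int:
    """Weights of a kh x kw convolution (biases ignored)."""
    return kh * kw * c_in * c_out

C = 192  # illustrative channel count, same for input and output

for n in (3, 7):
    square = conv_weights(n, n, C, C)
    # Asymmetric factorization: a 1 x n convolution followed by an n x 1 one.
    factorized = conv_weights(1, n, C, C) + conv_weights(n, 1, C, C)
    saving = 1 - factorized / square
    # The saving is 1 - 2/n: about 33% for n = 3 and about 71% for n = 7.
    assert math.isclose(saving, 1 - 2 / n)
    print(f"{n}x{n}: {square} -> {factorized} weights ({saving:.0%} fewer)")
```

The factorized pair keeps the same n×n receptive field and adds an extra layer (and non-linearity) of depth.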

XceptionNet
XceptionNet is another CNN architecture that has been pre-trained on a large image database (ImageNet). Xception builds on the Inception architecture but replaces its Inception modules with depthwise separable convolutions, which improved its performance over the Inception model. XceptionNet has been applied to various tasks, including diagnosis and plant disease detection [26].

DenseNet
DenseNet has been widely applied in various fields, including medical diagnosis and plant disease detection. Experimental tests in most previous works showed that DenseNet provides good accuracy thanks to its strong feature extraction and feature reuse abilities [27]. DenseNet differs from other CNN architectures in its dense connectivity: within each dense block, every layer receives the feature maps of all preceding layers as input, which lets the network learn with fewer parameters. We use DenseNet 201 in this work. The flowchart of the proposed work is described in Figure 3.
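Dense connectivity can be illustrated with a toy NumPy dense block in which every layer reads the concatenation of all earlier feature maps and contributes `growth_rate` new channels. This is a simplified sketch: a random 1×1 convolution plus ReLU stands in for DenseNet's batch-norm/ReLU/convolution layers, and `dense_block` is a hypothetical helper. The sizes (64 input channels, 6 layers, growth rate 32) are chosen to match the first dense block of DenseNet 201.

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    """Toy dense block on a feature map x of shape (H, W, C0).

    Each layer sees the concatenation of all previous outputs and adds
    `growth_rate` new channels, so the block outputs
    C0 + num_layers * growth_rate channels.
    """
    features = [x]
    for _ in range(num_layers):
        inputs = np.concatenate(features, axis=-1)       # reuse every earlier feature map
        weights = rng.normal(size=(inputs.shape[-1], growth_rate))
        new = np.maximum(inputs @ weights, 0.0)           # stand-in for 1x1 conv + ReLU
        features.append(new)
    return np.concatenate(features, axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 64))  # illustrative input with 64 channels
out = dense_block(x, num_layers=6, growth_rate=32, rng=rng)
print(out.shape)  # (8, 8, 256): 64 + 6 * 32 channels
```

Because earlier maps are passed forward unchanged, each layer only has to produce a small number of new channels, which is why DenseNet can stay relatively compact.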

RESULTS AND DISCUSSION
In our study, the machine learning and deep learning classifiers were applied to classify mint leaf images and detect disease. The ultimate goal of our research is to find the best classifier for mint plant disease detection. To compare the classifiers, we used standard performance measures: accuracy, precision, recall, F1 score, and the confusion matrix. These measures evaluate the quality of the proposed classifiers on the test set.
Classification accuracy, as shown in (1), is the percentage of mint plant images classified correctly. Precision, as shown in (2), is the proportion of instances predicted as positive that are actually positive. Recall, as shown in (3), is the proportion of actual positive instances that were correctly identified. The F1 score, as shown in (4), is the harmonic mean of precision and recall. The other performance measure used in this work is the confusion matrix, which compares the model's classification with the actual classification.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (2)$$

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (3)$$

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (4)$$

where TP is the number of healthy mints diagnosed correctly, FN is the number of healthy mints misdiagnosed as unhealthy, TN is the number of unhealthy mints diagnosed correctly, and FP is the number of unhealthy mints misdiagnosed as healthy, as shown in Table 1.
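The four measures can be computed directly from the confusion-matrix counts. The sketch below is a hypothetical helper, `classification_metrics`, with healthy mint as the positive class following Table 1; the counts in the example are invented for illustration and are not taken from Table 2 or Table 3.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, precision, recall, and F1 from a binary confusion matrix,
    with healthy mint as the positive class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts, not taken from the study's tables.
m = classification_metrics(tp=8, fn=2, tn=9, fp=1)
print({name: round(value, 3) for name, value in m.items()})
# {'accuracy': 0.85, 'precision': 0.889, 'recall': 0.8, 'f1': 0.842}
```

scikit-learn's `confusion_matrix` and `classification_report` (in `sklearn.metrics`) compute the same quantities for each class, which is a convenient way to produce per-class tables.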

The performance of machine learning classifiers
The five proposed machine learning classifiers (SVM, KNN, LoR, DT, and RF) were implemented in Python using the scikit-learn library. During training of the machine learning classifiers, the input images were resized to 150×150×3. SVM gave the best performance compared with KNN, LoR, DT, and RF; the results are analyzed in the following paragraphs. Table 2 reports the precision, recall, and F1 score of every machine learning classifier on the test set, for each class. The last column gives the confusion matrix of each ML classifier. SVM achieves the best value for most of the precision, recall, and F1 score metrics compared with RF, LoR, KNN, and DT, as shown in Table 2.
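A sketch of how the five scikit-learn classifiers could be trained and scored on flattened images. It assumes the images have already been resized (150×150×3 in the study, which gives 67,500 features per image after flattening) and scales pixel values to [0, 1]; all hyperparameters are scikit-learn defaults or illustrative choices, since the study does not report them. The demo uses small synthetic 16×16 images so it runs quickly, and `flatten`, `make_classifiers`, and `evaluate` are hypothetical helper names.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def flatten(images: np.ndarray) -> np.ndarray:
    """(N, H, W, 3) uint8 images -> (N, H*W*3) float features in [0, 1]."""
    return images.reshape(len(images), -1).astype(np.float32) / 255.0

def make_classifiers():
    # Hyperparameters are assumptions; the study does not report them.
    return {
        "SVM": SVC(),
        "RF": RandomForestClassifier(n_estimators=100, random_state=0),
        "LoR": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "DT": DecisionTreeClassifier(random_state=0),
    }

def evaluate(X_train, y_train, X_test, y_test):
    """Fit every classifier and return its test-set accuracy."""
    results = {}
    for name, clf in make_classifiers().items():
        clf.fit(flatten(X_train), y_train)
        results[name] = accuracy_score(y_test, clf.predict(flatten(X_test)))
    return results

# Synthetic demo data: class 1 images are brighter on average.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=80)
X = rng.normal(100, 20, size=(80, 16, 16, 3)) + 60 * y[:, None, None, None]
X = X.clip(0, 255).astype(np.uint8)
results = evaluate(X[:64], y[:64], X[64:], y[64:])
print(results)
```

Flattening discards the spatial layout of the pixels, so these classifiers must learn from raw intensities; this is one reason CNNs, which learn spatial features, tend to do better on images.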
Moreover, the confusion matrices in Table 2 show that most machine learning algorithms classified the mint plant images nearly correctly, except for the DT model. The diagonal elements represent correct predictions, and the off-diagonal elements represent incorrect predictions. SVM provides the best results, with 13 misclassifications, compared with 14, 15, 16, and 26 for RF, LoR, KNN, and DT, respectively. Figure 4 compares the classification accuracy of each machine learning algorithm. SVM achieves the highest classification accuracy, 80.88%, compared with 79.41%, 77.94%, 76.47%, and 61.76% for RF, LoR, KNN, and DT, respectively. SVM therefore outperforms the other machine learning algorithms in correctly predicting the majority of mint plant images.

The performance of deep learning classifiers
Deep learning models, namely DenseNet 201, VGG 16, MobileNet, Xception, and Inception-V3, form the second group of classifiers in this work. We used Python and the open-source Keras library to develop and train them. During training of the deep learning classifiers, we resized the input images to 224×224×3 and used a batch size of 32, the Adam optimizer, and the dropout technique. The following paragraphs present the performance of each classifier. Table 3 shows the results of the deep learning classifiers on the test set: the precision, recall, and F1 score of each classifier for the two classes (healthy mint and unhealthy mint). The last column gives the confusion matrix of each DL classifier. All deep learning classifiers produce similar precision, recall, and F1 score values, and DenseNet 201 gives the highest value in all performance metrics. The confusion matrices also show that DenseNet 201 makes the fewest misclassifications, 4, compared with 6, 8, 9, and 12 for VGG 16, Xception, Inception-V3, and MobileNet, respectively, as shown in Table 3. As before, the diagonal values represent correctly classified instances and the off-diagonal values represent misclassified ones. Figure 5 shows that all deep learning classifiers reach similar accuracy, and the best DL model is DenseNet 201, with an accuracy of 94.12%, compared with 91.18%, 88.24%, 86.76%, and 82.35% for VGG 16, Xception, Inception-V3, and MobileNet, respectively. DenseNet 201 is therefore the best of the deep learning classifiers for mint plant disease detection.
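A minimal Keras sketch of the DL training setup, shown for DenseNet 201. The 224×224×3 input, the batch size of 32, the Adam optimizer, and dropout come from the study; the frozen ImageNet-pre-trained base, the global-average-pooling head, the dropout rate of 0.5, the learning rate of 1e-4, and the loss are assumptions, and `build_densenet201` is a hypothetical helper name. The other four models are available in `keras.applications` as `VGG16`, `MobileNet`, `Xception`, and `InceptionV3`, each with its own `preprocess_input`.

```python
import tensorflow as tf
from tensorflow import keras

IMG_SIZE = (224, 224)  # input size reported in the study
BATCH_SIZE = 32        # batch size reported in the study

def build_densenet201(num_classes=2, dropout_rate=0.5, weights="imagenet"):
    """Transfer-learning classifier: frozen DenseNet201 base + new head.
    The dropout rate, learning rate, and head layout are assumptions."""
    base = keras.applications.DenseNet201(
        include_top=False, weights=weights, input_shape=IMG_SIZE + (3,)
    )
    base.trainable = False  # keep the pre-trained features fixed at first

    inputs = keras.Input(shape=IMG_SIZE + (3,))
    x = keras.applications.densenet.preprocess_input(inputs)  # DenseNet normalization
    x = base(x, training=False)
    x = keras.layers.GlobalAveragePooling2D()(x)
    x = keras.layers.Dropout(dropout_rate)(x)  # regularizes the new head
    outputs = keras.layers.Dense(num_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-4),
        loss="sparse_categorical_crossentropy",  # integer labels: 0/1
        metrics=["accuracy"],
    )
    return model
```

Training would then call `model.fit(...)` on batches of 32 images, for example from `keras.utils.image_dataset_from_directory(..., image_size=IMG_SIZE, batch_size=BATCH_SIZE)`. Fine-tuning the top DenseNet blocks with a lower learning rate is a common second stage, but the study does not say whether it was used.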

The proposed automated system
According to the literature, deep learning and machine learning methods have been widely used in plant disease detection. To find the most suitable classifier for mint plant disease detection, we compared the DL and ML classifiers most commonly used in disease detection. In our results, all DL classifiers outperform the ML classifiers in all performance metrics. The best classifier is DenseNet 201, with a high accuracy of 94.12% and very low overfitting. These findings agree with previous works in this context and confirm that DL classifiers are the best choice for plant disease detection [9], [10]. Our automated early disease detection system is therefore based on DenseNet 201: mint plant images are captured and sent to the system for analysis, and an alarm sounds when a disease or biotic factors such as snails and other insects are detected on the mint plant, as shown in Figure 6. The proposed system helps the farmers of the Golden Goji farm take the necessary steps before the plant suffers further damage. In this way, farmers can reduce mint productivity losses and increase crop yields without a laborious monitoring process, which reduces both the number of laborers needed and the preparation time.
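The reported accuracies and misclassification counts are mutually consistent with a 68-image test set, and a quick computation confirms that every DL model beats every ML model. This check assumes `N_TEST = 68` (20% of 337 images, rounded up) and pairs the DL counts 4, 6, 8, 9, and 12 with DenseNet 201, VGG 16, Xception, Inception-V3, and MobileNet, following the order of their reported accuracies.

```python
N_TEST = 68  # assumed test-set size: 20% of 337 images, rounded up

misclassified = {
    # machine learning classifiers (Table 2)
    "SVM": 13, "RF": 14, "LoR": 15, "KNN": 16, "DT": 26,
    # deep learning classifiers (Table 3), paired by accuracy order
    "DenseNet 201": 4, "VGG 16": 6, "Xception": 8,
    "Inception-V3": 9, "MobileNet": 12,
}

# Accuracy (%) = correctly classified test images / all test images.
accuracy = {name: round(100 * (N_TEST - errors) / N_TEST, 2)
            for name, errors in misclassified.items()}
print(accuracy)

ml = ["SVM", "RF", "LoR", "KNN", "DT"]
dl = ["DenseNet 201", "VGG 16", "Xception", "Inception-V3", "MobileNet"]

# The weakest DL model (MobileNet) still beats the strongest ML model (SVM).
assert min(accuracy[name] for name in dl) > max(accuracy[name] for name in ml)
```

Every computed value reproduces the percentage reported in the text, for example 55/68 = 80.88% for SVM and 64/68 = 94.12% for DenseNet 201.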

CONCLUSION
The Larache Province (Northern Morocco) is heavily dependent on agriculture. Climate change and biotic factors affect agricultural yield, causing massive crop production losses every year. On the biological Golden Goji farm in the Larache Province, farmers use traditional techniques to diagnose the mint plant and detect its diseases (i.e., a daily round of inspection). These techniques require a significant amount of labor and time, which makes early detection of mint plant disease a very challenging task for farmers. In this study, we compared several deep learning and machine learning algorithms, namely DenseNet 201, VGG 16, Xception, MobileNet, Inception-V3, SVM, RF, LoR, KNN, and DT, to identify the best classifier for early detection of mint plant disease. The experimental tests show that the DL classifiers outperformed the ML classifiers in all performance measures used. DenseNet 201 is the best classifier, with an accuracy of 94.12%, while DT is the worst, with an accuracy of 61.76%. An automated system based on DenseNet 201 can therefore help the farmers of the biological Golden Goji farm make urgent decisions when disease and biotic factors appear on the mint plant. The proposed system can reduce mint plant diseases and increase yields. In the future, we aim to build a comprehensive automated system capable of diagnosing different plant types in the Larache Province in order to protect crops from plant diseases. We will also evaluate the agricultural sector's impact on Moroccan socioeconomic development.