Improved vision-based diagnosis of multi-plant disease using an ensemble of deep learning methods

ABSTRACT


INTRODUCTION
With the rapid progress of computer vision-based diagnosis approaches, deep learning techniques are widely used to recognize plant diseases efficiently [1]. To improve the accuracy and speed of plant disease recognition, researchers have designed and evaluated numerous convolutional neural networks (CNNs), which have shown remarkable performance in semantic understanding and visual recognition [2]. In many computer vision-based diagnosis applications, CNNs are widely applied because of their strong learning capacity: they learn high-level, robust feature representations directly from raw images [3]. Traditional machine learning techniques have also been used in the agricultural sector to build plant disease recognition tools, but they are time-consuming and sometimes cumbersome [4], [5]. Machine learning classifiers rely on features extracted from images with the help of an expert, which requires several laborious steps, whereas deep learning models learn high-level features automatically. Moreover, machine learning techniques require additional, separate processing stages to solve a problem, whereas deep learning methods follow an end-to-end approach that maps input data directly to the final output. Recently, deep learning, and ensemble learning approaches in particular, have improved significantly, which has sharply increased their effectiveness in several areas, including agriculture, medicine, and food engineering. The stacking ensemble learning method improves recognition performance by combining base classifiers; it can handle complex image data and is often more effective than a single classifier [6].
Plant diseases pose a serious challenge to cultivation and a major threat to the continued growth of the agricultural economy [7]. Black gram, betel, Malabar spinach, and litchi are commonly cultivated in most South Asian countries, and diseases of these plants seriously reduce both yield quantity and quality. In many cases, ensemble methods outperform single CNNs. Bagging, majority voting, and stacking are the most widely used ensemble techniques. Inspired by the performance of ensemble learning reported in several studies, this study designs a hybrid technique for diagnosing plant diseases using stacking ensemble learning. Moreover, existing studies that recognize the diseases of a single plant cannot diagnose the diseases of other plants. To meet the growing demand for efficient disease diagnosis tools and to support the sustainable development of the agricultural sector, multi-plant disease diagnosis techniques need to be designed and evaluated. Image augmentation techniques (IAT) markedly improve the generalizability of CNN models that would otherwise overfit; they also reduce the cost of data collection and help address class imbalance in classification tasks. By introducing transformed variants of the images, IAT helps CNNs control overfitting at low cost and makes them more robust to such variations [8]. First, a multi-plant leaf image (MPLI) dataset containing 44,972 images of nine classes was generated from 11,243 original images using three IATs. These images were then randomly split into training, validation, and test sets containing 35,977, 6,745, and 2,250 images, respectively; the training and validation sets were used during training, and the test set was used to evaluate the CNNs.
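The dataset arithmetic above can be checked, and the random split sketched, in a few lines. This is a minimal sketch rather than the authors' code: the function name `split_indices`, the fixed seed, and the use of Python's `random` module are assumptions; only the image counts come from the text.

```python
import random

N_ORIGINAL = 11_243
N_AUGMENTED = N_ORIGINAL * 4  # each original image plus three rotated copies


def split_indices(n_total, n_train, n_val, n_test, seed=42):
    """Randomly partition image indices into train/validation/test sets.

    Hypothetical helper; the seed value is an illustrative assumption.
    """
    if n_train + n_val + n_test != n_total:
        raise ValueError("split sizes must add up to the dataset size")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(N_AUGMENTED, 35_977, 6_745, 2_250)
print(N_AUGMENTED, len(train_idx), len(val_idx), len(test_idx))
```

Because the split is drawn over already-augmented images, rotated copies of one original leaf can land in different sets; splitting by original image before augmenting is a common alternative when such leakage is a concern.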
Transfer learning techniques (TLT) are extensively used in state-of-the-art image recognition tasks in agriculture to improve the capability and efficiency of CNNs [9]. TLT offers several advantages: it reduces training time, improves model performance, and in most circumstances produces good results even without a large-scale dataset. In this study, Xception, VGG19, InceptionResNetV2, MobileNetV2, and ResNet50 were applied via TLT to the MPLI dataset to recognize and classify the diseases of four plants. The diagnostic capability of these CNNs was evaluated on the 2,250 images of the test set. The experiments show that the single CNNs achieved lower accuracy than the designed hybrid technique. Among the individual CNNs, Xception showed the highest diagnostic efficiency, with 97.83% test accuracy, while VGG19 was the least effective, with 81.38% test accuracy. The designed stacking ensemble learning technique (SELT) achieved a test accuracy 1.37 percentage points higher than that of Xception on the same test set of the MPLI dataset. The main contributions of this article are summarized as follows.
− An efficient hybrid diagnosis approach for multi-plant diseases, namely SELT, is designed and evaluated. It outperformed single CNN models in classifying unseen images and achieved higher accuracy than other techniques introduced in several recent plant disease diagnosis studies. Because a single SELT model diagnoses the diseases of four different plants, it avoids the inconvenience of using four different models for four plants.
− A new dataset, namely MPLI, is generated, with IAT used to expand it; this provides a foundation for CNN models to learn features under several complex image backgrounds.
− To the best of our knowledge, this is the first manuscript to develop a diagnosis technique that simultaneously classifies black gram, betel, Malabar spinach, and litchi plant diseases, with several experiments and analyses carried out to evaluate the effectiveness of the approach.
The rest of the manuscript is structured as follows. Section 2 introduces and summarizes related works. Section 3 describes the MPLI dataset and the design and evaluation process of SELT. Section 4 presents the results of several experiments and the outcomes of this research work. Finally, Section 5 concludes the article and discusses limitations and future work.

RELATED WORK
The automated diagnosis of plant diseases from images has attracted considerable research interest in recent years. Several studies in computer vision have used machine learning and deep learning techniques to diagnose plant diseases. Using the Levenberg-Marquardt (LM) algorithm, Sulaiman and Saad [10] proposed an artificial neural network (ANN) to classify white root disease of the rubber tree, which obtained 89.67% accuracy with nine hidden layers, whereas a scaled conjugate gradient (SCG) based ANN achieved 89.50% accuracy with seven hidden layers. For recognizing diseases of Harumanis mango leaves, Gining et al. [11] presented an image processing-based recognition system that achieved 68.89% accuracy, in which morphology, texture, and color features were considered for detecting leaf disease and the grey level co-occurrence matrix (GLCM) was used for feature extraction. Malik et al. [1] presented a hybrid technique for detecting sunflower diseases, built as a stacking ensemble of VGG16 and MobileNet, and compared it with different pre-trained CNN models. Wang et al. [3] proposed coordination attention EfficientNet (CA-ENet) for diagnosing apple leaf diseases, which obtained 98.92% accuracy, whereas ResNet152, DenseNet264, ResNeXt101, and EfficientNetB4 obtained 93.75%, 94.90%, 95.67%, and 97.27% accuracy, respectively. For diagnosing leaf disease, Hang et al. [4] presented an improved CNN architecture using inception and squeeze-and-excitation (SE) modules, which achieved 91.70% accuracy and took 961.1 seconds to train; compared to pre-trained CNNs, their model required less training time while obtaining higher accuracy. For identifying wheat disease, Noola and Basavaraju [5] proposed a framework after evaluating four pre-trained CNNs, namely InceptionV3, ResNet50, VGG16, and VGG19, among which VGG19 performed best, with 98.23% accuracy after 15 epochs. ResNet50 obtained the lowest accuracy among these pre-trained CNNs, 81.57%, and, trained for 50 epochs, also consumed more training time than the other CNNs. Rozlan and Hanafi [12] compared pre-trained CNNs including VGG16, InceptionV3, and EfficientNetB0 for classifying chili plant diseases, where InceptionV3 obtained 97.67% and 98.83% accuracy on the datasets of original and augmented images, respectively.
On original images, EfficientNetB0 performed better than VGG16 and InceptionV3, obtaining 97.67% accuracy, whereas on augmented images it achieved a lower accuracy of 96.83%. Horizontal flip, 0.2 magnification, zoom, and 0.2° shear were used for image augmentation. Elfatimi et al. [13] used MobileNetV2 to classify bean leaf diseases, obtaining 97.44% accuracy, and evaluated the effect of different optimizers, learning rates, and batch sizes. The Adam optimizer performed better than the other optimizers, such as stochastic gradient descent (SGD), RMSprop, and AdaGrad, and with Adam, MobileNetV2 obtained 100.00% accuracy on the training set. For identifying Tuta absoluta on tomato plants, Mkonyi et al. [14] proposed a VGG16-based approach that achieved 91.90% accuracy, whereas ResNet50 and VGG19 obtained 86.80% and 83.10% accuracy, respectively; SGD with a batch size of eight was used to train the CNNs. Saranya et al. [15] compared several methods for recognizing tomato plant diseases, evaluating the performance of different deep learning models. Sujatha et al. [16] compared machine learning and deep learning approaches for detecting plant leaf diseases. Sharma et al. [17] presented a CNN architecture for diagnosing leaf diseases of rice and potato plants.
Kong et al. [2] presented a multi-task learning method for diagnosing Crohn's disease; their multi-task classification and segmentation network (MTCSN) obtained 89.23% accuracy with ResNet50, whereas it achieved 84.75% and 88.30% accuracy with ResNet101 and DenseNet121, respectively. For image sentiment analysis, Moung et al. [18] presented an ensemble-based facial expression recognition method that combined three deep learning models by averaging. The ensemble obtained 72.30% accuracy, whereas the single models, a custom CNN, ResNet50, and InceptionV3, obtained 65.90%, 71.20%, and 63.90% accuracy, respectively.
In most of the research works discussed above, individual CNNs were applied to classify images, particularly for diagnosing plant diseases. A multi-plant disease recognition tool, by contrast, would enable farmers and the general public to diagnose the diseases of different plants with a single model or tool, avoiding the inconvenience of using separate models for different plants. This motivated us to pursue multi-plant disease recognition with an efficient hybrid model that improves predictions and outperforms single models. Hence, in this article, a deep ensemble model is designed and evaluated that efficiently diagnoses the diseases of four plants, addressing the limitations of existing single-plant disease diagnosis tools.

RESEARCH MATERIALS AND METHOD
This article introduces a hybrid deep learning framework for efficiently diagnosing multi-plant diseases through leaf image recognition, based on a stacking ensemble of two pre-trained CNNs; the workflow of this technique is presented in Figure 1. First, 11,243 images were captured from ten fields in different places in Bangladesh to build a robust dataset. After collection, all images were resized to the input size of each single CNN and labeled with their class names. The training and validation sets were used to train and fit the pre-trained CNNs via TLT, and the test set of the MPLI dataset was used for test prediction by the trained single models. Then, based on the test prediction results of the single CNNs, two trained models were chosen to build a hybrid model using the stacking ensemble learning approach.

MPLI dataset
Leaf images of nine classes were collected from cultivated land using a smartphone camera with a resolution of 3120×4160 pixels, and all images were then resized to fit the input structure of the different CNNs, as presented in Figure 2. The nine classes of the MPLI dataset are black gram healthy (BGH), black gram yellow mosaic (BGYM), betel leaf healthy (BLH), betel leaf rot (BLR), betel foot rot (BFR), Malabar spinach healthy (MSH), Malabar leaf spot (MLS), litchi healthy (LH), and litchi mite (LM) [19]-[22]. A large number of training images is required for CNNs to avoid overfitting, which is crucial for efficient image classification. To address this issue, IAT is now widely used; it provides a range of methods for increasing the size and variety of a dataset. After an in-depth review, rotations of 90°, 180°, and 270° were selected, as these geometric IATs are well suited to classification tasks. Sample images rotated by 90°, 180°, and 270°, generated from a sample image of the LM class, are illustrated in Figure 2. The rotation IATs were applied by rotating images in the clockwise and anticlockwise directions [23].
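The rotation-based augmentation can be sketched with NumPy. This is an illustrative sketch under stated assumptions: images are assumed to be H×W×C arrays, and `np.rot90` stands in for whatever rotation routine the authors used. Because the rotations are multiples of 90°, a 90° anticlockwise rotation equals a 270° clockwise one, so the three rotations cover both directions without interpolation or cropping. The helper names are hypothetical.

```python
import numpy as np


def rotation_augment(image):
    """Return the original image followed by its 90°, 180° and 270° rotations.

    np.rot90 with positive k rotates anticlockwise in the plane of the
    first two axes (height, width), so the channel axis is untouched.
    """
    return [np.rot90(image, k=k, axes=(0, 1)) for k in range(4)]


def augment_dataset(images, labels):
    """Expand a dataset fourfold; each rotated copy keeps its source label."""
    aug_images, aug_labels = [], []
    for image, label in zip(images, labels):
        rotated = rotation_augment(image)
        aug_images.extend(rotated)
        aug_labels.extend([label] * len(rotated))
    return aug_images, aug_labels


# Usage with two small dummy "leaf images" (height 2, width 3, 3 channels).
dummy = [np.arange(18).reshape(2, 3, 3), np.ones((2, 3, 3))]
images, labels = augment_dataset(dummy, ["LM", "LH"])
print(len(images), images[1].shape)  # 8 images; a 90° rotation swaps height and width
```

Applied to the 11,243 collected images, this fourfold expansion yields the 44,972 images of the MPLI dataset.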

SELT hybrid model
This manuscript introduces a hybrid deep learning technique, based on ensemble learning, to diagnose diseases of black gram, betel, Malabar spinach, and litchi plants from images. Five pre-trained CNNs, namely Xception, VGG19, InceptionResNetV2, MobileNetV2, and ResNet50, were used via TLT to extract features from images and classify their labels, and the hybrid model was built from them. In TLT, CNNs achieve higher efficiency on small datasets by reusing knowledge acquired from large, well-known datasets. The Xception architecture is built with depthwise separable convolutions (DSC), and its input image size is 299×299 pixels [24]. VGG19, which is extensively applied in computer vision, contains 19 weight layers, takes 224×224 pixel images as input, and outputs class probabilities [25]. MobileNetV2, trained on the ImageNet dataset for object detection and classification, achieves high accuracy on small datasets with short training time and takes 224×224 pixel images as input [26]. Like Xception, the InceptionResNetV2 architecture takes 299×299 pixel inputs; it contains 164 weight layers, and its residual connections significantly improve network efficiency [27]. The ResNet50 architecture is built with residual units, contains 50 weight layers, and takes 224×224 pixel inputs [28]. The final layers of these pre-trained CNNs were customized to classify the nine labels of the MPLI dataset. The designed technique and workflow for diagnosing multi-plant diseases are illustrated in Figure 3. First, the five CNNs were trained and evaluated on the MPLI dataset to assess their recognition efficiency. After this evaluation, two CNNs were selected as base learners (level 0) based on their classification efficiency.
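Each network expects a different input size, so every 3120×4160 photograph has to be resized before it is fed to a given CNN. The sketch below records the input sizes stated above and uses a simple nearest-neighbour resize in NumPy. The interpolation method is an assumption, since the text does not specify it; in a TensorFlow pipeline `tf.image.resize` would be the usual choice. The names `INPUT_SIZES` and `resize_nearest` are hypothetical.

```python
import numpy as np

# Input sizes (height, width) as stated in the text for each pre-trained CNN.
INPUT_SIZES = {
    "Xception": (299, 299),
    "VGG19": (224, 224),
    "InceptionResNetV2": (299, 299),
    "MobileNetV2": (224, 224),
    "ResNet50": (224, 224),
}


def resize_nearest(image, size):
    """Nearest-neighbour resize of an H x W x C array to (out_h, out_w)."""
    out_h, out_w = size
    in_h, in_w = image.shape[:2]
    # Map each output row/column back to a source row/column.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    # Broadcast the index vectors to an out_h x out_w grid; channels are kept.
    return image[rows[:, None], cols[None, :]]


# 1/10-scale dummy photo (portrait orientation assumed).
photo = np.zeros((416, 312, 3), dtype=np.uint8)
for name, size in INPUT_SIZES.items():
    print(name, resize_nearest(photo, size).shape)
```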
At level 1, the two base learners were combined into a hybrid model using the stacking ensemble learning approach, in which a meta-learner aggregates the knowledge learned by the two base learners. The meta-learner learns how to combine the base learners' predictions into a single, smoother final prediction. Logistic regression was used as the meta-learner to predict the class labels of the MPLI dataset. Finally, the efficiency of the SELT model was evaluated through test prediction.
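The two-level stacking scheme can be sketched as follows: the class-probability outputs of the two base learners (level 0) are concatenated into meta-features, and a multinomial logistic regression (level 1) is fitted on them. This is a minimal NumPy sketch, not the authors' implementation. Assumptions: the meta-learner is trained on base-learner predictions for held-out (e.g., validation) images, it is fitted by plain gradient descent with a small L2 penalty, and the helper names and synthetic probabilities are illustrative only. A library implementation such as scikit-learn's `LogisticRegression` could play the same role.

```python
import numpy as np


def stack_features(*base_probs):
    """Level 0 -> level 1: concatenate each base learner's (n, k) class probabilities."""
    return np.concatenate(base_probs, axis=1)


def fit_meta_learner(X, y, n_classes, lr=0.5, epochs=300, l2=1e-3):
    """Fit multinomial logistic regression by full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)  # softmax probabilities
        G = (P - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b


def predict_meta(W, b, X):
    """Predict class labels from stacked base-learner probabilities."""
    return np.argmax(X @ W + b, axis=1)


# Synthetic demo: 9 classes, two base learners whose probabilities favour the true class.
rng = np.random.default_rng(0)
n_classes, n = 9, 600
y_val = rng.integers(0, n_classes, size=n)


def fake_probs(y):
    noise = rng.dirichlet(np.ones(n_classes), size=len(y))
    return 0.6 * np.eye(n_classes)[y] + 0.4 * noise  # rows still sum to 1


X_val = stack_features(fake_probs(y_val), fake_probs(y_val))
W, b = fit_meta_learner(X_val, y_val, n_classes)
val_acc = (predict_meta(W, b, X_val) == y_val).mean()
print(f"meta-learner accuracy on the synthetic data: {val_acc:.3f}")
```

With two base learners and nine classes, the meta-learner in this sketch sees 18 features per image (2 models × 9 class probabilities).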

Figure 3. Illustration presenting TLT and the applied ensemble learning method
All experiments were performed on Google Colaboratory with GPU support, which runs Python 3 on a Google Compute Engine backend, using TensorFlow version 2.8.2. The pre-trained CNNs had originally been trained on large datasets containing a large number of object classes. Therefore, the last fully connected layer of each CNN was replaced with a layer of nine neurons, as the MPLI dataset contains nine classes. All single CNNs were trained for 40 epochs. The classification ability of the single CNNs and the SELT model on the test set of the MPLI dataset was evaluated with four metrics: sensitivity (Sen), specificity (Spe), accuracy (Acc), and precision (Pre), calculated from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) [17], [18]. Equations (1) to (4) give the formulas of these metrics.
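For a class mp,
$$\mathrm{Sen}_{mp} = \frac{TP_{mp}}{TP_{mp} + FN_{mp}} \tag{1}$$
$$\mathrm{Spe}_{mp} = \frac{TN_{mp}}{TN_{mp} + FP_{mp}} \tag{2}$$
$$\mathrm{Acc}_{mp} = \frac{TP_{mp} + TN_{mp}}{TP_{mp} + TN_{mp} + FP_{mp} + FN_{mp}} \tag{3}$$
$$\mathrm{Pre}_{mp} = \frac{TP_{mp}}{TP_{mp} + FP_{mp}} \tag{4}$$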
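The one-vs-rest metrics Sen, Spe, Acc, and Pre referenced as (1) to (4) can all be computed from a multi-class confusion matrix. The sketch below is illustrative, and the helper names are hypothetical. It also returns per-class FN and FP counts: whichever of the two the class-wise false prediction numbers (FPN) in Table 3 correspond to, each vector sums to the total number of misclassified test images (18 for SELT and 49 for Xception according to the results).

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def per_class_metrics(cm):
    """One-vs-rest Sen, Spe, Acc and Pre for every class of a confusion matrix.

    A class that is never predicted gives 0/0 precision, which NumPy
    reports as nan (with a runtime warning).
    """
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp  # images of the class predicted as another class
    fp = cm.sum(axis=0) - tp  # images of other classes predicted as this class
    tn = total - tp - fn - fp
    return {
        "Sen": tp / (tp + fn),
        "Spe": tn / (tn + fp),
        "Acc": (tp + tn) / total,
        "Pre": tp / (tp + fp),
        "FN": fn,
        "FP": fp,
    }


# Toy example with three classes and six test images.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = per_class_metrics(confusion_matrix(y_true, y_pred, n_classes=3))
for name in ("Sen", "Spe", "Acc", "Pre"):
    print(name, np.round(metrics[name], 3))
```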

RESULTS AND DISCUSSION
To evaluate the feasibility of the designed technique for recognizing the diseases of four different plants, several experiments were conducted on the 2,250 test images, which the models did not see during training. Diagnosing several plant diseases with a single model is a challenging task. Ensemble learning performed remarkably well in this study and diagnosed the diseases of different plants with high accuracy. First, the recognition efficiency of five pre-trained CNNs, namely Xception, VGG19, InceptionResNetV2, MobileNetV2, and ResNet50, was examined; they achieved 97.83%, 81.38%, 84.26%, 93.15%, and 96.85% accuracy, respectively. Then, Xception and ResNet50 were chosen to build a stacking ensemble model. The SELT model obtained 99.20% accuracy, higher than any individual CNN. Moreover, SELT misclassified 18 images, whereas Xception and ResNet50 misclassified 49 and 71 images, respectively, which strongly supports the efficiency of the SELT hybrid model. The designed SELT model uses the predictions of two pre-trained CNN models, which markedly increases its image classification capability. The class-wise diagnostic performance of the Xception and SELT models, calculated using (1) to (4), is given in Tables 1 and 2, respectively. Xception obtained its highest sensitivity, 99.43%, in the BLR class and its lowest sensitivity in the LM class. The specificity of the BGH and MSH classes, 99.85%, was higher than that of the other classes, whereas the LM class had the lowest specificity, 99.62%. Xception obtained its highest accuracy, 99.64%, in the BLH and MSH classes, and the accuracies of all classes were very close to each other. Xception achieved its highest precision, 98.70%, in the MSH class and its lowest precision in the LM class.
The SELT model obtained its highest sensitivity, 100.00%, in the BGYM and LH classes, and its lowest sensitivity in the BGH class. It obtained its highest specificity, 99.95%, in the BGH, BFR, and MLS classes. SELT achieved its highest accuracy, 99.91%, in the BGYM class and its lowest accuracy, 99.73%, in the BGH class. Its highest precision, 99.55%, was obtained in the BFR class.
Table 3 presents the class-wise false prediction numbers (FPN) of the Xception and SELT models. In the LM class, Xception made eight false predictions, whereas SELT made two. In the BGH, BFR, and MLS classes, SELT made one false prediction each, its lowest class-wise FPN.
According to the outcomes of several experiments, the designed SELT model diagnosed multi-plant diseases more efficiently than single models. Its 99.20% accuracy on 2,250 unseen leaf images of four different plants indicates a high generalization capability and validates its end-to-end recognition capability for different plant diseases. As presented in Table 4, the designed technique was compared with existing techniques for diagnosing several plant diseases, and it outperformed the existing studies in terms of accuracy.
The SELT model achieved higher accuracy than single CNNs such as Xception, VGG19, InceptionResNetV2, MobileNetV2, and ResNet50 when evaluated on unseen test images. Among the single CNNs, Xception obtained the highest accuracy, 97.83%. In terms of class-wise false classification, the SELT model performed markedly better than the single CNNs, making only one false prediction in each of the BGH, BFR, and MLS classes. The efficient architectures of Xception and ResNet50 extracted robust features from the images, which significantly increased the recognition rate of the SELT model. The strength of the SELT model in diagnosing multi-plant diseases comes from its ability to combine the recognition efficiency of the two best-performing CNN models.
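The base-learner selection, keeping the two best single CNNs by test accuracy, can be reproduced from the reported accuracies, together with the arithmetic linking SELT's 18 misclassified images to its accuracy. This is an illustrative sketch: the function name is hypothetical, and the numbers are the test accuracies and error count reported in the results.

```python
# Test accuracies (%) of the single CNNs as reported in the results.
reported_accuracy = {
    "Xception": 97.83,
    "VGG19": 81.38,
    "InceptionResNetV2": 84.26,
    "MobileNetV2": 93.15,
    "ResNet50": 96.85,
}


def select_base_learners(accuracies, k=2):
    """Return the names of the k models with the highest accuracy."""
    return sorted(accuracies, key=accuracies.get, reverse=True)[:k]


base_learners = select_base_learners(reported_accuracy)
print(base_learners)  # ['Xception', 'ResNet50']

# SELT: 18 misclassified images out of 2,250 test images.
selt_accuracy = 100 * (1 - 18 / 2250)
gain_over_best = selt_accuracy - reported_accuracy[base_learners[0]]
print(f"{selt_accuracy:.2f}% accuracy, +{gain_over_best:.2f} points over {base_learners[0]}")
```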
With its improved vision-based diagnosis capability, the proposed SELT framework overcomes the limitations and difficulties of existing deep learning-based plant disease recognition methods. The SELT system is significant for the agricultural sector, given the increasing demand for smart tools that improve food quality and ensure sustainability. In future work, we plan to develop a smartphone-based field inspection tool for multi-plant disease detection and localization, for which a more robust dataset covering more plants will be generated. This will help people receive early warnings of plant diseases through faster and more efficient diagnosis.