Corn plant disease classification based on leaf using residual networks-9 architecture

ABSTRACT


INTRODUCTION
Corn is a carbohydrate-producing plant that occupies the second position in Indonesia after rice [1]. From data obtained in 2021, corn production in Indonesia reached around 15.79 million tons, showing that corn production in Indonesia is very large [2]. No plant is ever free from diseases, which prevent plants from carrying out their normal physiological growth properly [3], [4]. In general, disease determines the viability of a plant for production. In corn plants, there are three common diseases: common rust, gray leaf spot, and northern leaf blight [5]-[9]. Common rust is a disease of corn plants whose symptoms are small round to oval spots on the top and bottom of the leaf surface [10]. Gray leaf spot is a disease whose symptoms take the form of greyish-brown spots across the entire leaf surface [11]. Northern leaf blight has symptoms in the form of small, oval-shaped spots, while the more elongated spots are grey or brown in color [12].
The application of classification in various areas of human life can help humans solve problems [13], [14]. The use of artificial intelligence in computer vision provides useful information for humans from visual images of animals, plants, and other objects, giving basic information about the existence of an object and its classification [15], [16]. In this research, the case study is corn plants, and the classification focuses on identifying leaf diseases found in corn plants. Diseases in maize are still difficult to monitor manually because of the limited number and working time of farm workers, so mistakes in handling disease problems in corn plants are not uncommon.
In recent years, machine learning and deep learning research on image classification of plant diseases has become a center of research [17]. Machine learning and deep learning methods that have been successfully applied to corn plant disease classification include support vector machines (SVM) [18], k-nearest neighbors (KNN) [19], and convolutional neural networks (CNN) [5]. Aravind et al. [18] used 2,000 images obtained from the PlantVillage open-access image database and implemented an SVM to classify types of corn plant disease, namely common rust, Cercospora leaf spot, leaf blight, and healthy, achieving a best average accuracy of 83.7%. Athani et al. [19] used 500 corn crop images, implemented an SVM, and achieved an accuracy of 82% determined using k-fold cross-validation. Khotimah [20] used 200 data points to identify maize plant nutrients in a comparative analysis of the KNN and Naïve Bayes methods; the results show that KNN classification is more accurate than Naïve Bayes, with an average KNN accuracy of 92.40% at the best value of k = 7. Hidayat et al. [5] used 3,854 images to classify corn plant diseases categorized into common rust, gray leaf spot, and northern leaf blight, obtaining an accuracy of 99%. From these results, it can be seen that the accuracy of the CNN algorithm in data classification is high [5].
CNN is the most popular structure for image classification [21]-[24]. In our study, we use CNN because we want to obtain the best accuracy: CNN has been proven to be very effective in plant disease image classification research [25], and it extracts features automatically [26], [27]. Here, we propose to use the ResNet-9 architecture to build a CNN model to classify corn plant diseases. Residual networks (ResNet) are convolutional networks trained on more than 1 million images from the ImageNet database [28]-[30], and ResNet-9 has a pretrained network that can classify images into up to 1,000 object categories, which allows the network to learn good feature representations for various images. The main contributions of this study can be highlighted as: i) designing a new CNN model for disease classification on corn plants; ii) comparing numbers of epochs, with a split of 80% training data and 20% testing data, to obtain the best model; iii) using the ResNet-9 architecture and the Adam optimizer to train the best model; and iv) building a web interface that classifies diseases in corn plants and displays the label and the accuracy of the classification results.

RESEARCH METHODS
The data processing design scheme shown in Figure 1 is as follows: a) Image data are used as input to be processed. b) After the image data is input, image preprocessing is carried out first, in which the image is resized; selecting an image size appropriate to the image to be scaled prevents image distortion. c) Next, the classification stage is carried out using the CNN algorithm, which consists of several layers, namely the convolution layer, rectified linear unit (ReLU), batch normalization, max pooling, flattening, and fully connected layer. d) In the convolutional layer, the first step is to determine the pixel values that will be used to perform the convolution. e) The ReLU units increase the representational power of the model and introduce nonlinearity; this layer follows a convolutional layer. f) Batch normalization normalizes the activations before passing them on to the next layer in the network. g) Max pooling chooses the maximum value among the pixels in each window; it is located in the pooling layer, which processes inputs based on nearby pixel values. h) In the flattening stage, the pooled feature map is converted from a 2-dimensional array into one long linear vector. i) In the fully connected layer, features from the previous process are combined to determine which features are most closely related to a particular class. j) The classification stages produce classification data as output consisting of 4 classes.
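The layer sequence described above (convolution, ReLU, normalization, max pooling, flattening) can be sketched as a minimal NumPy forward pass on a toy single-channel image. This is an illustration of the operations only, not the study's trained model; the array sizes and random weights are arbitrary.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2D convolution of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(0, x)  # zero out negative activations

def max_pool(x, size=2):
    """Keep the maximum of each size x size window."""
    h, w = x.shape
    return x[:h - h % size, :w - w % size].reshape(
        h // size, size, w // size, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((8, 8))     # toy 8x8 stand-in for a leaf image
kernel = rng.random((3, 3))    # one 3x3 convolution filter

feat = conv2d(image, kernel)                        # a) convolution -> 6x6 map
feat = relu(feat)                                   # b) ReLU nonlinearity
feat = (feat - feat.mean()) / (feat.std() + 1e-5)   # c) batch-norm-style scaling
pooled = max_pool(feat)                             # d) max pooling -> 3x3
vector = pooled.flatten()                           # e) flatten -> length-9 vector
```

The flattened vector is what a fully connected layer would consume to score the 4 classes.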

Data
The data used are taken from a source that provides image datasets, namely the Kaggle platform. This dataset was selected because it presents data in accordance with the needs of this research. In collecting the dataset, there are image criteria for training and testing in the CNN method [31], [32]. The training data comprise 1,916 images for common rust, 1,651 for Cercospora leaf spot (gray leaf spot), 1,907 for northern leaf blight, and 1,858 for healthy. The testing data comprise 478 images for common rust, 411 for Cercospora leaf spot (gray leaf spot), 478 for northern leaf blight, and 466 for healthy. The total is 9,165 images, divided into training data (80%, 7,332 images) and testing data (20%, 1,833 images). The training data thus consist of 4 classes: common rust, gray leaf spot, northern leaf blight, and healthy. Images are collected for each leaf object and then grouped into folders by class [33].
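The per-class counts stated above can be checked to add up to the reported totals and the 80/20 split:

```python
# Per-class image counts from the dataset description
train = {"common_rust": 1916, "gray_leaf_spot": 1651,
         "northern_leaf_blight": 1907, "healthy": 1858}
test = {"common_rust": 478, "gray_leaf_spot": 411,
        "northern_leaf_blight": 478, "healthy": 466}

n_train = sum(train.values())   # 7,332 training images
n_test = sum(test.values())     # 1,833 testing images
total = n_train + n_test        # 9,165 images in total

print(n_train, n_test, total, round(n_train / total, 2))  # 7332 1833 9165 0.8
```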

Image preprocessing
The images used in this study are clearly visible, based on the shooting requirements described in the dataset analysis subsection. Preprocessing aims to ensure that the dataset is processed into the data needed by the system, by removing unnecessary information from the image data [34]. The preprocessing process has two stages, namely image cropping and image resizing.
The dataset of corn leaf images obtained from the Kaggle platform consists of colored, red-green-blue (RGB) leaf images. Each image is inspected to identify the parts that need to be cropped to remove unwanted information, such as the background or other objects around the leaf. The detected corn leaf image is cropped so that only the leaf area required for leaf grouping remains. The part of the leaf to be detected can be an image of a whole leaf, with blade and midrib, front or back, focusing only on the leaf itself.
After obtaining the required portion of the corn leaf image, the cropped leaf image must be neither too small nor too large to be processed, because the image must remain clearly visible. Therefore, it is necessary to select an appropriate image size for scaling in order to avoid image distortion. In this study, the image is cut to 256×256 pixels, containing only the leaf part of the original image without interference from surrounding objects.
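As an illustration of this cropping step, a minimal sketch assuming the photo is already loaded as a NumPy array; in the study the crop region follows the detected leaf, whereas here a simple center crop stands in for it:

```python
import numpy as np

def center_crop(img, size=256):
    """Crop an HxWx3 image array to size x size around its center.
    (Illustrative; the study crops around the leaf region, not the center.)"""
    h, w = img.shape[:2]
    top = max((h - size) // 2, 0)
    left = max((w - size) // 2, 0)
    return img[top:top + size, left:left + size]

raw = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for an RGB leaf photo
leaf = center_crop(raw)
print(leaf.shape)  # (256, 256, 3)
```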

Classification system block
The classification system block shown in Figure 2 is the workflow of the classification system. The first process is designing the CNN architecture model that will be used. The second process is image preprocessing, which changes the image size according to the CNN architecture configuration that has been designed. The third process is training the CNN architecture model, i.e., conducting training and validation on the preprocessed datasets. Furthermore,

CNN architecture model testing is a process that tests the CNN architecture model that has been trained. The test results are stored and used for evaluation and conclusions at a later stage [35]. The ResNet-9 architecture, which has 9 layers, can be seen in Figure 3. ResNet-9 has a pretrained network that can classify images into up to 1,000 object categories, which allows the network to learn good feature representations for various images, and it can be used for image classification. The ResNet-9 architecture uses the concept of shortcut connections, which forward the input of each layer to a later layer. This concept reduces errors and the loss of features in the convolutional stages. ResNet-9 has 5 stages in the residual layer; the input is processed convolutionally and forwarded to the max pooling and fully connected layers.
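The shortcut-connection idea can be sketched in a few lines: the block's input is added back to its transformed output, so features in the input are carried forward even when the transformation contributes little. This is a schematic with a matrix multiply standing in for the block's convolution, not the paper's actual layers.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(x, weight):
    """A skip connection adds the block input to its transformed output,
    so features from x are forwarded even if the transform loses them."""
    transformed = relu(x @ weight)   # stand-in for conv + batch norm + ReLU
    return transformed + x           # shortcut: input forwarded to the next layer

rng = np.random.default_rng(1)
x = rng.standard_normal(4)
w = rng.standard_normal((4, 4))
y = residual_block(x, w)

# Even with a zero weight (the transform contributes nothing), the input survives:
print(np.allclose(residual_block(x, np.zeros((4, 4))), x))  # True
```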

Adam optimizer
Adaptive moment estimation (Adam) is an algorithm for gradient descent optimization. This method is very efficient when working with large problems involving a lot of data or parameters. Adam combines the advantages of two other optimizers, adaptive gradient (AdaGrad) and root mean square propagation (RMSProp) [37], [38]. In determining the weight update, where RMSProp uses only the second moment of the gradient, Adam uses both the first and second moments: it calculates exponential moving averages of the gradient and of the squared gradient, with the beta1/beta2 parameters controlling the decay rates of these averages. Comparisons of the Adam optimizer with other optimization algorithms show that Adam works well in practice [38]. The Adam optimizer first measures the step size of the optimization to be made and determines the exponential decay rates for the estimates to be computed.
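The update rule just described can be written out explicitly. The sketch below implements one Adam step (moving averages of the gradient and squared gradient, bias correction, then a scaled step) and uses it to minimize a simple quadratic; the learning rate and iteration count are illustrative choices, not the study's training settings.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m) and
    squared gradient (v), bias-corrected, then a step scaled by their ratio."""
    m = beta1 * m + (1 - beta1) * grad            # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2 * (theta - 3)
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)  # converges toward the minimum at 3
```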

Designing a system overview
In this section, we explain the general description of the system to be built, namely a system for classifying diseases in corn using CNN. This design is made as a reference for building the system's hardware and software parts, such as an overview of the system to be built, the design of the system, and the output of the system. The design is expected to facilitate the implementation of the system. In the general description of the system, it can be seen that the components must be connected to each other in order to communicate and run the system.
Based on Figure 4, the system workflow of the web interface is as follows: users with various devices or software access the web interface and make requests. The requests go to FastAPI, the API endpoint used in the web interface development. FastAPI redirects the user's request, namely data in the form of an image, to the previously built CNN model with ResNet-9, which makes a prediction on the received data. The prediction result is sent back to the user via a hypertext transfer protocol (HTTP) response; the user receives the predicted label of the image together with the confidence, which is the accuracy value of the image prediction, and this is then displayed on the web interface being run by the user.
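The request/response cycle can be sketched as a plain function; in the actual system this logic sits inside a FastAPI route handler, and the function and field names below (`model_predict`, `"class"`, `"confidence"`) are illustrative stand-ins rather than the paper's code.

```python
import json

def model_predict(image_bytes):
    """Hypothetical stand-in for the trained ResNet-9 model. A real
    implementation would decode the image and run the CNN; here we
    return a fixed label/confidence pair for illustration."""
    return "common_rust", 0.97

def handle_request(image_bytes):
    """Mirrors the web flow: receive the uploaded image, run the model,
    and return an HTTP-style JSON body with label and confidence."""
    label, confidence = model_predict(image_bytes)
    return json.dumps({"class": label, "confidence": confidence})

response = json.loads(handle_request(b"...image bytes..."))
print(response["class"], response["confidence"])  # common_rust 0.97
```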

Result of hyperparameter
In deep learning, there are several hyperparameters, variables that affect model output, determine training details, and significantly influence model performance [39]-[41]. In this study, three hyperparameter tuning experiments were carried out, as shown in Table 1. In the table, batch_size is the number of data samples sent to the neural network at a time. In this study, there are 9,165 images and the batch_size used is 32, meaning that the first 32 samples are sent to the neural network, then the next 32 samples, and so on until all data have been passed through the network. The effect of batch_size when running the training model is that a smaller batch_size uses less memory per step, so the training steps do not take as long to run. num_workers is the number of sub-processes used to load data. With the default num_workers = 0, the data are loaded in the main process; in other words, the main process performs the data loading when needed. Based on the results of the hyperparameter tuning experiments carried out three times in Table 1, it can be concluded that a larger num_workers and a smaller batch_size per iteration affect the runtime, making the model run faster. Therefore, this study applies the results of the three hyperparameter tuning experiments, with num_workers of 4 and batch_size of 32. The hyperparameter tuning experiments were carried out in the data preparation process by changing the batch_size, evaluating the model, and training the model with different values of num_workers.
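With batch_size = 32, the number of parameter-update steps per training epoch follows directly from the training-set size (in PyTorch, these settings would be passed to the data loader as `DataLoader(dataset, batch_size=32, num_workers=4)`):

```python
import math

train_images = 7332          # 80% training split of the 9,165 images
batch_size = 32              # samples sent to the network per step

# Iterations in one epoch: the final batch carries the leftover samples.
steps_per_epoch = math.ceil(train_images / batch_size)
remainder = train_images - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, remainder)  # 230 4: 229 full batches of 32, then 4 images
```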

Result of building model
In the ResNet-9 model building process, at each convolution layer a batch normalization layer is added to normalize the output of the previous layer, followed by an activation layer to zero out values less than 0, and max pooling is carried out to obtain the maximum over the image pixels. Figure 2 shows the model-building process and a summary of the ResNet-9 model output using the torchsummary library. Figure 5 describes the implementation of the CNN architecture for forming the model used for training. First, each image is resized to 256×256 pixels. After that, the image is fed into the CNN. The first convolution layer applies 32 filters, or output channels: 32 different filters are applied to the image to find features, producing a feature map with 32 channels, so the tensor goes from 3×256×256 to 32×256×256 (padding preserves the spatial size). After that, the ReLU activation function is applied to introduce non-linearity, and batch normalization normalizes the neuron weights. The feature map is then passed into the max pooling layer, which keeps only the most relevant features, giving an output of 32×128×128. That output is passed into the next convolution layer, and the process repeats as above. Finally, the output of the last max pooling layer is flattened and fed into the linear layer, also called the fully connected layer, which as the last layer predicts the 4 categories. As the model output, we obtain a tensor of size 1×4; the index of the maximum value of that tensor is the prediction. The visual display of the corn plant image while going through the training and classification process in the CNN method is shown in Figure 6.
From Table 2, it can be seen that this study uses 9 layers in constructing the model, since the model is the ResNet-9 architecture, where each layer is a combination of convolution, normalization, ReLU, and max pooling; convolution is applied eight times before flattening. The summary is based on the input shape used in this study, 3×256×256, meaning that the image has 256×256 pixels with 3 RGB channels. For the first output shape, the applied filter is 3×3 (kernel size) moving with a stride of 1×1, and for a pixel size of 256×256 this produces 64 filters. Each filter in the first convolution carries 3×3×3 = 27 params plus 1 bias, for a total of 28 parameters per filter. The total number of parameters for the first convolution is therefore 28×64 = 1,792. Normalization is then carried out using BatchNorm2d, which carries 2×64 (a scale and shift per filter) = 128 params. The activation keeps its previous shape, so it requires 0 params.
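The parameter counts above follow directly from the layer shapes:

```python
# First convolution: 64 filters of size 3x3 over 3 input (RGB) channels,
# each filter with one bias term.
kernel_area = 3 * 3
in_channels, out_channels = 3, 64

per_filter = kernel_area * in_channels + 1    # 27 weights + 1 bias = 28
conv_params = per_filter * out_channels       # 28 * 64 = 1,792

# BatchNorm2d keeps one learnable scale and one shift per output channel.
bn_params = 2 * out_channels                  # 2 * 64 = 128

# The ReLU activation has no learnable parameters.
print(conv_params, bn_params)  # 1792 128
```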

Result of training model
After implementing the model formation, the formed model is trained using the Adam optimizer. Table 3 shows the result of training the model over 100 epochs. Training loss is the value of the loss function computed on the training dataset and the model's predictions. Validation loss is the value of the loss function computed on the validation dataset and the model's predictions on it. Validation accuracy is the accuracy of the model's predictions on the validation dataset [42]. From the table, it can be concluded that the higher the number of epochs, the higher the validation accuracy and the lower the validation loss. The final validation accuracy obtained after 100 epochs is 0.9908, and the running time required for 100 epochs is 1 hour, 41 minutes, and 2 seconds. Table 4 shows experiments on the training dataset with several numbers of epochs, carried out in order to find a high level of accuracy for the formed model. The conclusion is that more epochs take more time but increase the accuracy of the formed model. Figure 7(a) displays the loss value obtained at each of the 100 epochs, with the training loss marked by a blue line and the validation loss marked in red. In the training data, the loss value in the first epoch is 0.0213, in the second 0.0300, in the third 0.0186, in the fourth 0.0325, in the fifth 0.0203, and by the 100th epoch 0.0150. In the validation data, the loss value in the first epoch is 0.0296, in the second 4.643, in the third 0.0296, in the fourth 0.0529, in the fifth 0.0262, and by the 100th epoch 0.0373, where
the greater the number of epochs, the lower the loss value obtained, and when the displayed graph is optimal, training can be stopped. Based on the figure, the graph is optimal at 100 epochs, so for a total of 100 epochs the loss value is considered optimal. Figure 7(b) displays the accuracy value obtained at each of the 100 epochs: the accuracy in the first epoch is 0.635, in the second 0.837, in the third 0.954, in the fourth 0.973, in the fifth 0.985, and so on until the 100th epoch reaches 0.9908. Based on these epoch experiments, it can be concluded that the greater the number of epochs, the higher the accuracy value obtained.

Result of web interface
In this section, we explain the results of the web interface that has been built to make it easier for users to use the CNN model to classify diseases in corn plants. Figure 8(a) is the input-image web interface, which displays a drop box that can be used to upload images. After an image is uploaded, the prediction process is shown, with a display of the uploaded image and a "Processing" status. Figure 8(b) is the output web interface, which displays the accuracy result and the label of the predicted image that was uploaded previously.

Result of testing
This stage explains the test results from the data tested using CNN. Testing in this study was carried out using the previously trained model; the test data were run through the training model that was built. The test result data, covering gray leaf spot, common rust, northern leaf blight, and healthy, and consisting of image results, detection results, and accuracy test results, can be seen in Table 5.

Result of evaluation
In this section, the performance metrics from the confusion matrix are explained; they are used to calculate various metrics measuring the performance of the model that has been created. The confusion matrix in this study uses 9,165 images, obtained from the training data and the validation data. The training data comprise 1,916 images for common rust, 1,651 for Cercospora leaf spot (gray leaf spot), 1,907 for northern leaf blight, and 1,858 for healthy. The validation data comprise 478 images for common rust, 411 for Cercospora leaf spot (gray leaf spot), 478 for northern leaf blight, and 466 for healthy, for a total of 9,165 images; the data per category are 2,394 for common rust, 2,062 for gray leaf spot, 2,385 for northern leaf blight, and 2,324 for healthy. The confusion matrix result is shown in Table 6. From Table 6, the values of true positive (TP), false negative (FN), false positive (FP), and true negative (TN) are obtained for each category, namely gray leaf spot (GLS), healthy, northern leaf blight (NLB), and rust; these values are used to calculate the performance metric in this study, namely the accuracy value. Accuracy describes how accurately the model classifies correctly, or the closeness of the predicted value to the actual value [26]. The accuracy value obtained using the accuracy formula is 0.99, which means that the model correctly predicts 99% of the entire dataset.
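The accuracy computation from a multi-class confusion matrix can be sketched as follows. The matrix values here are illustrative placeholders chosen only to match the per-class totals stated above (2,062 GLS, 2,324 healthy, 2,385 NLB, 2,394 rust); the study's actual counts are those in Table 6.

```python
def accuracy_from_confusion(matrix):
    """Accuracy = correctly classified samples (the diagonal) / all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Illustrative 4-class confusion matrix (rows = actual, columns = predicted)
# in the order GLS, healthy, NLB, rust; NOT the paper's exact Table 6 values.
cm = [
    [2040,    5,   12,    5],   # gray leaf spot (row sums to 2,062)
    [   3, 2318,    2,    1],   # healthy        (row sums to 2,324)
    [  15,    2, 2364,    4],   # northern leaf blight (2,385)
    [   6,    1,    9, 2378],   # rust           (row sums to 2,394)
]
print(round(accuracy_from_confusion(cm), 2))  # 0.99
```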

CONCLUSION
The amount of data used in this study affects the quality of the resulting model. The dataset used in this study comprises 9,165 images in 4 categories for building the CNN model; this study also compares 5, 25, 55, 75, and 100 epochs for the model. The highest accuracy was obtained in the experiment using 100 epochs, so 100 epochs were used in the model formation. Based on the hyperparameter tuning experiments, it can be concluded that a larger num_workers and a smaller batch_size per iteration reduce the runtime required to run the model. Therefore, this study applies the hyperparameter tuning results of num_workers = 4 and batch_size = 32. The model produced in this study was able to detect the type of disease in corn plants from leaf imagery with good accuracy, with a result of 99%. In addition, a simple web interface was developed for classifying corn leaf images: the user enters an image, and the system predicts the image and displays the classification results.

Figure 4. General description of the system

Figure 6. Feature extraction: CNN feature map visualization

Figure 7. Epoch trial graph over 100 epochs for (a) loss value and (b) accuracy value

Figure 8. Web interface for corn plant classification: (a) input image and (b) result image

Table 1. Hyperparameter tuning experiment

Table 4. Dataset training results

Table 6. Confusion matrix result