Accuracy study of image classification for reverse vending machine waste segregation using convolutional neural network

ABSTRACT


INTRODUCTION
To address waste disposal, one suggested countermeasure for encouraging greater public participation in recycling is the introduction of the reverse vending machine (RVM). An RVM is simple to use, conveniently located with easy public access, and provides a mechanism for rewarding recyclers for each item recycled. The proposed RVM accepts three types of beverage containers: polyethylene terephthalate (PET) bottles, aluminium cans, and drink carton boxes. The neural networks used in this project are AlexNet, GoogLeNet, DenseNet201, InceptionResNetV2, InceptionV3, MobileNetV2, XceptionNet, ShuffleNet, ResNet 18, ResNet 50, and ResNet 101. The convolutional neural networks (CNNs) are evaluated on their F1-score and the time taken to perform the classification.
Classification is the most important technology in the RVM. In previous work, some RVMs used hardware classification methods, such as barcode detectors and sensors. A barcode detector fails to classify the waste when the barcode is damaged and the item cannot be identified, whereas sensors can detect only certain objects and are limited by the type of sensor used. Sensors and barcode detectors also make the RVM expensive [1]. This paper proposes image classification as a more effective way to perform classification. The image classification method used in this project is transfer learning with the CNN. CNN-based approaches have demonstrated remarkable classification performance in waste categorization tasks [2].
ISSN: 2088-8708 (Tan Hor Yan)
Image classification techniques are proficient at identifying and recognizing particular target objects (such as facial features, handwritten characters, or products), categorizing and labeling images, and evaluating the subjective quality of images [3]. The architecture of a CNN resembles the interconnected structure of neurons in the human brain and draws inspiration from the functioning of the visual cortex [4]. The layers of a CNN fall into two groups: essential layers and secondary layers. The essential layers consist of convolution, activation, pooling, flatten, and dense layers [5]. The secondary layers are additional components that can be incorporated into CNNs to increase their resilience against overfitting and improve their ability to generalize. They include dropout layers, batch normalization layers, and regularization layers [5]. Figure 1 shows the network architecture of a CNN [4].
Figure 1. CNN network architecture [4]
Transfer learning is a methodology that focuses on improving the performance of target models on specific domains by leveraging the knowledge acquired from different yet related source domains [6]. Transfer learning offers the advantage of saving resources and time. The key aspect of transfer learning is generalization, which entails transferring only the knowledge that new models can apply effectively across diverse scenarios or conditions [7]. Instead of being strictly tied to a specific training dataset, models built with transfer learning are more general and can be applied in diverse conditions and with different datasets.
Various hardware techniques enable recognition and sorting in RVMs, rendering fraudulent attempts virtually impossible. In contrast, the traditional waste classification scheme suffers from low efficiency and accuracy [8]. Therefore, a new waste management system is necessary to handle waste materials effectively. Image classification, a computer vision approach, plays a significant role in organizing visual content based on images. Its primary objective is to assign pixels in a digital image to specific classes [9]. Several image classification models have been developed to improve recognition efficiency [10]. One proposed approach classifies plastic waste based on its resin identification code, facilitating efficient recycling [11]. The technique demonstrates high efficiency in discriminating plastic waste on the WaDaBa database, which contains images of plastic waste.
Furthermore, CNN-based waste-type classifiers have been introduced to classify waste types in municipal solid waste images, using a set of 9,200 trash images [12]. These classifiers can identify waste types directly or derive them from waste-item classes, with the highest waste-type classification efficiency reaching 94.86% using the ResNet50 classifier [12]. At present, the combination of transfer learning and CNNs is extensively applied in image recognition. Transfer learning improves the accuracy and enhances the robustness of the model [13]. As discussed in [6], it aims to boost the performance of target models in specific domains by harnessing knowledge from different but related source domains. It has been effectively applied in a range of machine learning tasks, including text sentiment classification [14], image classification [15], human activity classification [16], software defect classification [17], and multi-language text classification [18]. Notably, CNNs have gained significant popularity in classification tasks, whether employed as standalone models or combined with conventional classifiers in an ensemble architecture [19]. In [20], a combination of GoogLeNet with SVM achieves an encouraging result of 97.86% on the TrashNet dataset. A MobileNet variant [21] achieves 96.57%. Additionally, an optimized DenseNet121 network achieved an impressive 99.6% accuracy on the same dataset [22].
The outline of this paper is as follows. Related studies on image classification, transfer learning, and CNNs are presented in section 1. Next, the process details for this project are described in section 2, followed by the results and discussion of the simulations in section 3. Lastly, the paper is concluded in section 4.

METHOD
This project is divided into two major steps: preparing the dataset and performing image classification. This section discusses the details of the proposed image classification technique using transfer learning with CNNs. Figure 2 shows the overall flowchart for this project. In the overall concept, a camera captures an image of the beverage container and passes it to the system, which recognizes the image information and then segregates the object based on its material. The steps for image processing and classification are the same for all neural networks. The following subsections describe these steps in detail.

Datasets preparation
Before the image classification technique is implemented, the critical first step is constructing the dataset for the project. The dataset covers three types of beverage containers: drink carton boxes, PET bottles, and aluminium cans. It includes containers of different types, sizes, colors, and shapes, with images captured from various angles. The collected data are organized and stored in separate files according to material. All neural networks in the project use the same dataset to ensure consistency across models. The training dataset is used to fit the model's parameters, while the validation dataset provides an unbiased evaluation and helps tune hyperparameters [23]. Partitioning the dataset guards against overfitting, where the model memorizes patterns in the training data and struggles with unseen data [24].

Image classification algorithms
Transfer learning involves taking pertinent segments of a pre-trained machine learning model and applying them to a new but similar problem. Typically, this entails retaining the model's essential functionality while integrating new elements tailored to a specific task. For image classification, the process begins by loading the image dataset and scaling the input data to match the input size specified by the pre-trained model. Then, a CNN model is constructed for the scaled input data. Parameter transfer is employed to initialize the model's parameters from the pre-trained CNN image classification model. The final step is testing the CNN model's performance. Transfer learning in CNNs retains the weights and biases of the original model, in particular when the altered version is retrained. The primary objective of transfer learning is to leverage extensive datasets for pre-training the model; the trained parameters are subsequently fine-tuned using a smaller learning sample. As a result, the final trained model benefits from both speed and substantial robustness [9].
Transfer learning is often explained in terms of domains and tasks. A domain, referred to as $G$, comprises a feature space, denoted $\mathcal{X}$, and a marginal probability distribution, $Z(X)$, where $X = \{x_1, \ldots, x_n\} \in \mathcal{X}$. Within a specific domain, $G = \{\mathcal{X}, Z(X)\}$, a task consists of two fundamental components: a label space $\mathcal{Y}$ and an objective predictive function $f : \mathcal{X} \to \mathcal{Y}$. This function $f$ is used to make a prediction, $f(x)$, for a new instance $x$. The task is represented as $T = \{\mathcal{Y}, f(\cdot)\}$ and is learned from training data consisting of pairs $\{x_i, y_i\}$, where $x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y}$. Given a source domain $G_S$ and learning task $T_S$, and a target domain $G_T$ and learning task $T_T$, transfer learning aims to enhance the learning of the target predictive function $f_T(\cdot)$ in the target domain $G_T$ by leveraging the knowledge acquired from the source domain $G_S$ and its corresponding learning task $T_S$. Note that the source domain $G_S$ is distinct from the target domain $G_T$, and the source learning task $T_S$ is different from the target learning task $T_T$.
Convolutional layers function as feature extractors, responsible for learning feature representations from input images. These layers consist of a collection of filters, often referred to as kernels, that are applied to the input data [25]. Each kernel has its own width, height, and weights, enabling it to extract specific features from the input data. The convolutional layer is given in (1):
$$Y_k = f(W_k * z) \tag{1}$$
where the input data is denoted as $z$; $Y_k$ is the $k$th output feature map; the convolutional filter associated with the $k$th feature map is denoted $W_k$; the multiplication sign $*$ represents the 2D convolution operator, which computes the inner product of the filter at each position within the input image; and $f(\cdot)$ represents the nonlinear activation function.
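The operation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `conv2d` and `relu`, the 4x4 input, and the 2x2 kernel are hypothetical, ReLU is assumed as the activation, and no padding with a stride of 1 is assumed. The study itself used pre-trained networks in MATLAB, not this code.

```python
def relu(v):
    """Nonlinear activation f(.): assumed here to be ReLU."""
    return v if v > 0.0 else 0.0


def conv2d(z, w, activation=relu):
    """Slide kernel w over input z (no padding, stride 1).

    At each position the inner product of the kernel and the
    underlying image patch is computed, then passed through f(.).
    """
    kh, kw = len(w), len(w[0])
    out_h = len(z) - kh + 1
    out_w = len(z[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = 0.0
            for p in range(kh):
                for q in range(kw):
                    s += w[p][q] * z[i + p][j + q]
            row.append(activation(s))
        out.append(row)
    return out


# Hypothetical 4x4 input and 2x2 kernel, chosen only for illustration.
z = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
w_k = [
    [1, 0],
    [0, -1],
]
y_k = conv2d(z, w_k)
print(y_k)  # [[0.0, 0.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 0.0]]
```

A 2x2 kernel over a 4x4 input yields a 3x3 feature map; a convolutional layer repeats this for every kernel, giving one feature map per kernel.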
Pooling layers lower the spatial resolution of feature maps, which helps achieve spatial invariance to input distortions and translations. This not only expands the receptive field of convolutional kernels across layers but also reduces computational complexity and memory demands by downsizing the feature maps [26]. The downsampling operation lowers the resolution of the feature maps while preserving the essential features needed by subsequent layers. For max pooling, the pooling layer is given in (2):
$$Y_{kij} = \max_{(p,q) \in \Re_{ij}} x_{kpq} \tag{2}$$
where $Y_{kij}$ denotes the result of the pooling operation for the $k$th feature map, and $x_{kpq}$ denotes the element located at $(p,q)$ within the pooling region $\Re_{ij}$, which encompasses a receptive field centered at the position $(i,j)$.
To capture more abstract feature representations, network architectures often incorporate multiple layers of convolution and pooling. After these layers, fully connected layers are employed to interpret these feature representations and enable high-level reasoning.
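A minimal sketch of the pooling step follows, assuming max pooling over non-overlapping 2x2 regions. The function name `max_pool2d`, the window size, the stride, and the sample feature map are all hypothetical. For simplicity, each pooling region here is anchored at its top-left corner rather than centered at $(i, j)$.

```python
def max_pool2d(x, size=2, stride=2):
    """Keep the largest element of each size-by-size pooling region."""
    out = []
    for i in range(0, len(x) - size + 1, stride):
        row = []
        for j in range(0, len(x[0]) - size + 1, stride):
            region = [x[i + p][j + q] for p in range(size) for q in range(size)]
            row.append(max(region))
        out.append(row)
    return out


# Hypothetical 4x4 feature map, chosen only for illustration.
x_k = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]
pooled = max_pool2d(x_k)
print(pooled)  # [[4, 2], [2, 7]]
```

The 4x4 map shrinks to 2x2, which is the reduction in resolution (and in downstream computation) that the text describes, while the strongest response in each region is kept.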

Model evaluation
The validation dataset is frequently used together with a confusion matrix to assess the effectiveness of a neural network model. The confusion matrix supports the computation of metrics such as accuracy, precision, and recall. A true positive (TP) is a correct positive class prediction, while a true negative (TN) is a correct negative class prediction. A false positive (FP) is an incorrect positive class prediction, and a false negative (FN) is an incorrect negative class prediction. Figure 3 shows the confusion matrix.
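Accuracy and F1-score are commonly used to measure a model's performance, and the appropriate choice depends on the type of dataset. The F1-score is a better evaluation metric than accuracy when the dataset is imbalanced [27]. It takes both precision and recall into account and is less affected by class imbalance than accuracy [28]. The accuracy, precision, recall, and F1-score are given in (3)-(6):
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{5}$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6}$$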
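As an illustration of how these metrics follow from a confusion matrix, the sketch below applies the standard definitions to a three-class matrix, treating each class in turn as the positive class (one-vs-rest) and averaging the per-class F1-scores (macro averaging). The matrix values are made up for the example and are not results from this study. The function names and the choice of macro averaging are assumptions; the paper does not state how per-class scores were combined.

```python
def per_class_metrics(cm):
    """cm[r][c] = number of samples of actual class r predicted as class c."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[r][k] for r in range(n)) - tp  # predicted k, actually another class
        fn = sum(cm[k]) - tp                       # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def accuracy(cm):
    """Overall accuracy: correctly classified samples over all samples."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


def macro_f1(cm):
    """Unweighted mean of the per-class F1-scores."""
    scores = per_class_metrics(cm)
    return sum(s["f1"] for s in scores) / len(scores)


# Made-up 3x3 confusion matrix (rows: actual, columns: predicted).
cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
print(round(accuracy(cm), 4))  # 0.7667
print(round(macro_f1(cm), 4))  # 0.7667
```

In this toy matrix every class has the same number of samples, so accuracy and macro F1 coincide; with imbalanced classes the two diverge, which is why the F1-score is preferred in that case.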

RESULTS AND DISCUSSION
This section presents the results of the dataset preparation process and of the image classification algorithms that employ transfer learning with CNNs. It shows how effectively the dataset preparation methods collected and organized the data needed for training and validation, and it evaluates and discusses in detail the performance of the image classification algorithms.

Dataset preparation
The dataset for each category comprises images of containers of various sizes, colors, and shapes, captured from multiple angles, for a total of 500 images. These images are divided into separate sets for training, validation, and testing to prevent overfitting [23]. In this project, the images are split into training and validation datasets at a ratio of 70:30. A training partition size between 40% and 80% provides an optimal balance between accuracy and precision [29]. Figure 4 shows examples of the images in the dataset.
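The 70:30 partition can be sketched as follows. Everything specific in this example is an assumption made for illustration: the per-class image counts (the text gives only the total of 500), the file names, the fixed random seed, and the choice to split each class separately so that both partitions keep the same class proportions.

```python
import random


def stratified_split(items_by_class, train_fraction=0.7, seed=0):
    """Shuffle each class separately, then cut it at train_fraction."""
    rng = random.Random(seed)
    train, val = [], []
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(name, label) for name in shuffled[:cut]]
        val += [(name, label) for name in shuffled[cut:]]
    return train, val


# Hypothetical class sizes summing to 500; the real per-class counts are not given.
dataset = {
    "PET bottle": [f"pet_{i}.jpg" for i in range(200)],
    "aluminium can": [f"can_{i}.jpg" for i in range(170)],
    "drink carton box": [f"carton_{i}.jpg" for i in range(130)],
}
train, val = stratified_split(dataset)
print(len(train), len(val))  # 350 150
```

Fixing the seed makes the partition reproducible, so that every network is trained and validated on exactly the same images.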

Image classification algorithms
The CNNs are trained and evaluated using MATLAB R2020b. To adapt each network to the new image classification task, the final 2-D convolutional layer and the last classification layer of the network are replaced. Table 1 shows the feature vector and feature layer for AlexNet, DenseNet201, GoogLeNet, InceptionResNetV2, InceptionV3, MobileNetV2, XceptionNet, ShuffleNet, ResNet 18, ResNet 50, and ResNet 101.
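The idea of replacing the final layers can be illustrated with a toy model in plain Python. This is not the MATLAB workflow used in the study: the network is represented as a simple list of dictionaries, and the function `replace_head`, the layer names, and all weight values are hypothetical. The sketch handles only a fully connected final layer and assumes the classification layer is the last entry in the list.

```python
import random


def replace_head(layers, class_names, seed=0):
    """Return a copy of `layers` with a new final learnable layer and
    classification layer sized for `class_names`.

    Earlier layers keep their pre-trained weights (parameter transfer).
    """
    rng = random.Random(seed)
    new_layers = [dict(layer) for layer in layers]  # shallow copy shares the old weights
    last_learnable = max(i for i, layer in enumerate(new_layers) if "weights" in layer)
    in_features = len(new_layers[last_learnable]["weights"][0])
    new_layers[last_learnable] = {
        "name": "new_fc",
        # Small random initial weights: one row per new class.
        "weights": [[rng.gauss(0.0, 0.01) for _ in range(in_features)]
                    for _ in class_names],
        "bias": [0.0 for _ in class_names],
    }
    new_layers[-1] = {"name": "new_output", "classes": list(class_names)}
    return new_layers


# Toy "pre-trained" network: 5 output classes stand in for the original label set.
pretrained = [
    {"name": "conv1", "weights": [[0.2, -0.1], [0.4, 0.3]], "bias": [0.0, 0.1]},
    {"name": "fc", "weights": [[0.1] * 4 for _ in range(5)], "bias": [0.0] * 5},
    {"name": "output", "classes": ["c0", "c1", "c2", "c3", "c4"]},
]
rvm_classes = ["PET bottle", "aluminium can", "drink carton box"]
adapted = replace_head(pretrained, rvm_classes)
print([layer["name"] for layer in adapted])  # ['conv1', 'new_fc', 'new_output']
```

Only the new layers start from scratch. The retained feature-extraction layers begin training from their pre-trained values, which is what allows fine-tuning on a small dataset.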

Model evaluation
In the confusion matrix, the blue color represents the classification effectiveness. The horizontal axis corresponds to the predicted values, whereas the vertical axis represents the actual values. Figure 5 shows the confusion matrix for the AlexNet network.

Model evaluation comparison
This project aims to develop an image classification method for the RVM. Because the dataset for this project is imbalanced, the evaluation focuses on F1-score rather than accuracy. All the neural networks are therefore compared on their F1-score and on the computational time of the training process. Table 3 shows the performance of the eleven neural networks.
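One way to make the comparison on F1-score and computational time concrete is to keep only the networks that reach a minimum F1-score and then take the fastest of them. The sketch below does this for the two networks whose F1-score and time are both quoted in the text (AlexNet and DenseNet201). The function `pick_network` and the thresholds are hypothetical; the paper does not state a formal selection rule.

```python
def pick_network(results, min_f1):
    """Among networks with F1-score >= min_f1, return the name of the fastest."""
    eligible = [r for r in results if r["f1"] >= min_f1]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r["time_s"])["name"]


# F1-scores and times as quoted in the text for these two networks.
results = [
    {"name": "AlexNet", "f1": 0.9750, "time_s": 2229.235},
    {"name": "DenseNet201", "f1": 1.0000, "time_s": 12860.694},
]

print(pick_network(results, min_f1=0.97))  # AlexNet
print(pick_network(results, min_f1=0.99))  # DenseNet201
```

The outcome depends on the threshold: a tolerance of 0.97 favours the faster AlexNet, while demanding 0.99 leaves only DenseNet201, at nearly six times the computational time.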
Based on the

CONCLUSION
An RVM with an image classification technique is proposed to automatically segregate three types of beverage containers: PET bottles, drink carton boxes, and aluminium cans. Transfer learning with CNNs is used to perform the classification. The CNNs used in this project are AlexNet, GoogLeNet, DenseNet201, InceptionResNetV2, InceptionV3, MobileNetV2, XceptionNet, ShuffleNet, ResNet 18, ResNet 50, and ResNet 101. This project evaluates the F1-score and the computational time of the different neural networks. The neural network with the highest F1-score and the shortest computational time for the training process is to be used for the RVM. The dataset contains 500 images of beverage containers, separated into training and validation sets.
AlexNet is the best CNN for transfer learning in this project. In terms of computational time, AlexNet takes the shortest time, 2229.235 s, to complete the process. In terms of F1-score, AlexNet achieved 97.50%, the fourth highest among the eleven networks. Although DenseNet201 achieved a 100% F1-score, its computational time of 12860.694 s is far longer than that of AlexNet. In short, AlexNet is the best CNN for transfer learning in this project and is chosen for the RVM.

Figure 4. Example of image datasets

Table 2 summarizes the performance of the AlexNet network. AlexNet achieved 98.67% accuracy and 1.33% error, with precision, recall, and F1-score all at 0.975. The processing time for AlexNet was 2220.235 s. All PET bottle images in the test set were correctly identified, but two images from the other classes were misclassified: one aluminium can image was classified as a drink carton box, and one drink carton box image was classified as an aluminium can.
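As a consistency check on the figures above, assume the 70:30 split of the 500 images leaves 150 images for evaluation. This is an assumption, since the size of the evaluation set is not stated alongside the table. Two misclassified images then give:

```latex
\[
\text{Accuracy} = \frac{150 - 2}{150} = \frac{148}{150} \approx 98.67\%,
\qquad
\text{Error} = \frac{2}{150} \approx 1.33\%
\]
```

Under that assumption, the reported accuracy and error agree with the two misclassifications described.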

Table 2. Performance of the AlexNet network

As shown in Table 3, AlexNet is the preferred CNN for transfer learning due to its combination of computational efficiency and a reasonably high F1-score of 97.50%. With a processing time of 2229.235 s, it is faster than the other networks, including DenseNet201, which has a perfect F1-score but a much longer processing time of 12860.694 s. Additionally, ResNet 18 achieves a higher F1-score than AlexNet but takes more time to complete the process. Overall, AlexNet is the best choice for the RVM as it achieves a good F1-score and completes the process in the shortest time among the eleven networks considered.
Comp Eng, Vol.14, No. 1, February 2024: 366-374