Rapid detection of diabetic retinopathy in retinal images: a new approach using transfer learning and synthetic minority over-sampling technique

ABSTRACT


INTRODUCTION
Diabetes is a health disorder defined by the body's incapacity to generate enough insulin, an essential hormone for the metabolism of carbohydrates. This insufficiency consequently results in elevated glucose levels in the blood. Typically, individuals with diabetes exhibit high blood sugar levels, which manifest in intensified thirst, increased hunger, and frequent urination [1].
Research indicates that diabetes tends to have a more severe and deteriorating impact on women compared to men, not only reducing their survival rate but also diminishing their quality of life [2]. Diabetes can inflict chronic damage and dysfunction across various tissues, with the eyes, kidneys, heart, blood vessels, and nerves being particularly susceptible [3]. Therefore, the effective management and treatment of this condition are of paramount importance in healthcare.
One of the most serious complications arising from diabetes is diabetic retinopathy (DR), a condition that has emerged as the leading cause of vision loss among adults of working age. DR stems from prolonged, uncontrolled diabetes and affects the eye, leading to a range of ocular abnormalities that could culminate in blindness. The global prevalence of diabetes has been on an upward trend, with predictions suggesting an increase from 2.8% in 2000 to 4.4% in 2030 across all age groups [4]. Coupled with an aging global population and rapid urbanization in developing countries, this trend could result in the global diabetes population nearly doubling from 171 million in 2000 to 366 million by 2030 [4].
DR is generally categorized into five stages. These stages, starting from a normal eye to the most severe form of DR, are depicted in Figure 1. The stages include normal, mild non-proliferative DR (mild NPDR), moderate non-proliferative DR (moderate NPDR), severe non-proliferative DR (severe NPDR), and proliferative DR (PDR). Each stage exhibits unique ocular abnormalities, such as microvascular aneurysms, leaking blood vessels, retinal bulges, aberrant blood vessel growth, and neural tissue damage [5]. Among these, the PDR stage is particularly concerning, as it represents an advanced state of the disease and is marked by the growth of abnormal new blood vessels, which can lead to severe vision loss if not treated promptly. Recognizing the increasing prevalence of DR and its potential impact on global health, there is a pressing need for more efficient and accurate screening methods. Previous attempts to address this issue have harnessed the power of image classification, pattern recognition, and machine learning, marking significant progress in the field [6], [7]. However, the challenge remains to develop an approach that can deliver superior accuracy and efficiency in detecting DR.
Against this backdrop, this document presents a unique strategy for DR identification, harnessing the power of advanced machine learning techniques. By leveraging state-of-the-art algorithms and training methodologies, our approach is designed to process retinal images with enhanced accuracy and reduced computational time. This new strategy aims not only to deliver timely and accurate detection of DR but also to bridge the gap between technological advancements and clinical application. By doing so, it facilitates early intervention and plays a critical role in the prevention of vision loss, ensuring patients receive timely care and treatment.
The rest of the paper is organized as follows: Section 2 provides an overview of the existing literature in the field of DR detection. In Section 3, we delve into the details of our proposed method, including a description of the dataset used, the key concepts underpinning our approach, and the architecture of our proposed system for DR classification. The results of our study are presented and discussed in Section 4. Finally, Section 5 concludes the paper and outlines potential avenues for future research.

LITERATURE REVIEW
The application of machine learning algorithms for predicting and detecting DR has attracted substantial attention from both researchers and medical professionals. Numerous prediction models have been developed, leveraging various machine learning approaches to predict DR. This section reviews some key studies that utilized machine learning methods for DR prediction, focusing primarily on works that used the diabetic retinopathy level 1 (DR1), methods to evaluate segmentation and indexing techniques in the Domain of Retinal Ophthalmology (MESSIDOR), and Kaggle diabetic retinopathy datasets.
Qomariah et al. [8] introduced an innovative technique that combines convolutional neural networks (CNNs) and support vector machines (SVM) for the detection of DR. They trialed their methodology on 77 and 70 eye images from the 12th and 13th bases of the Messidor database, respectively. A blend of ResNet50, transfer learning, and SVM achieved the top accuracy of 95.83% for base 12, while the fusion of Inception v3 and VGG-19 reached the peak accuracy of 95.24% for base 13. The research concluded that the combination of features derived from CNN transfer learning and SVM can yield encouraging outcomes for DR categorization. Following a similar vein of exploring the potential of CNNs, Doshi et al. [5] spearheaded a study where they honed in on deep convolutional neural networks tailored for DR detection and classification from color fundus images. Diversifying their approach, they tested three different CNN models, culminating in an aggregated model that achieved a notable quadratic weighted kappa score of 0.3996. This experiment further emphasized the utility of deep learning techniques in refining DR detection accuracy.
Adarsh and Jeyakumari [9] developed an automated diagnostic system using image processing techniques to identify retinal blood vessels, pathologies like exudates and micro-aneurysms, and specific texture properties. These anatomical and textural features were then input into a multiclass support vector machine (SVM) for classification as normal, mild, moderate, severe, or proliferative. The system was tested on two publicly available datasets, diabetic retinopathy evaluation database 0 (DIARETDB0) and diabetic retinopathy evaluation database 1 (DIARETDB1), and achieved strong results, scoring 0.96 and 0.946, respectively.
Instead of creating a deep neural network from scratch, Patel and Chaware [10] proposed a method based on MobileNetv2, which leverages transfer learning and fine-tuning techniques. The performance of their approach was evaluated using the Kaggle diabetic retinopathy dataset. Their experiment involved training the model on 2,929 retinal fundus images and validating it on 733 images. The model was then tested on 1,928 images. The results showed a significant improvement in network training accuracy, from 70% to 91%, after fine-tuning, and the validation accuracy increased from 50% to 81%.
Roychowdhury et al. [11] introduced diabetic retinopathy analysis using machine learning (DREAM), a three-tier computer-assisted screening procedure for DR. This innovative approach aims to accurately recognize and classify fundus images as either being affected by DR or not. Their study employed AdaBoost feature ranking to choose the top 30 characteristics from an overall count of 78 features. These chosen features led to an impressive area under the curve (AUC) exceeding 0.83 when sorting out bright and red lesions. This achievement was accomplished through the application of classifiers such as Gaussian mixture model (GMM), K-nearest neighbors (KNN), and a Bayesian combination of probabilistic classifiers (SVM+GMM, SVM+KNN). As a result, the DREAM system demonstrated a specificity of 53.16%, sensitivity of 100%, and an AUC of 0.904.
A multitude of other studies [12]-[21] have also employed machine learning techniques, deep learning, and transfer learning to predict DR, attesting to the extensive efforts in this field to develop more accurate and efficient prediction models. These methods have revolutionized the way researchers and clinicians approach DR detection, offering tools that can discern subtle patterns and variations often missed by traditional methods. As DR remains a leading cause of vision loss globally, the significance of these advancements cannot be overstated, emphasizing the potential of technology to reshape early detection and intervention.

MATERIALS AND METHOD

Dataset overview
The dataset employed for this research was initially assembled for a Kaggle competition held in 2015 [22]. It is worth noting that this dataset is quite unique compared to standard Kaggle datasets. Each image in the dataset is from a different individual, captured using a variety of cameras, and presented in a range of sizes. The dataset comprises a total of 35,126 images, providing a rich resource for analysis.
The dataset is designed for prediction across five distinct categories, each representing a different stage of DR as shown in Table 1. Figure 2 offers a graphical representation of the distribution of each category within the dataset. A notable characteristic of this dataset is its imbalance issue: class 1 alone accounts for approximately 73% of the total data. To address this imbalance and ensure a more robust model, we will apply SMOTE during the preprocessing stage. A comprehensive understanding of the dataset is crucial for the development of an effective model. A dataset's nuances, outliers, or imbalances can significantly influence the accuracy and performance of a predictive model. By deeply understanding our dataset, which comprises thousands of labeled retinal images with varying degrees of DR severity, we can better tailor our preprocessing and modeling techniques to capture these nuances. The following sections will elaborate on the preprocessing steps undertaken to refine this data and the proposed method for classifying DR based on this dataset.

Proposed method
This research presents a novel approach towards the development of an automated detection system specifically designed for the classification of DR in retinal images. Given the rapidly increasing number of individuals afflicted by DR worldwide, it is of paramount importance to classify patients into the various stages of DR at the earliest opportunity. Through the application of transfer learning, fine-tuning, and SMOTE, our research aims to enhance classification accuracy in the analysis of a multitude of DR images.
The conceptual framework of our proposed approach, which is visually represented in Figure 3, comprises three fundamental steps: preprocessing, transfer learning, and classification. The preprocessing stage is paramount as it sets the foundation for the entire model. Here, we employ the SMOTE technique to address data imbalance, a prevalent challenge in medical datasets, ensuring that the model is trained on a balanced dataset and, hence, enhancing the performance of the subsequent steps. In the transfer learning stage, we incorporate the VGG-16 model as our base model. Transfer learning leverages the knowledge gained from training large-scale datasets, which can then be applied to smaller, specific tasks, thus improving the model's performance and reducing training time. VGG-16, known for its depth and accuracy, is particularly suited to this task.
Following transfer learning, the final step involves classifying the images into their respective DR stages. This is where the fine-tuning process plays a vital role. Fine-tuning typically involves adjusting specific layers of the pre-trained model, allowing it to learn features that are more tailored to the nuances and characteristics of our dataset. By doing this, we ensure that the model is not just relying on generalized features from the pre-trained data but is also incorporating specialized knowledge from our dataset. As a result, the fine-tuned model becomes more adept at distinguishing between the subtle variations in the different DR stages, leading to a more accurate and robust classification.
The sections below delve into a comprehensive explanation of each step involved in the construction of our proposed approach. The world of medical image analysis has seen a paradigm shift with the integration of deep learning models, and among them, the VGG-16 stands out due to its remarkable performance in image classification tasks. This is why our approach leans on the intricacies of transfer learning using the VGG-16 model. Alongside this, fine-tuning ensures that the model is tailored to the specific nuances of our dataset, enhancing its ability to discern between various stages of DR. Additionally, the prevalence of data imbalance in medical datasets makes techniques like SMOTE indispensable. This technique ensures that our model is not biased towards any particular DR stage, facilitating a more balanced and accurate classification. These methodologies collectively aim to enhance the model's accuracy and robustness.

SMOTE and preprocessing
Addressing the issue of imbalanced datasets is one of the most formidable challenges faced in machine learning classification problems. In such datasets, one or several classes may have a significantly smaller number of instances compared to the others, which results in a detrimental skew in the data. A highly effective solution to this problem is SMOTE [23]. This technique augments the minority class by generating synthetic data points along the line segments connecting a randomly selected data point and one of its k nearest neighbors. While this approach may be simple, it has proven to be highly practical, leading to its widespread adoption. However, it should be noted that SMOTE does not have a solid mathematical underpinning [24].
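To make the interpolation concrete, the core of SMOTE can be sketched in a few lines of NumPy. This is a simplified, illustrative version (the function name and toy data are ours); in practice one would typically use `imblearn.over_sampling.SMOTE`.

```python
import numpy as np

def smote_sample(X_minority, k=5, rng=None):
    """Generate one synthetic minority sample.

    Picks a random minority point, finds its k nearest minority
    neighbors, and interpolates between the point and one neighbor.
    """
    rng = np.random.default_rng(rng)
    i = rng.integers(len(X_minority))
    x = X_minority[i]
    # Euclidean distances from x to every other minority point
    d = np.linalg.norm(X_minority - x, axis=1)
    d[i] = np.inf                     # exclude the point itself
    neighbors = np.argsort(d)[:k]     # indices of the k nearest neighbors
    x_nn = X_minority[rng.choice(neighbors)]
    gap = rng.random()                # uniform in [0, 1)
    return x + gap * (x_nn - x)       # point on the segment x -> x_nn

# Example: a tiny 2-D minority class inside the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sample(X_min, k=2, rng=0)
```

Because each synthetic point lies on a segment between two existing minority samples, it stays inside the convex hull of the minority class, which is why the technique is simple yet effective.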
Just like any other dataset collected from real-world scenarios, both the images and labels in our dataset are susceptible to noise. This could manifest as artifacts, out-of-focus regions, or even underexposed or overexposed images. Acknowledging these realities, one of our primary objectives in this paper is to devise a resilient approach that can function effectively despite the presence of noise and variance.
In pursuit of this goal, we implemented several preprocessing steps designed to transform all images into a format suitable for training the model. The specific preprocessing actions we undertook include:
− Resizing all images: given the varying sizes of the images in our dataset, we standardized them by resizing all images to a fixed size of 256×256 pixels. This not only helped ensure consistency in the input data but also reduced memory requirements and expedited the training time.
− Normalization: to ensure that all the images were on the same scale, we performed normalization, thereby bringing the dataset to a range of 0 to 1.
− Application of the SMOTE method: to combat the class imbalance present in our dataset, we implemented the SMOTE technique. This helped increase the representation of underrepresented classes, resulting in a more balanced training set.
After the preprocessing phase, we were able to achieve a balanced dataset with 2,324 images per class, and all images were standardized to a size of 256×256×3, ready to be utilized in the next steps of our proposed method.
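The resizing and normalization steps can be illustrated with plain NumPy. Nearest-neighbour resizing is implemented by hand here only to keep the sketch self-contained; a real pipeline would use a library routine (e.g. from Pillow or OpenCV), and the fake image below is a stand-in for an actual fundus photograph.

```python
import numpy as np

def preprocess(image, size=256):
    """Resize an H×W×3 uint8 image to size×size×3 and scale to [0, 1]."""
    h, w, _ = image.shape
    # Nearest-neighbour index maps from the target grid to the source grid
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows][:, cols]              # fancy-index rows, then columns
    return resized.astype(np.float32) / 255.0   # normalize to [0, 1]

# Example: a fake fundus image of arbitrary size
raw = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
out = preprocess(raw)                           # shape (256, 256, 3), values in [0, 1]
```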

Pre-trained VGG-16 models
CNNs are a specific kind of artificial neural networks. Over the past few years, CNNs have become prominent in the domain of computer vision due to their exceptional performance in image analysis tasks. A standard CNN consists of an input layer, an output layer, and numerous hidden layers. These hidden layers typically include convolutional layers, pooling layers, normalization layers, fully connected layers, and activation layers [25].
One such CNN is the VGG16 model, proposed by Simonyan and Zisserman [26]. This model has been widely recognized for its exceptional performance in the 2014 ImageNet competition, where it attained an impressive top-five accuracy rate of 91.90%. The architecture of VGG16 comprises 138,357,544 parameters, 5 convolution blocks, and 3 dense layers. Each block contains multiple convolutional layers, succeeded by a max-pooling layer, which aims to minimize the output size of the block and remove noise. Specifically, the first two blocks consist of two convolutional layers each, while the last three blocks each contain three layers. Across all its layers, the network uses 3×3 convolution kernels with a stride of one [27].
After the five convolution blocks, a Flatten layer is added to transform the 3D vector output from these blocks into a 1D vector suitable for the fully connected layers. The model's first two fully connected layers consist of 4096 neurons each, while the final fully connected layer includes 1000 neurons. A SoftMax layer is incorporated following the fully connected layers to guarantee that the overall output probability amounts to one [27].
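As a sanity check on the figures above, the VGG-16 parameter count can be reproduced from its layer configuration: each 3×3 convolution contributes 3·3·c_in·c_out weights plus c_out biases, and each fully connected layer n_in·n_out weights plus n_out biases. This is a back-of-the-envelope verification, not part of the proposed method.

```python
# Output channels of the 13 convolutional layers, grouped by block
conv_cfg = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]

params = 0
c_in = 3                                     # RGB input
for c_out in conv_cfg:
    params += 3 * 3 * c_in * c_out + c_out   # 3x3 kernel weights + biases
    c_in = c_out

# Fully connected part: 7x7x512 flattened -> 4096 -> 4096 -> 1000
fc_cfg = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]
for n_in, n_out in fc_cfg:
    params += n_in * n_out + n_out           # weights + biases

print(params)  # 138,357,544 trainable parameters in total
```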
In essence, the detailed architecture of VGG16, combined with its depth and complexity, makes it a powerful pre-trained model for our study. Its capability to capture minute details and patterns in images furnishes us with a reliable and robust foundation. This foundation is invaluable as it allows for further manipulation and adaptation, ensuring that our model is tailored to meet the unique challenges presented by our specific research needs.

Transfer learning
Transfer learning has emerged as an effective solution to the issue of scarce and hard-to-collect datasets, a common occurrence in medical image analysis. When a CNN is used to train these datasets, the network often succumbs to overfitting due to the lack of a sufficient variety of data. To circumvent this issue, we employ transfer learning, a strategy that employs the features a network has learned from one issue to tackle another, albeit related, problem within the same field.
Several advantages come with the application of transfer learning. Primarily, it offers a significant reduction in computational time. Rather than constructing an entirely new model from scratch, transfer learning allows us to capitalize on the knowledge acquired from previous training processes. Additionally, it broadens the scope of information extracted from previous models, thereby enriching the learning process. Lastly, transfer learning proves particularly beneficial when dealing with a small new training dataset. By utilizing the comprehensive feature representation learned from larger, previously trained datasets, transfer learning helps mitigate the adverse effects of limited data availability on model performance [27].
As an evolution in machine learning techniques, transfer learning stands as a beacon, presenting a pragmatic and efficient means to harness the abundant knowledge stored within pre-trained models. Drawing from extensive prior training on vast datasets, these models become invaluable reservoirs of intricate features and patterns. In the context of our study, this methodology proves indispensable, arming us with the tools to address complex image analysis challenges, especially when navigating through the constraints of scarce data.

Fine-tuning
Fine-tuning is a strategic approach in training a CNN, which involves using a pre-existing set of weights alongside new data. Essentially, the weights from the pre-trained CNN model are harnessed to initialize a target CNN model with an identical architecture. Subsequently, the target CNN is supervised and trained on the new target data.
Fine-tuning can be conducted in two distinct manners. The first approach, referred to as comprehensive fine-tuning, requires fine-tuning of all the network layers of the CNN model. This strategy is particularly applicable when there's a considerable lack of correlation between the target and source domains. In these scenarios, it becomes crucial to finely adjust all layers to ensure appropriate model performance. The second approach involves fine-tuning the pre-trained CNN model on a layer-by-layer basis [12]. This method offers the flexibility to adjust specific layers that are more relevant to the new task, thereby saving computational resources and potentially improving model generalization.
For our study, we've chosen the layer-by-layer fine-tuning approach. This decision stems from the desire to harness the pre-trained model's knowledge most effectively. By selectively fine-tuning specific layers, we can optimally adapt the model to the nuances of our target task, emphasizing relevant features and patterns while minimizing the risk of overfitting.
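In practice, layer-by-layer fine-tuning amounts to toggling a per-layer trainable flag. The sketch below models only this bookkeeping in plain Python; in Keras the same effect is obtained by setting `layer.trainable` on the loaded base model. The `blockN_convM` layer names follow the usual VGG-16 naming convention and are assumptions here, as is the choice of block 5 in the example call.

```python
# The 13 convolutional layers of VGG-16, named blockN_convM
layers = [f"block{b}_conv{c}"
          for b, n in zip(range(1, 6), [2, 2, 3, 3, 3])
          for c in range(1, n + 1)]

def trainable_flags(layer_names, tune_blocks):
    """Freeze every layer except those in the blocks selected for fine-tuning."""
    return {name: any(name.startswith(f"block{b}_") for b in tune_blocks)
            for name in layer_names}

# Example: fine-tune only the top block, keep the rest frozen
flags = trainable_flags(layers, tune_blocks={5})
frozen = [n for n, t in flags.items() if not t]
tuned = [n for n, t in flags.items() if t]
```

Freezing the early layers preserves generic low-level features while the unfrozen layers adapt to the target data, which is exactly the trade-off the layer-by-layer strategy exploits.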

Classification
Fundus images offer a detailed view of the retina, capturing the structural features of the human eye and playing a pivotal role in ophthalmological diagnostics. However, these images are inherently different from natural photos. They possess unique properties due to the intricate vascular structures and reflectance variations, setting them apart and making them potentially challenging for standard image analysis techniques [17]. Recognizing this distinctiveness, we turn to advanced methodologies. In our proposed method, we employ the VGG16 pre-trained model, initially trained on the extensive ImageNet dataset. Using this model allows us to efficiently extract significant global features from fundus images, harnessing the robustness of VGG16 while catering to the unique attributes of the retinal images.

ISSN: 2088-8708, Rapid detection of diabetic retinopathy in retinal images: a new approach using transfer … (Hiri Mustafa)

After the pre-processing phase, the images, resized to a dimension of 256×256×3, are fed into the VGG16 model. The model processes each image through a sequence of convolutional and max-pooling layers. We use filters with a small receptive field, specifically 3×3 for the convolutional kernel and 2×2 for the max-pooling kernel. In the model's architecture, the initial two blocks each comprise two convolutional layers, while the subsequent three blocks consist of three convolutional layers each.
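The 2×2 max-pooling between blocks halves each spatial dimension by keeping the strongest activation in every non-overlapping window. In NumPy this reduces to a reshape and a max; the sketch below is illustrative only, operating on a single-channel feature map.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on an H×W feature map (H, W even)."""
    h, w = fmap.shape
    # Split into non-overlapping 2x2 windows, then take the max of each
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.arange(16.0).reshape(4, 4)   # a toy 4x4 feature map
pooled = max_pool_2x2(fmap)            # -> [[5., 7.], [13., 15.]]
```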
As part of our fine-tuning strategy, we made a deliberate choice to retain the pre-trained ImageNet weights of VGG16 in the earlier blocks and to fine-tune block 5. These retained blocks are crucial, as they encapsulate features that were extensively trained on diverse images, ensuring that we do not lose these intricate patterns during our own training. Leveraging these already learned features offers us a dual advantage: it not only accelerates the training process but also enhances the model's capability to identify nuanced patterns in retinal images. This strategic decision is grounded in the belief that harnessing the power of features cultivated from a vast and diverse dataset would bolster the precision and efficiency of our model in the specific task of detecting DR.
Following the convolutional layers, a fully connected layer (FC) and three dense layers are included. The FC layer comprises 32,786 neurons, while the first and second dense layers consist of 1,000 and 100 neurons, respectively. To cap the architecture, a SoftMax layer with five neurons (corresponding to each class) serves as the final layer, ensuring the output probabilities sum up to one.
Addressing the prevalent challenge of overfitting in deep learning models, we have strategically integrated dropout layers subsequent to each dense layer. Dropout acts as a regularization technique, randomly deactivating a subset of neurons during training. This procedure effectively curtails the model's tendency to overly rely on individual neurons, thereby fostering robustness and enhancing its generalization capability across diverse datasets.
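The output stage described above, ending in a five-way SoftMax, can be sketched as a NumPy forward pass. The weights are randomly initialized for illustration (the layer sizes below are scaled down from those in the text), and dropout is shown as a training-time mask that becomes the identity at inference.

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(z):
    z = z - z.max()                    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()                 # probabilities summing to one

def dense_relu(x, n_out):
    """A dense layer with small random weights (illustrative only)."""
    W = rng.normal(0.0, 0.01, size=(x.size, n_out))
    b = np.zeros(n_out)
    return np.maximum(x @ W + b, 0.0)  # ReLU activation

def dropout(x, rate=0.5, training=False):
    if not training:
        return x                       # identity at inference time
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)     # inverted dropout scaling

features = rng.random(1000)            # stand-in for pooled CNN features
h = dropout(dense_relu(features, 100), training=False)
probs = softmax(h @ rng.normal(0.0, 0.01, size=(100, 5)))  # five DR classes
```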

Performance metrics
In the application of the SMOTE technique, we obtained a total of 11,620 fundus images. These images were then divided into two subsets: a training set consisting of 10,458 images and a validation set with 1,162 images. The validation set is used to evaluate the performance of our proposed method.
Four essential metrics were employed to evaluate the performance of our approach: accuracy, precision, recall, and F1 score, computed as in (1)-(4):

accuracy = (TP + TN) / (TP + TN + FP + FN) (1)

recall = TP / (TP + FN) (2)

precision = TP / (TP + FP) (3)

F1 score = 2 × (precision × recall) / (precision + recall) (4)

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. Here is a brief explanation of each metric:
− Accuracy: this measures the overall prediction rate of the algorithm, essentially determining how often the model is correct in its predictions.
− Recall: also known as sensitivity, recall indicates how effectively the classifier identifies true positives. It is the ratio of correctly predicted positive observations to the total actual positives.
− Precision: also referred to as positive predictive value, precision divides the truly positive examples by all the examples that the model predicts as positive. It assesses the proportion of correctly identified positives out of all predicted positives.
− F1 score: the F1 score is the harmonic mean of recall and precision. It is particularly useful when you want to balance these two metrics and is most effective in situations where there is an uneven class distribution.
These metrics provide a comprehensive evaluation of the model's performance, ensuring not just overall accuracy but also the correct identification of positive cases and a balanced assessment of precision and recall.
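For a multi-class problem these quantities are usually derived per class from the confusion matrix and then averaged. A minimal NumPy version is shown below with a made-up 3-class matrix; equivalent results are available from `sklearn.metrics`.

```python
import numpy as np

def per_class_metrics(cm):
    """Accuracy plus per-class precision, recall, and F1 from a confusion matrix.

    cm[i, j] counts samples of true class i predicted as class j.
    """
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp                 # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp                 # true members of the class that were missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()           # overall fraction correct
    return accuracy, precision, recall, f1

# Toy 3-class confusion matrix (rows: true class, columns: predicted class)
cm = np.array([[50, 2, 3],
               [4, 40, 6],
               [1, 5, 44]])
acc, prec, rec, f1 = per_class_metrics(cm)
```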

RESULTS AND DISCUSSION
The proposed method was tested on a dataset of 1,162 input fundus images, carefully selected to represent a wide range of eye conditions. The confusion matrix stands as an indispensable evaluative tool in machine learning for gauging the classification prowess of models. It provides a tabular overview where actual classes are delineated as rows and the respective predicted ones as columns. This matrix aids in pinpointing specific areas where the model shines and where it falters, offering a granular view of its performance [28].
While our model's overall predictive accuracy is undeniably impressive, it did confront challenges, especially with the 'Mild NPDR' and 'Moderate NPDR' classes. Insights from the confusion matrix underscored instances where the model's predictions deviated from the true classifications. These discrepancies, prominently depicted in Figure 4, pinpoint the need for potential refinements in our approach. Recognizing these nuances can guide our endeavors in honing specific classes, ultimately augmenting the model's precision and effectiveness. In the expansive realm of scientific research, it becomes pertinent to understand how our methodology stacks up against other leading approaches. Thus, a comparative evaluation was undertaken against leading transfer learning-based algorithms, particularly those focusing on grading DR through retinal fundus images. The Kaggle diabetic retinopathy dataset served as the testing grounds for this evaluative process, and the results of this head-to-head comparison are comprehensively presented in Table 4.
Prominent techniques in the space, like the one from Mohammadian et al. [29], have achieved noteworthy results. Their use of the InceptionV3 model, combined with transfer learning, allowed them to clinch top accuracy for two classes on this shared dataset. On the other hand, Chen et al. [30], by employing a distinctive methodological tack, managed to clock an accuracy rate of 80% for five classes.
Standing in comparison, our proposed methodology managed to edge out contemporaneous techniques. Specifically, for two classes, our model surpassed other methods, albeit by a relatively slender margin of 1.07%. Moreover, when extended to five classes, our method showcased a substantial leap in accuracy, registering an improvement of 8.19%. These metrics underscore our method's palpable advancement in the field, attesting to both its precision and its innovation compared to other prevailing methodologies.

CONCLUSION AND FUTURE WORK
This research has put forth a novel method for classifying DR images by ingeniously combining the SMOTE technique to address data imbalance issues and harnessing the performance of VGG16, which was pre-trained on the 'ImageNet' dataset. The pre-trained filters of VGG16 were employed as a feature extractor, and the fine-tuning method was applied to block 5. Lastly, we utilized the SoftMax function to classify the five distinct classes.
The medical imaging landscape has seen a plethora of methods being developed in recent times to tackle DR classification. Amid this burgeoning array of techniques, our approach stands out, not merely for its novelty but for its demonstrable efficacy. It surpassed many of its counterparts by achieving an impressive accuracy rate of 88.19%. Such a commendable result is not just a testament to our methodology's prowess but also signifies its potential in ushering in new advancements in medical image analysis.
In terms of future work, we aspire to train our system on a comprehensive dataset. This effort will aim to enhance the model's generalizability on the validation dataset, thereby broadening the scope of its application and increasing its predictive power. The ultimate objective is to devise a robust and reliable system that can effectively contribute to the early detection and management of DR.

Figure 1. Stages of DR starting from a normal fundus image

Figure 2. Distribution of each category in the dataset

Figure 3. Architecture of the proposed system for classification of DR

Figure 4. Confusion matrix of the system

Table 2. Classification performance of each class in the system

Table 2 encapsulates the performance measures for each class within the proposed approach. It is of particular interest that our model excelled in classifying the 'Proliferative DR' class, attaining the highest precision and F1-score percentages. Conversely, the 'Normal (No DR)' class demonstrated the highest recall rate, signifying the model's ability to correctly identify this category.

Int J Elec & Comp Eng, Vol. 14, No. 1, February 2024: 1091-1101

Table 3 summarizes the accuracy and error metrics of our proposed model, providing a holistic assessment of its efficacy. Notably, during the training phase, the model showcased a mean square error of 11.81%, which increased to 21.48% during the validation phase. This offers vital insights into its overall performance. The commendable accuracy scores, standing at 93.94% during training and 88.19% during validation, solidify the model's robustness and reliability.

Table 3. Classification accuracy of the system

Table 4. Comparison of classification results with other methods of classifying DR on the Kaggle diabetic retinopathy dataset (columns: study, architecture, number of classes, dataset size, performance measure, results)