Deep convolutional neural network-based system for fish classification

Ahmad AL Smadi1, Atif Mehmood1, Ahed Abugabah2, Eiad Almekhlafi3, Ahmad Mohammad Al-smadi4 1School of Artificial Intelligence, Xidian University, Xi’an, China 2College of Technological Innovation, Zayed University, Abu Dhabi, United Arab Emirates 3School of Information Science and Technology, Northwest University, Xi’an, China 4Department of Computer Science, Al-Balqa Applied University, Ajloun University College, Jordan


INTRODUCTION
In recent years, computer science and technology have played a key role in many areas, such as the internet of things [1], network security [2], object detection, scene classification [3], and remote sensing [4]. Scene classification plays a key role in daily life due to alterations in scene appearance and environment. Nowadays, fish classification (FC) has become a vital area of study for aquaculture and conservation. FC is the process of distinguishing and recognizing fish species and families based on their attributes using image processing; it determines and classifies the target fish into species based on similarity with a representative specimen image [5]. The recognition of fish species is widely considered a challenging research area due to difficulties such as distortion, noise, and segmentation errors incorporated in the images [6]. Experts also face difficulties in identifying and classifying fish because of the large number of fish categories [7]. Previous works have focused mainly on the environments in which fish live, even as the need for fish classification and recognition has grown. Recent developments in machine learning algorithms are among the most widely used approaches for FC [8]. Generally, fish identification can be categorized into two groups [9]: i) classification through internal identification [10], [11], in which attributes such as the primary structural framework and length are extracted, a fish expert database is established, and the fish is identified with the help of an algorithm [12]; and ii) classification through identification of the exterior of the fish [13], [14]. An increasing number of studies have found that the most practical strategy is to photograph fish with image-capture devices; the captured pictures can then be matched against fish identification guides and previously collected images.
Hence, different fishes can fall into corresponding classifications [13]. Several approaches in the literature classify fish species based on structural and textural patterns [15], [16]. Hsiao et al. [17] utilized sparse representation combined with principal component analysis for fish-species classification and attained an accuracy of 81.8%. Alsmadi et al. [13] introduced a fish classification model that combined extracted features with statistical measurements. Some works carried out fish classification using the backpropagation algorithm and support vector machines (SVMs) [18], [19]. Islam et al. [20] proposed a hybrid local binary pattern to classify indigenous fish in Bangladesh: they generated a new dataset named BDIndigenousFish2019, then used an SVM with different kernels to classify indigenous fish and attained an accuracy of 94.97%. More recently, deep learning (DL) has been gaining much attention in image classification [9]. Rathi et al. [21] introduced a deep learning technique to classify 21 fish species and attained an accuracy of 96.29%. Khalifa et al. [8] introduced a deep learning model to classify aquarium fish species and attained an accuracy of 85.59%. Deep learning has demonstrated remarkable FC results on large-scale training datasets of fish images [22]-[24]. Kratzert and Mader [25] introduced an automatic system that used an adapted VGG network for FC. Chhabra et al. [26] proposed a hybrid deep learning approach (HDL) for FC. Abinaya et al. [27] introduced an FC technique that combined three trained deep learning networks based on naive Bayesian fusion (DLN-NB).
The fish classification problem is to distinguish and group fish precisely by species. In light of recent studies in FC, this paper proposes a new CNN-based classification model for the indigenous fish dataset. Our model is trained on eight distinct indigenous fish species from Bangladesh and attains success rates of 98.47%, 97.24%, and 97.70% under three different data splittings. This paper makes contributions in several aspects:
- To the best of our knowledge, there is no study in the literature that classifies the "BDIndigenousFish2019" dataset based on a CNN.
- We propose a new CNN-based classification model for the BDIndigenousFish2019 dataset.
- This study includes an analysis of five different state-of-the-art gradient descent-based optimizers.
-This study includes a comparative result of the state-of-the-art methods with CNN.
The rest of this paper is outlined as follows: section 2 reviews a brief description of gradient descent-based optimizers; section 3 introduces some related deep convolutional neural networks; the materials and the proposed method are introduced in section 4; section 5 presents the experimental results and analysis; section 6 provides the discussion; and section 7 concludes this paper.

2. GRADIENT DESCENT BASED OPTIMIZERS
Many factors play a critical role in the efficiency of a convolutional neural network, such as the optimizer, batch size, number of epochs, learning rate, activation function, and network architecture [28]. Optimization algorithms influence machine learning mainly by tuning the learnable parameters so that the model converges faster and consumes fewer resources. Deep learning often requires a lot of time and powerful computing resources for training, which is a major factor impeding the development of deep learning algorithms. Although multi-machine distributed training can accelerate training, it does not reduce the total computing resources required. Therefore, to reduce the error rate during training in CNN-based techniques, many gradient descent-based optimization algorithms have been used [29], such as AdaDelta, SGD, Adam, Adamax, and Rmsprop. The following subsections briefly describe the gradient descent-based optimization algorithms used in this study.

Stochastic gradient descent (SGD) optimization algorithm
The SGD process starts from a random point and moves in steady steps toward the minimum, but the randomness requires a large number of iterations [30], and the learning rate does not change during training. The stochastic update for linear regression using gradient descent is shown in (1):

ω_{t+1} = ω_t − η∇E_i(ω_t)   (1)

where E_i(ω) represents the error on the i-th training sample and E denotes the overall error function; the SGD algorithm thus computes the best ω by minimizing E. For comparison, the composition of regular (batch) gradient descent is shown in (2):

ω_{t+1} = ω_t − η∇E(ω_t)   (2)

where the error objective is estimated by (3):

E(ω) = (1/n) Σ_{i=1}^{n} E_i(ω)   (3)
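As a minimal illustration of the update rule in (1), the SGD loop can be sketched on a toy one-parameter least-squares problem of our own (not the paper's experiment):

```python
# Toy sketch of SGD, eq. (1): one-parameter least-squares regression with
# E_i(w) = (w*x_i - y_i)^2; a single randomly drawn sample drives each update.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 1.5, size=200)  # inputs
y = 2.0 * x                          # noiseless targets, so the best w is 2

w, eta = 0.0, 0.05                   # eta is fixed: SGD does not adapt it
for _ in range(2000):
    i = rng.integers(len(x))               # random sample index
    grad = 2.0 * (w * x[i] - y[i]) * x[i]  # gradient of E_i(w)
    w -= eta * grad                        # eq. (1): w := w - eta * grad
```

Averaging the per-sample gradients over all n samples instead would recover the batch update of (2).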

Adaptive delta (AdaDelta) optimization algorithm
AdaDelta was developed to moderate the aggressive, strictly decreasing learning rate of adaptive gradient (AdaGrad) [31]. Unlike AdaGrad, which accumulates all previous squared gradients [32], AdaDelta accumulates past gradients over a fixed-size window. In other words, the AdaDelta algorithm refines the steepest-descent direction expressed by the negative gradient, as in (4):

x_{t+1} = x_t − ηg_t   (4)

where g_t = ∂f(x_t)/∂x_t represents the gradient at the t-th iteration and η denotes the learning rate.
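A minimal one-dimensional sketch of AdaDelta, following Zeiler's formulation with decay rate rho and stabilizer eps (the quadratic objective below is our own toy example, not from the paper):

```python
import numpy as np

def adadelta_minimize(grad, x0, rho=0.95, eps=1e-6, steps=3000):
    """AdaDelta: exponentially decayed accumulators of squared gradients
    (Eg2) and squared updates (Edx2) replace AdaGrad's full sum."""
    x = float(x0)
    Eg2 = Edx2 = 0.0
    for _ in range(steps):
        g = grad(x)
        Eg2 = rho * Eg2 + (1 - rho) * g * g
        dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * g
        Edx2 = rho * Edx2 + (1 - rho) * dx * dx
        x += dx
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3).
x_min = adadelta_minimize(lambda x: 2 * (x - 3.0), x0=0.0)
```

The ratio of the two accumulators acts as an adaptive per-step learning rate, which is why AdaDelta starts with very small steps and ramps up gradually.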

Root mean square propagation (Rmsprop) optimization algorithm
Rmsprop is a derivative of the adaptive gradient algorithm [33]. It divides the learning rate for each weight by a running average of the magnitudes of recent gradients for that weight, so that the effective per-parameter learning rate remains nearly constant, and it computes this average as an exponentially decaying mean rather than a sum of squared gradients. The algorithm performs well on non-stationary problems. The running average and the resulting update are estimated by (5) and (6):

E[g²]_t = γE[g²]_{t−1} + (1 − γ)g_t²   (5)

ω_{t+1} = ω_t − (η / √(E[g²]_t + ϵ)) g_t   (6)

where E[g²]_t represents the running average of the squared gradients, γ is the decay term, g_t is the gradient at time t, ϵ is a tiny number to prevent division by zero, and η is the initial learning rate.
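Equations (5) and (6) can be sketched as a short loop; the quadratic objective and hyperparameter values below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def rmsprop_minimize(grad, x0, eta=0.01, gamma=0.9, eps=1e-8, steps=2000):
    """RMSprop, eqs. (5)-(6): the learning rate is divided by the root of a
    running average of the squared gradients."""
    x = float(x0)
    Eg2 = 0.0
    for _ in range(steps):
        g = grad(x)
        Eg2 = gamma * Eg2 + (1 - gamma) * g * g   # eq. (5)
        x -= eta / (np.sqrt(Eg2) + eps) * g       # eq. (6)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3).
x_min = rmsprop_minimize(lambda x: 2 * (x - 3.0), x0=0.0)
```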

Adaptive momentum (Adam) and Adamax optimization algorithms
The Adam optimization algorithm is an extension of SGD and has recently been widely used in deep learning applications, particularly computer vision and natural language processing tasks [34]. Adam differs from SGD in that SGD maintains a single learning rate for updating all weights, whereas Adam repeatedly updates the network weights with adaptive per-parameter learning rates computed from the first moment of the gradient, as in Rmsprop, while also making full use of the second moment of the gradient. The Adam update can be estimated as (7) to (10):

f_t = ζ_1 f_{t−1} + (1 − ζ_1)g_t   (7)

s_t = ζ_2 s_{t−1} + (1 − ζ_2)g_t²   (8)

f̂_t = f_t / (1 − ζ_1^t),  ŝ_t = s_t / (1 − ζ_2^t)   (9)

ω_{t+1} = ω_t − η f̂_t / (√ŝ_t + ϵ)   (10)

where ζ_1, ζ_2 are hyperparameters; η, g_t, f_t, and s_t represent the initial learning rate, the gradient at time t, the exponential average of the gradient along ω, and the exponential average of the squared gradient, respectively; and ϵ is a tiny number to prevent division by zero. The Adamax optimization algorithm was inspired by Adam and provides a simpler bound on the maximum learning rate [35], as in (11):

u_t = max(ζ_2 u_{t−1}, |g_t|),  ω_{t+1} = ω_t − (η / (1 − ζ_1^t)) f_t / u_t   (11)

where u_t is the exponentially weighted infinity norm.
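The updates in (7)-(11) can be sketched side by side; the one-dimensional quadratic and the hyperparameter values are toy assumptions for illustration:

```python
import numpy as np

def adam_minimize(grad, x0, eta=0.05, z1=0.9, z2=0.999, eps=1e-8, steps=1000):
    """Adam, eqs. (7)-(10): bias-corrected exponential averages of the
    gradient (f) and of its square (s) drive the update."""
    x = float(x0)
    f = s = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        f = z1 * f + (1 - z1) * g                  # eq. (7)
        s = z2 * s + (1 - z2) * g * g              # eq. (8)
        f_hat = f / (1 - z1 ** t)                  # eq. (9), bias correction
        s_hat = s / (1 - z2 ** t)
        x -= eta * f_hat / (np.sqrt(s_hat) + eps)  # eq. (10)
    return x

def adamax_minimize(grad, x0, eta=0.05, z1=0.9, z2=0.999, steps=1000):
    """Adamax, eq. (11): the second moment is replaced by the exponentially
    weighted infinity norm u_t = max(z2 * u_{t-1}, |g_t|)."""
    x = float(x0)
    f = u = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        f = z1 * f + (1 - z1) * g
        u = max(z2 * u, abs(g))                    # eq. (11)
        x -= (eta / (1 - z1 ** t)) * f / u
    return x

# Minimize f(x) = (x - 3)^2 with both optimizers.
x_adam = adam_minimize(lambda x: 2 * (x - 3.0), x0=0.0)
x_adamax = adamax_minimize(lambda x: 2 * (x - 3.0), x0=0.0)
```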

DEEP CONVOLUTIONAL NEURAL NETWORKS
Deep learning is an area of machine learning that utilizes hierarchical architectures to learn high-level data representations in many applications [36]. Moreover, the data representation can be enhanced by increasing the number of layers [37]. In deep learning, the distinctive attributes, characteristics, and classifiers are trained simultaneously: the initial layers, including convolution filters, non-linear transformations, and pooling layers, are used for feature extraction, and the fully connected layers at the end carry out the classification. The most effective deep learning techniques, in which many layers are robustly trained and validated, are convolutional neural networks (CNNs). A standard CNN consists of three main layer types: convolutional layers, pooling layers, and fully connected layers. CNNs are capable of extracting information even when the datasets have wide variations in context and in the objects present in the images, based on their colour, structure, and surface characteristics [38]. Some of the leading pre-trained deep convolutional neural networks include AlexNet [39], VGGNet [40], and ResNet [41], and the use of pre-trained networks in various applications is growing.

AlexNet
AlexNet was designed by Alex Krizhevsky, and it is one of the most prominent deep CNNs used in many applications. The AlexNet architecture consists of five convolutional layers, three max-pooling layers, three fully connected layers, and a classifier layer as the output layer [39].

VGGNet
VGGNet was designed by Simonyan and Zisserman to reduce the number of parameters per layer and improve training time; all of its convolutional kernels are of size 3×3. There are several variants of VGGNet, such as VGG16 and VGG19, which differ in the number of weight layers in the network. However, the drawbacks of VGGNet include time-consuming training and a large number of parameters [40].

ResNet
The ResNet architecture was developed by He et al. [41]. It is much deeper than VGGNet, and there are multiple versions, such as ResNet50 and ResNet101. The main contribution of ResNet is the so-called "identity shortcut connection" that skips one or more layers [41].

4. MATERIALS AND METHODS
4.1. Image dataset
We trained our model on the BDIndigenousFish2019 (BD2019) dataset, which contains eight fish species from Bangladesh. The BD2019 dataset was first used in [20] by an approach named HLBP, an FC method that combines hybrid features with an SVM classifier; it is therefore not entirely fair to compare HLBP's performance with DL-based methods. The BD2019 dataset contains 2,610 images across eight categories. Figure 1 illustrates a sample image of each type, and the sample species are shown in Figure 2. Images were resized to 224×224 as per the model requirements.
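The 224×224 resizing step can be sketched as follows; this is a minimal nearest-neighbour resize in NumPy with a dummy image for illustration, whereas in practice a library such as PIL or OpenCV would be used:

```python
import numpy as np

def resize_nearest(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize an H x W x C image to size x size by nearest-neighbour sampling."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img[rows[:, None], cols[None, :]]

# Example: a dummy 300 x 400 RGB image resized to the model's input size.
dummy = np.zeros((300, 400, 3), dtype=np.uint8)
resized = resize_nearest(dummy)
```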

4.2. The proposed model
The architecture of our FC model is introduced in Figure 3. Convolutional neural networks have several advantages over conventional strategies. Weight sharing in convolutional layers decreases the number of parameters and makes it easier to detect various attributes, such as edges and corners. A pooling layer addresses the known sensitivity of the output feature maps to the location of features in the input, thereby providing invariance to changes in the position of the extracted features. The batch normalization layer makes deep-network training more robust and stable by reducing internal covariate shift.
The proposed model consists of a series of steps:
- The input layer carries an image of size 224×224×3, which feeds the first convolutional layer with 32 feature maps.
- After a non-linear activation function (ReLU) and batch normalization, the output passes through a max-pooling layer, so the dimensions become 112×112×32.
- The second convolutional layer takes the previous layer's output as input and has 64 feature maps. It is followed by ReLU, batch normalization, and a max-pooling layer, so the output is reduced to 56×56×64.
- The third convolutional layer has 128 feature maps, followed by ReLU, batch normalization, and a max-pooling layer, so the dimensions become 28×28×128.
- The fourth convolutional layer has 256 feature maps, followed by ReLU, batch normalization, and a max-pooling layer, so the dimensions become 14×14×256.
It is worth noting that for convolutional layers 1 to 4, each kernel was 3×3 with a stride of 1, and the max-pooling filters were 2×2 with a stride of 2.

Int J Elec & Comp Eng, ISSN: 2088-8708
- The fifth to seventh convolutional layers are connected back to back, with 512, 256, and 128 feature maps, respectively, each followed by ReLU, batch normalization, and a max-pooling layer. For convolutional layers 5 to 7, each kernel was 5×5 with a stride of 1, and the max-pooling filters were 2×2 with a stride of 2.
- The eighth convolutional layer has 64 feature maps, with 7×7 kernels and a stride of 1. After a ReLU activation, the convolutional layer's output is flattened through a fully connected layer with 576 units, which is connected to a fully connected layer with 128 units.
- A dropout layer with rate 0.3 follows, connected to a fully connected layer with 256 units. A softmax output layer with eight units, matching the number of classes in the dataset, completes the model.

5. EXPERIMENTAL RESULTS AND ANALYSIS
We evaluate the model using three data splittings, 80-20%, 75-25%, and 70-30%, with a comparative analysis of different optimization algorithms. Further, we provide an experiment with a data augmentation approach. Table 1 lists the parameter settings of the proposed model. The softmax function is used at the last layer; the other layers use the ReLU activation. Since the dataset is imbalanced, classification accuracy alone may not be sufficient, particularly for a multi-class classification task. Therefore, a confusion matrix is computed for each data splitting, which yields more information about what the classification model gets right and the errors it makes. The performance measures computed from the confusion matrix include accuracy, sensitivity, and specificity [7].
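The dimension bookkeeping for the first four convolution and pooling stages of the proposed model can be checked with a short sketch, assuming 'same' padding for the 3×3, stride-1 convolutions and non-padded 2×2, stride-2 max pooling (the padding choice is our assumption, inferred from the stated output sizes):

```python
def conv_same(size: int, stride: int = 1) -> int:
    """Spatial size after a 'same'-padded convolution."""
    return -(-size // stride)  # ceiling division

def max_pool(size: int, pool: int = 2, stride: int = 2) -> int:
    """Spatial size after a non-padded max-pooling layer."""
    return (size - pool) // stride + 1

size, shapes = 224, []
for channels in (32, 64, 128, 256):   # feature maps of conv layers 1-4
    size = max_pool(conv_same(size))
    shapes.append((size, size, channels))
# shapes: 112x112x32, 56x56x64, 28x28x128, 14x14x256
```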
- Sensitivity measures the proportion of positives that are correctly identified. It can be estimated as (12):

Sensitivity = TP / (TP + FN)   (12)

where TP denotes the total number of samples correctly classified as the actual class, and FN denotes the total number of samples of the actual class that were not correctly classified.
- Specificity measures the proportion of negatives that are correctly identified. It can be estimated as (13):

Specificity = TN / (TN + FP)   (13)

where TN denotes the total number of samples correctly rejected for the class, and FP denotes the total number of samples of other classes incorrectly assigned to it.
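For a multi-class confusion matrix whose rows are actual classes and whose columns are predicted classes, the per-class sensitivity and specificity of (12) and (13) can be computed as below; the 3×3 matrix is a toy example of ours, not the paper's data:

```python
import numpy as np

def sensitivity_specificity(cm):
    """Per-class sensitivity and specificity from a confusion matrix
    (rows = actual classes, columns = predicted classes)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp   # actual class predicted as something else
    fp = cm.sum(axis=0) - tp   # other classes predicted as this class
    tn = cm.sum() - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)  # eq. (12), eq. (13)

sens, spec = sensitivity_specificity([[8, 1, 1],
                                      [0, 9, 1],
                                      [1, 0, 9]])
```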

Experiments on 70-30% data splitting
In this experiment, we divided the dataset into 70% for training and 30% for testing, covering the eight classes. The testing set is used as validation data to validate our model; therefore, the validation accuracy at the final epoch is reported as the test accuracy. We performed the analysis on five optimizers, with average results obtained over 50 epochs. The most successful optimizer in this experiment is Adam, which attained 98.47% testing accuracy, while Adamax and Rmsprop also performed well at 94.89% and 93.74%, respectively. Table 2 lists the evaluation metrics (accuracy, sensitivity, and specificity) on the testing data for the five optimizers, and the performance curves are shown in Figures 4(a) to 4(d). We can observe that the SGD and AdaDelta optimizers performed very poorly, and that the Adam optimizer performed best. The confusion matrix for this experiment is given in Table 3.

Table 3. Confusion matrix of the proposed model on the testing dataset for the 70-30% splitting using the Adam optimizer
          Byen  Foli  Koi  Sing  Sol  Sorputi  Taki  Tengra
Byen       148     0    1     0    0        1     0       0
Foli         0    88    0     0    1        0     1       0
Koi          1     0  113     0    0        0     0       0
Sing         1     0    0   118    0        1     0       0
Sol          0     1    0     0   34        0     1       0
Sorputi      0     1    1     0    0       58     0       0
Taki         0     1    0     1    0        0   115       0
Tengra       0     0    0     1    0        0     0      95
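As a sanity check, the overall accuracy can be recomputed from the Table 3 confusion matrix; the diagonal holds the correctly classified samples, and the recomputed value (about 98.2%) is close to the reported 98.47% testing accuracy (small differences can arise from rounding or from which epoch is reported):

```python
import numpy as np

# Table 3 confusion matrix: rows = actual class, columns = predicted class,
# classes in the order Byen, Foli, Koi, Sing, Sol, Sorputi, Taki, Tengra.
cm = np.array([
    [148,  0,   1,   0,  0,  1,   0,  0],  # Byen
    [  0, 88,   0,   0,  1,  0,   1,  0],  # Foli
    [  1,  0, 113,   0,  0,  0,   0,  0],  # Koi
    [  1,  0,   0, 118,  0,  1,   0,  0],  # Sing
    [  0,  1,   0,   0, 34,  0,   1,  0],  # Sol
    [  0,  1,   1,   0,  0, 58,   0,  0],  # Sorputi
    [  0,  1,   0,   1,  0,  0, 115,  0],  # Taki
    [  0,  0,   0,   1,  0,  0,   0, 95],  # Tengra
])

n_test = int(cm.sum())            # 783 images = 30% of the 2,610-image dataset
accuracy = np.trace(cm) / n_test  # fraction of samples on the diagonal
```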

Experiments on 75-25% data splitting
In this section, the same experiments and analyses were carried out, except that the dataset was divided into 75% for training and 25% for testing. The performance of the optimizers in this experiment is illustrated in Figures 5(a) to 5(d) (see the appendix). The most successful optimizer is Rmsprop, which attained 97.24% testing accuracy, while Adamax and Adam also performed well at 96.63% and 94.18%, respectively. The SGD and AdaDelta optimizers performed very poorly. Table 4 lists the evaluation metrics (accuracy, sensitivity, and specificity) on the testing data for the five optimizers; it shows that the Rmsprop optimizer performed best. The confusion matrix for this experiment is given in Table 5.

Experiments on 80-20% data splitting
Another experiment was carried out in which we divided the dataset into 80% for training and 20% for testing. The performance of the optimizers in this experiment is illustrated in Figures 6(a) to 6(d) (see the appendix). The most successful optimizer is Adam, which attained 97.70% testing accuracy, while Adamax and Rmsprop also performed well at 95.01% and 93.67%, respectively. The SGD and AdaDelta optimizers performed very poorly. Table 6 lists the evaluation metrics on the testing data for this experiment; it shows that the Adam optimizer performed best. The confusion matrix for this experiment is given in Table 7.

Table 6. Evaluation metrics on testing data for the 80-20% splitting: accuracy, sensitivity, and specificity for the five optimizers
Table 7. Confusion matrix of the proposed model on the testing dataset concerning the 80-20% splitting using the Adam optimizer

DISCUSSION
Using a CNN to classify fish species requires thousands, or even tens of thousands, of training samples, and collecting adequate fish-species images can be a hard demand to meet. In this paper, we addressed the issues that directly decrease fish-classification performance and introduced a new model with high classification accuracy. Among the five optimizers, Adamax was the steadiest, while AdaDelta and SGD performed worst. Moreover, the Adam, Adamax, and Rmsprop optimizers can attain good accuracy within 20 epochs, whereas SGD and AdaDelta could not, even after 50 epochs. From the above discussion, it is evident that our proposed CNN architecture with various optimization algorithms provides promising results for fish classification, underlining the importance of choosing the network's hyperparameters. We found that our model performed very well without data augmentation. We also compared our work with state-of-the-art deep CNN models, including AlexNet, VGG-16, VGG-19, ResNet50, adaptive-VGG [25], HDL [26], and DLN-NB [27]. Table 8 (see the appendix) compares the results of these deep CNN models with the developed model. It is worth noting that the deep CNN models were trained from scratch, and their results were obtained after 100 iterations.

7. CONCLUSION
This paper introduced a fish classification model with three data splittings and a comparative analysis of five optimizers used in our proposed CNN model. The comparison was made on the publicly available BDIndigenousFish2019 dataset. The results showed that three optimizers performed consistently: the Adam optimizer performed best in the 70-30% and 80-20% experiments, while Rmsprop performed best in the 75-25% experiment. These findings reinforce the significance of choosing the hyperparameters of the network used for classification. This paper demonstrated that state-of-the-art results can be achieved for fish classification through deep CNNs, and the experimental results show that this strategy is efficient and reliable compared with existing deep CNN models. Further study could examine whether our model can be employed on other classification tasks. It would also be interesting to investigate whether the results can be improved using other artificial intelligence methods, such as generative adversarial networks (GANs) and different transfer learning methods, which would make the research results more reproducible and comparable.