Performance evaluation of transfer learning based deep convolutional neural network with limited fused spectro-temporal data for land cover classification

ABSTRACT


INTRODUCTION
In recent times, the field of deep learning has gained immense popularity and has become a central topic in big data research due to its superior performance compared to traditional machine learning algorithms. This has resulted in its successful application in a variety of fields, such as image identification [1], natural language processing [2], and speech enhancement [3]. The use of deep learning models on remotely sensed images has also become increasingly popular, with models such as ResNet, AlexNet, and capsule networks exhibiting remarkable performance when trained with ample labelled data. However, creating large-scale, well-labelled datasets for remote sensing is challenging due to the high cost associated with data collection and annotation [4]. Furthermore, with the increasing availability of large amounts of data from advanced satellite sensors, newly collected remote sensing data often lack labelled information, posing a challenge to deep learning models. To address this issue, transfer learning-based frameworks have been developed, which use existing labelled remote sensing data to supplement the new, unlabelled data [5]. Transfer learning [6] is a technique inspired by the human tendency to apply prior experience to new activities; it involves reusing a neural network model trained on a problem similar to the one being solved, which reduces training time and lowers generalization error [7]. The success of deep learning models heavily depends on the availability of ground truth training data, which is not always accessible. Therefore, researchers have developed transfer learning, which involves the use of fine-tuned pre-trained models, as seen in applications such as crop yield prediction in regions with limited training data. This manuscript focuses on evaluating and investigating the performance of a transfer learning-based convolutional neural network (TL-CNN) [8] pre-trained on spectro-temporal satellite data for remote-sensing scene classification over a limited imbalanced dataset. Additionally, a shallow artificial neural network and a machine learning algorithm trained from scratch exclusively on the new dataset are also evaluated.
Deep learning has become increasingly popular in recent years across a variety of academic and professional fields, particularly in computer vision [9]. Convolutional neural networks (CNNs) [10] have demonstrated extraordinary success in a variety of applications, including speech and object recognition [11]. One such model is AlexNet, introduced by Krizhevsky et al. [1], which has played a vital role in the widespread adoption of deep learning in computer vision. CNNs are currently the leading approach in many image-related tasks, including image classification, segmentation, and detection, and have achieved outstanding results on benchmarks such as the Modified National Institute of Standards and Technology (MNIST) handwritten digit database and the ImageNet dataset [12], [13], which comprises millions of natural images.
However, training CNNs with limited data can be challenging despite their advanced feature-extraction capabilities [14]. Yin et al. [15] and Yosinski et al. [16] discovered that CNN models trained on diverse sets of images tend to learn parameters in a common pattern, where layers closer to the input learn generic features, while those farther from the input learn more specific features relevant to the dataset. This finding led to the development of transfer learning, which involves using filters learned by a CNN during primary tasks for unrelated secondary tasks [17], [18]. The primary CNN model can also be used to extract features and serve as a foundation for the secondary task.
Although large datasets can significantly enhance CNN performance, transfer learning has made it possible to use CNNs in scientific disciplines with limited data. For example, Lima et al. [19] used transfer learning to identify photos of drill cores from oil fields, and Lima et al. [20] used it to categorize herbarium specimens and to classify diverse geoscience images. Razavian et al. [21] used transfer learning to study a variety of recognition tasks, including object image classification, scene identification, and image retrieval, while Duarte-Coronado et al. [22] used it to detect porosity in thin section images. Remote sensing has also made substantial use of transfer learning.

STUDY AREA
The Agriculture University's research farm in Peshawar, Pakistan, was selected for the experimental setup. The region covers about 125 hectares and serves as experimental grounds for a variety of vegetation. The targeted pilot region, illustrated in Figure 1, is a small area containing numerous types of agricultural crops, such as wheat, mustard, and clover, as well as the urban class.

Data description
The data used in our research are multispectral Sentinel-2 and PlanetScope data. The Sentinel-2 data were acquired from the Copernicus Open Access Hub, while the high-resolution Planet data were acquired from Planet Inc. through a research grant. The European satellite Sentinel-2A was launched in June 2015. It carries a multispectral sensor that produces images in thirteen spectral bands, from visible to short-wave infrared [23]. Planet Inc., a California-based private company that provides satellite imaging services, launched its first Dove nanosatellite in 2015. Its spatial resolution ranges from 2.7 to 3.2 m in ISS orbit and from 3.7 to 4.9 m in Sun-synchronous orbit. With a one-day temporal resolution, it gives geo-analysts access to four channels with global coverage, comprising red, green, and blue (RGB) and near-infrared (NIR) reflectance [24].

METHOD
Figure 2 illustrates the flow diagram of the adopted methodology. To capture temporal information along with spectral surface reflectance values over the pilot region, Sentinel-2 data were acquired on the 3rd of February, the 18th of February, the 19th of March, and the 23rd of April, whereas a single PlanetScope scene was acquired on the 29th of March. After calculating the normalized difference vegetation index (NDVI) for each image, the acquired data were layer stacked and fused into a single image with four bands (red, green, NIR, and NDVI) per date, for a total of twenty bands. A survey was performed in the pilot region to gather ground truth data, using our native Android-based Geo Survey application. During the survey, eight different categories were identified over the pilot region.
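The stacking step described above can be sketched as follows. This is a minimal illustration, not the authors' processing chain: it assumes the five scenes have already been co-registered and resampled to a common grid (a step the text does not describe), it uses the standard definition NDVI = (NIR - Red) / (NIR + Red), and the function names and toy reflectance values are hypothetical.

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Standard NDVI = (NIR - Red) / (NIR + Red), computed per pixel."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)  # eps avoids division by zero

def stack_scene(red, green, nir):
    """Return one acquisition date as a (4, H, W) array: red, green, NIR, NDVI."""
    return np.stack([red, green, nir, ndvi(nir, red)], axis=0)

def fuse_dates(scenes):
    """Concatenate per-date (4, H, W) stacks into one (4 * n_dates, H, W) image."""
    return np.concatenate(scenes, axis=0)

# Toy reflectance values standing in for five co-registered acquisitions.
rng = np.random.default_rng(0)
H, W = 3, 3
scenes = []
for _ in range(5):
    red, green, nir = rng.uniform(0.01, 0.6, size=(3, H, W))
    scenes.append(stack_scene(red, green, nir))

fused = fuse_dates(scenes)
print(fused.shape)  # (20, 3, 3)
```

Five dates with four layers each give the twenty bands mentioned in the text; each pixel of the fused image can then be reshaped into a bands-by-dates matrix for the classifiers.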

Ground truth data
A survey was conducted with the Geo Survey application to gather the ground truth data used to classify crops in the region of interest. The collected field samples are listed in Table 1. Eight land cover types were collected during this activity: wheat, urban, turnip, clover, oats, mustard, chickpea, and barley. Geo Survey is our indigenously developed mobile survey application; it is freely available for both Android and iOS and can be downloaded from the Google Play Store [25]. The training dataset is made up of 70% of the data from the ground truth survey, while the remaining 30% is used for testing. The fully connected layers of the TL-CNN and the artificial neural network (ANN) model are trained using the training data, and the 30% test set is used to validate the models' performance.
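A 70/30 split of this kind might be implemented as in the sketch below. The text does not say whether the split was stratified; splitting each class separately is an assumption made here because the dataset is imbalanced, and the function name, seed, and toy label counts are hypothetical rather than taken from Table 1.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx), splitting every class separately."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))
        if len(indices) > 1:
            # Keep at least one sample of the class on each side of the split.
            n_train = min(max(n_train, 1), len(indices) - 1)
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels with invented counts, only to show the imbalance handling.
labels = ["wheat"] * 50 + ["clover"] * 30 + ["chickpea"] * 10 + ["urban"] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps the rare classes represented in both subsets, which matters when per-class precision and recall are reported.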

Artificial neural networks
In our experiment, we used an ANN, a kind of machine learning algorithm designed to resemble the human cognitive system. As shown in Figure 3(a), the ANN model is made up of three layers: the input layer, the hidden layer, and the output layer. We performed non-linear classification and carried out supervised learning using a multi-layered feed-forward ANN with back propagation. The complexity of the features that neural networks learn from their input data varies with the depth of the networks.
However, as we have limited data, we have not employed a deep stack of layers; rather, a single hidden layer is used in the neural network architecture. The model is applied to the limited labelled dataset because machine learning models usually work better than deep learning algorithms on limited data. The model architecture of the ANN with spectro-temporal features at its input is depicted in Figure 3(a).

Transfer learning based deep convolutional neural network
The deep convolutional neural network inspired by Inception, as proposed by Minallah et al. [27], was the basis for the deep learning model that we aimed to build. The original Inception-inspired convolutional neural network model includes three temporal convolutional inception blocks, a dense layer, and a softmax layer, which together classify the data into multiple categories. In each temporal convolutional inception block, parallel filters of sizes 1, 3, and 5 create feature maps, which are then combined and passed on to the following layer. A single temporal convolutional inception block has 32 filter units for each of the three filter sizes (1, 3, and 5), for a total of 96 units. These blocks, which employ 1D convolutional filters, are responsible for spectro-temporal feature learning. Convolutional filters have already demonstrated their effectiveness in learning temporal characteristics [24]. The referenced model takes spectro-temporal features as input and predicts the output class. As presented in Figure 4, the model receives a 4×5 2D matrix as its input, where the rows represent spectral bands and the columns are the timesteps (1, 2, 3, 4, 5) on which the images were acquired. Thus, the values of the same pixel on different dates are stacked across the matrix columns. To initialize the network, the convolutional blocks of the original Inception-inspired deep CNN were frozen and their pre-trained weights from the spectro-temporal data were reused. The initialized weights were then adjusted during the fine-tuning process so that the network could learn the specific features of the new task from limited, unbalanced input. The last layers of the original model were modified by adding fully connected layers in the proposed model. The added fully connected layers are then trained from scratch on the limited imbalanced spectro-temporal data for the different classification classes, with fine-tuning of various hyper-parameters. Figures 5(a) and 5(b) show the training curves of the TL-CNN model.
To classify the study area, we applied both our ANN and TL-CNN models to the spectro-temporally fused raster multispectral remote sensing data. The results were then used to generate classified land cover maps of the pilot region, as depicted in Figure 6.
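To make the structure of one temporal convolutional inception block concrete, the following NumPy sketch runs a forward pass on a single 4×5 pixel matrix. It is an illustration under stated assumptions, not the model of [27]: zero "same" padding, ReLU activations, and randomly initialized weights are all assumed here, and the function names are hypothetical. As in most deep learning frameworks, the "convolution" is implemented as a cross-correlation.

```python
import numpy as np

def conv1d_same(x, kernels, bias):
    """1D convolution along time with zero 'same' padding (assumed).

    x:       (timesteps, in_channels)
    kernels: (kernel_size, in_channels, n_filters)
    bias:    (n_filters,)
    returns: (timesteps, n_filters)
    """
    k = kernels.shape[0]
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))
    out = np.empty((x.shape[0], kernels.shape[2]))
    for t in range(x.shape[0]):
        window = padded[t:t + k]  # (kernel_size, in_channels)
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return out

def temporal_inception_block(x, params):
    """Apply parallel size-1, -3 and -5 filters and concatenate the feature maps."""
    branches = [np.maximum(conv1d_same(x, w, b), 0.0) for w, b in params]  # ReLU assumed
    return np.concatenate(branches, axis=1)

def init_block(in_channels, n_filters=32, sizes=(1, 3, 5), seed=0):
    """Random placeholder weights; a pre-trained model would load its own."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, size=(k, in_channels, n_filters)), np.zeros(n_filters))
            for k in sizes]

# One pixel: 4 spectral rows x 5 acquisition dates, transposed to (time, channels).
pixel = np.random.default_rng(1).uniform(0.0, 1.0, size=(4, 5))
x = pixel.T
features = temporal_inception_block(x, init_block(in_channels=4))
print(features.shape)  # (5, 96)
```

The 96 output channels correspond to 32 filters for each of the three kernel sizes. In a transfer learning setting, parameters like these would be loaded from the pre-trained model and held fixed, while only the appended fully connected layers are updated on the new data.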

Accuracy Assessment
The performance of the models is measured by class-wise precision, recall, and F1-score, together with the weighted average of the F1-score and the overall accuracy [28]. Because the dataset is unbalanced, the weighted average of the F1-score is given special attention when evaluating the overall performance of the classifiers. Equations (1) to (4) are used to determine the precision, recall, and F1-score.
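As an illustration of these metrics, the sketch below computes them from true and predicted labels using the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = 2 · precision · recall / (precision + recall), and a weighted F1 that averages the class-wise F1-scores by class support. The function name and the toy labels are hypothetical and are not the paper's data.

```python
def per_class_report(y_true, y_pred):
    """Return per-class precision/recall/F1/support, weighted F1, and accuracy."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    report = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall,
                     "f1": f1, "support": tp + fn}

    total = len(y_true)
    # Weight each class F1 by its number of true samples (support).
    weighted_f1 = sum(r["f1"] * r["support"] for r in report.values()) / total
    accuracy = sum(1 for t, p in pairs if t == p) / total
    return report, weighted_f1, accuracy

# Toy, imbalanced example.
y_true = ["wheat"] * 6 + ["clover"] * 3 + ["chickpea"] * 1
y_pred = ["wheat"] * 5 + ["clover"] + ["clover"] * 3 + ["wheat"]
report, weighted_f1, accuracy = per_class_report(y_true, y_pred)
print(round(weighted_f1, 3), accuracy)
```

In this toy case the single chickpea sample is missed entirely, so its F1-score is zero, yet the weighted F1 and accuracy stay fairly high. This is why class-wise scores are reported alongside the weighted averages for an imbalanced dataset.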

RESULTS AND DISCUSSION
This section discusses the effectiveness of the artificial neural network and of the proposed deep convolutional neural network based on transfer learning. Neural networks are highly effective algorithms for learning a non-linear relationship between input and desired output. The complexity of the features learned from the input data depends on the depth of the network. However, as we have limited data, we have not employed a deep stack of layers; rather, a single hidden layer is used in the neural network architecture. The classification report of the ANN model in Table 2 shows that the ANN achieved a weighted average F1-score of 0.87 and a classification accuracy of 0.89. The weighted average better approximates the model performance because the dataset is imbalanced. The same table shows that the overall precision and recall of the model are satisfactory, with weighted averages of 0.84 and 0.86, respectively. However, for individual classes with comparatively little training data, such as Turnip, Chickpea, and Barley, the model does not perform well, as can be seen in Figures 7(a) and 7(b). The urban class also has little data, but it performed well, with a precision of 0.94 and a recall of 1. This is because the urban class has a different spectral reflectance from the other limited-data classes. The degraded performance on the Turnip, Chickpea, and Barley classes is due to limited training data as well as the spectral overlap between these classes, which makes them difficult for the model to separate. For all other classes, the performance of the ANN is quite satisfactory.
The classification report of our implemented transfer learning-based CNN model on the test set is given in Table 3, which shows that our TL-CNN model achieved a weighted average F1-score of 0.92 and an overall classification accuracy of 0.93. The model has a good overall performance with limited data and an uneven class distribution. All the classes that are similar to the pre-trained model's dataset achieved good precision, recall, and F1-score. However, performance is not good for classes that differ in spectral reflectance from the original pre-trained model's dataset, are spectrally similar to one another, and have a limited number of samples. The F1-score for the Chickpea class is 0.71, the lowest among all the classes, followed by the Barley and Turnip classes at 0.82 and 0.83, respectively. The precision and recall scores for both algorithms over individual classes are illustrated in Figures 7(a) and 7(b).

CONCLUSION
In this study, the performance of a shallow artificial neural network algorithm and a convolutional neural network based on transfer learning were compared. The experiments and findings show that limited data do impact the performance of the model; however, the variation in the spectral reflectance values of the various classes is more important for class distinction and model performance. Furthermore, it is concluded that using a relatively small dataset with a machine learning model can yield satisfactory results for classes with distinct spectral reflectance values; however, the advantages of a transfer learning-based pre-trained deep convolutional neural network can be observed in this use case.
Int J Elec & Comp Eng ISSN: 2088-8708  Performance evaluation of transfer learning based deep convolutional neural … (Muhammad Hasanat)

Figure 1. Locality map of agricultural farm, University of Peshawar

Figure 2. Data-flow diagram of experimental setup

The activation function receives the sum of the input values. The network provides feedback once the input data have been processed, and if an error has occurred, it is corrected by recursively adjusting the network weights [26]. The performance of the classifier is analyzed while adjusting various parameters and the weights of the interconnected nodes. The training activation threshold setting determines how much the internal weights affect a node's level of activation. The training rate controls the size of the weight change for the nodes. The training momentum parameter helps prevent the system from converging to a local minimum. The training root-mean-square (RMS) exit criterion is used to stop the algorithm from training further. Mathematical calculations are needed to uncover the hidden layers. The number of training iterations determines how many iterations of training the algorithm undergoes, and the minimum output activation threshold establishes the value below which a pixel is categorized as unclassified.
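The roles of the training rate, momentum, RMS exit criterion, iteration count, and minimum output activation threshold can be illustrated with a small single-hidden-layer network. The sketch below is not the software used in the study; the sigmoid activations, hidden-layer size, parameter values, and toy data are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    W1, b1, W2, b2 = params
    H = sigmoid(X @ W1 + b1)          # hidden-layer activations
    return H, sigmoid(H @ W2 + b2)    # output-layer activations

def train_ann(X, Y, hidden=8, rate=0.5, momentum=0.5, rms_exit=0.05,
              iterations=2000, seed=0):
    """Train a single-hidden-layer network; all default values are illustrative."""
    rng = np.random.default_rng(seed)
    params = [rng.normal(0, 0.5, (X.shape[1], hidden)), np.zeros(hidden),
              rng.normal(0, 0.5, (hidden, Y.shape[1])), np.zeros(Y.shape[1])]
    velocity = [np.zeros_like(p) for p in params]
    history = []
    for _ in range(iterations):               # upper bound on training iterations
        H, O = forward(X, params)
        err = O - Y
        history.append(float(np.sqrt(np.mean(err ** 2))))
        if history[-1] < rms_exit:            # RMS exit criterion
            break
        d_out = err * O * (1.0 - O)           # error back-propagated through output sigmoid
        d_hid = (d_out @ params[2].T) * H * (1.0 - H)
        grads = [X.T @ d_hid / len(X), d_hid.mean(axis=0),
                 H.T @ d_out / len(X), d_out.mean(axis=0)]
        for p, v, g in zip(params, velocity, grads):
            v *= momentum                     # keep part of the previous step
            v -= rate * g                     # training rate scales the new step
            p += v                            # in-place weight update
    return params, history

def classify(outputs, min_activation=0.5):
    """Pick the most active output node; -1 marks a pixel as unclassified."""
    labels = outputs.argmax(axis=1)
    labels[outputs.max(axis=1) < min_activation] = -1
    return labels

# Toy two-class data with four features per sample.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.2, 0.05, (20, 4)), rng.normal(0.8, 0.05, (20, 4))])
Y = np.repeat(np.eye(2), 20, axis=0)
params, history = train_ann(X, Y)
labels = classify(forward(X, params)[1])
print(len(history), history[0], history[-1])
```

Pixels whose strongest output activation falls below the threshold are labelled -1, which mirrors the unclassified group that appears in the ANN results.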

Figure 3. Illustration of input data structure and parameter selection of deep neural networks: (a) ANNs with temporal features and (b) parameter selection for ANNs


Int J Elec & Comp Eng, Vol. 13, No. 6, December 2023: 6882-6890, ISSN: 2088-8708

Figure 4 presents our proposed transfer learning-based Inception-inspired CNN model. Figures 5(a) and 5(b) show the loss and accuracy of the TL-CNN model over one hundred epochs.

Figure 6(a) presents the classified results generated through the ANN, while Figure 6(b) shows the classified map for the TL-CNN. The results of the implemented transfer learning-based CNN model are explained in the results and discussion section.

Figure 6. Land cover land use map generated through (a) ANN classified map and (b) TL-CNN classified map

The difference in the support pixels of the two models is due to the fact that the ANN assigned some of the pixels of different classes to the unclassified group, which is omitted. However, these pixels can be seen in the ANN classified map in Figure 6. The overall performance of both models, in terms of the weighted average F1-score and classification accuracy, can be seen in Figure 7(c).

Figure 7. Precision, recall and overall accuracy of the employed models as seen in (a) graphical comparison of ANN and TL-CNN through precision, (b) graphical comparison of ANN and TL-CNN classified map, and (c) graphical comparison of F1 score and accuracy

Table 1. Vegetation categorization in training data

Table 2. Simulated results of ANN classifier

Table 3. Classification results of CNN