Cassini-Huygens mission images classification framework by deep learning advanced approach

Received Jul 19, 2020 Revised Dec 12, 2020 Accepted Dec 28, 2020 Developing a deep learning (DL) model for image classification commonly demands a crucial architecture organization. Planetary expeditions produce a massive quantity of data and images. However, manually analyzing and classifying flight missions image databases with hundreds of thousands of images is ungainly and yield weak accuracy. In this paper, we speculate an essential topic related to the classification of remotely sensed images, in which the process of feature coding and extraction are decisive procedures. Diverse feature extraction techniques are intended to stimulate a discriminative image classifier. Features extraction is the primary engagement in raw data processing with the purpose of data classification; when it comes across the task of analysis of vast and varied data, these kinds of tasks are considered as time-consuming and hard to be treated with. Most of these classifiers are either, in principle, quite intricate or virtually unattainable to calculate for massive datasets. Stimulated by this perception, we put forward a straightforward, efficient classifier based on feature extraction by analyzing the cell of tensors via layered MapReduce framework beside meta-learning LSTM followed by a SoftMax classifier. Experiment results show that the provided model attains a classification accuracy of 96.7%, which makes the provided model quite valid for diverse image databases with varying sizes.


INTRODUCTION
Deep learning (DL) [1] is protruded as a novel scope of machine learning, applied, and affecting several domains in our daily life such as medical image processing [2], prediction [3], mobile traffic classification [4], computer vision [5], computer networks [6]. DL's concept can be described as straightforward as the process of feature learning by machines as they ordinarily very well in the field of classification. Remote sensing (RS) is a scientific defy where scenes are inspected and construed by remote means [7]. This term comprises the conventional RS scopes, such as images of the satellite. Remotely sensed data analysis and classification overwhelmingly implemented by supervised filters or classifiers. This necessitates a sufficient volume of labeled samples to provide the needed training for the classifier. The spacecraft of Cassini-Huygens is considered the greatest and the most intricate interplanetary mission ever constructed, and it was able to acquire a detailed image within different planetary conditions; science investigations in this respect have particular pay attention to the theoretical part. Our work will focus on the AI part to support high-rise demand on computerized scientific aspects, reducing effort, time, and hard to be accomplished by the human, with the point of view of analysis and classification a very massive image datasets that have been generated by the Cassini-Huygens mission. Meta-learning offers an ideal preference in which the model of machine learning earns knowledge over numerous learning epochs, frequently encompass an allocation of associated tasks. It also utilizes this knowledge to enhance its forthcoming learning efficiency.
We provide an approach that pinpoints the frailty of neural networks trained via the optimization of a gradient. We put forward an optimizer meta-learner that is emanated with LSTM. Within the training process, the classifier's performance is enhanced by the support provided by the optimizer. The proposed meta-learner attains together short-term learned data among the performed mission, and long-term acquired experience-reported knowledge occurring within the whole previous tasks. The meta-learner algorithm within our conception is trained to impel the classifier, so it tends to meet at a point of convergence toward an optimal solution swiftly for every task, this is the aim of choosing this algorithm.
This research's residual is arranged in the following order: Section 2 includes a description of the used dataset. Section 3 contains materials and methods along with the adopted algorithm portions description. Section 4 of the research comprises the results and analysis discussion. In the last Section 5 conclusions and future work are provided.

DATASET UTILIZED IN THE ANALYSIS
Cassini-Huygens spacecraft was so far the most aspirant expedition up till now sent to outer space, stuffed with a group of robust devices and cameras. Cassini-Huygens were eligible for gathering delicate measurements itemized images within several atmospheric circumstances. The spacecraft had two parts: the Huygens probe and Cassini orbiter, Cassini-Huygens arrive at Saturn in 2004, transmitting precious data back to us, which improved our comprehension of Saturn and its moons. Huygens step inside Titan's atmosphere, Saturn's massive moon, fall downward through a parachute to the furthest point so far, land on its surface, take samples and analyze them, and send the results to Cassini, which will send them later to the Earth. Cassini instruments of remote sensing collected data remotely from enormous distances. After twenty years spent in outer space and thirteen years touring Saturn, the orbiter "Cassini" drained out of energy. Cassini was immersed in Saturn's atmosphere on 15 of September 2017, and this is how the mission ended. The acquired images data has been generated by the imaging science subsystem (ISS), which has the best resolution for the acquired images. The ISS is composed of 2 detached cameras wide-angle camera and a narrow-angle camera. ISS image volumes dataset is composed of a massive number of images and their related labels that hold the images' metadata. The data set is publicly available at the reference [8].

MATERIALS AND METHODS
The meta-learning approach proposed in [9,10] works via executing a few-shot dataset sampling from an intended task and acclimating the approach inner portrayals among gradient descent certain steps. It must be stated that this research is a supplement of past investigation exertion associated with the Cassini-Huygens project dataset [11][12][13]. Recurrent model meta-learning is a part of the adopted model tailored to long short term memory (LSTM(. With this sub-framework, the algorithm of meta-learning shall train the LSTM model, which in its part must perform the needed dataset processing consecutively, and subsequently process the incoming data as new inputs to the SoftMax classifier, which requires passing the extracted features with the (image, label) pairs set for each batch of the dataset. Figure 1 illustrates a meta-learning layout while the adopted framework is shown in Figure 2, which includes three modules, a features extractor (G), a meta-learner LSTM (M), and a map-reducer discriminator (D), all of them are acquiring the knowledge altogether. From one side, we anticipate the features extractor (G) to extract all the related data during its task by capturing high valued features, which will clue the meta-learner LSTM (M) to carry out. On the other side, it is logical that the features extractor (G) will be reinforced via the map-reducer discriminator (D) over consequent tasks on a big dataset (Bd). After dealing with a massive number of data and its features, the features extractor (G) progressively learns and acquires the knowledge to extract features from raw image data; this mapping process will considerably facilitate the meta-learning implementation. By considering the domain big data (Bd) of possible experiences ET by arithmetic approach, we devise the subsequent optimization issue in (1) methodically: where , , represent the congruent modules parameters. The task of meta-learning (T) has a distribution p(T), while (x, y) symbolizes a sampled labeled instance from the Bd. The purpose is to decrease 2459 the joint expectation indicated by J. This joint will be applied on both losses: the loss LT (θM, θG), which is related to the task of meta-learning, and on the loss L(x,y) (θD, θG) which is related to map-reducer discriminator (D). A meta-learner can also commence a learner to bring up to date learner via gradient descent, which has a steady learning rate (R) over every task, gradient descent begins from incipient parameters θ0, and afterward, it carries out the subsequent update in (2): The previous equation is utterly analogical to the cell state update of the LSTM. With each metalearning process epochs, the associated task includes a training set train (T) and another one for testing (T). For the map-reducer discriminator (D) it should be able to eliminate the identical images or batches among the Bd and provide labeled batches and images. The map-reducer discriminator (D), and the meta-learner LSTM (M) have a joint loss with the purpose of reducing the expected loss with the task of map-reducer discrimination: where Ls can be any appropriate loss function for map-reducer discrimination, the features extractor (G) is supplied with the required training to extract the needed data from the image batches, while meta-learner obtain experience and learned to carry out the image and batch labeling. The map-reducer discriminator (D), which has given the parameter θD, is intended to foretell image labels created by G, and it is performed via gated neural networks. The features extractor (G) has given the parameter θG and is carried out via gated neural networks. The meta-learner LSTM (M), has given the parameter θM, and its mission is to gain the knowledge and pass from a task (T) to another among the training process. Over any provided task (T), the related concepts include meta-learner (M) that adjusts a learner AT for a given task with a specific aim, which is to reduce the anticipated loss with the task of meta-learning, and provided by (4): Implementing SoftMax on a huge dataset produce a more reliable LSTM, for classification, ordinarily, SoftMax function (5) is utilized by: where Pk indicates the chance or the probability that the vector k is a part of a group termed as class v. In multiclass classification, there is a need to compute the loss related to every class label for each process observation, and the outcome can be aggregated as a sum by cross-entropy, which is expressed by (6).
where Z-represents the number of classes that may include (Saturn, Rings, Titan, Icy Satellites, Small satellites (rocks), Sky), the log indicates the traditional log, At the same time B is a binary reference (1 or 0) that shows whether a class label k is the proper classification for a given observation. The indicator P represents the predicted probability of a given observation that belongs to a class k. To reduce cross-entropy loss, the proposed network is indoctrinated to present the result vector k near its related one-hot vector. It is crucial to pay attention that the right results of the target vectors within the network are steady throughout the training process.

The algorithm
After representing the framework, among deep meta-learning LSTM, the phase of illustrating our harmonized algorithm is necessary. The algorithm of stochastic gradient descent could be utilized to optimize the previous aims, but with our thematic model, we generated a modified version of the stochastic gradient descent method, the elaborated approach is summarized in Figure 1.
The upper section indicates the set of meta training D(meta-train) where each numbered box represents a different batch of the Bd that is composed of the training set denoted as Dtrain and Dtest. The meta-test set which is indicated in the illustration with D(meta-test) is also demonstrated in the same method, but via a various dataset that includes batches that are not available in any of the other batches in D(meta-train). Furthermore, there is a set of meta-validation which is exploited to specify additional labels and features. The adopted framework is presented in Figure 2.   (1), if we assumed that there is a learner classification process that has a parameter θ which it needs to be trained on the set Dtrain. The decisive algorithm which makes the best or most effective use of (a situation, opportunity, or resource) and also exploited to train the neural classifier is approximately divergent of gradient descent, that utilizes updates, where the expression of the (θt−1) represents the learner parameters in accordance with (t-1) updates, and (R) which is obtained during the time. period t, Lt denotes the loss, which is brought to optimization via the learner through n th number of update, ∇θt−1 Lt specifies loss gradient corresponding to the associated parameters θt−1, and θt demonstrates the learner updated parameters, the pivotal rumination that we parameterized at this spot this update which has common features that look or seem like LSTM cell state. The characteristics of the time difference ∆t between consecutive samplings as shown in Figure 3. The mean and standard deviation of the ∆t is 1,053 s and 12,282 s, respectively. This example figure demonstrates how the extracted image features (learned features) can be utilized to train the classifier. There are several parameters of similarity which could be exploited within our approach, and we consider the process of image classification by analyzing several parameters as a preprocessing phase for image classification and analyses of a block of tensors via layered MapReduce (LMR) for image patches classification where these investigation processes passed into (LSTM) blocks, we are using LSTMs block unit as it exploits the memory dependencies component to regulate the information flow, as we are dealing with a Bd meta-learning LSTM is a perfect choice. The prime factor to DL's prosperity is within its potentiality towards acquiring knowledge from hierarchical features via enormous amounts of raw data such as text or images. Extracted features through DL methods have been confirmed to be outstanding, and better than traditional approaches in many different aspects such as preciseness and reduced resource challenging [14].

Primary algorithm of meta-learning LSTM
The learning algorithm will include an input, which has an aim to perform training via a training set referred to as Dtrain = [(Xt, Yt)]. Also, it has an output with a parameter θ, which will model the learner (AT), the aim of the learning algorithm to gain a robust performance performed on the test set Dtest= (X, Y). Table 1 displays the primary algorithm of the meta-learning LSTM.

Train of meta-learning LSTM algorithm
The meta-learning LSTM algorithm will include an input, which has an aim to perform training and testing via a meta-validation training set referred to as D meta-validation train-LTSM train=[( train , test )]=1; also it has an output with a parameter φ, which will model the meta-learning LSTM algorithm, this process aims to gain a robust performance via the set of (meta-test) D Meta LSTM test= (Dtrain, Dtest). Table 2 presents the training algorithm of the meta-learning LSTM.

Feature extraction and labeling
Feature extraction depicts the affined shape details that comprised in recognition of a pattern, the concept of feature extraction considered as discriminatory kind of dimensionality reduction, where the purpose of the extraction process is to attain the significant pertinent data from the source of that data to appoint them in a minimized dimensionality. Referring to time or circumstance, the input data is so huge to be handled based on the followed set of rules in an algorithm, so input data should be metamorphosed into a minimized form of features. The process of metamorphosing input data into a group of features can be defined as feature extraction. The extraction process involves capturing essential details and distinctive characteristics from raw data, where each feature is exemplified through a feature vector that turns into its identity [15]. Table 3 shows the extraction and labeling algorithm of the meta-learning LSTM.

MapReduce framework
MapReduce model contains two tasks; Map-task and reduce task. MapReduce model is deduced from the function amalgamation of a map and reduces; this model is exceedingly utilized to process an enormous dataset. MapReduce uses the set of (key/value) as a data category [16]. A representation of the MapReduce framework is displayed in Figure 2. The major stages of the MapReduce platform are Mapper, Reducer, and a middle phase known as shuffle, all of them are introduced in the section: Mapper function: handles input data and carry out a few calculations on that input then generates intermediate outcomes with the arrangement of (key/value) [17]. Reducer function: a group of key values and averaged key, it mingles all values with each other to create values of a lower set [18]. Shuffle phase: within the MapReduce platform, after the task of the Map, ordinarily considerable volumes of middle data need to be shifted to Reduce operation from the Map operation, the shuffle moves the available data from the Mapper to the Reducer phase, after being arranged via the keys. Thus the whole pairs with an identical key shall be organized and grouped with each other and then transferred [19]. MapReduce's framework carries out those functions side by side efficiently, even in many devices [20]. The meta-LSTM could be considered as past learning of the semantic structure, and the pivotal LSTM is the subsequent knowledge. Consequently, the learned meta-LSTM offers a competent method of implementing transfer learning.  Table 3. Extraction and labeling algorithm extract_features imge = loadImage("*.jpeg") loadPixels(); //pixel looping and then perform looping through all columns and rows for (int Column = 0; Column < width; Column ++) // Loop over pixel column for (int Row = 0; Row < height; Row++) // Loop over pixel row for (int pix = 0; pix < pixels. length; pix++) // acquiring pixel value with range 0 to 255 color = grayscale pixels [pix] = color based on range [0-255] //update pixels image_size= (image-width, image-height), sample-size = batch-size, Data = Dataset batches data-batch = LOAD map-data data batch-size = 128 if data size > specified data sample then data-batch = split-batch(reduce) k = 0 for (inputs-sample, labels-

RESULTS AND ANALYSIS DISCUSSION
The essential considered factors of quantitative image analysis are processing and analysis. Among the challenges that will face any researcher, are software and hardware limitations. During our dataset processing and inspection, we encountered these kinds of restrictions. And we were able to overcome them, as we are dealing with the Bd, we cannot depend on the regular computer hardware, so we used graphical processing units (GPUs); they are a hardware appliance that is most effective for parallel and rapid processing. GPUs provide DL with the ability to perform separated computations from the central processor (that is serial tasks dedicated) and adequately fulfilling complex computations. In our research, we will use the GPU to be eligible to process a big data set of Saturn images that contain more than 400,000 images. To initiate the process of training, we indiscriminately sampled 50K images from the adopted dataset volumes. Every image is correlated to one or more categories, arranged in 6 observable denominations containing as represented in Figure 4. From left to right: Saturn, Rings, Titan, Icy Satellites, Small Satellites (rocks), Sky. The label selection method is based on each image content, as it is delineated to an interpretative word just as Cassini teams adopted. Figure 4 provided a screenshot of the classification process adopted, which shows the image with its target with correct classification accuracy, even with not existing before images.
The proposed algorithm is capable of processing at high speed. Also, the effectiveness of the adopted approach to transfer learning is noticed. Table 4 shows the adopted classes and the number of images utilized in the primary processes of training and testing.  The framework of meta-learning could be executed to any technique which is trained via the scheme of meta-learning. The purpose is imparting an inclusive approach that can readily be ordered to achieve the best or the desired performance with any required task. The analyses of the cell of tensors are straightforward embedding via the layer of map-reducer discriminator (D), which is acting as a pooling layer that reduces the mapped features. At the same time, the meta-learning LSTM will deconstruct each generated tensor into four cell tensors: (input tensor cell), (forget tensor cell), (cell state tensor cell), and (output tensor cell). Our model complexity is determined by the revelation of Levin complexity definition [21]: The problem that needs to be solved is represented by P, while l(p) is the program p length, and t(p) represents the time that is consumed by p to solve P. Transferring knowledge acquired from a single task with the abundance of labeled data to some other tasks with slight labeled data, the level of progression of performing its mission relies on how pertinent is the former task of big-scale image recognition to the current task [22]. In the situation of meta-learning LSTMs, with the epilogue of each task, the experience is gained and kept in the memory of the LSTM cell. To confirm the proposed model efficiency, this model is set to weigh with other commonly known methods of image classification. Our conducted experiment results are demonstrated in Table 5.
It is clear by the evidence that support vector machine (SVM) and random forest models notably have less accuracy than the other models. The justification of this lies in the emphatic feature extraction performed by our provided model. Also, our model adopts DL within its optimal optimization and parameter initialization. Figure 5 shows the diagram of training accuracy versus validation accuracy over the number of epochs. Within the loss plot, it is clear that the model holds a comparable efficiency on the training data and validation data. The confusion matrix for calculating the overall classifier accuracy is shown in Figure 6, which is evaluated using the 50 thousand images dataset. The acquired percentage is 96.7%.

CONCLUSION AND FUTURE WORK
The image classification process is a complex also time-effort draining operation, and cause extreme physical or mental fatigue; it needs further space and time to complete a single task, also additional effort to express and extract features from images or to create a neural network to classify images, especially when the size of the dataset is huge, single device capabilities will not be enough to comply with space or time exigencies to perform image classification. The parallel analysis of the three modules: the features extractor (G), the meta-learner LSTM (M), and map-reducer discriminator (D): all together demonstrates a robust, optimized neural network framework that has shown ameliorated training speed, and here we conclude that using potent image data analyzing framework, offers the ability to process big data with more precise classification results. The adopted framework acquired a classification precision of 96.7%. The subsequent phase for this work can be to add another layer of classification to our model, such as convolutional neural networks (CNNs), to acquire a higher classification accuracy.