A new feature extraction approach based on non linear source separation

A new feature extraction approach is proposed in this paper to improve classification performance on remotely sensed data. The proposed method is based on a primary sources subset (PSS) obtained by a nonlinear transform that provides a lower-dimensional space for land pattern recognition. First, the underlying sources are approximated using multilayer neural networks. Bayesian inference then updates the knowledge of the unknown sources and the model parameters using the information in the data. Next, a source dimension minimizing technique is adopted to provide a more efficient land cover description. A support vector machine (SVM) scheme is built on the extracted features. Experimental results on real multispectral imagery demonstrate that the proposed approach ensures efficient feature extraction using several descriptors for texture identification and multiscale analysis. In a pixel-based approach, the reduced PSS space improves the overall classification accuracy by 13% and reaches 82%. Using texture and multiresolution descriptors, the overall accuracy is 75.87% for the original observations, while with the reduced source space it reaches 81.67% when wavelet and Gabor transforms are used jointly and 86.67% when only the Gabor transform is used. Thus, the source space enhances the feature extraction process and allows more land use discrimination than the multispectral observations.


INTRODUCTION
Providing a suitable representation is a challenging but important task for remote sensing and earth observation applications. Various data processing fields of remote sensing imagery have received considerable attention due to the increasing development of multichannel sensors. Feature extraction for the classification of multispectral and remotely sensed data is among the principal approaches. Table 1 summarizes the feature extraction approaches for remotely sensed image classification.
Existing works are based on spectral and spatial metrics [1]-[4], handcrafted feature extraction [1], [5] and semantic deep feature extraction [6]-[8]. Combining various feature extraction models has also been investigated to improve the data representation efficiency [1], [4], [8]. In contrast to conventional methods that employ spectral or spatial data representations for land cover identification, a significant advantage of source separation is that it provides a new representation of the original data with less correlated components. The ability of source separation has not been investigated as a feature extraction approach. In this context, this work aims to study the contribution of source separation to feature extraction for land cover classification.
Table 1. Feature extraction approaches for land cover classification
Reference    Approach
[1]    Spatial metrics and texture measures for land cover object classification
[2]    Parcel geometrical attributes including shape, height, proximity to major roads, similarity to neighbors
[3]    Spectral indices (NDVI, MNDWI, NDBI)
[4]    Object-based feature extraction based on spatial and spectral statistics
[5]    High-dimensional feature vector combining focal texture statistics (median, mean, and standard deviation) and Gray Level Co-occurrence Matrix derived in different kernel sizes
[6]    Semantic feature extraction using different deep convolutional neural network models
[7]    High-level semantic feature extraction based on transfer learning and the Inception-ResNet-v2 model
[8]    Deep semantic feature extraction using different models (VGG-S, VGG-M, VGG-F, VGG-VD16, VGG-VD16)
[9]    Combining deep semantic features, spectral features and GLCM texture features
Vision-based technology has been widely used to detect objects of interest. For instance, in [10] the detection task is based on histogram equalization and morphological processing. The method aims to detect, classify and track road distresses. Its results include false alarms and require enhancement. To recognize hand motion, the method proposed in [11] searches for the region of interest and applies the standard particle filter for motion recognition. In this research, we aim to use source separation as a feature extraction method and to use the newly retrieved sources for the classification task. Vision-based techniques can be applied in the source space, and their results can be compared with those obtained from the initial representation. Moreover, although machine learning and deep learning have reduced the complexity of the classification task, source separation may provide a new initial space for the learning phase that can be more accurate than the initial observations.
Source separation aims to recover the hidden factors that generate the measured signals. In this framework, blind source separation (BSS) is of considerable importance for such datasets [12]. Considering linear mixtures, many authors have proposed BSS methods under different assumptions and simplifications. Linear mixture models have interested many researchers. They model the pixel as a linear mixture of the radiances reflected by the end-members. The source separation criterion is the key assumption of the resulting separation algorithm. Some methods are based on a misfit function called a contrast; other methods are based on higher-order statistics. Other types of methods are based on algebraic criteria for second-order statistics or on non-stationary source criteria. Another category of methods is based on source time-frequency diversity and sparsity. Principal component analysis (PCA) creates uncorrelated variables that maximize the variance. Independent component analysis (ICA) assumes that the new variables are non-Gaussian and statistically independent [13], [14]. However, the radiances emitted by the soil, which is a mixture of materials, undergo many reflections. The mixture is therefore nonlinear. Nonlinear BSS methods aim to determine the original sources [15]-[17]. The main contributions of this paper are:
− Using nonlinear source separation for feature extraction: to our knowledge, no existing approach has used source separation for feature extraction. The majority of works use spectral data or feature extraction based on the observation data.
− A quality-guided dimension reduction based on the classification efficiency. The primary sources are the most representative of the land cover categories.
In the next section, we detail the methodology for multispectral image feature extraction. The feature extraction method based on nonlinear source separation and a qualitative dimension reduction is described. We also present the feature vector generation and the SVM classifier. Experimental results are provided and discussed in section 3. The paper ends with a conclusion and future work.

RESEARCH METHOD
To build a feature extraction method for land patterns in remote sensing images, we propose the steps detailed in Figure 1. Remotely sensed signals may exhibit many manifolds due to the nonlinear mixture, so nonlinear source separation is needed to provide uncorrelated sources. We then reduce the number of sources using a qualitative criterion. After that, we study the contribution of the primary sources in a pixelwise classification and a patchwise classification. We generate the learning data set containing homogeneous regions for the selected land cover thematic classes. The patchwise classification is learning-based.

Feature extraction based on nonlinear separation
The aim of this stage is to provide an efficient and compact representation for pattern recognition. The process starts with a nonlinear separation that provides uncorrelated sources and approximates the nonlinear mixing phenomenon. The source set is then reduced to obtain a subset that efficiently represents the land cover. Pattern transformations then generate the feature vectors. The next parts detail the feature extraction processes.

Non linear blind source separation based on Bayesian inference
In our model, the separation is based on the nonlinear counterpart of PCA [16] to approximate the real mixture phenomenon. The mapping function contains a noise term and the sources follow Gaussian mixtures. The separation is performed by a multilayer perceptron (MLP) in which gradient descent reduces the mutual function [17], [18]. Bayesian inference estimates the mapping parameters from their priors. Equation (1) [19] represents the mapping function, where X(t) is a memoryless mixture of the statistically independent sources S(t). The estimated sources Y(t) are obtained by the separation model F as presented in (2).
The hidden layer activation function is the hyperbolic tangent [19], [20]. The unknown parameters are approximated iteratively using ensemble learning. The nonlinear mapping is presented in (3). The parameters, denoted $\theta$ [21], have normal distributions $N(\bar{\theta}, \check{\theta})$, where $\bar{\theta}$ is the mean and $\check{\theta}$ is the logarithm of the standard deviation. The posterior probability density functions (PDFs) of the unknown parameters are approximated [22] during learning by minimizing the Kullback-Leibler (KL) divergence between the current PDF $p(\theta|X)$ and its approximation $q(\theta|X)$. The factorial form presented in (4) is obtained thanks to the a priori independence assumption. The distribution misfit, measured with the KL divergence, is given by (5) [23]. We add the term $-\log p(X) = -\log \int p(X, \theta)\, d\theta$ to avoid computing the model evidence term $p(X)$. The divergence measure then becomes as detailed in (6) [21]. The resulting mapping of the sources to the observations is presented in (7). Equations (8) and (9) detail the distributions of the unknown parameters. Once the sources are approximated, the reduction algorithm applied to the source set is detailed in the next subsection.
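The step from the KL misfit to the modified divergence measure can be sketched in the standard ensemble learning form. The notation below is an assumption built from the definitions in this subsection (the approximation $q(\theta|X)$, the posterior $p(\theta|X)$ and the evidence $p(X)$); the symbol $C_{KL}$ is introduced here for illustration and the original equations may be written differently:

```latex
\begin{align}
D(q \,\|\, p) &= \int q(\theta \mid X)\,
                 \log\frac{q(\theta \mid X)}{p(\theta \mid X)}\,d\theta, \\
C_{KL} &= D(q \,\|\, p) - \log p(X)
        = \int q(\theta \mid X)\,
          \log\frac{q(\theta \mid X)}{p(X, \theta)}\,d\theta .
\end{align}
```

The second equality uses $p(X, \theta) = p(\theta \mid X)\,p(X)$ and the fact that $q$ integrates to one. Since $D(q \,\|\, p) \ge 0$, this cost is an upper bound on $-\log p(X)$, and it only involves the joint density, which is why the evidence never has to be evaluated.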

Source space dimension reduction and exemplars base generation
Dimension reduction is an indispensable step for compact data representation, data visualization and classification preprocessing. It must preserve the required information present in the initial signals [23]. Many methods have been developed [24]-[27]. They include data projections and transformations [28]-[30]. PCA provides linear approximations of the data by encoding second-order dependencies and finding the directions of maximal variance. Kernel PCA computes the principal components in the space produced by a nonlinear mapping [31]. ICA searches for independent components by rotations. The locally linear embedding approach [32] addresses nonlinear dimensionality reduction. These methods belong to the feature extraction category, which projects the original data onto a lower-dimensional subspace. Another way to reduce the dimension is feature selection, which is based on finding a suitable subset of the original data.
In this work, we present a dimension reduction method that performs feature selection in the new representation space of the sources. The aim of the qualitative approach is to reduce the number of sources by identifying those that best describe the land covers. The motivation for reducing the source space dimension is the misclassification of observations caused by band correlation, the heterogeneity of scene pixels and signal distortion by the atmosphere. Starting from the uncorrelated sources obtained by the nonlinear separation, we aim to identify the "primary" sources that ensure the best classification accuracy. An iterative process generates all source combinations. Then, for well-known land cover samples, a supervised classification is performed for every source combination. The combination with the best classification accuracy determines the PSS. This combination therefore gives the best description of the land covers. Sources that do not appear in the PSS form the secondary sources subset (SSS). They can be interpreted as hyper-information and can be used in further work. Compared to the observation space, the PSS space should give better classification accuracy and describe the land covers more efficiently, which is verified in the experiments. Consider the case of 4 observations. There are 4 obtained sources, given by Source Space={S1, S2, S3, S4} (10). The PSS is determined by (11), where C denotes the source combination set.
The corresponding dimension reduction is described in Algorithm 1. The notation Comp(Y, X), where Y ⊂ X, denotes the complement of the subset Y in the set X. The reduction algorithm is performed by nested loops. Accu(.) is a function that returns the classification accuracy of an image set through a supervised classification.
C is the combination set generated for i sources and providing j-combination sets. M is the number of sources. Accucurr is the current accuracy computed from the current source combination in the iterative process. The initialization considers all sources as primary, PSS={S1, S2, S3, S4}, and the maximum accuracy MaxAcc is the value of Accu(.) for this set. The iterative process generates every primary source combination and computes its current accuracy Accucurr with Accu(.). If Accucurr is greater than MaxAcc, MaxAcc is updated to the value of Accucurr, the current set is assigned to the PSS, and the SSS is the corresponding complement in the source set.
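The search described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Algorithm 1: the function name `select_primary_sources`, the callable `accuracy` (standing in for Accu(.)) and the toy score table are all hypothetical, and only the two values 0.67 and 0.82 echo figures reported later in the paper.

```python
from itertools import combinations


def select_primary_sources(sources, accuracy):
    """Exhaustive search for the source subset with the best accuracy.

    `sources` is a list of source labels; `accuracy` is a hypothetical
    callable that classifies reference samples with the given subset and
    returns the overall accuracy (the role played by Accu(.)).
    """
    pss = tuple(sources)                 # initialization: all sources are primary
    max_acc = accuracy(pss)
    for size in range(1, len(sources)):  # proper subsets of size 1 .. M-1
        for subset in combinations(sources, size):
            acc_curr = accuracy(subset)
            if acc_curr > max_acc:       # keep strict improvements only
                max_acc, pss = acc_curr, subset
    sss = tuple(s for s in sources if s not in pss)  # complement of the PSS
    return pss, sss, max_acc


# Toy accuracy table: every unlisted combination gets an arbitrary 0.5.
toy_scores = {("S1", "S2", "S3", "S4"): 0.67, ("S2", "S3"): 0.82}


def toy_accuracy(subset):
    return toy_scores.get(tuple(subset), 0.5)


pss, sss, best = select_primary_sources(["S1", "S2", "S3", "S4"], toy_accuracy)
print(pss, sss, best)
```

For M sources the search evaluates 2^M − 1 non-empty combinations (15 for M = 4), so an exhaustive scan is only practical when the number of sources is small.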

Classification method

Feature transformations
Image classification is a challenging problem in remote sensing applications such as agricultural research and flood and fire detection [33], [34]. Traditional classifiers are inefficient because of the correlation between observation channels. Moreover, the observations are mixtures of the original data. Thus, nonlinear separation as a pre-processing step is essential for the classification task. The approximated, decorrelated sources offer a better data representation than the original observations, and the classification process may take advantage of them. A deeper analysis of the obtained sources will reveal a close relation between the source representation and the land patterns. The PSS has a lower dimension, which reduces the classification complexity. In this work, we consider macro patterns that represent land categories such as urban areas, wetlands, parcels and lakes. The classification process is based on the new data representations. Several descriptors are used for pattern recognition. The proposed fusion model operates at the feature level, which can improve the reliability of pattern recognition [35]. In fact, feature fusion improves the classification of land categories by providing a multi-feature representation. In our work, the feature vector contains Gabor and Haar transformations [36]. Wavelets give a multi-description of regions at different scales and orientations, which makes it possible to represent the image singularities [37]. At each decomposition level, low-pass and high-pass filters decompose the image into low- and high-frequency subchannels [35], which provides both an image approximation and the identification of high and low frequencies. The wavelet basis functions are $\psi_{j,k}(t) = 2^{j/2}\,\psi(2^{j}t - k)$, where $\psi$ is the mother wavelet. For texture characterization, Gabor filters act as multichannel filters at different orientations u and scales v, as defined in (12) [38]. The frequently used values are v∊{0,1,2,3,4} and u∊{0,..,7}. For a given image I, the Gabor wavelet transform is given in (13), where * denotes the convolution operator [34].
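The two descriptors can be illustrated with a short, dependency-free sketch. It assumes one widely used parameterization of the Gabor wavelet, with wave vector magnitude k_max / f^v and orientation πu/8; the defaults `sigma`, `k_max`, `f` and the kernel size are assumptions, and the text does not allow confirming that (12) uses exactly this form. The helper names `gabor_kernel` and `haar_level` are hypothetical.

```python
import cmath
import math


def gabor_kernel(u, v, size=15, sigma=2 * math.pi,
                 k_max=math.pi / 2, f=math.sqrt(2)):
    """Complex Gabor kernel for orientation u (0..7) and scale v (0..4)."""
    k = k_max / f ** v                      # radial frequency of scale v
    phi = math.pi * u / 8                   # orientation angle
    kx, ky = k * math.cos(phi), k * math.sin(phi)
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            envelope = (k * k / sigma ** 2) * math.exp(
                -(k * k) * (x * x + y * y) / (2 * sigma ** 2))
            # plane-wave carrier minus a constant that removes the DC response
            carrier = cmath.exp(1j * (kx * x + ky * y)) - math.exp(-sigma ** 2 / 2)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def haar_level(image):
    """One level of the 2-D Haar transform of an even-sized image.

    Returns the approximation and the column, row and diagonal detail
    sub-bands, using the orthonormal scaling.
    """
    approx, col_diff, row_diff, diag = [], [], [], []
    for r in range(0, len(image), 2):
        rows = ([], [], [], [])
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 2)  # low-pass in both directions
            rows[1].append((a - b + d - e) / 2)  # difference between columns
            rows[2].append((a + b - d - e) / 2)  # difference between rows
            rows[3].append((a - b - d + e) / 2)  # diagonal difference
        for band, row in zip((approx, col_diff, row_diff, diag), rows):
            band.append(row)
    return approx, col_diff, row_diff, diag


# Filter bank over the 8 orientations and 5 scales quoted in the text.
bank = {(u, v): gabor_kernel(u, v) for v in range(5) for u in range(8)}
print(len(bank))
```

A feature vector for a patch would then typically collect statistics (for example the mean magnitude) of the patch convolved with each kernel, together with sub-band statistics from `haar_level`; which statistics the paper uses is not stated.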

The SVM classifier
Recent works have concentrated on developing machine learning methods for classification and recognition tasks. The support vector machine (SVM) method is used in this field and has proved its effectiveness by giving better classification accuracy than classic classifiers. SVMs are widely used in land use and land cover classification and have the advantage of being particularly efficient in high-dimensional feature spaces. The classifier generates a decision function defined by a set of learning instances [39]. A subset of these vectors, called support vectors (SV), determines the separating hyperplane. The support vectors lie on two hyperplanes that are parallel to the optimal hyperplane defined by w.v+b=0. w and b are constrained by |w.v+b| = 1. Maximizing the margin between the hyperplanes leads to a constrained optimization problem as presented in (14) [34], with $\sum_{i=1}^{n} \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$, where $y_i$ is the class label of the learning instance $v_i$, n is the number of learning instances and C is a penalty factor. The optimal solution α* determines the support vector set SV.
The optimal hyperplane is given by $w^* = \sum_{i=1}^{n} \alpha_i^* y_i v_i$ and the associated bias $b^*$. When linear separability is impossible, the data are projected into a feature space of higher dimension. This transform to a higher-dimensional Hilbert space H can be linear or nonlinear. The kernel function K defined by (15) [37] is based on the mapping function Φ. There is no need to make the mapping function Φ explicit. The radial basis function (RBF), presented in (16), is efficient for nonlinear classification problems.
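To make the kernel decision rule concrete, the sketch below assumes the usual RBF form K(x, y) = exp(−g‖x − y‖²) and the usual kernel decision function sign(Σ αᵢ yᵢ K(vᵢ, v) + b). The function names and all numeric values are hypothetical; in particular the multipliers are hand-picked to satisfy Σ αᵢ yᵢ = 0 and do not come from a trained model.

```python
import math


def rbf_kernel(x, y, g):
    """K(x, y) = exp(-g * ||x - y||^2), with kernel parameter g."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * sq_dist)


def decision(v, support_vectors, labels, alphas, b, g):
    """Return +1 or -1 from the sign of sum_i alpha_i * y_i * K(sv_i, v) + b."""
    score = sum(a * y * rbf_kernel(sv, v, g)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1


# Hand-picked toy values: two support vectors with opposite labels.
svs = [(0.0, 0.0), (1.0, 1.0)]
ys = [-1, 1]
alphas = [1.0, 1.0]          # satisfies sum(alpha_i * y_i) == 0

print(decision((0.9, 0.8), svs, ys, alphas, b=0.0, g=0.125))
print(decision((0.1, 0.2), svs, ys, alphas, b=0.0, g=0.125))
```

The mapping Φ never appears in the code: only kernel evaluations between the query vector and the support vectors are needed, which is the point of the kernel formulation.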

RESULTS AND DISCUSSION
This part aims to show the contribution of the proposed method to multispectral image feature extraction. The nonlinear separation is performed on simulated and real observations. The results of the proposed separation approach are compared to linear approaches. The proposed feature extraction method is evaluated in a supervised classification application. Then, the PSS learning data set is presented and classified for different pattern transforms. The classification is performed on high resolution visible and infrared (HRVIR) observations. We use a learning approach for pattern classification. The selected zone, located in the north of Tunisia, presents an active scene that changes continuously due to its particular geographic situation. The image in Figure 2(a) shows various land covers including scattered vegetation, lake, wetland, cropland, lake border and mountains. The scene is a SPOT-4 image with four channels. The spatial resolution is 20 × 20 m. Images have 256 gray levels. The observation images are spatially correlated. The channel correlation presented in Figures 2(b) and 2(c) shows that most data samples lie on the diagonal. Correlated data will misfit the classification separability model and impact the land cover recognition process.

Source separation for simulated mixture
In this section, we study the proposed separation method on simulated mixing models. The initial sources and the joint distribution are presented in Figure 3 (17). The observations are presented in Figure 3(b) (see Appendix). The sources separated by the JADE, SOBI and proposed separation algorithms are shown in Figure 3(c), Figure 3(d) and Figure 3(e), respectively (see Appendix). The correlations of the JADE sources with the initial sources are 0.77 and 0.99, and the mutual correlation of the sources is 0.69. For the SOBI separation, the correlations with the initial sources are 0.79 and 0.76, and the mutual correlation is 0.13. For the nonlinear separation, the correlations with the initial sources are 0.99 and 0.98 and the mutual correlation is 0. Decorrelation of the nonlinear sources enhances the classification accuracy. Regarding the scatter plots, the sources estimated by the nonlinear approach match the initial sources more closely than those of the linear approaches, as shown in Figure 3(c) (see Appendix).
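The correlation figures quoted above can be reproduced for any pair of signals with a Pearson coefficient. The sketch below is illustrative only: the helper name `pearson` and the toy signals are hypothetical, and taking the absolute value is an assumption motivated by the fact that blind separation generally recovers sources only up to sign and scale.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy data: the estimate is a rescaled, sign-flipped, slightly perturbed copy.
initial = [0.0, 1.0, 2.0, 3.0, 4.0]
estimate = [-0.1, -2.0, -4.2, -5.9, -8.0]

print(abs(pearson(initial, estimate)))
```

The same function applied to two estimated sources gives the mutual correlation, which should be close to zero for a good separation.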

Source separation method for real observations
The observations presented in Figure 4 are correlated; the correlation is 0.9 for Band 1 and Band 2 and 0.87 for Band 3 and Band 4. The approximated sources are presented in Figure 5. The obtained correlation is 0.03. Source separation therefore improves the space representation by lowering the correlation, which will enhance the feature extraction process. Visually, we notice distinguishable classes in the sources that represent contrasted regions such as the lake (in source 1), the urban zone (in source 2) and the wetland (in source 3). The cost function presented in Figure 6 decreases during the separation process and becomes constant after 170 iterations. The classification results are obtained with the minimum distance algorithm. Figure 7 presents the classification results for the land categories lake (in blue), agricultural area (in green), scattered vegetation (in yellow) and wetland (in cyan). The band classification illustrated in Figure 7(a) gives an accuracy of 69%. The source classification illustrated in Figure 7(b) gives an accuracy of 67%. For the dimension reduction method, the best source combination is {S2, S3}, with an accuracy of 82%. Using all sources for the classification has improved the classification result compared to the band classification. Moreover, using only the PSS improves the classification precision and avoids many manifolds and misclassified pixels. Major ground cover classes such as wetland, lake, mountains and parcels are well identified in the PSS space, as presented in Figure 8.
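A minimum distance classifier of the kind used for these pixelwise results can be sketched as follows. The Euclidean distance to the class means is an assumption (the paper does not state the metric), and the function names and the two-source pixel values are hypothetical.

```python
def class_means(samples, labels):
    """Mean vector of each class computed from labelled training pixels."""
    sums, counts = {}, {}
    for vec, lab in zip(samples, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def classify(pixel, means):
    """Assign the pixel to the class whose mean is nearest (Euclidean)."""
    def sq_dist(mean):
        return sum((p - m) ** 2 for p, m in zip(pixel, mean))
    return min(means, key=lambda lab: sq_dist(means[lab]))


def overall_accuracy(samples, labels, means):
    """Fraction of samples whose predicted class matches the reference label."""
    hits = sum(classify(s, means) == lab for s, lab in zip(samples, labels))
    return hits / len(samples)


# Toy training pixels described by two source values each.
train = [(10, 200), (12, 190), (90, 40), (95, 35)]
train_labels = ["lake", "lake", "wetland", "wetland"]
means = class_means(train, train_labels)

print(classify((15, 180), means))
```

A function such as `overall_accuracy`, evaluated on reference pixels restricted to a given source subset, is one way the accuracy of each source combination could be obtained.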

Robustness to noise experiments
In this section, we aim to compare the proposed source separation method with two existing separation algorithms: SOBI and JADE. In this experiment, the observations are contaminated with white Gaussian noise at different noise levels. The graph in Figure 9 shows the evolution of the correlation coefficients of the sources recovered from the noisy observations. For every noise level between 0 and 30 dB, the sources obtained by the proposed method are better correlated with the initial sources than the SOBI and JADE sources. Therefore, the proposed approach is robust to noisy observations. Having validated the source separation and source reduction methods, we demonstrate their contribution to improving the learning-based classification in the next paragraph.
Figure 9. Correlation coefficient over source separation approaches
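The noise contamination step can be sketched as below, under the assumption that the quoted noise levels are signal-to-noise ratios expressed in dB; the paper does not say so explicitly. The function name, the sinusoidal test signal and the fixed random seed are hypothetical choices made for reproducibility.

```python
import math
import random


def add_white_gaussian_noise(signal, snr_db, rng=None):
    """Return the signal plus zero-mean Gaussian noise at the requested SNR (dB)."""
    if rng is None:
        rng = random.Random(0)              # fixed seed for repeatable runs
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


clean = [math.sin(0.1 * t) for t in range(1000)]
noisy = add_white_gaussian_noise(clean, snr_db=10)

# Empirical SNR of the generated observation, for checking.
residual = [n - c for n, c in zip(noisy, clean)]
measured = 10 * math.log10(
    sum(c * c for c in clean) / sum(r * r for r in residual))
print(round(measured, 1))
```

Repeating the separation on such observations for each noise level, and correlating the recovered sources with the initial ones, gives one curve per algorithm of the kind plotted in Figure 9.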

Learning data set
We generate, in this part, learning patches representing the main land cover categories of the study area. The images are shown in Figure 10. Patches are 32x32 pixels. We use the PSS for the feature extraction.
Figure 10. Learning data set samples: lake, wetland, agriculture, urban zone

Classification results
The learning set contains 405 patches and the test set contains 105 patches. Features are scaled to [0, 1] to optimize the classification results. The SVM kernel used in this experiment is the RBF, which nonlinearly maps samples into a higher-dimensional space and can consequently handle the case where the labels depend nonlinearly on the features [40]. The classifier is parametrized by C and g, which represent respectively the error term and the kernel parameter. The best parameter values, 4 and 0.125 for C and g respectively, are determined by cross-validation. The correct identification rate is 81.67%. The band classification accuracy is 75.87%. Therefore, PSS features provide an efficient representation of the original data and allow a reliable classification rate. Moreover, the PSS space has fewer dimensions, which reduces the processing time and the error impact. In the next paragraph, we discuss the structure of the feature vectors using the experimental results.
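The scaling step can be sketched as a per-feature min-max transform. Fitting the ranges on the learning set and reusing them for the test set is an assumption (a common practice, not something the paper states), and the function names and feature values are hypothetical.

```python
def fit_min_max(rows):
    """Per-feature minimum and maximum over the learning set."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(rows, mins, maxs):
    """Map each feature to [0, 1] using the learning-set ranges."""
    scaled = []
    for row in rows:
        scaled.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0   # constant feature -> 0
            for x, lo, hi in zip(row, mins, maxs)
        ])
    return scaled


# Toy feature vectors with two features of very different ranges.
train = [[2.0, 100.0], [4.0, 300.0], [6.0, 200.0]]
test = [[5.0, 150.0]]

mins, maxs = fit_min_max(train)
print(scale(train, mins, maxs))
print(scale(test, mins, maxs))
```

Scaling keeps features with large numeric ranges from dominating the squared distances inside the RBF kernel, which matters when Gabor and Haar responses with different magnitudes are concatenated.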

Pattern transform impact on feature extraction
To show the impact of feature selection on the classification accuracy, we classify the PSS with Gabor features and Haar features in various ways. Figure 11 details the classification rate by class and the total identification rate for each feature extraction case. For the total classification rate, using Haar features gives an accuracy of 38.09%. Using Haar and Gabor features jointly achieves 81.67% pattern identification. Using only Gabor features gives the best accuracy, 86.67%. Comparing joint Gabor and Haar features with Gabor features alone in the PSS space, the overall classification accuracy is better in the second case. Lake and wetland are better recognized by the joint Gabor and Haar descriptors, while mountains, parcels and urban zones are better identified with Gabor features only. These outcomes can guide the choice of features: for instance, for study zones with a large presence of water, such as lakes and wetlands, we use the Gabor and Haar features; for land with little presence of water, we use only Gabor features. Existing works on land cover and land use classification reach an overall accuracy higher than 75% [1]-[4] using handcrafted features. Deep semantic feature extraction reaches an accuracy higher than 90% [5]-[9]. The advantage of our proposed methodology, in addition to providing accurate land cover classification, is that it provides an uncorrelated, reduced representation. The new reduced feature space could replace the original data in further applications including segmentation, object detection, multi-resolution representation and multi-date data analysis. These applications are specific to land cover and land use data processing and are currently of emerging interest.

CONCLUSION
Multispectral land cover classification constitutes the major goal of this work. However, land cover variability is one of the most important factors affecting the quality of remote sensing data; it makes the classification task challenging and calls for special modeling and analysis methods. In this context, we presented in this paper a new feature extraction method based on nonlinear separation and a new dimension reduction approach. The performance of the proposed approach was demonstrated by its discriminative capabilities and the data quality in the experimental classification results. The outcomes of this work are, firstly, the adoption of a nonlinear source transform approximated through neural networks to model the real and complex mixing phenomenon of multispectral data and provide decorrelated data. Secondly, the classification scheme based on the PSS is generated by pattern transforms using Gabor and Haar techniques applied to the remote sensing PSS data set to identify the main class labels. Using different descriptors for texture identification, the proposed approach ensures efficient feature extraction and gives better classification accuracy than classic classifiers. To benefit from the dimension reduction, the proposed approach could be applied to hyperspectral data, which may provide very interesting results.