Iterative improved learning algorithm for petrographic image classification accuracy enhancement

ABSTRACT


INTRODUCTION
Image processing and pattern classification techniques have been applied, as interdisciplinary tools, to rock type identification. Most of this work has been reported on hand specimens [1], [2]. Studies of rock microstructures using texture analysis-based approaches have been rare, mostly because of the stochastic nature of the textures formed by minerals.
This paper reports the classification of microstructure images of Igneous rocks into the Volcanic and Plutonic subclasses. Volcanic Igneous rocks form by the rapid cooling of lava on the Earth's surface, and the fast cooling prevents mineral grains from growing large. Plutonic rocks form beneath the Earth's surface through the slow cooling of magma, so their minerals tend to grow larger [3].
The basis for classification is mineral grain size. The textures are analyzed using Haralick features [4] and a Laws' mask-based approach [5], and Support Vector Machines (SVM) are used for classification. SVM-based classification has been applied successfully in areas ranging from medicine, such as MRI brain image classification [6], to archaeology, such as monument classification [7]. For an SVM, proper tuning of its controlling parameters is important, and Genetic Algorithm (GA)-based approaches have been used to tune them while avoiding entrapment in a local minimum [8], [9]. Rather than following such complex heuristic search approaches, this work uses a Radial Basis Function Support Vector Machine (RBFSVM) classifier whose controlling parameters, C and Sigma, are optimized by Grid search [10]. In addition, a classifier combination approach based on the AdaBoost algorithm is used. Boosting integrates weak classifiers; it is well suited to small, high-dimensional datasets, is easy to program, and is highly adaptable [11].
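The Grid search over C and Sigma can be sketched as an exhaustive loop over a two-dimensional grid. This minimal sketch assumes the common Gaussian RBF parameterization K(x, y) = exp(-||x - y||^2 / (2σ^2)). The `evaluate` callback, which would train the RBF SVM and return a validation accuracy, is hypothetical, and so are the grid ranges `C_GRID` and `SIGMA_GRID`; the paper does not state the values it searched.

```python
import math
from itertools import product


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel: K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def grid_search(evaluate, c_values, sigma_values):
    """Try every (C, sigma) pair and return the best pair and its score.

    `evaluate(C, sigma)` is a hypothetical callback that would train an RBF SVM
    with these parameters and return a validation accuracy (higher is better).
    """
    best_params, best_score = None, -math.inf
    for c, sigma in product(c_values, sigma_values):
        score = evaluate(c, sigma)
        if score > best_score:  # strict ">" keeps the first pair on ties
            best_params, best_score = (c, sigma), score
    return best_params, best_score


# Hypothetical log-spaced grids: C and sigma act multiplicatively, so powers
# of two cover several orders of magnitude with few evaluations.
C_GRID = [2.0 ** k for k in range(-5, 16, 2)]
SIGMA_GRID = [2.0 ** k for k in range(-4, 5)]
```

For example, with a toy scoring function that peaks at C = 8 and σ = 1, `grid_search(lambda c, s: -abs(math.log2(c) - 3) - abs(math.log2(s)), C_GRID, SIGMA_GRID)` selects `(8.0, 1.0)`. Grid search is simple and deterministic, which is the trade-off the paragraph above makes against GA-based heuristics.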
The percentage classification accuracy is calculated for each algorithm on the chosen image database. Accuracy improves progressively from the conventional RBFSVM approach to the more elaborate AdaBoost approach. Adding the correctly classified images to the training image set ('Improved Learning') improves % Accuracy further.
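The AdaBoost stage can be sketched as discrete AdaBoost, which reweights the training samples each round so that later weak learners concentrate on earlier mistakes. The paper uses the optimized RBF SVM as the base classifier; to keep this sketch short and dependency-free, a one-feature decision stump is swapped in as the weak learner instead. Labels are assumed to be +1 and -1 (for example, Volcanic and Plutonic). All function names are hypothetical.

```python
import math


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                # The stump predicts +polarity when feature j >= thr, else -polarity.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[j] >= thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost with decision stumps (stand-in for the SVM base learner)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:
            break  # no better than chance: boosting cannot help
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        if stump[0] == 0:
            break  # training data already separated perfectly
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Swapping the stump for an SVM would only change `train_stump` and `stump_predict`; the reweighting and the weighted vote are the parts that make the combination "boosting".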
The remainder of the paper is organized as follows. Section 2 describes the database of images used. Section 3 describes the Haralick features and Laws' masks and presents the selection of the feature pair. Section 4 details the concept of Improved Learning (IL). Section 5 describes the implementation using a flow chart and outlines the execution of the algorithms. Section 6 presents the results obtained. Finally, Section 7 consolidates the conclusions drawn from the IL-based approach.

DATABASE OF IMAGES USED
In this study, microstructure samples of locally relevant Igneous rocks were used. Images belonging to the Basalt, Andesite, Spherulite, Pseudotachylite, Pegmatite, Dolerite, and Rhyolite subfamilies were chosen. These varieties are abundant in the Western and Northern parts of the country, in areas such as Goa, Maharashtra, Gujarat, Rajasthan, the Aravalli Mountains, and Kumaun [12], [13], [14], [15], [16]. Structural analysis of these subfamilies is vital for construction and irrigation purposes. A total of 128 images, 64 each from the Volcanic and Plutonic categories, with a fair representation of each subfamily mentioned above, was used.

FEATURE SELECTION
Haralick features and Laws' masks were used to analyze the stochastic textures that arise from diverse mineral combinations at the microstructure level. The Haralick features considered were:
a. Contrast
b. Energy
c. Entropy
d. Homogeneity
e. Correlation
The Laws' features considered were:
a. Absolute Mean
b. Standard Deviation
To improve the likelihood of generalization, feature selection generates a small set of features, usually drawn from the original input variables. Redundant or meaningless features are discarded, which yields higher generalization performance and faster classification than the initial feature set.
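The seven texture features can be sketched in plain Python. The Haralick part builds a normalised, symmetric grey-level co-occurrence matrix (GLCM) for one pixel offset and computes the five statistics from it, assuming the image has already been quantised to `levels` grey levels. The Laws' part filters the image with a 5x5 mask and returns the absolute mean and standard deviation of the response. The paper does not state the pixel offset, the number of grey levels, or which Laws' mask was used; the horizontal offset and the E5L5 mask here are assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter


def glcm(image, dx=1, dy=0, levels=8):
    """Normalised symmetric GLCM for offset (dx, dy); image holds ints in [0, levels)."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # symmetric: count the pair both ways
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]


def haralick_features(p):
    """Contrast, energy, entropy, homogeneity and correlation of a normalised GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),  # angular second moment
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        # Correlation is undefined for a constant image (zero variance).
        "correlation": cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else float("nan"),
    }


# Laws' 1-D kernels: L5 (level) and E5 (edge). The E5L5 mask is an assumption.
L5 = [1, 4, 6, 4, 1]
E5 = [-1, -2, 0, 2, 1]


def laws_features(image, col_kernel=E5, row_kernel=L5):
    """Absolute mean and standard deviation of a Laws'-mask-filtered image."""
    mask = [[a * b for b in row_kernel] for a in col_kernel]  # 5x5 outer product
    k = len(mask)
    # 'Valid' sliding-window filtering (no padding). Flipping this mask, as true
    # convolution would, only changes the sign, so |mean| and SD are unaffected.
    out = [sum(mask[u][v] * image[r + u][c + v] for u in range(k) for v in range(k))
           for r in range(len(image) - k + 1)
           for c in range(len(image[0]) - k + 1)]
    mean = sum(out) / len(out)
    abs_mean = sum(abs(x) for x in out) / len(out)
    std = math.sqrt(sum((x - mean) ** 2 for x in out) / len(out))
    return abs_mean, std
```

On the classic 4x4 example image `[[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]` with `levels=4`, the symmetric GLCM gives a contrast of 14/24. For a vertical ramp image, every E5L5 response equals 128, so the standard deviation is 0.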
The approach starts from an empty set, and features are added one at a time while the classification performance is checked with a suitable classifier at each step. This is called the forward selection approach [17].
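Forward selection as described above can be sketched as a greedy loop. The `score` callback is hypothetical; in this paper's setting it would train the RBF SVM on the chosen feature columns and return a quantity to maximise, such as accuracy or 1 - AFRR. Note that the paper itself begins by evaluating all pairs exhaustively; this generic sketch starts from the empty set as the paragraph describes.

```python
def forward_selection(n_features, score, max_size=None):
    """Greedy forward selection starting from the empty feature set.

    `score(subset)` is a hypothetical callback returning a value to maximise
    for a tuple of feature indices. Selection stops when no single additional
    feature improves the score, or when `max_size` features have been chosen.
    """
    selected, best = (), float("-inf")
    max_size = n_features if max_size is None else min(max_size, n_features)
    while len(selected) < max_size:
        candidates = [selected + (f,) for f in range(n_features) if f not in selected]
        top_score, top = max((score(s), s) for s in candidates)
        if top_score <= best:
            break  # adding any feature fails to improve the score
        selected, best = top, top_score
    return selected, best
```

The stopping rule mirrors the paper's reasoning: once a larger subset brings no improvement, the extra computation is not worth it.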
A Radial Basis Function Support Vector Machine classifier was used to evaluate each candidate feature set in the forward selection approach. Every possible pair of the seven features was formed, and its classification performance was evaluated.
For each trial, 32 Volcanic and 32 Plutonic rock images were selected at random using the MATLAB function randperm(n). One thousand such trials were carried out, and the Average False Rejection Ratio (AFRR) was calculated. The AFRR for all 21 duplets is shown in Figure 1. The best AFRR, 0.125, was obtained for Contrast and Absolute Mean (feature pair 1, 6). Triplets were therefore formed with pair (1, 6) as the base by adding a third feature, and their AFRR was checked. The performance of the triplets and quadruplets is shown in Figure 2. Because the additional features brought no improvement in AFRR while increasing computation time, the Contrast and Laws' Absolute Mean feature pair was selected for the improved learning-based experiments.
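The combinatorics and the trial averaging can be sketched as follows. The feature numbering follows the order in which the features are listed in this section, which is consistent with Contrast and Absolute Mean being pair (1, 6). The `evaluate_frr` callback is hypothetical, and treating the images not drawn for training as the test set is an assumption; `random.sample` plays the role of MATLAB's randperm.

```python
import random
from itertools import combinations

FEATURES = {1: "Contrast", 2: "Energy", 3: "Entropy", 4: "Homogeneity",
            5: "Correlation", 6: "Absolute Mean", 7: "Standard Deviation"}

# All 21 feature pairs ("duplets") from the seven features.
DUPLETS = list(combinations(sorted(FEATURES), 2))

# Triplets grown from the best pair (1, 6) by adding one remaining feature.
BASE = (1, 6)
TRIPLETS = [BASE + (f,) for f in sorted(FEATURES) if f not in BASE]


def average_frr(evaluate_frr, volcanic, plutonic, n_train=32, trials=1000, seed=0):
    """Average a false rejection ratio over random per-class training draws.

    Each trial randomly permutes each class (like MATLAB's randperm), takes the
    first n_train images as training data and the rest as test data, and calls
    the hypothetical `evaluate_frr(train, test)` to get that trial's FRR.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        v = rng.sample(volcanic, len(volcanic))  # random permutation
        p = rng.sample(plutonic, len(plutonic))
        train = (v[:n_train], p[:n_train])
        test = (v[n_train:], p[n_train:])
        total += evaluate_frr(train, test)
    return total / trials
```

With seven features there are C(7, 2) = 21 duplets, matching Figure 1, and growing the base pair (1, 6) yields five candidate triplets.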

IMPROVED LEARNING
Achieving the desired balance between stability and plasticity in a classifier is critical. A classifier requires plasticity to integrate new knowledge, but it also requires stability to prevent the loss of previous knowledge [18]. Support Vector Machines are stable classifiers that exhibit better classification performance than many other machine learning methods [19]. However, SVMs lack plasticity and are prone to catastrophic forgetting: when a classifier trained on one task is subsequently trained on another, it loses the ability to perform the first task.
Therefore, to benefit fully from the performance of the SVM classifier, an Improved Learning method should be applied that retains the SVM's stability while adding plasticity [20].
A classifier is eligible for improved learning if it satisfies the following criteria:
a. It should be able to learn additional information from the input data.
b. It should not require access to the original data used to train the existing classifier.
c. It should preserve previously acquired knowledge, i.e., it should not succumb to catastrophic forgetting [21].

IMPLEMENTATION
For training, a holdout approach was used [22]. A training set of 64 images, 32 Volcanic and 32 Plutonic, was randomly selected from the image database. In total, 10000 such training set combinations were tried, and the one giving the highest % Accuracy was retained. The algorithm is as follows:
a. Compute the percentage accuracy of the Radial Basis Function SVM, the optimized Radial Basis Function SVM, and the AdaBoost classifier with the SVM as the base classifier. Identify the misclassified images in each case.
b. Add the correctly classified images to the training set to improve its 'learning'.
c. Using the revised, improved training set, classify the remaining images, i.e., those misclassified in the earlier iteration.
Table 1 presents the results of the SVM algorithm, Table 2 those of the optimized SVM algorithm, and Table 3 those of the AdaBoost algorithm. The improvement in percentage accuracy for each algorithm is due to the Improved Learning approach. Figure 4 highlights the details.
Misclassification is attributed to the overlapping region shown in Figure 5. The images that remained misclassified even after applying the Improved Learning-based AdaBoost algorithm (Volcanic No. 13; Plutonic Nos. 29, 30, and 40) exhibited 'outlier' attributes: they show the feature attributes of the intermediate 'Hypabyssal' group [23] and fall in the overlapping region. These findings are therefore consistent with the geological properties of the rocks.
A summary of research approaches used for petrographic image analysis is given in Table 4. It shows that this paper reports the first use of Support Vector Machine-based methods and classifier combination methods for petrographic image classification. The reported accuracy is around 94%. Most previous authors have attempted to classify rocks into the Igneous, Metamorphic, and Sedimentary classes.
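The iterative loop in steps a to c can be sketched as follows, under stated assumptions. Ground-truth labels are assumed to be known for the non-training images, since "correctly classified" requires them. A nearest-centroid classifier is swapped in for the SVM so that the sketch runs without dependencies; the `fit` and `predict` hooks are where the RBF SVM, optimized SVM, or AdaBoost classifier would plug in. All names are hypothetical.

```python
def nearest_centroid_fit(X, y):
    """Stand-in classifier (not the paper's SVM): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, yi in zip(X, y) if yi == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def improved_learning(X_train, y_train, X_pool, y_pool,
                      fit=nearest_centroid_fit, predict=nearest_centroid_predict,
                      max_iter=10):
    """Iteratively add correctly classified pool images to the training set.

    Returns the indices of pool images that remain misclassified (candidate
    outliers) and the number of classification passes performed.
    """
    X_tr, y_tr = list(X_train), list(y_train)
    remaining = list(range(len(X_pool)))
    iterations = 0
    while remaining and iterations < max_iter:
        iterations += 1
        model = fit(X_tr, y_tr)  # step a: train and classify
        correct = [i for i in remaining if predict(model, X_pool[i]) == y_pool[i]]
        if not correct:
            break  # nothing new to learn from: stop iterating
        for i in correct:  # step b: grow the training set
            X_tr.append(X_pool[i])
            y_tr.append(y_pool[i])
        # Step c: only previously misclassified images are re-examined next pass.
        remaining = [i for i in remaining if i not in correct]
    return remaining, iterations
```

The indices left in `remaining` correspond to the images that stay misclassified despite iteration, which is how the approach flags candidate outliers such as the Hypabyssal-like images discussed above.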
The present work is distinctive in classifying igneous rocks into their subfamilies. The study was carried out on an Intel i3 processor running at 2.10 GHz using MATLAB R2012b. The Iterative Improved Learning Algorithm (IILA) program required a total of 26.2 seconds to execute and ran for 2 iterations.

CONCLUSION
It can be concluded that Igneous rocks can be classified into the Volcanic and Plutonic subfamilies using a Support Vector Machine-based approach and classifier combinations. By fine-tuning the penalty parameter and the kernel variance of the Radial Basis Function SVM, optimal performance can be extracted from the SVM classifier. Using the optimized SVM as the base (weak) classifier within Adaptive Boosting enhances classification accuracy, and the iterative Improved Learning approach improves it further. The iterative learning also identifies the images that remain misclassified despite iteration, and this information can be used to validate outliers.