Ant System and Weighted Voting Method for Multiple Classifier Systems

ABSTRACT


INTRODUCTION
Classification is an important function in data mining. One of the main issues in performing classification is choosing a classifier that achieves good classification accuracy. A single classifier makes little use of the complementary information available from other classifiers, while a combination of multiple classifiers can exploit such additional information [1]. The goal of multiple classifier combination is to obtain a comprehensive result by combining the outputs of several individual classifiers [2]. A multiple classifier system consists of a set of classifiers, called the classifier ensemble, and a combination strategy for integrating the classifier outputs, called the combiner.
Multiple classifier combination has been widely used in many application domains such as speech recognition [3], human emotion recognition [4], video classification [5], face recognition [6], email classification [7], cancer classification [8], plant leaf identification [9], concept drift identification [10] and sukuk rating prediction [11]. Multiple classifier combination has been very useful in enhancing the performance of classification. However, there are two problems in developing multiple classifier combinations: constructing the classifier ensemble and constructing the combiner. There are no standard guidelines on how to construct a set of diverse and accurate classifiers or on how to combine the classifier outputs [12]. Most previous studies focus on classifier ensemble construction and apply a simple fixed combiner to combine the outputs [13]. This study addresses both problems, and the literature on feature set partitioning and the weighted voting combiner is reviewed.
There are several approaches to constructing a classifier ensemble. All such approaches attempt to generate diversity by creating classifiers that make errors on different patterns, so that they can be combined effectively. The diversity among the classifiers in an ensemble is deemed to be a key success factor when constructing a classifier ensemble. Theoretically and empirically, it has been shown that a good ensemble has both accuracy and diversity [14]. One of the approaches used to construct a classifier ensemble is the feature decomposition method, which manipulates the input features to construct a diverse classifier ensemble. This method decomposes the input features while training the classifier ensemble. Therefore, this method is appropriate for high-dimensional data sets [15].
One case of feature decomposition is feature set partitioning. The input features are randomly partitioned into several disjoint subsets. Consequently, each classifier is trained on a different subset. Feature set partitioning is appropriate for classification tasks containing a large number of features [16], [17]. However, it is difficult to determine how to form an optimal feature set partition so that the trained classifiers produce good performance.
Reviews of the set partitioning problem highlight that the ant system, which is a variant of ant colony optimization (ACO), is the most promising technique to be applied [18]. The ACO algorithm was introduced by Marco Dorigo in the early 1990s. This algorithm is inspired by the behavior of ants in finding the shortest path from the colony to the food: ants leave pheromone on the paths they travel, and the pheromone guides later ants toward shorter routes. The ant-based algorithm has shown better performance than other popular heuristics such as simulated annealing and genetic algorithms [19]. The ant system (AS) is the original and most widely used ant-based algorithm for solving optimization problems [20]. The ant system has also been used to solve the set partitioning problem, which is a difficult and very complicated combinatorial problem [21]. The ant system for the set partitioning problem has been applied in constructing a classifier ensemble [22].
The most popular, fundamental and straightforward combiner is majority voting [23]. Every individual classifier votes for one class label. The class label that appears most frequently in the outputs of the individual classifiers is the final output. To avoid ties, the number of classifiers taking part in the vote is usually odd. Majority voting is often used to combine multiple classifiers in order to solve classification problems [24]. Popular ensemble methods such as bagging, boosting and random forest have used majority voting to combine classifier outputs. The advantages of majority voting include simplicity and low computational cost. Majority voting can combine the outputs of classifiers regardless of which classifier is used. It is an optimal combiner in several ensemble methods [25]. However, the disadvantage of this combiner is that it does not consider the strength of each classifier [26].
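The majority voting rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name majority_vote and the tie-breaking rule (the tied label that appears first in the vote list wins) are assumptions made here, not part of the cited methods.

```python
from collections import Counter


def majority_vote(labels):
    """Return the class label that occurs most often among the votes.

    Assumption for this sketch: when several labels are tied, the tied
    label that appears first in `labels` is returned.
    """
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


# Three classifiers vote on one pattern; two of them agree.
votes = ["benign", "malignant", "benign"]
print(majority_vote(votes))
```

An odd number of classifiers rules out ties only when there are two classes; with more classes a tie-breaking rule such as the one assumed here is still needed.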
Weighted voting is a trainable version of majority voting which, unlike majority voting, assigns a weight to each classifier before voting. To make an overall prediction, a weighted vote of the classifier predictions is taken to predict the class. There are several ways to determine the weights of the classifiers [27]. The advantages of weighted voting include its flexibility and its potential to produce better performance than majority voting. This combiner has the potential to make multiple classifier combinations more robust to the choice of the number of individual classifiers [28]. In addition, weighted voting may be considered when the accuracies of the classifiers can be reliably estimated [29]. Several studies have applied weighted voting to real-world problems such as face and voice recognition [30] and financial distress prediction for listed companies [31]. Therefore, in this study weighted voting is adopted as a combiner that considers the performance of each classifier.

RESEARCH METHOD
There are three steps in the research work: (1) classifier ensemble construction; (2) combiner construction; and (3) evaluation. An effective multiple classifier system must address the first two steps, ensemble construction and combiner construction. The ant system feature set partitioning algorithm is applied to construct the classifier ensemble, while the weighted voting technique is applied as the combiner. Figure 1 shows the architecture of the proposed method, which consists of two components: the ant system feature set partitioning and the weighted voting combiner.

Classifier Ensemble Construction
The classifier ensemble is built with the feature set partitioning algorithm. A disjoint feature set partition is formed from the input feature set. An algorithm based on the ant system is developed to perform the feature set partitioning. The number of feature partitions is determined by the number of individual classifiers. The required inputs are the feature set and the category labels of the original data set. The input feature set is partitioned into different feature subsets and no feature in the training set is removed. Therefore, each individual classifier is trained on a different projection of the training set. The flowchart for feature decomposition is depicted in Figure 2.
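To make the idea of a pheromone-guided disjoint partition concrete, the following Python sketch shows a generic ant-system-style search over feature set partitions. It is not the algorithm of Figure 2: the function names, the pheromone layout tau[f][s], the parameters n_ants, n_iters and rho, and the scoring function toy_quality are all assumptions introduced for illustration. In practice the quality function would be something like the validation accuracy of the ensemble trained on the candidate partition.

```python
import random


def ant_system_partition(n_features, n_subsets, quality,
                         n_ants=10, n_iters=20, rho=0.1, seed=0):
    """Search for a disjoint feature partition with an ant-system-style loop.

    quality(partition) must return a score in [0, 1]; it is a
    user-supplied, hypothetical scoring function.
    """
    rng = random.Random(seed)
    # tau[f][s]: pheromone on assigning feature f to subset s
    tau = [[1.0] * n_subsets for _ in range(n_features)]
    best, best_score = None, float("-inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            # Each ant assigns every feature to exactly one subset,
            # with probability proportional to the pheromone.
            assign = [rng.choices(range(n_subsets), weights=tau[f])[0]
                      for f in range(n_features)]
            subsets = [[f for f in range(n_features) if assign[f] == s]
                       for s in range(n_subsets)]
            partition = [sub for sub in subsets if sub]  # drop empty subsets
            score = quality(partition)
            tours.append((assign, score))
            if score > best_score:
                best, best_score = partition, score
        # Ant system update: evaporate, then every ant deposits pheromone.
        for row in tau:
            for s in range(n_subsets):
                row[s] *= 1.0 - rho
        for assign, score in tours:
            for f, s in enumerate(assign):
                tau[f][s] += score
    return best, best_score


def project(X, subset):
    """Project each instance onto the features listed in subset."""
    return [[row[f] for f in subset] for row in X]


def toy_quality(partition, n_subsets=3):
    """Toy score that rewards balanced partitions with n_subsets parts."""
    sizes = [len(sub) for sub in partition]
    return (len(partition) / n_subsets) * (min(sizes) / max(sizes))


best, score = ant_system_partition(6, 3, toy_quality)
X = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
views = [project(X, sub) for sub in best]  # one training view per classifier
```

Every feature is assigned to exactly one subset, so the subsets are disjoint and no feature is removed; each projected view would then be used to train one individual classifier.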

Combiner Construction
In this construction stage, the weighted voting method is used as the combiner. A learning process for each classifier on different partitions of features is performed by the ant system algorithm. Weights are given according to the performance of each classifier. The performance of each classifier depends on the feature set partition. Therefore, the voting weights of each classifier are updated dynamically based on the feature set partition. The idea behind this approach is that classifiers trained on different feature set partitions will provide different accuracies even though only one type of classifier is used in the ensemble. Classifiers that provide a high accuracy are more likely to classify patterns correctly.
Let $D = \{D_1, \dots, D_L\}$ be a set of individual classifiers (or an ensemble of classifiers), where $L$ is the number of individual classifiers. Let $\Omega = \{\omega_1, \omega_2, \omega_3, \dots, \omega_C\}$ be a set of class labels, where $C$ is the number of classes. Let $Z = \{x_i, y_i\}$ be a training set (a labelled dataset), where $i = 1, \dots, N$, $N$ is the number of instances, $x_i \in \Re^n$ is the $n$-dimensional feature vector of the i-th instance and $y_i \in \{\omega_1, \dots, \omega_C\}$ is the class label of the i-th instance. Each classifier assigns an input feature vector to one of the predefined class labels, i.e., $D_j: \Re^n \to \Omega$. The output of a classifier ensemble is an $L$-dimensional class label vector $[D_1(x), \dots, D_L(x)]^T$. The task is to combine the $L$ individual classifier outputs to predict, from the set of possible class labels, the class label that best classifies the unknown pattern.
In formulating the weighted voting combiner, let us assume that only the class labels are available from the classifier outputs, and define the decision of the j-th classifier as $d_{j,k} \in \{0,1\}$, $j = 1, \dots, L$ and $k = 1, \dots, C$, where $L$ is the number of classifiers and $C$ is the number of classes. If the j-th classifier $D_j$ chooses class $\omega_k$, then $d_{j,k} = 1$, and 0 otherwise. The ensemble decision for the proposed weighted voting can be described as follows: choose class $\omega_{k^*}$ if
$$\sum_{j=1}^{L} w_j \, d_{j,k^*} = \max_{k=1,\dots,C} \sum_{j=1}^{L} w_j \, d_{j,k} \qquad (1)$$
where $w_j$ is the accuracy (or weight) of classifier $D_j$. The votes are multiplied by a weight before the actual voting. The weight is obtained by estimating the classification accuracy on a validation set.
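The weighted voting rule can be illustrated with a short Python sketch. The function name weighted_vote, the argument names and the example weights are assumptions made for illustration; the weights stand for classifier accuracies estimated on a validation set, as described above. Ties are broken in favour of the class listed first, which is also an assumption of this sketch.

```python
def weighted_vote(predictions, weights, classes):
    """Return the class with the largest sum of classifier weights.

    predictions[j] is the label chosen by classifier j and weights[j]
    is its weight (for example, its validation accuracy).
    """
    totals = {c: 0.0 for c in classes}
    for label, weight in zip(predictions, weights):
        totals[label] += weight  # each vote counts with its weight
    # max() keeps the first class in `classes` when totals are tied
    return max(classes, key=lambda c: totals[c])


# One accurate classifier outvotes two weaker ones that agree.
predictions = ["A", "B", "B"]
weights = [0.9, 0.4, 0.4]  # hypothetical validation accuracies
print(weighted_vote(predictions, weights, ["A", "B"]))
```

In this example plain majority voting would choose "B", while the weighted vote chooses "A" because the single vote for "A" carries more weight (0.9) than the two votes for "B" combined (0.8).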

Evaluation
In this step, the performance of multiple classifiers constructed by the proposed ant system and weighted voting (ASWV) method is measured and compared with several other ensemble methods. Experiments were conducted on nine benchmark datasets taken from the University of California, Irvine (UCI) repository. A k-Nearest Neighbour (k-NN) ensemble was used in the experiments. Table 1 shows a summary of the datasets used in the experiments. The k-fold cross-validation method was applied to obtain the classification accuracy [32]. A set of labeled samples is randomly partitioned into k disjoint folds of equal size. In each round, one of the k folds is selected as the testing set and the remaining (k-1) folds are used as the training set, with the assumption that there is at least one sample per class. The classification accuracy (acc) is the ratio of the number of correctly classified instances to the total number of instances, as shown in Equation 2.

$$acc = \frac{\text{number of correctly classified instances}}{\text{total number of instances}} \times 100\% \qquad (2)$$
Finally, the estimate of the classification accuracy is obtained by dividing the sum of all classification accuracies by the total number of folds or rounds, as shown in Equation 3.
$$\overline{acc} = \frac{1}{k} \sum_{i=1}^{k} acc_i \qquad (3)$$
where $acc_i$ is the classification accuracy of round i and k is the number of folds. A common choice for k-fold cross-validation is k=10. Extensive experiments have shown that ten is the best choice to get an accurate estimate [33]-[35]. To obtain robust performance estimates and comparisons, a large number of estimates is preferred. Therefore, in this research, the experiments consist of ten repetitions of 10-fold cross-validation.
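The evaluation protocol, repeated k-fold cross-validation with the accuracy averaged over the folds of each repetition, can be sketched in plain Python as follows. The names k_fold_indices, repeated_cv and fit_predict, the toy 1-nearest-neighbour classifier and the toy data are assumptions introduced for illustration and do not reproduce the experimental code. Folds are of equal size only when the number of instances is divisible by k.

```python
import random


def accuracy(y_true, y_pred):
    """Percentage of correctly classified instances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def k_fold_indices(n, k, rng):
    """Randomly split the indices 0..n-1 into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv(X, y, fit_predict, k=10, repeats=10, seed=0):
    """Return one mean accuracy per repetition of k-fold cross-validation."""
    rng = random.Random(seed)
    run_means = []
    for _ in range(repeats):
        folds = k_fold_indices(len(X), k, rng)
        accs = []
        for i, test in enumerate(folds):
            # every fold except fold i forms the training set
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            y_pred = fit_predict([X[j] for j in train],
                                 [y[j] for j in train],
                                 [X[j] for j in test])
            accs.append(accuracy([y[j] for j in test], y_pred))
        run_means.append(sum(accs) / k)  # mean accuracy over the k folds
    return run_means


def one_nn(X_train, y_train, X_test):
    """Toy 1-nearest-neighbour classifier (squared Euclidean distance)."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [y_train[min(range(len(X_train)),
                        key=lambda j: dist(X_train[j], x))]
            for x in X_test]


# Toy data: two well-separated one-dimensional clusters.
X = [[float(i)] for i in range(20)] + [[100.0 + i] for i in range(20)]
y = [0] * 20 + [1] * 20
run_means = repeated_cv(X, y, one_nn)
```

The list run_means holds ten estimates, one per repetition, from which an average and a standard deviation can be reported.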

RESULTS AND ANALYSIS
The ant system algorithm was used to partition the feature set and weighted voting was used to combine the classifier outputs. Experiments were carried out on nine data sets from the UCI repository. Ten runs of 10-fold cross-validation were carried out to validate the accuracy of the single k-NN and the constructed k-NN ensembles. Table 2 shows the average and standard deviation of the classification accuracies of the single k-NN, the k-NN ensembles constructed by the random subspace method and the k-NN ensembles constructed with ant system-based feature set partitioning. A small standard deviation was obtained for all methods, which indicates that the experiments were stable. The average accuracy of the k-NN ensembles constructed by the proposed method was compared with the average accuracies of the original single k-NN and of the k-NN ensembles constructed by the random subspace method. The proposed method provides better accuracy than both the single k-NN and the random subspace method, and improvements in accuracy are obtained on all datasets.
The proposed algorithm was successfully applied to form the feature set partition. Table 3 summarizes the results of implementing the proposed algorithm and presents the feature set partition and the number of classifiers. The accuracy of the proposed method was also compared with other common methods, as shown in Table 4: (1) single k-NN, (2) the method in [28], (3) improved k-NN classification using genetic algorithm (GA k-NN) [36], (4) simultaneous metaheuristic feature selection (SMFS) [37], (5) weighted k-NN ensemble method [27], (6) direct boosting algorithm [38], (7) cluster-oriented ensemble classifier (COEC) [39] and (8) evidential neural network [40]. The k-NN classifier was used as the base classifier.
Based on the results, the proposed method gives the best classification accuracy among the compared methods on the Haberman and breast cancer datasets. In general, the proposed method gives good classification results and is comparable with the other methods.

CONCLUSION
A new method based on the integration of the ant system and weighted voting for multiple classifier systems has been presented. The ant system was applied to optimize the feature set partition, while weighted voting was used as the combiner. Experimental results show that applying this method to combine several k-NNs as base classifiers outperforms the single k-NN and is comparable with other ensemble methods. The results indicate that the proposed method can be applied to generate better k-NN ensembles. Furthermore, this method can determine the number of combined classifiers based on the number of partitions formed.
Future research will apply this method to other classifiers such as the Support Vector Machine, Neural Network and Decision Tree. A dynamic feature partition-selection approach can be considered to enhance the performance of this method. The method will, hopefully, be able to partition the feature set into several lower-dimensional feature sets, which would allow a set of classifiers to process low-dimensional feature vectors simultaneously. Therefore, the ability of this method to handle high-dimensional data and small training sample problems can also be tested.