A new model for iris data set classification based on linear support vector machine parameter's optimization

Computer Science Department, AL Salam University College, Iraq Department of Computer Science, College of Education, Al-Iraqia University, Iraq Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang, Malaysia Faculty of Computer Science & Information Technology, Universiti Tun Hussein Onn Malaysia, Malaysia Department of Electrical and Computer Engineering, Universitas Ahmad Dahlan, Indonesia


INTRODUCTION
Classification is a form of data analysis used to build a classifier that assigns data to important classes. Such classifiers predict categorical (discrete, unordered) class labels [1]. Classification is also an important field in data mining and machine learning: it infers the unknown classes of samples by learning from samples whose classes are known [2][3]. For example, once a classification model has been constructed, a bank loan application can be classified as safe or risky. This kind of analysis gives a better understanding of the data at large scale. Many classification approaches have been proposed in machine learning, pattern recognition, and statistics. Classification is a two-step process. In the first step, a classifier is constructed from previous data. In the second step, the accuracy of the model is assessed and, if it is acceptable, the model is used to classify new data [4]. The support vector machine (SVM) classifier is a well-known classification method used to predict the outcomes of datasets [5]. The proposed model was assessed on the Iris dataset obtained from the UCI Machine Learning Repository [6].
Building an SVM model with high prediction accuracy and consistency depends on finding the ideal SVM parameters, which play an essential role. Poor classification performance results from inappropriate parameter settings, while the best classification accuracy of an SVM stems from finding optimal parameters.
a. The authors proposed a method that optimizes the SVM parameters effectively and reduces optimization time and computational cost using two nested real-valued genetic algorithms (NRGA). The NRGA was compared with conventional optimization mechanisms, which search for all the parameters together [7].
b. A method was proposed in [8] for determining SVM parameters based on ideas from the design of experiments. It starts with a very coarse grid covering the whole search range and iteratively refines both the grid resolution and the search boundaries, keeping the number of samples at each iteration roughly constant.
c. The genetic algorithm (GA) tends to be quite good at finding generally good global solutions.
The GA has been widely adopted for parameter setting. In [9], a GA-based method was proposed to optimize the SVM parameters and the feature subset simultaneously. In [10], the GA is combined with the asymptotic behavior of the SVM, which guides the search toward the line of optimal generalization error in the hyperparameter space.
d. The study in [9] develops a novel approach, termed PSO+SVM, a particle swarm optimization (PSO) based approach for parameter determination and feature selection, and compares the results obtained with those of other approaches. PSO+SVM achieved better classification accuracy than the other approaches tested.

CLASSIFIERS
Classification is essential to data mining. A learning algorithm [11] builds a classifier from a given set of measurements, for instance a set of feature values (x1, x2, …, xn), where xi denotes the value of feature Xi. The purpose of classification is either to establish the existence of groups in a given set of observations (unsupervised learning) or, where the categories are already known, to assign a target to one of those categories (supervised learning) [12]. Supervised learning is used as the classification method in this study.

SVM
In this section, we focus on the SVM, a method for classifying both linear and nonlinear data. The SVM algorithm works as follows: a nonlinear mapping transforms the training data into a higher dimension, and within this new dimension the algorithm searches for the optimal linear separating hyperplane (i.e., a "decision boundary" separating the tuples of one class from another). With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane. The SVM finds this hyperplane using support vectors ("essential" training tuples) and margins (defined by the support vectors) [13,14].
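For reference, the description above corresponds to the standard soft-margin SVM formulation, sketched here in textbook notation. The symbols w, b, ξ_i, φ and the RBF kernel K are assumptions introduced for illustration and do not come from this paper; they are shown only to indicate where the parameters C and γ, tuned later by the GA, typically enter the model.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\left(w^{\top}\phi(x_i) + b\right) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,n

K(x, x') = \exp\!\left(-\gamma\,\lVert x - x'\rVert^{2}\right)
```

In this formulation C trades margin width against training errors, while γ appears only in kernels such as the RBF kernel, where it controls how quickly similarity decays with distance.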

Genetic algorithm (GA)
Genetic algorithms (GA) operate on a collection of candidate solutions called a population. Following the Darwinian principle of "survival of the fittest", the GA obtains the best solution after a series of iterative computations. The GA produces successive populations of alternative solutions, each represented by a chromosome (i.e., a solution to the problem), until acceptable results are obtained. The GA, a general adaptive optimization search methodology based on a direct analogy to Darwinian natural selection and genetics in biological systems, is a promising alternative to conventional heuristic methods. In this study, we use the GA primarily to tune the parameters (C and γ) of the SVM model for the Iris dataset [15,16]. A GA used as a wrapper method was combined with PCA as a filter method and tested with an SVM for leaf classification [16]. The results showed that combining the GA with the SVM made computing time efficient and improved accuracy. The GA has also been used to select important features and instances, which were then tested using SVM and k-nearest neighbors (KNN) [17][18][19]. Gain ratio (a filter) combined with a sequential forward selection (SFS) wrapper was proposed for three datasets: iris, breast, and dermatology [20,21]. Various feature selection methods have also been compared, namely information gain, gain ratio (GR), symmetrical uncertainty (SU), chi-square (CS), relief, and correlation-based feature selection (CFS) [19]. The results showed that CFS was the most stable, with the highest accuracy, for handling data with two classes.
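The GA loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the fitness function is a hypothetical smooth stand-in for the SVM accuracy, and the search bounds, tournament selection, arithmetic crossover, Gaussian mutation and elitism are illustrative choices that the paper does not specify.

```python
import random

# Hypothetical search bounds for the two SVM parameters (assumption).
C_BOUNDS = (0.1, 100.0)
GAMMA_BOUNDS = (0.001, 1.0)


def fitness(c, gamma):
    # Stand-in for "SVM accuracy at (C, gamma)"; a real run would train
    # and score an SVM here. This toy surface peaks at C=10, gamma=0.1.
    return 1.0 / (1.0 + (c - 10.0) ** 2 / 100.0 + (gamma - 0.1) ** 2 * 100.0)


def clip(value, bounds):
    return max(bounds[0], min(bounds[1], value))


def random_individual(rng):
    # A chromosome is simply the pair (C, gamma).
    return (rng.uniform(*C_BOUNDS), rng.uniform(*GAMMA_BOUNDS))


def crossover(a, b, rng):
    # Arithmetic crossover: a convex blend of the two parents.
    t = rng.random()
    return tuple(t * x + (1 - t) * y for x, y in zip(a, b))


def mutate(individual, rng, rate=0.2):
    # Gaussian perturbation of each gene, clipped back into bounds.
    c, gamma = individual
    if rng.random() < rate:
        c = clip(c + rng.gauss(0, 5.0), C_BOUNDS)
    if rng.random() < rate:
        gamma = clip(gamma + rng.gauss(0, 0.05), GAMMA_BOUNDS)
    return (c, gamma)


def tournament(population, rng, k=3):
    # Pick the fittest of k randomly chosen individuals.
    return max(rng.sample(population, k), key=lambda ind: fitness(*ind))


def run_ga(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    initial_best = max(population, key=lambda ind: fitness(*ind))
    for _ in range(generations):
        # Elitism: the current best chromosome always survives.
        children = [max(population, key=lambda ind: fitness(*ind))]
        while len(children) < pop_size:
            child = crossover(tournament(population, rng),
                              tournament(population, rng), rng)
            children.append(mutate(child, rng))
        population = children
    best = max(population, key=lambda ind: fitness(*ind))
    return initial_best, best


initial_best, best = run_ga()
print("best (C, gamma):", best, "fitness:", fitness(*best))
```

Because the best chromosome is carried over unchanged in every generation, the best fitness can never decrease from one population to the next.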

METHOD
As mentioned above, an SVM classifier was built to classify the Iris dataset into its classes. The GA is used to optimize the SVM parameters (C, γ) in order to obtain the best accuracy [22]. The Iris dataset has four attributes; the principal component analysis (PCA) algorithm was applied to reduce these features (feature reduction), and only three features were retained. PCA is a mathematical procedure that converts a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components. It is a dimension-reduction tool that can be used to reduce a large set of variables to a small set that still contains most of the information in the large set [12,2].
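The PCA reduction described above can be summarized with the usual textbook equations. The notation (sample mean \bar{x}, covariance matrix S, eigenpairs (\lambda_j, v_j), projected sample z_i) is an assumption introduced here for illustration; only the original dimension of four and the reduced dimension k = 3 come from the text.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
S = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(x_i - \bar{x}\right)^{\top},
\qquad
S\,v_j = \lambda_j v_j,\quad \lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \lambda_4 \ge 0

z_i = V_k^{\top}\left(x_i - \bar{x}\right),
\qquad
V_k = \left[\,v_1\ \ v_2\ \ v_3\,\right],\quad k = 3
```

The fraction of variance retained by the three components is (\lambda_1 + \lambda_2 + \lambda_3) / (\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4).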
The technique presented in this study used the Iris dataset acquired from the UCI Machine Learning Repository. The dataset is multivariate: it describes the Iris plant type by four characteristics, namely sepal length, sepal width, petal length, and petal width, with values as presented in Figure 1. The dataset is composed of three classes with 50 cases each, for a total of 150 cases. The dataset was first processed by removing missing data values. The type of Iris plant is the predicted attribute in this dataset [5].

Figure 1. IRIS dataset
The steps of the new model proposed in this research for Iris dataset classification, based on parameter optimization of a linear support vector machine, are as follows:
Step-1: The Iris dataset in CSV format is read as the input.
Step-2: Divide the data into training and test datasets. In this study, the dataset was partitioned into 70% training and 30% testing.
Step-3: Separate the training dataset by class value, that is, 1, 2 and 3.
Step-4: Determine the mean and standard deviation of the data for each class value.
Step-5: Pass the SVM parameters (C and γ) as input to the genetic algorithm optimization.
Step-6: Apply the optimal values of the parameters (C and γ) as initial values for the SVM classification process.
Step-7: Use the model to generate predictions.
Step-8: Determine the prediction accuracy by comparing the predictions with the class labels of the test dataset.
The accuracy is expressed as a percentage between 0 and 100%.
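The data-handling steps in the list above (Step-2, Step-3, Step-4 and Step-8) can be sketched with the Python standard library alone. This is a hedged illustration, not the authors' code: the function names, the fixed random seed, the use of the population standard deviation, the per-feature statistics and the tiny hand-written rows are all assumptions, and the SVM and GA steps (Step-5 to Step-7) are deliberately left out.

```python
import random
import statistics
from collections import defaultdict


def train_test_split(rows, train_fraction=0.7, seed=0):
    # Step-2: shuffle a copy of the rows and cut it 70% / 30%.
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def group_by_class(rows):
    # Step-3: collect the feature tuples under their class value.
    groups = defaultdict(list)
    for features, label in rows:
        groups[label].append(features)
    return dict(groups)


def class_statistics(groups):
    # Step-4: (mean, standard deviation) per feature within each class.
    # pstdev is used so that a class with one sample does not raise.
    stats = {}
    for label, samples in groups.items():
        columns = list(zip(*samples))
        stats[label] = [(statistics.mean(col), statistics.pstdev(col))
                        for col in columns]
    return stats


def accuracy(true_labels, predicted_labels):
    # Step-8: percentage of predictions that match the true labels.
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Hypothetical rows in the form ((four measurements), class value).
rows = [
    ((5.1, 3.5, 1.4, 0.2), 1), ((4.9, 3.0, 1.4, 0.2), 1),
    ((4.7, 3.2, 1.3, 0.2), 1), ((7.0, 3.2, 4.7, 1.4), 2),
    ((6.4, 3.2, 4.5, 1.5), 2), ((6.9, 3.1, 4.9, 1.5), 2),
    ((6.3, 3.3, 6.0, 2.5), 3), ((5.8, 2.7, 5.1, 1.9), 3),
    ((7.1, 3.0, 5.9, 2.1), 3), ((6.5, 3.0, 5.8, 2.2), 3),
]

train, test = train_test_split(rows)
stats = class_statistics(group_by_class(train))
for label in sorted(stats):
    print(label, stats[label])
```

In a full pipeline, the predictions passed to `accuracy` would come from the SVM trained with the GA-selected parameters.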

RESULTS AND CORRELATIONS
The model presented in Section 4 was run on the Iris dataset with and without Step-5. In each run, the results were evaluated based on the accuracy of the SVM classifier. The results showed that the accuracy of the SVM increased to 98.7% using Step-5, compared with about 95.3% without Step-5. All the results, with the optimization, are presented in Figures 2, 3

CONCLUSIONS AND RECOMMENDATION
This paper has proposed a new model for classifying the Iris dataset using an SVM classifier and a genetic algorithm; in addition, the PCA algorithm was used for feature reduction. The proposed model optimizes the C and γ parameters of a linear SVM. As shown above, the accuracy obtained by applying the GA to the Iris dataset is 98.7, and without the GA it is 97.78. The GA was used to optimize the SVM parameters (C, γ): to build an effective SVM model with high accuracy and stability, the search for optimal parameters plays a crucial role, and inadvisable parameter settings result in inferior classification performance. For future work, this study can be extended in two directions: first, by improving the performance of the GA, for example by hybridizing the GA with other methods as done in [22][23][24]; and second, by applying a feature selection method in the SVM for optimal parameter setting, as proposed in [25].


ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 10, No. 1, February 2020: 1079-1084
from inappropriate parameter settings, while the best classification accuracy of an SVM stems from finding optimal parameters. a. The authors proposed a method that optimizes the SVM parameters effectively and reduces optimization time and computational cost using two nested real-valued genetic algorithms (NRGA).
Int J Elec & Comp Eng ISSN: 2088-8708  A new model for iris data set classification based on linear support vector machine … (Zahraa Faiz Hussain)