The Pertinent Single-Attribute-Based Classifier for Small Datasets Classification

Mona Mamdouh Jamjoom

Abstract


Classifying a dataset with machine learning algorithms can be a major challenge when the target is a small dataset. The OneR classifier suits such cases because of its simplicity and efficiency. In this paper, we reveal the power of a single attribute by introducing the pertinent single-attribute-based heterogeneity-ratio classifier (SAB-HR), which uses a pertinent attribute to classify small datasets. The SAB-HR applies a feature selection step that uses the Heterogeneity Ratio (H-Ratio) measure to identify the most homogeneous attribute in the feature set. Our empirical results on 12 benchmark datasets from the UCI Machine Learning Repository showed that the SAB-HR classifier significantly outperformed the classical OneR classifier on small datasets. In addition, using the H-Ratio as the criterion for selecting the single attribute was more effective than traditional criteria such as Information Gain (IG) and Gain Ratio (GR).

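The abstract does not give the exact H-Ratio formula, so the sketch below only illustrates the general idea: a OneR-style classifier built on a single nominal attribute, where the attribute is chosen by a per-value class-homogeneity score rather than by error rate, Information Gain, or Gain Ratio. The heterogeneity_ratio function, its formula, and the toy data are assumptions made for illustration, not the paper's definition of the H-Ratio.

```python
from collections import Counter, defaultdict

def heterogeneity_ratio(attribute_values, labels):
    """Illustrative homogeneity proxy (assumption, not the paper's H-Ratio):
    for each value of the attribute, take the fraction of instances that do
    NOT belong to that value's majority class, then average over values.
    Lower means the attribute partitions the data into purer class groups."""
    groups = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        groups[value].append(label)
    per_value = []
    for ys in groups.values():
        majority_count = Counter(ys).most_common(1)[0][1]
        per_value.append(1.0 - majority_count / len(ys))
    return sum(per_value) / len(per_value)

def fit_single_attribute_classifier(X, y):
    """X: rows of nominal attribute values, y: class labels.
    Pick the attribute with the lowest heterogeneity score, then build a
    OneR-style rule mapping each of its values to the majority class."""
    n_attrs = len(X[0])
    columns = [[row[a] for row in X] for a in range(n_attrs)]
    best_attr = min(range(n_attrs),
                    key=lambda a: heterogeneity_ratio(columns[a], y))
    groups = defaultdict(list)
    for value, label in zip(columns[best_attr], y):
        groups[value].append(label)
    rule = {value: Counter(ys).most_common(1)[0][0] for value, ys in groups.items()}
    default = Counter(y).most_common(1)[0][0]  # fallback for unseen values
    return best_attr, rule, default

def predict(model, row):
    attr, rule, default = model
    return rule.get(row[attr], default)

# Toy usage with hypothetical data
X = [["sunny", "hot"], ["sunny", "mild"], ["rain", "mild"], ["rain", "cool"]]
y = ["no", "no", "yes", "yes"]
model = fit_single_attribute_classifier(X, y)
print(predict(model, ["rain", "hot"]))  # -> "yes"
```

In this sketch a lower score means the attribute splits the training instances into purer class groups, which is the spirit of selecting the "most homogeneous" attribute described above; the resulting classifier then predicts from that single attribute exactly as OneR does.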

Keywords


single-attribute-based classifier; OneR classifier; small dataset; feature selection; Heterogeneity Ratio; classification





DOI: http://doi.org/10.11591/ijece.v10i3.pp%25p

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.