Multivariate sample similarity measure for feature selection with a resemblance model

Tsehay Admassu Assegie, Ayodeji Olalekan Salau, Crescent Onyebuchi Omeje, Sepiribo Lucky Braide

Abstract


Feature selection improves the classification performance of machine learning models. It also identifies the important features and eliminates those with little significance. Furthermore, feature selection reduces the dimensionality of training and testing data points. This study proposes a feature selection method that uses a multivariate sample similarity measure. The method selects features with significant contributions using a machine-learning model. The multivariate sample similarity measure is evaluated using the University of California, Irvine heart disease dataset and compared with existing feature selection methods. The multivariate sample similarity measure is evaluated with metrics such as minimum subset selected, accuracy, F1-score, and area under the curve (AUC). The results show that the proposed method is able to diagnose chest pain, thallium scan, and major vessels scanned using X-rays with a high capability to distinguish between healthy and heart disease patients with a 99.6% accuracy.

Keywords


heart disease; multivariate similarity; random forest; University of California Irvine data repository; XGBoost;

Full Text:

PDF


DOI: http://doi.org/10.11591/ijece.v13i3.pp3359-3366

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).