Enhancing training performance for small models using data-centric approaches

Reda A. El-Khoribi, Eid Emary, Amr Essam Hassan

Abstract


In this work, we propose a system that improves the performance of classification models by applying data-centric principles. The system optimizes a dataset by removing poor-quality samples and generating high-quality synthetic data. We tested the system on several classification models and datasets, measuring performance with accuracy, precision, recall, and F1-score. The results showed significant improvements in classification performance, highlighting the effectiveness of this data-centric approach. Whether the system scales to large datasets remains an open question and a promising direction for future research. The approach could be valuable in critical areas such as healthcare, finance, and autonomous systems, where high-quality data is crucial. Future work could explore advanced data augmentation, adapt the system to other data types such as text and time series, and extend it to semi-supervised and unsupervised learning. Our findings emphasize the importance of data quality, which is often overlooked in favor of model architecture, in achieving better model performance. By advancing data-centric artificial intelligence (AI), this work offers a practical framework for researchers and practitioners to optimize datasets and improve machine learning systems.

Keywords


Computer vision; Data-centric; Deep learning; Generative adversarial network; Model-centric


DOI: http://doi.org/10.11591/ijece.v15i3.pp2951-2964

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES).