Evaluation of a Multiple Regression Model for Noisy and Missing Data
Abstract
Standard data collection problems often assume noiseless data, whereas large organizations commonly face noisy and missing data, typically in data collected from individuals. Because noisy and missing data become a serious concern when data are collected at scale, filtering techniques for big data environments merit investigation. This paper presents a multiple regression model evaluated experimentally on big data and proposes an approximation for datasets with noisy and missing data. The root mean squared error (RMSE), together with the correlation coefficient (COEF), is analyzed to assess the accuracy of the estimators. Finally, results predicted with Massive Online Analysis (MOA) are compared with real data collected at a later time. The simulated predictions under noisy and missing data estimation are shown to be consistent with the real data. The deletion mechanism (DEL) performs best, with the lowest average percentage of error.
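As a rough illustration of the two accuracy measures named in the abstract, the sketch below computes RMSE and the Pearson correlation coefficient after a simple listwise deletion of incomplete records. It is not the authors' implementation: the function names (`listwise_delete`, `rmse`, `pearson`) and the sample values are hypothetical, and the deletion step is only one plausible reading of the DEL mechanism.

```python
import math


def listwise_delete(rows):
    """Drop every row containing a missing value (None or NaN).

    Assumption: this stands in for a deletion-style treatment of
    missing data; the paper's DEL mechanism may differ in detail.
    """
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    return [r for r in rows if not any(is_missing(v) for v in r)]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical (actual, predicted) pairs; two records are incomplete.
records = [(10.0, 9.5), (12.0, None), (14.0, 14.5), (float("nan"), 16.0), (18.0, 17.0)]

complete = listwise_delete(records)
actual = [r[0] for r in complete]
predicted = [r[1] for r in complete]

print("records kept:", len(complete))
print("RMSE:", rmse(actual, predicted))
print("COEF:", pearson(actual, predicted))
```

A lower RMSE and a correlation coefficient closer to 1 both indicate predictions that track the observed values more closely, which is how the two measures are used together to judge an estimator.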
Keywords
big data; classification; noisy and missing data; performance evaluation; regression model; root mean square error
DOI: http://doi.org/10.11591/ijece.v8i4.pp2220-2229
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).