Analysis of big data from New York taxi trip 2023: revenue prediction using ordinary least squares solution and limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithms

Sara Rhouas, Norelislam El Hami

Abstract


This study explores the prediction of taxi trip fares using two linear regression methods: the normal equations (the ordinary least squares (OLS) solution) and the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm. Using a dataset of New York City yellow taxi trips from 2023, the analysis involves data cleaning, feature engineering, and model training. The dataset contains over 12 million records stored in Parquet format on the Hadoop Distributed File System (HDFS), and processing it efficiently required configuring the Spark driver and executor memory. Key variables, namely passenger count, trip distance, fare amount, and tip amount, were analyzed for correlation. Models were trained on an 80-20 train-test split, and their performance was evaluated using the root-mean-square error (RMSE) and the mean squared error (MSE). Results show that both methods achieve comparable accuracy, with slight differences in coefficients and training time. Additionally, vendor performance metrics, including total trips, average trip distance, fare amount, and tip amount, were analyzed to reveal trends and inform strategic fleet-management decisions. This analysis demonstrates the efficacy of linear regression techniques in predicting taxi fares and offers insights for optimizing taxi operations.
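The abstract contrasts two ways of fitting the same linear model. The normal equations give the OLS coefficients in closed form, β = (XᵀX)⁻¹Xᵀy, while L-BFGS reaches the same minimizer iteratively, approximating the inverse Hessian from only the last m update pairs. Below is a minimal, dependency-free Python sketch of both, run on hypothetical synthetic data (fare = 3.0 + 2.5 × distance + noise, not the 2023 NYC dataset), together with the 80-20 split and the MSE/RMSE metrics named above. The function names, memory size `m=5`, tolerances, and Armijo backtracking line search are illustrative assumptions, not the authors' implementation. At the paper's scale the fitting would run in Spark. Spark MLlib's `LinearRegression` exposes this same choice through its `solver` parameter (`"normal"` or `"l-bfgs"`), although the abstract does not say which API the authors used.

```python
import math
import random
from collections import deque

def predict(X, beta):
    # Linear model: y_hat = X @ beta (each row of X starts with 1.0 for the intercept).
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_normal_equations(X, y):
    """OLS closed form: solve (X^T X) beta = X^T y by Gaussian elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for k in range(col + 1, p):
            f = A[k][col] / A[col][col]
            for c in range(col, p):
                A[k][c] -= f * A[col][c]
            b[k] -= f * b[col]
    beta = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta

def loss_and_grad(X, y, beta):
    """Loss (1/2n)||X beta - y||^2 and its gradient (1/n) X^T (X beta - y)."""
    n = len(y)
    resid = [p - t for p, t in zip(predict(X, beta), y)]
    grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(len(beta))]
    return dot(resid, resid) / (2 * n), grad

def lbfgs(X, y, m=5, max_iter=200, tol=1e-10):
    """Minimize the same loss with L-BFGS: two-loop recursion + Armijo backtracking."""
    beta = [0.0] * len(X[0])
    loss, g = loss_and_grad(X, y, beta)
    s_hist, y_hist = deque(maxlen=m), deque(maxlen=m)  # last m pairs (s_k, y_k)
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        q, coeffs = list(g), []
        for s, yv in zip(reversed(s_hist), reversed(y_hist)):  # newest -> oldest
            rho = 1.0 / dot(yv, s)
            a = rho * dot(s, q)
            coeffs.append((rho, a))
            q = [qi - a * yi for qi, yi in zip(q, yv)]
        # Initial inverse-Hessian guess: gamma * I, scaled from the newest pair.
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1]) if s_hist else 1.0
        r = [gamma * qi for qi in q]
        for s, yv, (rho, a) in zip(s_hist, y_hist, reversed(coeffs)):  # oldest -> newest
            bc = rho * dot(yv, r)
            r = [ri + (a - bc) * si for ri, si in zip(r, s)]
        d = [-ri for ri in r]  # search direction -H_k g
        if dot(g, d) >= 0:  # safeguard: fall back to steepest descent
            d = [-gi for gi in g]
        gd, step = dot(g, d), 1.0
        while step > 1e-12:  # Armijo backtracking line search
            new_beta = [b + step * di for b, di in zip(beta, d)]
            new_loss, new_g = loss_and_grad(X, y, new_beta)
            if new_loss <= loss + 1e-4 * step * gd:
                break
            step *= 0.5
        else:
            break  # no acceptable step (e.g. floating-point limit reached); stop
        s = [nb - b for nb, b in zip(new_beta, beta)]
        yv = [ng - gi for ng, gi in zip(new_g, g)]
        if dot(s, yv) > 1e-10 * dot(yv, yv):  # keep only positive-curvature pairs
            s_hist.append(s)
            y_hist.append(yv)
        beta, loss, g = new_beta, new_loss, new_g
    return beta

if __name__ == "__main__":
    # Hypothetical toy data (NOT the NYC taxi data): fare = 3.0 + 2.5 * distance + noise.
    rng = random.Random(42)
    rows = []
    for _ in range(1000):
        dist = rng.uniform(0.5, 20.0)
        rows.append(([1.0, dist], 3.0 + 2.5 * dist + rng.gauss(0.0, 1.0)))
    rng.shuffle(rows)
    cut = int(0.8 * len(rows))  # 80-20 train-test split
    X_tr, y_tr = [x for x, _ in rows[:cut]], [t for _, t in rows[:cut]]
    X_te, y_te = [x for x, _ in rows[cut:]], [t for _, t in rows[cut:]]
    for name, solver in (("normal equations", solve_normal_equations), ("L-BFGS", lbfgs)):
        beta = solver(X_tr, y_tr)
        pred = predict(X_te, beta)
        print(f"{name}: beta={[round(b, 4) for b in beta]}, "
              f"MSE={mse(y_te, pred):.4f}, RMSE={rmse(y_te, pred):.4f}")
```

On data like this, both solvers should print nearly identical coefficients and errors, in line with the abstract's finding of comparable accuracy. The practical difference is that the normal equations form and solve a p × p system, whereas L-BFGS needs only gradient evaluations, which is what makes it attractive when the number of features is large.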

Keywords


Big data; Data analysis; Linear regression; Machine learning; Spark



DOI: http://doi.org/10.11591/ijece.v15i1.pp711-718

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).