Integration of web scraping, fine-tuning, and data enrichment in a continuous monitoring context via large language model operations
Abstract
This paper presents and discusses a framework that leverages large language models (LLMs) for data enrichment and continuous monitoring, emphasizing the essential role of continuous monitoring in optimizing the performance of deployed models. It introduces a comprehensive large language model operations (LLMOps) methodology based on continuous monitoring and continuous improvement of the data, which is the primary determinant of the model's performance, in order to optimize the prediction of a given phenomenon. To this end, we first examine real-time web scraping, with tools such as Kafka and Spark Streaming used for data acquisition and processing. In addition, we explore the integration of LLMOps for complete lifecycle management of machine learning (ML) models. Focusing on continuous monitoring and improvement, we highlight how monitoring both the data and the ML models helps ensure optimal performance of deployed models. We also illustrate this methodology through a case study based on real data from several real estate listing sites, demonstrating how MLflow can be integrated into an LLMOps pipeline to provide complete development traceability, proactive detection of performance degradation, and effective model lifecycle management.
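To make the idea of proactive detection of performance degradation concrete, the following is a minimal sketch of a rolling-error monitor. It is not the authors' implementation: the class name `DegradationMonitor`, the parameters `baseline_mae`, `tolerance`, and `window`, and the choice of mean absolute error as the monitored metric are all assumptions made for illustration. In a pipeline such as the one described, the rolling metric would presumably be logged to a tracking tool such as MLflow, which is omitted here to keep the example self-contained.

```python
from collections import deque
from statistics import mean


class DegradationMonitor:
    """Hypothetical monitor: flags degradation when the rolling mean
    absolute error (MAE) exceeds a baseline MAE by a relative tolerance."""

    def __init__(self, baseline_mae, tolerance=0.2, window=50):
        self.baseline_mae = baseline_mae   # MAE measured at deployment time (assumed known)
        self.tolerance = tolerance         # allowed relative increase, e.g. 0.2 = 20%
        self.errors = deque(maxlen=window) # keeps only the most recent `window` errors

    def observe(self, predicted, actual):
        """Record the absolute error of one prediction."""
        self.errors.append(abs(predicted - actual))

    def rolling_mae(self):
        """Mean absolute error over the current window (0.0 if empty)."""
        return mean(self.errors) if self.errors else 0.0

    def degraded(self):
        """True once the window is full and the rolling MAE exceeds the threshold."""
        if len(self.errors) < self.errors.maxlen:
            return False
        return self.rolling_mae() > self.baseline_mae * (1 + self.tolerance)


# Illustrative usage with made-up price predictions (not data from the paper).
monitor = DegradationMonitor(baseline_mae=10.0, tolerance=0.2, window=3)
for predicted, actual in [(100.0, 105.0), (200.0, 190.0), (300.0, 310.0)]:
    monitor.observe(predicted, actual)
healthy_flag = monitor.degraded()   # errors 5, 10, 10: below the threshold of 12

for predicted, actual in [(100.0, 130.0), (100.0, 140.0), (100.0, 150.0)]:
    monitor.observe(predicted, actual)
drifted_flag = monitor.degraded()   # errors 30, 40, 50: above the threshold of 12
```

A flag raised by a check of this kind could serve as the trigger for the data enrichment and fine-tuning steps, although the actual trigger criteria are not stated in the abstract.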
Keywords
Continuous monitoring; Data enrichment; Fine-tuning; LLMOps; MLOps; Web scraping
DOI: http://doi.org/10.11591/ijece.v15i1.pp1027-1037
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).