Efficient intelligent crawler for hamming distance based on prioritization of web documents
Abstract
Search engines play a crucial role in today's Internet landscape, especially given the exponential growth in stored data. Ranking models, an integral component of a search engine, locate relevant pages and order them by decreasing relevance. Gathering documents offline is crucial for providing the user with accurate and pertinent results. As the web continues to expand, the number of documents that need to be crawled has grown enormously. Because the resources available for continuous crawling are fixed, any academic or mid-level organization must prioritize wisely which documents to crawl in each iteration. Prioritization is implemented by algorithms designed to operate within the existing crawling pipeline; to avoid becoming a bottleneck in that pipeline, these algorithms must be fast and efficient. This work presents an efficient, intelligent web crawler that uses the Hamming distance method to prioritize the pages to be downloaded in each iteration, with the aim of making the crawling process more streamlined and effective. Compared with other existing methods, the implemented Hamming distance method achieves an accuracy of 99.8%.
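The abstract does not say how pages are encoded or what they are compared against, so the following is only a minimal sketch of one way Hamming-distance prioritization could work, not the authors' implementation. It assumes each candidate page has already been reduced to a fixed-width bit fingerprint and that a reference fingerprint (for example, one encoding user preferences) is available. The names `hamming_distance`, `prioritize`, `candidates`, `target`, and `budget`, as well as the example URLs, are hypothetical.

```python
import heapq


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions at which two integer fingerprints differ."""
    return bin(a ^ b).count("1")


def prioritize(candidates: dict[str, int], target: int, budget: int) -> list[str]:
    """Return up to `budget` URLs whose fingerprints are closest to `target`.

    Assumption: a smaller Hamming distance means a more relevant page.
    Ties are broken alphabetically by URL so the order is deterministic.
    """
    scored = ((hamming_distance(fp, target), url) for url, fp in candidates.items())
    return [url for _, url in heapq.nsmallest(budget, scored)]


# Hypothetical 4-bit fingerprints for three candidate pages.
candidates = {
    "http://example.org/a": 0b1010,
    "http://example.org/b": 0b1111,
    "http://example.org/c": 0b0000,
}
to_crawl = prioritize(candidates, target=0b1011, budget=2)
print(to_crawl)
```

Each comparison is a single XOR followed by a bit count, which is cheap enough that such a step is unlikely to become the bottleneck in a crawling pipeline. `heapq.nsmallest` avoids sorting the full candidate set when the per-iteration budget is small.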
Keywords
Hamming distance; Information retrieval; Intelligent crawling; Search engine; User preferences; Web crawling
DOI: http://doi.org/10.11591/ijece.v14i2.pp1948-1958
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).