An Approach of Semantic Similarity Measure between Documents Based on Big Data

Mohammed Erritali, Abderrahim Beni-Hssane, Marouane Birjali, Youness Madani

Abstract


Semantic indexing and document similarity are important information retrieval problems in Big Data, with broad applications. In this paper, we investigate the MapReduce programming model as a framework for managing distributed processing over a large number of documents. We then survey the state of the art in approaches for computing document similarity. Finally, we propose an approach to semantic similarity measurement that uses WordNet as an external semantic network resource. For evaluation, we compare the proposed approach with previously presented approaches using our new MapReduce algorithm. Experimental results reveal that the proposed approach outperforms state-of-the-art approaches in running time and improves the measurement of semantic similarity.


Keywords


Distributed processing; Hadoop cluster; HDFS; Big Data; Semantic similarity; Parallel algorithm; MapReduce programming; WordNet



DOI: http://doi.org/10.11591/ijece.v6i5.pp2454-2461

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).