An Approach of Semantic Similarity Measure between Documents Based on Big Data

Mohammed Erritali; Abderrahim Beni-Hssane; Marouane Birjali; Youness Madani

doi:10.11591/ijece.v6i5.pp2454-2461

Abstract

Semantic indexing and document similarity are important problems for information retrieval systems in Big Data, with broad applications. In this paper, we investigate the MapReduce programming model as a specific framework for managing distributed processing over large amounts of documents. We then study the state of the art of different approaches for computing document similarity. Finally, we propose our approach to semantic similarity measurement using WordNet as an external semantic network resource. For evaluation, we compare the proposed approach with previously presented approaches using our new MapReduce algorithm. Experimental results show that our proposed approach outperforms the state-of-the-art approaches in running time and improves the semantic similarity measurement.
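The abstract does not spell out the algorithm, so the following is only a minimal Python sketch, under stated assumptions, of how a MapReduce-style job could compute concept-level cosine similarity between documents. The `TOY_SYNSETS` table is a hypothetical stand-in for WordNet (a real system would look up synsets in WordNet itself), and the map, shuffle and reduce phases are simulated in a single process rather than run on a Hadoop cluster. All function names are illustrative; this is not the authors' algorithm or similarity measure.

```python
from collections import defaultdict
from itertools import combinations
import math

# Hypothetical toy synonym table standing in for WordNet synsets (assumption).
TOY_SYNSETS = {
    "car": "vehicle", "automobile": "vehicle",
    "film": "movie", "movie": "movie",
    "big": "large", "large": "large",
}

def to_concepts(text):
    """Lowercase, split on whitespace, and map each word to its toy concept."""
    return [TOY_SYNSETS.get(w, w) for w in text.lower().split()]

def map_phase(doc_id, text):
    """Mapper: emit (concept, (doc_id, count)) pairs for one document."""
    counts = defaultdict(int)
    for concept in to_concepts(text):
        counts[concept] += 1
    for concept, n in counts.items():
        yield concept, (doc_id, n)

def shuffle(pairs):
    """Group mapper output by key, as the framework's shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(postings):
    """Reducer: for one concept, emit a partial dot product per document pair."""
    for (d1, n1), (d2, n2) in combinations(sorted(postings), 2):
        yield (d1, d2), n1 * n2

def semantic_cosine(docs):
    """Driver: run map, shuffle and reduce, then normalise into cosine scores.

    Pairs that share no concept never appear in the result (similarity 0).
    """
    mapped = [kv for doc_id, text in docs.items() for kv in map_phase(doc_id, text)]
    groups = shuffle(mapped)
    dots = defaultdict(int)
    norms = defaultdict(int)  # squared vector length per document
    for postings in groups.values():
        for doc_id, n in postings:
            norms[doc_id] += n * n
        for pair, partial in reduce_phase(postings):
            dots[pair] += partial
    return {
        (d1, d2): dot / math.sqrt(norms[d1] * norms[d2])
        for (d1, d2), dot in dots.items()
    }

sims = semantic_cosine({"d1": "big car", "d2": "large automobile", "d3": "film"})
print(sims)
```

Here "big car" and "large automobile" share no surface words but map to the same toy concepts, so their cosine is 1.0, while "film" shares nothing with either and produces no output pair. Grouping by concept in the reducer is the inverted-index pattern: documents are only compared when they have at least one concept in common, which tends to avoid scoring unrelated pairs.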

Keywords

distributed processing; Hadoop cluster; HDFS; Big Data; Semantic similarity; Parallel algorithm; MapReduce programming; WordNet

Copyright (c) 2016 Mohammed Erritali, Abderrahim Beni-Hssane, Marouane Birjali, Youness Madani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES).