A computational analysis of short sentences based on ensemble similarity model

Arifah Che Alhadi, Aziz Deraman, Masita Masila Abdul Jalil, Wan Nural Jawahir Wan Yussof, Rosmayati Mohemad


The rapid development of the Internet, along with the wide use of social media applications, produces a huge volume of unstructured data in short text form, such as tweets, text snippets and instant messages. This form of data rarely contains repeated words. This presents a challenge for sentence similarity analysis, because standard text similarity models rely solely on the number of word occurrences and therefore often yield unreliable similarity values. In addition, the use of abbreviations, acronyms, slang, smileys, jargon, symbols and non-standard short forms adds to the difficulty of similarity analysis. Thus, an extended ensemble similarity model approach is proposed. An experimental study was conducted using datasets of English short sentences. The findings are encouraging: the proposed approach improves the similarity values obtained for short sentences.
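
The word-occurrence problem described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not name the component measures or how they are combined, so the two measures used here (cosine similarity over word counts and Jaccard overlap of character trigrams), the equal weights, and all function names are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def cosine_word_counts(a: str, b: str) -> float:
    """Cosine similarity over raw word-occurrence counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def char_trigram_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of character trigrams (an assumed second measure)."""
    def grams(s: str) -> set:
        s = s.lower()
        return {s[i:i + 3] for i in range(len(s) - 2)}

    ga, gb = grams(a), grams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def ensemble_similarity(a: str, b: str, weights=(0.5, 0.5)) -> float:
    """Weighted average of the two measures; the weights are hypothetical."""
    return (weights[0] * cosine_word_counts(a, b)
            + weights[1] * char_trigram_jaccard(a, b))


# Two short messages with the same meaning but no shared words.
informal = "c u tmrw"
standard = "see you tomorrow"
word_score = cosine_word_counts(informal, standard)
combined_score = ensemble_similarity(informal, standard)
```

For this pair, the word-count score is exactly zero because the sentences share no tokens, while the combined score is small but non-zero because the trigram measure still finds some overlap. Handling abbreviations and slang well would need more than this sketch provides, for example a normalisation step or semantic measures.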

DOI: http://doi.org/10.11591/ijece.v9i6.pp5386-5394

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

ISSN 2088-8708, e-ISSN 2722-2578