Comparative analysis of deep Siamese models for medical reports text similarity

Dian Kurniasari, Mustofa Usman, Warsono Warsono, Favorisen Rosyking Lumbanraja

Abstract


Although medical reports have been digitized, they consist largely of free text and have not been used optimally. Extracting information from these reports is challenging because of their high volume and unstructured nature. Relevant, high-quality information can be extracted by measuring semantic textual similarity (STS). The primary aim of this study is therefore to develop and evaluate four models for determining STS between sentence pairs in medical reports: Siamese Manhattan convolutional neural network (CNN), Siamese Manhattan long short-term memory (LSTM), Siamese Manhattan hybrid CNN-LSTM, and Siamese Manhattan hybrid LSTM-CNN. Performance comparisons were conducted using the cosine similarity and word mover's distance (WMD) methods. The results indicate that the Siamese Manhattan hybrid LSTM-CNN model outperforms the other models, with a similarity score of 1 for each sentence pair, signifying identical semantic meaning.
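To make the reported similarity score of 1 concrete, the following is a minimal sketch of how a Manhattan-based similarity and a cosine similarity could be computed over two sentence encodings. It assumes the widely used form exp(-L1 distance) for the Siamese Manhattan output; the abstract does not state the exact formula, and the encoders (CNN, LSTM, or hybrid) are not reproduced here. The function names and the example vectors are hypothetical stand-ins for the outputs of the two Siamese branches.

```python
import math


def manhattan_similarity(u, v):
    """Assumed Siamese Manhattan score: exp(-L1 distance).

    Returns 1.0 when the two encodings are identical and decays
    toward 0 as the L1 distance between them grows.
    """
    if len(u) != len(v):
        raise ValueError("encodings must have the same length")
    l1_distance = sum(abs(a - b) for a, b in zip(u, v))
    return math.exp(-l1_distance)


def cosine_similarity(u, v):
    """Cosine of the angle between two encodings, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("encodings must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical sentence encodings (illustrative values, not data from the study).
enc_a = [0.2, -0.1, 0.4]
enc_b = [0.2, -0.1, 0.4]   # same encoding as enc_a
enc_c = [0.5, 0.3, -0.2]   # a different encoding

same_pair_score = manhattan_similarity(enc_a, enc_b)
diff_pair_score = manhattan_similarity(enc_a, enc_c)
same_pair_cosine = cosine_similarity(enc_a, enc_b)
```

Under this assumed form, a score of exactly 1 arises only when the two branch encodings coincide, which is consistent with the abstract's reading of a score of 1 as identical semantic meaning.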

Keywords


Biomedical natural language processing; BioWordVec; Hybrid LSTM-CNN; Medical report; Semantic text similarity; Siamese Manhattan


DOI: http://doi.org/10.11591/ijece.v14i6.pp6969-6980

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).