Homonym and polysemy approaches with morphology extraction in weighting terms for Indonesian to English machine translation

Budi Harjo, Muljono Muljono, Rachmad Abdullah


Homonym and polysemy features can cause errors when translating from a source language into a target language, for example, from Indonesian into English. Indonesian homonyms can arise from a lemma or from morphology. For example, the word beruang can mean the animal beruang (bear), or it can be the affixed verb form ber+uang (has/have money). Indonesian polysemy can also cause translation errors because a word can have both a literal and a figurative meaning. For example, in the terms bunga melati (jasmine flower) and bunga hati (lover), bunga does not only mean flower. Therefore, the development of machine translation (MT) methods needs to capture homonym and polysemy features at both the word and the phrase level. This research proposes homonym and polysemy approaches with morphology extraction in weighting terms for Indonesian to English MT. First, morphology extraction is used to detect sentences that contain prefixes, lemmas, and suffixes. Then, word similarity measurement is used to extract homonyms and polysemes in the form of uni-grams and bi-grams, using bidirectional encoder representations from transformers (BERT) embeddings, named entity recognition (NER), synonym-based term expansion, and semantic similarity. Neural machine translation is used for the translation process.
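The morphology-extraction step can be illustrated with a minimal sketch. It assumes a simple affix-stripping approach with a tiny hypothetical lexicon and affix lists. The names LEXICON, PREFIXES, SUFFIXES, segment, is_homonym_candidate, and ngrams are illustrative and do not come from the paper. The sketch also ignores Indonesian nasal alternations such as meN-. A word with more than one valid (prefix, lemma, suffix) split, such as beruang, is flagged as a homonym candidate. The ngrams helper produces the uni-gram and bi-gram units (e.g. bunga hati) that later steps would score. The BERT embedding, NER, synonym expansion, and semantic similarity steps are not shown.

```python
from typing import List, Tuple

# Hypothetical toy lexicon and affix lists, for illustration only.
LEXICON = {"beruang", "uang", "bunga", "melati", "hati"}
PREFIXES = ["ber", "me", "di", "ter", "ke", "se", "pe"]
SUFFIXES = ["kan", "an", "i", "nya"]


def segment(word: str) -> List[Tuple[str, str, str]]:
    """Return every (prefix, lemma, suffix) split whose lemma is in LEXICON.

    Plain affix stripping only; real Indonesian morphology (e.g. meN-
    nasal alternation) is deliberately not modelled here.
    """
    word = word.lower()
    candidates = []
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            lemma = rest[: len(rest) - len(suffix)] if suffix else rest
            if lemma in LEXICON:
                candidates.append((prefix, lemma, suffix))
    return candidates


def is_homonym_candidate(word: str) -> bool:
    """A word with more than one valid segmentation may be a homonym."""
    return len(segment(word)) > 1


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Contiguous n-grams (n=1 for uni-grams, n=2 for bi-grams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    for w in ["beruang", "melati"]:
        print(w, segment(w), is_homonym_candidate(w))
    print(ngrams("bunga hati".split(), 2))
```

In a fuller system the lexicon would come from an Indonesian dictionary, and each flagged candidate would then be disambiguated from its context, for example by comparing contextual embeddings of the competing senses.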


Keywords: Bidirectional encoder representations from transformers embedding; Morphology extraction; Neural machine translation; Semantic similarity; Synonym-based term expansion



DOI: http://doi.org/10.11591/ijece.v14i6.pp7036-7045

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

