An innovative Arabic light stemmer developed using a hybrid approach
Abstract
Our study introduces a light stemmer designed for the challenges of Arabic morphology. To handle both the templatic and the concatenative structure of Arabic words, the stemmer combines clitic stripping, lexicon-based validation, and statistical disambiguation. First, we rely on our lexicon of clitic rules to detect all potential clitic combinations for each input word. Next, we use an extensive lexicon of over 7 million stems to verify the candidate stems. Finally, we employ a statistical model to select the most likely stem based on the sentence's context. Experimental results on several datasets show that the proposed stemmer achieves higher accuracy and F1 scores than existing stemmers, demonstrating its effectiveness for Arabic stemming tasks.
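The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the clitic lists, the four-entry stem lexicon, and the bigram count table are all hypothetical toy data, and a simple count lookup keyed on the previous stem stands in for the statistical model, whose actual form the abstract does not specify.

```python
from itertools import product

# Hypothetical, tiny clitic lists; the paper's clitic rules lexicon is far larger.
PROCLITICS = ["", "و", "ال", "وال", "ب", "ل"]
ENCLITICS = ["", "ه", "ها", "هم"]

# Hypothetical toy lexicon standing in for the 7-million-stem resource.
STEM_LEXICON = {"كتاب", "وجد", "جد", "زار"}

# Hypothetical counts of (previous stem, stem) pairs; "<s>" marks sentence start.
# This lookup is an assumed stand-in for the paper's statistical model.
BIGRAM_COUNTS = {
    ("<s>", "وجد"): 5,
    ("<s>", "جد"): 1,
    ("زار", "جد"): 4,
    ("زار", "وجد"): 1,
}


def candidate_splits(word):
    """Stage 1: yield every (proclitic, stem, enclitic) split the clitic lists allow."""
    for pro, enc in product(PROCLITICS, ENCLITICS):
        if len(pro) + len(enc) >= len(word):
            continue  # the split would leave an empty stem
        if word.startswith(pro) and word.endswith(enc):
            yield pro, word[len(pro):len(word) - len(enc)], enc


def valid_stems(word):
    """Stage 2: keep only the candidate stems attested in the stem lexicon."""
    return sorted({stem for _, stem, _ in candidate_splits(word) if stem in STEM_LEXICON})


def pick_stem(word, previous_stem="<s>"):
    """Stage 3: choose the valid stem that scores highest in the given context."""
    stems = valid_stems(word)
    if not stems:
        return word  # unknown word: return it unchanged
    return max(stems, key=lambda s: BIGRAM_COUNTS.get((previous_stem, s), 0))


if __name__ == "__main__":
    # "وجد" is ambiguous: the verb "وجد", or the conjunction "و" plus the noun "جد".
    print(valid_stems("وجد"))
    print(pick_stem("وجد"))
    print(pick_stem("وجد", previous_stem="زار"))
```

In this sketch the lexicon check only filters candidates; when more than one stem survives, as for "وجد", the context score decides between them. With real data the count table would be replaced by a model trained on annotated sentences.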
Keywords
Arabic language; Large lexicon; Natural language processing; Stemming; Supervised learning
DOI: http://doi.org/10.11591/ijece.v15i2.pp2356-2363
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).