Moment invariant-based features for Jawi character recognition

Fitri Arnia, Khairun Saddami, Khairul Munadi

Abstract


Ancient manuscripts written in Malay-Arabic characters, known as "Jawi" characters, are found mostly in the Malay world. Many of these manuscripts have now been digitized. Unlike for Roman letters, no optical character recognition (OCR) software exists for Jawi characters. This article proposes a new algorithm for Jawi character recognition, which we call the tree root (TR) algorithm, that uses Hu's moments as invariant features. The TR algorithm is designed to give every Jawi character a unique combination of moment values. The seven Hu's moments are calculated for all Jawi characters, which comprise 36 isolated, 27 initial, 27 middle, and 35 end characters, for a total of 125 characters. The TR algorithm was then applied to recognize these characters. To assess its robustness, five characters rotated by 90° and 180° and scaled by factors of 0.5 and 2 were used. The TR algorithm achieved an overall recognition rate of 90.4%, with 113 of the 125 characters having a unique combination of moment values, and a recognition rate of 82.14% on the rotated and scaled characters. The proposed method outperformed the Support Vector Machine and Euclidean distance classifiers.
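
To illustrate the invariant features the abstract refers to, the following is a minimal sketch of Hu's seven moment invariants in Python with NumPy. It does not reproduce the TR algorithm, whose decision rules are not given in the abstract. The names hu_moments and demo_shape, and the small test shape, are hypothetical and chosen only for this illustration.

```python
import numpy as np


def hu_moments(img):
    """Return Hu's seven moment invariants for a 2-D intensity array."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[: img.shape[0], : img.shape[1]]
    m00 = img.sum()
    # Offsets from the centroid make the moments translation invariant.
    dx = xs - (xs * img).sum() / m00
    dy = ys - (ys * img).sum() / m00

    def eta(p, q):
        # Normalized central moment: mu_pq / mu_00^(1 + (p + q) / 2).
        return (dx**p * dy**q * img).sum() / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11**2,
        c**2 + d**2,
        a**2 + b**2,
        c * a * (a**2 - 3 * b**2) + d * b * (3 * a**2 - b**2),
        (n20 - n02) * (a**2 - b**2) + 4 * n11 * a * b,
        d * a * (a**2 - 3 * b**2) - c * b * (3 * a**2 - b**2),
    ])


def demo_shape():
    """Hypothetical asymmetric binary shape (not a real Jawi glyph)."""
    img = np.zeros((12, 12))
    img[2:10, 3:5] = 1   # vertical stroke
    img[8:10, 5:9] = 1   # horizontal foot
    return img


if __name__ == "__main__":
    base = demo_shape()
    variants = {
        "original": base,
        "rotated 90": np.rot90(base),
        "rotated 180": np.rot90(base, 2),
        "scaled x2": np.kron(base, np.ones((2, 2))),  # pixel replication
    }
    for name, image in variants.items():
        print(name, np.round(hu_moments(image), 6))
```

In this sketch, rotations by multiples of 90° leave all seven values unchanged up to floating-point error, while scaling changes them slightly because the image is sampled on a pixel grid. This discretization effect is one plausible reason why recognition on scaled characters can be lower than on the originals.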


Keywords


feature extraction; Jawi character recognition; moment invariant; optical character recognition; pattern classifier; tree root (TR) algorithm


DOI: http://doi.org/10.11591/ijece.v9i3.pp1711-1719

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).