Preprocessing techniques for recognition of ancient Kannada epigraphs
Abstract
Kannada, a Dravidian language spoken mainly in the state of Karnataka, has an extensive body of epigraphs, including old manuscripts and inscriptions, and is therefore regarded as a repository of knowledge. To make this knowledge more accessible, efforts are under way to digitize these documents using optical character recognition (OCR) for more efficient use and storage. However, the epigraphs are often in poor condition, and the images fed to the OCR model may not be of sufficient quality to achieve high recognition and classification accuracy. Preprocessing techniques make the dataset more reliable by improving image quality, which in turn improves the accuracy of the model. Preprocessing methods including binarization, smoothing, edge detection, and segmentation help to increase the model's interpretability, reduce overfitting, and allow it to be trained more quickly and with fewer resources. When applied to the epigraphs, these preprocessing approaches significantly improve image quality and reduce noise, making the text easier to identify and digitize.
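To make the named preprocessing steps concrete, the following is a minimal sketch of three of them (smoothing, binarization, and edge detection) in plain Python. It is not the authors' pipeline: the abstract does not state which algorithms were used, so the 3x3 mean filter, Otsu's global threshold, and the neighbour-difference edge test are assumed stand-ins, and all function names are hypothetical. Segmentation is omitted. Images are assumed to be lists of rows of integer gray levels in the range 0 to 255.

```python
# Illustrative sketch only; the algorithms and names here are assumptions,
# not the method described in the paper.

def box_smooth(img):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) // len(vals)
    return out

def otsu_threshold(img):
    """Return the gray level t that maximises between-class variance (Otsu)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with gray level <= t
    sum0 = 0    # sum of gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(img, t):
    """Map pixels above t to 1 (light) and the rest to 0 (dark)."""
    return [[1 if v > t else 0 for v in row] for row in img]

def edges(binary):
    """Mark pixels whose right or lower neighbour has a different value."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right = x + 1 < w and binary[y][x] != binary[y][x + 1]
            down = y + 1 < h and binary[y][x] != binary[y + 1][x]
            out[y][x] = 1 if (right or down) else 0
    return out

def preprocess(img):
    """Chain the steps: smooth, threshold, binarize, then find edges."""
    smoothed = box_smooth(img)
    binary = binarize(smoothed, otsu_threshold(smoothed))
    return binary, edges(binary)
```

In practice a library such as OpenCV or scikit-image would likely be used instead; the pure-Python form is chosen here only so that each step is visible without external dependencies.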
Keywords
Ancient text recognition; Binarization; Character recognition; Edge detection; Grayscaling; Segmentation; Smoothing
DOI: http://doi.org/10.11591/ijece.v14i1.pp358-365
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).