Preprocessing techniques for recognition of ancient Kannada epigraphs

Anusha Leela Somashekharaiah, Abhay Deshpande

Abstract


Kannada, a Dravidian language spoken predominantly in the state of Karnataka, is regarded as a repository of knowledge because of its extensive body of epigraphs, including old manuscripts and inscriptions. To make this knowledge more accessible, efforts are under way to digitize these documents with optical character recognition (OCR) for more efficient use and storage. However, the epigraphs are often in poor condition, and the images fed to the OCR model may not be of sufficient quality to achieve high recognition and classification accuracy. Preprocessing techniques make the dataset more reliable by improving the quality of the input images and the accuracy of the model. Preprocessing methods, including binarization, smoothing, edge detection, and segmentation, help to increase the model's interpretability, reduce overfitting, and allow it to be trained more quickly and with fewer resources. When applied to the epigraphs, these preprocessing approaches significantly improve image quality and reduce noise, making it easier to identify and digitize the text.
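
As a rough illustration of how the steps named in the abstract (grayscaling, smoothing, binarization, edge detection, and segmentation) might be chained, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say which algorithms were used, so the 3x3 mean filter, Otsu's thresholding, the boundary-pixel edge map, and the vertical projection-profile segmentation are all assumptions, as are the function names. The sketch also assumes dark characters on a lighter background.

```python
"""Illustrative preprocessing sketch (plain Python, no external libraries).
All function names and algorithm choices are assumptions, not the paper's method."""


def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples to 0-255 gray levels (BT.601 luma weights)."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb]


def smooth(gray):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [gray[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) // len(vals)
    return out


def otsu_threshold(gray):
    """Return the gray level that maximises between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels at or below t yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in the lower class
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray, threshold):
    """Mark pixels at or below the threshold as ink (1); assumes dark text."""
    return [[1 if v <= threshold else 0 for v in row] for row in gray]


def edges(binary):
    """Keep ink pixels that touch background (or the image border) in a 4-neighbourhood."""
    h, w = len(binary), len(binary[0])

    def is_background(y, x):
        return not (0 <= y < h and 0 <= x < w) or binary[y][x] == 0

    return [[1 if binary[y][x] and (is_background(y - 1, x) or is_background(y + 1, x)
                                    or is_background(y, x - 1) or is_background(y, x + 1))
             else 0
             for x in range(w)] for y in range(h)]


def segment_columns(binary):
    """Split on empty columns; returns half-open (start, end) column spans containing ink."""
    w = len(binary[0])
    profile = [sum(row[x] for row in binary) for x in range(w)]
    spans, start = [], None
    for x, count in enumerate(profile):
        if count and start is None:
            start = x
        elif not count and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, w))
    return spans


def preprocess(rgb):
    """Run the full chain and return (binary image, edge map, column spans)."""
    gray = smooth(to_grayscale(rgb))
    binary = binarize(gray, otsu_threshold(gray))
    return binary, edges(binary), segment_columns(binary)


if __name__ == "__main__":
    # Synthetic 8x14 image: two dark 4x3 "characters" on a light background.
    dark, light = (30, 30, 30), (220, 220, 220)
    image = [[dark if 2 <= y <= 5 and (2 <= x <= 4 or 9 <= x <= 11) else light
              for x in range(14)] for y in range(8)]
    _, _, spans = preprocess(image)
    print("character column spans:", spans)
```

In practice a pipeline of this kind would more likely rely on an image-processing library, and real epigraph images (for example, stone inscriptions with uneven lighting) may call for adaptive thresholding or connected-component segmentation rather than the simple global threshold and column projection assumed here.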

Keywords


Ancient text recognition; Binarization; Character recognition; Edge detection; Grayscaling; Segmentation; Smoothing



DOI: http://doi.org/10.11591/ijece.v14i1.pp358-365

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).