Devanagari optical character recognition of printed text

Malathi P., Chandrakanth G. Pujari

Abstract


Hundreds of native languages and scripts are making their way onto digital platforms, where they must be sustained across multiple data formats. Optical character recognition (OCR) is one area in which low-resource languages have yet to reach stability. Devanagari OCR is one such low-resource problem, even though Devanagari is the fourth most widely used script in the world. Recent OCR work has focused on word-level approaches, which face spiraling complexity once the size of a language's alphabet set runs into the hundreds. Most of this work is carried out in constrained environments, with huge datasets and large computational resources. As a result, effective benchmark evaluation of these works against one another on defined metrics is scarce. The aim here is to explore character-level Devanagari OCR with printed-text images as input. Pattern recognition (PR) principles are used for diacritic classification, and a convolutional neural network (CNN) is used for base-character classification. A word error rate (WER) of 24.47% is attained; however, the training dataset complexity is reduced by a factor of 4.35. Training times for the ten multi-class models range from 45 minutes to 2.5 hours. Furthermore, the models can be trained in parallel to complete the training process in 3-4 hours. Thus, the approach used for text classification allows a Devanagari OCR solution to be offered on off-the-shelf computing devices.
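For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognized text and the reference text, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the abstract does not state their tokenization or scoring protocol, so whitespace tokenization and the function name `word_error_rate` are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference word count.

    Assumption: words are separated by whitespace. The paper's own
    tokenization and scoring rules are not given in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One misrecognized vowel sign in a four-word line counts as one word error.
    print(word_error_rate("यह एक परीक्षण है", "यह एक परिक्षण है"))
```

Because the score is computed per word, a single wrong diacritic makes the whole word count as an error, which is one reason a character-level recognizer can still report a comparatively high WER.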

Keywords


Convolutional neural network; Dataset complexity; Devanagari script; Diacritic; Lexicon; Low resource


DOI: http://doi.org/10.11591/ijece.v15i6.pp5914-5923

Copyright (c) 2025 Malathi P., Chandrakanth G. Pujari

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES).