Degraded character recognition from old Kannada documents

Sridevi Tumkur Narasimhaiah, Lalitha Rangarajan

Abstract


This paper addresses the preparation of a dataset of degraded Kannada characters and the robust recognition of such characters. The proposed recognition algorithm extracts histogram of oriented gradients (HOG) features with block sizes of 4x4 and 8x8, followed by feature reduction using principal component analysis (PCA). Several classifiers are evaluated, and the fine K-nearest neighbor classifier performs best. The performance of the proposed model is evaluated using 5-fold cross validation and the receiver operating characteristic curve. The dataset comprises 10440 characters in 156 classes (distinct characters), drawn from 75 pages of poorly preserved old books. A comparison of the proposed model with other features, such as Haar wavelet and geometrical features, suggests that the proposed model is superior. PCA-reduced features followed by the fine K-nearest neighbor classifier gave the best accuracy, with acceptance rates of 98.6% and 97.9% for block sizes of 4x4 and 8x8 respectively. The experimental results show that HOG feature extraction yields a high recognition rate and that the system is robust even with extensively degraded characters.
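The pipeline summarized in the abstract (HOG features, PCA reduction, nearest-neighbor classification, 5-fold cross validation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes NumPy, a simplified HOG without overlapping-block normalization, a hypothetical `cell` parameter standing in for the 4x4 and 8x8 block sizes, k = 1 for the "fine" K-nearest neighbor classifier (the abstract does not state k), and synthetic striped images in place of the Kannada character dataset. All function names are illustrative.

```python
import numpy as np

def hog_features(img, cell=4, bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientation,
    weighted by gradient magnitude, L2-normalised as a single vector."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    h, w = img.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            feats.append(np.bincount(idx[y:y + cell, x:x + cell].ravel(),
                                     weights=mag[y:y + cell, x:x + cell].ravel(),
                                     minlength=bins))
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-9)

def pca_fit(X, n_components):
    """Return the training mean and the top principal directions (via SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

def knn_predict(train_X, train_y, test_X, k=1):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    d = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in train_y[nearest]])

def cross_validate(X, y, folds=5, n_components=8, k=1, seed=0):
    """Mean accuracy over the folds; PCA is fitted on each training split only."""
    order = np.random.default_rng(seed).permutation(len(X))
    accs = []
    for part in np.array_split(order, folds):
        train = np.setdiff1d(order, part)
        mean, comps = pca_fit(X[train], n_components)
        pred = knn_predict(pca_transform(X[train], mean, comps), y[train],
                           pca_transform(X[part], mean, comps), k)
        accs.append(np.mean(pred == y[part]))
    return float(np.mean(accs))

def make_toy_data(n_per_class=20, size=16, seed=0):
    """Two synthetic classes: noisy vertical and horizontal stripes (assumption)."""
    rng = np.random.default_rng(seed)
    stripes = (np.arange(size) // 2 % 2).astype(float)
    vertical = np.tile(stripes, (size, 1))   # intensity varies along x
    horizontal = vertical.T                  # intensity varies along y
    imgs, labels = [], []
    for label, base in enumerate((vertical, horizontal)):
        for _ in range(n_per_class):
            imgs.append(base + 0.1 * rng.standard_normal(base.shape))
            labels.append(label)
    return np.array(imgs), np.array(labels)

images, labels = make_toy_data()
features = np.array([hog_features(im, cell=4) for im in images])
accuracy = cross_validate(features, labels)
print(f"5-fold accuracy on toy data: {accuracy:.3f}")
```

Fitting PCA inside each fold keeps the held-out characters from influencing the projection. The sketch makes no attempt to reproduce the reported 98.6% and 97.9% figures, which depend on the authors' dataset and settings.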

Keywords


5-fold cross validation; classifier validation methods; optical character recognition; degraded Kannada character recognition; histogram of oriented gradients features; preprocessed character database; principal component analysis


DOI: http://doi.org/10.11591/ijece.v12i4.pp3632-3641

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).