Preprocessing techniques for recognition of ancient Kannada epigraphs

ABSTRACT


INTRODUCTION
About 44 million people speak Kannada, a Dravidian language, primarily in the state of Karnataka in southern India. One of India's oldest languages, it has a long literary history dating back to the ninth century. Kannada is written in a distinctive script descended from the ancient Brahmi script. The language has both influenced and been influenced by Sanskrit and other Indo-Aryan languages, as well as by the other Dravidian languages. The old Kannada script has evolved over time and has been used to write inscriptions, manuscripts, and other important documents. The study of the ancient Kannada script is important for understanding the history, culture, and language of Karnataka and the wider region. It is challenging, however, because the old Kannada script differs significantly from the modern Kannada script, having evolved over time and across multiple dynasties.
The process of locating and deciphering extinct scripts or writing systems is known as ancient script recognition. Common examples of ancient artifacts bearing such scripts include pottery, coins, tablets, and manuscripts. Understanding the history, culture, and language of ancient civilizations requires a thorough understanding of their scripts. Modern developments in artificial intelligence and machine learning have made it possible to automate the process of script recognition. A standard optical character recognition (OCR) system first divides the entire document into text lines, then into words, and finally into individual characters. The required features are then extracted from these characters, which are subsequently recognized and classified [1].
In order to identify and translate a script, computer algorithms must be trained to recognize its patterns and characters. However, due to the poor preservation of many ancient scripts, ancient script recognition remains a complex and difficult field. In one study on ancient text character recognition, a deep learning selectional auto encoder-decoder technique applied to character-level recognition achieved a binarization accuracy of 74.24%, while segmentation using the seam carving method showed a segmentation accuracy (As) of 70% [2].
Analytical reviews of recent OCR system architectures and technologies cover the various preprocessing techniques used in character recognition systems for different types of images, ranging from simple handwritten form-based documents to documents with colored, complex backgrounds and varied intensities [3]-[5]. Preprocessing, which entails cleaning and converting raw data into a format that can be analyzed, is a crucial step in data analysis and machine learning. Research indicates that offline optical character recognition is considerably influenced by both the preprocessing techniques applied and the operating conditions [6]. Preprocessing techniques such as gray scaling, binarization, edge detection, and smoothing, among others, are used to improve the accuracy of the machine learning model [7]. Preprocessing is significant because it addresses common problems with raw data such as noise, missing values, and inconsistencies, thereby improving the accuracy and effectiveness of machine learning models. Preprocessing algorithms in ancient script recognition can therefore improve the quality, accuracy, and efficiency of the recognition model.

METHOD
Six essential phases make up a standard character recognition model: data acquisition, preprocessing, segmentation and augmentation, feature extraction, classifier training, and evaluation. Preprocessing is essential for preparing raw data for the creation and training of machine learning models. It is all the more crucial in the case of ancient writings, because these are not always discovered or preserved in the best condition and may not be suitable for a neural network model in their raw form [8], [9]. Preprocessing is further divided into steps such as gray scaling, binarization, edge detection and cropping, smoothing, and line and character segmentation, depending on the type of epigraph used. A combination of smoothing and character segmentation was successfully applied to stone inscriptions, whereas a combination of line and character segmentation was more successful on handwritten scriptures and manuscripts. Gray scaling turns a regular red, green, and blue (RGB) image into a grayscale image whose values are stored in a single array (black and white), so a neural network need compute only one convolution instead of performing convolutions on three separate arrays for an RGB image. Gray scaling therefore increases computational efficiency. Image binarization, on the other hand, is the process of converting a document image into a bi-level document image, whose pixels form a dual collection of black and white pixels. The main goal of image binarization is to distinguish the foreground text from the background in a document [10].
Edge detection is an image processing technique used to identify regions in a digital image with abrupt brightness fluctuations or other visual discontinuities. The areas where the brightness of the image varies significantly are the margins or boundaries of the image. It may be necessary to "crop" an image by eliminating or adjusting its outer edges in order to improve framing or composition, draw attention to the image's subject, or change its size or aspect ratio. Cropping thus improves an image by removing unnecessary elements. If required, skew correction is carried out to rectify the text skew in an image whose block of text is rotated at an unknown angle [11], [12]. This is done by locating the text block in the image, calculating the text's rotational angle, and then rotating the image to compensate for the skew. The digital image processing method known as image smoothing reduces and suppresses image noise. Line segmentation is used to identify and segment the lines present in the image or block of text, and character segmentation further identifies and segments the individual characters within that text.

Data acquisition and preprocessing
Preprocessing is a critical step in image processing that can significantly impact the accuracy and performance of subsequent analysis or processing steps. Preprocessing techniques aid in the extraction of significant features and lessen the computational burden of subsequent processing steps by removing noise and artifacts, enhancing relevant information, and improving image quality [13]. Additionally, preprocessing can standardize image dimensions and orientation, adjust for variations in lighting, and improve the image's visual appeal. In general, preprocessing is essential for enhancing the precision, effectiveness, and utility of image processing in a variety of applications, including computer vision, medical imaging, and remote sensing [14]. The preprocessing steps are explained in further detail below.

Gray scaling
To a computer, images are nothing more than tidy arrays of numeric values from 0 to 255 that encode intensity. Color is fundamentally produced by combining red, green, and blue in varying ratios. A color image is therefore digitally recorded as an array of red, green, and blue values, and the computer "mixes" the color outputs on the fly. Given such an image, a neural network executes a convolution on each of the red, green, and blue channels [15]. However, this complicates the calculation and requires convolutions over three separate arrays instead of just one. Gray scaling reduces the complexity of an image's three-dimensional (3D) pixel values to a single one-dimensional (1D) value per pixel.
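As a minimal sketch of this step, the conversion can be performed with OpenCV's built-in color-space conversion, which collapses the three channels into one luminance value per pixel; the input filename here is a placeholder, not part of the original work.

```python
import cv2

# Load the epigraph image (placeholder filename); OpenCV stores color
# images in BGR channel order.
image = cv2.imread("hampi_inscription.jpg")

# Collapse the three color channels into a single luminance channel,
# reducing the 3D pixel array to one intensity value per pixel.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

cv2.imwrite("gray.jpg", gray)
```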

Binarization
The thresholding approach is important throughout the image scanning and processing stages of document analysis. The primary objective of binarization is to retain the characteristics of the original images while eliminating any noise [16]. Image thresholding is used to binarize an image based on pixel intensities, so each pixel holds less data and subsequent computations are simplified [17]. A thresholding technique normally takes a grayscale image and a threshold as inputs, and the outcome is a binary image. If a pixel's intensity in the input image exceeds the threshold, the corresponding output pixel is marked as white (foreground); if it is less than or equal to the threshold, the corresponding output pixel is marked as black (background). The steps that Otsu's thresholding algorithm typically takes are as follows: i) process the supplied image; ii) obtain the histogram of the image's pixel distribution; iii) find the threshold value T; and iv) replace image pixels with white where the intensity is higher than T and with black where the contrary is true.
By minimizing the variance within each class, Otsu's method analyzes the image histogram and segments the objects [18]. In most cases, this technique produces the desired results for bimodal images. The basic approach consists of dividing the image histogram into two clusters using a threshold chosen by minimizing the weighted within-class variance of these classes, denoted σ_w²(t). The general algorithm pipeline for the equivalent between-class variance maximization can be described as follows: i) calculate the histogram and intensity-level probabilities; ii) initialize ω_i(0) and μ_i(0); iii) iterate over possible thresholds t = 0, ..., maximum intensity; iv) update the values of ω_i and μ_i, where ω_i is the probability and μ_i the mean of class i; v) calculate the between-class variance σ_b²(t); and vi) take as the final threshold the value of t that maximizes σ_b²(t).
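A minimal sketch of this binarization step, using OpenCV's built-in Otsu implementation rather than a hand-rolled histogram search (the input filename is again a placeholder): when the THRESH_OTSU flag is set, OpenCV scans all candidate thresholds t and returns the one maximizing σ_b²(t).

```python
import cv2

gray = cv2.imread("gray.jpg", cv2.IMREAD_GRAYSCALE)

# The explicit threshold argument (0) is ignored when THRESH_OTSU is set;
# OpenCV computes the optimal T from the image histogram and returns it.
threshold, binary = cv2.threshold(
    gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
)
print(f"Otsu threshold T = {threshold}")
cv2.imwrite("binary.jpg", binary)
```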

Edge detection and cropping
After gray scaling and binarization, the image is cropped to remove extraneous areas and improve it. To improve framing or composition, draw attention to the subject of the image, or change the size or aspect ratio of an image (usually a photograph), the outer margins can be removed or adjusted. Following cropping, edge detection is employed to locate points in the image with sharp changes in brightness, often known as discontinuities [19]. The areas where the brightness of the image varies significantly are the margins or boundaries of the image. The Canny edge detector turned out to be the most efficient for the model used to recognize manuscripts and inscriptions, and it is widely utilized in several disciplines. Noise pollution in real-world settings is a problem because it can interfere with image edge identification, resulting in observable errors and lost edge details. An image edge recognition technique based on the Canny algorithm has been presented to eliminate salt-and-pepper noise from the image and extract the edge information of the region of interest [20]. Compared to many alternatives, it is the most popular, effective, and sophisticated strategy. It is a multi-stage algorithm used to locate and/or identify different types of edges. The key steps in implementing the Canny algorithm are as follows: i) the image should be grayscale; ii) because edge identification using derivatives is noise-sensitive, reducing noise is crucial; iii) the gradient is computed to establish the edge's direction and intensity; iv) non-maximum suppression is applied to thin the edges of the image; v) a double threshold differentiates between the image's strong, weak, and irrelevant pixels; and vi) edge tracking by hysteresis promotes weak pixels to strong ones only when a strong pixel is nearby.
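A sketch of this step with OpenCV: cv2.Canny runs the multi-stage pipeline above internally, and the detected edge pixels can then drive a simple bounding-box crop. The hysteresis thresholds (50 and 150) are illustrative values, not figures from the original work.

```python
import cv2

binary = cv2.imread("binary.jpg", cv2.IMREAD_GRAYSCALE)

# Canny applies smoothing, gradient computation, non-maximum suppression,
# and hysteresis tracking with the two thresholds given here.
edges = cv2.Canny(binary, 50, 150)

# Crop to the bounding box of all detected edge pixels, trimming the
# extraneous margins around the inscription.
coords = cv2.findNonZero(edges)
x, y, w, h = cv2.boundingRect(coords)
cropped = binary[y:y + h, x:x + w]
cv2.imwrite("cropped.jpg", cropped)
```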

Smoothing
Before segmenting the text in the dataset of epigraphs in the form of stone inscriptions, smoothing is carried out. Smoothing is a technique used to create images that are less noisy and pixelated [21]. Most smoothing techniques use low-pass filters, but an image can also be smoothed with a kernel, a moving window of pixels, by averaging or taking the median of that group of pixels. The aim of image smoothing techniques is to reduce noise without affecting the main aspects of the image, thereby maintaining its quality. These noises can be additive, impulsive, multiplicative, or of other forms. By removing noise, smoothing the inscriptions and epigraphs makes it possible to segment the text more effectively.
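A brief sketch of two common smoothing filters in OpenCV; the 5x5 kernel size is an illustrative choice, and which filter suits a given epigraph depends on its noise type.

```python
import cv2

image = cv2.imread("cropped.jpg", cv2.IMREAD_GRAYSCALE)

# A median filter replaces each pixel with the median of its 5x5
# neighbourhood, which is effective against impulsive (salt-and-pepper)
# noise such as pitting on weathered stone inscriptions.
smoothed_median = cv2.medianBlur(image, 5)

# A Gaussian low-pass filter is a gentler alternative when the noise is
# additive rather than impulsive.
smoothed_gauss = cv2.GaussianBlur(image, (5, 5), 0)
```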

Line and character segmentation
For historical handwritten document images, successful text recognition requires robust and efficient page segmentation [22]. At this level of segmentation, we are given an image containing text written in lines, and line-level segmentation aims to divide the image into those lines. The idea is that the rows corresponding to each line of text contain a higher proportion of foreground pixels, which, when the binary image is projected horizontally, correspond to higher peaks in the histogram. The rows representing the intervals between the lines contain mostly background pixels and correspond to lower histogram peaks. The lines can therefore be segmented by choosing the rows corresponding to the lower peaks of the histogram as segmentation boundaries; a sketch of this projection-profile approach is given below. Character segmentation then divides an image of a group of characters into smaller images that represent the symbols individually. Within an optical character recognition (OCR) system, it is a decision-making process: its inference that a solitary pattern in the image represents a character (or another recognizable unit) may or may not be true, and by being incorrect frequently enough it significantly increases the system's error rate. At this level of segmentation, a single previously segmented word, made up of a string of characters, is presented as an image. Text line segmentation is thus crucial to the overall effectiveness of a document recognition system [23].
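The following is a minimal NumPy sketch of the horizontal projection-profile idea just described; it assumes a binary image with foreground pixels set to 1, and the min_gap noise tolerance is an illustrative parameter.

```python
import numpy as np

def segment_lines(binary: np.ndarray, min_gap: int = 2) -> list:
    """Split a binary page image (foreground = 1) into line images."""
    # Horizontal projection: count foreground pixels per row. Text rows
    # produce peaks; the gaps between lines produce valleys.
    projection = binary.sum(axis=1)
    in_line, start, lines = False, 0, []
    for row, count in enumerate(projection):
        if count > min_gap and not in_line:    # entering a text line
            in_line, start = True, row
        elif count <= min_gap and in_line:     # leaving a text line
            in_line = False
            lines.append(binary[start:row, :])
    if in_line:                                # line touching the bottom edge
        lines.append(binary[start:, :])
    return lines
```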
Character segmentation is an important phase in an OCR system, and the division of broken characters is a crucial factor in determining how well a recognition system works [24]. The goal of character-level segmentation is to separate the characters in the image's text into individual units. Depending on the circumstances of the OCR application, this level of segmentation may or may not be necessary. If OCR is applied to text in which words consist of separated characters, character-level segmentation is not required: since a uniform space, however small, is maintained between the letters within a word, the characters can be segmented in the previous stage by selecting a very low threshold. On the other hand, if the OCR system is applied to text in which the characters within a word are joined (cursive handwriting), character-level segmentation must be performed. The segmentation process uses geometry and shape to determine the segmentation points. Word-image thinning is then used to reduce strokes to a single pixel's width and to locate the ligatures of Kannada letters [25]. Character-level segmentation is crucial for Kannada scripts in order to recognize the more intricate and densely placed characters. Once segmented, the characters can be recognized in contemporary Kannada and correctly processed, categorized, and stored; a vertical-projection sketch of this step follows.
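As with line segmentation, a simple sketch for separated characters uses the vertical projection of each line image; this illustrates only the low-threshold case described above, not the thinning-based approach needed for joined characters.

```python
import numpy as np

def segment_characters(line: np.ndarray, min_gap: int = 1) -> list:
    """Split a binary line image (foreground = 1) into character images."""
    # Vertical projection: columns with few foreground pixels mark the
    # inter-character spaces, even when those spaces are very narrow.
    projection = line.sum(axis=0)
    in_char, start, chars = False, 0, []
    for col, count in enumerate(projection):
        if count > min_gap and not in_char:    # entering a character
            in_char, start = True, col
        elif count <= min_gap and in_char:     # leaving a character
            in_char = False
            chars.append(line[:, start:col])
    if in_char:                                # character at the right edge
        chars.append(line[:, start:])
    return chars
```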

RESULTS AND DISCUSSION
The above-mentioned preprocessing techniques were carried out on the input image shown in Figure 1. The image is a stone inscription from Hampi, whose language and characters may differ somewhat from the present-day characters of the Kannada language. As shown, the image's dimensions, clarity, and quality need to be improved and polished before it is fed into a neural network for identification and classification. Figure 2 shows the input image successfully preprocessed and converted from an RGB image to a grayscale image. Figure 3 shows the image further binarized using the Otsu thresholding algorithm. The image now contains pixels of values 0 and 1, representing black (the pixels of lower intensity) and white (the pixels of higher intensity), which makes computation much easier. The Otsu thresholding algorithm holds several advantages over other algorithms owing to its automatic threshold determination, robustness to varying lighting conditions, computational efficiency, and adaptability to diverse image characteristics, making it a valuable tool for image segmentation and thresholding tasks. Smoothing is then performed to improve the quality of the image by removing noise, as shown in Figure 6, making segmentation easier. Line segmentation is performed to divide and identify the lines in the text, as shown in Figure 7, and character segmentation further divides the lines into characters based on the gaps between them; the results can be observed in Figure 8. The stone inscription has thus been successfully segmented into its characters and can now be fed to a neural network to be identified, recognized, and classified.
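For completeness, the entire chain can be strung together as below. This is a sketch combining the earlier snippets (segment_lines and segment_characters are the hypothetical helpers defined above, and the filename is a placeholder), not the exact code used to produce the figures; as noted in the conclusion, steps may be skipped for epigraphs with lower noise levels.

```python
import cv2

# Full preprocessing chain applied in sequence to one epigraph image.
image = cv2.imread("hampi_inscription.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
smoothed = cv2.medianBlur(binary, 5)

# Convert 0/255 pixels to 0/1 for the projection-profile helpers, then
# segment into lines and characters ready for the recognition model.
lines = segment_lines(smoothed // 255)
characters = [c for ln in lines for c in segment_characters(ln)]
```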

CONCLUSION
This paper discusses the importance of ancient script recognition and the challenges it poses, particularly in the case of the old Kannada script. The proposed solution for ancient scripts consists of various preprocessing techniques: gray scaling, binarization, edge detection and cropping, smoothing, line segmentation, and character segmentation. These techniques can be applied to various ancient epigraphs depending on their noise levels, meaning that not all the preprocessing steps need be performed on every image. Because the key motive behind this solution was to improve the quality of the image before it is fed into a recognition model, it can be concluded that scriptures with higher noise levels, or those belonging to older dynasties, may require far more preprocessing than scriptures with low noise levels and comparatively better quality. The techniques and methodology described above succeeded in improving the quality of the image, thereby making computation more efficient when the image is fed to the neural network. They can be used in recognition models not only for ancient Kannada epigraphs but also for other epigraphs or ancient scripts processed with neural networks such as convolutional neural networks (CNNs) or other machine learning algorithms. As ancient Kannada script recognition remains essential to several industries and to the preservation of knowledge and culture, the emphasis is likely to be on enhancing accuracy, adaptability, and integration with other modalities and techniques. With the development of machine learning and artificial intelligence, we can anticipate more sophisticated preprocessing methods that recognize the unique qualities of each image and learn more intricate features from them. These developments should raise accuracy rates and make OCR systems more dependable, ultimately helping the businesses, organizations, and individuals who use OCR technology for a variety of purposes.