A study of feature extraction for Arabic calligraphy characters recognition

ABSTRACT


INTRODUCTION
As a subfield of the pattern recognition domain, optical character recognition (OCR) has seen significant development, especially for languages that use Latin characters. As a result, Latin OCR systems have reached very high levels of accuracy. Many commercial products that automatically transform a text image into machine-editable and readable text are available worldwide. In contrast, it is only in recent years that Arabic handwriting recognition has attracted the interest of researchers, and the results remain modest compared to those of Latin-based OCR. There is also a significant lack of research interest in Arabic OCR for historical documents. This lack has several causes: on the one hand, the absence of a public database of Arabic words and characters, and, on the other hand, the diversity of shapes and sizes for each character, where a single character can have up to five different forms.
Figure 1 shows Arabic historical documents, which represent a great wealth worldwide, mainly for literature, art, history, and other fields. Therefore, the need for an Arabic historical document processing system for ancient calligraphic styles opens a fruitful new research direction for OCR. Indeed, during the past five years, only a few works have addressed the processing of Arabic historical documents [1]-[5]. The optical recognition of these documents is very challenging because they are written in a calligraphic style, which is usually more difficult to recognize than normal handwriting. In fact, ancient Arabic characters are very different from modern handwriting, mainly because of the shape variation of the same character. While a standard Arabic character changes shape depending on its position within a word, ancient characters may have different shapes for the same position. Ancient Arabic is distinguished from other scripts by the connectivity of its characters. Its artistic aspect makes it widely used in the decoration of old buildings and palaces. Al-Mojawhar, for example, is one of the most widely used ancient calligraphies. It was used to write public and private letters, as well as scientific and artistic books, as shown in Figure 2.
Recently, Zoizou et al. [6] published a database of words, sub-words, and characters extracted from historical documents written in Mojawhar calligraphy. As it is a new database, it has to be exploited and studied thoroughly through experiments and analysis. This work explores the proposed database. Several experiments were conducted on it to find the most suitable combination of feature description method and classifier. For this purpose, we used some popular classifiers, namely multilayer perceptron (MLP), support vector machines (SVM), k-nearest neighbor (KNN), and random forest (RF). For each category of feature extraction methods, we chose the ones most used in the optical character recognition field: scale-invariant feature transform (SIFT) and histogram of oriented gradients (HOG) as distribution-based descriptors, Zernike moments as a moment-based descriptor, and the Gabor filter as a spatial frequency-based descriptor. We also tested the studied classifiers with raw pixel values as features. In a final experiment, we built a deep convolutional neural network to classify the character images.
Research on Arabic OCR did not attract much interest until recent years. Only a few works have been published that contribute to the development of printed and handwritten Arabic OCR systems. Moreover, the results are still considered modest due to the complexity of Arabic letter shapes, which require more sophisticated shape descriptors to produce accurate features.
Elleuch et al. [7] used Gabor filter features as input to an SVM classifier with radial basis function (RBF) and polynomial kernels for Arabic handwriting recognition. To test the suggested model, the handwritten Arabic character database (HACDB), with 66 different classes, was used. The reported classification rates are 88.77% for the RBF-based SVM and 70.82% for the polynomial-based SVM.
Alternatively, Jebril et al. [8] combined an SVM with the HOG descriptor as a feature extractor. Before feature extraction, the input images undergo several consecutive pre-processing operations such as cleaning, binarization, color normalization, and segmentation of words into small windows. The resulting feature vectors are fed to an SVM. A recognition rate of 99% was reported after applying the proposed model to a private dataset of Jordanian city names.
Hassen and Khemakhem [9] presented a comparative study of some feature extraction methods used for Arabic handwriting recognition. The selected methods are the Gabor filter, wavelet transform, Fourier transform, and Hough transform. They are compared in terms of their capability to extract invariant features. To evaluate the precision of the different methods, a Euclidean minimum distance classifier (EMDC) is used. Each feature extraction method is tested on the IFN/ENIT dataset. The reported results show that the Hough transform and the Gabor filter are more precise in extracting representative and invariant features.
Elleuch et al. [10] proposed a system for Arabic handwritten character recognition. The suggested system is a multi-class SVM with an RBF kernel combined with a HOG feature extractor. The use of the Gabor filter as a handcrafted feature descriptor is also investigated. The proposed HOG-SVM system was evaluated on the IFN/ENIT database and obtained a recognition accuracy of 98.5%, while the Gabor-SVM system reached 92.8%.
Hassan et al. [11] proposed an Arabic word recognition model based on SIFT for feature extraction and an SVM as classifier. The features extracted with SIFT were clustered into groups using k-means. The proposed system was tested on the Arabic handwritten database (AHDB) and achieved a recognition rate of 99.08%.
Althobaiti and Lu [12] used the Freeman chain code for Arabic character identification. The process starts by setting a bounding box as the smallest rectangle containing the character with all its auxiliary parts, i.e., dots and symbols. The chain code is extracted from the character shape and reduced to a minimal 7-digit form. The chain code is encoded, and more statistical features are added to construct a feature sequence of 11 digits. A confusion matrix is calculated to check the utility of the proposed method. It is claimed that an accuracy of 92% to 97% is reached.
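As a rough illustration of the basic Freeman chain code only, the sketch below encodes an ordered boundary as 8-direction codes. The direction numbering (0 = east, counter-clockwise, with y growing downward) is one common convention and is an assumption here; the reduction to a 7-digit form and the added statistical features described in [12] are not reproduced.

```python
# Direction codes 0-7, counter-clockwise starting from east.
# Image coordinates are assumed: x grows rightward, y grows downward.
DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def freeman_chain_code(boundary):
    """Encode an ordered list of (x, y) boundary pixels as direction codes.

    Each pixel must be 8-adjacent to the next one in the list.
    """
    return [
        DIRECTIONS[(x1 - x0, y1 - y0)]
        for (x0, y0), (x1, y1) in zip(boundary, boundary[1:])
    ]
```

For example, a closed unit square traced from its top-left corner yields one code per side.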
Altwaijry and Al-Turaiki [13] proposed a convolutional neural network (CNN) model for Arabic handwritten character recognition. The proposed model consists of three convolutional layers, three maxpooling layers, and two fully connected layers, followed by an output layer of 29 units. The rectified linear unit (ReLU) is chosen as the activation function for the convolutional layers, while SoftMax is used for the output layer. The model is tested on a private database built by the authors, containing 47,000 images, and on the well-known Arabic handwritten characters database (AHCD) [14]. The CNN model reached an accuracy of 97% on the AHCD database, while on the private one it did not exceed 88%.
Wagaa et al. [15] proposed a CNN model for Arabic character recognition. The model consists of four convolutional layers, two maxpooling layers, and three fully connected layers. Except for the output layer, which uses SoftMax, every layer uses the ReLU activation function. The authors studied the use of several optimization and data augmentation algorithms. The proposed model was tested on the AHCD and Hijja datasets. It is claimed that recognition rates of 98% and 91% are reached.
Bai et al. [16] introduced the shared-hidden-layer convolutional neural network (SHL-CNN) for character recognition in different languages. The proposed model architecture consists of two parts. The first is shared across all languages and has two convolutional layers, two maxpooling layers, two contrast normalization layers, and two local convolutional layers. The second part is a non-shared layer, which consists of a SoftMax fully connected layer. Although the study does not include Arabic letters, it obtained some promising results.
Shams et al. [17] proposed a hybrid model for handwritten character recognition based on a CNN and an SVM. The proposed CNN architecture includes three convolutional layers, each followed by a maxpooling layer. These are followed by one fully connected layer and the output layer, which passes through dropout and is fed to the SVM classifier. The deep CNN-SVM model was tested on a private database of 16,800 images and presented a classification error rate of 4.9%, which is very promising.
Younis and Khateeb [18] presented a CNN model for offline Arabic character recognition. The model consists of three convolution layers, one fully connected layer, and one output layer. Dropout is used with a probability of 0.5 to reduce overfitting. The model was evaluated on both the AHCD and AIA9K databases and obtained 97.6% and 94.8%, respectively.
Alrobah and Albahli [19] presented a novel study using a hybrid model for Arabic handwritten character recognition. They proposed a recognition model based on a deep convolutional neural network as a feature extractor. The extracted features are then fed to three machine learning models for classification, namely SVM, XGBoost, and a neural network of SoftMax fully connected layers. They tested multiple architecture combinations with different parameters on the Hijja dataset. The highest recognition rate was reported for the CNN-SVM combination, with a success rate of 96.3%.

METHOD
In this study, we used different methods of feature extraction and classification to perform Al-Mojawhar recognition. These two phases are known to be crucial and delicate in the process of building an optical character recognition system. It is in these phases that the graphical shape of the character is converted into a numeric representation that can be manipulated and edited digitally. The literature presents several effective techniques to extract representative features and classify characters. However, in the case of ancient Arabic characters, the complexity imposed by shape variation and additional signs is much higher than in modern writing. For feature extraction, we considered the scale-invariant feature transform (SIFT) [20], the histogram of oriented gradients (HOG) [21], Zernike moments [22], the Gabor filter [23], and contour-based features [24]. We also studied the use of raw pixel data to build feature vectors.
SIFT is a scale-invariant descriptor. It allows the extraction of features regardless of scale changes. According to Lowe [20], the principle behind SIFT is to convert the image pixel values into rotation- and scale-invariant coordinates relative to the local features, as shown in Figure 3. In the same category, the HOG descriptor is known as one of the most potent descriptors in feature extraction for pattern recognition. To calculate HOG features, the image is split into many equal, connected zones. Then, for each zone, the edge gradients and orientations are computed and combined to form a 1-D histogram. The global feature vector is the concatenation of all of these histograms.
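The HOG computation described above can be sketched in a few lines. This is a simplified illustration, not the implementation used in the experiments: the cell size (5 pixels), the number of orientation bins (9), and the per-cell L2 normalization are assumptions, and the overlapping block normalization of the full HOG descriptor is omitted.

```python
import math


def hog_features(image, cell=5, bins=9):
    """Concatenate one orientation histogram per cell of a 2-D grayscale image.

    `image` is a list of rows of numbers; `cell` and `bins` are illustrative
    defaults, not values taken from the paper.
    """
    h, w = len(image), len(image[0])
    features = []
    for top in range(0, h - cell + 1, cell):
        for left in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for y in range(top, top + cell):
                for x in range(left, left + cell):
                    # Central differences, clamped at the image border.
                    gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
                    gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
                    magnitude = math.hypot(gx, gy)
                    # Unsigned orientation in [0, 180) degrees.
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    hist[int(angle / (180.0 / bins)) % bins] += magnitude
            # L2-normalise each cell histogram (all-zero cells stay zero).
            norm = math.sqrt(sum(v * v for v in hist)) or 1.0
            features.extend(v / norm for v in hist)
    return features
```

With these assumed settings, a 50×50 character image yields 10 × 10 cells and therefore a 900-dimensional feature vector.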

Figure 3. SIFT key point detection
Zernike moments were first introduced in the 1930s by the physicist and Nobel Prize winner Fritz Zernike. Zernike moments are based on orthogonal radial polynomials. They provide a unique description of the entity that does not contain any redundant information. Unlike most of the previous feature extraction methods, which require high-quality thresholding and pre-processing, the Gabor filter does not. It is well known for its ability to extract representative information from multi-channel images. The feature extraction performance of the Gabor filter comes from the following property: invariance to rotation, scale, and translation.
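The orthogonal radial polynomials mentioned above have a standard closed form, which the following sketch evaluates. It covers only the radial part R_n^m(rho); computing a full Zernike moment additionally requires mapping the image onto the unit disk and summing against the complex angular term, which is left out here.

```python
from math import factorial


def zernike_radial(n, m, rho):
    """Evaluate the Zernike radial polynomial R_n^m at radius rho in [0, 1].

    Requires |m| <= n and n - |m| even.
    """
    m = abs(m)
    if m > n or (n - m) % 2:
        raise ValueError("requires |m| <= n and n - |m| even")
    total = 0.0
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * factorial(n - k)
                 / (factorial(k)
                    * factorial((n + m) // 2 - k)
                    * factorial((n - m) // 2 - k)))
        total += coeff * rho ** (n - 2 * k)
    return total
```

For instance, `zernike_radial(2, 0, rho)` evaluates the polynomial 2·rho² − 1, and every valid polynomial equals 1 at rho = 1.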
For classification, we used a multi-layer perceptron, SVM, KNN, and random forest. Finally, we used a deep convolutional neural network for its feature extraction and classification capabilities. One of the primary goals of this study is to compare, in terms of efficiency, the handcrafted features calculated with different descriptors and the learned features generated with a deep convolutional model.
Support vector machines are popular supervised methods used for both classification and regression problems. They are known for their low memory cost because their decision function uses only a subset of the training points. For our experiments, we used SVMs with RBF and polynomial kernels, both with a gamma parameter ranging from 0.05 to 0.5.
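To make the role of the gamma parameter concrete, the two kernel functions can be written out as below. This is a minimal sketch using a common parameterization of the polynomial kernel; the `coef0` term and the step of the gamma grid are assumptions, since the text only gives the range 0.05 to 0.5.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x, y, gamma, degree=3, coef0=0.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


# Hypothetical grid covering the gamma range used in the experiments.
GAMMA_GRID = [round(0.05 * i, 2) for i in range(1, 11)]
```

A larger gamma makes the RBF kernel decay faster with distance, so each training point influences a smaller neighborhood.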
Random forest is a supervised learning algorithm that operates by constructing multiple decision trees. It is one of the most widely used algorithms due to its accuracy, simplicity, and flexibility. It randomly selects features and observations, builds a forest of decision trees, and averages their results.
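The randomization and aggregation steps can be sketched independently of how each tree is trained. In the illustration below, trees are assumed to be plain callables that map a feature vector to a class label, the square-root rule for the feature subset size is a common default rather than a setting from this study, and aggregation is done by majority vote, the usual form of averaging for classification.

```python
import math
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples training indices with replacement (one per tree)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features, rng):
    """Pick about sqrt(n_features) distinct feature indices for a tree."""
    k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))


def forest_predict(trees, x):
    """Aggregate the trees' predictions for x by majority vote."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]
```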
The KNN algorithm is a simple, non-parametric, supervised learning algorithm that uses proximity to make classifications or predictions. It is generally used as a classification algorithm, based on the assumption that similar points lie near one another. Hence, the KNN algorithm predicts the class of a test sample by computing the distance between that sample and all the training points and taking the majority class among the K nearest ones.
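A minimal sketch of this procedure follows, assuming Euclidean distance and a simple majority vote; the class labels in the usage example are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    neighbours = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Usage with two hypothetical character classes in a 2-D feature space.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["alif", "alif", "alif", "ba", "ba", "ba"]
predicted = knn_predict(points, labels, (0.2, 0.1), k=3)
```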
CNNs are a particular sub-category of feed-forward networks used mainly for image processing, either as a shape descriptor or as a complete classification model. "The advantage of CNN is that it automatically extracts the salient features which are invariant and a certain degree to shift and shape distortions of the input characters" [25]. A CNN model is generally composed of two main parts: the first uses convolution and pooling layers to act as a feature extractor, while the second is a neural network of several fully connected layers that works as a classifier.
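The feature-extraction part can be illustrated with one convolution, activation, and pooling stage. This is a didactic sketch on a single channel with a fixed kernel; in a trained CNN the kernel weights are learned, and the kernel and pooling sizes used here are assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [
        [
            max(feature_map[y + i][x + j]
                for i in range(size) for j in range(size))
            for x in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for y in range(0, len(feature_map) - size + 1, size)
    ]
```

Applied to a 6×6 input with a 3×3 kernel, the convolution gives a 4×4 map and the 2×2 pooling reduces it to 2×2, which shows how each stage shrinks the spatial resolution while keeping the strongest responses.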
To compare ancient Arabic with modern handwriting in terms of processing difficulty, we also used the AHCD database in our experiments. It consists of 16,800 images of modern Arabic handwritten characters distributed equally across 28 classes. AHCD is known for its robustness and is widely used for evaluating Arabic handwriting recognition systems.

EXPERIMENTAL RESULTS AND DISCUSSION
To evaluate the different methods, we used character images from the Al-Mojawhar database (MOJ-DB). The initial data was pre-processed with binarization and denoising. It was then normalized to a size of 50×50 pixels. The initial database has 60 images for each of the 76 character classes. The following experiments were performed in a Python-OpenCV environment installed on a quad-core Ryzen 7 PC with a clock speed of 2.8 GHz and 16 GB of RAM. We used an augmented database containing 600 instances per class. Table 1 presents the test accuracies (%) obtained in the conducted experiments.
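The binarization and size normalization steps can be sketched as follows. The fixed threshold of 128, the convention that dark ink becomes 1, and nearest-neighbor resampling are all assumptions made for illustration; the text does not specify the binarization or resizing method, and denoising is omitted.

```python
def binarize(image, threshold=128):
    """Map dark pixels (ink) to 1 and light pixels (background) to 0."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def resize_nearest(image, out_h=50, out_w=50):
    """Resample the image to out_h x out_w with nearest-neighbour lookup."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def preprocess(image):
    """Binarize a grayscale character image and normalise it to 50x50."""
    return resize_nearest(binarize(image), 50, 50)
```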

Handcrafted features
We built feature vectors using HOG, SIFT, Zernike moments, the Gabor filter, and raw pixel data extracted from the MOJ-DB database. To use the selected features, we trained, in a first round, a multi-layer perceptron (MLP). The neural network is composed of four layers, each followed by dropout with a ratio of 0.5. The model is trained for 10 epochs with a batch size of 50. In a second experiment, we fed two SVMs, one RBF-based and one polynomial-based, with the different feature sets. The best results were obtained with the parameter combination (C = 5, Gamma = 0.05, degree = 3). After that, several experiments were conducted using a KNN classifier with different values of K; the best results were obtained with K = 3. We then trained a random forest to classify the selected features. Different numbers of estimators were tested, and the best results were obtained with a value of 64.
In a final experiment on handcrafted features, we compared the four descriptors in terms of time cost. To this end, we randomly selected 700 character images from MOJ-DB, reduced to a size of 50×50 pixels, and measured the processing time of feature extraction with each descriptor for the entire set of images. Table 2 summarizes the obtained results.
When the available amount of data is not sufficient for machine learning training to produce accurate models, data augmentation is one of the techniques that may solve the problem. It consists of transforming the available data instances to generate new data. In this work, we augmented the initial data using distortion and a slight rotation of the initial character images. We generated two other databases: the first contains 240 instances for each character class, while the second contains up to 600. To check the effect of data augmentation on MOJ-DB characters, we trained a polynomial-kernel SVM fed with HOG features on the three databases, since this model presented the best results in the previous experiments. The results are summarized in Table 3.
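The rotation part of the augmentation can be sketched as below. The maximum angle of 10 degrees, the nearest-neighbor resampling, and the fixed random seed are assumptions; the distortion transform mentioned in the text is not reproduced.

```python
import math
import random


def rotate_nearest(image, degrees):
    """Rotate a 2-D image about its centre, filling uncovered pixels with 0."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_a = math.cos(math.radians(degrees))
    sin_a = math.sin(math.radians(degrees))
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel for each output pixel.
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = image[iy][ix]
    return out


def augment(image, copies, max_degrees=10.0, seed=0):
    """Generate `copies` slightly rotated variants of one character image."""
    rng = random.Random(seed)
    return [
        rotate_nearest(image, rng.uniform(-max_degrees, max_degrees))
        for _ in range(copies)
    ]
```

Under these assumptions, generating 3 variants per original image would turn 60 instances per class into 240, and 9 variants would give 600.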
The CNN model takes an input layer of 50×50 binary images, followed by three convolution layers, each followed by a maxpooling layer. The classification part consists of two fully connected layers and an output layer of 76 neurons. The model is summarized in Figure 4. The optimization function significantly influences model learning, so we also examined the effect of the Adam and root mean squared propagation (RMSProp) optimizers on recognition accuracy. After 30 epochs of training with a learning rate of 0.0001, the model reached the results shown in Table 4.
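The difference between the two optimizers lies in how each parameter update is scaled. The sketch below shows the standard single-parameter update rules, using the learning rate of 0.0001 from the experiment; the decay rates and epsilon are common default values and are assumptions here, not settings reported in the study.

```python
import math


def rmsprop_step(param, grad, state, lr=1e-4, rho=0.9, eps=1e-8):
    """One RMSProp update: scale the step by a running RMS of the gradient."""
    state["v"] = rho * state.get("v", 0.0) + (1 - rho) * grad * grad
    return param - lr * grad / (math.sqrt(state["v"]) + eps)


def adam_step(param, grad, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: bias-corrected first and second moment estimates."""
    state["t"] = state.get("t", 0) + 1
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * grad
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)
```

Each function keeps its running statistics in a `state` dictionary, so one dictionary per parameter is passed in and reused across training steps. Adam adds a momentum term and bias correction on top of the RMSProp-style scaling.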

Discussion
Most of the previous experiments performed on MOJ-DB show that the Al-Mojawhar style presents more complexity than modern handwritten characters. The classification results indicate that ancient Arabic needs more effort and interest from researchers. The experiments show that the ability of the multi-layer perceptron to classify ancient Arabic character images is limited compared to the other classifiers. Also, HOG and the Gabor filter outperformed the other feature extraction methods. Moreover, the features extracted with these two descriptors maintained high representation quality with all the classifiers used in the experiments. In terms of computation speed, Gabor remains the fastest descriptor, while SIFT and HOG take approximately the same processing time, with an advantage for SIFT. The experiments confirmed that invariant moments are the slowest descriptors for feature extraction. As for the classifiers, the polynomial-based SVM, followed by random forest, presented the best classification rates for almost all the feature sets, outperforming the multi-layer perceptron. Overall, among all the combinations, the one built with SVM-Poly and HOG gave the best classification rate, which is 88.8%.
Deep learning has become widely used for character recognition as new architectures and optimization parameters are being developed. For the Al-Mojawhar style, the proposed model presented the best results, outperforming SVM, KNN, MLP, and random forest. The obtained results (95.6% using Adam and 95.2% using RMSProp) are explained by the strong ability of the convolutional network to extract pertinent features. These results indicate that learned features are more accurate and representative than handcrafted features for ancient Arabic handwriting. As data augmentation helps generate new instances for each of the initial data classes, it remains the easiest way for machine learning engineers to feed models with sufficient data. Augmenting the data 10 times helped increase the recognition rate by about 5%, which is very significant in the case of character recognition.


Int J Elec & Comp Eng, Vol. 14, No. 1, February 2024: 870-877, ISSN: 2088-8708. A study of feature extraction for Arabic calligraphy characters recognition (Abdelhay Zoizou)

Figure 4. CNN architecture for ancient Arabic character recognition

Table 1. The test accuracy (%) of classification for the different model combinations

Table 2. Time cost for each feature extractor

Table 4. The results of classification using the CNN model