Improved Javanese script recognition using custom model of convolution neural network

ABSTRACT


INTRODUCTION
Indonesia has many ethnic groups, religions, and races, so it requires a unified language, Indonesian. The Javanese, one of the largest ethnic groups in Indonesia, speak the Javanese language. Indonesian uses Latin letters, while Javanese was originally written in Javanese script. Because of this difference, many young Javanese learn little about Javanese script [1]. Digitizing the language and script and building applications for learning them can be a solution for the younger generation. However, developing a method that is both lightweight and accurate is a challenge in building such an application. One component of this application requires a method for recognizing handwritten Javanese script on a smartphone screen.
Various handwriting recognition methods have been developed, both for Latin letters [2]-[5] and for non-Latin or traditional characters [6]-[22]. Research on Javanese script recognition in digital images has also been carried out; some of the latest studies are [1], [8]-[13], [15], [16], [23]. Some of these studies use machine learning (ML) [1], [8], [10]-[12], [15], [16], [23], while others use deep learning (DL) [9], [13], [22], [24]. ML methods are computationally lighter than DL methods, but DL methods generally achieve better accuracy, especially when the training dataset is large. Almost all Javanese script recognition research is limited to the 20 basic characters (Carakan); only [23] adds seven numeric characters to the recognition process. However, Javanese script actually has far more characters, because each basic character carries the inherent vowel a and can be combined with the other vowels. Javanese script has six vowels, namely a, i, u, e, é, and o, so there are 20 × 6 = 120 Javanese script characters, not including numbers and punctuation marks.
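The 20 × 6 count above can be made concrete by enumerating one class label per character. The following is a minimal Python sketch, assuming a simple Latin romanization of the Carakan consonants in the traditional hanacaraka order; the label strings and the `javanese_classes` helper are hypothetical and are not the labels used in the original dataset.

```python
from itertools import product

# The 20 basic Carakan characters in hanacaraka order, romanized without
# their inherent vowel (hypothetical romanization, for illustration only).
CONSONANTS = ["h", "n", "c", "r", "k", "d", "t", "s", "w", "l",
              "p", "dh", "j", "y", "ny", "m", "g", "b", "th", "ng"]

# The six vowels named in the text: "a" gives the basic characters,
# the other five give the compound characters.
VOWELS = ["a", "i", "u", "e", "é", "o"]


def javanese_classes():
    """Return one label per basic or compound character, e.g. 'ha', 'hi', 'ba'."""
    return [c + v for c, v in product(CONSONANTS, VOWELS)]


labels = javanese_classes()
print(len(labels))                                   # 120
print(sum(label.endswith("a") for label in labels))  # 20 basic characters
print(labels[:6])                                    # ['ha', 'hi', 'hu', 'he', 'hé', 'ho']
```

The remaining 100 labels are the compound characters that the proposed model adds to the 20 basic ones.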
Research on handwriting recognition of other non-Latin scripts is more advanced. These studies use DL with more classes and more complex processes and achieve better accuracy, for example for Bangla [7], Kannada [20], Tifinagh [6], Arabic [14], [25], and Khmer [26] characters, as shown in Table 1. In contrast, the DL-based Javanese studies [9], [13], [22], [24] achieve an accuracy of only about 64% in [13], about 84% in [24], about 86% in [22], and at best 94% in [9]. These accuracies are not even better than those of various ML studies on Javanese script recognition. Some of the datasets used are large enough for DL: 11,500 images in [9], 2,470 in [13], 16,800 in [22], and 2,000 in [24]. With good preprocessing and a well-designed convolutional neural network (CNN) model, reliable performance should be achievable. Several factors may explain the weak DL performance in these studies: noisy datasets without selection, an unsuitable number of CNN layers, and unsuitable tuning parameters. Many CNN models have been designed for object recognition in images. Although they can generally recognize various objects, including handwriting, these models were originally designed for specific objects, so they perform optimally only on those objects, for example multiple object detection [27], face recognition [28], face mask recognition [29], and violence detection [30]. This indicates that a CNN model designed specifically for handwriting recognition, especially of Javanese script, is needed to obtain optimal recognition performance. CNN models can differ in the number and arrangement of their layers, which affects both recognition accuracy and computational cost.
The components that need to be considered when designing a CNN model are the convolutional layer (CL), pooling layer (PL), activation function (AF), fully connected layer (FC), loss function (LF), and optimizer (OP). CL extracts features from the image. In CNN models with many CLs, the lower layers extract texture, edge, and line features, while the higher layers extract abstract features. PL performs downsampling and produces output feature maps that are more robust against distortion and neuronal errors. AF defines the functional relationship between input and output and introduces nonlinearity into the neural network. FC integrates the local information produced by the convolution and pooling stages and performs discrimination and classification. LF drives the final classification process; it plays an important role because different LFs suit different recognition tasks (object classification, face recognition, and object recognition). Network training relies on gradient updates as its core step, and the OP uses these gradients to update the network parameters with faster convergence, lower loss, and simpler computation [31]. Based on this theory, designing a CNN model specifically for Javanese script handwriting recognition is considered necessary.
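To make the roles of CL, AF, PL, FC, LF, and OP more concrete, the following NumPy sketch runs one toy forward pass through one component of each kind and then takes a single gradient step. It is a didactic illustration only: the 8×8 input, the single 3×3 kernel, the four classes, the slope α = 0.01, and the use of plain gradient descent in place of an adaptive optimizer are all assumptions, not the configuration of the proposed model.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv2d(image, kernel):
    """CL: 'valid' 2-D cross-correlation of one channel with one kernel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def leaky_relu(x, alpha=0.01):
    """AF: nonlinearity; negative inputs are scaled by alpha instead of zeroed."""
    return np.where(x >= 0, x, alpha * x)


def max_pool(x, size=2):
    """PL: non-overlapping max-pooling (downsampling by `size`)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))


def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()


def cross_entropy(probs, label):
    """LF: negative log-probability assigned to the true class."""
    return -np.log(probs[label])


# Toy forward pass: 8x8 image -> CL (6x6) -> AF -> PL (3x3) -> flatten -> FC -> softmax
image = rng.random((8, 8))
kernel = rng.standard_normal((3, 3))
features = max_pool(leaky_relu(conv2d(image, kernel)))
flat = features.reshape(-1)
num_classes, label = 4, 2
W = rng.standard_normal((num_classes, flat.size)) * 0.1
b = np.zeros(num_classes)
probs = softmax(W @ flat + b)
loss = cross_entropy(probs, label)

# OP stand-in: one plain gradient-descent step on the FC weights only.
grad_logits = probs - np.eye(num_classes)[label]  # dLoss/dlogits for softmax + CE
W -= 0.001 * np.outer(grad_logits, flat)
new_loss = cross_entropy(softmax(W @ flat + b), label)
print(features.shape, float(loss), float(new_loss))
```

The small step size keeps the update in the regime where the loss is guaranteed to decrease; an optimizer such as Adam adapts this step per parameter instead.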
As explained above, DL methods for Javanese script handwriting recognition still need improvement, because they are limited to basic characters and their accuracy is not yet satisfactory. This study aims to design a CNN model optimized for Javanese script handwriting recognition. Recognition is not limited to the 20 basic characters but also covers the 100 characters formed by combining them with the other Javanese vowels. The number of layers in the proposed CNN model is minimized to lower the computational cost while maintaining good accuracy. This paper is organized into four parts: the first part is this introduction, the second describes the proposed method and its hypotheses, the third presents and discusses the results, and the last is the conclusion.

METHOD
This research is limited to the design of the CNN model and does not involve segmentation or object detection. The handwritten images have already passed the preprocessing stage, which yields a 150×150, 24-bit (RGB) image for each character. The DL process is then carried out with the proposed model, which is presented in Figure 1. The proposed CNN model has eleven layers consisting of four convolutional layers, max-pooling layers, two fully connected layers, and a softmax classifier with the loss function. This number of layers is small compared with recent CNN models such as VGGNet with 16 or 19 layers, GoogLeNet with 22 layers, and ResNet with 152 layers [32]. The model is deliberately designed with as few layers as possible to reduce computational cost. The design is based on the theories and hypotheses described previously, and several tests were also carried out to optimize the results. The four convolutional layers use the leaky rectified linear unit (ReLU) activation function to improve accuracy. Leaky ReLU is a variant of ReLU that performs better, especially when some neurons die (stop activating) under standard ReLU [33]; see (1) for the leaky ReLU function. The number of convolutional layers is minimized because handwriting features are relatively simple and do not require abstract features. A dropout of 0.2 is added after every two convolutional layers to reduce overfitting. A PL is placed after each CL+AF block to optimize downsampling, so that the output features are more robust against distortion and neuronal errors; the type of PL used is max-pooling.
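The advantage of leaky ReLU over ReLU described above is easiest to see in their gradients. The minimal NumPy sketch below implements the leaky ReLU function referenced as (1) and compares it with ReLU on a few inputs; the slope α = 0.01 is an assumed value, since the text does not report the one used.

```python
import numpy as np

ALPHA = 0.01  # assumed slope; the text does not state the alpha value used


def relu(x):
    return np.maximum(x, 0.0)


def relu_grad(x):
    # Zero gradient for every x <= 0: a unit stuck there stops learning ("dies").
    return np.where(x > 0, 1.0, 0.0)


def leaky_relu(x, alpha=ALPHA):
    # f(x) = x for x >= 0, alpha * x for x < 0
    return np.where(x >= 0, x, alpha * x)


def leaky_relu_grad(x, alpha=ALPHA):
    # Gradient is alpha (not 0) for negative inputs, so the unit can recover.
    return np.where(x >= 0, 1.0, alpha)


x = np.array([-3.0, -0.5, 0.0, 2.0])
print("ReLU       :", relu(x), relu_grad(x))
print("leaky ReLU :", leaky_relu(x), leaky_relu_grad(x))
```

For negative inputs ReLU passes back no gradient at all, while leaky ReLU still passes back α, which is why units in the negative region can keep learning.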
Two FC (dense) layers then integrate the local information and perform discrimination and classification more smoothly in two processing stages. The feature maps are flattened before entering the FC layers. The last layer has a softmax classifier that carries out the classification. The optimizer used in this research is Adam, with a learning rate of 0.0007. The details of the proposed CNN model are presented in Table 2.
$$f(x) = \begin{cases} x, & x \geq 0 \\ \alpha x, & x < 0 \end{cases} \qquad (1)$$
where α is a small constant multiplied by x, so the function still produces an output when x is negative; neurons in the negative region are therefore stimulated and remain active. Several libraries, modules, application programming interfaces (APIs), and frameworks are used to build the model. The libraries include Google Colab (drive), NumPy, Matplotlib, and OpenCV (imread). The modules used are os, zipfile (ZipFile class), shutil, random, hashlib, and math. The API used is the Keras API with the classes ImageDataGenerator, Sequential, Conv2D, MaxPooling2D, Flatten, Dense, and Input. The classification_report function of the scikit-learn framework is also used in this research.
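Using the Keras classes listed above, the described architecture can be sketched as follows. This is a hedged reconstruction, not the authors' code: because the contents of Table 2 are not reproduced here, the filter counts (32, 64, 128, 128), the 3×3 kernels, the 256-unit first dense layer with ReLU, the default Keras LeakyReLU slope, and the categorical cross-entropy loss are all assumptions. The parts taken from the text are the overall structure: four CL + leaky ReLU + max-pooling blocks, dropout of 0.2 after every two convolutional layers, flattening, two dense layers ending in a softmax over 120 classes, and Adam with a learning rate of 0.0007.

```python
import numpy as np
from tensorflow import keras

layers = keras.layers
NUM_CLASSES = 120            # 20 basic + 100 compound characters
INPUT_SHAPE = (150, 150, 3)  # 150x150, 24-bit RGB after preprocessing


def conv_block(filters):
    """One CL + leaky ReLU AF + max-pooling PL block.

    The 3x3 kernel and the filter counts passed in are assumptions;
    LeakyReLU() uses the Keras default negative slope of 0.3.
    """
    return [
        layers.Conv2D(filters, (3, 3)),
        layers.LeakyReLU(),
        layers.MaxPooling2D((2, 2)),
    ]


def build_model(learning_rate=0.0007):
    model = keras.Sequential(
        [layers.Input(shape=INPUT_SHAPE)]
        + conv_block(32) + conv_block(64)
        + [layers.Dropout(0.2)]                  # dropout 0.2 after two CLs
        + conv_block(128) + conv_block(128)
        + [layers.Dropout(0.2)]                  # dropout 0.2 after two more CLs
        + [
            layers.Flatten(),                    # flatten before the FC layers
            layers.Dense(256, activation="relu"),               # FC 1 (assumed width)
            layers.Dense(NUM_CLASSES, activation="softmax"),    # FC 2 + softmax
        ]
    )
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",         # expects one-hot labels
        metrics=["accuracy"],
    )
    return model


model = build_model()
model.summary()
```

With 150×150 inputs and unpadded 3×3 convolutions, the spatial size shrinks 150→74→36→17→7 across the four blocks, so under these assumptions the first dense layer receives 7×7×128 = 6,272 features.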
In the experiments, all data are loaded and divided into three parts, training, validation, and testing, in a 70%:15%:15% ratio. Training and validation follow the proposed method for a total of 50 epochs. Training accuracy reaches about 98% and validation accuracy about 96%, while the training loss is around 4% and the validation loss around 14%; more detailed results are shown in Figure 3. On the test set, the accuracy is 97.29% and the loss 12.79%. As Figure 3 shows, the training process achieves the highest accuracy, about 2% higher than validation and about 1% higher than testing. These results are very good, since the testing accuracy exceeds 97%, as shown in Figure 3(a). However, the loss differs considerably between subsets: it is only about 4% in training, compared with about 14% in validation and 12.79% in testing, as shown in Figure 3(b). Although the loss is relatively large, an accuracy above 97% over 120 classes indicates that this method is the most successful of the compared approaches for recognizing 120 Javanese script classes. Table 3 presents evidence that the proposed CNN model is the best of the compared DL models for Javanese script recognition. However, compared with the state of the art listed in Table 1, this model may not be the best. Such comparisons are unfair when different datasets are used, although logically the recognition task becomes more complex with more classes. Therefore, this study also compares the proposed model with two popular CNN models, VGG19 [34] and ResNet50 [35]. To simplify the comparison, the Keras API implementations of ResNet50 and VGG19 are used.
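The 70%:15%:15% division can be sketched as a per-class (stratified) split. This is an assumption: the text does not say whether the split was stratified, and the file names and the `stratified_split` helper below are hypothetical. The dataset size follows the description of 30 writers, four repetitions, and 120 classes (14,400 images), which gives 84/18/18 images per class.

```python
import random
from collections import defaultdict


def stratified_split(samples, ratios=(0.70, 0.15, 0.15), seed=42):
    """Split (path, label) pairs per class into train/val/test lists.

    Splitting per class keeps every class represented in every subset
    in the same proportions.
    """
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        n_train = round(len(items) * ratios[0])
        n_val = round(len(items) * ratios[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]   # the remainder goes to testing
    return train, val, test


# Hypothetical file names: 120 classes x 30 writers x 4 repetitions.
samples = [(f"class{c:03d}/writer{w:02d}_rep{r}.png", c)
           for c in range(120) for w in range(30) for r in range(4)]
train, val, test = stratified_split(samples)
print(len(samples), len(train), len(val), len(test))  # 14400 10080 2160 2160
```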
Based on the results presented in Table 3, the proposed CNN model appears to perform better than the other two models. Another finding is that ResNet50 does not perform better than VGG19, even though ResNet50 has more layers and the most complex computations. This suggests that handwriting recognition on smartphones does not need many layers. Handwritten images are less complex than natural objects, which are diverse and require various abstract features. As the sample dataset in Figure 2 shows, the image contrast is strong, so only texture, edge, line, and shape features are needed.

Handwriting written on a smartphone screen is an object with clear lines, edges, shapes, and textures, so the number of layers in the CNN model should be minimized to reduce computational cost. Javanese script handwriting has not been widely studied in DL research, and existing work is limited to basic characters. This study proposes a CNN model with eleven main layers, fewer than in current CNN models. Optimizing the order and composition of the layers and tuning the parameters allows the proposed CNN model to achieve satisfactory accuracy of more than 97%, with a shorter training time than the ResNet50 and VGG19 models. Another notable contribution is that this research uses 120 classes in total, consisting of basic and compound characters in Javanese script. Future research should further improve accuracy and computational efficiency. In addition, the set of classes can be extended with numeric characters and Javanese script punctuation marks.

Figure 1.Proposed CNN model for Javanese script recognition

Figure 2. Sample of the Javanese script dataset used in the research: (a) basic characters and (b) compound characters of "Ba"

Figure 3. Recognition results of proposed CNN model (a) accuracy and (b) loss

Figure 4. Training computation time of proposed CNN model

Table 1. State-of-the-art non-Latin handwriting recognition

Table 2. Proposed CNN model details
Int J Elec & Comp Eng, Vol. 13, No. 6, December 2023: 6629-6636

… by 30 different people, and each person wrote each character four times with the HW memo Android application, so there are a total of 14,400 images for 120 classes. Because this dataset is large, no augmentation is performed.

Table 3. Comparison with other DL models for Javanese script recognition

Table 4. Comparison with other DL models in training time

CNN is the deep learning method most widely used for image recognition and classification problems. Various CNN models have been proposed for single-object recognition, multiple-object recognition, and special objects such as faces, violence detection, and handwriting. The composition and number of layers in a CNN greatly affect its performance, accuracy, and computational speed in recognizing certain objects.