Implementation of deep neural networks (DNN) with batch normalization for batik pattern recognition

ABSTRACT

Batik is one of Indonesia's most famous cultural heritages, and each region has its own motifs and pattern rules. This study implements a Deep Neural Network (DNN) with Batch Normalization (BN) for batik pattern recognition. Four texture features (Gabor filters, log-Gabor filters, GLCM, and LBP) are extracted from each batik image, fused into a single 217-dimensional vector, and reduced with PCA. The hyperparameters of the model (batch size, number of epochs, learning rate, optimiser, weight initialisation, and activation function) are tuned with a grid search. The experiments show that reducing the features to 50 PCA components improves accuracy and computation time, that adding BN raises the classification accuracy from 65.36% to 83.15%, and that hyperparameter tuning further improves it to 85.57%.

INTRODUCTION
Technology and globalisation have brought fundamental changes to information technology [1]. Indonesia has both tangible and intangible cultural heritage, and one of its most famous cultural heritages is batik. Citing the Indonesian Dictionary (KBBI), batik is a cloth with drawings made specially by applying malam (wax) to the cloth and then processing it in a certain way [2]. The word batik originates from the Javanese language and is composed of two parts, "mbat" and "titik", meaning to make a "titik" (dot) [3]. Each region in Indonesia has different batik motifs and pattern rules [4]. The diversity of motifs, both in Indonesia and in allied countries, raises new research topics in computer science, for conservation, storage, publication, and the creation of new batik motifs.
In the computer science research area, studies of batik patterns have been carried out and several algorithms have been successfully applied to batik pattern recognition [5]-[7]. In earlier work, Sanabila and Manurung (2009) investigated batik motif recognition with a template matching approach, applying the Generalized Hough Transform and key block frames [5]. At the same time, Rahadianti, Manurung, and Murni (2009) worked in the field of information retrieval, using the K-means algorithm with features extracted by Log-Gabor filters and colour histograms for batik image clustering [6]. In 2012, Nurhaida, Manurung, and Arymurthy compared the performance of three methods on a batik image dataset: Canny edge detection, grey level co-occurrence matrices (GLCM), and Gabor filters [7]. That study showed that using GLCM features gave the highest classification accuracy. Rangkuti et al. proposed batik image retrieval based on the similarity of shape and texture characteristics [8], and in 2014 they proposed content-based batik image retrieval based on shape and texture features, applying edge detection and the wavelet transform [9]. In 2014, Minarno et al. proposed batik image retrieval based on an enhanced micro-structure descriptor [10] and batik image classification using co-occurrence matrices to extract texture features [11]. In 2015, Nurhaida et al. introduced an approach to batik pattern recognition using SIFT as the feature extraction method [12]. Furthermore, in 2016 Fahmi et al. conducted research on batik image retrieval using feature selection and reduction, showing that the selection and reduction process can improve precision and reduce execution time.
Their experiments show that PCA feature reduction can improve retrieval precision, while SFFS can reduce execution time [13]. In 2017, Nurhaida et al. developed texture fusion for a batik motif retrieval system, systematically investigating the impact of image texture features on batik motif retrieval performance [14]. Building on the results of earlier work [7] and [13], that study focused on batik motif recognition using fused texture features (Gabor, Log-Gabor, and GLCM) with PCA feature reduction to improve classification accuracy and reduce computation time.
On the other hand, the recent research in [14] did not use a deep learning approach. A study of batik using deep learning was performed in [15], which proposed deep convolutional network transfer learning for batik classification. Based on [15]-[18], deep learning can improve accuracy in pattern recognition. One deep learning approach is the Deep Neural Network (DNN), a powerful model that can achieve high performance in pattern recognition [19]. Even though DNNs have a good reputation for solving a variety of pattern recognition tasks, training a DNN is quite challenging [20]. Several results have highlighted the difficulties of implementing DNNs [20]-[23]. One of the challenges is that DNN training is complicated because the distribution of each layer's inputs changes during training as the parameters of the previous layers change [24]. This slows down training, since it requires lower learning rates and careful parameter initialisation. The problem is addressed by normalising the layer inputs for each training mini-batch. This normalisation method, called Batch Normalization (BN), allows higher learning rates and less careful initialisation. Furthermore, BN also acts as a regulariser and often eliminates the need for Dropout [25]. This study proposes to use a deep learning method, namely a Deep Neural Network, for comparison with the previous studies ([7] and [13]), and uses batch normalisation as regularisation in the model. DNN with Batch Normalization is expected to make the model more general and thereby improve its accuracy.

LITERATURE REVIEW

Batik
The Indonesian Dictionary (KBBI) explains that batik is a cloth with drawings made specially by applying malam (wax) to the cloth and then processing it in a certain way. In Indonesia, each region has characteristic batik motifs and its own way of organising them [4]. The distinctiveness of batik motifs can be analysed from the characteristics of their texture and shape, and different motifs have different shape patterns. Batik motif patterns are divided into two groups, geometric and non-geometric. Geometric patterns tend to have elements of symmetry that form triangles, crossed lines, rectangles, stars, parallelograms, and other patterns formed from ordered lines. Non-geometric patterns are motifs with an irregular composition of ornaments such as animals, plants, and so on. There are various types of geometric batik patterns: Ceplok, Kawung, Lereng, Parang, and Nitik. The Ceplok motif consists of repetitive geometric ornaments based on squares, circles, stars, and other geometric shapes. The Kawung motif, known as the oldest batik pattern, consists of repeated circular or elliptical shapes. The Lereng motif has diagonal lines filled with small patterns. The Parang motif consists of parallel diagonal lines filled with small ornaments. The Nitik motif is made with small dots and lines that imitate the original woven fabric [3].

Deep neural network
A Deep Neural Network (DNN) is a type of feed-forward neural network with more than one hidden layer between the input layer and the output layer. Each unit $j$ in a hidden layer typically uses a logistic function to map its total input from the layer below, $x_j$, into the scalar state $y_j$ that it sends to the next layer:

$$y_j = \mathrm{logistic}(x_j) = \frac{1}{1 + e^{-x_j}}, \qquad x_j = b_j + \sum_i y_i w_{ij},$$

where $b_j$ is the bias of unit $j$, $i$ is an index over units in the layer below, and $w_{ij}$ is the weight on the connection from unit $i$ to unit $j$. For multi-class classification, output unit $j$ converts its total input $x_j$ into a class probability $p_j$ using the softmax non-linearity

$$p_j = \frac{\exp(x_j)}{\sum_k \exp(x_k)},$$

where $k$ is an index over all classes. A DNN can be trained discriminatively by backpropagating derivatives of a cost function that measures the discrepancy between the target outputs and the actual outputs produced for each training case [9]. When the softmax output function is used, the natural cost function is the cross-entropy between the target probabilities $d$ and the softmax outputs $p$,

$$C = -\sum_j d_j \log p_j,$$

where the target probabilities, typically taking values of one or zero, are the supervised information provided to train the DNN classifier [18].
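To make the notation concrete, the following is a minimal NumPy sketch of this forward pass and cost function. It is illustrative only: the hidden-layer width, the random data, and the variable names are assumptions, not the authors' code; only the input length of 217 and the five motif classes come from the paper.

```python
import numpy as np

def logistic(x):
    # y_j = 1 / (1 + exp(-x_j))
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # p_j = exp(x_j) / sum_k exp(x_k), computed in a numerically stable way
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, d):
    # C = -sum_j d_j * log(p_j)
    return -np.sum(d * np.log(p + 1e-12))

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 217))                              # one fused feature vector
W1, b1 = 0.01 * rng.normal(size=(217, 64)), np.zeros(64)   # hidden layer (width assumed)
W2, b2 = 0.01 * rng.normal(size=(64, 5)), np.zeros(5)      # output layer for 5 classes

h = logistic(x @ W1 + b1)       # x_j = b_j + sum_i y_i w_ij, followed by the logistic
p = softmax(h @ W2 + b2)        # class probabilities
d = np.eye(5)[[2]]              # one-hot target (class index chosen arbitrarily)
print(cross_entropy(p, d))
```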

Batch normalization
In conventional deep networks, an excessively high learning rate can cause gradients to explode or vanish, or cause the model to get stuck in poor local minima. Batch Normalization helps solve these problems by normalising activations throughout the network, preventing small changes in the parameters from being amplified as data propagates through the deep network. Batch Normalization also makes training more resilient to the parameter scale. Normally, a large learning rate can increase the scale of the layer parameters, which then amplifies the gradient during backpropagation and can lead to model explosion, slowing down the training process. With Batch Normalization, however, backpropagation through a layer is unaffected by the scale of its parameters [24]. Moreover, BN even acts as a regulariser and in some cases can remove the need for Dropout.
Training performance can be improved by fixing the distribution of the layer inputs during training. Network training converges faster when the layer inputs are whitened, i.e., linearly transformed to have zero means and unit variances and to be decorrelated. However, fully whitening each layer's inputs requires an expensive computation, because every layer observing the inputs produced by the layers below it would have to whiten those inputs. Batch Normalization tackles this problem.
Batch normalisation approximates layer whitening by standardising the intermediate representations using the statistics of the current mini-batch. For a mini-batch $\mathcal{B} = \{x_1, \ldots, x_m\}$ of size $m$, the sample mean and the variance of each feature along the mini-batch axis are computed as

$$\mu_{\mathcal{B}} = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_{\mathcal{B}}^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_{\mathcal{B}}\right)^2.$$

Using these statistics, each feature is standardised as

$$\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}},$$

where $\epsilon$ is a small positive constant added for numerical stability. However, standardising the intermediate activations can diminish the representational power of the layer. To tackle this issue, BN introduces additional learnable parameters $\gamma$ and $\beta$, which respectively scale and shift the data, leading to a layer of the form

$$\mathrm{BN}(x_i; \gamma, \beta) = \gamma \hat{x}_i + \beta.$$

The network can recover the original layer representation by setting $\gamma$ to $\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}$ and $\beta$ to $\mu_{\mathcal{B}}$. For a standard feed-forward layer

$$h = \phi(Wx + b),$$

where $W$ denotes the weight matrix, $b$ the bias vector, $x$ the input of the layer, and $\phi$ an arbitrary activation function, BN is applied as

$$h = \phi\big(\mathrm{BN}(Wx)\big).$$

The bias vector $b$ is eliminated, since its effect is cancelled by the standardisation. Once normalisation is part of the network, backpropagation must be adjusted to propagate gradients through the mean and variance computations as well [26].
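The following is a minimal NumPy sketch of the BN transform defined above, applied to a synthetic mini-batch; the batch size and feature dimension are arbitrary assumptions chosen only to illustrate the computation.

```python
import numpy as np

def batch_norm(X, gamma, beta, eps=1e-5):
    mu = X.mean(axis=0)                       # mini-batch mean of each feature
    var = X.var(axis=0)                       # mini-batch variance of each feature
    X_hat = (X - mu) / np.sqrt(var + eps)     # standardised features
    return gamma * X_hat + beta               # learnable scale (gamma) and shift (beta)

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(32, 10))   # mini-batch of 32 samples, 10 features
gamma, beta = np.ones(10), np.zeros(10)
Y = batch_norm(X, gamma, beta)
print(Y.mean(axis=0).round(3))                # approximately zero means
print(Y.std(axis=0).round(3))                 # approximately unit standard deviations
```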

RESEARCH METHOD
The proposed method of this study consists of data preprocessing (feature extraction, fusion, and normalisation), feature reduction, hyperparameter tuning, model building and testing, and evaluation. The diagram of the research methodology is shown in Figure 2. The dataset used in this study is a batik image dataset containing basic motif templates from five classes: Ceplok, Kawung, Lereng, Parang, and Nitik [13]. In data preprocessing, four texture features are extracted from every image: Gabor filters, log-Gabor filters, GLCM, and LBP. The four feature vectors are then fused and normalised to produce a single feature vector. A feature reduction experiment is conducted to improve the accuracy score and reduce execution time compared with using all features. We divided the dataset into training and testing data and performed cross-validation. We then tuned the hyperparameters to find good parameter values and used these values in our deep learning model. Finally, we built, tested, and evaluated the model.

RESULTS AND ANALYSIS

Experimental setup
This study was conducted on Ubuntu 14.04 LTS 64-bit on a PC with an Intel® Core™ i7-6500U CPU @ 2.50GHz × 4, 8 GB of DDR2 RAM, and a 160 GB hard disk. We used Matlab to extract, fuse, and normalise the four texture features (Gabor filters, log-Gabor filters, GLCM, and LBP). In our experiment, we divided the dataset into training and testing data and performed cross-validation; the training data took 70% of the dataset and the testing data took the rest. We implemented the experiments using the Keras deep learning library in Python. The deep learning method we implemented is a four-layer Deep Neural Network (DNN) with Batch Normalization.
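A rough Keras sketch of this setup is given below: a 70/30 split followed by a four-layer DNN with Batch Normalization. The layer widths, activation, optimiser, and placeholder data are assumptions; only the 217-dimensional input, the five classes, and the 70/30 split come from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation
from keras.utils import to_categorical

# Placeholder data standing in for the fused batik feature vectors and labels.
X = np.random.rand(300, 217)
y = np.random.randint(0, 5, size=300)

# 70% training / 30% testing, as described in the experimental setup.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

def build_dnn(input_dim=217, n_classes=5):
    model = Sequential()
    model.add(Dense(128, input_dim=input_dim))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    for units in (64, 32, 16):                 # remaining hidden layers (widths assumed)
        model.add(Dense(units))
        model.add(BatchNormalization())
        model.add(Activation('relu'))
    model.add(Dense(n_classes, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = build_dnn()
model.fit(X_train, to_categorical(y_train, 5), epochs=50, batch_size=32,
          validation_data=(X_test, to_categorical(y_test, 5)), verbose=0)
```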

Feature extraction and fusion
In this study, we used the four texture features proposed in [13]: Gabor filters, log-Gabor filters, GLCM, and LBP. The feature extraction and fusion process produced a feature vector of length 217 for each image. After feature extraction and fusion, we performed feature reduction using PCA.
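A minimal sketch of the fusion and normalisation step is shown below. The lengths of the individual descriptors and the min-max scaling are assumptions introduced only for illustration; the paper specifies only that the fused vector has length 217.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

n_images = 300                               # placeholder number of batik images
gabor     = np.random.rand(n_images, 48)     # Gabor filter features (length assumed)
log_gabor = np.random.rand(n_images, 48)     # log-Gabor features (length assumed)
glcm      = np.random.rand(n_images, 62)     # GLCM features (length assumed)
lbp       = np.random.rand(n_images, 59)     # LBP histogram features (length assumed)

# Fuse the four descriptors into one vector per image and normalise each feature.
fused = np.hstack([gabor, log_gabor, glcm, lbp])     # 48 + 48 + 62 + 59 = 217 features
fused = MinMaxScaler().fit_transform(fused)
print(fused.shape)                                   # (300, 217)
```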

Features reduction
In this experiment, we performed feature reduction using PCA. To make a fair comparison, we used the same batch_size and epoch values throughout: a batch_size of 32 and 50 epochs. We used 25, 50, 100, 150, 200, 210, and 217 PCA components to select the best number of components for the feature vector. Table 1 gives the results of feature reduction using PCA and shows that the best accuracy is achieved when the number of components (N) is 50. From this feature selection and reduction experiment, we produce a new feature vector of length 50. This result shows that we can decrease the computation time and obtain higher accuracy than when using all features. Table 2 shows the per-class classification report for N = 50. As Table 2 shows, class 1 has the best precision, recall, and F1-score; this is because class 1 has more samples than the other classes, so it is learned better.
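A hedged sketch of this reduction step with scikit-learn's PCA is given below; the feature matrix is a placeholder, and only the 217-dimensional input and the 50 retained components come from the experiment.

```python
import numpy as np
from sklearn.decomposition import PCA

fused = np.random.rand(300, 217)             # placeholder fused feature matrix
pca = PCA(n_components=50)                   # N = 50 gave the best accuracy in Table 1
reduced = pca.fit_transform(fused)
print(reduced.shape)                         # (300, 50)
print(pca.explained_variance_ratio_.sum())   # fraction of variance kept by the 50 components
```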

Hyperparameters tuning
In this experiment scenario, we used parallel processing to tune the hyperparameters. Following the PCA feature reduction results, we used the 50-component feature vector obtained there. To tune the hyperparameters in parallel, we used the grid search algorithm provided by the GridSearchCV class in Scikit-learn; we imported KerasClassifier and GridSearchCV in our code to allow us to use Scikit-learn's grid search with a Keras model. The hyperparameters we tuned are: batch_size, number of epochs, learning rate, training optimisation algorithm, network weight initialisation, and neuron activation function.
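The sketch below shows one way to combine KerasClassifier with GridSearchCV, as the paper describes. The model topology, the candidate grid values, and the placeholder data are assumptions; only the use of the 50-component input, the KerasClassifier/GridSearchCV combination, and parallel search are taken from the text.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def create_model():
    model = Sequential()
    model.add(Dense(64, input_dim=50))        # 50 PCA components as input
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Dense(5, activation='softmax'))
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

X = np.random.rand(300, 50)                   # placeholder reduced feature matrix
y = np.random.randint(0, 5, size=300)         # placeholder class labels

model = KerasClassifier(build_fn=create_model, verbose=0)
param_grid = {'batch_size': [10, 20, 32, 64], 'epochs': [50, 100, 200]}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)  # parallel search
result = grid.fit(X, y)
print(result.best_score_, result.best_params_)
```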

Batch size and epoch
The results of tuning batch_size and the number of epochs are shown in Table 3. The best accuracy is 79.13% with a standard deviation of 0.0116, obtained with a batch_size of 20 and 200 epochs; the computation took 305.104926 seconds.

Learning rate
The results of tuning the learning rate are presented in Table 4. The best accuracy is 78.95%, obtained with a learning rate of 0.001. The computation in this process took 279.603224 seconds.

Optimiser algorithm
The results of tuning the optimiser algorithm are presented in Table 5. This experiment gives the best accuracy of 78.50% with a standard deviation of 0.012 when using the RMSProp optimiser. The processing time in this experiment was 260.960422 seconds.

Network weight initialisation
Table 6 presents the results of tuning the network weight initialisation. lecun_uniform achieved the best accuracy of 75.51% with a standard deviation of 0.020530; the computation took 302.18661 seconds.

Neuron activation function
Table 7 presents the results of the neuron activation function tuning. The best accuracy, 75.97%, is achieved with the linear activation function. The computation for this scenario took 298.757153 seconds.
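As a sketch of how the remaining hyperparameters can be exposed to the same grid search, the build function below accepts the learning rate, weight initialiser, and activation as arguments, which GridSearchCV forwards to it via KerasClassifier. The candidate values and the model topology are assumptions (apart from the winners reported in Tables 4-7), and the `lr` keyword assumes the Keras version used at the time; the optimiser algorithm could be exposed in the same way.

```python
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation
from keras.optimizers import RMSprop

def create_model(learning_rate=0.001, init='uniform', activation='relu'):
    model = Sequential()
    model.add(Dense(64, input_dim=50, kernel_initializer=init))
    model.add(BatchNormalization())
    model.add(Activation(activation))
    model.add(Dense(5, kernel_initializer=init, activation='softmax'))
    model.compile(optimizer=RMSprop(lr=learning_rate),
                  loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

# Each dictionary is passed as param_grid to GridSearchCV, exactly as in the
# previous sketch; the candidate values below are illustrative.
param_grids = [
    {'learning_rate': [0.0001, 0.001, 0.01]},
    {'init': ['uniform', 'lecun_uniform', 'normal', 'glorot_uniform']},
    {'activation': ['relu', 'tanh', 'linear', 'softplus']},
]
```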

Model building, testing, and evaluation
In this stage, we built and evaluated our model (DNN+Batch Normalization) on the batik dataset. As a comparison, we also evaluated a DNN without BN on the same dataset. Table 8 gives the comparison of the evaluation scores of DNN and DNN+BN. The scores show that DNN+BN significantly improves the accuracy of the classification model, from 65.36% to 83.15%. BN as a regularisation technique has successfully made the model more general, hence improving its accuracy. We then applied the parameter values that gave the best accuracy in the experiments above, namely batch_size = 20, epochs = 200, learning_rate = 0.001, optimizer = RMSprop, network weight initialization = uniform, and neuron activation function = softplus, to our DNN model with batch normalization. Table 9 compares the evaluation scores of DNN+BN before and after parameter tuning: with the tuned parameters, the accuracy increased from 83.15% to 85.57%. The per-class evaluation scores are presented in Table 10, and the confusion matrix is shown in Figure 3.
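The following is a hedged sketch of the final DNN+BN model configured with the tuned values reported above (batch_size 20, 200 epochs, RMSprop with learning rate 0.001, uniform initialisation, softplus activation). The hidden-layer widths and the placeholder training data are assumptions, not the authors' exact architecture.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation
from keras.optimizers import RMSprop
from keras.utils import to_categorical

X_train = np.random.rand(210, 50)             # placeholder 70% training split (50 PCA components)
y_train = np.random.randint(0, 5, size=210)

model = Sequential()
model.add(Dense(128, input_dim=50, kernel_initializer='uniform'))
model.add(BatchNormalization())
model.add(Activation('softplus'))
for units in (64, 32, 16):                    # remaining hidden layers (widths assumed)
    model.add(Dense(units, kernel_initializer='uniform'))
    model.add(BatchNormalization())
    model.add(Activation('softplus'))
model.add(Dense(5, activation='softmax'))

model.compile(optimizer=RMSprop(lr=0.001), loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, to_categorical(y_train, 5), batch_size=20, epochs=200, verbose=0)
```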

CONCLUSION
The experiments in this study show that feature extraction, selection, and reduction give better accuracy than the raw dataset, and that feature selection and reduction also reduce time complexity. DNN+BN significantly improves the accuracy of the classification model, from 65.36% to 83.15%; BN, as a regularisation technique, successfully makes the model more general and hence improves its accuracy. Parameter tuning further improved the accuracy from 83.15% to 85.57%. The parameter values that gave the best accuracy are: batch_size = 20, epochs = 200, learning_rate = 0.001, optimizer = RMSprop, network weight initialization = uniform, and neuron activation function = softplus.