Invariant behavioural based discrimination for individual representation

Received Nov 15, 2019; Revised Jun 25, 2020; Accepted Jul 11, 2020

Writer identification based on cursive words is one of the most extensively studied behavioural biometrics and has attracted many researchers. Its main applications are in forensic investigation and biometric analysis, since handwriting style can be used as an individual behavioural trait for authenticating an author. In this study, a novel approach to representing the cursive features of authors is presented. Invariants-based discriminability of the features is achieved by discretizing the moment features of each writer using the biometric invariant discretization cutting point (BIDCP). BIDCP is introduced for feature preservation, to obtain better individual representation and discrimination. Our experiments show that, with the proposed method, authorship identification based on cursive words improves significantly, reaching an average identification rate of 99.80%.


INTRODUCTION
Pattern recognition is in great demand in engineering areas such as manufacturing, industry and business, as well as in scientific disciplines such as artificial intelligence, computer vision, remote sensing, ecology, psychology, medicine and others. One of the well-recognized areas in pattern recognition is handwriting analysis, which is crucial in biometric and forensic applications such as writer identification (WI). An author can be identified from his or her individual writing style. Writing style has long been considered individualistic, and author individuality rests on the assumption that each individual has a consistent writing style [1][2][3]. Handwriting must therefore have distinctive features that can be generalized as individual behavioural features through the handwriting shape, and these can be found through an identification process. Manual WI based on cursive handwriting needs a handwriting specialist (graphologist) to discover the individual writing features of the handwriting. Normally, features from the original handwritten document are compared with features from the handwritten documents of a list of suspects. These features are evaluated and compared to measure feature similarity. If these tasks are carried out by a computerized system, the classical procedures of pattern recognition apply, which include feature extraction and classification.
Many research works have addressed the identification problem using image processing and pattern recognition techniques [4][5][6][7]. However, according to the literature, no research has addressed the WI problem using moment functions as feature extractors and discretization methods as a mechanism to probe the behavioural individuality of each writer based on cursive handwriting. Hence, this research proposes the use of moment functions and improved discretization to identify the authorship of cursive handwriting. A conventional geometric function with the united moment invariant is implemented to extract the writers' features. The invarianceness of these invariants is explored extensively to seek the individuality of writing. Subsequently, these invariant features are discretized to mine them at a finer granularity for identifying the authorship of the writers. Despite the common use of discretization in data mining, to the best of our knowledge, no study has been conducted on discretization of invariant behavioural features for cursive handwriting in pattern recognition. The paper is organized as follows. The current issues of WI are given next. The following section introduces moment functions, the united moment invariant (UMI), and the integration of the aspect scaling invariant (ASI) into UMI for cursive handwriting. The next section discusses the invarianceness of cursive authorship under the proposed discretization, followed by the computation and analysis of the proposed method in terms of inter-class and intra-class measures to illustrate the concept of individuality and discriminability. The implementation and results are then presented. Finally, the last section concludes the paper and outlines possible future work.
WI can be regarded as a specific kind of dynamic biometric in which the characters, shapes and handwriting styles of an individual can be used as biometric features for authenticating an identity [8][9][10][11]. Typically, WI is performed on official papers by way of a signature. However, there is also a need to identify the writing style of documents without a signature, such as threatening letters, old or ancient manuscripts whose writer must be determined, and film scripts (to identify the original idea). Identifying the author of a questioned handwritten document has great consequences for the criminal justice system and is widely explored in forensic handwriting analysis [1][3][12][13][14]. Despite much research in WI, challenges still arise due to the limits of human capability in observing and recognizing handwriting style. This has inspired researchers to explore the field in depth. The shape or style of cursive writing differs from one person to another, and even for one person it varies over time. However, everyone has their own style of writing, which is typically individualistic [1][2][3][4][5]. The features must be unique, so that they can be generalized as a person's handwriting regardless of the many possible writing styles. An individual's writing style has its own particular texture and structure [12]. Each handwriting shape is slightly different for the same author and considerably different for different authors. This is known as the intra writer class for the same writer and the inter writer class for different writers. The extracted features then need to be classified for group or class identification. Pattern recognition depends heavily on feature extraction, classification and learning schemes, as described in [13][14]. These techniques are important and are required in order to determine the true authorship of handwriting.
Meanwhile, eliminating, extracting and choosing the right features of a person's handwriting prior to classification, where the best extracted features are grouped into specific categories, is not an easy task in pattern recognition. It remains an open question whether the extracted features are optimal or near-optimal for identifying the author. Feature mining may yield irrelevant features that are useless for classification and that sometimes degrade the performance of a classifier [15]. The features may not be independent of each other and may even be redundant. Moreover, some features may not provide any useful information for the task of WI. Hence, mining significant features is very important for identifying the writer and for improving the identification rate. Therefore, the objective of this paper is to explore the distinctive behavioural features of cursive word style by implementing integrated moment functions to acquire the features from handwriting, and by discretizing these data in order to represent them meaningfully. The basic ideas of moment functions as feature extractors in our study are described in the next section.

RESEARCH METHOD
2.1. United moment invariant (UMI) and aspect scaling invariant (ASI) for cursive extraction
Moment functions have been used in diverse fields ranging from mechanics and statistics to pattern recognition and image understanding. The use of moments in image analysis and pattern recognition was inspired by Hu [16] and Alt [17]. Hu first presented a set of seven moment invariants that are invariant to the position, size and orientation of the image shape. A good shape descriptor should be able to find perceptually similar shapes, which usually means rotated, translated, scaled and affine-transformed shapes. Furthermore, it should agree with human judgement when comparing image shapes. Therefore, Yinan et al. [18] derived the united moment invariants (UMI), which can be applied in all conditions and give a good set of discriminative shape features. UMI effectively discriminates image shapes for both region and boundary representation, in discrete and continuous conditions. UMI was derived from the geometric moment invariant (GMI) and the improved moment invariant (IMI) [19]. GMI is usable for region representation in discrete conditions but is computationally expensive for boundary representation. Thus, Yinan et al. [18] proposed UMI, which has proven to be a good technique for feature extraction. Unfortunately, UMI uses the scaling factor by Hu, which is known to have drawbacks. Therefore, the alternative scaling factor of the aspect invariant moment (Aspect) by Feng and Keane [20] is used in this study. It obtains better invariant features without size normalisation. In this study, the Aspect scaling factor [20] is fused into the UMI algorithm [18] to extract the global word shape features from both region and boundary representations, in discrete and continuous conditions, for better individual features.
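As an illustration of the moment functions discussed above, the following is a minimal sketch of geometric, central and Hu-normalized central moments for a small grey-level image, together with the first two of Hu's seven invariants. It uses the standard textbook definitions; the function names and the `image[y][x]` layout are assumptions made for this example and are not taken from the implementation used in this study.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # image[y][x] holds the pixel intensity


def raw_moment(image: Image, p: int, q: int) -> float:
    """Geometric moment m_pq = sum over pixels of x^p * y^q * f(x, y)."""
    return sum(
        (x ** p) * (y ** q) * value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )


def central_moment(image: Image, p: int, q: int) -> float:
    """Central moment mu_pq, taken about the intensity centroid."""
    m00 = raw_moment(image, 0, 0)
    x_bar = raw_moment(image, 1, 0) / m00
    y_bar = raw_moment(image, 0, 1) / m00
    return sum(
        ((x - x_bar) ** p) * ((y - y_bar) ** q) * value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )


def normalized_central_moment(image: Image, p: int, q: int) -> float:
    """Hu's normalization: eta_pq = mu_pq / mu_00 ** ((p + q) / 2 + 1)."""
    mu00 = central_moment(image, 0, 0)
    return central_moment(image, p, q) / mu00 ** ((p + q) / 2 + 1)


def hu_first_two(image: Image) -> List[float]:
    """First two of Hu's seven invariants, phi_1 and phi_2."""
    eta20 = normalized_central_moment(image, 2, 0)
    eta02 = normalized_central_moment(image, 0, 2)
    eta11 = normalized_central_moment(image, 1, 1)
    phi1 = eta20 + eta02
    phi2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return [phi1, phi2]
```

Because the central moments are taken about the centroid, translating a shape inside the image leaves `hu_first_two` unchanged, and a rotation by 90 degrees only swaps eta20 with eta02 and flips the sign of eta11, so both values are preserved.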
UMI is well suited to discriminating handwriting features and is applicable in any discrete condition as described by Hu [16], who considers normalized central moments (refer to [16] for the detailed formulation of GMI), with (2) in discrete form. The central and normalized central moments include a scaling factor ρ. The enhanced moment invariant technique is due to Chen [21].
On the other hand, according to Feng and Keane [20], the GMI introduced by Hu [16] has several disadvantages and is invariant only under equal scaling of the image. Therefore, they proposed the aspect invariant moment (AIM) for images with unequal scaling, integrating ideas from moment invariants that handle different scaling in the x and y directions. The invariant scaling proposed in [20] is called the aspect scaling invariant (ASI). In this study, we use the integration of ASI and UMI (AUMI) to extract the features. The details of the integration are described in Muda [22].
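To make the idea of scaling that differs in x and y concrete, the sketch below normalizes central moments by mu_20 and mu_02 separately. The normalization form in the docstring is an assumption: it is a form often quoted for aspect-invariant moments and is invariant to unequal scaling in the continuous case, but it is not taken from the equations of this paper. The helper `stretch` and all function names are hypothetical.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # image[y][x] holds the pixel intensity


def central_moment(image: Image, p: int, q: int) -> float:
    """Central moment mu_pq about the intensity centroid."""
    pixels = [(x, y, v) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pixels)
    x_bar = sum(x * v for x, _, v in pixels) / m00
    y_bar = sum(y * v for _, y, v in pixels) / m00
    return sum(((x - x_bar) ** p) * ((y - y_bar) ** q) * v for x, y, v in pixels)


def aspect_normalized_moment(image: Image, p: int, q: int) -> float:
    """Aspect-style normalization (assumed form, see the lead-in):

    eta_pq = mu_00 ** ((p + q) / 2 + 1)
             / (mu_20 ** ((p + 1) / 2) * mu_02 ** ((q + 1) / 2)) * mu_pq
    """
    mu00 = central_moment(image, 0, 0)
    mu20 = central_moment(image, 2, 0)
    mu02 = central_moment(image, 0, 2)
    scale = mu00 ** ((p + q) / 2 + 1) / (
        mu20 ** ((p + 1) / 2) * mu02 ** ((q + 1) / 2)
    )
    return scale * central_moment(image, p, q)


def stretch(image: Image, sx: int, sy: int) -> List[List[float]]:
    """Enlarge an image by integer factors sx (columns) and sy (rows)."""
    return [[v for v in row for _ in range(sx)] for row in image for _ in range(sy)]
```

On a sampled image the invariance is only approximate, because pixel sums differ slightly from the continuous integrals. For a shape a few tens of pixels across, the values before and after `stretch(image, 2, 3)` would be expected to agree to within a few percent.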

2.2. The proposed biometric invariant discretization cutting point (BIDCP) for authorship invarianceness
As mentioned previously, feature extraction and the learning scheme play a significant role in the performance of handwritten authorship identification. Many approaches have been proposed for extracting and selecting meaningful features. However, the problem of determining whether the behavioural features are optimal or sub-optimal is still in its infancy. In this study, discretization is proposed to mine the extracted invariant features at a finer granularity for better individual representation in cursive writer identification. It plays an important role in achieving better identification for WI. According to previous studies, the classification approaches that work best are those integrated with discretization as a pre-processing step [23]. The features of all writers are discretized globally. In other words, the continuous extracted features are discretized to capture the uniqueness of each author's individuality for better data representation [24]. Hence, the proposed discretization, called the biometric invariant discretization cutting point (BIDCP), is applied using the class information assigned to each writer to ensure that distinctiveness and individual personality are preserved. Intervals and representation features are formed for each writer. If the features of two different writers are close to each other or have the same values, then comparable intervals are generated for these two groups. The novel approach transforms the feature vectors into a better behavioural representation without changing their characteristics. BIDCP first computes the feasible intervals for the given data set. The minimum and the maximum of the feature vectors for a writer are obtained. The cutting points of the feature vectors start from the minimum and end at the maximum. In this study, the interval is used to define the cutting points for the representation value of each writer.
For this, we denote the interval as the width of the bin, as calculated in (7).
The total number of bins equals the number of feature vectors that represent one word image. Therefore, all invariant vectors of the moment invariant function are preserved with their actual features. Each bin is then approximated with upper and lower values as given in (8) and (9). Each feature vector that falls within the range from the lower to the upper approximation value is assigned a single representation value, as given in (10), which is an improved version of Azah's previous discretization [22]. Instead of taking the range of the interval, the representation value is taken as the midpoint of the upper and lower approximation values (AV). The upper AV is not included in the first to eighth bins of a writer because it is used as the base value to construct the approximation value for the next bin. This range specifies a boundary for each word written by the same writer. If two different words written by the same writer have close or identical invariant features that fall within this range, these two words receive the same representation value. This is because the values are calculated per writer. Therefore, the intention of the proposed algorithm is not to change the usual characteristics of a writer but to map the original invariant behavioural features into a better feature representation. For the last interval, however, the representation value is defined over the feature vectors in the range of the writer's class. If the features fall within this range, they are labelled as belonging to the same class of writer. Otherwise, they are considered features from another class. Overall, feature values that fall within both ranges are known as discretized features. With this improved procedure for computing intervals, as given in (10), the estimated representation values are closer to the actual distribution of the biometric behavioural features (the true feature values).
This preserves the discriminative power of the original features and enhances the statistical distinctiveness between individuals. Figure 1 illustrates the discretization process for writer 1. Each word image is represented as a vector of nine discretized biometric invariant behavioural features.
Figure 1 shows that each individual has unique representation features, which denote the main characteristics of each writer. This illustrates the concept of individuality and discriminability, where each person has their own handwriting style. To further validate the effectiveness of the proposed discretization, the features obtained for each individual are tested with an author invariance analysis to evaluate the concept of individuality in handwriting.
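The per-writer discretization described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the BIDCP implementation itself: it assumes equal-width bins between a writer's minimum and maximum feature value, half-open bins whose upper value belongs to the next bin (with the last bin closed), and the bin midpoint as the representation value. The function names and the choice to pool all feature values of one writer are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Bin = Tuple[float, float, float]  # (lower value, upper value, representation value)


def build_bins(values: Sequence[float], n_bins: int) -> List[Bin]:
    """Equal-width bins from min(values) to max(values), midpoint as representation."""
    v_min, v_max = min(values), max(values)
    width = (v_max - v_min) / n_bins
    # Shared edges, so the upper value of one bin is exactly the lower value of the next.
    edges = [v_min + i * width for i in range(n_bins)] + [v_max]
    return [(edges[i], edges[i + 1], (edges[i] + edges[i + 1]) / 2) for i in range(n_bins)]


def discretize(value: float, bins: Sequence[Bin]) -> Optional[float]:
    """Map a value to its bin's representation value, or None if out of range."""
    last = len(bins) - 1
    for i, (lower, upper, rep) in enumerate(bins):
        # Upper value is excluded, except in the last bin, which is closed.
        if lower <= value < upper or (i == last and value == upper):
            return rep
    return None  # outside this writer's range: treated as another class


def discretize_writer(
    word_vectors: Sequence[Sequence[float]], n_bins: int = 9
) -> List[List[Optional[float]]]:
    """Discretize all word vectors of one writer with bins built from that writer."""
    pooled = [v for vector in word_vectors for v in vector]
    bins = build_bins(pooled, n_bins)
    return [[discretize(v, bins) for v in vector] for vector in word_vectors]
```

For example, with feature values spanning 0.0 to 9.0 and nine bins, the width is 1.0, so 0.2 maps to 0.5, the boundary value 1.0 maps to 1.5 (it belongs to the second bin), and 9.0 maps to 8.5 in the closed last bin.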

RESULTS AND ANALYSIS
3.1. Analysis of cursive handwriting invarianceness
As mentioned previously, feature extraction and learning schemes play a significant role in determining the invarianceness of an individual from the perspective of moment functions. Invariants can be understood as quantities that are preserved irrespective of image transformations. Mathematically, Suk and Flusser [25] describe an invariant I as a functional defined on the space of all admissible image functions that does not change its value under a degradation operator D, that is, it satisfies the condition I(f) = I(D(f)) for any image function f. This property is called invariance. Another desirable property of I, as important as invariance, is discriminability: for objects belonging to different classes, I must have significantly different values.
Therefore, in our study, we define authorship invarianceness in WI as low similarity deviation for the same writer (the intra writer class) and high similarity deviation for different writers (the inter writer class), depending on writing shape. This follows from the distinctiveness of each person's writing style. The essential process of individual identification in WI is to look for comparable characteristics of writing style by finding the nearest match to the unidentified writing style in the record. This can be solved by exploiting handwriting distinctiveness, which is assessed by computing the intra writer class and inter writer class deviations. Note that, in the concept of authorship invarianceness, the similarity deviation for the inter writer class (different writers) should be higher than that for the intra writer class (same writer). These similarity deviations represent the data by separating the individual features into categories. The fundamental idea is to obtain objects that can be easily categorized into one of the intervals; hence this is labelled a discretization approach. These similarity errors can be combined with discretization to represent the data by discriminating the individual features into the appropriate class. To the best of our knowledge, no study has been conducted on implementing a discretization process in writer identification for cursive handwriting. Therefore, we propose this approach in our study for authorship invarianceness. Tables 1 to 4 show the similarity results using the mean absolute error (MAE) for the intra-class (same writer) and inter-class (different writers) cases on Chinese characters and signatures. Tables 1 and 2 show that the MAE for the intra writer class (same writer) on Chinese characters is smaller than that for the inter writer class (different writers) for the same word.
The same result holds for the inter writer class on dissimilar words, where the similarity result is greater than that of the intra writer class. Interestingly, the same result is found for longer samples such as signatures. Tables 3 and 4 show the MAE after the proposed discretization on the same and different signature characters, respectively. Again, the individuality of a writer's style is confirmed for signatures: the MAE value for the intra writer class is lower than that for the inter writer class, regardless of whether the word is simple or complex. This is due to the ability of discretization to mine the feature classes regardless of the dimension and complexity of the handwriting shape. Thus, this authorship invarianceness analysis confirms that the approach is able to extract unique behavioural features for individual identification.
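The intra-class and inter-class comparison can be sketched as below. This is a minimal illustration that assumes MAE is the mean absolute difference between two feature vectors of equal length and that class-level values are averages over all pairs; the exact reference-image protocol behind Tables 1 to 4 may differ, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence

Vector = Sequence[float]


def mae(reference: Vector, sample: Vector) -> float:
    """Mean absolute error between two feature vectors of equal length."""
    if len(reference) != len(sample):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(r - s) for r, s in zip(reference, sample)) / len(reference)


def intra_class_mae(samples: Sequence[Vector]) -> float:
    """Average MAE over all distinct pairs of samples from the same writer."""
    pairs = list(combinations(samples, 2))
    return sum(mae(a, b) for a, b in pairs) / len(pairs)


def inter_class_mae(samples_a: Sequence[Vector], samples_b: Sequence[Vector]) -> float:
    """Average MAE over all pairs formed from two different writers."""
    pairs = [(a, b) for a in samples_a for b in samples_b]
    return sum(mae(a, b) for a, b in pairs) / len(pairs)
```

With toy vectors `[[1.0, 2.0], [1.0, 2.5]]` for one writer and `[[3.0, 4.0], [3.0, 5.0]]` for another, the intra-class value is 0.25 and the inter-class value is 2.125, which matches the expected ordering of a lower intra-class and a higher inter-class deviation.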

3.2. Experimental results
This section investigates the improvement in handwriting-based identification performance obtained with the proposed discretization, by comparing it against different types of discretization methods across a variety of classification methods. The comparison uses Chinese character and signature biometric modalities from an in-house multimodal biometric database. The experiment covers one hundred subjects, where each subject contributes four samples of different styles of Chinese handwritten characters and signatures. Numerous types of words are extracted, using integrated moment functions to represent each word as a feature vector. Since we use moment functions, images do not need to be converted to a binary representation. These feature vectors go through the discretization process prior to classification. Since the major contribution of our study is the proposed discretization, our comparisons are limited to the existing built-in discretization methods in the rough set toolkit for data analysis (ROSETTA).
We concentrate on the effectiveness of the discretization mechanism in mining the granularity of the features extracted using moment functions. The experiments assess identification performance using different discretization methods, namely CAIM, CACC, ChiM, Chi2, ExtChi2 and Khiops, as well as the classification methods of ROSETTA. These include the Johnson algorithm, Holte's 1R algorithm, the genetic algorithm and the exhaustive algorithm. Meanwhile, the biometric invariant discretization cutting point (BIDCP) is the proposed discretization algorithm. For each moment function adopted in this study, features are extracted from 800 handwritten Chinese character and signature images from 100 writers. The number of samples for each word differs between writers in our in-house biometric database. Therefore, a different amount of data can be prepared for each type of word and each writer in the feature extraction task. About 7,200 invariant feature vectors for each technique are divided into training and testing data sets for the identification task. The results of the experiments using six discretization methods and four classification methods are summarized in Table 5. Table 5 shows the four classification methods combined with the proposed BIDCP discretization, which gives the best response on the available biometric modalities. As can be seen, applying the proposed discretization to the data set yields a higher average accuracy rate (over 98.5%) than the six other discretization algorithms, namely CAIM, CACC, ChiM, Chi2, ExtChi2 and Khiops. On the training data sets, the combination of the proposed discretization with K-NN, C4.5, NB and SVM classification achieved the best performance, with average accuracy rates of 99.213%, 99.500%, 98.555% and 99.192%, respectively.
For the testing data set, K-NN, C4.5, NB and SVM classification likewise achieve high performance, with average accuracies of 99.700%, 98.920%, 98.999% and 98.859%, respectively, after applying the proposed approach. The second-best results are obtained by combining CAIM with the four classification methods on the biometric data sets, while the CACC method performs worst. This clearly shows that our BIDCP discretized features give higher identification rates for all samples.
From these experiments, we found that the identification rates using discretized features are significantly higher than those using non-discretized (original) features. This is due to the feature invarianceness and feature discriminability that are improved by our proposed discretization algorithm as well as by the other discretization methods. The features are explicitly grouped in the same class (interval), corresponding to the same individual with a similar representation value. This representation value portrays the uniqueness of each writer. This leads to lower variation among features within a class (intra-class) and higher variation between classes (inter-class). Hence, authorship invarianceness and author discriminability are demonstrated, with better handwriting individuality.
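As a hedged illustration of how an identification rate can be computed from labelled feature vectors, the sketch below applies a leave-one-out 1-nearest-neighbour rule with a city-block distance. It is neither the ROSETTA workflow nor the training and testing protocol used in these experiments; the function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[str, Sequence[float]]  # (writer label, feature vector)


def city_block(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_writer(query: Sequence[float], gallery: Sequence[Sample]) -> str:
    """Label of the gallery sample closest to the query vector."""
    return min(gallery, key=lambda item: city_block(query, item[1]))[0]


def identification_rate(samples: List[Sample]) -> float:
    """Leave-one-out rate: each sample is matched against all the others."""
    hits = 0
    for i, (writer, vector) in enumerate(samples):
        gallery = samples[:i] + samples[i + 1:]
        hits += int(nearest_writer(vector, gallery) == writer)
    return hits / len(samples)
```

When the samples of each writer share similar representation values, every left-out sample finds a neighbour from its own writer and the rate reaches 1.0. A writer with only one sample can never be matched correctly under this rule, which lowers the rate accordingly.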

CONCLUSION AND FUTURE WORK
In this study, we proposed a novel method of representing feature discriminability by applying a discretization process prior to the classification phase. The main goal of proposing a discretization approach in the pattern recognition framework is to represent features in a granular form to obtain better individual representation. Our discretized data show that the characteristics of individuality in handwriting are well represented. The similarity error between features of the same writer is also minimized, which leads to better identification accuracy. We have presented the findings of our generalized features for handwriting individuality using moment functions. In future work, we will further mine these generalized features for better identification in forensic document analysis.