Developing digital signal clustering method using local binary pattern histogram

Received Feb 25, 2020; Revised Aug 9, 2020; Accepted Aug 17, 2020

In this paper, we present a new approach to processing a digital signal in order to create a feature array that can be used as a signature to retrieve the signal. Each digital signal is associated with its local binary pattern (LBP) histogram, which is calculated using the LBP operator; k-means clustering is then used to generate the required features for each digital signal. The proposed method was implemented and tested, and the experimental results were analyzed. The results showed the flexibility and accuracy of the proposed method. Although different parameters of the digital signal were changed during the experiments, the results showed the robustness of the proposed method.


INTRODUCTION
Digital signals, such as digital color images and digital wave signals, are commonly used in various computer applications, such as computer security. Because wave files are large, it is very difficult to use the whole file for retrieval or recognition purposes; hence the importance of extracting file features [1]. A digital wave signal is usually represented as mono or stereo. Mono describes a system in which all the audio signals are mixed together and routed through a single audio channel. Stereo sound systems have two independent audio channels, and the signals are reproduced by two channels separated by some distance [2]. The amplitude values in each column (channel) range from -1 to +1; they are the result of sampling and quantizing the voice signal. Figure 1 shows some samples of a given wave file.
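To illustrate how a wave file becomes columns of amplitudes between -1 and +1, the sketch below reads a WAV file with Python's standard `wave` module and returns one column per channel. It assumes 16-bit PCM samples scaled by 1/32768 (other sample widths would need a different scale), and `read_wav_columns` is a hypothetical helper, not code from the paper.

```python
import io
import wave
from array import array


def read_wav_columns(source):
    """Read a 16-bit PCM WAV file; return one list of amplitudes per channel.

    Samples are divided by 32768, so every amplitude lies in [-1, +1).
    A mono file gives one column, a stereo file gives two.
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())
    samples = array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    # Frames are interleaved (left, right, left, right, ... for stereo),
    # so channel ch is every n_channels-th sample starting at index ch.
    return [[s / 32768 for s in samples[ch::n_channels]] for ch in range(n_channels)]


# Usage: build a tiny two-frame stereo file in memory, then read it back.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(array("h", [0, 16384, -32768, 32767]).tobytes())
buffer.seek(0)
left, right = read_wav_columns(buffer)
print(left, right)  # [0.0, -1.0] [0.5, 0.999969482421875]
```

A mono file would return a single column; a stereo file returns the left and right columns separately.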
Figures 2 and 3 show the waveform of the voice signal. One of the most common applications of digital wave signal processing is voice retrieval and recognition. Most of these applications exploit the nature of the digital wave file to generate features for the file by calculating parameters such as the crest factor, the dynamic range, the mean of the normalized data (μ), and the standard deviation of the normalized data (σ); these parameters can be easily calculated and used as features for a digital wave signal [1][2][3]. Calculating these statistical parameters requires understanding the characteristics and nature of digital voice, and sometimes they do not give an acceptable recognition ratio when used to recognize the voice, even though they give stable, fixed features for each wave file. These features remain the same even if the sampling frequency, the amplitude, or the phase shift is changed, as shown in Tables 1 and 2. To overcome the above-mentioned disadvantages, we can extract the voice signal features based on the local binary pattern (LBP): the LBP histogram is calculated and used as the input data set to generate the features of the digital file. LBP and its variants, such as the completed noise-invariant local-structure pattern (CNLP) [4] and the dominant LBP (DLBP) [5], have been successfully applied to a wide variety of applications, such as texture classification [6][7][8][9][10][11][12][13], face analysis [14][15][16], speech recognition [9,10], and others [17,19]. The LBP encodes the co-occurrence of neighboring pixel comparisons within a local area. It is computationally efficient, simple, and robust against changes in some parameters. A cluster is a collection of data points grouped together because of certain similarities, and a centroid is the location representing the center of the cluster.
The k-means algorithm identifies k centroids and then allocates every data point to the nearest cluster, while keeping the clusters as compact as possible, i.e., minimizing the sum of squared distances between the data points and their cluster centroids [20][21][22][23][24][25].
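The baseline statistical parameters listed in the introduction (crest factor, dynamic range, and the mean and standard deviation of the normalized data) can be sketched as follows. Their exact definitions in [1][2][3] are not reproduced here, so this sketch assumes: crest factor = peak absolute value / RMS; dynamic range = 20·log10(largest / smallest non-zero absolute sample) in dB; normalization = division by the peak absolute value; μ and σ = mean and population standard deviation of the normalized data. `statistical_features` is a hypothetical name.

```python
import math
import statistics


def statistical_features(x):
    """Baseline statistical features of a wave signal (sketch; definitions assumed)."""
    peak = max(abs(v) for v in x)
    if peak == 0:
        raise ValueError("signal is all zeros")
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    smallest = min(abs(v) for v in x if v != 0)
    normalized = [v / peak for v in x]  # assumed normalization: divide by the peak
    return {
        "crest_factor": peak / rms,                            # peak-to-RMS ratio
        "dynamic_range_db": 20 * math.log10(peak / smallest),  # assumed definition
        "mu": statistics.fmean(normalized),                    # mean of normalized data
        "sigma": statistics.pstdev(normalized),                # std of normalized data
    }


signal = [0.5, -0.25, 0.25, -0.5]
features = statistical_features(signal)
scaled = statistical_features([2 * v for v in signal])  # same signal, doubled amplitude
```

Doubling the amplitude leaves all four values unchanged up to rounding, which is the kind of stability the text attributes to these parameters; the drawback noted above is their recognition ratio, not their stability.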

PROPOSED METHOD
The proposed method is implemented in the following two phases.
Phase 1: LBP histogram calculation.
This phase is implemented by performing the following steps:
a. Get the digital wave file.
b. Reshape the wave file into a one-row array.
c. For each value in the row, calculate the LBP operator as shown in …
Figure 5 shows the calculated LBP histogram of the duck wave file.
Phase 2: K-means clustering.
Clustering means grouping the data values in the input data file into clusters (groups) [20][21][22][23][24][25]. Each cluster has a center (centroid), a set of values that belong to it, and a within-cluster sum (the sum of the values that belong to the cluster); one or more of these parameters can be used to form the data file features. Figure 6 shows how an input data set was grouped into 2 clusters.
To perform the clustering phase, the following steps are applied:
1) Get the LBP histogram of the digital voice signal.
2) Initialize the number of clusters and the centroid of each cluster.
3) While the calculated centroids keep changing, do the following:
a) Calculate the distance between each data value and each cluster centroid, which is equal to the absolute value of the difference between the centroid and the data value.
b) Assign each value to the nearest cluster, i.e., the cluster whose centroid gives the minimum distance.
c) Find the new centroids by averaging the values within each cluster.
Worked example: the following example (Tables 3 and 4) shows how the input data are grouped into 2 clusters with the initial centroid values c1 = 16 and c2 = 22.
Figure 5. LBP histogram of duck wave file
Figure 6. Grouping input data set into 2 clusters
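The clustering steps 1) to 3) can be sketched in Python as follows. The data values in the usage lines are hypothetical (the contents of Tables 3 and 4 are not reproduced here); only the initial centroids c1 = 16 and c2 = 22 come from the worked example. Resolving ties in step b) in favor of the lower-numbered cluster and the `max_iter` guard are choices of this sketch, and `kmeans_1d` is a hypothetical name.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """One-dimensional k-means following clustering steps 1) to 3) (a sketch).

    values    -- data to cluster (in the proposed method, the LBP histogram)
    centroids -- initial centroid of each cluster (step 2)
    Returns (centroids, labels); labels[i] is the cluster index of values[i].
    """
    centroids = [float(c) for c in centroids]
    labels = []
    for _ in range(max_iter):  # iteration guard added in this sketch
        # Steps 3a-3b: distance is |centroid - value|; each value joins the
        # nearest cluster (min() picks the lowest index on ties).
        labels = [min(range(len(centroids)), key=lambda j: abs(centroids[j] - v))
                  for v in values]
        # Step 3c: new centroid = average of the values in the cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else old)
        if new_centroids == centroids:  # step 3 ends when the centroids stop changing
            break
        centroids = new_centroids
    return centroids, labels


# Worked example with hypothetical data and the initial centroids c1 = 16, c2 = 22.
data = [10, 12, 15, 18, 20, 25, 30]
centroids, labels = kmeans_1d(data, [16, 22])
sums = [sum(v for v, lab in zip(data, labels) if lab == j) for j in range(2)]
print(centroids, labels, sums)  # [13.75, 25.0] [0, 0, 0, 0, 1, 1, 1] [55, 75]
```

With these made-up values, the first pass assigns 10, 12, 15 and 18 to the first cluster and 20, 25 and 30 to the second, moving the centroids to 13.75 and 25; the second pass changes nothing, so the loop stops. The within-cluster sums (55 and 75), as defined in Phase 2, are an alternative to the centroids as features.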

RESULTS ANALYSIS
The proposed method was implemented using various digital wave files. Each time, an LBP histogram was calculated and used for clustering. The main advantage of the proposed method is its flexibility: the centroids, the within-cluster sums, or the cluster points can be used to create the wave file features, and the number of clusters can easily be adjusted to expand the number of elements in the feature array. The experimental results showed that the obtained features are distinct for each tested wave file, so they can be used as a key or signature to retrieve or recognize the wave file; Table 5 shows the calculated features for some wave file samples. The proposed method was also tested using the same wave file with different sampling frequencies, and Table 6 shows that the features of the wave file remain the same. Likewise, the method was tested using the same wave file with different amplitudes, and Table 7 shows that the features remain the same.

Wave file   Feature 1   Feature 2   Feature 3   Feature 4
Cow         243         183         93          8
Dog         227         164         102         60
Duck        231         176         112         71
Dolphin     218         158         73          32
Horse       237         167         79          7
Donkey      229         177         121         66
Elephant    255         249         114         15
Spock       234         131         66          10
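Phase 1 and the amplitude-robustness result can be illustrated together. The paper's exact LBP operator is given in a figure that is not reproduced here, so this sketch assumes a common one-dimensional LBP: each sample is compared with its 4 preceding and 4 following samples, a neighbor greater than or equal to the center sets a 1 bit, and the 8 bits form a code from 0 to 255 (samples closer than 4 to either end are skipped). Because the codes depend only on comparisons between samples, multiplying the whole signal by a positive constant leaves the histogram unchanged, which is consistent with the amplitude result reported in Table 7. The random signal is a stand-in for a real wave file, and the function names are hypothetical.

```python
import random


def lbp_1d_codes(signal, radius=4):
    """Return the 1-D LBP code of every sample that has `radius` neighbors on each side.

    Assumed operator: the 2*radius neighbors (radius before, radius after) are
    compared with the center sample; a neighbor >= center sets its bit.
    """
    signal = list(signal)
    codes = []
    for i in range(radius, len(signal) - radius):
        center = signal[i]
        neighbors = signal[i - radius:i] + signal[i + 1:i + 1 + radius]
        code = 0
        for bit, value in enumerate(neighbors):
            if value >= center:
                code |= 1 << bit
        codes.append(code)
    return codes


def lbp_histogram(signal, radius=4):
    """Histogram of LBP codes: 2**(2*radius) bins (256 bins for radius 4)."""
    hist = [0] * (1 << (2 * radius))
    for code in lbp_1d_codes(signal, radius):
        hist[code] += 1
    return hist


# Stand-in for a wave file already reshaped into one row (Phase 1, steps a-b).
random.seed(0)
wave_row = [random.uniform(-1.0, 1.0) for _ in range(1000)]

hist = lbp_histogram(wave_row)
louder = lbp_histogram([2.0 * v for v in wave_row])   # doubled amplitude
quieter = lbp_histogram([0.5 * v for v in wave_row])  # halved amplitude
print(sum(hist), hist == louder == quieter)  # 992 True
```

Factors of 2 and 0.5 are used because they scale floating-point values exactly; with other factors, rounding could in rare cases make two nearly equal samples equal and flip a bit. This sketch does not address the sampling-frequency result in Table 6, since resampling changes the samples themselves.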

CONCLUSION
A flexible, stable, and accurate method of wave file feature extraction was proposed and implemented. The proposed method relies on the LBP histogram. More than one parameter can be used to form the file features, and the number of data items in the feature array can easily be adjusted. The generated features were shown to be distinct for each tested wave file, so they can be used as a signature to recognize the file. The signature is robust against changes in the sampling frequency and the file amplitude.