http://ijece.iaescore.com

Investigation of robust gait recognition for different appearances and camera view

Article info: 2021

A gait recognition framework is proposed to tackle the challenge of unknown camera view angles as well as appearance changes in gait recognition. In the framework, camera view angles are firstly identified before gait recognition. Two compact images, gait energy image (GEI) and gait modified Gaussian image (GMGI), are used as the base gait feature images. Histogram of oriented gradients (HOG) is applied to the base gait feature images to generate feature descriptors, and then a final feature map after principal component analysis (PCA) operations on the descriptors is used to train support vector machine (SVM) models for individuals. A set of experiments is conducted on CASIA gait database B to investigate how appearance changes and unknown view angles affect the gait recognition accuracy under the proposed framework. The experimental results have shown that the framework is robust in dealing with unknown camera view angles, as well as appearance changes in gait recognition. In the unknown view angle testing, the recognition accuracy matches that of identical view angle testing in gait recognition. The proposed framework is specifically applicable to personal identification by gait in a small company/organization, where unintrusive personal identification is needed.


INTRODUCTION
Gait recognition, which identifies individuals non-intrusively and at a distance, is still a challenging topic in biometric research. The main advantages of gait recognition are that it works with low-resolution images, supports long-distance detection and does not interfere with the target's activities. Moreover, gait, as a characteristic of how a person walks, is hard to hide or spoof.
The major challenge in gait recognition is the performance degradation caused by the uncertainty of conditions when gait images are taken, such as lighting (day/night), environment (indoor/outdoor), season (summer/winter), walking surface (grass/concrete, flat/slope, dry/wet), trajectory (straight/curve), clothing (coat/skirt/jeans/shorts), shoe type (flip-flop/sandals/boots), carried object (briefcase/bag/backpack/staff) and camera view angle (frontal/side). In this study, we target the different camera view angles and appearance changes that affect gait recognition performance. Many gait databases are available to support gait recognition research for various purposes, for example, the CASIA gait database, SOTON gait database, USF Human ID dataset and OU-ISIR biometric database. Each dataset may provide different resources, including gait video sequences, gait image sequences, gait energy images (GEI) and sensor information. Therefore, preprocessing is needed to convert the provided resource into a suitable input or gait representation for each study.
Gait representation containing personal gait information is one of the most important parts of gait recognition research. There are two main stages in a typical gait recognition process, i.e., gait feature extraction and classification. Gait features representing walking characteristics can be extracted from both a gait model and a gait image sequence. In a model-free approach, gait features are usually extracted from a gait representation called a compact image, which is generated from a complete gait cycle. The basic compact image, called the gait energy image (GEI) [1] or average silhouette [2], is generated by an averaging function. GEI has been commonly used in model-free gait recognition research because of its simplicity and time efficiency. Nonetheless, other gait compact images have subsequently been developed to improve recognition performance, such as the gait entropy image (GEnI), gait Gaussian image (GGI) [3], flow histogram energy image (FHEI) [4], gradient histogram Gaussian image (GHGI) [5] and gait information image (GII) [6]. Various feature extraction techniques have been used with compact images, such as principal component analysis (PCA) [7], [8], linear discriminant analysis (LDA) [9], [10] and convolutional neural networks (CNN) [11]-[14]. The second stage is classification. Various classifiers have been attempted in gait recognition, such as nearest neighbor (NN) [3], [15], [16], support vector machine (SVM) [17]-[19] and CNN.
This research aims at a gait recognition setting suitable for a small company. Such a company can collect employee videos from various camera view angles to train a model for each employee, and may use only one camera to capture employees at the entrance to each working area. An individual can walk freely from any direction when passing each camera. Therefore, the developed approach must recognize each person from an unknown view angle with the provided model.
To achieve this goal, this study firstly chooses GEI and the gait modified Gaussian image (GMGI) as the basic gait representation images. Next, the histogram of oriented gradients (HOG) is applied to the basic gait representation images to generate an image descriptor, and PCA is chosen for dimension reduction to generate the final features used to represent individuals. Finally, a one-against-all multi-class SVM is used as the classifier. CASIA dataset B [20], which contains three appearances and eleven camera view angles, is chosen for training and testing to investigate the performance of the proposed gait recognition framework. The rest of the paper is organized as follows. Section 2 presents the methodology for the gait recognition system. Section 3 discusses the experiments and results. The conclusion is given in section 4.

RESEARCH METHOD
This research focuses on two common gait challenges: different camera view angles and appearance changes caused by wearing a coat or carrying a bag. The proposed gait recognition framework includes two parts, i.e., view classification and gait recognition, as shown in Figure 1. View classification refers to camera view angle identification, while gait recognition works on personal identification. The issue of appearance changes is dealt with in both parts, which are designed under the same methodology, including gait representation image generation, HOG feature description, PCA feature extraction and training/predicting support vector machine (SVM) models.
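The two-stage flow can be sketched as follows. This is a hypothetical sketch, not the paper's implementation: nearest-centroid scoring stands in for the trained SVM models, and all function and variable names are illustrative.

```python
import numpy as np

def nearest_centroid_predict(x, centroids):
    # centroids: dict mapping a label to a feature vector; pick the closest.
    labels = list(centroids)
    dists = [np.linalg.norm(x - centroids[k]) for k in labels]
    return labels[int(np.argmin(dists))]

def recognize(features, view_centroids, person_centroids_per_view):
    """Two-stage pipeline sketch: identify the camera view angle first,
    then identify the person with the models trained for that view."""
    view = nearest_centroid_predict(features, view_centroids)
    person = nearest_centroid_predict(features,
                                      person_centroids_per_view[view])
    return view, person
```

In the actual framework, both stages use HOG + PCA features and SVM models; the stand-in above only illustrates the control flow.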

Gait representation image
The gait representation generator uses a set of silhouette images X1…Xn as the input. A compact gait image G is generated as the output by a selected function. This research studies two such functions: the average and the Gaussian function. The details of the process are shown in Figure 2 as an example.

Gait energy image (GEI)
GEI is a common model-free gait representation. The average silhouette image, calculated by averaging all binarized silhouette images from a complete gait cycle as can be seen in Figure 2, is the basic gait representation as a greyscale image. This technique makes GEI robust to noise and efficient in memory space. GEI is defined as (1):

G(x, y) = (1/N) Σ_{t=1}^{N} B_t(x, y) (1)

where N is the number of silhouette frames in the walking sequence, t is the frame number in the walking sequence, B_t(x, y) is the intensity value at pixel coordinate (x, y) in frame t, and G(x, y) is the density value at pixel coordinate (x, y) [1].
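The averaging in (1) can be sketched in a few lines, assuming the silhouettes are already binarized and cropped to a common size:

```python
import numpy as np

def gei(silhouettes):
    """Gait energy image: per-pixel average of the binarized silhouette
    frames of one complete gait cycle."""
    # silhouettes: array of shape (N, H, W) with values in {0, 1}
    frames = np.asarray(silhouettes, dtype=float)
    return frames.mean(axis=0)  # G(x, y) = (1/N) * sum_t B_t(x, y)
```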

Gait modified Gaussian image (GMGI)
GMGI is generated based on the Gaussian membership function. Earlier gait research used the Gaussian membership function to generate a gait representation image called the gait Gaussian image [21], defined as (2). Nonetheless, this study intends to change the data distribution range. Preliminary experiments on view classification were conducted to test the Gaussian parameters, for example, scaling the factor or changing the power of the factor. The preliminary results suggest using the standard deviation variable without its power. The modified function is defined as (3).
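As a sketch of the idea only: the exact forms of (2) and (3) are not reproduced here, so the per-pixel membership below (each frame's value around the temporal mean, averaged over frames) is an assumption. The modification the text describes, replacing the squared standard deviation with the standard deviation itself, is shown in the last line.

```python
import numpy as np

def gmgi(silhouettes, eps=1e-8):
    """Sketch of a modified-Gaussian compact image (assumed formula).
    A GGI-style membership would divide by 2 * sigma**2; per the text,
    GMGI uses the standard deviation without its power."""
    frames = np.asarray(silhouettes, dtype=float)
    mu = frames.mean(axis=0)       # per-pixel temporal mean
    sigma = frames.std(axis=0)     # per-pixel temporal standard deviation
    # eps avoids division by zero where a pixel is constant over the cycle
    return np.exp(-((frames - mu) ** 2) / (2 * sigma + eps)).mean(axis=0)
```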

Feature extraction
Feature extraction aims to prepare data for training personal models by SVM for prediction/identification. Two techniques, histogram of oriented gradients (HOG) and principal component analysis (PCA), are used for generating HOG features and reducing the feature space dimensionality, respectively. First, several gait representation images are used as samples for the HOG and PCA processes to create a feature map. Each input gait representation, after the HOG process, is projected onto the feature map to generate a final gait feature. The details are explained below. HOG, which describes the information in the gait representation image, has been implemented following the method in [22]. The first step is to compute the horizontal and vertical gradient values g_h(x, y) and g_v(x, y) by applying the 1-D derivative masks [−1, 0, 1] and [−1, 0, 1]ᵀ. Next, the magnitude and orientation of each pixel are computed by:

m(x, y) = √(g_h(x, y)² + g_v(x, y)²), θ(x, y) = tan⁻¹(g_v(x, y) / g_h(x, y))

This research uses a grayscale input image; therefore, the magnitude and orientation have only one value per pixel. Then the gradient histogram of each pixel is discretized based on the number of bins.
Next, the input is divided into non-overlapping regions called cells. The HOG description of a cell is generated from the pixels in the cell and the orientation bins. For example, with a cell size of 2×2 and a 9-bin histogram, a block of 2×2 cells has 4×9 = 36 values. After that, a block description is created based on the number of cells in each block: all cell descriptions are concatenated into one vector, which is normalized by the L2 norm as suggested. Finally, all block descriptions are combined into one output vector.
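The cell-and-bin construction above can be sketched as follows. This is a simplified descriptor, not the exact implementation of [22]: it applies one global L2 normalization instead of per-block normalization, and uses unsigned orientations.

```python
import numpy as np

def hog_sketch(img, cell=2, bins=12):
    """Simplified HOG: [-1, 0, 1] gradients, magnitude-weighted orientation
    histograms per cell, concatenated and L2-normalized."""
    img = np.asarray(img, dtype=float)
    gh = np.zeros_like(img)
    gv = np.zeros_like(img)
    gh[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal [-1, 0, 1] mask
    gv[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical [-1, 0, 1]^T mask
    mag = np.hypot(gh, gv)
    ang = np.mod(np.arctan2(gv, gh), np.pi)  # unsigned orientation in [0, pi)
    h, w = img.shape
    feats = []
    for i in range(0, h - h % cell, cell):
        for j in range(0, w - w % cell, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-8)    # L2 normalization
```

With the paper's settings (120×120 image, 2×2 cells, 12 bins) this simplified descriptor would have 60×60×12 values before any block grouping.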
PCA finds the correlation between a group of HOG descriptions calculated from a group of training samples. Firstly, the data matrix X is created from the description vectors x_1, x_2, …, x_n of the training inputs I, stored as columns. The mean vector x̄ is then computed and subtracted from each column of X. Secondly, the covariance matrix C is computed from the centered training matrix X.
Next, the eigenvector matrix V and eigenvalue matrix D are computed from matrix C.
Finally, some eigenvectors are chosen as the principal component matrix P in order from the highest eigenvalues. Then, matrix P is normalized as the eigenvectors of the covariance matrix. After that, the feature map, matrix M, is created from the chosen eigenvector matrix P and data matrix X.
The feature map is applied to each input gait description after the HOG process (a projection process). The result, which has a much lower dimension than the initial input, is used as the feature vector for both training and prediction with the SVM models. There are two types of feature maps in this research. The view feature map, referring to the most important features across all view angles by PCA, is created as a single map to separate each view angle from the others. The personal feature map is created as one map per view angle. Because inputs come from different view angles, a personal feature map based on each view angle may select a suitable feature descriptor for that view angle.
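The feature-map construction and projection described above can be sketched with a plain eigendecomposition. This is a minimal version; the choice of k is an assumption left to the caller.

```python
import numpy as np

def pca_feature_map(X, k):
    """Build a PCA feature map from HOG descriptors stacked as columns of X.
    Returns the mean vector and the top-k eigenvectors of the covariance."""
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                        # subtract the mean from each column
    C = Xc @ Xc.T / Xc.shape[1]          # covariance matrix
    vals, vecs = np.linalg.eigh(C)       # eigenpairs, ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]   # keep the k largest eigenvalues
    return mean, vecs[:, order]

def project(x, mean, P):
    # Final low-dimensional feature for one HOG descriptor x
    return P.T @ (x.reshape(-1, 1) - mean)
```

In the framework, one such map is built for view classification and one per view angle for personal recognition.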

Model training and predicting
One-against-all SVM, which generates one binary model per class to separate the target class from the rest, is used in the training and predicting processes. Three kernel types are used: linear, polynomial and sigmoid. The SVM classifier in this study is implemented as defined in [23].
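A minimal one-against-all linear SVM can be sketched with sub-gradient descent on the hinge loss. This stands in for the kernel SVM of [23]; all hyperparameters below are illustrative, not the paper's settings.

```python
import numpy as np

def train_linear_svm(X, y, epochs=300, lr=0.01, lam=0.01):
    """Minimal linear SVM via sub-gradient descent on the hinge loss.
    X: (n, d) samples; y: labels in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (X[i] @ w + b) < 1:         # margin violated
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:                                  # only regularize
                w -= lr * lam * w
    return w, b

def train_one_against_all(X, labels):
    """One binary model per class: the target class is +1, the rest -1."""
    models = {}
    for c in sorted(set(labels)):
        y = np.where(np.asarray(labels) == c, 1.0, -1.0)
        models[c] = train_linear_svm(X, y)
    return models

def predict(models, x):
    # Assign the class whose binary model gives the highest decision score
    return max(models, key=lambda c: x @ models[c][0] + models[c][1])
```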

RESULTS AND DISCUSSION
Four experiments are conducted based on the framework in Figure 1: i) view classification, ii) identical view gait recognition, iii) cross-view gait recognition, and iv) robust view gait recognition. All experiments were tested on CASIA dataset B, which captures gait sequences of 124 people from 11 different camera view angles, as shown in Figure 4. Ten videos per view angle were captured for each person: six of normal walking (nm-01 to nm-06), two of carrying a bag (bg-01 and bg-02) and two of wearing a coat (cl-01 and cl-02). The dataset provides three data formats: videos, silhouette sequence images and gait energy images (GEIs). Because GEI and GMGI must relate each frame to the others, this experiment generated the gait representations from the provided silhouette images. Each experiment was tested with three sub-experiments based on the size of the training dataset (gallery). The first sub-experiment (E1) trained models with only one normal walking sequence (nm-01) and tested with the other five normal walking (nm-02 to nm-06), two carrying-a-bag (bg-01 and bg-02) and two wearing-a-coat (cl-01 and cl-02) sequences. The second sub-experiment (E2) trained models with the first four normal walking sequences (nm-01 to nm-04) and tested with two sequences from each appearance (nm-05 and nm-06, bg-01 and bg-02, and cl-01 and cl-02). The third sub-experiment (E3) trained models with one sequence from each appearance (nm-01, bg-01 and cl-01) and tested with five normal walking (nm-02 to nm-06), one carrying-a-bag (bg-02) and one wearing-a-coat (cl-02) sequence.
The gait representation image size was 120×120. Parameters for HOG were fixed as cell size 2×2, block size 3×3 and bin size 12. The results were evaluated in terms of the classification correction rate (CCR), the ratio of the number of correctly recognized samples to the total number of samples, as (12):

CCR = (number of correctly recognized samples / total number of samples) × 100% (12)
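The CCR of (12) amounts to a one-line computation:

```python
def ccr(predicted, actual):
    """Classification correction rate: percentage of correctly
    recognized samples over all samples, as in (12)."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)
```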
The SVM kernel is also investigated in this research. Three kernels are compared: linear (L), polynomial (P) and sigmoid (S).

View classification
The first part of the research framework is the view classification, which identifies the camera view angle from the input gait representation image. The results are shown in Table 1. Although only one normal walking sequence was used for model training in experiment E1, GMGI with the SVM sigmoid kernel achieved a view angle classification accuracy of 99.18% for normal walking and 97.28% for mixed appearances. The best average view angle classification accuracy was 98.98%, obtained with GMGI under the SVM sigmoid kernel when models were trained with the mixed appearance sequences in experiment E3. The results also show that GMGI is clearly better than GEI for view angle classification in all three experiments, and SVM-sigmoid is superior to SVM-linear and SVM-polynomial in this classification. The comparison between our research and the CCR from other publications is shown in Table 2. All publications used four normal walking sequences (nm-01 to nm-04) as the gallery, and two sequences from each appearance (nm-05 and nm-06, bg-01 and bg-02, and cl-01 and cl-02) as the probe. As can be seen, GMGI with the SVM sigmoid kernel has the highest average CCR (97.56%). In particular, it achieves a classification accuracy of 99.37% when tested with the normal walking probe.

Identical view gait recognition
This experiment, concerning the second part of the research framework, i.e., gait recognition, used gait datasets from the same camera view angle in the training and testing phases. Three sub-experiments were conducted with the same settings as in the view classification. The results are shown in Table 3. When models were trained with only one or four normal walking sequences, GMGI had a very low accuracy rate compared with GEI. This implies GEI was more robust to appearance change than GMGI in gait recognition. Nonetheless, GMGI had an accuracy rate similar to GEI when the probe sequence had the same appearance as the gallery sequence. The suitable SVM kernel depended on the number of training sequences; for example, SVM-polynomial had the best result in E1. The highest accuracy rate was 96.71% by GMGI with SVM-linear when the model was trained with the mixed appearance sequences, followed by GEI with SVM-sigmoid at 96.67%. From the results of the mixed training gallery (E3), the main problem of GEI was the low accuracy rate when tested with a normal walking sequence. If the mixed training used 2 normal walking + 1 carrying-a-bag + 1 wearing-a-coat sequences, the GEI accuracy rates improved to 97.39%-NM, 98.67%-BG, 98.59%-CL and 98.22%-AVG, whilst GMGI achieved 98.98%-NM, 95.65%-BG, 98.12%-CL and 97.58%-AVG. The training datasets thus contribute significantly to the gait recognition performance of the proposed framework on CASIA dataset B. The comparison with other published results is shown in Tables 4-6. All publications used the same setting for a fair comparison: the first four normal walking sequences of all people are set as the gallery (nm-01 to nm-04), while the remaining two normal walking, two carrying-a-bag and two wearing-a-coat sequences are set as the probe. Because GEI with the SVM sigmoid kernel has the best CCR in Table 3-E2, GEI and GMGI with the SVM sigmoid kernel are selected for this comparison.
Table 4 focuses on the normal walking sequences, setting only nm-05 and nm-06 as the probe. The GEI decomposition method [26] has the highest average CCR of 99.64%, followed by our GMGI with SVM sigmoid, which has the second-highest average CCR of 99.14%. Table 5 focuses on gait recognition from the lateral view. Many published methods have a very high CCR when tested with normal walking sequences, but the CCR drops dramatically when tested with carrying-a-bag and wearing-a-coat sequences. Our GEI with SVM sigmoid has the highest CCR with the carrying-a-bag sequences; however, it has a lower CCR with the wearing-a-coat sequences. Therefore, the average CCR of our GEI with sigmoid (94.25%) is slightly lower than that of the GEI decomposition method (94.33%). Table 6 focuses on the average CCR over all camera view angles. Although the proposed GEI with SVM sigmoid has the highest CCR only in the case of the carrying-a-bag sequences, it has the highest average CCR. The GEI decomposition method has the highest average CCR in Table 5; nonetheless, our GEI with sigmoid has a higher average CCR over all view sequences. This shows the superiority of the proposed framework.

Cross-view gait recognition
This experiment tries to recognize a person from every camera view angle with a personal model trained from only one view angle, testing the robustness of the gait recognition framework. All testing sequences are projected with the personal feature map from the same view as the trained personal models, which may not be the actual camera view angle of the testing gait representation. For example, when probe sequences are tested with a personal model for 90˚, each probe input is projected with the 90˚ personal feature map, and probe sequences from all view angles are tested this way. The results are shown in Table 7.
From Table 7, GEI is more robust than GMGI in cross-view gait recognition. GMGI had very poor cross-view performance and essentially failed the test. Surprisingly, the GEI results from E2 were at a similar level to those from E3. GEI with the SVM sigmoid kernel from E2 had the best performance, with an average CCR of 69.09%. The results from E3 had a relatively low accuracy rate in the normal walking tests but were much more robust in the wearing-a-coat tests.

Robust gait recognition under the proposed framework
The last experiment is designed to test the overall gait recognition framework with its two stages: view angle recognition followed by gait recognition, with unknown gait images as the probe. First, the testing input is classified for its view angle. Next, personal recognition is performed for the identified view angle with each person's gait model. From the E2 results in Table 1, the highest CCR was 97.56%, which implies that the view identified in the first step might not be the actual view angle of each input. At the same time, an input with a misidentified view might still be correctly recognized through cross-view gait recognition, as shown in Table 8.
This experiment selected the gait representation image and SVM kernel based on the CCR in experiments E1 and E2. The results from the full gait recognition framework were compared with identical view gait recognition. The summarized results are shown in Table 8. The difference between identical and unknown view angle inputs was less than one per cent in every case, which shows the robustness of the proposed framework to unknown camera view angles in gait recognition. The comparison with other published results is shown in Table 9. All publications set four normal walking sequences (nm-01 to nm-04) as the gallery and the six remaining sequences as the probe. Our framework has a lower average CCR when tested with normal walking sequences, but a better average CCR when tested with carrying-a-bag and wearing-a-coat sequences. Therefore, our framework achieves more than 92% with all appearance-change sequences and is robust to carried objects, clothing and camera view conditions.

CONCLUSION
In this paper, a gait recognition framework was presented to tackle the challenge of unknown camera view angles and appearance changes in gait recognition. In the framework, camera view angles are firstly identified before gait recognition. We used two types of compact images, the gait energy image (GEI) and the gait modified Gaussian image (GMGI), as the base gait images; they are also the input of the framework. The histogram of oriented gradients (HOG) was applied to the base feature images to generate feature descriptors, and then a final feature map from principal component analysis (PCA) operations was used to train SVM models for individuals. We conducted a set of experiments on CASIA gait dataset B and investigated how appearance changes and unknown view angles affect gait recognition accuracy under different training datasets and different SVM kernels. We found that without identifying the camera view angles first, the gait recognition rate was very low; although GEI was better than GMGI in this case, neither would be suitable for a real-world application. In general, GMGI is better than GEI at adapting to appearance changes, especially when using a linear kernel SVM in classification. With the proposed framework, the experimental results have shown robustness in dealing with unknown camera view angles, as well as appearance changes in gait recognition. In the unknown view angle testing, the recognition accuracy matches that of identical view angle testing (training and testing using the same view angle images). The proposed framework is specifically applicable to personal identification by gait in a small company/organization, where unintrusive personal identification is needed. Future work will therefore consider real-time gait recognition applications with a large dataset, where the HOG descriptor will work as a feature descriptor and a deep learning algorithm will be involved as the classifier.