Face recognition for presence system by using residual networks-50 architecture

Received Aug 10, 2020 Revised Apr 16, 2021 Accepted Apr 26, 2021

A presence system records individual attendance in a company, school, or institution. There are several types of presence system, including manual systems using signatures, systems using fingerprints, and systems using face recognition technology. A presence system using face recognition is one that applies a biometric system to the process of recording attendance. In this research we used one of the convolutional neural network (CNN) architectures that won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2015, namely the Residual Networks-50 (ResNet-50) architecture, for face recognition. Our contribution in this research is to determine the effectiveness of the ResNet architecture under different hyperparameter configurations. These hyperparameters include the number of hidden layers, the number of units in each hidden layer, the batch size, and the learning rate. Because hyperparameters are selected experimentally and the value of each hyperparameter affects the final accuracy, we tried 22 configurations (experiments) to obtain the best accuracy. Our best model achieves an accuracy of 99%.


INTRODUCTION
Biometrics is a term for identifying an individual from DNA, hand geometry, the face, or other physical characteristics such as signatures, voice, and so on. Biometric systems are generally used to authenticate and identify individuals by analyzing physical characteristics such as fingerprints, irises, veins, and others [1]. Biometric systems use physical characteristics that are unique to each individual, and these are identified and analyzed to achieve certain goals [2]. One biometric modality that performs well and can be applied to an attendance system, based on the comparisons made, is the face [3]. As identity information, the human face has the advantage of being unique and hard to imitate [4]. A face recognition system involves face detection, which is the first step in the facial recognition process, followed by face recognition itself [5]. One reliable supporting medium for an attendance system using face detection and face recognition is a real-time video camera. Cameras in a face detection system have the advantage of application flexibility, since they do not require users to make direct contact with the attendance system [6]. In this research, based on a case study of a presence system, we studied the application of biometric systems using deep learning. Deep learning is a type of machine learning that lets computers learn from experience and knowledge without explicit programming and extract useful patterns from raw data [7]. With a presence system that uses deep learning for face detection and face recognition, we expect the process of recording student attendance to become more efficient, and fraud to be reduced.
Developing the presence system involves two stages, namely face detection and face recognition. In the face detection stage, the Haar cascade classifier is used to detect elements of the face, namely the eyes, nose, and mouth [8]. In the face recognition stage, a convolutional neural network (CNN) is used to recognize and match input data against the model. In our research, only the CNN's kernel values are determined by training, while Haar features are manually designed. While a well-trained CNN can learn more parameters (and thus detect a larger variety of faces), Haar-based classifiers run faster [9]. A Haar cascade detects human faces enclosed by a square and gives the center points of face elements (eyes, nose, and mouth) [10]. The Haar cascade classifier is also called the Viola-Jones method, which is the most widely used method for detecting objects. Human face detection using the Haar cascade classifier can be applied broadly, for example to detect faces in thermal images [11].
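To illustrate what a manually designed Haar feature is, the toy NumPy sketch below (not part of the original system, which used an off-the-shelf Haar cascade) computes a two-rectangle edge feature with an integral image, the core trick of the Viola-Jones method:

```python
import numpy as np

def integral_image(img):
    """Cumulative sum over rows and columns, so any rectangle
    sum can later be read off in O(1)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] using the integral image ii."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def haar_edge_feature(img, r0, c0, h, w):
    """Two-rectangle Haar feature: left half minus right half.
    A large response suggests a vertical intensity edge."""
    ii = integral_image(img.astype(np.int64))
    left = rect_sum(ii, r0, c0, r0 + h, c0 + w // 2)
    right = rect_sum(ii, r0, c0 + w // 2, r0 + h, c0 + w)
    return left - right

# toy image: bright left half, dark right half
img = np.zeros((4, 4), dtype=np.uint8)
img[:, :2] = 255
print(haar_edge_feature(img, 0, 0, 4, 4))  # 8 * 255 = 2040
```

A full cascade evaluates thousands of such features at many scales and positions; this sketch only shows why a single feature is cheap to compute.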
Among all deep learning structures, CNNs and recurrent neural networks (RNNs) are the most popular [12]. As stated above, we use a CNN because we want the best accuracy, and CNNs have proven very effective in areas such as facial recognition and classification compared to other methods [13]. Also, a CNN extracts features automatically, so there is no need to select features manually [14]. There is already some research on face recognition based on CNNs: some implemented augmented reality to compare against a face database and reported high accuracy [15], and many apply a softmax output for facial recognition, which has been shown to give good accuracy [16]. Here we propose the ResNet-50 architecture for recognition, because it performs well compared to a simple CNN [17]. Residual networks (ResNet) are convolutional networks trained on more than 1 million images from the ImageNet database; ResNet-50 has 50 weighted layers in total, with 23,534,592 trainable parameters [18].
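The defining idea of ResNet is the identity shortcut: each block learns a residual F(x) that is added back to its input, so y = ReLU(F(x) + x). The minimal NumPy sketch below (dense weights standing in for the convolutional block, values chosen arbitrarily for illustration; not the paper's actual ResNet-50) shows the skip connection:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x), where F(x) = W2 @ ReLU(W1 @ x).
    The "+ x" shortcut lets gradients flow past the block,
    which is what makes very deep nets like ResNet-50 trainable."""
    fx = w2 @ relu(w1 @ x)
    return relu(fx + x)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w1 = rng.standard_normal((8, 8)) * 0.01
w2 = rng.standard_normal((8, 8)) * 0.01

y = residual_block(x, w1, w2)
# with near-zero weights the block approximates the identity map,
# so stacking many blocks cannot make the network worse at the start
print(np.allclose(y, relu(x), atol=0.01))
```

ResNet-50 stacks 16 such (bottleneck, convolutional) blocks; the principle shown here is the same.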
Our contribution in this research is to determine the effectiveness of the ResNet architecture under different hyperparameter configurations. These hyperparameters include the number of hidden layers, the number of units in each hidden layer, the batch size, and the learning rate [19]. Because hyperparameters are selected experimentally and the value of each hyperparameter affects the final accuracy, we tried 22 configurations (experiments) to obtain the best accuracy [20]. We also aim to show that, among these hyperparameters, the learning rate has the largest influence on accuracy [21].

RESEARCH METHOD
The data processing design scheme shown in Figure 1 is as follows:
− Data in the form of face images are used as input to be processed.
− Each person's images are captured in 15 different positions and expressions, in RGB color space and JPG format.
− The preprocessing stage makes the image data uniform; it consists of unifying the image size and image augmentation.
− Here we give each image the same specification, such as its color space and resolution.
− Classification is the stage of recognizing faces, consisting of convolutional layers, pooling layers, flatten, fully connected layers, and softmax.
− Here we use a CNN with a certain number of layers. The number of layers contributing to a model is called the depth of the model [22]. At every stage we try several hyperparameter configurations for each experiment we conduct, until we find the best combination and obtain the best accuracy. Details of the hyperparameter configurations are given in the subsection on the model experiment design.
− The output of the processed data is information from the data that has been identified.
− Here we check whether the recognition system recognizes the person correctly or not.
We conducted the experiment with 9 different persons, each of whom is identified by the system we built; from this we can see how robustly our system recognizes each person's face.

Preprocessing
Preprocessing data for a convolutional neural network has several stages. The first stage is image scaling, in which the input data are made equal in size. This stage is needed because the available image size does not always match the size specified for the dataset. We then continue with augmentation, which consists of 3 steps: giving the image a blur effect, adding random noise, and increasing the light intensity of the image. The purpose of this augmentation is to make the image data uniform in order to simplify the classification process.
We also apply geometric operations such as flip, shift (translation), rotation, and segmentation. The flip consists of 3 variants: flipping the image horizontally, vertically, and both horizontally and vertically. The purpose of the flip is to multiply the data and simplify the classification process. In the shift stage, we translate the location of the object relative to its original position in the data. In the rotation stage, the image is rotated counterclockwise by a predetermined angle. Last is the segmentation stage, which detects the edges of faces to extract the face region from the image.
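The flip and rotation steps above can be sketched with plain NumPy array operations (a simplified stand-in for the paper's actual augmentation pipeline; rotation is restricted here to 90-degree steps for clarity):

```python
import numpy as np

def augment_flips(img):
    """Produce the three flipped variants described above:
    horizontal, vertical, and both."""
    return [
        np.flip(img, axis=1),        # horizontal flip
        np.flip(img, axis=0),        # vertical flip
        np.flip(img, axis=(0, 1)),   # both horizontal and vertical
    ]

def rotate_ccw(img, steps=1):
    """Rotate counterclockwise in 90-degree steps."""
    return np.rot90(img, k=steps)

img = np.arange(6).reshape(2, 3)  # toy 2x3 "image"
variants = [img] + augment_flips(img) + [rotate_ccw(img)]
print(len(variants))              # 5
print(rotate_ccw(img).shape)      # (3, 2)
```

For arbitrary angles and the blur/noise effects mentioned earlier, an image library (e.g. OpenCV or Pillow) would normally be used instead of raw array manipulation.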

Classification
The scheme in Figure 2 is explained as follows:
− Input data enter the convolution layer, which manipulates the image to produce a new image to be passed to the next stage. We use zero padding in the convolution process [24], as shown in Figure 3.
− The pooling layer performs calculations on each pixel of the image feature, which here has been converted into a matrix. The goal is to divide an image into several features to make image matching easier.
− Flatten is the stage where the features produced by the pooling layer, a matrix of size n x m with n > 1 and m > 1, are converted into a one-dimensional vector.

− The fully connected layer produces output in the form of image probabilities to be used in classifying the output data.
− Softmax is the stage of calculating probabilities over all labels in the data.
− The final result of this process is the softmax output, i.e., the probability of each label in the data.
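The softmax calculation described above converts the fully connected layer's raw scores into per-label probabilities. A minimal NumPy sketch (with the usual max-subtraction for numerical stability; the toy scores are illustrative, not from the paper):

```python
import numpy as np

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1.
    Subtracting the max first avoids overflow in exp()."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

# toy scores for three identity labels
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(round(probs.sum(), 6))  # 1.0
print(int(probs.argmax()))    # 0 -> the predicted label
```

The label with the highest probability is taken as the recognized identity.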

Model experiment design
We divide the preprocessed data into a training set and a test set, with 80% for training and 20% for testing: 1050 data were split into 840 training data and 210 test data. The split data then go through the model training stage using the convolutional neural network algorithm. For the first experiment we used hyperparameters with a learning rate of 0.1, 10 epochs, and 100 steps per epoch. During processing with the convolutional neural network, the data pass through several stages, namely the pooling layer, flatten, the fully connected layer, and the softmax calculation, to produce a model. The resulting model is then evaluated to find its accuracy. The experimental design for this model can be seen in Figure 4.
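The 80:20 split described above (840/210 out of 1050) can be sketched as a shuffled index split in plain NumPy (the original work may well have used a library utility such as scikit-learn's `train_test_split` instead; the seed here is arbitrary):

```python
import numpy as np

def train_test_split(n_samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices and split them into train/test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(n_samples * test_fraction)
    return idx[n_test:], idx[:n_test]  # train indices, test indices

train_idx, test_idx = train_test_split(1050)
print(len(train_idx), len(test_idx))  # 840 210
```

Shuffling before splitting matters here because consecutive images of the same person would otherwise all land in the same partition.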

RESULTS AND DISCUSSION
This section explains the results of the research together with a comprehensive discussion. The results and discussion of the implementation of the attendance system using face recognition technology are as follows:

Result of data collections
Each person's dataset must consist of 15 images taken under different conditions; we based this on the design of a smart door system for live face recognition based on image processing [25], and we do not use an augmented reality database for comparison. We collected images of each individual's face with different variations and then applied the preprocessing methods to homogenize the whole set. Dataset samples can be seen in Table 1, including, among others:
− Face tilted to the right 45°, smiling.
− Face facing up, eyes closed, expressionless.
− Face tilted to the right 45°, expressionless.
− Face facing up 45°, smiling.
− Face tilted to the left 45°, expressionless.
− Face facing up 45°, eyes facing up, expressionless.
− Face facing right 45°, smiling.
− Face facing left 45°, smiling.
− Face tilted to the left 45°, smiling.
− Face facing up, eyes staring straight at the camera, smiling.
− Face facing up, eyes closed, expressionless.

Result of preprocessing data
In this phase, the initial data that had been collected had different, non-uniform sizes, so the preprocessing stage was needed in this research. We include an image normalization function to ensure uniform image size, together with augmentation. The tests carried out in the preprocessing stage use 15 variations of image data for 1 class. An example of the test results for one image that has passed through the preprocessing phase can be seen in Table 2. In the preprocessing phase, one image produces 87 preprocessed images. So, for 1 class containing 15 variations, the total after preprocessing is 87 x 15 = 1,305 images. From the 53 classes collected, the preprocessed data amount to 53 classes x 1,305 images = 69,165 preprocessed images.

Result of model testing
In this phase we conducted an experiment with an 80:20 data split, using 1050 data with 840 for training and 210 for testing. We also conducted an experiment using 13,050 data, with 10,440 for training and 2,610 for testing. The number of classes modeled is 10, due to insufficient computing resources for modeling all 53 classes. The results of model testing with different hyperparameter configurations can be seen in Table 3. From these modeling experiments, it can be concluded that the 22nd experiment has the best accuracy, with 99% training accuracy and 99% test accuracy. This is because we experimented with the hyperparameters and obtained the right configuration to build a model whose training and test accuracy both reached 99%.
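The accuracy figures reported above are simply the fraction of labels predicted correctly. A minimal sketch (the labels below are toy values for illustration, not the paper's results):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions matching the ground-truth labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float((y_true == y_pred).mean())

# toy example: 9 of 10 test faces identified correctly
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7, 8, 0]
print(accuracy(y_true, y_pred))  # 0.9
```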

Result of prototype presence system
The result of this implementation is a prototype built with a graphical user interface (GUI) using Python 3.6 and tkinter, which successfully recognizes the face of each student using the previously trained model with the 22nd experiment's hyperparameter configuration, as shown in Figure 5.

Result of object testing
After implementing the data modeling, we tested the models that had been built. To obtain evaluation results that can be compared and concluded from, we conducted object experiments on 9 students. Each student performed 5 object experiments. The results of object testing can be seen in Table 4.

CONCLUSION
The conclusion of this research is that the presence system was developed as a prototype using the convolutional neural network (CNN) algorithm, by conducting trial experiments on hyperparameters such as the learning rate with a value of 0.0001, epochs with a value of 100, and steps per epoch with a value of 150; this hyperparameter configuration gave us a model with an accuracy of 99%. We then built a presence system prototype using the graphical user interface (GUI) library provided by Python, tkinter, and applied the trained model to the prototype so that it can be used to predict facial images.