Human activity recognition by using convolutional neural network

ABSTRACT


INTRODUCTION
A human activity recognition (HAR) system, a widely used kind of pattern recognition system [1][2][3], can be divided into several modules such as sensing, feature extraction, classification, segmentation, and post-processing [4]. HAR systems can be categorized into two types: vision-based and acceleration-based. Acceleration-based methods require multiple accelerometers for data collection, whereas vision-based methods typically require one or more cameras. The disadvantage of the acceleration-based method is that the body-worn sensors can cause discomfort when the subject performs activities such as walking, running, and lying down.
The human activities monitored in this study are hand waving, punching, kicking, lying down, walking, running, and standing. The advantage of a vision-based system is that the sensor works without being attached to the body. However, its recognition performance depends on lighting conditions, viewing angle, and other factors. To address this problem, we propose a system that uses a vision-based data set [5][6][7][8] captured by a thermal camera [9,10] together with a convolutional neural network (CNN) structure [11][12][13][14]. This reduces the number of hand-crafted processing steps and increases accuracy.

PROPOSED METHOD AND FEATURE EXTRACTION
This section introduces the proposed method and its feature extraction, which proceeds step by step. First, the original image is cropped by hand to a region of interest (ROI), so that the image is cut to the required shape. Next, background subtraction is performed against the background image using the same ROI coordinates. Because the CNN [15] requires a fixed-size input, the image is set to 200 × 200 pixels. A threshold of 50 is then applied to obtain the binary image. Finally, a morphological operation is applied, and a GEI is obtained from the binary images. Figure 1 shows the system architecture. It consists of a user section, in which the images are preprocessed and some steps are done by hand, and an application section, which contains our main proposed method based on a CNN (VGG16-Net) architecture [16][17][18]. In the training part, our model is trained; in the testing part, the trained Keras model is evaluated on our data set. The flow chart of the recognition system is shown in Figure 2, and a feature image is shown in Figure 3.
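The preprocessing steps described above can be sketched as follows. This is a minimal illustration using only NumPy, not the implementation used in this work. Several details are assumptions: the ROI is given as (top, left, height, width), the morphological operation is taken to be a 3 × 3 opening, resizing uses nearest-neighbour sampling, and GEI is taken to mean the pixel-wise average of the binary silhouettes. All function names are hypothetical.

```python
import numpy as np

THRESHOLD = 50            # binarization threshold stated in the text
TARGET_SIZE = (200, 200)  # fixed CNN input size stated in the text


def crop_roi(image, roi):
    """Crop a hand-chosen region; roi = (top, left, height, width) is an assumed layout."""
    top, left, height, width = roi
    return image[top:top + height, left:left + width]


def subtract_background(frame, background):
    """Absolute difference between a frame and the background image."""
    diff = frame.astype(np.int16) - background.astype(np.int16)
    return np.abs(diff).astype(np.uint8)


def resize_nearest(image, size=TARGET_SIZE):
    """Nearest-neighbour resize to the fixed input size (assumed interpolation)."""
    rows = np.arange(size[0]) * image.shape[0] // size[0]
    cols = np.arange(size[1]) * image.shape[1] // size[1]
    return image[rows][:, cols]


def binarize(diff, threshold=THRESHOLD):
    """Pixels whose difference exceeds the threshold become foreground (1)."""
    return (diff > threshold).astype(np.uint8)


def _neighbourhoods(mask):
    """Stack the zero-padded 3x3 neighbourhood of every pixel."""
    padded = np.pad(mask, 1, mode="constant")
    h, w = mask.shape
    return np.stack([padded[r:r + h, c:c + w] for r in range(3) for c in range(3)])


def opening(mask):
    """Erosion followed by dilation; removes specks smaller than 3x3 (assumed operation)."""
    eroded = _neighbourhoods(mask).min(axis=0)
    return _neighbourhoods(eroded).max(axis=0)


def silhouette(frame, background, roi):
    """Crop, subtract the background, resize, threshold, then clean up the mask."""
    diff = subtract_background(crop_roi(frame, roi), crop_roi(background, roi))
    return opening(binarize(resize_nearest(diff)))


def gei(silhouettes):
    """Pixel-wise mean of binary silhouettes (assumed meaning of GEI)."""
    return np.mean(np.stack(silhouettes), axis=0)
```

Calling `silhouette(frame, background, roi)` on each frame of a sequence and passing the resulting list to `gei` yields one 200 × 200 feature image with values between 0 and 1.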

EXPERIMENTAL RESULT
In this section, we briefly describe our databases, then the comparison and accuracy. Database 1 was collected in multiple environments, with objects other than people also present, and its images were taken with a thermal camera in dark conditions. As mentioned earlier, we combined all databases with database 1. It contains six different subjects captured in different environments. A sample database image is shown in Figure 4, and Table 1 describes the database in more detail (for example, one entry reads: outside, next to a road; the man is running). Table 2 shows the number of images in our database. Table 3 shows the number of images and the types of motion in each data set. The original database contains three-channel feature images. The augmented database contains artificially augmented feature images. The data are thus divided into two sets, the original database and the augmented database. For example, the augmented data set was used to train the VGG-16Net [19,20] and the original data set was used to test it. In addition, we swap the roles of the two sets to perform two-fold cross-validation. This completes the description of the HAR system database.
Second, we present the comparison and accuracy. In our study we use the CNN method in the HAR system to address the problems described earlier. With a CNN, there is no need to track features or to detect the location of the legs or hands. It can also recognize many activities: once the network is trained on a data set prepared as its input, many activities are recognized as expected. Table 4 shows the pros and cons of the previous method and our method. In addition, we tested our method using two-fold cross-validation; the results are shown in Figures 5-6.
In Figures 5 and 6, we trained for ten epochs, and the accuracy is good by the end of the last epoch. In neural network training, an epoch is one complete pass through the given data set. In each figure, the vertical axis shows the accuracy, from 0 to 1, and the horizontal axis shows the epoch, from 0 to 10. A value of 0 means that the features do not match and the input image is not recognized. A value of 1 means that the features match and the corresponding prediction is counted as true. The epoch axis runs from 0 to 10 because we trained for ten epochs. After two-fold cross-validation training, our model is ready for testing. Table 5 summarizes the comparisons and accuracies.
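The train/test swap between the original and augmented data sets described above can be sketched as follows. This is only an illustration of the evaluation protocol. The `NearestCentroid` class is a toy stand-in for the VGG-16Net, and all names here are hypothetical rather than taken from our implementation.

```python
import numpy as np


class NearestCentroid:
    """Toy stand-in classifier (NOT the VGG-16Net used in the paper)."""

    def fit(self, features, labels):
        self.classes_ = np.unique(labels)
        self.centroids_ = np.stack(
            [features[labels == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, features):
        # Distance from every sample to every class centroid.
        distances = np.linalg.norm(
            features[:, None, :] - self.centroids_[None, :, :], axis=2
        )
        return self.classes_[distances.argmin(axis=1)]


def accuracy(predicted, expected):
    """Fraction of correct predictions, a value between 0 and 1."""
    return float(np.mean(predicted == expected))


def two_fold_swap(set_a, set_b, make_model=NearestCentroid):
    """Train on one set and test on the other, then swap the roles."""
    (xa, ya), (xb, yb) = set_a, set_b
    acc_a_to_b = accuracy(make_model().fit(xa, ya).predict(xb), yb)
    acc_b_to_a = accuracy(make_model().fit(xb, yb).predict(xa), ya)
    return acc_a_to_b, acc_b_to_a
```

With `set_a` as the augmented set and `set_b` as the original set, the first returned value corresponds to training on augmented data and testing on original data, and the second to the inverse.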
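To make the relation between epochs and accuracy concrete, the sketch below trains a toy perceptron for ten epochs and records the accuracy after each one. It is a simplified illustration under stated assumptions: the perceptron and the four-sample data set are hypothetical stand-ins, not the VGG-16Net or the thermal image data used in our experiments.

```python
import numpy as np


def predict(weights, bias, inputs):
    """Return class 1 where the linear score is positive, else class 0."""
    return (inputs @ weights + bias > 0).astype(int)


def train(inputs, labels, epochs=10):
    """Perceptron training; returns the accuracy measured after every epoch."""
    weights = np.zeros(inputs.shape[1])
    bias = 0.0
    history = []
    for _ in range(epochs):
        # One epoch: every training sample is visited exactly once.
        for x, y in zip(inputs, labels):
            error = y - predict(weights, bias, x[None, :])[0]
            weights += error * x
            bias += error
        # Accuracy is the fraction of correct predictions, between 0 and 1.
        correct = predict(weights, bias, inputs) == labels
        history.append(float(correct.mean()))
    return history


# Hypothetical, linearly separable toy data.
inputs = np.array([[-2.0], [-1.0], [1.0], [2.0]])
labels = np.array([0, 0, 1, 1])
history = train(inputs, labels)
```

Plotting `history` against the epoch index gives a curve of the same kind as Figures 5 and 6, with accuracy on the vertical axis and epoch on the horizontal axis.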

CONCLUSION
Recently, many researchers have studied activity recognition [21][22][23][24][25]. HAR systems have several types of application. For example, a HAR system can be used for sports activities, daily-life activities in hospitals, patient monitoring after surgery, and care for elderly people. An intelligent surveillance system can recognize when a patient falls with no one nearby.
In this study, we proposed our approach to solve the problems mentioned above. Human activity has several unique characteristics that can be observed without requiring action from the subject. We analyzed human motion using a CNN and performed thermography-based human activity recognition. The main problem addressed was recognizing human activities in both daytime and nighttime, which previous studies could not do. Vision-based sensing is useful for activity recognition systems, and, as described above, a thermal camera operates both day and night. Many methods have been researched and developed for HAR systems. Using the CNN method in our HAR system gave good results: the recognition accuracy is 95.9%, higher than that of the other methods compared.
In the future, we plan to increase the number of activities in the experiment. We will also add more camera types, which will allow us to extend our data sets in other dimensions. We will also study the recognition of people carrying personal items such as wallets, bags, and cell phones. Finally, we plan to study the accuracy and design of real-time HAR systems.