Comparing Mask R-CNN backbone architectures for human detection using thermal imaging

Tan Dat Trinh, Pham Cung Le Thien Vu, Pham The Bao


We present a method for detecting humans in thermal images using an end-to-end deep learning model. Our objective is to improve human detection in thermal imaging by investigating the mask region-based convolutional neural network (Mask R-CNN). This model, an extension of the faster region-based convolutional neural network (Faster R-CNN), predicts not only bounding boxes around human subjects but also segmentation masks that delineate them. We evaluate and compare several convolutional neural networks for feature learning, including residual network (ResNet) and Inception ResNet, each integrated as the backbone of the Mask R-CNN framework. The experimental results show that the proposed technique achieves high accuracy. Specifically, the Mask R-CNN model using ResNet50-V1 achieved the best results, with an F-value of 87.85%, a recall of 79.33%, and a precision of 98.41%.
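The three reported figures can be cross-checked against each other. The sketch below assumes the F-value is the balanced F1 score, that is, the harmonic mean of precision and recall; the abstract does not state the formula, and the function name `f_measure` is hypothetical, chosen only for this illustration.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1), assumed here to be the 'F-value'."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract for Mask R-CNN with ResNet50-V1.
precision = 0.9841
recall = 0.7933

f_value = f_measure(precision, recall)
print(f"F-value: {f_value * 100:.2f}%")
```

Under this assumption the result matches the reported 87.85%. It also shows that the score is held down by recall rather than precision: the model seldom raises a false detection but misses roughly one in five human instances.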


Keywords: Convolutional neural network; Deep learning; Faster R-CNN; Human detection; Mask R-CNN; Thermal image

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).