Simplified video surveillance framework for dynamic object detection under challenging environment

ABSTRACT


INTRODUCTION
Nowadays, security systems for residences, banks, malls, etc., are necessary due to the increased crime rate around us. Video surveillance has therefore been widely adopted in the recent past, as it offers video footage and records that help identify criminal or unusual activity in a particular area. Recent advancements in computer vision tools such as sensor technology, storage capacity and high-resolution displays have brought superior progress to video surveillance systems. This advancement in surveillance systems incorporates a video analytics mechanism that performs specific classification of objects from image sequences based on the user's requirements [1]. Video analytics refers to the ability to autonomously understand objects or relevant events from visual data frames for different purposes without any human involvement. Furthermore, it is a collaboration of computer vision, imaging techniques, machine intelligence and analytics concepts that detects the temporal and spatial events of visual data streams and generates specific and relevant patterns [2]. The potential of video analytics applications can be realized from the fact that they have become an essential part of various organizations such as medical, government and private sectors [3]. Video analytics plays a very effective role in the field of surveillance for security, crime investigation, retail, marketing, military, automobiles, traffic control, etc., with a wide scope covering object detection, object recognition, object tracking and event extraction from image as well as video scenes [4]. The detection and recognition process in video analytics involves several procedures, from the image processing level to interpretation of the video scene, which is impossible for human operators to execute manually [5].
The observation, feature extraction and classification of events use various advanced techniques such as machine learning, deep learning, neural networks and data analytics for feature classification and knowledge discovery. Moreover, many existing studies have been carried out in the domain of video analytics, each with its advantages and limitations [6][7][8]; various existing works on video analytics for object and event detection are discussed in the review of literature section. On the one hand, video analytics opens great opportunities and promises for individual organizations in terms of higher security, authentication, loss prevention and higher business value [9]. On the other hand, it faces some typical challenges: data management problems and inefficiency in detecting complex events such as face detection in crowds, detection of moving objects in real time, and event detection under different weather conditions and environments [10]. Hence, with the explosive growth of such real data and smart surveillance systems, there is an immense requirement for effective tools and mechanisms that provide full security and business gain. Therefore, the present paper introduces "a model that uses a simplified approach based on motion blobs and image depth in order to solve the problem of dynamic subject identification". Section 2 discusses the algorithm implementation, followed by the result analysis in Section 3. Finally, the concluding remarks are provided in Section 4.
REVIEW OF LITERATURE
This section discusses the existing research works carried out in the area of video data analytics. The previous works of Madhu Chandra and Reddy presented a) a conceptual as well as research overview of video analytical modeling [11], b) an analytical framework [12] that fulfills the research gap found in [11] by using a dictionary-based approach and an unsupervised learning (UsL) approach, and c) in [13], multivariate analysis and UsL to identify contextual outliers.
The work of Prabhakaran et al. [8] discussed both the potential and the issues of video analytics in various aspects. In the study of [14], Aslam and Curry applied a deep learning technique for event detection on real-time data generated from multimedia Internet of Things devices. The work carried out by Ballas et al. [15] investigated the capabilities of video processing built on an IoT infrastructure by using lightweight gateway nodes for crowd monitoring and event detection. Imbalanced data classification has always been a challenging issue in data analytics, so Yang et al. [16] designed an efficient framework for multimedia analysis. The presented framework uses a statistical data analysis algorithm for feature selection and classification.
Shao et al. [17] proposed an intelligent approach for event detection in surveillance systems. In this work, the authors utilize a smart monitoring system that is able to generate alarms for abnormal events and also offers large storage capability and high information-retrieval performance. In [18], Dimitrious et al. presented a modular surveillance model based on embedded computing techniques as an efficient video analytics solution for detecting minor crimes. Ma et al. [19] presented an optimal storage approach based on key-indexing to extract useful content efficiently. The experimental outcome of the presented study shows that it achieves good performance for extraction of information from surveillance video content. Brinton et al. [20] studied the video-watching behavior of students and formulated a novel event detection framework based on sequences of event counts and position behavior.
Pham et al. [21] presented a novel model for mitigating the problems that occur in background pose detection in surveillance systems. Based on a theoretical analysis of local changes and wind noise, the authors constructed an event detection framework that utilizes an optimization algorithm to reduce the complexity of event detection. In [22], the authors used a combined approach of swarm intelligence and histograms of oriented gradients for event detection in crowded surroundings. The work carried out by Cheng et al. [23] developed a visual-analytics-supported approach for event detection in surveillance systems. Similarly, Abdullah et al. [24] used cloud technology and a GPU cluster for detecting traffic patterns from recorded video streams. Meghdadi et al. [25] designed a video analytics framework for detecting events in moving videos. The only limitation of the presented approach is that it is inefficient at detecting events in crowded activity.
Senst et al. [26] developed an architecture for providing security and privacy support to video operators working with surveillance systems. The advantage of the proposed system is that, using automatically calibrated cameras, it displays detected events and extracted objects in a 3D view. The work of Tahir et al. [27] used a general-purpose graphics processing unit programmed with the CUDA model for video analytics for intrusion detection. Similarly, Conte et al. [28] presented an audio analytics framework for anomaly detection in audio data, and Mueller et al. [29] presented a video analytics framework for anomaly detection in unstructured video streams. In [30], Kim et al. designed an analytics model for tracking and detecting multiple objects from video streams. The presented model is implemented on an FPGA platform, which achieves cost-effective performance for real-time monitoring in remote centers.
Guler et al. [31] presented an analytics model in which a graphics processing unit is used for video data analytics in real-time surveillance systems. Candamo et al. [32] carried out a survey of human pose recognition techniques and found that there is still considerable need for improvement in recognition and analytics algorithms. Parsola et al. [33] presented a new method for executing post-event analysis on stored surveillance video data. The main aim of this algorithm is to gather video data into HDFS, so that the relevant region of data can be capably recognized from HDFS based on the time and extent of an event for further processing. Fakhar et al. [34] illustrated a cheaper alternative for a portable community video surveillance system that runs on a Raspberry Pi 3 using OpenCV. The next section discusses the research problems that emerge from the existing approaches, followed by a brief discussion of the proposed solution to address them.
RESEARCH PROBLEM
The significant research problems are as follows:
a. Existing studies on video analytics are found to place little emphasis on the real-time constraints that surface naturally in video surveillance systems.
b. Low-illumination as well as variable-illumination conditions have not been addressed in existing research work in video analytics.
c. The dynamic and random mobility of a subject always poses a problem of either partial or complete occlusion, which is not addressed in existing systems.
d. The use of simple and computationally efficient approaches has not been emphasized much in existing research on video analytics.
Therefore, the problem statement of the proposed study can be stated as "Developing a simple video analytical framework that is capable of extracting precise information about a dynamic subject under a challenging environment". The next section outlines the solution to this problem.
PROPOSED SOLUTION
The prime purpose of the proposed system is to develop a simple and yet novel framework for a video surveillance system that is capable of identifying dynamic subjects in random motion. The core idea of the proposed implementation is to offer a faster response time with respect to the movement of the subject, and it hence targets a simple computational approach. The study is carried out using an analytical research methodology, and its scheme is pictorially presented in Figure 1.
A faster response time in identifying the dynamic mobility of the subject is only possible if the implementation adopts a scheme in which potential information about the dynamic frames of the video is considered. In order to facilitate such an implementation, the proposed system introduces an algorithm for reading the input video feed, unlike any conventional system. It takes as input both a colored video feed and a depth video feed (extracted beforehand from the colored video feed) in order to produce an extracted video feed as output. This output is then fed to the second block of the implementation, i.e., the algorithm for obtaining depth and frequency information from the blob image. Conversion of the colored frames to binarized frames significantly assists in offering potential information about any specific patterns of the dynamic subject. Use of the motion blob principle assists in addressing the problems of partial or complete occlusion of subjects during random movement, as the binarization offers better capability to differentiate the foreground from static or moving background subjects. Finally, the algorithm for dynamic subject identification takes as input the identified motion blob and the depth image, which after processing leads to the generation of depth information, frequency information, as well as a segmented depth.
All this information jointly assists in exploring all the potential points representing a specific target very discretely, and the approach is found to be highly suitable even in low-illumination conditions. The proposed system also offers highly reduced, non-iterative algorithm processing steps, which results in faster processing time in synchrony with higher identification precision. The next section discusses the approaches used for algorithm implementation, followed by outcome analysis.

ALGORITHM IMPLEMENTATION
The proposed algorithm is meant for designing a simple and effective scheme of dynamic subject identification to assist video analytics. The design is carried out in such a way that high precision is retained during the complete process of counting the number of subjects moving dynamically in a low-illumination area. The algorithm is specifically designed to assist in offering accurate information at every video analytical step. This section discusses the different sequential algorithms that ensure correct operation of extracting the number of subjects from a given dynamic video feed.

Algorithm for reading the input video feed
This algorithm is responsible for taking the video input in order to further facilitate video processing. Unlike conventional video processing steps, the proposed system considers the depth information of the related input video for enhancing the precision of subject identification in a much better way. A closer look at the auto-focus capability of any digital camera will explain the effective utilization of depth information for a given scene. Basically, the depth information of an image offers the z-information of the targeted subject corresponding to the real world, which significantly increases the accuracy of counting subjects, especially in cases of partial or full occlusion. The distance of a subject can be represented by the image intensity from a specific viewpoint. The proposed algorithm makes use of this concept in order to boost the identification accuracy. The algorithmic steps are shown below.
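This intensity-encodes-distance idea can be illustrated with a short sketch (written here in NumPy rather than the MATLAB used by the study; the linear near-is-bright mapping and all variable names are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def depth_to_intensity(depth, d_min, d_max):
    """Linearly map distances in [d_min, d_max] metres to 8-bit
    intensity, with nearer points rendered brighter (a common
    depth-map convention)."""
    clipped = np.clip(depth, d_min, d_max)
    near = (d_max - clipped) / (d_max - d_min)   # near -> 1.0, far -> 0.0
    return (near * 255).astype(np.uint8)

# Hypothetical 4x4 patch of scene distances in metres.
depth_m = np.array([
    [1.0, 1.0, 2.5, 2.5],
    [1.0, 1.0, 2.5, 2.5],
    [4.0, 4.0, 4.0, 4.0],
    [4.0, 4.0, 4.0, 4.0],
])
intensity = depth_to_intensity(depth_m, d_min=1.0, d_max=4.0)
```

The nearest points (1 m) map to full brightness and the farthest (4 m) to black, which is exactly the z-information a depth frame contributes on top of the color frame.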

Input: fo (frame of video), fd (frame of depth video)
Output: d1, d2 (extracted video feeds)
Start
1. For i = sf : ef
2.    d1 → fo(i), d2 → fd(i)
End
End
The above algorithm takes the input of fo (frame of video) and fd (frame of depth video) and, after processing, yields an output of d1/d2 (extracted video feeds for the study). Apart from this input, the algorithm performs the complete analysis on the basis of the frame length specified to it in the form of a start frame sf and an end frame ef (Line-1). Only for the selected frames (Line-2), the algorithm constructs two matrices d1 and d2 that are used for re-positing the frame information corresponding to the original frame fo and the depth frame fd. Both these matrices are subjected to further processing henceforward.
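The frame-selection step described above can be sketched as follows (an illustrative NumPy version rather than the study's MATLAB scripting; the array layout and the helper name read_feeds are assumptions):

```python
import numpy as np

def read_feeds(fo, fd, sf, ef):
    """Retain only frames sf..ef (inclusive) of the colour feed fo and
    the depth feed fd, mirroring the d1/d2 matrices of the algorithm."""
    d1 = fo[sf:ef + 1]   # selected colour frames
    d2 = fd[sf:ef + 1]   # corresponding depth frames
    return d1, d2

# Toy feeds: ten 4x4 frames, with the depth feed derived from the
# colour feed purely for illustration.
fo = np.arange(10 * 16).reshape(10, 4, 4)
fd = fo // 2
d1, d2 = read_feeds(fo, fd, sf=2, ef=5)
```

Only the four frames in the analysis window survive, so every later stage operates on the reduced d1/d2 pair rather than the full feed.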

Algorithm for identifying motion blob
The usage of Binary Large Objects, or BLOBs, has already been proven to offer a better form of discrete information about the structure of a given image. The proposed system utilizes blobs in order to analyze the whole set of considered frames (from sf to ef). It also assists in isolating the objects in the given binary image, which increases the accuracy of identification of the target subject for a given scene. The process of extracting the blob from the given video feed is described below.

The output of the prior algorithm is considered as the input for this algorithm, i.e., d1 and d2 (Line-1). The algorithm extracts the digitized information of both the original frame (d1) and the depth frame (d2) in order to obtain two matrices, i.e., the color of an image Ic and the depth of an image Id (Line-2). The proposed system then applies a two-dimensional median filter ϕ (Line-3) over the depth image Id. This operation leads to further processed depth-image information, where each pixel retains the median value of the 3x3 neighborhood around the corresponding pixel in the input image. The proposed algorithm also uses minimum and maximum depth values Dmin and Dmax to ensure that the considered image depth matrix retains only the pixel information within the scope of [Dmin Dmax] (Line-4). The elements of the matrix satisfying the condition in Line-4 are retained in a temporary matrix (Line-5) that is further subjected to a binary morphological operation that starts with erosion and ends with dilation. The function θ1 represents this morphological function. The obtained morphed binary image binImg (Line-7) is then subjected to a fine-tuning process in which the smaller elements of the matrix are eliminated. This step produces the finally identified motion blob of the given video feed (Line-9) in the form of a binary image binImg as the output.
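The blob-extraction stage just described (median filtering, depth-range gating, erosion followed by dilation, removal of small elements) can be sketched as below, using SciPy stand-ins for ϕ and θ1; this is an illustrative approximation, not the paper's MATLAB code, and the toy depth frame and min_area value are assumptions:

```python
import numpy as np
from scipy import ndimage

def motion_blob(depth, d_min, d_max, min_area=4):
    """Extract a binary motion blob from a depth frame: 3x3 median
    filter (phi), [d_min, d_max] depth gating, erosion followed by
    dilation (theta1), then removal of components smaller than
    min_area pixels."""
    se = np.ones((3, 3), dtype=bool)                  # 3x3 structuring element
    smoothed = ndimage.median_filter(depth, size=3)   # phi: 2-D median filter
    mask = (smoothed >= d_min) & (smoothed <= d_max)  # keep [Dmin, Dmax] scope
    opened = ndimage.binary_dilation(ndimage.binary_erosion(mask, se), se)
    labels, n = ndimage.label(opened)                 # index surviving components
    keep = np.zeros_like(opened)
    for i in range(1, n + 1):
        comp = labels == i
        if comp.sum() >= min_area:                    # discard tiny elements
            keep |= comp
    return keep.astype(np.uint8)                      # binImg

# Toy depth frame: a 5x5 near object (depth 2) on a far background (10),
# plus one stray noise pixel that the median filter suppresses.
depth = np.full((9, 9), 10.0)
depth[2:7, 2:7] = 2.0
depth[0, 8] = 2.0
bin_img = motion_blob(depth, d_min=1.0, d_max=5.0)
```

The stray pixel never survives the median filter, while the 5x5 object passes through opening almost intact, which is the differentiation of foreground from background that the motion blob provides.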

Algorithm for obtaining depth and frequency information from the blob image
After the motion blob information is obtained, the algorithm is ready to perform discretization of the dominant pixel groups. However, there is a slight challenge in doing so, as the obtained motion blob information does not contain any form of indexing or labeling of the dominant blocks of pixels. Therefore, this algorithm assists in extracting precise depth information using a labeling concept so that segmentation can be carried out in a precise manner. The algorithmic steps are shown below:

Algorithm for obtaining depth and frequency information from the blob image
Input: binImg (identified motion blob)
Output: x (depth information), y (frequency information), D (segmented depth)
Start
1. [L, c] → θ3(binImg)
2. For i = 1 : c
3.    O → zeros(size(L))
4.    O(L == i) → 1
5.    b → double(Id(O))
6.    [y, x] → h(b, [min(b) max(b)])
7. End
8. D → Id(binImg)
9. D → uint8(D)
End
The algorithm takes the input of binImg (the identified motion blob) obtained from the previous algorithm and, after processing, yields x (depth information), y (frequency information), and D (segmented depth). The first step of this algorithm is to construct a matrix L in such a way that it corresponds to the size of the binary motion blob binImg. The matrix L consists of indices or labels for all the connected components in the binary motion blob of the video feed (Line-1). An explicit function θ3 is applied with the binary motion blob binImg as input argument (Line-1) in order to yield the matrix L along with the number of components c. For all the components c (Line-2), the algorithm constructs a function (Line-3) that ensures extraction of only those objects of L indexed from 1 to c (Line-4). A revised version of the depth image Id is then formed in which the elements of the matrix are changed to the obtained values of the object O, followed by fine-tuning with double precision (Line-5). The next part of the algorithm implements a function h(x) to apply an image histogram, considering two input arguments, i.e., the value of b obtained in the prior step and the minimum and maximum scope of b (Line-6). The implementation finally results in the formation of the depth information x and the frequency information y, which further redefine the motion blob image of the subjects for a given video feed. The subsequent step is targeted at the segmentation operation. For segmentation, the algorithm substitutes the prior elements of the depth image Id with the new elements of the binary motion blob binImg (Line-8). Further fine-tuning with precision (unsigned 8-bit integer) makes the segmented depth more obvious for display.
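The labeling-and-histogram stage can be sketched as below, with scipy.ndimage.label standing in for θ3 and numpy.histogram for h(x); the bin count and the toy inputs are illustrative assumptions rather than the paper's settings:

```python
import numpy as np
from scipy import ndimage

def depth_frequency(bin_img, depth_img, bins=8):
    """For each labelled component of the motion blob, read the depth
    values under the component and histogram them: bin centres give the
    depth information x, counts give the frequency information y.  The
    segmented depth D keeps depth values only where the blob is active."""
    labels, c = ndimage.label(bin_img)        # theta3: labels L, count c
    xs, ys = [], []
    for i in range(1, c + 1):
        obj = labels == i                     # object O
        b = depth_img[obj].astype(float)      # b = double(Id(O))
        y, edges = np.histogram(b, bins=bins, range=(b.min(), b.max()))
        xs.append((edges[:-1] + edges[1:]) / 2)  # depth information x
        ys.append(y)                             # frequency information y
    seg = np.where(bin_img > 0, depth_img, 0).astype(np.uint8)  # D as uint8
    return xs, ys, seg

# Toy blob with two components at depths 3 and 7 on a background of 9.
bin_img = np.zeros((6, 6), dtype=int)
bin_img[1:3, 1:3] = 1
bin_img[4:6, 4:6] = 1
depth_img = np.full((6, 6), 9)
depth_img[1:3, 1:3] = 3
depth_img[4:6, 4:6] = 7
xs, ys, seg = depth_frequency(bin_img, depth_img)
```

Each component contributes its own depth/frequency pair, and the segmented depth is zero everywhere the blob is inactive, which is what makes the subsequent per-subject processing possible.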

Algorithm for dynamic subject identification
This is the final algorithm, responsible for identifying subjects in random and dynamic motion. The main idea of this algorithm is to perform precise identification of subjects in the presence of low illumination, in order to ensure that such dynamic information can be captured in low-light areas. Basically, in the context of the proposed system, low light means an area where there is no sunlight and the light sources are few and of low intensity. The majority of existing applications will tend to fail even in identifying the subjects under such conditions, and the task becomes more challenging if the subjects are moving in an arbitrary manner. Hence, this algorithm offers a simple and easier alternative for carrying out this task. The steps of the algorithm are described below.

The algorithm constructs a random matrix col (Line-1) based on some random numbers. For a specific set of random numbers, the algorithm constructs different forms of colored matrices with row range [1 n], considering all the columns, to generate an array of m unique binary elements (Line-3). The labeling function θ3 is applied to the binary motion blob binImg in order to obtain L and c (Line-5). For all values of the count c of the new matrix L (Line-6), the algorithm extracts the object information (Line-8) and then applies the standard deviation over the double precision of binImg (Line-9). If the standard deviation s is found to be less than a specific limit p, the proposed system applies region properties to the object O using a bounding box bbox and stores the resultant elements in a matrix P (Line-11). A concatenation is performed between a unit value and the selected region P in order to generate a matrix bb (Line-11). Finally, a matrix zs is constructed that takes three different elements, i.e., the mean value μ of xts, yts, and Id, where xts and yts are extracted information of the object O (Line-12). However, if the value of s is found to be more than the specific value p, a unique function ψ is applied with the object O as input argument. If the resulting value cp is found to be non-zero (Line-15), then the information of O1 and O2 is obtained. All values of Id that are more than the obtained cp are treated as O1, otherwise as O2 (Line-16). The algorithm applies region properties rp to both O1 and O2 in order to obtain P1 and P2, and thereby the bounding boxes are created (Line-17). The algorithm finally constructs and updates the matrix zs by applying a logical operation on the object as an element of Id. For all values up to the maximum index of the object matrix bwop (Line-20), the algorithm constructs a mask only for the elements of bwop matching i (Line-20).
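A simplified sketch of this identification logic is given below. The depth-spread test s < p and the near/far split at a cut cp follow the description above, but using the mean depth as the cut is an assumption standing in for the paper's function ψ, and the region properties are reduced to a plain bounding box:

```python
import numpy as np
from scipy import ndimage

def identify_subjects(bin_img, depth_img, p=1.0):
    """Each labelled blob yields a bounding box and mean depth; a blob
    whose depth spread s exceeds the limit p is treated as two occluded
    subjects and split at a depth cut cp (mean depth here, standing in
    for psi).  Returns (bbox, mean_depth) tuples, bbox = (r0, c0, r1, c1)."""
    labels, c = ndimage.label(bin_img)
    subjects = []
    for i in range(1, c + 1):
        obj = labels == i
        d = depth_img[obj].astype(float)
        s = d.std()                              # depth spread of the blob
        if s < p:                                # single subject: one box
            parts = [obj]
        else:                                    # split into far O1 / near O2
            cp = d.mean()
            parts = [obj & (depth_img > cp), obj & (depth_img <= cp)]
        for part in parts:
            if not part.any():
                continue
            rows, cols = np.nonzero(part)
            bbox = (int(rows.min()), int(cols.min()),
                    int(rows.max()), int(cols.max()))
            subjects.append((bbox, float(depth_img[part].mean())))
    return subjects

# One blob containing two subjects at depths 2 and 8 (occlusion case).
bin_img = np.zeros((8, 8), dtype=int)
bin_img[1:5, 1:5] = 1
depth_img = np.zeros((8, 8))
depth_img[:, 1:3] = 2.0
depth_img[:, 3:5] = 8.0
subjects = identify_subjects(bin_img, depth_img)
```

Although the two subjects merge into a single blob in the binary image, the large depth spread triggers the split, and two separate bounding boxes are recovered, illustrating how the depth cue resolves occlusion.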
Therefore, this algorithm results in a precise selection of the dynamic subjects with a high degree of accuracy. The proposed algorithm is highly progressive in its operation, and hence there is less chance of stale information being synchronized with the dynamic mobility of the subject. The next section briefs the obtained results.

RESULTS ANALYSIS
This section briefs the outcomes obtained from the proposed system. The analysis has been carried out experimentally. For this purpose, three subjects were chosen and their random movements were observed. For this experiment, a low-light condition was set up in a room in such a way that part of the room is in low light and part of the room has a medium lighting condition. A digital camera was used for capturing the random video feeds in real time, with the size of each captured feed in the range of 5000-9000 MB. The frame dimensions are kept uniformly at 640x480, with a frame rate of 30 frames per second, a data bit rate of 7981 kbps and a total bit rate of 7981 kbps. The scripting of the proposed study has been carried out using MATLAB.
The above analysis is carried out considering the presence of single, double, and triple subjects with dynamic and unscripted movements. The proposed study takes the colored video clip as input and also extracts the corresponding depth frames, which are essential for precise identification of all the dynamically moving subjects in the considered video feed. The complete feed was captured with a delay of less than 0.0086551 seconds, which is well within acceptable limits and suits any form of video surveillance monitoring system. Apart from this, the amount of information extracted from all the above images is also quite significant as well as precise, which is necessary in order to perform any video analytics operation. Table 1 also shows that the proposed system offers a clear-cut identification capability for any number of subjects, even when the subjects appear in partial or complete occlusion. The use of the depth image has proven to offer significant benefits in providing potential information in the form of frequency. Figure 2 highlights the frequency information of the image with respect to increasing depth values. The graphical outcome shows that a discrete set of information can be obtained using the proposed system.

CONCLUSION
This paper discusses a simple framework that emphasizes obtaining the maximum set of information about the objects within a video feed. The proposed study is also capable of identifying dynamic subjects under poor illumination conditions and with random mobility patterns. The contributions of the proposed study are as follows: i) the proposed study introduces a very simple approach to video surveillance without involving any form of complex or iterative system, ii) the proposed study harnesses the potential information obtained by integrating the depth image for better segmentation in a highly simple process, iii) the detection response has an extremely small delay of 0.0086551 seconds compared to the real-time feed, highlighting its applicability to real-world applications and systems, and iv) the run-time of the proposed system is 0.11621 seconds, assessed on a Core-i3 processor, showing that it offers significantly faster computational processing time.