Post Event Investigation of Multi-stream Video Data Utilizing Hadoop Cluster

ABSTRACT


INTRODUCTION
Intelligent video surveillance systems (VSS) have become an active research area in computer vision because of their numerous real-time applications for social security. A VSS aims to detect, identify and track objects across video frames or image sequences. The motivation is to establish intelligent visual surveillance that replaces the traditional surveillance system, as multiple cameras are deployed for continuous monitoring. When an event occurs, the ability to perform scalable and timely analytics on this extensive accumulated data is a high priority for every intelligent VSS. The major challenges faced by video surveillance systems are therefore: a) storage of the continuously increasing, gigantic data generated by multiple surveillance cameras; and b) prompt processing of this steadily growing data when an event is triggered. Processing and storing consistently growing data with conventional network storage and database systems is not an easy task. Hadoop, an open-source framework based on the MapReduce programming model and distributed file system designs published by Google, has evolved into a dominant processing model for such data-intensive applications [1]. It is scalable and fault tolerant; it splits and replicates data, and sends the computation to where the data resides. Hadoop has gained wide recognition because of its easy accessibility. However, the structure of Hadoop is rigid, so developing complex algorithms and deploying them in the MapReduce model is not trivial. Although considerable research on video processing with Hadoop has been performed [2]-[6], Hadoop has not yet been used for post event investigation. The motivation of our work is therefore to:
• Handle multiple streams from various surveillance cameras.
• Store and analyze, in a timely manner, extensively accumulated massive data to identify an event of interest.
• Speed up performance.
In [7], we used Hadoop for storing and processing single-stream surveillance data on a single-node cluster.
In this paper, we propose a framework for post event investigation, shown in Figure 1, which stores in HDFS the multi-stream data accumulated from multiple cameras deployed for a video surveillance application. When an event occurs, the user sends a query specifying the data to be analyzed along with the time interval in which the event is suspected to have occurred. Based on this time interval, the system identifies the location of the relevant data on the DataNodes of the cluster, and the computation is executed by Hadoop MapReduce. For such massive data, the proposed approach avoids analyzing the entire data generated by the set of monitoring cameras, thus reducing the computation time.
The remainder of the paper is organized as follows. Section II discusses related work by various researchers. Section III discusses the architecture for post event investigation with Hadoop and analyzes the algorithm proposed for processing and storing multi-stream video data with Hadoop. Section IV presents the results and performance analysis, and Section V concludes the paper.

RELATED WORK
It has been shown in [1] that Hadoop MapReduce is appropriate for processing text data when the same computation must be performed on the entire massive dataset residing in HDFS. Therefore, MapReduce was initially used for problems such as searching and sorting large data, large-scale indexing, graph computation [8] and matrix computation [9]. Some researchers have also used it for image processing and for large-scale query processing and search [10], [11]. In [12], a distributed image database is processed in parallel: color images are converted to grayscale and features are extracted in parallel. In [13], high-resolution images are processed and their features extracted with Hadoop MapReduce. The Hadoop MapReduce framework has also been used for applications such as content-based image retrieval [14]. An image refinement method with MapReduce is discussed in [15]; it requires images to be streamed only once, whereas other file systems require the entire image, or part of it, to be streamed each time a filter is applied. [16] proposed SEIP, a distributed image processing system built on Hadoop that employs an extensible in-node architecture to support various kinds of image processing algorithms on distributed platforms with GPU accelerators. [17] used Hadoop for clustering categorical data.
A few researchers have used the Hadoop MapReduce framework for video data processing, such as video transcoding [18], as discussed in [2]. [3] performs parallel video analysis and processing, whereas [4] addresses video playing, sharing and storage with a Hadoop cluster. Hadoop has also been used for large-scale video management [5] and for object detection and classification [6]. The work implemented in [19]

Hence, the above literature survey shows that Hadoop MapReduce has not been used for post event investigation applications; moreover, storing and processing multiple streams of surveillance data is still a challenge.

HADOOP ARCHITECTURE AND ANALYSIS OF MULTI-STREAM VIDEO DATA USING HADOOP
We have designed our event investigation system on Hadoop, using HDFS for data storage and MapReduce for data processing. Hadoop is an open-source software framework licensed by the Apache Software Foundation. It is used for distributed processing [20], [21], [26], [27] and for distributed storage of extensive datasets across clusters of nodes built from low-cost computers. Data partitioning and parallel computation on large datasets are intrinsic features of Hadoop. Its storage and computational capabilities scale with the addition of hosts to a Hadoop cluster, and can reach petabyte-scale volumes on clusters with thousands of hosts. As discussed in [7], it comprises two principal components: the Hadoop Distributed File System (HDFS), which provides distributed storage, and MapReduce, the execution engine or data processing framework, as shown in Figure 2.
The analysis of multi-stream video data using Hadoop is done in three phases:
• Storing the multi-stream video data in HDFS.
• Processing the multi-stream video data with MapReduce.
• Accumulating and displaying the results.
In a VSS, the data accumulated from the various cameras deployed for monitoring is massive and keeps increasing. The first question is how to store such an extremely large volume of data. The issue becomes more complex when an event is triggered and the data must be processed to extract useful information about the event. The traditional method for extracting such information is to search the entire database, which is computationally expensive. Instead, the user should be able to search only the relevant data, based on the time of occurrence of the event.
Hadoop HDFS is used for this purpose. Data in HDFS is processed in batches, so the streams are first buffered in local memory and then transferred to HDFS. Moreover, Hadoop was originally designed for text processing and has no built-in support for processing video data. We therefore extract frames from each video stream and store them as Sequence files in HDFS. Sequence files are a Hadoop-specific flat file format of key-value pairs, similar in purpose to tar and zip archives: they bundle a set of files as key-value pairs, where the key is the file name and the value is the file content. The generated sequence file is typically about half the size of the original data, which makes storage in HDFS more efficient. These files are splittable and can be processed in parallel. For video analytics applications such as motion detection, rather than comparing consecutive frames, it is sufficient to process every fifth frame [22].
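The storage step described above (keep one frame in five, then bundle the frames as key-value records keyed by file name) can be sketched as below. This is a minimal Python sketch under our own assumptions: frames are given as raw byte strings, and the length-prefixed record layout is a simplified stand-in for a Hadoop SequenceFile, not its actual on-disk format (which also carries a header, sync markers and optional compression). The function names are hypothetical.

```python
import io
import struct

def sample_frames(frames, step=5):
    """Keep every `step`-th frame (indices 0, 5, 10, ... for step=5)."""
    return frames[::step]

def pack_records(records):
    """Serialize (name, content) pairs into one byte stream.

    Each record is: 4-byte big-endian key length, UTF-8 key, 4-byte value
    length, value bytes. This only mimics the key/value idea of a Hadoop
    SequenceFile; it is NOT the real SequenceFile format.
    """
    buf = io.BytesIO()
    for name, content in records:
        key = name.encode("utf-8")
        buf.write(struct.pack(">I", len(key)))
        buf.write(key)
        buf.write(struct.pack(">I", len(content)))
        buf.write(content)
    return buf.getvalue()

def unpack_records(blob):
    """Inverse of pack_records: yield (name, content) pairs in order."""
    pos = 0
    while pos < len(blob):
        (klen,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        name = blob[pos:pos + klen].decode("utf-8")
        pos += klen
        (vlen,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        content = blob[pos:pos + vlen]
        pos += vlen
        yield name, content
```

Keeping one frame in five discards four fifths of the frames before any compression is applied, which is where the bulk of the storage saving comes from.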

Data Storage
Rather than storing all video frames, we store every fifth frame, which further reduces the storage space in HDFS. We use a novel technique for storing the sequence files (frames), given in Algorithm 1. Every video camera is identified by a unique identifier V1, V2, …, Vn. When a sequence file is stored, its name is generated by concatenating the camera identifier, date, time and frame number, e.g., V1_1_07_2016_10_12_22_1, where V1 is the name of the camera or stream, 1_07_2016 is the date (Day_Month_Year format), 10_12_22 is the time (Hour_Minutes_Seconds format) and 1 is the frame number. This naming scheme makes it easy to identify the DataNodes on which the frames for a given time have been stored, thus avoiding the time needed to search all the data accumulated so far. The data in HDFS is split into blocks (the default block size is 64 MB) that are stored on various nodes of the cluster, and each block is replicated three times across the machines of the cluster.
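The naming scheme can be illustrated with a small sketch that builds a frame name, parses it back, and selects the frames of a given time interval without reading any frame content; HDFS would then map the selected files to their blocks and DataNodes. The field layout (day without zero-padding, two-digit month and time fields) is inferred from the single example above and is an assumption, as are the helper names.

```python
from datetime import datetime

def frame_name(camera_id, ts, frame_no):
    """Build a name such as V1_1_07_2016_10_12_22_1 (assumed field layout:
    camera, day, 2-digit month, year, 2-digit hour/minute/second, frame)."""
    return (f"{camera_id}_{ts.day}_{ts.month:02d}_{ts.year}_"
            f"{ts.hour:02d}_{ts.minute:02d}_{ts.second:02d}_{frame_no}")

def parse_frame_name(name):
    """Split a frame name back into (camera_id, timestamp, frame_no)."""
    cam, day, month, year, hh, mm, ss, frame_no = name.split("_")
    ts = datetime(int(year), int(month), int(day), int(hh), int(mm), int(ss))
    return cam, ts, int(frame_no)

def select_frames(names, start, end, cameras=None):
    """Return the names whose timestamp lies in [start, end], optionally
    restricted to a set of camera ids; only names are inspected."""
    selected = []
    for name in names:
        cam, ts, _ = parse_frame_name(name)
        if start <= ts <= end and (cameras is None or cam in cameras):
            selected.append(name)
    return selected
```

For example, a query for 10:00 to 10:30 on 1 July 2016 restricted to camera V2 would call `select_frames(names, start, end, cameras={"V2"})`, so only the matching files, and hence only their blocks, need to be read.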

Data Processing
In the Hadoop MapReduce workflow shown in Figure 3, the user enters a query with the time of interest and sends it to the master. On the master, the JobTracker divides the job into tasks and sends them to the TaskTrackers running on the slave nodes. The tasks are executed by the map and reduce functions, respectively. The DataNode identification and data processing algorithm is described in Algorithm 2. The results are accumulated by the TaskTrackers and the final output is generated. To demonstrate the efficiency of the proposed approach, we detect motion in the video data around the time of occurrence of the event provided by the user. Moving object detection based on motion segmentation is itself a challenge in VSS.
Step 1: The user sends the query, in the form of a date and time, to the master.
Step 2: The master sends the computation to the JobTracker, and the NameNode identifies the data location.
Step 3: The JobTracker splits the job into tasks and assigns them to the TaskTrackers.
Step 4: The map and reduce functions execute the computation on the data residing in the DataNodes, and the results are sent back to the TaskTracker.
Step 5: The JobTracker accumulates the results from the TaskTrackers and forwards them to the master.
Motion segmentation has been studied extensively, and its methods are broadly classified into background subtraction and temporal differencing. In background subtraction [23], motion is detected by computing the difference between the present frame and a reference background, whereas in temporal differencing [24], motion is determined by computing the pixel-wise difference between the present frame and the previous frame. The motion detection algorithm proposed in [25] is used in our system for finding moving objects. Block matching is one of the efficient frame differencing methods for identifying moving objects. In block matching, as shown in Figure 7, the current frame and the previous frame are both divided into blocks, and each block of the current frame is searched for in the previous frame; if the block is found at a different pixel location, motion has occurred. Many techniques have been proposed for the matching computation. Following [25], we use the Sum of Absolute Differences (SAD) to measure the difference between two video frame blocks, as it is highly efficient. A lower SAD value means a higher similarity between the two blocks. It is calculated using Eq. (1).
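SAD = \sum_{i=1}^{N} \sum_{j=1}^{N} \left| a(i,j) - b(i,j) \right|    (1)
where a(i, j) and b(i, j) denote the values of the pixel at position (i, j) of the block in the earlier (reference) frame and in the present frame, respectively, and N is the block size. In addition, a threshold th is applied to SAD_0 to decrease processing time; it determines whether or not to initiate the search, according to Eq. (4).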

Search\ Decision = \begin{cases} 1, & \text{if } SAD_0 < th \\ 0, & \text{otherwise} \end{cases}    (4)
For each block, a moderate threshold th can be set as a fraction p of 256 × 15 = 3840, i.e., th = p × 3840, where 256 is the number of pixels in a 16 × 16 block and p lies in the range (0.1, 0.4).
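Equations (1) and (4) and the threshold above can be combined into a small block matching routine. This is a hedged sketch, not the exact method of [25]: frames are plain nested lists of grayscale values; we assume SAD_0 is the SAD of the co-located block (zero displacement); we read the decision rule as skipping the search when SAD_0 < th; and we use an exhaustive search over a ±radius window, where [25] may use a faster search pattern. The `per_pixel=15` default mirrors the factor 15 in the threshold formula; all names are hypothetical.

```python
def sad(cur, ref, cy, cx, ry, rx, n):
    """Eq. (1): sum of |a(i,j) - b(i,j)| between the n x n block of the
    reference frame at (ry, rx) and the n x n block of the current frame
    at (cy, cx)."""
    return sum(abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
               for i in range(n) for j in range(n))

def threshold(p, n=16, per_pixel=15):
    """th = p * (n * n * per_pixel); for n = 16 this is p * 3840."""
    return p * (n * n * per_pixel)

def match_block(cur, ref, cy, cx, n, th, radius):
    """Return (dy, dx), the offset of the best-matching block in `ref`
    for the current block at (cy, cx); (0, 0) means no motion found."""
    h, w = len(ref), len(ref[0])
    sad0 = sad(cur, ref, cy, cx, cy, cx, n)  # assumed SAD_0: zero offset
    if sad0 < th:
        # Decision value 1 (SAD_0 < th): treat the block as static, no search.
        return (0, 0)
    best, best_vec = sad0, (0, 0)
    # Exhaustive search in a (2 * radius + 1)^2 window inside the frame.
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= h - n and 0 <= rx <= w - n:
                s = sad(cur, ref, cy, cx, ry, rx, n)
                if s < best:
                    best, best_vec = s, (dy, dx)
    return best_vec
```

The early exit on SAD_0 is what saves time: static background blocks, usually the majority in surveillance footage, cost one SAD evaluation instead of (2 × radius + 1)² evaluations.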
The result is a set of motion vectors; by accumulating the results for every frame of a video, the final motion is plotted. To detect motion in the video frames, we use SAD as described above. The map function reads two frames as input and splits each frame into 32 × 32 pixel blocks; each block is assigned a key, and the corresponding value contains the 32 × 32 block. This output is called intermediate data.
Each key-value pair is passed to the reduce function in such a way that all values with the same key go to the same reducer. The reduce function performs motion detection and finds moving objects based on motion segmentation.
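The map, shuffle and reduce steps above can be simulated locally in plain Python to show how block keys route the data. This is an illustrative sketch, not Hadoop code: we assume the key is the (row, column) position of a 32 × 32 block, so that the previous and current versions of the same block meet at one reducer, and the reducer uses only the zero-displacement SAD (simple block differencing) instead of a full block search. All function names are hypothetical.

```python
from collections import defaultdict

BLOCK = 32  # block size used by the map function described above

def map_frames(prev, cur, block=BLOCK):
    """Map: emit ((block_row, block_col), (tag, pixels)) for every full
    block of both frames; edge pixels that do not fill a block are skipped."""
    for tag, frame in (("prev", prev), ("cur", cur)):
        for r in range(0, len(frame) - block + 1, block):
            for c in range(0, len(frame[0]) - block + 1, block):
                pixels = [row[c:c + block] for row in frame[r:r + block]]
                yield (r // block, c // block), (tag, pixels)

def shuffle(pairs):
    """Group values by key, as Hadoop's shuffle does before the reducers."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_block(key, values, th):
    """Reduce: SAD between the co-located previous and current blocks;
    the block is flagged as moving when this SAD reaches th."""
    blocks = dict(values)  # {"prev": pixels, "cur": pixels}
    sad0 = sum(abs(p - q)
               for prow, qrow in zip(blocks["prev"], blocks["cur"])
               for p, q in zip(prow, qrow))
    return key, sad0 >= th

def detect_motion(prev, cur, th, block=BLOCK):
    """Run map, shuffle and reduce locally; return {block_key: moving}."""
    groups = shuffle(map_frames(prev, cur, block))
    return dict(reduce_block(k, v, th) for k, v in groups.items())
```

Because both frames emit the same key for the same block position, each reducer receives exactly the pair of blocks it has to compare, which is what makes the per-block work independent and parallelizable.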

Result Accumulation
Finally, the results computed by the various reducers for all blocks of the video frame are accumulated, and a final output is obtained that displays the moving objects on the corresponding video frames. Algorithm 3 shows the data accumulation process. This approach can be used to detect an event in multiple streams when the user provides the possible time of occurrence of the event. From the above explanation, the proposed approach achieves the following:
• Efficient storage in HDFS of multi-stream video data accumulated from numerous surveillance cameras.
• Extraction of data based on the time of occurrence of the event provided by the user.
• Analytics of massive data with MapReduce in a short time.
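A possible shape for the accumulation step is sketched below: per-block reducer outputs are grouped by frame, and the moving blocks of each frame are summarized by a pixel bounding box that could be drawn on the frame. The record layout (frame name, block position, moving flag) and the bounding-box summary are our own assumptions, not necessarily what Algorithm 3 does; the function name is hypothetical.

```python
from collections import defaultdict

def accumulate(results, block=32):
    """Collect reducer outputs into one summary per frame.

    `results` holds (frame_name, (row, col), moving) triples (an assumed
    layout). For each frame, return (sorted moving block positions,
    bounding box (x0, y0, x1, y1) in pixels), or ([], None) if nothing moved.
    """
    per_frame = defaultdict(list)
    for frame_name, (row, col), is_moving in results:
        blocks = per_frame[frame_name]  # creates an entry even if nothing moves
        if is_moving:
            blocks.append((row, col))
    summary = {}
    for frame_name, blocks in per_frame.items():
        if not blocks:
            summary[frame_name] = ([], None)
            continue
        rows = [r for r, _ in blocks]
        cols = [c for _, c in blocks]
        box = (min(cols) * block, min(rows) * block,
               (max(cols) + 1) * block - 1, (max(rows) + 1) * block - 1)
        summary[frame_name] = (sorted(blocks), box)
    return summary
```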

RESULTS & PERFORMANCE EVALUATION
We have analyzed the performance of our proposed framework in the following manner:
• By measuring the computation time for varying
The proposed work is implemented on an Intel Core i5 3.10 GHz machine with 4 GB of memory, running Ubuntu 12.04 and Hadoop 1.2.1. We used clusters of 5 and 10 nodes for performance evaluation, and evaluated the effect of varying the file size and the data size of the cluster on the computation time. Motion detection is performed on grayscale video frames of 256 × 256, 512 × 512 and 1024 × 1024 pixels; color video frames are converted to grayscale before processing. Figures 4 to 6 show the original frames of various video sequences and the corresponding segmented frames in which motion is identified. JavaCV, a wrapper for the OpenCV library [31], is used to plot the motion vectors. The experiments are performed on openly accessible standard datasets: Change Detection Benchmark [28], Laboratory for Image & Media Understanding (LIMU) [29], Context Aware
Our proposed framework efficiently reduces the storage space in HDFS; the data size reduction results are shown in Table 1. The first column of the table gives the original data size; the second column gives the data size when only every fifth frame is stored, a reduction of about 80-85%; and the third column gives the data size after the further compression due to sequence file generation, which is about 80%. These results show the storage efficiency of our approach.
The scalability and robustness of the framework are evaluated by analyzing the multi-stream video data on clusters with different numbers of nodes, which also shows the achievable speedup. The parallel speedup S_p is given by Eq. (5):
S_p = \frac{T_1}{T_n}    (5)
where T_1 is the total execution time on a one-node cluster and T_n is the total execution time on an n-node cluster, with n > 1. The value of S_p indicates how many times faster the parallel execution is than running the same MapReduce algorithm on a single-node cluster; a value greater than 1 means there is at least some gain from doing the work in parallel. Figure 7 shows the execution time for video frames of 256 × 256, 512 × 512 and 1024 × 1024 pixels when processed sequentially (a simple Java program) and on MapReduce clusters with various numbers of nodes, together with the computed speedup. The processing time is the total time to perform motion detection on the required data: in both the MapReduce and the sequential settings, we searched 100 MB of data and then performed motion detection on it.
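Eq. (5) is straightforward to compute from measured run times. The sketch below applies it to a table of total execution times indexed by cluster size; the function names are hypothetical, and the numbers in the example are made up for illustration and are not measurements from this paper.

```python
def speedup(t1, tn):
    """Parallel speedup S_p = T_1 / T_n, as in Eq. (5)."""
    if t1 <= 0 or tn <= 0:
        raise ValueError("execution times must be positive")
    return t1 / tn

def speedup_table(times):
    """Given {nodes: total execution time}, return {nodes: S_p} relative
    to the single-node time, for every cluster size n > 1."""
    t1 = times[1]
    return {n: speedup(t1, tn) for n, tn in sorted(times.items()) if n > 1}
```

With hypothetical times `speedup_table({1: 600.0, 5: 200.0, 10: 120.0})` returns `{5: 3.0, 10: 5.0}`; any value above 1 indicates a gain from parallel execution.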

Analysing Task Execution Time by Varying Number of Reducers (Workers) Performing the Job
We have also analyzed the performance of the motion detection algorithm by varying the number of reducers (workers). Figure 8 shows the effect of different numbers of reducers for various data volumes and frame resolutions. We also varied the number of map tasks, but the results showed no notable difference. For smaller data volumes the execution time is almost the same, but for larger data volumes the processing time is reduced considerably. The table clearly shows that it is not necessary that
• For low-resolution video frames, 250-300 reducers on average provide good results.
• For high-resolution video frames, 500-700 reducers on average provide good results.
These observations give prior information for setting the number of reducers, since finding the number of reducers that gives efficient results is a tedious task.
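Which reducer receives a given block depends on the hash of its key. The sketch below mimics the formula of Hadoop's default HashPartitioner, (hash & Integer.MAX_VALUE) % numReduceTasks, to count how many block keys land on each reducer for a chosen number of reducers. The string keys of the form "row_col" are hypothetical, and the Java-style 31-multiplier hash is only an illustration: the hash Hadoop actually applies depends on the key class. Python's built-in `hash()` is avoided because it is randomized per process for strings.

```python
def java_style_hash(s):
    """31-multiplier string hash wrapped to a signed 32-bit int, in the
    style of Java's String.hashCode() (illustrative only)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def partition(key, num_reducers):
    """Reducer index for a key, following the HashPartitioner formula:
    (hash & Integer.MAX_VALUE) % numReduceTasks."""
    return (java_style_hash(key) & 0x7FFFFFFF) % num_reducers

def reducer_load(keys, num_reducers):
    """Number of keys routed to each reducer."""
    load = [0] * num_reducers
    for key in keys:
        load[partition(key, num_reducers)] += 1
    return load
```

For a 1024 × 1024 frame split into 32 × 32 blocks there are 1024 such keys, so `reducer_load([f"{r}_{c}" for r in range(32) for c in range(32)], 500)` shows how they are spread over 500 reducers.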

CONCLUSION
We have proposed and implemented an efficient approach for performing post event investigation on a massive volume of surveillance data, which is one of the challenges of video surveillance systems. We have used Hadoop HDFS for distributed storage and Hadoop MapReduce for parallel and distributed processing of massive accumulated multi-stream video data. We have proposed an algorithm for efficiently storing video data in HDFS. Hence, when an event is triggered, we automatically extract the data based on the time of occurrence of the event and process it further to find useful information. To demonstrate the ability of the proposed approach to handle and process extremely large data, we have implemented a motion detection algorithm on a Hadoop cluster.
The Hadoop cluster used in our experiments consists of at most 10 nodes. Our experimental results indicate that the reduction in computation time is larger for higher pixel resolutions of the video frames. We also analyzed the performance by measuring the computation time for varying numbers of reducers (workers). Network latency also affects the execution time in a cluster; addressing it could further improve the execution time. Moreover, the computation time can be reduced further by increasing the number of nodes in the cluster. Our framework is robust and can cope with a varying number of nodes in the cluster as well as increasing data volume. Hadoop performs very well for applications that need the same task to be performed on different portions of data; however, applications that require different jobs to be performed on various data sets in parallel cannot be readily handled with Hadoop MapReduce.