Foreground algorithms for detection and extraction of an object in multimedia

ABSTRACT

Background subtraction of a foreground object in multimedia is one of the major preprocessing steps in many vision-based applications. The core logic for detecting moving objects in a video is to take the difference between the current frame and a reference frame, called the "background image"; this method is known as frame differencing. Background subtraction is widely used for real-time motion and gesture recognition in gesture-enabled items such as vehicles and automated gadgets. It is also used in content-based video coding, traffic monitoring, object tracking, digital forensics, and human-computer interaction. Nowadays, with advances in technology, most conferences, meetings, and interviews take place over video calls, and a conference-room-like atmosphere is not always readily available. To address this issue, an efficient algorithm for foreground extraction in multimedia on video calls is needed. The goal of this paper is not merely to build a background subtraction application for the mobile platform, but to optimize the existing OpenCV algorithms to work with the limited resources of a mobile platform without reducing performance. In this paper, various foreground detection, extraction, and feature detection algorithms are compared on a mobile platform using OpenCV. A set of experiments was conducted to appraise the efficiency of each algorithm over the others. The overall performance of these algorithms was compared on the basis of execution time, resolution, and resources required.

Keywords:
Background

INTRODUCTION
Background subtraction is the means of detaching the foreground objects from the background in a progression of video frames. In the last two decades there have been numerous improvements in background subtraction methods. The technique has broad usage in crucial applications such as visual monitoring, sports video judgement, action capture, and so on [1][2][3][4][5][6]. In this paper a comparative study of foreground detection and extraction algorithms is presented. For detection, the Haar-cascade, contour, watershed, and HOG methods are compared; for extraction, the Vab-cut and Grab-cut algorithms are compared. The main motive of this comparison is to determine which algorithm is better and provides a comparatively shorter execution time, since the main idea of this project is to run the extraction process on a mobile platform, and for that the execution time on desktop should already be minimal. In this paper, foreground segmentation in a video is done using OpenCV, which provides inbuilt methods for background subtraction. All the basic inbuilt methods [7] were implemented to build a basic understanding of how the code works and to check whether making minor changes in it affects the output. The aim was to find an efficient algorithm for foreground extraction in multimedia.
The main objective of this work is not merely to build a background subtraction application for the mobile platform, but to optimize the existing OpenCV algorithms to work with the limited resources of a mobile platform without reducing performance. The main objectives of the paper were as follows:
- To detect the foreground object and obtain its outline.
- To extract the foreground object effectively and vary the resolution of the video accordingly.
- To enable background replacement and reduce the execution time.
- To blend the new background with the extracted foreground object.
In paper [8] a CNN is used to obtain the foreground mask; the authors used the CDnet 2014 dataset and the SuBSENSE algorithm to prevent over-fitting. In paper [9] selective background subtraction is used: background modelling subtracts two consecutive frames to obtain the static pixels, and the proposed technique withholds only the unwanted background from the foreground and the backdrop scene instead of subtracting the whole background. Paper [10] focuses on reducing the holes that appear in the foreground mask, and explains and implements why three-frame differencing is used instead of two-frame differencing. Paper [1] focuses on the MOG method for object detection, examining five MOG parameters in particular: the segmented background weight threshold, the standard deviation scaling factor, the user-defined learning rate, the total number of Gaussian components, and the maximum number of components in the background model. Papers [11][12][13][14][15][16] address video segmentation for extracting the foreground object in a video, using the SURF algorithm for feature detection of the foreground object and eventually segmenting it effectively. Paper [17] discusses the Grab-cut algorithm, one of the familiar approaches for foreground extraction in the domain of image processing. In paper [1] the FCFN mechanism is used for background subtraction; FCFN is based on a fuzzy nearness degree and uses the fuzzy c-means clustering algorithm to identify whether a selected pixel is background or foreground.

RESEARCH METHOD

Framework model of background subtraction of an object in multimedia
Background subtraction of an object in multimedia (BSOM) is one of the primary steps in various vision-based operations. As the name suggests, background subtraction is the mechanism of separating foreground objects from the environment in a series of video frames. The main philosophy for identifying moving objects in the video is to subtract a reference frame, called the "background image", from the present frame; this scheme is known as frame differencing. Background subtraction is extensively used in traffic surveillance (detecting vehicles), human movement identification, human-machine interaction, object movement tracing, digital forensics, etc. In Figure 1, the video is first divided into frames. Then the estimated background, i.e. the previous frame, is subtracted from the current frame. A threshold is then applied to obtain a clear foreground mask; the threshold value can be set either manually or by using various thresholding algorithms.

The four primary steps in background subtraction are pre-processing, background modeling, foreground detection, and data validation. In the first step, the given video is broken down into frames and each input frame is converted into a NumPy array for further processing. From this array the background model is formed: for a video, two consecutive frames are subtracted to obtain the static pixels, which constitute the background. All pixels in a frame are then compared with the background model thus formed, which helps identify the significant foreground pixels effectively. Lastly, the unwanted pixels that constitute neither foreground nor background are removed to obtain a proper foreground mask, as shown in the last step of Figure 2.

Techniques used for BSOM
In this paper, various techniques are used for BSOM for foreground object detection and extraction [18][19][20][21]. Following section of the paper discusses techniques and its implementation:

Histogram of gradients (HOG)
In Figure 3 the HOG algorithm is shown. It uses an SVM detector to detect the foreground object in an image or a video, even at different scales. After the human foreground is detected, a rectangle is drawn around it and displayed as output.

Haar-cascade algorithm
In Figure 4 the Haar-cascade algorithm is shown step-wise. Suppose face detection is taken into consideration; then the input dataset will have images of faces in all their different views. The Haar features are identified in those images and an integral image is formed, as shown in Figure 5. These integral images are then given to various classifiers using the adaptive boosting technique. Finally, the machine is trained to identify faces in a given image or video.

Grab-cut algorithm
In Figure 6 the Grab-cut algorithm is shown. First a mask is applied to the input image to separate the foreground and background regions: mask values 0 and 2 mark (probable) background, while values 1 and 3 mark (probable) foreground. Finally, this mask is multiplied with the frame to obtain the foreground mask.

Contour algorithm
In Figure 7 the contour algorithm is given. The image is resized and a pyramid mean-shift filter is applied to it. The image is then converted to grayscale and a threshold value is applied. The inbuilt function findContours is used to identify the contours in the image, and the drawContours function draws the identified contours in the output.

Watershed algorithm
In Figure 8 the watershed algorithm is given. The input image is converted into grayscale and a suitable threshold value is applied. A suitable kernel size is chosen and morphological transformations are applied. The sure background and sure foreground are obtained and then subtracted to get the unknown region, which is the boundary of the foreground object in the input image. Of the algorithms above, Haar-cascade is used to detect the foreground object and Grab-cut is used to extract the detected foreground.

Analysis of various algorithms for BSOM
This section gives a detailed comparison of the various algorithms implemented for BSOM.
- Comparison of Histogram of Gradients (HOG) and Haar-cascade for foreground detection: the HOG method detects the full body of a human, unlike the Haar-cascade algorithm. The Haar-cascade algorithm has various XML files for different parts of the body, such as half body, face, full body, smile, etc. [22].
- Comparison of Grab-cut and Vab-cut: Grab-cut and Vab-cut are both used in image segmentation, and both help in extracting the foreground object. The drawback of the Grab-cut algorithm is that it is not optimized; Vab-cut is an extension of Grab-cut [23].
- Comparison of contours and watershed: contours and watershed are used for detecting the outline of the foreground object in an image or a video. The contour method uses the function findContours to identify contours in the given image or video, while in the watershed algorithm morphological transforms are applied to the image first and then the sure foreground and background are identified [24, 25].

RESULTS AND DISCUSSIONS
In this work the OpenCV tool was used, an open-source library that supports various programming languages and runs on several platforms. From Figures 9-11 it can be concluded that Haar-cascade is the better detection algorithm, as the input image or video does not always have to contain a full body. From Figures 12 and 13 it can be concluded that Grab-cut alone is not efficient in extracting the foreground object; Figure 14 clearly shows how manually masking the input image enables effective extraction with Grab-cut [22]. Figures 15 and 16 show how contours are drawn in an image and a video, and Figures 17 and 18 show how watershed differs from contours. In Figures 18 and 19, a difference in output is seen when a 30-second video is downloaded at different resolutions, to check the effectiveness of the output. Tabular values of Haar-cascade with Grab-cut are shown for a video downloaded at different resolutions in Table 1 and for different resolution values (changed by the code) in Table 2; a graphical representation of Table 2 is shown in Figure 21. For a 30-second video at a fixed resolution, a comparison of the execution time of each algorithm (used in the code) at different resolution values (changed by the code) is shown in Table 3, with its graphical representation in Figure 22. After surveying all these results, it was found that Haar-cascade with Grab-cut gave a better result than any other combination, as can be clearly seen in Figure 23, where a resolution of 144p is taken into consideration.

CONCLUSION
In this paper, various foreground detection and extraction algorithms are compared. The main objective of our project is to identify the inbuilt methods that give the best foreground mask. Of these algorithms, we found that the combination of Grab-cut and Haar-cascade is the best for extracting the foreground object. As clearly seen in Figure 23, a live video taken from the camera at a resolution of 144p has Grab-cut applied to it. The only drawback of this method is that the light coming from the window (as shown in the figure) cannot be cut out using Grab-cut. Further work can focus on optimizing the Grab-cut algorithm to extract the foreground object even when a light source is near it.