Detecting and Shadows in the HSV Color Space using Dynamic Thresholds

Received Oct 1, 2017; Revised Dec 20, 2017; Accepted Jan 15, 2018

The detection of moving objects in a video sequence is an essential step in almost all computer vision systems. However, because of the dynamic changes in natural scenes, motion detection becomes a more difficult task. In this work, we propose a new method for the detection of moving objects that is robust to shadows, noise and illumination changes. For this purpose, the detection phase of the proposed method is an adaptation of the MOG approach in which the foreground is extracted in the HSV color space. To keep shadows out of the detection result, we developed a new shadow removal technique based on dynamic thresholding of the detected foreground pixels. The threshold is computed from two statistical analysis tools that take into account the degree of the shadow in the scene and the robustness to noise. Experiments on a set of video sequences showed that the proposed method provides better results than existing methods that are limited to static thresholds.


INTRODUCTION
The detection of moving objects in a video sequence is an important task in computer vision. It plays a particularly important role in video surveillance systems because the detection result influences all later steps. The detection step can be very complex due to the presence of disruptive elements in the environment of the object. Factors such as weather conditions, changes in scene lighting, and the presence of shadows or of moving background elements (tree branches, windows, a computer screen, etc.) may negatively influence the detection process.
To address these problems, several approaches have been proposed in the literature. These include background modeling methods, which can be classified into several categories: basic, statistical, fuzzy, etc. [1]. A thorough analysis of these methods has shown that statistical methods are generally more robust to illumination changes and dynamic backgrounds [2]-[4].
The principle of detecting moving objects in a video sequence is based on classifying the pixels of the image into foreground (moving) and background (static). Given that the background of a video sequence often contains non-static objects such as tree branches, we have chosen to use the mixture of Gaussians (MOG) method for background modeling, as it is the best adapted to such situations [1], [5]. Note that the original MOG method [2] uses the components of the RGB color space, which are very sensitive to changes in lighting and are not independent. This is why, in some previous works, other color spaces that are more robust to changes in lighting and have independent components were preferred. The color spaces most used in these works are normalized RGB [6], YUV [7], [8], HSV [8] and LUV [9].
One of the important issues encountered when detecting moving objects is the impact of shadows, whose presence in an image can change the perceived shape and color of the objects. Unfortunately, object points and the points of their associated shadows share two important visual characteristics: movement and shape [10]. Therefore, however the background is updated, the moving points corresponding to objects and to shadows are detected simultaneously and grouped together. This greatly alters the shape of the detected silhouettes and negatively affects several tasks that depend on the detection, namely tracking and classification algorithms as well as the estimation of the position of the moving objects. To overcome these limitations, we start from the assumption that shaded areas can be detected by choosing a color space that separates chroma and intensity better than the RGB space does [11]-[14]. The choice of the HSV color space is motivated by its capacity to separate the intensity component (V) from the chromatic components (H and S) [15]. However, when the saturation S of a pixel is below a certain threshold, the pixel is considered "achromatic" and its saturation S and hue H components are no longer taken into account. Consequently, our analysis is based only on the intensity component V to decide whether the pixel belongs to a new element or to the background. The HSV color space separates chroma and intensity well [16].
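The achromatic rule described above can be illustrated with a minimal sketch. The function name `usable_components` and the saturation threshold of 0.1 are assumptions made for the illustration; the paper does not state the threshold value.

```python
import colorsys

def usable_components(r, g, b, saturation_threshold=0.1):
    """Return the HSV components considered reliable for an RGB pixel in [0, 1].

    The threshold value is an illustrative assumption, not a value from the paper.
    """
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if s < saturation_threshold:
        # Achromatic pixel: hue and saturation are ignored, only V is kept.
        return {"V": v}
    return {"H": h, "S": s, "V": v}

gray_pixel = usable_components(0.5, 0.5, 0.5)   # a mid-gray pixel
red_pixel = usable_components(1.0, 0.0, 0.0)    # a saturated red pixel
```

For the gray pixel only the V component is returned, whereas the saturated red pixel keeps all three components.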
In what follows, to detect shadows in a video sequence, we use the color information available in the HSV space. In principle, a shaded background should have an identical color with lower brightness. Methods have already been proposed to detect and remove shadows in the HSV space. However, these methods use only static thresholds to separate shadows from the foreground [12], [16], [17]. As time passes, changes in lighting make it increasingly difficult to remove shadows correctly with static thresholds. This is why we propose to use a dynamic threshold instead. Indeed, since shadows are strongly related to lighting, they can be removed more accurately by changing the threshold values dynamically, taking into account the degree of shade and the noise. This paper is organized as follows: Section 2 is devoted to previous work on the detection of moving objects and on the detection and removal of shadows. In Section 3, we present the proposed method for detecting and removing shadows, and recall the principle of the statistical method used for background modeling. The results obtained by our method, together with a comparison with existing methods, are detailed in Section 4. Finally, the last section concludes with a discussion of the performance of the proposed method.

METHODS FOR DETECTING MOVING OBJECTS AND THEIR SHADOWS
Given the importance of detecting moving objects in a video sequence, several approaches have been developed to provide methods that are robust to complex conditions (non-rigid objects, dynamic backgrounds, etc.) [18], [19]. Given the multitude of methods proposed in the literature for the detection of moving objects, and according to the works presented in [20]-[24], we can distinguish two major categories of detection methods: those with and those without background modeling [1].
Detection techniques without background modeling generally consist of performing a spatiotemporal difference of the pixel intensity values that constitute the frames of the video sequence [25]. In their simplest form, these techniques use only the pixels of two consecutive frames of the video. Although these techniques manage to extract most of the relevant pixels of moving regions, they are generally sensitive to dynamic changes such as changes in lighting or in the background image [26]. Thus, more sophisticated methods, which use statistical characteristics of each pixel, were developed to overcome the shortcomings of detection methods without background modeling.
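The simplest form mentioned above, a difference between two consecutive frames, can be sketched as follows. The function name `frame_difference_mask` and the threshold of 25 gray levels are assumptions chosen for the illustration.

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose intensity changed by more than `threshold` between two frames."""
    # Cast to a signed type so that subtracting uint8 images cannot wrap around.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

# Toy 4x4 grayscale frames: a bright 2x2 block moves one pixel to the right.
prev_frame = np.zeros((4, 4), dtype=np.uint8)
curr_frame = np.zeros((4, 4), dtype=np.uint8)
prev_frame[1:3, 0:2] = 200
curr_frame[1:3, 1:3] = 200

mask = frame_difference_mask(prev_frame, curr_frame)
```

In this toy case only the leading and trailing columns of the block are marked as changed, while the column where the block overlaps itself is missed, which hints at why such techniques tend to recover only part of a moving region.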
The first detection methods with background modeling modeled each pixel by a single Gaussian [27]. To account for the multi-modal aspect, Stauffer and Grimson [27] were among the authors who proposed a Gaussian mixture model. Kim et al. [4] proposed a "codebook" dictionary method that can handle a moving and noisy background but requires a learning step that can be long. As a general rule, background modeling techniques are robust to noise and can adapt to the presence of new objects in the background. However, they remain vulnerable to many environmentally induced phenomena such as shadows [10].
The detection of shadows has been the subject of several works. The proposed methods have been used in different areas such as the recognition of moving objects, video surveillance and road traffic tracking. Andres et al. [28] undertook a comparative study of recent shadow detection methods. They classified these methods into a feature-based taxonomy with four categories: chromaticity, physics, geometry and textures. Other researchers concerned with the same issue include Wenbo et al. [14], who proposed a model to predict the threshold used to detect the shadow, and Cucchiara et al. [12], who put forward a technique that uses the information obtained in the HSV color space to eliminate shadows: the observed values of the three components of the HSV space are compared to those of the background model. Moreover, Cucchiara et al. [11] described an extension of the method presented in [12] in which a higher-level reasoning component classifies regions as background, shadow, ghost, ghost shadow, or moving object.
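The HSV comparison with static thresholds attributed to Cucchiara et al. [12] is commonly written as a test on the brightness ratio together with bounds on the saturation and hue differences. The sketch below follows that commonly cited form under stated assumptions: HSV values are scaled to [0, 1], the bounds 0.4 and 0.6 are the values used later in the experiments, and the saturation and hue tolerances `tau_s` and `tau_h` are hypothetical values picked for the example.

```python
import numpy as np

def static_shadow_mask(frame_hsv, background_hsv,
                       alpha=0.4, beta=0.6, tau_s=0.1, tau_h=0.1):
    """Flag pixels that look like a darker copy of the background (static thresholds)."""
    h, s, v = (frame_hsv[..., i] for i in range(3))
    bh, bs, bv = (background_hsv[..., i] for i in range(3))
    # Brightness ratio between the current frame and the background model.
    ratio = v / np.maximum(bv, 1e-6)
    # Hue is circular, so the difference is measured the short way around.
    hue_diff = np.abs(h - bh)
    hue_diff = np.minimum(hue_diff, 1.0 - hue_diff)
    return ((ratio >= alpha) & (ratio <= beta)
            & ((s - bs) <= tau_s) & (hue_diff <= tau_h))

# One row of three pixels over a uniform background (H, S, V) = (0.30, 0.50, 0.80).
background = np.tile(np.array([0.30, 0.50, 0.80]), (1, 3, 1))
frame = np.array([[[0.30, 0.50, 0.40],    # same color, half the brightness
                   [0.90, 0.80, 0.40],    # different color: a real object
                   [0.30, 0.50, 0.80]]])  # unchanged background
shadow = static_shadow_mask(frame, background)
```

Only the first pixel, which keeps the background color at half its brightness, is flagged as shadow. Because `alpha` and `beta` are fixed, the same bounds are applied whatever the lighting of the scene.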
In [10], the authors developed a shadow detection technique applicable in the context of background subtraction. Their technique considers two measures: one characterizing brightness distortion and the other color distortion. When detecting changes, both measures are compared to thresholds to determine whether a change is a shadow or a significant change. This technique aims especially at reducing computation time. Nadimi and Bhanu [29] introduced a method that keeps the pixels corresponding to the shadows while eliminating others. Horprasert et al. [30] proposed a method based on the RGB color space that classifies pixels into four categories: original background, illuminated background, shadowed background, and moving pixels. To this end, two steps are added to the method based only on the RGB color space: chromatic distortion and brightness distortion.
Prajakta et al. [31] proposed two techniques to detect shadows in outdoor images with a constant background under variable lighting conditions. In the first technique, they used global thresholding on the H and V values of the HSV color space to compute the ratio map. The threshold used to produce the binary image from the global image is then determined using Otsu's method [32]. The gradient map is computed using the Sobel operator and the V component of the HSV space. The shadow is then obtained from the gradient map using the threshold of the global image. In the second technique, the image in the HSV color space is converted into an RGB color image in order to build the ratio map. The threshold used to produce the binary image is then determined using Otsu's method. The regions are labeled in order to eliminate all small areas and to obtain the local shadow threshold using Otsu's method. Finally, Priya and Kirtika [33] proposed a new algorithm that uses chromaticity to detect and remove shadows from outdoor still images.
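Otsu's method [32], used above to binarize an image, selects the gray level that maximizes the between-class variance of the histogram. The following is a minimal sketch of that idea for 8-bit images; the function name `otsu_threshold` and the toy image are assumptions made for the illustration and do not reproduce the pipeline of [31].

```python
import numpy as np

def otsu_threshold(gray, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist, _ = np.histogram(gray, bins=levels, range=(0, levels))
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # weight of the low class up to each level
    mu = np.cumsum(prob * np.arange(levels))   # cumulative first moment up to each level
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Between-class variance; levels where one class is empty keep a score of 0.
    sigma_b = np.zeros(levels)
    valid = denom > 0
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

# Toy image with two well separated gray levels.
gray = np.array([[50, 50, 200, 200],
                 [50, 50, 200, 200]], dtype=np.uint8)
t = otsu_threshold(gray)
binary = gray > t
```

On this bimodal toy image the selected level falls between the two gray values, so the binary image isolates the four bright pixels.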

EXTRACTING OBJECTS FROM THE FOREGROUND
When extracting objects from the foreground [34], the method proposed in this work aims to achieve two objectives. The first is to achieve good detection of moving objects in the scene, while the second is to ensure that this detection does not confuse the detected objects with their shadows, which are to be removed afterwards. In addition, to be operational in any environment, the proposed method has to be robust to changes in scene brightness. In what follows, we present the method used for foreground extraction.

Model of the background
Several color models, such as the RGB, HSV, YUV and L*a*b spaces, have been used for the statistical modeling of the background of a video sequence. In our case, we propose to use the HSV color space, whose advantage lies in its invariance with respect to luminosity.

Modeling of the background in the HSV color space
To track and analyze the characteristics of an object (its movement, speed, trajectory, etc.), it is first necessary to detect it. The basic detection technique models the background from multiple images that are acquired sequentially.
Color characteristics refer primarily to the components of color spaces, which can be treated separately or jointly. Although the RGB space is the most widespread [35], some authors use other color spaces such as HSV [12], [36], while others [16] use the L*a*b space. These spaces are more robust than the RGB space [11] because they increase invariance to changes in luminosity and lighting and to the presence of shadows. A recent study [16] has shown that the HSV color space performs better than L*a*b.
To extract the foreground pixels, we propose to use, for each background pixel, an adaptation of the mixture of Gaussians (MOG) model proposed by Stauffer et al. [2], which consists in changing the color space. We work in the HSV color space and rely solely on the intensity V to decide whether the pixel belongs to a new element or to the background. The observations of a pixel, which vary over time, are thus considered as a process $X_t$ defined as follows:
a. The process $X_t$ is initialized by the recent pixel values:
$$\{X_1, \ldots, X_t\} = \{x_i : 1 \le i \le t\}$$
Since we are interested only in the intensity component (V), we keep only the coordinate $V_i$ of $x_i$, that is to say:
$$\{X_1, \ldots, X_t\} = \{V_i : 1 \le i \le t\}$$
b. Each pixel of the reference image is modeled by a mixture of k probability densities. The probability that a pixel belongs to the background is given by:
$$P(X_t) = \sum_{i=1}^{k} \omega_{i,t}\, \eta(X_t, \mu_{i,t}, \Sigma_{i,t})$$
where:
k: the number of Gaussians used in the mixture. In our case, this number varies from 0 to 3 instead of 3 to 5.
$\omega_{i,t}$: a weight assigned to each Gaussian, representing the proportion of the data used in the calculation of the Gaussian at time t.
$\eta$: a multidimensional Gaussian function defined by a mean vector $\mu_{i,t}$ and a covariance matrix $\Sigma_{i,t} = \sigma_{i,t}^2 I$, where I is the identity matrix.
Once the proposed MOG-based model is built, it is updated over time using the expectation-maximization (EM) algorithm through the equations proposed in [10]. Note that, in order to remove isolated pixels and extract the moving objects, we made use of mathematical morphology techniques.
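Since only the V component is kept, the mixture reduces to a sum of one-dimensional Gaussians. The sketch below evaluates such a mixture for a single pixel; the function names and the numerical values of the weights, means and variances are assumptions chosen for the illustration, and the model update is not shown.

```python
import numpy as np

def gaussian_pdf(v, mean, var):
    """One-dimensional Gaussian density evaluated at v."""
    return np.exp(-0.5 * (v - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mixture_probability(v, weights, means, variances):
    """Weighted sum of the Gaussian densities of one pixel, evaluated at intensity v."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    return float(np.sum(weights * gaussian_pdf(v, means, variances)))

# Hypothetical model of one pixel with k = 3 Gaussians on V scaled to [0, 1].
weights = [0.7, 0.2, 0.1]
means = [0.30, 0.55, 0.90]
variances = [0.01, 0.02, 0.01]

p_near = mixture_probability(0.30, weights, means, variances)  # at the dominant mode
p_far = mixture_probability(0.75, weights, means, variances)   # between two minor modes
```

A value close to the dominant mode receives a much higher density than a value lying between the minor modes, which is the information the classification step builds on.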

Extraction of the foreground
In our algorithm, the decision on whether a pixel of a given image belongs to the background or to a foreground element is made by computing a Mahalanobis-type distance [37]. This distance is calculated between the most recent value of the pixel p and each of the Gaussians for that pixel. The pixel value is matched to a Gaussian, computed on the V component of the HSV color space, when this distance satisfies the matching criterion. The foreground image is then obtained by applying a threshold to the weight of the matched Gaussian to determine whether it corresponds to the background or to the foreground.
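The two-stage decision described above, matching by distance and then thresholding the weight, can be sketched for a single V value. In one dimension the Mahalanobis distance reduces to the absolute deviation divided by the standard deviation. The matching bound of 2.5 standard deviations is the convention usually associated with the MOG method and is assumed here, as are the weight threshold of 0.25 and the function name `classify_pixel`.

```python
import numpy as np

def classify_pixel(v, weights, means, variances, match_k=2.5, weight_threshold=0.25):
    """Return 'background' or 'foreground' for one intensity value v.

    match_k and weight_threshold are illustrative assumptions.
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # One-dimensional Mahalanobis distance to every Gaussian of the pixel.
    distances = np.abs(v - means) / np.sqrt(variances)
    matched = np.flatnonzero(distances < match_k)
    if matched.size == 0:
        return "foreground"  # no Gaussian explains the observed value
    best = matched[np.argmin(distances[matched])]
    # A heavily weighted Gaussian is assumed to describe the background.
    return "background" if weights[best] >= weight_threshold else "foreground"

weights = [0.7, 0.2, 0.1]
means = [0.30, 0.55, 0.90]
variances = [0.0004, 0.0004, 0.0004]  # standard deviation of 0.02

label_bg = classify_pixel(0.31, weights, means, variances)
label_rare = classify_pixel(0.91, weights, means, variances)
label_new = classify_pixel(0.70, weights, means, variances)
```

The first value matches the dominant Gaussian and is labeled background; the second matches only a low-weight Gaussian and the third matches none, so both are labeled foreground.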

Detection and removal of the shadow
The mixture of Gaussians technique can detect any moving object in a scene but does not distinguish between the detected objects and their shadows. The purpose of this sub-section is to improve our moving object detection system so that shadows are no longer classified as foreground elements.

Removing shadows
In many cases, detecting the shadow of an object is not easy because the points belonging to the object and those corresponding to the shadow can have a similar visual appearance, especially when working with gray levels. Shadow detection is based solely on a syntactic discrimination between the appearance of shadows and that of objects in terms of brightness and color. In our case, we need a process that detects the shadow in order to remove it, using the results obtained in the background extraction phase described in the previous section. Thus, to distinguish moving objects from their shadows, we use the HSV color space, and our algorithm relies on the foreground of the V (brightness) component. A point (x, y) is classified as a shadow when it satisfies the following property: the inverse of the foreground of the V component is bounded above by a dynamic threshold, built from two factors A and B, whose value is determined later.
This criterion is derived from the fact that the presence of a shadow in an area often results in a significant change in brightness without a large modification of the color information. This is why we require that the inverse of the background be compared to a threshold built from A and B (with a value strictly between 0 and 1). The first factor, A, accounts for the strength of the shadow (the lower the value of A, the darker the shadows cast on the covered objects), while the second factor, B, is used to increase robustness to noise (the brightness of the current image cannot be too close to that of the background).

Dynamic and automatic thresholding
The calculation of the dynamic threshold for classifying points as shadow or non-shadow requires determining the factors A and B. For this purpose, the following procedure is applied: a. Once the absolute difference between the current image and the background in the HSV color space has been measured, calculate the median MED and the median absolute deviation MAD of this difference within the foreground mask $F_m$ detected in the previous section.
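The two statistics of this step can be computed as in the sketch below, which uses the standard definitions of the median and of the median absolute deviation. Restricting the difference to the V channel, the function name `med_and_mad` and the toy values are assumptions made for the illustration; the way the factors A and B are then derived from MED and MAD is not reproduced here.

```python
import numpy as np

def med_and_mad(current_v, background_v, foreground_mask):
    """Median and median absolute deviation of |current - background| inside the mask."""
    diff = np.abs(current_v - background_v)[foreground_mask]
    med = np.median(diff)
    mad = np.median(np.abs(diff - med))
    return float(med), float(mad)

# Toy 2x2 example on V values expressed in gray levels.
current_v = np.array([[50.0, 130.0],
                      [230.0, 100.0]])
background_v = np.full((2, 2), 150.0)
foreground_mask = np.array([[True, True],
                            [True, False]])

med, mad = med_and_mad(current_v, background_v, foreground_mask)
```

Inside the mask the absolute differences are 100, 20 and 80, so the median is 80 and the median absolute deviation is 20. Both statistics are insensitive to a few extreme pixels, which is consistent with the stated goal of robustness to noise.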

Improving the detection quality
The elimination of shadows is not enough to make an image usable. In the majority of cases, there remain isolated pixels that do not belong to the detected object. In addition, "holes" are easily noticeable on the detected objects. To overcome this limitation and provide a uniform, noise-free result, an improvement of the detection quality is generally required.
To this end, we propose to use mathematical morphology operators, which are powerful tools. In our case, the objective is to rid the image of any isolated pixel, for which an erosion operation is suitable. Note that this step can optionally be performed before the identification of the shadows. The following figure shows the two operations, which consist in detecting the object and removing its shadow.
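The effect of erosion on isolated pixels can be illustrated with a minimal NumPy sketch. The 3x3 square structuring element and the function name `erode3x3` are assumptions made for the example; the paper does not specify the structuring element used.

```python
import numpy as np

def erode3x3(mask):
    """Binary erosion with a 3x3 square structuring element.

    Pixels outside the image are treated as background.
    """
    rows, cols = mask.shape
    padded = np.pad(mask.astype(bool), 1, mode="constant", constant_values=False)
    out = np.ones((rows, cols), dtype=bool)
    # A pixel survives only if all nine pixels of its neighborhood are set.
    for dr in range(3):
        for dc in range(3):
            out &= padded[dr:dr + rows, dc:dc + cols]
    return out

# Toy foreground mask: a 4x4 object plus one isolated noise pixel.
mask = np.zeros((7, 8), dtype=bool)
mask[1:5, 1:5] = True
mask[5, 6] = True

eroded = erode3x3(mask)
```

The isolated pixel disappears, but the object also shrinks to its 2x2 core. A dilation applied afterwards with the same element (a morphological opening) is a common way to restore the object size.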

EXPERIMENTS
This section presents the results of the tests we undertook using the proposed method on a set of video sequences taken from the PETS-ECCV'2004 CAVIAR videos, the PETS-2006 dataset, and a scene from the campus of the University of San Diego. The objective of these tests is to evaluate the ability of our method to detect moving objects while removing their shadows.

Results for detecting moving objects
The comparative tests carried out in this section aim to show the benefit of using the HSV color space in the detection phase. For this purpose, our detection method was compared with that developed in [38], [39], which is an adaptation of the MOG method. Note that in these tests, our detection method was limited to 3 Gaussians and used the HSV color space, or more precisely the V component of this space. For the MOG method, we considered the cases of 3 and 5 Gaussians in the RGB color space. For these three methods, we applied morphological operators to get rid of isolated pixels. The results obtained are shown in Figure 2. They show that the MOG method produces detection errors, which may be either holes in the detected object or a poor detection of objects in the scene caused by changes in brightness. The results also show that our method is capable of detecting objects accurately using only 3 Gaussians. To improve its detection quality, the MOG method requires increasing the number of Gaussians to 5, which increases computation time.

Results concerning the suppression of the shadow
To test the performance of the proposed method with regard to shadow removal, we applied it to various scenes under different conditions. Figure 3 shows an example of the results obtained for two very close objects as well as for a case that includes a change of brightness. The results show that our method, based on dynamic thresholding in the HSV color space, provides better results both when detecting objects and when removing shadows. This confirms its robustness to changes in brightness.

Comparison of the proposed method with other existing methods
To evaluate the performance of our method, we compared it with two existing methods that detect objects while removing shadows. The comparison takes into account the following parameters: the type of threshold used for shadow detection and the response time. The comparative study therefore involves three methods for detecting and removing shadows. The first method is the one implemented in the OpenCV library, which uses the techniques proposed in [38]-[40]. This method is applied in the HSV space. The second method is the one proposed by Cucchiara et al. [12]. The third is the method proposed in the present work.
An evaluation of the above-mentioned methods in terms of detection and removal of shadows was performed on video sequences in the HSV color space. The thresholds used to remove the shadow are static for the first and second methods. For example, for the first method the shadow threshold parameter is set to 0.5, while for the method of Cucchiara et al. we used the values of the parameters α and β given in [4], namely 0.4 for α and 0.6 for β. The method proposed in this paper, on the other hand, uses dynamic thresholding calculated using the approach presented in Section 3. The results are shown in Figure 4. Analysis of these results shows that the proposed method is more effective at detecting and removing shadows than the other two methods. Indeed, the method implemented in OpenCV and that of Cucchiara et al. produce errors when eliminating shadows (these errors are surrounded by a green rectangle in Figure 4), which is not the case for our method, since it is able to properly remove the shadows in the scene.

CONCLUSION
In this paper, we proposed a new method to extract objects of interest from video sequences while removing their shadows. For the detection phase, our method is based on background modeling, which enables the classification of pixels into background and foreground. For this purpose, we proposed an adaptation of the MOG method in which the foreground is extracted using information from the intensity component (V) of the HSV color space, while the background model is updated to take its potential variations into account.
The presence of shadows in the image perturbs the object extraction model. To detect the shadows that are to be removed, the shadow elimination phase of the proposed method is based on the HSV color space and uses a dynamic threshold. The implementation of this method on video sequences has shown that it works properly, as it correctly extracts moving objects while removing their shadows.