Self scale estimation of the tracking window merged with adaptive particle filter tracker

Tracking a mobile object is one of the important topics in pattern recognition, but it still faces some obstacles. A reliable tracking system must adjust its tracking window in real time according to appearance changes of the tracked object. Furthermore, it has to deal with many challenges when one or multiple objects need to be tracked, for instance when the target is partially or fully occluded, when the background is cluttered, or when some target region is blurred. In this paper, we present a novel approach for single object tracking, named the multi-scale adaptive particle filter (MSAPF) tracker, which combines the particle filter algorithm with a kernel distribution and updates its tracking window according to object scale changes. We demonstrate that using the particle filter combined with a kernel distribution inside the resampling process provides more accurate object localization within the area of research. Furthermore, its mean target localization error was significantly lower than 21.37 pixels. We have conducted several experiments on real video sequences and compared the acquired results with other existing state-of-the-art trackers to demonstrate the effectiveness of the multi-scale adaptive particle filter tracker. This is an open access article under the CC BY-SA license.


INTRODUCTION
Tracking mobile objects is a challenging and important topic in computer vision. It serves various applications, such as traffic monitoring [1], video surveillance [2], autonomous driving, braking assistance and human behavioral analysis [3]. Despite the research of the past decades, building a robust and effective tracking system remains a wide area of research in this digital era. What makes this field challenging is the heterogeneity of its problems; accordingly, every object tracking approach has to prove its performance under many constraints. In the literature, a large number of methods use the same descriptors to represent objects, for instance color, shape, texture, or features represented in a frequency space [4]; in some cases the features are combined. Tracking algorithms can be categorized into two classes: deterministic approaches and probabilistic approaches. Deterministic approaches follow targets across frames by searching iteratively for the area most similar to the target window, maximizing a similarity measure between those areas. A typical deterministic method is mean shift. The authors of [5] propose a face tracking algorithm with an improved implementation of mean shift, in which face identification is based on the Viola and Jones algorithm; they also use a corrected background-weighted histogram to reduce noise on the face during the process. More deterministic approaches were adopted by

METHOD
Particle filter
The particle filter, or sequential Monte Carlo method [30], estimates the unknown state $x_t$ at time $t$ from a group of observations $o_t^{1 \ldots N}$ influenced by noise, which are modeled by (1) and (2), where $x_t$ is the system state, $o_t$ is the observation, $v_t$ is the observation noise, $h_t$ is the observation model, $p(x_t \mid x_{t-1})$ is the probability distribution of the state $x$ at time $t$ and $p(o_t \mid x_t)$ is the probability distribution of the observation $o$ at time $t$.
ISSN: 2088-8708
The main aim of the PF is to find a good approximation of the system state model, which is done with a set of weighted samples $X = \{(x^{(i)}, w^{(i)}) : i = 1, \ldots, N\}$, where $X$ is the set of samples. To find the best set of particles we use a proposal distribution $g(\cdot)$. According to the sampling principle, we account for the difference between these distributions by assigning a proportional weight to each particle using the function (3):
$$w^{(i)} \propto \frac{f(x^{(i)})}{g(x^{(i)})} \qquad (3)$$
where $f(\cdot)$ is the system distribution and $g(\cdot)$ is the proposal distribution. The weight values reflect the estimate of the probability distribution of the state at time $t$. At time $t + 1$ the particle set is propagated by a dynamic model and weighted with a likelihood function $w_i$. Next, in the resampling step, a new unweighted particle set is generated. This process is iterated until it finds the best estimate. In this work, we have chosen the particle filter (PF) with a model tracking strategy for several reasons. Firstly, PFs work well with high-dimensional problems. Secondly, they are not computationally expensive. Thirdly, the algorithm is flexible to implement. Finally, it integrates effectively with scale variation approaches.
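The predict, weight and resample cycle described above can be illustrated with a minimal sketch for a scalar state. It is not the tracker of this paper: the random-walk dynamic model, the Gaussian likelihood, and the names `particle_filter_step`, `motion_std` and `obs_std` are assumptions chosen only for illustration.

```python
import math
import random


def particle_filter_step(particles, observation, rng, motion_std=1.0, obs_std=1.0):
    """Run one predict / weight / resample cycle for a scalar state."""
    # Prediction: propagate every particle with a random-walk dynamic model
    # (assumption: the paper does not specify this model here).
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Weighting: Gaussian likelihood of the observation given each particle.
    # The tiny constant avoids an all-zero weight vector.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2) + 1e-300
               for p in predicted]
    total = sum(weights)
    weights = [w / total for w in weights]
    # State estimate: weighted mean of the particle set.
    estimate = sum(w * p for w, p in zip(weights, predicted))
    # Resampling: draw a new, unweighted set in proportion to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
for _ in range(10):
    particles, estimate = particle_filter_step(particles, 2.0, rng)
```

After a few iterations with a constant observation, the particle set concentrates around the observed value, which is the behavior the resampling step is meant to produce.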

Object's information measurement
Human visual perception exhibits a scale effect: the farther away an image is viewed, the fewer details are observed. Likewise, in object tracking, more information is available at a large scale than at a small scale. Qian et al. [29] proposed a method to measure the quantity of information of an object, which is used to update the tracking window size whenever the object information changes.

Image feature points types
In the work of Qian et al. [29], the primal sketch, which is the first step of visual processing [31], is considered the key to measuring image information, since it reflects the physical reality and gives the number of elements in the image. On that basis, two classes of image feature points were defined and used to compute the multi-scale information. Consider an image $f(x, y)$ and a pixel $P(x, y)$ of that image, which has eight nearest neighbors $N_l(P)$ in eight directions $l \in L$, $L = \{k\pi/4 : k = -4, \ldots, 3,\ k \in \mathbb{Z}\}$, as shown in Figure 1. These neighbors have the mathematical representation (4), and the directional differential is defined as (5),
where $l = k\pi/4$, $k \in [0, 3]$, $k \in \mathbb{Z}$. Based on (5) and (6), two classes of feature points were redefined [29]. First-class feature points are the extremum points of an image, that is, the points for which $\nabla_l f(x, y) \cdot \nabla_{l-\pi} f(x, y) > 0$ for all $l \in L$, $L = \{k\pi/4 : k \in [0, 3]\}$. Second-class feature points are the points whose neighbors have extrema along certain directions, that is, the points for which $\nabla_l g_k(x, y) \cdot \nabla_{l-\pi} g_k(x, y) > 0$ for all $l \in L$, $L = \{k\pi/4 : k \in [0, 3]\}$. The results of this process are shown in Figure 2: Figure 2(a) shows the input image of an alarm clock, Figure 2(b) shows the first-class feature points and Figure 2(c) shows the second-class feature points.
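The first-class condition can be sketched as follows. Since the definition of the directional differential is not reproduced here, the sketch assumes $\nabla_l f(x, y) = f(P) - f(N_l(P))$, so that a pixel qualifies when it is strictly greater, or strictly smaller, than both opposite neighbors along each of the four axes. The function name `first_class_points` and the offset table are hypothetical.

```python
# Offsets (dx, dy) of the neighbor in direction l = k*pi/4 for k = 0..3.
# The neighbor in the opposite direction l - pi is the negated offset.
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, -1)]


def first_class_points(image):
    """Return (x, y) of interior pixels that are extrema along all four axes.

    `image` is a list of rows of gray levels. The directional differential is
    assumed to be f(P) - f(N_l(P)); this is an illustrative choice.
    """
    rows, cols = len(image), len(image[0])
    points = []
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            center = image[y][x]
            # The product is positive only when both differences share a sign,
            # i.e. the pixel is a strict extremum along that axis.
            if all(
                (center - image[y + dy][x + dx]) * (center - image[y - dy][x - dx]) > 0
                for dx, dy in DIRECTIONS
            ):
                points.append((x, y))
    return points
```

Border pixels are skipped in this sketch because they lack a full set of eight neighbors.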

Object image information measure
For a gray-scale image $f(x, y)$, the information measure (IM) of the image is defined in (7), where $I_1$ denotes the first-class feature points and $I_2$ the second-class feature points. For color images, we sum the information measures of the three channels. Furthermore, changes in IM are used to predict the object size across a sequence.

Scale updating process
The scale of the tracking window is updated every $N$ frames according to the change of IM; Figure 3 shows the different IM size possibilities. The scale updating process begins by calculating the IM $I_1$ of the tracking window on the current $n$th frame, see Figure 3(a). We then calculate new IMs for two further window sizes, obtained by multiplying by factors of $1 + \alpha$ and $1 - \alpha$, and perform the same steps on the $(n + N)$th frame to get three IMs $I_4$, $I_5$ and $I_6$. Thus, based on the change of the information measure within each layer, we can judge whether the size of the object is increasing, as in Figure 3(b), or decreasing, as shown in Figure 3. The ratio of the information measure and the object size are related. Thus, when $I_5 > I_2$, the object size has increased, and the increasing ratio $S$ is defined as (8); when $I_2 > I_5$, the object's scale may have decreased, and the ratio $S$ is defined as (9), where $\beta$ is a parameter that eliminates the influence of the background. After calculating the scale change factor $S$ of the $n$th frame, the dimension of the tracking window is updated according to (10).

Proposed method
In this work, we propose a system based on a probabilistic algorithm. The purpose is to find a rough solution within the search space and then to improve it by propagating particles according to a distribution. The proposed method is applied to stationary-camera sequences, and the tracking process goes through the stages described in the following subsections.

Adaptive particle filter
As is known, the particle filter is a meta-heuristic method that looks for the best state estimate in a dynamical system. It is a hypothetical detector that approximates the filtered posterior distribution by a set of weighted particles: it weights the particles based on a likelihood score and then propagates them according to a movement model, which is explained in the next section. Based on the fundamentals of the PF, we propose our adaptive particle filter (APF) tracker. APF relies on a deterministic search window whose color content corresponds to a color histogram model, and the likelihood weights result from the normalized Euclidean distance between the target color histogram and the particle color histogram. In this application, the particle state models a location within the image: the state space is represented in the spatial domain as $P_i = (x, y)$, and the size of the object is fixed. After generating a set of $N$ particles, we initialize the states randomly for the first image $I_0$. Then, for each particle state $P_{1, \ldots, N}$, we calculate its normalized red, green, and blue (RGB) histogram $hist^{P_i}_{rgb} = [H^{P_i}_r\ H^{P_i}_g\ H^{P_i}_b]$ and the Euclidean distance to the target histogram in each color channel. As a result, we end up with three distances, given by (11), (12), and (13), where $Tr_r$, $Tr_g$ and $Tr_b$ are the target's histograms for the R, G and B channels respectively, and $d^{P_i}_r$, $d^{P_i}_g$ and $d^{P_i}_b$ are the Euclidean distances between the $i$th particle state and the target histogram. These three distance values represent the likelihood weights.
At the filtering stage, the PF relies on the weight values, so we group the three RGB distances into one normalized vector, see (14) and (15), which represents the weight vector of a particular state.
After computing the weight of each particle state, we minimize over the set of weights in order to deduce the state $P^t_i$ most similar to the target among the particle states at frame $I_t$, and we save its position on the frame. Based on this position, at the resampling stage we apply our proposed distribution resampling (DS), which generates new particle states for the next frame. The function is defined by (16) and (17).
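The weighting stage described above can be sketched in a few lines. The per-channel Euclidean distance follows the text; the number of histogram bins, the way the three distances are merged into one score (here their Euclidean norm), and all function names are assumptions made for illustration, since the normalization of (14) and (15) is not reproduced here.

```python
import math


def channel_histogram(values, bins=8):
    """Normalized histogram of non-empty 8-bit channel values (0..255)."""
    counts = [0] * bins
    for v in values:
        counts[min(v * bins // 256, bins - 1)] += 1
    return [c / len(values) for c in counts]


def rgb_histograms(pixels, bins=8):
    """Return [H_r, H_g, H_b] for a non-empty list of (r, g, b) pixels."""
    return [channel_histogram([p[c] for p in pixels], bins) for c in range(3)]


def channel_distances(target_hists, particle_hists):
    """Euclidean distance between target and particle histograms, per channel."""
    return [math.dist(t, h) for t, h in zip(target_hists, particle_hists)]


def best_particle(target_hists, particle_hists_list):
    """Return (index, scores); the lowest score marks the most similar particle.

    Assumption: the three channel distances are merged with a Euclidean norm.
    """
    scores = [math.hypot(*channel_distances(target_hists, hists))
              for hists in particle_hists_list]
    return min(range(len(scores)), key=scores.__getitem__), scores
```

For example, a particle whose window contains the same colors as the target obtains a score of zero and is therefore selected as the new target location.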

Proposed resampling distribution
In the PF resampling phase, new particles are generated to replace those that are more likely to have low weights, to be used in the next generation. Our idea is to improve this resampling step by generating new particles that could give higher weights. Consequently, we have a better chance of finding the best estimate of the target position within the area of research (AOR).
Generally, if we use a normal random propagation, the propagation of the states in the space does not necessarily offer good candidates similar to the target. Therefore, we propose to use the shape of a kernel distribution while propagating particles within the AOR. This approach has several benefits. Firstly, we get denser particles that will have higher weights. Secondly, we guarantee a high density of particles around the previous position of the target, since the movement of the object between two successive frames is very small. Lastly, we propagate less dense particles farther away, so that the target can still be found when it makes a larger motion step. In this work, we used three kernels, namely Gaussian, Epanechnikov and Triangular, whose values are scaled to lie in [0, 1] for computational purposes; their definitions are given by (18), (19), and (20) respectively. After computing the likelihood weights of all particles, we choose the most similar candidate among them: the lower the Euclidean distance, the greater the similarity. We then pass the coordinates of the best estimate particle as input parameters to the resampling function, in order to build the particle states for the next frame within the AOR. From the values of the distribution, we deduce the number of fields. A field is a region in which we propagate a precise number of particles. Figure 4 shows the general cases that could appear. Fields can have a circular shape, as in Figure 4(a), or an elliptical shape, see Figure 4(b), depending on the shape of the target. We have chosen the elliptical shape because it suits the kernel distribution and ensures that the whole object is covered with particles.
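The three kernels named above can be sketched in their textbook forms. The scaling so that each kernel peaks at 1, the Gaussian width `sigma`, and the helper `kernel_profile` are assumptions for illustration; the exact normalization of (18), (19), and (20) may differ.

```python
import math


def gaussian(u, sigma=0.4):
    """Gaussian kernel scaled to peak at 1 (sigma is a hypothetical width)."""
    return math.exp(-0.5 * (u / sigma) ** 2)


def epanechnikov(u):
    """Epanechnikov kernel scaled to peak at 1, zero outside |u| <= 1."""
    return max(0.0, 1.0 - u * u)


def triangular(u):
    """Triangular kernel scaled to peak at 1, zero outside |u| <= 1."""
    return max(0.0, 1.0 - abs(u))


def kernel_profile(kernel, n):
    """Sample a kernel at n >= 2 points from the peak (u = 0) to the edge (u = 1)."""
    return [kernel(i / (n - 1)) for i in range(n)]
```

A profile such as `kernel_profile(triangular, 5)` plays the role of a cross-section table read from the peak of the distribution to its edge.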
At the resampling step, we apply our contribution, which integrates the kernel distribution. The main steps are as follows:
− Build the distribution: first, we create a distribution according to the size of the AOR and the type of kernel.
− Find the peak of the distribution: once the kernel distribution is obtained, we find its global maximum and save its position for further calculations.
− Calculate the horizontal and vertical means: starting from the global maximum, we define a vertical and a horizontal path from the peak to the null value, these null values being located at the edges; we then retrieve the values along each path and calculate their mean.
− Calculate the number of fields and their width and height from the means: in this step, we specify the perimeter of each field. Since we are working with an elliptical shape, its equation needs parameters, in particular the radius along the x-axis (width), for which we use a horizontal cross-section of the kernel as a table. Likewise, the radius along the y-axis (height) is calculated using a vertical cross-section of the kernel. Figure 5 shows a horizontal cross-section of the kernel, which we consider as a table of $N$ elements: the global maximum is shown in yellow, the edge of the kernel in blue, and the elements between the peak of the kernel and its last null value in green. Firstly, we start from the global maximum to count the elements constituting the radius of the first field. On the x-axis, we check whether the value of the first element is greater than or equal to the horizontal mean. If so, this field has a width radius of a single element; otherwise we add the next element to the previous one and check whether their sum is greater than or equal to the horizontal mean. If it is, the width of this field has two elements; otherwise we add a third element to the two previous ones and iterate again. We repeat this process until we find the length of the first field. Secondly, we move to the second field and apply the same process; however, the elements of the first field are not reused, and we start instead from the element following the last element of the previous field. The width of the second field is therefore equal to the width of the first field plus the number of elements found for the second field.
− Calculate the portion of particles in each field: at this step, we measure how many particles fall inside a field. First, we calculate the sum of the cross-section of the kernel; the direction of the cross-section has a negligible impact on the calculation. Next, we calculate the percentage of each field according to (21), where $Pr_{field_t}$ and $Pr_{field_{t-1}}$ are the percentages of the current field and the previous field respectively, $N_{field_t}$ is the number of elements of the current field, and $N_{cs}$ is the number of elements of the cross-section table.
− Propagate particles in each field: after finding the portion of each field, we build the fields using the pairs of radii found previously, then propagate each portion inside its field.
− Combine the results: finally, we have several superimposed fields of particles, which we combine in this step into one single structure of particles distributed according to the kernel.
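The field-width step of the list above can be sketched as follows, for one cross-section read from the peak to the edge. The function name `field_radii` is hypothetical, and the treatment of trailing elements whose sum never reaches the mean (they form one last field) is an assumption, since the text does not state how this case is handled.

```python
def field_radii(profile):
    """Return the cumulative radius (in table elements) of each field.

    `profile` is a kernel cross-section read from the peak (index 0)
    to the edge, where the value reaches zero.
    """
    if not profile:
        return []
    mean = sum(profile) / len(profile)
    radii = []
    running_sum = 0.0
    for index, value in enumerate(profile):
        running_sum += value
        # A field is closed once its accumulated values reach the mean;
        # the next field then starts from the following element.
        if running_sum >= mean:
            radii.append(index + 1)
            running_sum = 0.0
    # Assumption: leftover tail elements that never reach the mean
    # form one last, outermost field.
    if not radii or radii[-1] < len(profile):
        radii.append(len(profile))
    return radii
```

Each returned radius already includes the widths of the inner fields, matching the statement that the width of the second field equals the width of the first plus the elements found for the second. Applying the same function to the vertical cross-section gives the second radius of each ellipse.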
Proposed algorithm: multi-scale adaptive particle filter algorithm
Many tracking systems rely on object appearance, which is the main factor affecting their performance. This work presents a new approach based on the PF and an estimation technique to construct a framework based on a statistical multi-reference histogram. The general idea of our proposed work is to build an accurate target tracking system by using an adapted PF and the Euclidean distance on the RGB channels, whose histograms are considered as the feature representation, and to update the size of the tracking window according to the information measure on the color image. Our proposed method contains three main steps. After initializing the parameters and locating the target with the histograms of oriented Laplacian (HOL) detector [32], the first step is prediction: we use a random uniform model to give each particle state its coordinates inside the AOR, which is specified according to the initial location detected. In the second step, the filtering step computes the likelihood weight of every particle state; by minimizing these weights, we pick the best state, which is considered the new location of the target in the current frame. Lastly, in the resampling step, we generate more distinctive and effective particle states using a specific kernel distribution. The general process of the proposed algorithm for multi-scale object tracking is summarized in Figure 6.

RESULTS AND DISCUSSION
In this section, we show the performance of our novel MSAPF approach. We have conducted qualitative and quantitative experiments and compared our results with three probabilistic methods, namely CS, PF, and PSO. The experiments were run on an Intel Core i5-2450M CPU at 2.50 GHz with 12 GB of 1600 MHz DDR3 RAM. For experimental purposes, object detection is done manually by marking a bounding box in the first image. Then, our algorithm runs iteratively to find the best matching object region in every new frame.

Dataset
We evaluate our proposed approach on several challenging video sequences from two tracking datasets, namely OTB (OTB50 and OTB100) [33] and LITIV [34]. These datasets are used as object tracking benchmarks and are frequently used by researchers in the computer vision tracking field. They guarantee a heterogeneous environment and background for the tests. For instance, the OTB dataset provides a wide range of analyses and depicts a detailed view of tracker performance. Overall, the sequences used can combine two or more difficulties such as illumination variation, scale variation, occlusion, deformation, motion blur, fast motion, out-of-view, in-plane rotation, out-of-plane rotation, background clutter, and low resolution. The ground truth position is used for target initialization in the first frame for each compared algorithm.

Evaluation
The goal of the following evaluation is to ensure an impartial comparison, in which our approach and popular state-of-the-art methods undergo several tests. These tests are presented in detail in this section, and their results are reported in Tables 1 and 2 and in Figures 7 and 8. One of the most used quantitative performance measures is the center location error (CLE) [35]. This criterion calculates the localization error between the center of the tracking result and the ground truth annotation:
$$CLE = \frac{1}{N_{gr}} \sum_{i=1}^{N_{gr}} \sqrt{(Xg_i - Xr_i)^2 + (Yg_i - Yr_i)^2} \qquad (22)$$
where $N_{gr}$ is the total number of frames in the sequence, $(Xg_i, Yg_i)$ is the position of the ground truth, and $(Xr_i, Yr_i)$ is the tracking result at frame $I_i$. According to (22), the smaller the CLE, the better the algorithm. Moreover, as defined in (22), this quantitative metric neglects the size of the bounding boxes. We therefore also use the overlap ratio (OR) [35] for evaluation; it underlies the success rate and gives an idea of how our proposal performs on sequences in which the object changes its size. The OR is defined by (23):
$$OR = \frac{|R_i \cap G_i|}{|R_i \cup G_i|} \qquad (23)$$
where $R_i$ is the tracked bounding box, $G_i$ is the ground truth bounding box, $R_i \cap G_i$ is the intersection and $R_i \cup G_i$ is the union of the two regions. We consider that tracking has succeeded if $OR \geq 0.5$. Moreover, we obtain the success ratio (SR) by setting an overlap score $r$, defined as the minimum overlap ratio that decides whether an output bounding box is correct or not. It is calculated by (24):
$$SR = \frac{|\{i : OR_i \geq r\}|}{N} \qquad (24)$$
where $N$ is the number of frames and $OR_i$ is the overlap ratio at frame $I_i$; a tracking result is judged good or bad depending on the overlap ratio threshold $r$. As defined in (24), we calculate the percentage of successful localizations according to the threshold. If the SR achieves the maximum score and the OR is high, the tracker performs well; otherwise, it performs poorly [36].
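The three metrics can be sketched directly from their standard definitions. The box format `(x, y, width, height)` and the function names are assumptions for illustration; the benchmark toolkits may use other conventions.

```python
import math


def center_location_error(truth_centers, result_centers):
    """Mean Euclidean distance between ground truth and tracked centers."""
    distances = [math.dist(g, r) for g, r in zip(truth_centers, result_centers)]
    return sum(distances) / len(distances)


def overlap_ratio(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def success_ratio(truth_boxes, result_boxes, r=0.5):
    """Fraction of frames whose overlap ratio reaches the threshold r."""
    hits = sum(1 for g, b in zip(truth_boxes, result_boxes)
               if overlap_ratio(g, b) >= r)
    return hits / len(truth_boxes)
```

For instance, two boxes of size 2 by 2 shifted by one pixel horizontally share an area of 2 out of a union of 6, giving an overlap ratio of one third, which fails the 0.5 threshold.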

Result analysis
In this section, we discuss the results of the metrics cited above. The first evaluation was made on 17 sequences. We ran multiple tests with three distribution kernels (Epanechnikov, Gaussian and Triangular) to see their influence on the accuracy, with particle set sizes between 500 and 800, see Table 3. We chose that interval because the results are unsatisfactory when the number of particles is less than 500, while increasing the number further does not give a convincing gain in accuracy. The overall performance evaluation demonstrates that the APF method with the Triangular kernel and 600 particles outperforms all the other compared APF configurations [37]. The next evaluation was conducted on multiple tracking algorithms, namely PF, PSO, CS, APF with the triangular kernel and 600 particles (APFT600), MSAPF, which also uses the triangular kernel and 600 particles, and Deep sort (DS). Table 1 reports the average CLE and Table 2 the average SR. The experimental results are highlighted within the dashed box, in which our proposed approach MSAPF obtains good results. It gives an average CLE of 18.59 pixels and has the highest average SR of 80.01%, followed by APFT600 in second place with an average SR of 56.28% and an average CLE of 20.06 pixels. PSO comes third, followed by the PF and CS algorithms. The last rank is taken by the DS algorithm.

To compare the tracking results in detail between our proposed approach and the existing trackers, we show the overlap ratio plots in Figure 7, computed for the wbook sequence in Figure 7(a), the blurcar3 sequence in Figure 7(b) and the blurface sequence in Figure 7(c), as well as the success ratio plots in Figure 8, computed for the same three sequences: wbook in Figure 8(a), blurcar3 in Figure 8(b) and blurface in Figure 8(c). From Figure 7, we can see that our tracker maintains a higher and more stable overlap score along the sequences. Additionally, we use the success ratio plot to assess the average performance of the trackers on every sequence. The success plot gives the percentage of frames in which the overlap ratio is higher than a threshold. As illustrated in Figure 8, our improved MSAPF approach achieves the highest score in this evaluation compared with the state-of-the-art trackers. The above analysis implies that our approach is more accurate and gives more stable results than the other trackers. We can say that our tracker meets the expectations on the results.

CONCLUSION
The deterministic approaches in the literature have proven their efficiency in many tracking cases; however, there are scenarios in which they show weaknesses, especially in running time, resource use and accuracy. In this paper, a novel object tracking approach has been proposed by integrating a kernel distribution inside the PF probabilistic algorithm with a scale-updated tracking window, in order to produce denser candidates around the region of interest and to change the size of the tracking window according to the information measure of the object. This feature has significantly improved our previous tracker, APF based on the triangular kernel with n=600 particles. Furthermore, MSAPF was compared with state-of-the-art algorithms, namely PSO, PF, CS and DS, with the evaluation conducted on two benchmark datasets, OTB and LITIV. The results show that the success rate of the MSAPF tracker exceeds that of APFT600, which is based on a fixed tracking window, and those of PSO, PF, CS, and DS, and that its target localization error is much lower than that of these algorithms.