New approach to calculating the fundamental matrix

Received May 11, 2019; Revised Nov 15, 2019; Accepted Nov 26, 2019

The fundamental matrix (F) is estimated in order to determine the epipolar geometry and to establish a geometric relation between two images of the same scene, or between successive video frames. The literature offers many techniques for robust estimation, such as RANSAC (random sample consensus), least median of squares (LMedS), and M-estimators. This article compares four detectors (Harris, FAST, SIFT, and SURF) in terms of the number of detected points, the number of correct matches, and the speed of computing F. Our method first extracts descriptors with the SURF algorithm, chosen over the others for its robustness; it then sets a uniqueness threshold to retain the best points, normalizes those points, and ranks them according to the weighting function of the different regions; finally, F is estimated with the eight-point M-estimator technique, and the average error and the computation speed of F are measured. Experimental simulations on real images with different viewpoint changes (for example rotation, lighting, and moving objects) show good results in terms of the computation speed of the fundamental matrix and an acceptable average error. These simulation results indicate that the technique is usable in real-time applications.


INTRODUCTION
The epipolar geometry of a scene describes the connection between two or more images of the same scene taken from different views, by providing the projective geometry between the views. The computation of the fundamental matrix (F), which encodes this epipolar geometry, is used in camera calibration [1], auto-calibration [2], projective reconstruction [3], 3D reconstruction [4], motion analysis [5], object mapping and tracking [6], and 3D target location and 3D personnel tracking [7, 8]. Computing the fundamental matrix requires at least seven matching points, which can be determined in two ways. The first is manual selection of points; this technique is not acceptable because of its large error and high processing time, which make it impractical for image sequences or video. The second approach relies on feature detectors that are robust to different transformations of the scene: the Harris corner detector [9], FAST [10], the scale-invariant feature transform (SIFT) [11], and speeded-up robust features (SURF) [12] have been used to detect remarkable points. These remarkable points are then matched automatically across the different changes of scene pose by applying point-matching algorithms. The two detectors SIFT and SURF work well compared to Harris and FAST under affine transformations, because they compute a descriptor vector for each characteristic point, which helps to find the correct matches under different variations. In addition, it has been shown that SURF is much faster and more stable than SIFT in terms of computing the fundamental matrix and finding the correct matches, although SIFT detects a greater number of interest points than SURF [12].
After the correspondence points have been detected by the different detectors, the fundamental matrix can be computed by two families of methods: linear methods and non-linear methods [13, 14]. Linear methods are sensitive to errors in the correspondences caused by additional noise [15].
Non-linear, robust methods are more tolerant to noise. They divide into three techniques: least median of squares (LMedS) [16], random sample consensus (RANSAC) [17], and M-estimators [18]. These methods are used to classify the matches. RANSAC computes, for each candidate value of F, the number of points compatible with it (inliers); the chosen F is the one that maximizes this number, and once the aberrant points are eliminated, F is recomputed to obtain a better estimate. Its disadvantage is that it makes no use of the outliers. LMedS computes, for each estimate of F, the Euclidean distances between the points and the epipolar lines, and chooses the F that minimizes the median of these distances. Like RANSAC, it makes no use of the outliers.
The third technique, on which our method is based, is the M-estimator, inspired by the two preceding methods. It consists in dividing the detected points into four sets: inliers, quasi-inliers, outliers, and others. The main contribution of this article is to compute F quickly from significant correspondences while still exploiting the outliers. The problem addressed in this article is how to weight the descriptors extracted in the different regions (inliers, quasi-inliers, outliers, and others) through the optimization function, so as to compute the fundamental matrix in real time with an acceptable error under different variations (rotation, illumination, displacement, etc.).

EPIPOLAR GEOMETRY AND COMPUTATION OF THE FUNDAMENTAL MATRIX
All methods for estimating the fundamental matrix require a number of point matches as input. Salient image points such as corners and edges are usually employed for this purpose. Among feature detectors, the Harris corner detector is the most widely known; it is based on the eigenvalues of the second-moment matrix and is not scale invariant. Many other detectors and descriptors have been proposed, among them SIFT, PCA-SIFT [19], the gradient location and orientation histogram (GLOH), FAST, and ASIFT [20].
Among these techniques, those that generate descriptor vectors in addition to the detected point positions are preferred, because the characteristic points alone do not carry enough information for an exact match. Mikolajczyk and Schmid [21] reviewed SIFT, PCA-SIFT, FAST, and various other feature detection techniques, and noted that SIFT outperforms the other descriptors under rotation, scale, and viewpoint changes. Bauer et al. [22] showed that although SURF yields fewer key points and a slightly lower feature quality than SIFT, it runs faster than SIFT under disparate views, rotation, and scale. For the fast processing required by real-time applications, we therefore chose to exploit SURF and to calibrate its uniqueness threshold so as to obtain more precise matches from a minimum number of points. Building on these previous studies of point-matching accuracy, this article evaluates the effect of changes in viewpoint, rotation, illumination, and moving objects on the accuracy of the resulting fundamental matrix.
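As an illustration of the uniqueness-threshold step mentioned above (a Lowe-style ratio test), the sketch below applies it to synthetic 64-D vectors standing in for SURF descriptors; the function name, threshold value, and data are illustrative and not taken from the paper.

```python
import numpy as np

def ratio_test_matches(desc1, desc2, uniqueness=0.6):
    """Keep a match only when its nearest neighbour is clearly better
    than the second nearest (the 'uniqueness threshold')."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)  # distances to all candidates
        order = np.argsort(dists)
        best, second = dists[order[0]], dists[order[1]]
        if best < uniqueness * second:             # keep unambiguous matches only
            matches.append((i, int(order[0])))
    return matches

# Synthetic 64-D vectors standing in for SURF descriptors (SURF uses 64-D):
rng = np.random.default_rng(0)
desc2 = rng.normal(size=(20, 64))
desc1 = desc2[:5] + 0.01 * rng.normal(size=(5, 64))  # noisy copies of 5 of them
matches = ratio_test_matches(desc1, desc2)
print(matches)  # -> [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
```

Lowering the `uniqueness` value retains fewer but more reliable matches, which is the trade-off exploited in the proposed method.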

Epipolar geometry
Epipolar geometry is intrinsic to any two-camera system, regardless of the model used for the cameras. It was introduced by Longuet-Higgins [23]. This geometry establishes a geometric relationship between two stereo images. The accuracy of the estimated epipolar geometry is very important, since it conditions the accuracy of the pairing algorithms between the points of a pair of images, these algorithms often relying on prior knowledge of this geometry.

Fundamental matrix
The fundamental matrix gives the transformation that maps a point selected in one image to an epipolar line in the other image, thus projecting a point onto a line. Mathematically, the epipolar constraint is expressed through the fundamental matrix as:

x'^T F x = 0, (1)

where x and x' are the homogeneous coordinates of corresponding points in the two images, and F is a 3x3 matrix of rank 2, so that det(F) = 0.
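A minimal numerical check of the epipolar constraint, assuming identity camera intrinsics and a synthetic camera pair (all values here are illustrative, not from the paper):

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Two cameras with identity intrinsics: P1 = [I | 0], P2 = [R | t].
R = np.eye(3)
t = np.array([1.0, 0.0, 0.0])
F = skew(t) @ R                  # with K = I this equals the essential matrix

X = np.array([0.3, -0.2, 4.0])   # an arbitrary 3D point
x1 = X / X[2]                    # homogeneous projection in image 1
Y = R @ X + t
x2 = Y / Y[2]                    # homogeneous projection in image 2

residual = float(x2 @ F @ x1)    # epipolar constraint: x2^T F x1 = 0
```

The residual is zero up to floating-point error, and F has rank 2 by construction.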

METHODS TO CALCULATE THE FUNDAMENTAL MATRIX
These techniques can be organized into linear methods and iterative or robust methods [17]. Linear methods, introduced by Longuet-Higgins [23], are very sensitive to noise due to mismatching. Iterative methods, which use the Levenberg-Marquardt optimization technique [24], can for their part suffer from bad localization of the points in the image. Robust methods, such as M-estimators [18], are able to give an accurate result on noisy images and manage the outliers through the weighting function.

Linear method
This collection of linear equations makes it possible to establish the epipolar geometry for a given pair of images. The main advantage of this technique is its simplicity: only seven points are needed to estimate F. However, this becomes a disadvantage when some points are badly located [25]. Writing the epipolar constraint (1) for each correspondence (x, y) <-> (x', y') yields one row of a measurement matrix A:

A_i = [x'x, x'y, x', y'x, y'y, y', x, y, 1],

so that A f = 0, where f is the 9-vector formed by the entries of F. In practice there are more than 7 corresponding points. If we ignore the constraint that the rank of the matrix F equals 2, we can use the least squares method to solve min_f ||A f||^2. By imposing a constraint making the norm of f equal to 1, the problem becomes a classic minimization problem [26]: min_f ||A f||^2 subject to ||f|| = 1. The resolution is then carried out using the technique of Lagrange multipliers.
The solution for f is the eigenvector associated with the smallest eigenvalue λ of A^T A (equivalently, the right singular vector of A with the smallest singular value). The fundamental matrix can thus be estimated simply by the eight-point algorithm, but the solution obtained is not necessarily optimal. However, the fundamental matrix has two useful characteristics: it is 3x3 and its rank is 2. Exploiting these characteristics, together with the quality of the detector, can improve the methods of estimating F. There is an a posteriori solution that finds the matrix of zero determinant closest to F, where the proximity of the two matrices is measured by the Frobenius norm [27]. To obtain this matrix F̂, F is decomposed by a technique of SVD type (singular value decomposition):

F = U diag(σ1, σ2, σ3) V^T, with σ1 ≥ σ2 ≥ σ3 and U, V orthogonal matrices.

One can then show that F̂ = U diag(σ1, σ2, 0) V^T is the rank-2 matrix closest to F.
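A sketch of the linear eight-point estimate followed by the SVD-based rank-2 projection, tested on exact synthetic correspondences; the camera setup and all values are illustrative assumptions.

```python
import numpy as np

def eight_point(pts1, pts2):
    """Linear eight-point estimate: build A, take its null vector via SVD,
    then project onto the rank-2 matrices by zeroing the smallest sigma."""
    A = np.column_stack([
        pts2[:, 0] * pts1[:, 0], pts2[:, 0] * pts1[:, 1], pts2[:, 0],
        pts2[:, 1] * pts1[:, 0], pts2[:, 1] * pts1[:, 1], pts2[:, 1],
        pts1[:, 0], pts1[:, 1], np.ones(len(pts1)),
    ])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)   # singular vector of smallest sigma
    U, S, Vt = np.linalg.svd(F)
    return U @ np.diag([S[0], S[1], 0.0]) @ Vt  # enforce rank 2 (det F = 0)

# Exact synthetic correspondences from a known camera pair (K = I):
rng = np.random.default_rng(1)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(8, 3))   # 3D points in a box
t = np.array([1.0, 0.2, 0.0])
x1 = X[:, :2] / X[:, 2:]                # image 1: P = [I | 0]
x2 = (X + t)[:, :2] / (X + t)[:, 2:]    # image 2: pure translation t
F = eight_point(x1, x2)
```

With exact data the recovered F satisfies the epipolar constraint to machine precision for every correspondence.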
This algorithm was refined by Hartley [28] to make it even more robust, leading to the normalized eight-point algorithm [29]. He showed that the plain eight-point algorithm is often unstable. The proposed solution is to replace the origin in each image by the centroid of the paired points; a scaling factor is then applied so that the mean norm of the vectors associated with the points equals √2. These two operations amount to multiplying the points of the left (respectively right) image by a 3×3 matrix. This approach greatly improved the results of the eight-point method.
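Hartley's normalization step described above can be sketched as follows; the function name and sample coordinates are illustrative.

```python
import numpy as np

def normalize_points(pts):
    """Hartley normalisation: translate the centroid to the origin, then
    scale so the mean point norm equals sqrt(2).  Returns the 3x3 transform
    T and the transformed points; after estimating F_norm on normalised
    points, the denormalised matrix is F = T2.T @ F_norm @ T1."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0.0, -s * c[0]],
                  [0.0, s, -s * c[1]],
                  [0.0, 0.0, 1.0]])
    return T, (pts - c) * s

# Pixel-scale coordinates like these make the linear system badly conditioned:
pts = np.array([[520.0, 310.0], [610.0, 402.0], [133.0, 255.0], [700.0, 90.0]])
T, npts = normalize_points(pts)
```

After the transform, the centroid of the points is at the origin and their mean norm is exactly √2.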

Nonlinear methods (robust methods)
The LMedS method [16] computes, for each estimate of F, the Euclidean distances between the points and the epipolar lines, and chooses the F that minimizes the median of these distances. To obtain a coherent epipolar geometry, the distances between points and epipolar lines should be minimized in both images. This technique gives very good results compared with those obtained from linear methods, and from iterative methods that minimize the distance separating the points and the epipolar lines; although iterative methods are more precise than linear methods, they cannot get rid of the outliers. Together with RANSAC and M-estimators, LMedS is one of the three most widely used robust techniques in the literature. RANSAC, for its part, computes for each candidate value of F the number of compatible points (inliers); the chosen F is the one that maximizes this number, and once the aberrant points are eliminated, F is recomputed to obtain a better estimate. The M-estimator is inspired by the two preceding methods; it consists in dividing the detected points into two sets: inliers and quasi-inliers [30, 31].
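A compact LMedS sketch along the lines described above: fit F to random 8-point samples with the plain linear solve and keep the F with the smallest median squared residual, so up to half the matches may be outliers without spoiling the result. All names and parameters are illustrative; the rank-2 projection is omitted for brevity.

```python
import numpy as np

def eight_point(p1, p2):
    """Plain linear eight-point solve (null vector of A via SVD)."""
    A = np.column_stack([p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
                         p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
                         p1[:, 0], p1[:, 1], np.ones(len(p1))])
    return np.linalg.svd(A)[2][-1].reshape(3, 3)

def lmeds(pts1, pts2, n_trials=200, seed=0):
    """Keep the F whose MEDIAN squared residual over all points is smallest."""
    rng = np.random.default_rng(seed)
    h1 = np.column_stack([pts1, np.ones(len(pts1))])
    h2 = np.column_stack([pts2, np.ones(len(pts2))])
    best_F, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(len(pts1), size=8, replace=False)
        F = eight_point(pts1[idx], pts2[idx])
        r = np.einsum('ij,jk,ik->i', h2, F, h1)   # algebraic residuals
        med = np.median(r ** 2)
        if med < best_med:
            best_F, best_med = F, med
    return best_F

# 20 exact correspondences plus 5 gross outliers:
rng = np.random.default_rng(2)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))
t = np.array([1.0, 0.2, 0.0])
x1 = X[:, :2] / X[:, 2:]
x2 = (X + t)[:, :2] / (X + t)[:, 2:]
x1 = np.vstack([x1, rng.uniform(-1, 1, size=(5, 2))])   # mismatched points
x2 = np.vstack([x2, rng.uniform(-1, 1, size=(5, 2))])
F = lmeds(x1, x2)
```

Because the median ignores the worst half of the residuals, the returned F fits the 20 true correspondences despite the 5 mismatches.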
The latter technique is based on solving the following expression:

min_F Σ_i w(r_i) r_i^2, with r_i = x'_i^T F x_i,

where r_i is the residual of the i-th match, F = (f_ij) with i, j = 1, ..., 3, and w is the weighting function. We have added a modification to the weighting function, called the separation factor, which treats separately the different sets of points detected in the images, as described above.
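A generic iteratively reweighted least-squares (IRLS) M-estimator sketch: residuals under the current F set the weights for the next linear solve, so large (outlier-like) residuals are progressively down-weighted. Huber weights stand in here for the paper's modified weighting function, which is not reproduced; all names and values are illustrative.

```python
import numpy as np

def weighted_eight_point(pts1, pts2, w):
    """Linear eight-point solve with each equation scaled by its weight."""
    A = np.column_stack([
        pts2[:, 0] * pts1[:, 0], pts2[:, 0] * pts1[:, 1], pts2[:, 0],
        pts2[:, 1] * pts1[:, 0], pts2[:, 1] * pts1[:, 1], pts2[:, 1],
        pts1[:, 0], pts1[:, 1], np.ones(len(pts1))]) * w[:, None]
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    return U @ np.diag([S[0], S[1], 0.0]) @ Vt    # keep rank 2

def m_estimate(pts1, pts2, n_iter=10, c=1.345):
    """IRLS with Huber weights (a generic stand-in weighting function)."""
    h1 = np.column_stack([pts1, np.ones(len(pts1))])
    h2 = np.column_stack([pts2, np.ones(len(pts2))])
    w = np.ones(len(pts1))
    for _ in range(n_iter):
        F = weighted_eight_point(pts1, pts2, w)
        r = np.einsum('ij,jk,ik->i', h2, F, h1)    # algebraic residuals
        s = 1.4826 * np.median(np.abs(r)) + 1e-12  # robust scale (MAD)
        u = np.abs(r) / s
        w = np.where(u <= c, 1.0, c / u)           # Huber weight function
    return F

# Exact synthetic correspondences (K = I, pure translation camera pair):
rng = np.random.default_rng(3)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(12, 3))
t = np.array([1.0, 0.2, 0.0])
x1 = X[:, :2] / X[:, 2:]
x2 = (X + t)[:, :2] / (X + t)[:, 2:]
F = m_estimate(x1, x2)
```

Swapping the Huber weights for a region-dependent weighting (inliers, quasi-inliers, outliers, others) is where the separation factor described in the text would plug in.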
Research has confirmed that LMedS gives better results than RANSAC in terms of accuracy. LMedS and RANSAC are otherwise similar: both randomly select the set of points used to approximate the fundamental matrix. The difference between the two methods lies in the way the chosen F is determined: LMedS selects F by minimizing the median of the distances between the points and the epipolar lines, while RANSAC selects F by maximizing the number of inliers. The M-estimator, for its part, gives good results in the presence of Gaussian noise at the selected image points; the robustness of this method comes from the down-weighting of aberrant values.

PROPOSED ALGORITHM
The M-estimator gives good results in the presence of Gaussian noise at the selected image points; the robustness of this method comes from the down-weighting of outliers. First, two images of the same scene under some variation are loaded, and the four algorithms (Harris, FAST, SIFT, and SURF) are applied; after comparison, SURF is found to be the most robust under the different variations. We then take its descriptors, normalize all the points, and choose eight random points to build the 8x9 measurement matrix, which is decomposed by the SVD method to obtain the 3x3 matrix; rank 2 is then enforced by setting the smallest singular value to zero, so that the determinant is zero. Finally, we add the optimization function to find the optimal solution F through an iterative algorithm. The basic steps of the proposed technique are detailed in the algorithm shown in Figure 2.
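The normalization, SVD, and rank-2 steps of this pipeline (everything after SURF detection and matching, which is assumed already done) can be sketched end to end; the function names and synthetic test data below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def estimate_F(pts1, pts2):
    """Sketch of the pipeline after SURF matching: Hartley normalisation,
    linear system from the matches, SVD solution, rank-2 enforcement
    (det F = 0), then denormalisation."""
    def normalize(p):
        c = p.mean(axis=0)
        s = np.sqrt(2) / np.linalg.norm(p - c, axis=1).mean()
        T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
        return T, (p - c) * s
    T1, n1 = normalize(pts1)
    T2, n2 = normalize(pts2)
    A = np.column_stack([
        n2[:, 0] * n1[:, 0], n2[:, 0] * n1[:, 1], n2[:, 0],
        n2[:, 1] * n1[:, 0], n2[:, 1] * n1[:, 1], n2[:, 1],
        n1[:, 0], n1[:, 1], np.ones(len(n1))])
    Fn = np.linalg.svd(A)[2][-1].reshape(3, 3)      # null vector of A
    U, S, Vt = np.linalg.svd(Fn)
    Fn = U @ np.diag([S[0], S[1], 0.0]) @ Vt        # smallest sigma -> 0
    F = T2.T @ Fn @ T1                              # undo the normalisation
    return F / np.linalg.norm(F)                    # fix the arbitrary scale

# Exact pixel-scale correspondences from a known synthetic camera pair:
rng = np.random.default_rng(4)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))
t = np.array([1.0, 0.2, 0.0])
px = lambda x: 800.0 * x[:, :2] / x[:, 2:] + np.array([320.0, 240.0])
x1, x2 = px(X), px(X + t)
F = estimate_F(x1, x2)
```

The iterative reweighting of the proposed method would wrap this linear solve, updating the point weights between passes.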

SIMULATION RESULTS AND DISCUSSION
In this section, we study four detectors (Harris, FAST, SIFT, and SURF) for extracting interest points and descriptors, under the following variations: lighting, rotation, and views of moving objects. In the first test, we combine each of the four detectors with the RANSAC statistical technique, which determines the correct correspondences; this is applied to several real images under the variation of a moving object, illustrated in Figure 3. From the results obtained over several tests, the two detectors Harris and FAST are sensitive to this variation, giving the highest error but an acceptable computation time, whereas SIFT and SURF obtain good results in terms of error but an average computation time under this variation. In the second test, we apply the same procedure to several real images under a rotation variation, illustrated in Figure 4. Again, Harris and FAST are sensitive to this variation, giving the highest error but an acceptable computation time, while SIFT and SURF obtain good results in terms of error but an average computation time. In the third test, we apply the same procedure to several real images under a change of lighting, illustrated in Figure 5; the corresponding results are shown in Figure 6.
The results obtained from several tests show successful computation of F with acceptable error. Table 1 summarizes the simulation results in terms of the number of detected points and of correspondences; these numbers depend on the variation applied to the images (rotation, change of brightness, moving object). Figure 7 shows the time to compute the fundamental matrix for the different transformations of the scene (rotation, change of brightness, moving object) according to the four detectors.

Table 1. Number of detected keypoints (Kypt1 in image 1, Kypt2 in image 2) and of matches for the three variations (a: moving object, b: rotation, c: lighting)

            Harris  FAST  SIFT  SURF  Proposed
Kypt1   a     309    270   529   432     82
        b     273    200   300   135     18
        c     264    120   200   111     43
Kypt2   a     328    260   480   439     63
        b     271    180   241   126     21
        c     325     70   260    88     35
Matches a     112     54   200   150      8
        b      50     42   270    73      8
        c      32     16    90    19      8

Figure 7. The time to compute the fundamental matrix under the different variations

Figure 8. Average error of the projection

Our approach gives good results in terms of the speed of computing F compared with the other methods, not exceeding 0.8 s on average, as shown in Figure 7; the results of this approach can therefore be used in real-time stereo image analysis applications. Figure 8 shows the estimated projection error for the different transformations of the scene (rotation, change of brightness, moving object) as a function of the four detectors. Our approach gives an acceptable average error that does not exceed 1.4 pixels for the moving object, 1.5 pixels for the rotation, and 1.3 pixels for the lighting change. The detectors SIFT and SURF also give a good projection-error accuracy, not exceeding 1.5 pixels on average whatever the change, while Harris and FAST are sensitive to the different variations.
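One common way to compute an average projection error of this kind is the mean point-to-epipolar-line distance in pixels; the exact metric used in the paper is not specified, so the sketch below is an assumption, checked on exact synthetic data.

```python
import numpy as np

def mean_epipolar_error(F, pts1, pts2):
    """Mean point-to-epipolar-line distance (in pixels), averaged over
    both images -- one common 'average projection error' metric."""
    h1 = np.column_stack([pts1, np.ones(len(pts1))])
    h2 = np.column_stack([pts2, np.ones(len(pts2))])
    l2 = h1 @ F.T                                   # epipolar lines in image 2
    l1 = h2 @ F                                     # epipolar lines in image 1
    d2 = np.abs(np.sum(l2 * h2, axis=1)) / np.linalg.norm(l2[:, :2], axis=1)
    d1 = np.abs(np.sum(l1 * h1, axis=1)) / np.linalg.norm(l1[:, :2], axis=1)
    return 0.5 * (d1.mean() + d2.mean())

# Exact correspondences and the true F for a pure-translation pair (K = I):
t = np.array([1.0, 0.2, 0.0])
F = np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])
rng = np.random.default_rng(5)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(15, 3))
x1 = X[:, :2] / X[:, 2:]
x2 = (X + t)[:, :2] / (X + t)[:, 2:]
err = mean_epipolar_error(F, x1, x2)   # ~0 for exact data and the true F
```

On real matched points, the same function reports the pixel-level error figures quoted above.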

CONCLUSION
In this article, we have proposed a new approach to computing F. Our method first extracts descriptors with the SURF algorithm, chosen over the others for its robustness to different variations in image pose. The points are then normalized, the uniqueness threshold is adjusted to retain the best points, and the points are ranked by the weighting function before the F matrix is estimated with the eight-point M-estimator technique, in order to measure the mean error and the computation speed of F. We then compared our method with the approach that combines the detectors SIFT, FAST, and Harris with the normalized eight-point RANSAC algorithm. The experimental simulations were applied to real images with different changes of viewpoint (for example rotation, lighting, and moving objects), and show good performance: the computation time of the fundamental matrix does not exceed 800 ms, and the average error remains below 1.5 pixels whatever the change. This approach is therefore capable of analyzing moving scenes, for example for 3D reconstruction or trajectory conflict analysis.