Linear regression models with autoregressive integrated moving average errors for measurements from real time kinematics-global navigation satellite system during dynamic test

The autoregressive integrated moving average (ARIMA) method has been used to model global navigation satellite systems (GNSS) measurement errors. Most ARIMA error models describe time series data of static GNSS receivers. Its application for modeling of GNSS under dynamic tests is not evident. In this paper, we aim to describe real time kinematic-GNSS (RTK-GNSS) errors during dynamic tests using linear regression with ARIMA errors to establish a proof of concept via simulation that measurement errors along a trajectory logged by the RTK-GNSS can be “filtered”, which will result in improved positioning accuracy. Three sets of trajectory data of an RTK-GNSS logged in a multipath location were collected. Preliminary analysis on the data reveals the inability of the RTK-GNSS to achieve fixed integer solution most of the time, along with the presence of correlated noise in the error residuals. The best linear regression models with ARIMA errors for each data set were identified using the Akaike information criterion (AIC). The models were implemented via simulations to predict improved coordinate points. Evaluation on model residuals using autocorrelation, partial correlation, scatter plot, quantile-quantile (QQ) plot and histogram indicated that the models fitted the data well. Mean absolute errors were improved by up to 57.35% using the developed models.


INTRODUCTION
Autonomous vehicles and robots rely on localization technologies to determine their current location and pose. This will subsequently aid the vehicle or robot path planning and navigation. Localization technologies such as fused measurements from sensors, cooperative positioning and range based methods have been developed with the aim to achieve high level of positioning accuracy [1]- [3]. However, the sensors employed in these localization systems produced measurement errors such as offset drifts in inertial sensors and single-point solution in global positioning system (GPS).
In this paper, we focus on the GPS, which often suffer from various noise sources that degrades its positioning accuracy. To enhance positioning accuracy, the development of real time kinematics (RTK) and the global navigation satellite system (GNSS) resulted in various RTK-GNSS devices that could provide ISSN: 2088-8708  Linear regression models with autoregressive integrated moving average errors for … (Kok Mun Ng) 771 centimeters positioning [4], [5]. Centimeter positioning (between 1 to 5 centimeters) in RTK-GNSS is known as fixed-integer solution, which indicates that the receiver calculated a correct solution with minimal dilution. However, performance evaluation on five different brands of low-cost RTK-GNSS, tested in rural, sub-urban and urban landscapes revealed that they could not hold fixed-integer solution for any significant time in dynamic applications [5].
The RTK-GNSS may be subjected to noisy sources such as atmospheric and ionospheric disturbances [6], multipath errors [7], [8], canopy trees [9], and other noise sources that may be hard to distinguish. These noise components are usually time-correlated and inherit unique stochastic attributes. In recent years, error variation in GNSS data caused by weather conditions provided useful instability indices for weather forecasting such as rainfall and thunder storms [10]- [14]. GNSS error modeling was conducted with various machine learning techniques such as neural network [10], [11], classification functions [10], random forest [13], [14] and support vector machines [14] to forecast weather conditions. Hence, robust machine learning and signal processing techniques with noise models that can describe these time-correlated and stochastic characteristics are needed to process GNSS data for their related applications.
Throughout the decades, the autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) have been used to model the time series of position changes at GPS static stations, which were used for monitoring of structural health and seismic activities. ARIMA and ARMA models could provide robust estimation of the time correlation in the noise compared to white noise model which may leads to biased in parameter estimates. Barba et al. [15] stated that ARIMA produces smoother curves and is more effective in forecasting GNSS time series compared to ARMA and wavelet methods. Hohensinn et al. [16] described the random noise of static GNSS time series data by mean of an ARIMA model to perform pre-filtering of GNSS noise. Subsequently, periodograms were used to detect the presence of harmonics in the data to assist detection of vibration in structures. Špoljar et al. [17] attempted to model multipath errors in GNSS time series positions from a stationary GNSS receiver in Huawei P20 Mate smartphone using autoregressive (AR) models. However, the effects caused by multipath were not modeled effectively. The noise may need to be further investigated using ARMA. In another work, Tao and Bonnifait [18] modelled the errors of a standalone L1-GPS used in autonomous vehicle. An autoregressive AR process was used to describe the errors. The Burg method was used to estimate the AR model parameters to realize a shaping filter for the GPS errors. The positioning errors are further improved by integrating front-view camera and map matching method.
On another note, ARMA and ARIMA have been used to model the overall stationary GNSS time series to assist in prediction and forecasting. Zhang et al. [19] fitted an ARMA model on GNSS time series data of a static GNSS station. The absolute outliers (AO) in the ARMA model were used for the detection of cycle slips in the GNSS carrier-phase observations. The authors developed a Bayesian method to detect AOs in the ARMA model that resulted in cycle slips detection in GNSS carrier-phase observations. Kaloop et al. [20] fitted an ARMA model to the time series measurements of a geodetic monitoring GPS system which was used to assess the safety behavior of a long span bridge. The coefficients and model errors predicted from the ARMA model were used to evaluate the semi-static and dynamic movement of the bridge. The model errors and parameters were used to study the safety of the bridge during GPS measurements. Recently, ARIMA was integrated with other machine learning methods for forecasting in [21]- [23]. Salma et al. [24] combines variational mode decomposition techniques with ARMA on GNSS time series data to forecast ionospheric delay. In another work, Xin et al. [25] combines Kalman filter, ARIMA and GARCH to process measured sensor data from GNSS attached to a bridge structure. The linear recursive ARIMA model was established to predict the bridge structure deformation.
It was shown in previous works the application of ARMA and ARIMA that could describe the correlated noise of the GNSS measurements [15]- [18] and could be used as forecasting methods [19], [20], [24], [25]. Soundy et al. [26] discovered that AR and ARMA have better performance compared to Gaussian white noise model, low-order MA process and the Ornstein-Uhlenbeck model when evaluated using the Akaike information criterion (AIC). The authors stated that "Directly estimating statistical parameters of the GPS time series by explicitly taking into account the heteroscedastic space and time-correlated error structure of the observations could result in improved accuracy at minimal computational cost" [26]. This demonstrated the usefulness of the AR, ARMA and the ARIMA method in describing the stochastic behavior in GNSS measurements. Recent developments, observed the application of ARIMA to predict clock biases and ionospheric errors in GNSS [27], [28].
This paper addressed the need to model RTK-GNSS dynamic measurement errors which were contributed by various noise sources especially during dynamic applications [4], [5]. Though, existing ARMA and ARIMA methods were successful in modelling GNSS time series errors; the applications were mainly focus on static GNSS stations [15]- [20], [24], [25]. To our knowledge, research using purely the regression with ARIMA errors to describe trajectory mapped by RTK-GNSS during dynamic tests has not been conducted. Most works employed state estimation methods such as Kalman filter based on  [33]. In this work, we aim to describe noise in RTK-GNSS during dynamic test or when it is used for applications with dynamic movement and measurements such as that in autonomous vehicle/robot using linear regression model with autoregressive integrated moving average (ARIMA) errors. This work also contributes to an experimental proof of concept via simulations that noisy RTK-GNSS measurements along a trajectory can be "filtered" using our proposed method. This will lead to improved accuracy in positioning. Initial performance of the fitted model was evaluated by comparison with accuracy of positioning before model implementation. Section 2 outlines the methods employed in this work starting with data collection and preliminary analysis on the data and, further describe model development of linear regression models with ARIMA errors on RTK-GNSS time series obtained under dynamic tests. The methods to evaluate model fitting and performance are also elaborated. Section 3 presents the results and discussion. Concluding remarks which include future studies were given in section 4.

METHOD 2.1. Data collection and initial analysis
This work employs the EMLID Reach RTK-GNSS as shown in Figure 1(a) to implement dynamic data logging. One unit of the EMLID Reach was set up as the base station whereas another unit was set up as the rover, attached to the roof top of a moving vehicle as shown in Figure 1(b). The base station sends phase carrier corrections continuously to the rover via LoRa radio frequency in the range of 868-1,100 MHz and air data rate of 0.81-18.23 kb/s. By using the ReachView app, radio frequency was set at 912 MHz and satellite system such as GPS, GLONASS, BeiDou and Galileo were selected to obtain the best solution. The rover receives the correction from the base and calculates its best position. The EMLID Reach RTK-GNSS was deployed at a sub-urban location in the city of Shah Alam, Malaysia. Figure 2 shows the trajectory (red path) travelled by the rover for dynamic measurements in this location. It encircled some residential houses, trees and structures that may contribute multipath effects to the RTK-GNSS. The rover received carrier phase corrections from the base station (marked 'X') which was placed at a nearby football field. The baseline of the rover to the base station ranges from 60-200 meters. Data logging was conducted during fine weather and clear sky condition. Three sets of data were collected on August 8, 2020. It took approximately 2 minutes to complete data collection for the whole trajectory at a sampling rate of 0.5 Hz.
A geo-referenced map was developed for this location using quantum geographic information system (QGIS) software Desktop version 3.16.2. Fourteen fixed coordinate points were collected using EMLID RTK-GNSS as geo-referenced points to construct an accurate map for the area. Ground truth trajectory was derived using the geo-referenced map. The data collected was compared with ground truth trajectory to determine mean absolute error (MAE) and root mean square error (RMSE) of positioning. Autocorrelation, partial correlation, and histogram of the residuals were analyzed.  Figure 3 illustrates the methodology to perform linear regression with ARIMA errors on the trajectory geodetic points. The RTK-GNSS measurements and the ground truth which contains geodetic coordinates were initially converted to north-east-down (NED) coordinates (Ynorth--Xeast-Zdown). This conversion is necessary so that model fitting can be divided to two dimensions (i.e., the x and y axes respectively). A linear regression relationship between logged Xeast points with the Xeast ground truth was established for the three data sets collected. Similar linear regression was also established for logged Ynorth points with the Ynorth ground truth. Residuals of these regression models were analyzed in terms of autocorrelation function (ACF) and partial correlation function (PACF) plots.

Model development using linear regression model with ARIMA errors
Subsequently, linear regression models with ARIMA errors for each trajectory data set were estimated from the residuals ACF and PACF plots based on unique sets of ACF and PACF relations outlined in [34]. The linear regression model with ARIMA errors is defined in (1) where yt is the response series; xt is a series of predictor data; β is the regression coefficient; c is the regression intercept and is the ARIMA errors disturbance series at t=1, ..., N, and N the number of samples. The series comprised of autoregressive coefficients 1 , … , up to degree p and moving average coefficients ∅ 1 , … , ∅ up to degree q, where is the white noise series and = − . The (1 − ) is the degree D integration polynomial.
The logged points for x-axis (i.e., Xeast coordinates) and y-axis (i.e. Ynorth coordinates) respectively were used as predictors and ground truth data was used as response series to estimate the models. Best fitted linear regression models with ARIMA errors were identified for both x and y axes respectively using the Akaike information criterion (AIC). Based on (1), linear regression with ARIMA errors models were developed for x-axis coordinates as shown in (2) and y-axis coordinates as shown (3), where Xopt and Yopt depicts the optimized and improved coordinates.
The best fitted models were applied on the data sets to improve positioning errors in the respective axes. The residuals of the fitted model were analyzed by mean of ACF, PACF, histogram, scatter plot and quartile-quantile plot (QQ-Plot) to ascertain whether the model fit the data well. Simulation of each axis points from the fitted models resulted in improved geodetic points along the trajectory travelled. These   Table 1 shows the results of positioning errors and positioning availability on three sets of data collected on 8 August 2020 at the location depicted in Figure 2. The positioning errors were calculated in terms of MAE and root mean square error (RMSE). Positioning availability column in Table 1 presents the percentage ratio of data points collected with fixed solution (between 1 to 5 cm), float solution (above 5 to 50 cm) and single point solution (above 50 cm). Data set 1 recorded the lowest MAE at 0.2981 m whereas data set 3 has the highest MAE at 0.3905 m. It is obvious that the rover was not able to achieve fixed integer positioning all of the time. The positioning availability in Table 1 reveals that fixed integer positioning ranges from 11.48% to 22.03% out of the overall geodetic points logged along the trajectory. The rest of the geodetic points were mainly floating-point solutions (up to 86%) and small percentages of single point solutions. The results indicated multipath effects along the trajectory travelled by the rover that degrades the positioning accuracy. Analysis on the residuals between logged and actual ground truth were conducted to investigate ACF, PACF and the normality of the residuals. For brevity, only analyses on residuals of data set 1 is shown in this paper. Figure 4 shows the ACF and PACF plots of the residuals of data set 1 in Figures 4(a) and 4(b) respectively, which were correlated over time as spikes in the plots exceeded the upper and lower bound line. This implies the presence of correlated noise in the measurements. In addition, the histogram plot in Figure 4(c) did not exhibit normality in the data and was slightly right skewed with non-zero mean. It is obvious that the measurements were affected by noises contributed by multipath effects.

Results of fitted linear regression models with ARIMA errors
Linear regressions on all three data sets were implemented in MATLAB. The linear regression models Xopt=βXeast+c and Yopt=βYnorth+c was derived for both x and y axes respectively and tabulated in Table 2. The value in all coefficients of determination R 2 for all regressions indicated high degree of correspondence between predictor and response data. Though the R 2 is high, it could not determine whether the predictions are biased, or the regression model provides an adequate fit to the data. Hence, the residual plots need to be assessed. The residuals from these linear regression models were further analyzed using the ACF and PACF. As the residuals were non-stationary, first order differencing was applied, followed by the ACF and PACF analyses. For brevity, the results of ACF and PACF of first order differenced residuals from x-axis are shown in Figures 5(a)  integrated moving average (IMA) errors or integrated autoregressive errors (ARI) as both ACF and PACF plots displayed a cut off at a particular lag. The y-axis also displayed a similar characteristic as depicted by ACF and PACF plots in Figures 5(c) and 5(d) respectively. Hence, the best integrated MA or AR order needs to be estimated. Similar characteristics were also observed in data set 2 and 3. An algorithm written in MATLAB was implemented to determine the best IMA order and ARI order respectively for each data set with their respective AIC values. The results are shown in Table 3.   Table 3 shows the best ARI and IMA order for the respective axis with their respective AIC values estimated in MATLAB. Further comparison between ARI and IMA models reveal that an ARI (2,1,0) was the appropriate model for x-axis errors in data set 1 as its AIC value (i.e., -3132.47) is smaller compared to the IMA model. On the other hand, an IMA (0,1,1) was identified for y-axis errors in data set 1. Error models with the smallest AIC were identified for the other data sets (shown in bold font in Table 3). Hence, data set 2 was best modelled with IMA (0,1,2) and (0,1,1) error models for x-axis and y-axis respectively, whereas an IMA (0,1,2) were identified for both axes in data set 3. The best fitted linear regression models with their estimated regression coefficient, intercept and AR or MA coefficients for each data set are shown in Table 4.

Results of evaluation of model fit and performance of positioning errors
The fitted models in Table 4 were implemented accordingly for each data set. The residuals of these fitted models from the respective axis's regressions with ARIMA errors were evaluated using ACF, PACF, histogram, scatter plot and QQ-Plot to ascertain whether the model fit the data well. Figure 6 shows the ACF, PACF and the scatter plot of the residuals from both axes for data set 1. The ACF and PACF did not indicate any correlations in the residuals. In addition, the scatter plot i.e., the residuals versus fitted values displayed symmetrically distributed points and tendency to cluster towards the zero mean. Figure 7 shows normally distributed histograms and QQ-Plots of the residuals. The histograms clearly displayed the residuals have zero mean for both axes. In short, these results indicated that the model fitted the data well.  Table 5 shows the MAEs and RMSEs of 2-dimension (x-y coordinates) positioning errors produced by the fitted models for each data set. These MAEs and RMSEs were compared with original MAEs and RMSEs of each data set to evaluate the improvement in positioning errors brought about by the models. It was observed that the fitted models improved MAE by 38.73%, 57.35% and 45.56% for data set 1, 2 and 3 respectively.

CONCLUSION
This work successfully described measurement errors of RTK-GNSS along a trajectory point logged under multipath effects using linear regression model with ARIMA errors. The work has shown that regression with ARIMA errors can be extended to dynamic applications, compared to its conventional applications in static stations. As demonstrated via simulations, the models act as a filter for measurement noise along the trajectory resulted in improved positioning accuracy ranging from 38.73% to 57.35% compared to original measurements. As the regression model relied on historical data of logged GNSS points, this may not directly applicable for real-time implementation. Future studies may need to integrate other sensors such as inertial and odometry sensors to be coupled with RTK-GNSS measurements to estimate the models during real-time dynamic tests.