Scedasticity descriptor of terrestrial wireless communications channels for multipath clustering datasets

ABSTRACT


INTRODUCTION
Fifth-generation (5G) wireless systems have improved cellular communications through increased bandwidth, faster data rates, and lower latency. The properties of 5G are studied using channel models that reproduce the stochastic properties of multiple-input multiple-output (MIMO) channels. Channel models such as the European Cooperation in Science and Technology 2100 (COST 2100) [1], International Mobile Telecommunications-2020 (IMT-2020) [2], Quasi Deterministic Radio Channel Generator (QuaDRiGa) [3], and Wireless World Initiative New Radio II (WINNER II) [4] generate multipath components (MPCs), which are grouped into clusters when they have similar delays, angles of departure, and angles of arrival. The multipaths and multipath clusters serve as datasets to which clustering algorithms are applied in order to study the effectiveness of clustering approaches.
Previous research has investigated the effect of heteroscedasticity on datasets. The performance of a feed-forward neural network and of multiple regression was compared in the presence of heteroscedasticity in simulated data in [5], while heteroscedasticity was accommodated in allometric models to predict forest biomass in [6]. The interval fusion with preference aggregation procedure was applied to process heteroscedastic measured direct current (DC) voltage and resistance data in [7]. The effect on clustering has also been examined: robust clustering behaves differently when the assumption of homogeneity fails [8], [9], and unsatisfactory accuracy can result [10].
ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 13, No. 6, December 2023: 6547-6557
Earlier works on clustering multipaths [11], [12] did not consider the scedasticity of the datasets. This work tests the scedasticity of datasets generated by the COST 2100, IMT-2020, QuaDRiGa, and WINNER II channel models. The test is based on Johansen's multivariate analysis of variance procedure under heteroscedasticity. To the authors' knowledge, this study is the first to conduct a homoscedasticity test on datasets generated by 5G channel models. The paper is organized as follows: Section 2 presents the methodology of the study, Section 3 discusses the results, and Section 4 concludes the study.

METHOD
The methodology of the study is shown in Figure 1. The MATLAB implementations of the channel models [13]-[16] are used to generate the channel scenarios. The channel scenarios, which depict different propagation environments of the wireless communications system, contain various multipath components and multipath clusters. The generated data are subjected to the directional cosine transform (DCT), and the results of the DCT become the datasets used in the homoscedasticity test.
Figure 1. Methodology of the study

Channel models
This section presents the standard channel models (CMs): COST 2100, IMT-2020, WINNER II, and QuaDRiGa. For each model, the global parameters and the line-of-sight (LOS) and non-line-of-sight (NLOS) propagation conditions are configured, the MPCs are extracted, and the data are arranged in an Excel file. Channel models based on clustering in the azimuth, elevation, and delay domains at both link ends lead to double-directional channel models [17]. The use of clusters in channel models provides a good trade-off between complexity and accuracy.

COST 2100
The COST 2100 channel model (C2CM) is a geometry-based stochastic channel model and part of the COST family of channel models. The model can replicate MIMO channel behavior in time, frequency, and space [13] by simulating and generating channel coefficients and small-scale parameters (SSPs). The C2CM adopts the visibility region (VR) concept of its predecessor: a cluster is visible to the mobile station (MS) only when the MS is within the cluster's VR, and the cluster's contribution is at its maximum as the MS approaches the center of the VR. Hence, the VR concept allows the simulation of a non-stationary channel. Furthermore, clusters are classified as local clusters, single-bounce clusters, and twin clusters. Local clusters have omnidirectional spread in the azimuth, single-bounce cluster positions are rotated with a Gaussian-distributed angle, and both can be treated as special twin clusters with zero cluster-link delays. The C2CM supports carrier frequencies of 285 MHz for semi-urban scenarios and 5.3 GHz for indoor scenarios.

IMT-2020
The IMT-2020 channel model specifies the use cases of 5G and supports 3D MIMO by extending the model to the elevation domain. Its MATLAB implementation supports center frequencies from 0.5 to 100 GHz. Each 5G use case has corresponding scenarios in the model: indoor hotspot enhanced mobile broadband (eMBB) is represented by the indoor hotspot (InH) scenario, dense urban eMBB by the urban macro (UMa) and urban micro (UMi) layers, and rural eMBB by the rural macro (RMa) scenario. The channel model also supports the massive machine type communications (mMTC) and ultra-reliable and low latency communications (URLLC) use cases of 5G, but these are excluded from the data generation. The primary module of IMT-2020 is a geometry-based stochastic channel model (GBSCM), while an additional module below 6 GHz is a map-based hybrid channel model based on ray tracing. For its SSPs, the cluster delays are drawn from an exponential delay distribution and the cluster powers follow an exponential power delay profile, while the power angular spectrum in the azimuth of all clusters is modeled as wrapped Gaussian or Laplacian. The number of clusters generated by IMT-2020 follows the Poisson distribution. The model also takes vegetation effects in mmWave bands into consideration.
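To make the exponential delay distribution and exponential power delay profile concrete, the following is a minimal sketch. It assumes the 3GPP TR 38.901-style formulation (delays drawn as scaled negative logarithms of uniform variates, powers decaying exponentially with delay and perturbed by log-normal per-cluster shadowing); the function name and all parameter values (delay spread, proportionality factor `r_tau`, shadowing `zeta_db`) are illustrative placeholders, not values taken from the IMT-2020 tables or from the MATLAB implementation used in this study.

```python
import math
import random


def cluster_delays_and_powers(n_clusters, delay_spread, r_tau, zeta_db, seed=None):
    """Draw exponential cluster delays and an exponential power delay profile.

    ASSUMPTION: follows 3GPP TR 38.901-style steps as an illustration only;
    the parameter values passed in are placeholders, not IMT-2020 table values.
    """
    rng = random.Random(seed)
    # Exponential delays: tau'_n = -r_tau * DS * ln(X_n), with X_n ~ U(0, 1].
    raw = [-r_tau * delay_spread * math.log(1.0 - rng.random())
           for _ in range(n_clusters)]
    # Shift so that the earliest cluster arrives at zero delay, then sort.
    t_min = min(raw)
    delays = sorted(t - t_min for t in raw)
    # Exponential power delay profile with log-normal per-cluster shadowing.
    powers = [math.exp(-tau * (r_tau - 1.0) / (r_tau * delay_spread))
              * 10.0 ** (-rng.gauss(0.0, zeta_db) / 10.0)
              for tau in delays]
    # Normalize so that the cluster powers sum to one.
    total = sum(powers)
    return delays, [p / total for p in powers]


# Example with placeholder values: 8 clusters, 100 ns delay spread.
delays, powers = cluster_delays_and_powers(8, 100e-9, 2.3, 3.0, seed=1)
```

The sketch takes the number of clusters as an input rather than drawing it, so it does not model how IMT-2020 chooses the cluster count.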

WINNER II
The WINNER II channel model covers the 2 to 6 GHz frequency range with a 100 MHz bandwidth and provides 19 propagation scenarios. WINNER II is based on ray tracing and also replicates the double-directional channel. The chosen scenarios are indoor A1 LOS and NLOS, UMi LOS and NLOS (denoted B1), and UMa LOS and NLOS (denoted C1) [4]. A1 is defined as the indoor office or residential scenario, B1 as a typical urban microcell, and C1 as a macrocell scenario for wide area networks.

QuaDRiGa
The QuaDRiGa channel model is an extension of the WINNER SCM model; it supports three-dimensional (3D) MIMO modeling, continuous time evolution, and transitions between propagation scenarios, and it provides both terrestrial and satellite scenarios [3]. Its developers describe it as a statistical ray-tracing model which, unlike other channel models, maintains the spatial consistency of both large-scale parameters (LSPs) and SSPs. The QuaDRiGa model supports carrier frequencies from 0.45 to 100 GHz and is compatible with the 3rd Generation Partnership Project (3GPP) channel model. Its propagation scenarios, urban macrocellular (UMa) and urban microcellular (UMi), were validated by measurements in downtown Berlin, Germany.

Directional cosine transform
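A time snapshot of the channel model can be characterized by an $L \times 5$ matrix

$$\mathbf{X}_{\mathrm{RAW}} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_L]^{T} \quad (1)$$

where $[\cdot]^{T}$ is the transpose operator and the $\ell$-th multipath vector

$$\mathbf{x}_\ell = [\varphi_{\ell,\mathrm{AOD}},\ \theta_{\ell,\mathrm{AOD}},\ \varphi_{\ell,\mathrm{AOA}},\ \theta_{\ell,\mathrm{AOA}},\ \tau_\ell]^{T} \quad (2)$$

describes the azimuth angle of departure (AOD) $\varphi_{\ell,\mathrm{AOD}}$, elevation AOD $\theta_{\ell,\mathrm{AOD}}$, azimuth angle of arrival (AOA) $\varphi_{\ell,\mathrm{AOA}}$, elevation AOA $\theta_{\ell,\mathrm{AOA}}$, and $\ell$-th multipath delay $\tau_\ell$, as illustrated in Figure 2. The angular ambiguity of the circular data (the angles in $\mathbf{X}_{\mathrm{RAW}}$) can be avoided during clustering by transforming each spherical coordinate pair $(\varphi, \theta)$ in $\mathbf{x}_\ell$ into its equivalent rectangular coordinates using the directional cosines:

$$x = \sin\theta \cos\varphi \quad (3)$$

$$y = \sin\theta \sin\varphi \quad (4)$$

$$z = \cos\theta \quad (5)$$

The transformation results in an $L \times 7$ matrix $\mathbf{X}_{\mathrm{DCT}} = [\mathbf{x}_{1,\mathrm{DCT}}, \ldots, \mathbf{x}_{L,\mathrm{DCT}}]^{T}$ with seven dimensions, where

$$\mathbf{x}_{\ell,\mathrm{DCT}} = [x_{\ell,\mathrm{AOD}},\ y_{\ell,\mathrm{AOD}},\ z_{\ell,\mathrm{AOD}},\ x_{\ell,\mathrm{AOA}},\ y_{\ell,\mathrm{AOA}},\ z_{\ell,\mathrm{AOA}},\ \tau_\ell]^{T} \quad (6)$$

and $\mathbf{X}_{\mathrm{DCT}}$ serves as the dataset for the homoscedasticity test.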
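The row-by-row transform from the five raw parameters to the seven-dimensional dataset can be sketched as follows. This is a minimal illustration, assuming angles in radians and an elevation angle measured from the z-axis, so that x = sin θ cos φ, y = sin θ sin φ, and z = cos θ; if a channel model reports elevation from the horizontal plane, it would first be converted with θ = π/2 minus the reported elevation. The function names are hypothetical and are not part of any of the channel-model toolboxes.

```python
import math


def direction_cosines(azimuth, elevation):
    """Map a spherical direction (radians) to a unit vector (x, y, z).

    x = sin(theta)cos(phi), y = sin(theta)sin(phi), z = cos(theta),
    with theta measured from the z-axis (assumed convention).
    """
    return (
        math.sin(elevation) * math.cos(azimuth),
        math.sin(elevation) * math.sin(azimuth),
        math.cos(elevation),
    )


def dct_row(phi_aod, theta_aod, phi_aoa, theta_aoa, tau):
    """Turn one raw multipath row (5 values) into a seven-dimensional row."""
    return (*direction_cosines(phi_aod, theta_aod),
            *direction_cosines(phi_aoa, theta_aoa),
            tau)


def dct_matrix(x_raw):
    """Apply the transform to every multipath row of X_RAW (L x 5 -> L x 7)."""
    return [dct_row(*row) for row in x_raw]


# Azimuths of -pi and +pi are the same physical direction; after the
# transform they map to (numerically) the same coordinates.
a = dct_row(math.pi, 1.0, 0.2, 2.0, 1e-7)
b = dct_row(-math.pi, 1.0, 0.2, 2.0, 1e-7)
```

The example shows why the transform removes the angular ambiguity: two angle values that differ by 2π, and would look far apart to a distance-based clustering algorithm, become the same point in rectangular coordinates. The delay is passed through unchanged; any delay scaling before clustering would be a separate choice.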

Datasets
The IMT-2020, COST 2100, QuaDRiGa, and WINNER II datasets are available online in IEEE DataPort [18], [19]. They consist of various channel scenarios, multipath clusters, and multipath components, and each channel scenario contains thirty Excel sheets of data. Table 1 shows the channel scenarios for each dataset. IMT-2020 has eleven channel scenarios, and the number of clusters is the same across the thirty sheets of each channel scenario. The COST 2100 dataset consists of eight channel scenarios; its number of clusters varies across the thirty sheets of each channel scenario, so the values shown are the maximum number of clusters per channel scenario. QuaDRiGa has eight channel scenarios, with the same number of clusters in all thirty sheets of each channel scenario. Lastly, WINNER II has six channel scenarios, and its number of clusters is also the same across the thirty sheets of each channel scenario.
Figure 2. COST 2100 double-directional propagation path parameters [1]
The number of multipaths per channel scenario is shown in Table 2. For the IMT-2020, QuaDRiGa, and WINNER II datasets, the number of multipaths is the same across the thirty sheets of each channel scenario. In contrast, the COST 2100 dataset has a varying number of multipaths per sheet in every channel scenario, so the values shown are the maximum number of multipaths for each channel scenario.
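The reporting convention for Tables 1 and 2 (a single value per scenario when all sheets agree, the maximum when they vary, as for COST 2100) can be sketched as a small summary function. The sheet layout assumed here, one cluster label per multipath component, is hypothetical; the actual column names of the IEEE DataPort files are not given in the text. The sheets could be read with `pandas.read_excel(path, sheet_name=None)`, which returns a dictionary of DataFrames keyed by sheet name, before extracting the cluster-label column.

```python
def scenario_summary(sheets):
    """Summarize one channel scenario stored as several sheets.

    HYPOTHETICAL layout: each sheet is a list holding one cluster label per
    multipath component (e.g. a cluster column read from an Excel sheet).
    Returns (max_clusters, max_multipaths, counts_constant).
    """
    cluster_counts = [len(set(labels)) for labels in sheets]
    path_counts = [len(labels) for labels in sheets]
    # True when every sheet has the same numbers of clusters and multipaths,
    # as described for IMT-2020, QuaDRiGa, and WINNER II.
    constant = len(set(cluster_counts)) == 1 and len(set(path_counts)) == 1
    return max(cluster_counts), max(path_counts), constant


# Two toy sheets with different cluster and multipath counts.
summary = scenario_summary([[1, 1, 2], [1, 2, 3, 3]])
```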

Homoscedastic test
Channel models have been developed with specific application scenarios in mind. However, despite the apparent similarity among the application scenarios of the models discussed in this paper, there is some degree of scenario variability. The authors, thus, introduce a descriptor to measure the variation of each channel model type. The variance of the channel model parameters (delay, angles, and the like) is determined using statistical tests.
Testing for scedasticity, either heteroscedasticity or homoscedasticity, enables a quantification of the variation. Either a homoscedastic or a heteroscedastic test can be done, since the two are complementary notions. Let a trivariate data matrix be

$$\mathbf{X}_{31} = [\mathbf{f}_1, \mathbf{g}_1, \mathbf{h}_1] \quad (7)$$

where $n_1$ is the number of samples in the first group of the dataset, with three features $\mathbf{f}_1$, $\mathbf{g}_1$, and $\mathbf{h}_1$, each an $n_1 \times 1$ vector. Similarly, let a second trivariate data matrix of the same dataset, with $n_2$ samples, be defined as

$$\mathbf{X}_{32} = [\mathbf{f}_2, \mathbf{g}_2, \mathbf{h}_2] \quad (8)$$

The homoscedastic test is based on the covariance matrices of $\mathbf{X}_{31}$ and $\mathbf{X}_{32}$: if the null hypothesis holds, i.e., $\mathbf{X}_{31}$ and $\mathbf{X}_{32}$ have equal covariance matrices, the dataset is considered homoscedastic. The homoscedastic test is expected to be done on groups with the same number of samples $n$; however, the datasets generated from the channel models do not always have equal $n$. Thus, a test based on multivariate analysis of variance (MANOVA), which compares the mean vectors of several multivariate normal populations, was adapted [20]. It is based on Johansen's procedure [21], which does not require equal covariance matrices or equal sample sizes across groups.
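A minimal sketch of a Johansen-type heteroscedastic MANOVA statistic is given below for groups such as the two trivariate matrices above. It follows the form in which Johansen's F approximation is commonly stated (inverse covariances of the group means as weights, a precision-weighted grand mean, and a correction term A computed with n_i - 1 degrees of freedom per group); it is an illustration under that assumption, not the implementation of [20] or [22], whose details may differ. The function name `johansen_test` is hypothetical.

```python
import numpy as np


def johansen_test(groups):
    """Johansen-type heteroscedastic MANOVA statistic (illustrative sketch).

    groups: list of k arrays, each of shape (n_i, p) with p >= 2; the n_i may
    differ and the group covariance matrices are not assumed equal.
    Returns (t_j, f_stat, df1, df2); f_stat = t_j / c is referred to an
    F(df1, df2) distribution.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k, p = len(groups), groups[0].shape[1]
    means = [g.mean(axis=0) for g in groups]
    # W_i = (S_i / n_i)^-1, the inverse covariance of each group mean.
    weights = [np.linalg.inv(np.cov(g, rowvar=False) / len(g)) for g in groups]
    w_inv = np.linalg.inv(sum(weights))
    # Precision-weighted grand mean of the group mean vectors.
    grand = w_inv @ sum(w @ m for w, m in zip(weights, means))
    t_j = sum(float((m - grand) @ w @ (m - grand)) for w, m in zip(weights, means))
    # Johansen's correction term A, with n_i - 1 degrees of freedom per group.
    eye = np.eye(p)
    a = 0.0
    for g, w in zip(groups, weights):
        d = eye - w_inv @ w
        a += (np.trace(d @ d) + np.trace(d) ** 2) / (2.0 * (len(g) - 1))
    df1 = p * (k - 1)
    c = df1 + 2.0 * a - 6.0 * a / (df1 + 2.0)
    df2 = df1 * (df1 + 2.0) / (3.0 * a)
    return t_j, t_j / c, df1, df2


# Two trivariate groups with unequal sizes and different covariances.
rng = np.random.default_rng(0)
x31 = rng.normal(size=(40, 3))
x32 = 2.0 * rng.normal(size=(55, 3)) + 0.5
t_j, f_stat, df1, df2 = johansen_test([x31, x32])
```

A p-value can then be obtained from the upper tail of the F distribution, for example with `scipy.stats.f.sf(f_stat, df1, df2)`, and compared with a 0.05 significance level. The weighting by inverse covariance of each group mean is what lets the test tolerate unequal group sizes and unequal covariance matrices.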

RESULTS AND DISCUSSION
Following [22], the Johansen's test (JT) values and the corresponding p-values of COST 2100, IMT-2020, QuaDRiGa, and WINNER II are shown in Table 3. COST 2100, QuaDRiGa, and WINNER II have JT values greater than one across all channel scenarios, while IMT-2020 has JT values less than one for all channel scenarios. Moreover, COST 2100, QuaDRiGa, and WINNER II have p-values less than 0.05, which shows that the mean vectors of the different channel scenarios are significantly different. In contrast, IMT-2020 has p-values greater than 0.05, which indicates that its mean vectors are not significantly different. The homoscedasticity test results of the four CMs are shown in Table 4: COST 2100, QuaDRiGa, and WINNER II are heteroscedastic, while IMT-2020 is homoscedastic. The heteroscedastic CMs have fewer multipaths per cluster in the semi-urban, UMa, and UMi channel scenarios, resulting in a more varied distribution of multipaths across all channel scenarios. On the other hand, the homoscedastic CM has a higher number of clusters for the abovementioned channel scenarios, which gives a more homogeneous distribution of multipaths across all channel scenarios. This dense distribution of multipaths per cluster results in mean vectors that are not significantly different, as shown in Table 4.
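The spread comparison that follows uses the power-weighted standard deviation of each parameter, (9). For a discrete set of multipaths, where the power spectrum reduces to the linear power of each path, a minimal sketch is shown below; the function name is hypothetical. This plain form ignores the 360° wrap-around of azimuth, so angles straddling ±180° would need to be unwrapped, or a circular spread definition used, before applying it.

```python
import math


def rms_spread(values, powers):
    """Power-weighted standard deviation (discrete form of the RMS spread).

    values: parameter samples, e.g. delays in seconds or angles in degrees.
    powers: linear (not dB) power of each sample.
    """
    total = sum(powers)
    # Power-weighted mean of the parameter.
    mean = sum(v * p for v, p in zip(values, powers)) / total
    # Power-weighted second central moment.
    variance = sum(p * (v - mean) ** 2 for v, p in zip(values, powers)) / total
    return math.sqrt(variance)


# Two equal-power paths at 0 and 2 give a mean of 1 and a spread of 1.
spread = rms_spread([0.0, 2.0], [1.0, 1.0])
```

Differences such as the maximum minus the minimum EOD spread across channel scenarios, as tabulated in Table 5, would then be computed from the per-scenario outputs of such a function.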
Based on studies [23]-[25], the standard deviations ($\sigma$) of the parameters AOD, EOD, AOA, EOA, and $\tau$ for all channel scenarios of the four CMs are computed as

$$\sigma_{\Omega} = \sqrt{\frac{\int (\Omega - \mu_{\Omega})^{2}\, P(\Omega)\, d\Omega}{\int P(\Omega)\, d\Omega}} \quad (9)$$

where $\Omega$ is the parameter, $\mu_{\Omega}$ is its power-weighted mean, and $P(\Omega)$ is the power spectrum of the parameter. IMT-2020 has the least delay spread for all indoor channel scenarios (combining LOS and NLOS), while COST 2100 has the largest delay spread, as shown in Figure 3. Figure 4 indicates that IMT-2020 also has the least spread in outdoor LOS channel scenarios for the parameters AOD in Figure 4(a), EOD in Figure 4(b), AOA in Figure 4(c), and EOA in Figure 4(d). COST 2100 has the largest AOD and EOA spreads, while QuaDRiGa has the largest EOD and AOA spreads. As for the delay spread, COST 2100 has the greatest value, while QuaDRiGa has the least value.

For the COST 2100 CM, the generated dataset is heteroscedastic because of its varying numbers of clusters and multipaths. For this reason, the means are significantly different and the variances are more pronounced across all channel scenarios. Figure 5 shows the AOD (Figure 5(a)), EOD (Figure 5(b)), AOA (Figure 5(c)), and EOA (Figure 5(d)) spreads for all indoor channel scenarios. COST 2100 has the largest spread, while QuaDRiGa has the least spread for AOD, EOD, and EOA; WINNER II has the least AOA spread.

Figure 6 shows that WINNER II has the least spread for the parameters AOD (Figure 6(a)) and AOA (Figure 6(b)) for all outdoor scenarios, where all outdoor scenarios combine the LOS, NLOS, and O2I outdoor channel scenarios. It also shows that COST 2100 has the least spread in EOD (Figure 6(c)) and EOA (Figure 6(d)). As for the delay parameter, WINNER II has the least spread, while COST 2100 has the largest spread. Table 5 shows the relationship between the BS-MS antenna height difference, the difference between the maximum and minimum EOD spread, and the difference between the maximum and minimum EOA spread. COST 2100 has the largest antenna height difference in all indoor scenarios; hence, it has the greatest AOD and EOD spread (Figure 5(a) and Figure 5(b)). The AOD and AOA spreads are almost the same across all channel scenarios for IMT-2020, QuaDRiGa, and WINNER II, while COST 2100 has the greatest spread for the majority of the channel scenarios. In general, IMT-2020 has the least spread, consistent with its dataset being homoscedastic, while the other three CMs are heteroscedastic. This link rests on a correlative observation of the relationship between homoscedasticity and the ordering of the statistical values, including their rise and fall, rather than on mathematical proofs or equations.

CONCLUSION
This study presents a homoscedasticity test, based on Johansen's procedure, of the COST 2100, IMT-2020, QuaDRiGa, and WINNER II 5G channel model datasets. Results show that the COST 2100, QuaDRiGa, and WINNER II datasets are heteroscedastic, while the IMT-2020 dataset is homoscedastic. Future work will examine the effect of scedasticity on the accuracy of clustering the datasets.
Int J Elec & Comp Eng ISSN: 2088-8708 Scedasticity descriptor of terrestrial wireless communications channels for … (Jojo Blanza)

Table 1. Number of clusters generated for each channel scenario

Table 3. Johansen's test and p-value of the different channel models

Table 4. Homoscedasticity test results of the different channel models

Table 5. Relationship between BS-MS antenna height difference, difference between maximum and minimum EOD spread, and difference between maximum and minimum EOA spread