A computationally efficient detector for MIMO systems

In this work, a newly designed multiple-input multiple-output (MIMO) detector for implementation on software-defined-radio platforms is proposed, and its performance and complexity are studied. In particular, we propose and evaluate a MIMO detector that provides the optimal trade-off between decoding complexity and bit error rate (BER) performance compared with state-of-the-art detectors. The proposed MIMO decoding technique appears to find the optimal compromise between the competing requirements encountered when implementing advanced MIMO detectors in practical hardware systems: it i) exhibits deterministic decoding complexity, i.e., deterministic latency, ii) offers a good complexity-performance trade-off, keeping the complexity considerably lower than that of the maximum likelihood (ML) detector with almost optimal performance, iii) allows a fully parameterizable performance-to-complexity trade-off, in which the performance (or complexity) of the detector can be adjusted adaptively without changing the implementation, iv) is simple to implement and fully supports parallel processing, and v) allows a simple and efficient extension to soft-bit output generation in support of turbo decoding. The simulation results show that the proposed MIMO decoding technique achieves a substantially improved complexity-performance trade-off compared with state-of-the-art techniques.


INTRODUCTION
In the last decade, cooperative and multiple-input multiple-output (MIMO) techniques have been studied extensively because they improve performance without requiring additional power or frequency spectrum [1-13]. In this work, the performance of existing linear and nonlinear decoders [2, 14-20] for MIMO systems is compared with that of a newly proposed decoder that is particularly suitable for implementation on software-defined-radio architectures. The maximum likelihood (ML) decoder is the optimal detector for MIMO systems [2,15]. It performs a search over all possible combinations of transmitted symbol vectors. ML detection is optimal, but at the cost of a complexity that grows exponentially with the modulation size and the number of transmit antennas [15,16]. On the other hand, linear detectors such as the zero-forcing (ZF) and minimum mean squared error (MMSE) detectors are the simplest and most widely used detectors; they provide reasonable, though suboptimal, bit error rate (BER) performance at very low computational complexity [2,4,17,18]. The vertical Bell Laboratories layered space-time (V-BLAST) technique uses an iterative detector based on successive interference cancellation (SIC) to achieve a good trade-off between complexity and performance [2, 18-20]. SIC can be further improved by appropriately ordering the symbols, i.e., by first decoding the symbols with small estimation error before detecting the weaker symbols.
In this work, we are interested in developing, implementing, and evaluating a MIMO detector that provides the optimal trade-off between decoding complexity and BER performance compared with state-of-the-art detectors. Therefore, we introduce a new MIMO decoding technique that i) offers a good complexity-performance trade-off, ii) allows a fully parameterizable performance configuration, in the sense that the performance of the detector can be adjusted adaptively without changing the implementation, iii) is simple to implement and fully supports massively parallel processing, iv) exhibits a fixed complexity, i.e., unlike the popular sphere decoder, its decoding complexity is deterministic and does not depend on the particular realizations of fading or noise, and v) extends naturally to the soft-bit decoding required by modern channel decoders.

2. SYSTEM MODEL
Let us consider a MIMO system with $n_{Tx}$ transmit and $n_{Rx}$ receive antennas, as illustrated in Figure 1, and assume frequency non-selective (flat) fading channels. If a signal vector $\mathbf{x}$ is sent from the transmit antenna array, where symbol $x_j$ is emitted from the $j$th transmit antenna and $y_i$ is received by the $i$th receive antenna, then the signal at the receive antennas can be expressed as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{1}$$

where $\mathbf{y} = [y_1, y_2, \cdots, y_{n_{Rx}}]^T$, $\mathbf{x} = [x_1, x_2, \cdots, x_{n_{Tx}}]^T$, $\mathbf{n} = [n_1, n_2, \cdots, n_{n_{Rx}}]^T$, and $\mathbf{H}$ denotes the $(n_{Rx} \times n_{Tx})$ MIMO channel matrix that describes the input-output relation. In this representation, $\mathbf{n}$ denotes the noise vector, whose entries are modeled as independent, zero-mean, complex Gaussian random variables with unit variance.
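The signal model (1) can be simulated directly. The following is a minimal sketch in plain Python, assuming i.i.d. CN(0, 1) Rayleigh channel coefficients, unit-energy 4-QAM symbols, and a noise variance `sigma2` (1 by default, matching the unit-variance assumption above). The helper names `rayleigh_channel`, `qam4_symbols`, and `transmit` are hypothetical and chosen for illustration only.

```python
import math
import random


def rayleigh_channel(n_rx, n_tx, rng):
    """i.i.d. CN(0, 1) flat-fading coefficients (real/imag parts ~ N(0, 1/2))."""
    s = math.sqrt(0.5)
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n_tx)]
            for _ in range(n_rx)]


def qam4_symbols(n_tx, rng):
    """Unit-energy 4-QAM symbols, one per transmit antenna."""
    a = 1 / math.sqrt(2)
    return [complex(rng.choice((-a, a)), rng.choice((-a, a))) for _ in range(n_tx)]


def transmit(H, x, sigma2, rng):
    """Return y = H x + n with n ~ CN(0, sigma2 I), together with the noise n."""
    s = math.sqrt(sigma2 / 2)
    n = [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(len(H))]
    y = [sum(H[i][j] * x[j] for j in range(len(x))) + n[i] for i in range(len(H))]
    return y, n


rng = random.Random(0)
H = rayleigh_channel(4, 4, rng)      # n_Rx = n_Tx = 4
x = qam4_symbols(4, rng)
y, n = transmit(H, x, sigma2=1.0, rng=rng)
```

Returning the noise vector alongside `y` is only a convenience for checking the model; a receiver would see `y` and `H` only.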
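Applying the QR decomposition to the channel matrix gives

$$\mathbf{H} = \mathbf{Q}\mathbf{R}, \tag{2}$$

where $\mathbf{Q}$ is an $(n_{Rx} \times n_{Rx})$ unitary matrix, $\mathbf{R}$ is an $(n_{Rx} \times n_{Tx})$ upper triangular matrix, and $\mathbf{I}_{n_{Tx}}$ denotes the $(n_{Tx} \times n_{Tx})$ identity matrix. Making use of the QR decomposition, we can transform the channel model (1) into an equivalent triangular channel,

$$\tilde{\mathbf{y}} = \mathbf{Q}^H\mathbf{y} = \mathbf{R}\mathbf{x} + \tilde{\mathbf{n}}, \tag{3}$$

where $\tilde{\mathbf{n}} = \mathbf{Q}^H\mathbf{n}$. Since $\mathbf{Q}$ is unitary, $\tilde{\mathbf{n}}$ has the same statistics as $\mathbf{n}$. After this preprocessing of the received data, the model (3) is in triangular form.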
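A minimal sketch of the triangularization in (2)-(3), assuming a square system ($n_{Rx} = n_{Tx}$, as in the simulations) so that the thin QR factor is the full unitary matrix. It uses modified Gram-Schmidt, a swapped-in choice for illustration; hardware implementations often use Householder reflections or Givens rotations instead. The helper names are hypothetical.

```python
import math
import random


def qr_mgs(A):
    """Thin QR of a complex m x n matrix (m >= n) by modified Gram-Schmidt."""
    m, n = len(A), len(A[0])
    Q = [[0j] * n for _ in range(m)]
    R = [[0j] * n for _ in range(n)]
    for j in range(n):
        v = [complex(A[r][j]) for r in range(m)]
        for i in range(j):
            # project the partially orthogonalized column on q_i and remove it
            R[i][j] = sum(Q[r][i].conjugate() * v[r] for r in range(m))
            v = [v[r] - R[i][j] * Q[r][i] for r in range(m)]
        R[j][j] = math.sqrt(sum(abs(c) ** 2 for c in v))
        for r in range(m):
            Q[r][j] = v[r] / R[j][j]
    return Q, R


def herm_times(Q, v):
    """Compute Q^H v."""
    return [sum(Q[r][i].conjugate() * v[r] for r in range(len(v)))
            for i in range(len(Q[0]))]


def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


rng = random.Random(1)
g = math.sqrt(0.5)
H = [[complex(rng.gauss(0, g), rng.gauss(0, g)) for _ in range(4)] for _ in range(4)]
x = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(4)]
n = [complex(rng.gauss(0, g), rng.gauss(0, g)) for _ in range(4)]
y = [hx + ni for hx, ni in zip(mat_vec(H, x), n)]

Q, R = qr_mgs(H)
y_t = herm_times(Q, y)   # y~ = Q^H y
n_t = herm_times(Q, n)   # n~ = Q^H n
# triangular model (3): y~ = R x + n~
```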

THE PROPOSED RANDOMIZATION BASED MMSE DECODER
In this section, we introduce a new MIMO decoding algorithm, which carries out the following steps:

Step 1: Preprocessing (Nulling/Channel equalization)
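Nulling, i.e., channel equalization, is used to remove the channel effect from the received signal vector. This process is performed using MMSE channel matrix inversion. The linear receiver $\mathbf{W}_{MMSE}$ is computed to minimize the mean squared error (MSE) [2] between the transmitted symbol vector and its linear estimate, given by

$$\mathbf{W}_{MMSE} = \arg\min_{\mathbf{W}} E\left\{\|\mathbf{W}\mathbf{y} - \mathbf{x}\|^2\right\}, \tag{4}$$

where $E\{\cdot\}$ denotes statistical expectation. Therefore, the equalization matrix corresponding to the MMSE decoder is expressed as

$$\mathbf{W}_{MMSE} = \left(\mathbf{H}^H\mathbf{H} + \sigma^2\mathbf{I}\right)^{-1}\mathbf{H}^H, \tag{5}$$

where $\mathbf{I}$ denotes the identity matrix and $\sigma^2$ the noise variance. According to the principle of linear receivers, $\mathbf{W}_{MMSE}$ in (5) is multiplied by $\mathbf{y}$ in (1) to reconstruct the symbol vector $\mathbf{x}$ by removing the channel effect while limiting noise enhancement, such that $\hat{\mathbf{x}}_{MMSE} = \mathbf{W}_{MMSE}\mathbf{y}$.

For convenience of representation and for the derivations that follow, the MMSE decoder can be formulated in another form. Making use of the QR decomposition explained in Sec. 2, the extended channel matrix $\tilde{\mathbf{H}}$ can be expressed as

$$\tilde{\mathbf{H}} = \begin{bmatrix}\mathbf{H}\\ \sigma\mathbf{I}_{n_{Tx}}\end{bmatrix} = \tilde{\mathbf{Q}}\tilde{\mathbf{R}} = \begin{bmatrix}\tilde{\mathbf{Q}}_1\\ \tilde{\mathbf{Q}}_2\end{bmatrix}\tilde{\mathbf{R}}, \tag{6}$$

where the partitioning is such that $\tilde{\mathbf{Q}}_1$ is an $(n_{Rx} \times n_{Tx})$ matrix, $\tilde{\mathbf{Q}}_2$ is an $(n_{Tx} \times n_{Tx})$ matrix, and $\tilde{\mathbf{R}}$ is an $(n_{Tx} \times n_{Tx})$ upper triangular matrix. Since $\tilde{\mathbf{H}}^H\tilde{\mathbf{H}} = \mathbf{H}^H\mathbf{H} + \sigma^2\mathbf{I}_{n_{Tx}} = \tilde{\mathbf{R}}^H\tilde{\mathbf{R}}$ and $\mathbf{H}^H = \tilde{\mathbf{R}}^H\tilde{\mathbf{Q}}_1^H$, substituting (6) into (5) shows that the equalization matrix applied to $\mathbf{y}$ in (1) can be expressed as

$$\mathbf{W}_{MMSE} = \tilde{\mathbf{R}}^{-1}\tilde{\mathbf{Q}}_1^H, \tag{7}$$

where $\tilde{\mathbf{R}}$ and $\tilde{\mathbf{Q}}_1$ are obtained from $\tilde{\mathbf{H}}$ in (6). Making use of the equalization matrix (7), the soft-decoded symbol vector after MMSE decoding becomes

$$\hat{\mathbf{x}} = \mathbf{W}_{MMSE}\mathbf{y} = \mathbf{x} + \mathbf{e}, \tag{8}$$

where $\mathbf{x}$ denotes the true transmitted symbol vector and the estimation error vector $\mathbf{e} = \hat{\mathbf{x}} - \mathbf{x}$ is modeled as Gaussian distributed with zero mean and error covariance matrix

$$\mathbf{\Phi}_{ee} = E\left\{\mathbf{e}\mathbf{e}^H\right\} = \sigma^2\left(\tilde{\mathbf{R}}^H\tilde{\mathbf{R}}\right)^{-1}. \tag{9}$$

This step is carried out only once, before the randomization is started.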
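The equivalence between (5) and (7) can be checked numerically. The sketch below, in plain Python with hypothetical helper names, builds the extended matrix $\tilde{\mathbf{H}}$ of (6), factorizes it with modified Gram-Schmidt (an illustrative choice), forms $\tilde{\mathbf{R}}^{-1}\tilde{\mathbf{Q}}_1^H$ by back substitution, and compares the result with the direct inverse in (5) computed by Gauss-Jordan elimination. It assumes the noise variance `sigma2` is known at the receiver.

```python
import math
import random


def conj_t(A):
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def qr_mgs(A):
    """Thin QR by modified Gram-Schmidt (A is m x n, m >= n, full column rank)."""
    m, n = len(A), len(A[0])
    Q = [[0j] * n for _ in range(m)]
    R = [[0j] * n for _ in range(n)]
    for j in range(n):
        v = [complex(A[r][j]) for r in range(m)]
        for i in range(j):
            R[i][j] = sum(Q[r][i].conjugate() * v[r] for r in range(m))
            v = [v[r] - R[i][j] * Q[r][i] for r in range(m)]
        R[j][j] = math.sqrt(sum(abs(c) ** 2 for c in v))
        for r in range(m):
            Q[r][j] = v[r] / R[j][j]
    return Q, R


def inv_upper(R):
    """Inverse of an upper triangular matrix, column by column via back substitution."""
    n = len(R)
    X = [[0j] * n for _ in range(n)]
    for j in range(n):
        for i in range(n - 1, -1, -1):
            rhs = (1.0 if i == j else 0.0) - sum(R[i][k] * X[k][j] for k in range(i + 1, n))
            X[i][j] = rhs / R[i][i]
    return X


def inv_gauss_jordan(A):
    """General inverse with partial pivoting (used only for the reference formula)."""
    n = len(A)
    M = [[complex(v) for v in row] + [complex(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def mmse_extended_qr(H, sigma2):
    """W = R~^{-1} Q~1^H from the QR of [H; sigma I], cf. (6)-(7)."""
    n_rx, n_tx = len(H), len(H[0])
    s = math.sqrt(sigma2)
    H_ext = [list(row) for row in H] + [[s if i == j else 0.0 for j in range(n_tx)]
                                        for i in range(n_tx)]
    Q, R = qr_mgs(H_ext)
    Q1 = Q[:n_rx]   # first n_Rx rows of Q~
    return matmul(inv_upper(R), conj_t(Q1)), R


def mmse_direct(H, sigma2):
    """W = (H^H H + sigma2 I)^{-1} H^H, cf. (5)."""
    G = matmul(conj_t(H), H)
    G = [[G[i][j] + (sigma2 if i == j else 0.0) for j in range(len(G))]
         for i in range(len(G))]
    return matmul(inv_gauss_jordan(G), conj_t(H))


rng = random.Random(2)
g = math.sqrt(0.5)
H = [[complex(rng.gauss(0, g), rng.gauss(0, g)) for _ in range(4)] for _ in range(4)]
W_qr, R_ext = mmse_extended_qr(H, 0.5)
W_ref = mmse_direct(H, 0.5)
```

The QR route never inverts a full matrix: only the triangular $\tilde{\mathbf{R}}$ is inverted, and $\tilde{\mathbf{R}}$ is reused for the error covariance (9).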

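Step 2: Generating random vectors
In this step, the decoder generates $N_{rand}$ instances of a random vector $\mathbf{e}_k$, $k = 1, \ldots, N_{rand}$, with the same mean and covariance as the estimation error $\mathbf{e}$ in (9), i.e., $\mathbf{e}_k \sim \mathcal{N}\left(\mathbf{0}, \sigma^2(\tilde{\mathbf{R}}^H\tilde{\mathbf{R}})^{-1}\right)$, where $N_{rand}$ is the number of generated random vectors and $\mathbf{e}_k$ denotes the $k$th generated random vector. From these vectors, a corresponding set of randomized soft-decoded symbol vectors is computed according to

$$\hat{\mathbf{x}}_k = \hat{\mathbf{x}} - \mathbf{e}_k, \quad k = 1, \ldots, N_{rand}. \tag{10}$$

Generating more instances of $\mathbf{e}_k$ increases the probability that one of them lies close to the actual estimation error $\mathbf{e}$, so that the corresponding $\hat{\mathbf{x}}_k$ lies close to the transmitted vector $\mathbf{x}$. In this way, the overall BER performance improves.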
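Because $(\tilde{\mathbf{R}}^H\tilde{\mathbf{R}})^{-1} = \tilde{\mathbf{R}}^{-1}\tilde{\mathbf{R}}^{-H}$, a vector with the covariance required in Step 2 can be obtained as $\mathbf{e}_k = \sigma\tilde{\mathbf{R}}^{-1}\mathbf{z}_k$ with $\mathbf{z}_k \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$, i.e., with one back substitution per instance and no explicit matrix inversion. This is one possible way to realize Step 2 under that assumption, not necessarily the authors' implementation; the function names are hypothetical.

```python
import math
import random


def back_substitute(R, b):
    """Solve R x = b for upper triangular R."""
    n = len(R)
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x


def randomized_candidates(x_mmse, R_ext, sigma2, n_rand, rng):
    """Return N_rand soft candidates x_k = x_mmse - e_k, cf. (10), with
    e_k = sigma R~^{-1} z_k, so that cov(e_k) = sigma2 (R~^H R~)^{-1}."""
    n = len(x_mmse)
    g = math.sqrt(0.5)              # CN(0, 1): real/imag parts ~ N(0, 1/2)
    sigma = math.sqrt(sigma2)
    candidates = []
    for _ in range(n_rand):
        z = [complex(rng.gauss(0, g), rng.gauss(0, g)) for _ in range(n)]
        e_k = back_substitute(R_ext, [sigma * zi for zi in z])
        candidates.append([xm - ek for xm, ek in zip(x_mmse, e_k)])
    return candidates


rng = random.Random(3)
R_demo = [[2.0, 0.5j], [0.0, 1.5]]      # hypothetical R~ for illustration
cands = randomized_candidates([1 + 1j, -1 - 1j], R_demo, sigma2=0.1, n_rand=5, rng=rng)
```

Since each instance is generated independently, the loop can be split across parallel processing units without changing the result distribution.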

Step 3: Hard decoding
In this step, for $k = 1, \ldots, N_{rand}$, the decoder converts the randomized soft-decoded symbol vector $\hat{\mathbf{x}}_k$ generated according to (10) into a hard-decoded symbol vector $\check{\mathbf{x}}_k$ by finding the nearest constellation point for each soft-decoded symbol, as shown in Figure 2. Note that this is a symbol-by-symbol processing step performed using the round operation, with almost no additional computational complexity.
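For square M-QAM with levels $\{\pm1, \pm3, \ldots, \pm(\sqrt{M}-1)\}$ per dimension (optionally multiplied by a scale factor for normalized constellations), the nearest-point search in Step 3 reduces to rounding each real and imaginary part to the nearest odd integer and clipping to the outermost level. A minimal sketch under that assumption; `slice_qam`, `hard_decide`, and `scale` are hypothetical names.

```python
import math


def slice_pam(v, m_side):
    """Nearest odd integer to v, clipped to the outermost level +/-(m_side - 1)."""
    level = 2 * math.floor(v / 2) + 1
    return max(-(m_side - 1), min(m_side - 1, level))


def slice_qam(s, M, scale=1.0):
    """Nearest point of square M-QAM with per-axis levels scale * {+/-1, +/-3, ...}."""
    m_side = math.isqrt(M)
    if m_side * m_side != M:
        raise ValueError("only square M-QAM is handled in this sketch")
    re = slice_pam(s.real / scale, m_side)
    im = slice_pam(s.imag / scale, m_side)
    return complex(re, im) * scale


def hard_decide(x_soft, M, scale=1.0):
    """Symbol-by-symbol hard decision on a soft-decoded vector (Step 3)."""
    return [slice_qam(s, M, scale) for s in x_soft]


print(hard_decide([0.3 - 2.6j, 5.2 - 7.9j], 16))
```

No search over the constellation is needed: each symbol costs one scaling, one floor, and one clip per dimension, which is why this step adds almost no complexity.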

Step 4: Selection
In this step, the decoder selects, among the hard-decoded symbol vectors $\check{\mathbf{x}}_k$, $k = 1, \ldots, N_{rand}$, the vector $\mathbf{x}_{prop.}$ that maximizes the likelihood, i.e., minimizes the ML metric:

$$\mathbf{x}_{prop.} = \arg\min_{\check{\mathbf{x}}_k,\; k = 1, \ldots, N_{rand}} \left\|\mathbf{y} - \mathbf{H}\check{\mathbf{x}}_k\right\|^2. \tag{11}$$

The main procedure in Steps 2-3 described above can be efficiently implemented using either an iterative or a parallelized implementation, as shown in Figure 2.

The estimate of a symbol obtained with the MMSE filter is characterized by a mean (which may include a bias) and a variance. The idea behind the randomization is that, by randomly searching for symbol vectors that have the same mean as the estimated symbol vector and lie within the region determined by the error variance, there is a high probability of getting closer to the actual symbol vector. In this way, a better estimate of the symbols can be found. The performance of the decoder can be improved by increasing the number of randomization instances $N_{rand}$; this, however, comes at the expense of increased decoding complexity. Therefore, the performance (or complexity) of this decoder can be adjusted adaptively by changing the number of randomization instances without changing the structure of the implementation, i.e., the performance-to-complexity trade-off is controlled by the system parameter $N_{rand}$. Furthermore, the proposed algorithm is simple to implement because it builds on the widely used MMSE technique, and it offers a fixed decoding complexity that does not depend on the quality of the received signal vector $\mathbf{y}$.
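Putting Steps 1-4 together, the following is a compact, unoptimized sketch of the randomized MMSE detector in plain Python. It assumes a known noise variance `sigma2` and square M-QAM with unnormalized odd-integer levels (`m_side` = $\sqrt{M}$, e.g. 4 for 16-QAM). As an addition not stated in the text, it also keeps the unperturbed MMSE hard decision as a candidate, so the result is never worse in ML metric than plain MMSE detection. All names are hypothetical.

```python
import math
import random


def qr_mgs(A):
    """Thin QR of a complex m x n matrix (m >= n) by modified Gram-Schmidt."""
    m, n = len(A), len(A[0])
    Q = [[0j] * n for _ in range(m)]
    R = [[0j] * n for _ in range(n)]
    for j in range(n):
        v = [complex(A[r][j]) for r in range(m)]
        for i in range(j):
            R[i][j] = sum(Q[r][i].conjugate() * v[r] for r in range(m))
            v = [v[r] - R[i][j] * Q[r][i] for r in range(m)]
        R[j][j] = math.sqrt(sum(abs(c) ** 2 for c in v))
        for r in range(m):
            Q[r][j] = v[r] / R[j][j]
    return Q, R


def back_substitute(R, b):
    """Solve R x = b for upper triangular R."""
    n = len(R)
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x


def slice_qam(s, m_side):
    """Nearest square-QAM point with per-axis levels {+/-1, +/-3, ..., +/-(m_side-1)}."""
    def pam(v):
        return max(-(m_side - 1), min(m_side - 1, 2 * math.floor(v / 2) + 1))
    return complex(pam(s.real), pam(s.imag))


def ml_metric(H, y, x):
    """Squared Euclidean distance ||y - H x||^2."""
    return sum(abs(y[i] - sum(H[i][j] * x[j] for j in range(len(x)))) ** 2
               for i in range(len(y)))


def randomized_mmse_detect(H, y, sigma2, m_side, n_rand, rng):
    n_rx, n_tx = len(H), len(H[0])
    s = math.sqrt(sigma2)
    # Step 1: MMSE estimate via QR of the extended channel [H; sigma I]
    H_ext = [list(row) for row in H] + [[s if i == j else 0.0 for j in range(n_tx)]
                                        for i in range(n_tx)]
    Q, R = qr_mgs(H_ext)
    q1h_y = [sum(Q[r][i].conjugate() * y[r] for r in range(n_rx)) for i in range(n_tx)]
    x_mmse = back_substitute(R, q1h_y)              # x^ = R~^{-1} Q~1^H y
    # assumption of this sketch: the plain MMSE hard decision is also a candidate
    best = [slice_qam(v, m_side) for v in x_mmse]
    best_metric = ml_metric(H, y, best)
    g = math.sqrt(0.5)
    for _ in range(n_rand):
        # Step 2: e_k = sigma R~^{-1} z with z ~ CN(0, I), so cov = sigma2 (R~^H R~)^{-1}
        z = [complex(rng.gauss(0, g), rng.gauss(0, g)) for _ in range(n_tx)]
        e_k = back_substitute(R, [s * zi for zi in z])
        # Step 3: symbol-wise hard decision on x^_k = x^ - e_k
        cand = [slice_qam(xm - ek, m_side) for xm, ek in zip(x_mmse, e_k)]
        # Step 4: keep the candidate with the smallest ML metric
        metric = ml_metric(H, y, cand)
        if metric < best_metric:
            best, best_metric = cand, metric
    return best
```

The iterations of the loop are independent of each other except for the final comparison, which is what allows the parallel structure of Figure 2, and the work per received vector grows linearly with `n_rand`.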

SIMULATION RESULTS
In the simulations, we consider MIMO systems with independent flat Rayleigh fading channels and either four transmit and four receive antennas, eight transmit and eight receive antennas, or 20 transmit and 20 receive antennas. All MIMO detectors are compared using 4-QAM, 16-QAM, and 64-QAM constellations; the proposed decoder can also be used with symbols drawn from any M-PSK constellation. In all figures, the ZF decoder exhibits the worst performance, albeit with linear decoding complexity. On the other hand, the curves with optimal decoding performance in each figure correspond to the sphere decoder or the ML decoder; however, the ML decoder suffers from extremely high (exponential) decoding complexity. Every other decoder has a performance and decoding complexity between those of the ML and ZF decoders. In the following figures, ZF, MMSE, SIC, SD, ML, K, and Rand = L denote the ZF decoder, MMSE decoder, SIC decoder, sphere decoder, ML decoder, K-best decoder with $K^{n_{Tx}}$ iterations, and the proposed randomization-based decoder with L iterations, respectively. From the figures, it is observed that i) the MMSE-based decoders outperform the ZF-based ones owing to their robustness against noise enhancement, ii) the ML decoder achieves optimal performance at the cost of very high decoding complexity, iii) the proposed decoder improves the complexity-performance trade-off, keeping the complexity considerably lower than that of the ML detector with almost optimal performance, and iv) the sphere decoder, which achieves optimal performance, does not have a fixed complexity, and in specific cases its complexity may be as large as that of the ML decoder; some suboptimal sphere decoders, e.g., the K-best decoder, have a fixed decoding complexity but suboptimal performance, as shown in Figure 3 [20].

ISSN: 2088-8708
From Figures 3, 4, 5, and 6, the sphere decoder and the ML decoder are both optimal and have exactly the same BER performance, and the proposed decoder using only 10 iterations outperforms the suboptimal decoders, i.e., ZF, MMSE, and SIC with and without ordering using ZF or MMSE. We emphasize that in the proposed randomization-based decoder the nulling step, i.e., the matrix inversion step using the QR decomposition, is carried out only once, before the randomization is started, as discussed in Sec. 3, whereas the SIC decoder performs this step at every layer [2]. In all investigated decoders, i.e., the ZF, MMSE, ML, sphere, SIC, K-best, and proposed randomization-based decoders, the QR decomposition stage described in Sec.

From Figure 3, it is observed that the proposed decoder with $N_{rand} = 50$ iterations achieves the same BER performance as the K-best decoder with $K^4 = 10^4$ iterations. Thus, the proposed decoder outperforms the K-best decoder at the same decoding complexity. The K-best detection algorithm suffers from two main problems, namely the expansion and the sorting operations. The K-best algorithm expands each of the K retained paths to all of its possible children at each level, which requires sorting the children in each layer before selecting the best K paths. Therefore, its decoding complexity grows rapidly with the value of K, and a high complexity is required to enumerate the children nodes, especially for large numbers of transmit antennas and large constellation sizes, as shown in Table 1, whereas the complexity of the proposed decoder increases only linearly with $N_{rand}$. It can be observed from Figures 3, 4, 5 and Table 2 that the proposed decoder using only 200 iterations achieves almost the same performance as the optimal ML decoder, which requires $64^4 = 16777216$ iterations.
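The candidate counts quoted above follow from simple arithmetic. The sketch below tabulates, for $n_{Tx} = 4$, the number of ML-metric evaluations of exhaustive ML detection ($M^{n_{Tx}}$) against the $N_{rand}$ evaluations of the proposed decoder. It is a rough comparison under stated assumptions: it counts metric evaluations only and ignores the per-evaluation cost and the shared QR preprocessing. The function name is hypothetical.

```python
def metric_evaluations(M, n_tx, n_rand):
    """ML-metric evaluations per received vector (rough count, not a full complexity model)."""
    return {
        "ML": M ** n_tx,        # exhaustive search over all M^{nTx} symbol vectors
        "proposed": n_rand,     # one evaluation per randomization instance
    }


for M in (4, 16, 64):
    c = metric_evaluations(M, 4, 200)
    print(f"{M}-QAM, nTx = 4: ML = {c['ML']}, proposed = {c['proposed']}")
```

For 4-QAM the two counts are of the same order (256 versus 200), while for 64-QAM the exhaustive search needs 16777216 evaluations, which matches the observation that the advantage of the proposed decoder grows with the constellation size.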
From Table 1, it is observed that for low constellation sizes the complexity of the proposed decoder is similar to that of the costly ML decoder. This is also the case if $N_{rand} = M^{n_{Tx}}$. However, $M^{n_{Tx}}$ grows exponentially with the number of transmit antennas and the constellation size, and it is generally much larger than the value of $N_{rand}$ required to achieve similar performance. In particular, for large constellation sizes and large numbers of transmit antennas, e.g., 64-QAM with $n_{Tx} = 4$ transmit antennas as shown in Table 2, the proposed decoder using only $N_{rand} = 200$ iterations achieves performance similar to that of the optimal ML decoder.

EFFECT OF NOISE VARIANCE
In this section, we examine the robustness of the proposed decoder with respect to a mismatch between the true noise variance and the noise variance presumed at the receiver. In the simulations, we consider a MIMO system with four transmit and four receive antennas and a true noise variance of 0 dB (i.e., $\sigma^2 = 1$). The true SNR at the receiver is $SNR = P_t/\sigma^2 = 12$ dB in Figure 7 and $SNR = P_t/\sigma^2 = 17$ dB in Figure 8, while the estimated (presumed) SNR at the receiver is varied between 0 dB and 20 dB. Figures 7 and 8 show that, for more than $N_{rand} = 20$ randomization instances, the BER of the proposed algorithm remains approximately constant as the presumed receive SNR is varied across the entire simulated range. This shows that the proposed algorithm is fairly robust to a mismatch in the noise variance or SNR estimate. The robustness stems from the fact that the proposed decoder relies on generating random vectors: although some of these vectors may lie far away from the optimal one, there is a high probability that some of them lie very close to it even if the assumed noise variance is incorrect.

Figure 7. Robustness of the proposed decoder to a mismatch between the true SNR value ($P_t/\sigma^2 = 12$ dB) and the presumed SNR value in a 4 × 4 system using 4-QAM

Int J Elec & Comp Eng
Figure 8. Robustness of the proposed decoder to a mismatch between the true SNR value ($P_t/\sigma^2 = 17$ dB) and the presumed SNR value in a 4 × 4 system using 16-QAM
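In the mismatch experiment, the presumed SNR only changes the noise variance that the receiver uses in (5)-(9) and in Step 2. With the true noise variance fixed at 0 dB ($\sigma^2 = 1$) and $SNR = P_t/\sigma^2$, the transmit power follows from the true SNR, and the presumed variance is $P_t/10^{SNR_{presumed}/10}$. A small sketch of this conversion, with a hypothetical function name:

```python
def presumed_noise_variance(true_snr_db, presumed_snr_db, true_sigma2=1.0):
    """Noise variance the receiver assumes when it believes SNR = presumed_snr_db.

    The transmit power follows from the true SNR: P_t = sigma2 * 10^(SNR/10).
    """
    p_t = true_sigma2 * 10 ** (true_snr_db / 10)
    return p_t / 10 ** (presumed_snr_db / 10)


# Figure 7 setting: true SNR 12 dB, presumed SNR swept from 0 dB to 20 dB
for presumed in range(0, 21, 4):
    print(presumed, round(presumed_noise_variance(12, presumed), 4))
```

Over the 0 dB to 20 dB sweep, the presumed variance ranges from about 16 times too large to about 6 times too small for the 12 dB case, which gives a sense of how wide the tolerated mismatch is.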

CONCLUSION
The proposed decoder appears to find the optimal compromise between the competing requirements encountered when implementing advanced MIMO detectors in practical hardware systems. The proposed detector exhibits a number of desirable properties: i) deterministic latency, since the decoder has a configurable and fully deterministic, i.e., fixed, decoding complexity, ii) a fully parameterizable performance/complexity trade-off, since changing the number of randomization instances allows the trade-off between performance and computational complexity to be balanced at runtime, iii) simple implementation, since the algorithm requires a minimum of control structures and allows a high degree of parallelization, and iv) extension to soft-bit output, since the decoder can naturally be extended to produce the soft-bit outputs required by modern cellular communication standards.