An improvement and a fast DSP Implementation of the bit flipping algorithms for low density parity check decoder

For low density parity check (LDPC) decoding, hard-decision algorithms are sometimes more suitable than soft-decision ones, particularly in high-throughput and high-speed applications. However, there is a considerable performance gap between these two classes of algorithms in favor of soft-decision algorithms. To reduce this gap, this work introduces two improved versions of hard-decision algorithms: the adaptive gradient descent bit-flipping (AGDBF) and the adaptive reliability ratio weighted GDBF (ARRWGDBF) algorithms. In each case, an adaptive weighting and correction factor is introduced to improve the performance of the algorithm, yielding a significant bit error rate gain. As a second contribution, the proposed solutions are implemented in real time on a digital signal processor (DSP) in order to optimize and evaluate the performance of these new approaches. The results of numerical simulations and of the DSP implementation show faster convergence, with low processing time and reduced memory consumption compared to soft-decision algorithms. For the irregular LDPC code, our approaches achieve gains of 0.25 and 0.15 dB for the AGDBF and ARRWGDBF algorithms, respectively.


INTRODUCTION
In digital transmissions, there is an extraordinary rise in throughput demand driven by the various multimedia uses increasingly favored by users who want ubiquitous connectivity. In mobile radio systems in particular, the information is frequently corrupted by noise in the transmission channel. A high-performance error-correcting code is therefore essential for digital transmissions, which makes the development of high-performance decoders with low latency, high operating frequency and high throughput a challenging research problem [1], [2].
Low density parity check (LDPC) codes, introduced in 1962 by Gallager [3], have strong error-correcting power, which makes them very attractive for use on highly disturbed channels. Owing to their error-correction performance, LDPC codes are widely used in many communication systems and standards, such as digital video broadcasting-satellite-second generation (DVB-S2), IEEE 802.16e (WiMAX), IEEE 802.11n (Wi-Fi), and 5G [4]-[6]. LDPC codes use a binary sparse parity check matrix H, and the two main classes of algorithms used for their decoding are hard-decision and soft-decision algorithms. Soft-decision algorithms compute extrinsic log likelihood ratios (LLRs) to evaluate the reliability of the received messages $y_n$. These methods achieve the best bit error rate (BER) performance [7]-[9], but these iterative decoding algorithms require a large number of arithmetic operations and can introduce prohibitive delays in very high-speed transmissions where latency plays an important role. Hard-decision (or bit-flipping) algorithms, in contrast, can strongly reduce the decoding time at the cost of some loss in performance [10]. They therefore remain useful for applications that require high speed and high throughput, especially if the performance gap between the two classes of algorithms can be reduced [11]. Hard-decision algorithms address three important problems: BER performance, latency, and computational complexity. These issues form a trade-off, and the challenge is to optimize all of them under specific requirements. This type of algorithm, proposed by Gallager [3], simplifies decoding by taking a hard decision $x_n$ on the message $y_n$ received from the transmission channel at the beginning of the decoding process.
It first computes the sum of the syndromes $\sum_{i=1}^{m} \prod_{j \in N(i)} x_j$. If this sum equals the number of rows of H, decoding stops; otherwise it computes the inversion function $\Delta_k^{\mathrm{BF}}(\mathbf{x}) \triangleq \sum_{i \in M(k)} \prod_{j \in N(i)} x_j$, which estimates the reliability of the received channel messages, and the bit corresponding to the minimum of this function is flipped. This bit-flipping (BF) algorithm has very low complexity since, in each iteration, it only requires a simple summation of binary parity-check values for each bit. However, it provides poor decoding performance: for instance, its BER is three or more orders of magnitude worse than that of the soft-decision algorithm at an SNR of 3.5 dB. To overcome these problems, hard-decision algorithms have been widely investigated and numerous BF variants have been proposed, among them the weighted BF (WBF) algorithm [12], the modified weighted BF (MWBF) algorithm [13], gradient descent bit-flipping (GDBF) [11] and reliability ratio weighted GDBF (RRWGDBF) [14]. These works use additive or multiplicative weighting factors in $\Delta_k(\mathbf{x})$ to evaluate the reliability of the syndromes [15], [16].
Kou et al. [12] proposed the WBF algorithm, in which a weighting factor based on the minimum value of $|y_n|$ is included in the syndrome computation to make $\Delta_k(\mathbf{x})$ more reliable. This yields some BER improvement but increases the complexity and the number of iterations. A modified version (MWBF) was introduced by Zhang et al. [13], who added an offset term based on $|y_n|$ to the $\Delta_k(\mathbf{x})$ of the WBF algorithm. Although this algorithm improves signal quality, it slightly increases the complexity of $\Delta_k(\mathbf{x})$. A further improved MWBF (IMWBF) version was introduced by Jiang et al. [15], using a weighting factor designed to avoid SNR dependency; this factor can be determined via Monte Carlo simulations. Also aiming to enhance the error rate performance, Wadayama et al. [11] proposed the GDBF algorithm as a gradient-descent formulation of the maximum likelihood decoding problem. This algorithm adds to $\Delta_k(\mathbf{x})$ a correlation term $x_k y_k$ between the hard-decision message and the received channel message so as to maximize their correlation, searches for the minimum value of $\Delta_k(\mathbf{x})$, and finally flips the corresponding bits. To improve the decoding performance of GDBF, Phromsa-ard et al. [14] proposed the reliability ratio weighted GDBF (RRWGDBF) algorithm, which uses a weighted summation over syndrome components with an adaptive threshold to reduce latency. GDBF and RRWGDBF give the best trade-off between performance and complexity among hard-decision algorithms [11], [14], [17], but compared with, for instance, the min-sum (MS) soft-decision algorithm, they show relatively limited BER performance [18].
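For reference, the metrics mentioned above can be written side by side in the bipolar convention used throughout ($x_j \in \{+1,-1\}$, received values $y_j$); in every case the bit with the smallest $\Delta_k$ is the flip candidate. This is our restatement of the standard definitions from [12], [13] and [11], not a reproduction of this paper's equations, and the symbols $w_i$ and $\alpha$ (the MWBF weight) are notation introduced here.

```latex
\begin{aligned}
\Delta_k^{\mathrm{BF}}(\mathbf{x})   &= \sum_{i \in M(k)} \prod_{j \in N(i)} x_j \\
\Delta_k^{\mathrm{WBF}}(\mathbf{x})  &= \sum_{i \in M(k)} w_i \prod_{j \in N(i)} x_j,
  \qquad w_i = \min_{j \in N(i)} |y_j| \\
\Delta_k^{\mathrm{MWBF}}(\mathbf{x}) &= \alpha\,|y_k| + \sum_{i \in M(k)} w_i \prod_{j \in N(i)} x_j \\
\Delta_k^{\mathrm{GDBF}}(\mathbf{x}) &= x_k y_k + \sum_{i \in M(k)} \prod_{j \in N(i)} x_j
\end{aligned}
```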
Their main advantage is the simplicity of their hardware implementation compared with, for instance, the MS algorithm, which requires substantial hardware resources and increases the decoding latency. Several researchers have proposed alternative GDBF algorithms to improve decoding quality, but these algorithms require more than a hundred iterations to reach their best performance. Nevertheless, the GDBF algorithm outperforms the WBF and MWBF algorithms in error-correcting ability and, more significantly, in the average number of iterations. However, during GDBF decoding there is a risk of flipping some correct bits and then flipping them back in later iterations, which degrades performance and adds delay.
In this work, we therefore propose the adaptive GDBF (AGDBF) algorithm, which addresses this problem. The bit flips are tracked during decoding, and when a bit is detected to have been flipped twice, further flipping of this bit is prevented by a multiplicative weighting coefficient. Along the same lines, we also propose an adaptive RRWGDBF (ARRWGDBF) algorithm, which first uses a pre-processing step to identify the columns of the H matrix involved in short cycles and then applies a weighting correction factor to counteract the impact of these short cycles. In this way, both algorithms improve the performance of hard-decision decoding, making it attractive for high-speed applications. As a second contribution of this work, after validation by simulation, the proposed algorithms are implemented on a digital signal processor (DSP) platform in order to evaluate their performance and reduce the processing time of these new approaches. The rest of the paper is structured as follows. Section 2 gives an overview of the GDBF and RRWGDBF algorithms, which allow a simplified hardware implementation of the LDPC decoder. The next section presents our new approaches for these two algorithms. Finally, the DSP implementation results are presented.

RESEARCH METHOD
2.1. Decoding algorithms
LDPC codes use a binary sparse m×n parity check matrix H, where m = n−k, k being the information length and n the code length. The H matrix can be represented by a Tanner graph, as illustrated in Figure 1, with m check nodes (CNs) and n variable nodes (VNs). Each variable node $v_j$ is connected to a set of check nodes, and each check node $c_i$ is connected to a set of variable nodes. M(n) denotes the set of check nodes connected to the nth variable node, and N(m) the set of variable nodes participating in the mth check node.
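The sets $N(i)$ and $M(k)$ are simply the row-wise and column-wise positions of the 1s in H. A minimal C++ sketch that derives both adjacency lists from a dense, row-major 0/1 matrix is shown below; the `Tanner` struct and `build_tanner` function are hypothetical names used only for illustration, and indices are 0-based.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Tanner-graph adjacency derived from a dense parity-check matrix H
// (m x n, row-major, entries 0/1). Indices are 0-based.
struct Tanner {
    int m = 0, n = 0;
    std::vector<std::vector<int>> N;  // N(i): variable nodes of check node i
    std::vector<std::vector<int>> M;  // M(k): check nodes of variable node k
};

Tanner build_tanner(const std::vector<std::uint8_t>& H, int m, int n) {
    assert(static_cast<int>(H.size()) == m * n);
    Tanner g;
    g.m = m;
    g.n = n;
    g.N.resize(m);
    g.M.resize(n);
    for (int i = 0; i < m; ++i)
        for (int k = 0; k < n; ++k)
            if (H[i * n + k]) {       // a 1 in H is an edge between check i and bit k
                g.N[i].push_back(k);
                g.M[k].push_back(i);
            }
    return g;
}
```

For example, the 2×4 matrix with rows `1101` and `0111` gives $N(0)=\{0,1,3\}$ and $M(1)=\{0,1\}$.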
For communication systems and standards, an optimized irregular LDPC code is known to perform better than a regular one [19], and quasi-cyclic LDPC (QC-LDPC) codes have shown good performance for large codeword lengths [20]-[22]. The decoding complexity is proportional to $E/R$, where $E$ is the number of edges between check nodes and variable nodes, and $R = k/n$ is the code rate [23]. In the present work, we assume an additive white Gaussian noise (AWGN) channel with variance $\sigma^2 = N_0/2$, where $N_0$ is the noise power spectral density, and binary phase-shift keying (BPSK) modulation [24].
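Under the usual assumptions of unit-energy BPSK symbols and an SNR axis expressed as $E_b/N_0$ (neither convention is stated explicitly in the text), the noise standard deviation follows from $E_s = R E_b$ and $\sigma^2 = N_0/2$, giving $\sigma^2 = 1/(2R\,E_b/N_0)$. The helper functions below are a hypothetical sketch of that channel setup, including the bipolar mapping and hard decision used by the BF-type decoders.

```cpp
#include <cassert>
#include <cmath>

// Noise standard deviation for BPSK with unit symbol energy (Es = 1) on an
// AWGN channel, given Eb/N0 in dB and code rate R = k/n.
// Since Es = R * Eb and sigma^2 = N0 / 2, sigma^2 = 1 / (2 * R * Eb/N0).
double awgn_sigma(double ebn0_db, double rate) {
    assert(rate > 0.0 && rate <= 1.0);
    double ebn0 = std::pow(10.0, ebn0_db / 10.0);  // dB -> linear scale
    return std::sqrt(1.0 / (2.0 * rate * ebn0));
}

// BPSK mapping in the bipolar convention used by BF decoders: 0 -> +1, 1 -> -1.
double bpsk(int bit) { return bit ? -1.0 : 1.0; }

// Hard decision on a received sample: the bipolar symbol x = sign(y).
int hard_decision(double y) { return y >= 0.0 ? 1 : -1; }
```

At 0 dB with R = 1/2, for instance, this gives $\sigma = 1$.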

Soft-decision decoding
The MS algorithm is the simplest way to implement soft-decision decoding [18], [25]. It is based on computing extrinsic LLR messages exchanged between the check nodes and the variable nodes of the Tanner graph. This algorithm achieves very high BER performance [26], but its major disadvantage is its implementation, which requires more hardware resources and more time, increasing the decoding latency.
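The core of MS is the check-node update: the message sent to each neighbour is the product of the signs and the minimum magnitude of the LLRs received from all other neighbours. The sketch below shows this standard update for a single check node in plain form (no normalization or offset); whether the MS decoder used here applies such a correction is not stated, and `minsum_check_update` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Min-sum check-node update for one check node. Given incoming
// variable-to-check LLRs q[0..d-1], the message to neighbour t is
//   r_t = (prod_{s != t} sign(q_s)) * min_{s != t} |q_s|.
// Keeping the two smallest magnitudes avoids an O(d^2) loop.
std::vector<double> minsum_check_update(const std::vector<double>& q) {
    assert(q.size() >= 2);
    double min1 = std::numeric_limits<double>::infinity(), min2 = min1;
    std::size_t idx1 = 0;
    int sign_all = 1;                                 // product of all signs
    for (std::size_t s = 0; s < q.size(); ++s) {
        double a = std::fabs(q[s]);
        if (q[s] < 0) sign_all = -sign_all;
        if (a < min1) { min2 = min1; min1 = a; idx1 = s; }
        else if (a < min2) { min2 = a; }
    }
    std::vector<double> r(q.size());
    for (std::size_t t = 0; t < q.size(); ++t) {
        int sign_t = (q[t] < 0) ? -sign_all : sign_all;  // remove own sign
        double mag = (t == idx1) ? min2 : min1;          // exclude own magnitude
        r[t] = sign_t * mag;
    }
    return r;
}
```

For incoming LLRs (1.5, -0.5, 2.0), the outgoing messages are (-0.5, 1.5, -0.5).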

Hard-decision decoding
Soft-decision algorithms compute LLRs to evaluate the reliability of received messages, which is computationally complex. To overcome this constraint, Gallager [3] proposed the BF algorithm, which works on hard decisions. It simplifies decoding by taking a hard decision on the message received from the transmission channel at the beginning of the decoding process, and then computes the sum of the syndromes over the rows. If this sum equals the number of rows of H, decoding stops; otherwise the algorithm computes the inversion function, which identifies the erroneous bits to be flipped. The basic version of the BF algorithm [3] is given in Figure 2.
Unlike soft-decision algorithms, this algorithm searches for the minimum value of the $\Delta_k$ function and flips the corresponding bits, so it may invert several bits in the same iteration, which makes BF the simplest of all bit-flipping methods to implement. However, inverting several bits at once can introduce new errors, and the decoder may then fail to detect and correct all of them. As a consequence, the performance of this algorithm remains far below that of soft-decision algorithms [15], [16].
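The multi-bit flipping behaviour described above can be sketched as follows, using bipolar decisions so that each satisfied check contributes +1 and each unsatisfied check contributes -1 to $\Delta_k$. This is one common variant (flip every bit that attains the minimum of $\Delta_k$), written with a dense H for clarity; it is not claimed to match Figure 2 line for line, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bipolar syndrome of check i: prod_{j in N(i)} x_j (+1 satisfied, -1 not).
static int check_value(const std::vector<std::uint8_t>& H, int n, int i,
                       const std::vector<int>& x) {
    int p = 1;
    for (int j = 0; j < n; ++j)
        if (H[i * n + j]) p *= x[j];
    return p;
}

// Multi-bit BF decoding with x_k in {+1,-1}. Stops when
// sum_i prod_{j in N(i)} x_j == m (all checks satisfied); otherwise flips every
// bit attaining the minimum of Delta_k = sum_{i in M(k)} prod_{j in N(i)} x_j.
// Returns the number of iterations used, or -1 if max_iter is exhausted.
int bf_decode(const std::vector<std::uint8_t>& H, int m, int n,
              std::vector<int>& x, int max_iter) {
    assert(static_cast<int>(H.size()) == m * n && static_cast<int>(x.size()) == n);
    std::vector<int> c(m), delta(n);
    for (int it = 0; ; ++it) {
        int syn_sum = 0;
        for (int i = 0; i < m; ++i) syn_sum += (c[i] = check_value(H, n, i, x));
        if (syn_sum == m) return it;                  // valid codeword
        if (it == max_iter) return -1;                // iteration budget exhausted
        int dmin = m + 1;                             // Delta_k <= |M(k)| <= m
        for (int k = 0; k < n; ++k) {
            delta[k] = 0;
            for (int i = 0; i < m; ++i)
                if (H[i * n + k]) delta[k] += c[i];   // sum over M(k)
            if (delta[k] < dmin) dmin = delta[k];
        }
        for (int k = 0; k < n; ++k)
            if (delta[k] == dmin) x[k] = -x[k];       // flip all minimisers
    }
}
```

With the (7,4) Hamming matrix and a single error on a bit that participates in all three checks, that bit alone attains the minimum $\Delta_k = -3$ and is corrected in one iteration.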
To overcome these problems, previous works introduce additive or multiplicative weighting in $\Delta_k(\mathbf{x})$ to evaluate the reliability of the syndromes. In this way, almost all errors can be detected and corrected, as confirmed for instance by Jiang et al. [21] and Gua et al. [16].

GDBF and RRWGDBF algorithms
The GDBF algorithm gives a better trade-off between performance and complexity than other hard-decision algorithms [11], and it has become a viable alternative to the belief propagation (BP) algorithm. In GDBF, one seeks the codeword that gives the maximum correlation value; the function to be optimized is defined by (2).
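For reference, the GDBF objective of Wadayama et al. [11] and its link to the inversion function can be restated as follows, with bipolar hard decisions $x_j$ and received values $y_j$; we assume, but cannot confirm from the text, that (2) and (3) take this standard form. Multiplying the partial derivative of $f$ by $x_k$ (and using $x_k^2 = 1$) yields the per-bit metric whose minimum selects the bit to flip.

```latex
\begin{aligned}
f(\mathbf{x}) &= \sum_{j=1}^{n} x_j y_j + \sum_{i=1}^{m} \prod_{j \in N(i)} x_j,
  \qquad \mathbf{x} \in \{+1,-1\}^n \\
x_k \frac{\partial f(\mathbf{x})}{\partial x_k}
  &= x_k y_k + \sum_{i \in M(k)} \prod_{j \in N(i)} x_j \;=\; \Delta_k(\mathbf{x})
\end{aligned}
```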
For a valid codeword, $f(\mathbf{x})$ reaches its maximum value, so decoding seeks to maximize this function by changing the values of $x_k$. The inversion function of this algorithm, defined by (3), gives a metric for each individual bit that is used to decide whether to flip that bit.
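A single-bit GDBF iteration, which flips only the bit with the smallest $\Delta_k(\mathbf{x}) = x_k y_k + \sum_{i \in M(k)} \prod_{j \in N(i)} x_j$, can be sketched as below. The dense H layout, the stopping rule and the names `gdbf_delta` and `gdbf_decode` are assumptions for illustration; practical decoders, including multi-bit GDBF variants, update the syndromes incrementally rather than recomputing them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bipolar syndrome of check i: prod_{j in N(i)} x_j.
static int check_value(const std::vector<std::uint8_t>& H, int n, int i,
                       const std::vector<int>& x) {
    int p = 1;
    for (int j = 0; j < n; ++j)
        if (H[i * n + j]) p *= x[j];
    return p;
}

// GDBF inversion function: Delta_k(x) = x_k*y_k + sum_{i in M(k)} prod_{j in N(i)} x_j
double gdbf_delta(const std::vector<std::uint8_t>& H, int m, int n,
                  const std::vector<int>& x, const std::vector<double>& y, int k) {
    double d = x[k] * y[k];                          // correlation term
    for (int i = 0; i < m; ++i)
        if (H[i * n + k]) d += check_value(H, n, i, x);  // checks in M(k)
    return d;
}

// Single-bit GDBF: flip the bit with the smallest Delta_k until all checks are
// satisfied. Returns the number of flips, or -1 if max_iter is exhausted.
int gdbf_decode(const std::vector<std::uint8_t>& H, int m, int n,
                std::vector<int>& x, const std::vector<double>& y, int max_iter) {
    assert(static_cast<int>(x.size()) == n && static_cast<int>(y.size()) == n);
    for (int it = 0; ; ++it) {
        bool ok = true;                              // all checks satisfied?
        for (int i = 0; i < m && ok; ++i) ok = (check_value(H, n, i, x) == 1);
        if (ok) return it;
        if (it == max_iter) return -1;
        int kmin = 0;
        double dmin = gdbf_delta(H, m, n, x, y, 0);
        for (int k = 1; k < n; ++k) {
            double d = gdbf_delta(H, m, n, x, y, k);
            if (d < dmin) { dmin = d; kmin = k; }
        }
        x[kmin] = -x[kmin];                          // flip the least reliable bit
    }
}
```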
Another algorithm, RRWGDBF, was proposed to improve GDBF [14]. It further increases the convergence speed by adding a multiplicative weighting factor β to the syndromes; the new metric is then given by (4). Before going further, we evaluated the BER performance of the algorithms cited above by means of a simple comparison for n=576. The results are shown in Figure 3. The GDBF and RRWGDBF algorithms perform better than the BF algorithm but remain far from the BER of the MS algorithm. To reduce the gap between these two classes of algorithms, we introduce in the following two new approaches that improve the BER performance of both the GDBF and RRWGDBF decoding algorithms by adding weighting factors that make their inversion functions more reliable.

Proposed decoding method
As a test case, we focus on decoding for the WiMAX standard. For the GDBF and RRWGDBF algorithms, and to simplify the implementation, we consider two H matrices based on irregular QC-LDPC codes of codeword lengths 576 and 1056 [27].

Proposed AGDBF approach
The GDBF algorithm searches for the minimum value of the inversion function and then flips the corresponding bits. During decoding, there is a risk of flipping some correct bits and then flipping them again in later iterations, which degrades performance and adds delay. To solve this problem, we introduce a new weighting coefficient that adjusts the balance between the correlation term $x_k y_k$ and the syndrome term $\sum_{i \in M(k)} \prod_{j \in N(i)} x_j$. The key idea is to track the bit flips: if a bit is detected to have been flipped twice ($N_k = 2$), its flipping is stopped by multiplying the first term of the inversion function by a weighting factor that increases its value, so that the bit is no longer selected for flipping in subsequent iterations. The inversion function is modified accordingly.
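A minimal sketch of the AGDBF rule, as we read the description above, is given below. It keeps a flip counter $N_k$ per bit and, once $N_k$ reaches 2, multiplies the correlation term $x_k y_k$ by a factor greater than 1. Since after two flips $x_k$ is back to the sign of $y_k$, that term is non-negative and the larger weight raises $\Delta_k$, which keeps the bit from being selected again. The parameter name `weight`, the condition $N_k \ge 2$, the single-bit flipping schedule and the dense H layout are our assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bipolar syndrome of check i: prod_{j in N(i)} x_j.
static int syndrome(const std::vector<std::uint8_t>& H, int n, int i,
                    const std::vector<int>& x) {
    int p = 1;
    for (int j = 0; j < n; ++j)
        if (H[i * n + j]) p *= x[j];
    return p;
}

// AGDBF-style single-bit flipping (hypothetical sketch). flips[k] counts N_k.
// Once N_k >= 2, the correlation term x_k*y_k is multiplied by `weight` (>= 1)
// so that Delta_k grows and the bit is unlikely to be chosen again.
// Returns the number of flips, or -1 if max_iter is exhausted.
int agdbf_decode(const std::vector<std::uint8_t>& H, int m, int n,
                 std::vector<int>& x, const std::vector<double>& y,
                 double weight, int max_iter) {
    assert(weight >= 1.0 && static_cast<int>(x.size()) == n);
    std::vector<int> flips(n, 0), c(m);
    for (int it = 0; ; ++it) {
        bool ok = true;
        for (int i = 0; i < m; ++i) { c[i] = syndrome(H, n, i, x); ok = ok && c[i] == 1; }
        if (ok) return it;
        if (it == max_iter) return -1;
        int kmin = -1;
        double dmin = 0.0;
        for (int k = 0; k < n; ++k) {
            double w = (flips[k] >= 2) ? weight : 1.0;  // penalise twice-flipped bits
            double d = w * x[k] * y[k];
            for (int i = 0; i < m; ++i)
                if (H[i * n + k]) d += c[i];            // syndrome term over M(k)
            if (kmin < 0 || d < dmin) { dmin = d; kmin = k; }
        }
        x[kmin] = -x[kmin];
        ++flips[kmin];
    }
}
```

In the simulations reported later, factor values between 5 and 20 were found useful; such a value would be passed as `weight`.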

Proposed ARRWGDBF approach
The (576,384) and (1056,792) codes give the best performance for the RRWGDBF algorithm, but their H matrices are less sparse, so short cycles are present. A cycle is a closed path that starts from a given variable node, passes alternately through check and variable nodes, and returns to the starting variable node. Figures 4(a)-(c) illustrate short cycles of length 4, 6 and 8, respectively, as usually encountered in H matrices.
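Since a length-4 cycle is formed exactly when two variable nodes share two or more check nodes, i.e. when two columns of H both have 1s in at least two common rows, the columns involved in such cycles can be flagged by a simple pairwise comparison. The sketch below does this on a dense H; it is a generic test, not necessarily the procedure of Yang et al. [28], and `columns_in_4cycles` is a hypothetical name. Its cost is $O(n^2 m)$, which is acceptable for a one-time pre-processing step at n = 576 or 1056.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Flags the columns (variable nodes) of H that lie on at least one cycle of
// length 4. Columns a and b form a 4-cycle exactly when they share 1s in two
// or more rows, i.e. the two variable nodes have >= 2 common check nodes.
std::vector<bool> columns_in_4cycles(const std::vector<std::uint8_t>& H, int m, int n) {
    assert(static_cast<int>(H.size()) == m * n);
    std::vector<bool> in_cycle(n, false);
    for (int a = 0; a < n; ++a)
        for (int b = a + 1; b < n; ++b) {
            int common = 0;                            // shared rows, capped at 2
            for (int i = 0; i < m && common < 2; ++i)
                if (H[i * n + a] && H[i * n + b]) ++common;
            if (common >= 2) {
                in_cycle[a] = true;
                in_cycle[b] = true;
            }
        }
    return in_cycle;
}
```

For the (7,4) Hamming matrix used as a toy example, columns 0 to 3 are flagged and columns 4 to 6 are not.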
To improve the performance of the RRWGDBF algorithm, short cycles in the Tanner graph must be avoided, since they are very penalizing when computing the inversion function. The girth is the length of the shortest cycle in a Tanner graph. When short cycles are present, the sum of the syndromes $\sum_{i \in M(k)} \prod_{j \in N(i)} x_j$ is no longer reliable, which decreases decoder performance. It is therefore essential to counteract the effect of short cycles to obtain good decoding performance. To address the convergence-speed problem of the RRWGDBF algorithm, we introduce a pre-processing step that searches the H matrix for the columns involved in short cycles and then multiplies them by a reweighting factor to gain in bit error rate performance. To identify these columns, we followed the method of Yang et al. [28] and found that columns 265 to 312 of the H(192,576) matrix and columns 1 to 727 of the H(264,1056) matrix are involved in short cycles of length 4. The multiplying factor is then introduced in the first term of the inversion function, which for n=576 is given by (7) and (8).
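One way to express the reweighted inversion function described above is sketched below: the correlation term of every flagged column is multiplied by a reweighting factor, and the syndrome terms carry per-check weights supplied by the caller. The RRWGDBF weight formula of [14] is not reproduced here, so `beta` is left as an input (all ones reduces the metric to plain GDBF); the names and this exact form are assumptions, not equations (7) and (8).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Inversion function with column reweighting (hypothetical sketch):
//   Delta_k(x) = g_k * x_k * y_k + sum_{i in M(k)} beta_i * prod_{j in N(i)} x_j
// where g_k = reweight if column k was flagged by the short-cycle
// pre-processing and 1 otherwise; beta holds caller-supplied check weights.
std::vector<double> reweighted_delta(const std::vector<std::uint8_t>& H, int m, int n,
                                     const std::vector<int>& x,
                                     const std::vector<double>& y,
                                     const std::vector<double>& beta,
                                     const std::vector<bool>& short_cycle_col,
                                     double reweight) {
    assert(static_cast<int>(beta.size()) == m &&
           static_cast<int>(short_cycle_col.size()) == n);
    std::vector<double> s(m);
    for (int i = 0; i < m; ++i) {                      // weighted bipolar syndromes
        int p = 1;
        for (int j = 0; j < n; ++j)
            if (H[i * n + j]) p *= x[j];
        s[i] = beta[i] * p;
    }
    std::vector<double> delta(n);
    for (int k = 0; k < n; ++k) {
        double g = short_cycle_col[k] ? reweight : 1.0;  // column reweighting
        delta[k] = g * x[k] * y[k];
        for (int i = 0; i < m; ++i)
            if (H[i * n + k]) delta[k] += s[i];          // sum over M(k)
    }
    return delta;
}
```

The flags could come from a short-cycle search over H; mapping the reported column ranges (for example 265 to 312 for the 576-length code) to 0-based indices is left to the caller.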

Software validation
The different hard-decision algorithms, including the proposed ones, were coded in C/C++. Simulations were run on a host computer with an Intel Core i7-7500U processor at 2.7 GHz.

Weighting factors determination
For AGDBF, since the aim is to increase the value of the inversion function when a twice-flipped bit is detected, we performed a series of simulations using a set of arbitrary positive values of the weighting factor. Results for representative values (2, 5, 10, 20, 50) are presented in Figures 5 and 6 for the (576,384) and (1056,792) codes, respectively. A value of 5 is the smallest for which an appreciable gain can be measured, and 20 is the value beyond which further changes are negligible or not even measurable.
For ARRWGDBF, we followed the same procedure; representative results for factor values (2, 5, 10, 20) are shown in Figures 7 and 8 for n=576 and n=1056, respectively. Values of 5 and 10 can be assigned to the check nodes crossed by the short cycles of length 4 for these two codeword lengths; for higher values, the changes in BER become negligible.
Figures 9 and 10 show the decoding performance achieved by the two proposed algorithms for the targeted codeword lengths 576 and 1056. As shown in Figure 9, the AGDBF algorithm gives a gain of approximately 0.2 dB at BER $= 10^{-4}$ for n=576 and approximately 0.17 dB at BER $= 4\times10^{-5}$ for n=1056; the weighting factor thus improves decoding performance compared with the GDBF algorithm. As shown in Figure 10, the ARRWGDBF algorithm gives a gain of approximately 0.14 dB at BER $= 10^{-3}$ for n=576 and approximately 0.17 dB at BER $= 4\times10^{-4}$ for n=1056; again, the pre-processing step on the H matrix and the reweighting factor in the inversion function lead to a non-negligible BER gain.
Figure 9. BER performance of the GDBF and the proposed AGDBF algorithms
Figure 10. BER performance of the RRWGDBF and the proposed ARRWGDBF algorithms

Simulation results
Table 1 compares the new and basic algorithms in terms of BER. The proposed AGDBF algorithm improves the BER compared with GDBF [29] and also greatly reduces the number of iterations: 30 in our case instead of 60 in the previous version [29]. Likewise, the proposed ARRWGDBF algorithm improves the BER.

Hardware implementation
A DSP implementation of the proposed solutions is helpful for evaluating the performance and delays of the algorithms. The DSP has a fast core that allows high-speed memory accesses, and the implementation can also suggest possible improvements. The platform used in this work is the Texas Instruments TMS320C6713 floating-point DSP [30]. The LDPC decoder was developed in software on the Code Composer Studio simulator using the C/C++ programming language.
For applications that require large code lengths, to speed up the decoding process and optimize memory, we adopt the method we reported previously [31] and store only the positions of the 1s in the H matrix. This simplifies the search for 1s during decoding and decreases the processing time by reducing the number of memory accesses. Figure 11 illustrates the steps we followed to implement our algorithms.
For the (576,384) code, Figure 12 shows a gain of 0.25 dB for the AGDBF algorithm at a BER of $5\times10^{-5}$ and a gain of 0.15 dB for the ARRWGDBF algorithm at a BER of $3\times10^{-4}$. In Figure 13 (the (1056,792) code, with a 264×1056 H matrix), the AGDBF algorithm achieves a gain of 0.13 dB at a BER of $5\times10^{-5}$ and the ARRWGDBF algorithm a gain of 0.15 dB at a BER of $3\times10^{-4}$, illustrating the improvement of the decoding process in each case.
Table 2 compares the processing time per iteration and the memory resources consumed by the MS, AGDBF and ARRWGDBF algorithms. Regarding memory requirements, the sparse matrix implementation, which stores only the positions of the 1s in the H matrix, saves a considerable amount of memory by allowing the entire code to reside in on-chip memory. Using the on-chip IRAM avoids accesses to slower off-chip memory, and the AGDBF and ARRWGDBF algorithms use less memory, whereas the MS algorithm uses both on-chip memory and external SDRAM. As a result, the number of cycles per iteration of the proposed AGDBF and ARRWGDBF algorithms is much lower than that of the MS algorithm, so the system can support large code lengths with the proposed algorithms. The same table also shows that the number of cycles and the memory allocation are almost unchanged between GDBF and AGDBF, and between RRWGDBF and ARRWGDBF.
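Storing only the positions of the 1s, as described above, is essentially a compressed sparse row layout. The sketch below builds such a structure with 16-bit indices and compares its size with a one-byte-per-entry dense matrix; the layout, the index width and the names are our assumptions and need not match the format of [31]. For a tiny matrix the compressed form can be larger, but for the 192×576 matrix a dense byte array already needs 110,592 bytes (278,784 bytes for 264×1056), whereas the compressed size grows only with the number of 1s.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Compressed storage of H keeping only the positions of its 1s (CSR-style):
// col_idx holds the column of every 1, and entries row_ptr[i]..row_ptr[i+1]-1
// belong to row i. 16-bit indices suffice for the code sizes considered here.
struct SparseH {
    std::vector<std::uint16_t> row_ptr;
    std::vector<std::uint16_t> col_idx;
};

SparseH compress(const std::vector<std::uint8_t>& H, int m, int n) {
    assert(static_cast<int>(H.size()) == m * n && n <= 65535);
    SparseH s;
    s.row_ptr.push_back(0);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j)
            if (H[i * n + j]) s.col_idx.push_back(static_cast<std::uint16_t>(j));
        assert(s.col_idx.size() <= 65535);            // row_ptr must fit in 16 bits
        s.row_ptr.push_back(static_cast<std::uint16_t>(s.col_idx.size()));
    }
    return s;
}

// Bytes used by the compressed form vs. one byte per entry for a dense H.
std::size_t sparse_bytes(const SparseH& s) {
    return (s.row_ptr.size() + s.col_idx.size()) * sizeof(std::uint16_t);
}
std::size_t dense_bytes(int m, int n) { return static_cast<std::size_t>(m) * n; }
```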