FPGA implementation of LDPC soft-decision decoders based DCSK for spread spectrum applications

Spread spectrum (SS) communications have attracted interest because of their immunity to channel attenuation and their low probability of intercept. Apart from some extra features such as simple transceiver structures, chaotic communication is the analog alternative to digital SS systems. In this brief, differential chaos shift keying (DCSK) systems, which exploit the non-periodic and random-like characteristics of chaotic carriers, are combined with soft-decision low-density parity-check (LDPC) codes, chosen for their simple structure and strong error-correcting ability. Using the Xilinx Kintex-7 FPGA development kit, we investigate the hardware performance and resource requirements of the DCSK communication system with three LDPC decoding algorithms (Prob. Domain, Log Domain and Min-Sum) over an AWGN channel. The results indicate that the proposed system model substantially improves the bit error rate (BER) performance and the real-time processing. The Min-Sum decoder requires fewer FPGA resources than the other decoders. The implemented system achieves a BER of $10^{-4}$ with an associated $E_b/N_0$ of 5 dB as a coding gain.


INTRODUCTION
Being wide-band, chaotic signals are well suited to spread-spectrum communication [1]. Among the proposed digital schemes, the most thoroughly studied are chaos shift keying (CSK) and differential CSK (DCSK) [2]. The former was originally introduced as a coherent scheme [3], requiring synchronized replicas of the chaotic basis functions at the receiver. This requirement has not yet proved practical, though. The DCSK scheme, on the other hand, is a more robust noncoherent scheme [4] in which the receiver does not need exact knowledge of the chaotic basis functions. Instead, two signal segments are correlated and the result of the correlation is compared with a threshold. The performance of chaos-based digital communication systems in an additive white Gaussian noise (AWGN) environment has been studied thoroughly [5]-[8].
Low-density parity-check (LDPC) codes are among the most effective error correction codes. They have gained a lot of attention recently because they can achieve performance close to the Shannon limit over the binary symmetric channel (BSC) and the AWGN channel [9]. Decoding an LDPC code allows a high degree of parallelism, making it well suited to high data rate applications including wide-band wireless multimedia communications and magnetic storage systems. The low density of the parity-check matrix contributes both to good distance properties and to the relatively low complexity of the decoding algorithm [10], [11]. Moreover, large-scale mobile communication suffers from excessive noise, yet modern communication demands an acceptable bit error rate (BER) at ever higher data rates. Hence, supporting the DCSK communication system with LDPC codes becomes essential to mitigate the high error rate [12]. A field programmable gate array (FPGA) is used to evaluate a system that includes both DCSK and LDPC codes in a real-time environment. The basic decoding algorithms for LDPC codes suggested by Gallager in 1962 comprise soft-decision decoding, such as the sum-product algorithm (SPA), and hard-decision decoding, such as bit flipping (BF) [13]. SPA requires more addition and multiplication steps, but the simplicity of these operations makes it practical for real-time implementation on an FPGA. Iterative decoding calls for a crucial trade-off between complexity and efficiency. In this paper, we propose a communication system model that uses DCSK as the modulation technique, supported by three SPA variants (Prob. Domain, Log Domain and Min-Sum, the last of which is designed to reduce hardware complexity) to improve the system performance. The proposed system is implemented on a Kintex-7 FPGA development kit with Vivado 2017.4 software.

DCSK WITH LDPC ENCODED COMMUNICATION SYSTEM
The proposed system is shown in Figure 1. A binary data source is encoded by the LDPC block with a code rate of 0.5. The codeword is sent to a DCSK modulator, which uses a chaotic carrier to spread the digital signal across a wide frequency band with a spreading factor of 16. AWGN, which is commonly used in experimental work for its simplicity, is added to the modulated signal. Non-coherent demodulation is then applied at the receiver to recover the received codeword, which is decoded to regenerate the original information.

HARDWARE COMMUNICATION SYSTEM DESIGN
As mentioned in section 2, the system consists of three main parts: transmitter, channel and receiver. The transmitter comprises a Logistic-map input chaotic data generator, an LDPC encoder that uses the systematic form of the H matrix and computes the parity-check equation for each row of this matrix, and a DCSK modulator. The receiving side comprises the AWGN noise source, a DCSK demodulator and an LDPC decoder implementing one of three decoding algorithms: Prob. Domain, Log Domain or Min-Sum. The system model is implemented using Xilinx SG as shown in Figure 2; each block is designed with specific parameters to match the overall system implementation, as explained individually below. All three decoder designs share the same architecture except for the decoder block itself.

LDPC encoder
A message word of length $k = 10$, produced by the Bernoulli random binary generator, is fed to the LDPC block through a gateway block to obtain a codeword of length $n = 20$ at the output of that block. The LDPC encoder block shown in Figure 3 contains a serial-to-parallel block, shown in Figure 4, which converts a group of samples presented serially at the input into single samples presented in parallel at the output. In this paper, $k = 10$ as mentioned previously; the message bits are represented by $c_1$ to $c_{10}$ in (1), while $c_{11}$ to $c_{20}$ represent the redundancy bits ($m = n - k$) [14].
The parity bits are computed using (2) and (3), which give the first and second parity bits and correspond to the first and second rows of the matrix in (4), respectively. The remaining parity bits are generated in a similar manner [15]. Figure 5 illustrates the hardware implementation of these equations, which uses 10 XOR gates to generate the 10 parity-check bits from the input message.
$c_{11} = c_3 \oplus c_6 \oplus c_9$ (2)
$c_{12} = c_5 \oplus c_6 \oplus c_7 \oplus c_8$ (3)
The 20-bit codeword is then supplied to a concat block, which concatenates two or more input bits into a single output symbol. Finally, the parallel-to-serial block delivers the codeword at the encoder output, so that each sample at this block's input becomes multiple samples presented serially at the output.
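The XOR-based parity computation can be sketched in Python as follows. This is only an illustration: the tap positions for the 11th and 12th codeword bits are taken from (2) and (3), while the dictionary name `PARITY_TAPS` and the function `parity_bits` are hypothetical, and the remaining eight rows of the H matrix are not given in the text and would have to be added.

```python
from functools import reduce
from operator import xor

# 1-based message-bit positions that feed each parity bit.
# Only the rows for bits 11 and 12 come from equations (2) and (3);
# the other eight rows of H are not listed here and must be supplied.
PARITY_TAPS = {
    11: (3, 6, 9),
    12: (5, 6, 7, 8),
}

def parity_bits(message, taps=PARITY_TAPS):
    """Return {codeword position: parity bit} for a list of message bits (0/1)."""
    return {
        pos: reduce(xor, (message[i - 1] for i in idx))
        for pos, idx in taps.items()
    }

# Arbitrary 10-bit example message (bits 1..10 of the codeword).
msg = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
parities = parity_bits(msg)
```

Each dictionary entry plays the role of one XOR gate tree in Figure 5; a systematic codeword is the message followed by the parity bits in position order.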

DCSK modulator
The codeword generated by the LDPC encoder is fed to a DCSK modulator designed in Xilinx SG. In the DCSK modulator, each bit $s_i$ is represented by two sets of $\beta$ chaotic signal samples, where the integer $2\beta$ is the spreading factor. The first and second sets are the reference segment and the data segment, respectively. The data segment equals the reference segment when transmitting +1 and is an inverted copy of the reference segment when transmitting -1. With $x_k$ denoting the chaotic sequence, the transmitter output $e_k$ during the $i$th bit period is [16]-[18]:
$e_k = x_k$ for $2(i-1)\beta < k \le (2i-1)\beta$, and $e_k = s_i x_{k-\beta}$ for $(2i-1)\beta < k \le 2i\beta$ (5)
Figure 6 shows the Xilinx SG block of the DCSK modulation system with $\beta$ equal to 8.
The chaos generator and mapping blocks are shown in Figures 7 and 8, respectively. The Chebyshev polynomial function (CPF) of order two in (6) is selected as the chaotic generator, with an initial condition of 0.1, because it can be implemented simply using multiplexer, multiplier and adder blocks.
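A minimal Python sketch of the modulator described above is given below. It assumes the order-two Chebyshev map is iterated as $x_{k+1} = 1 - 2x_k^2$ (the form $2x_k^2 - 1$ is equally common, so the sign convention is an assumption), that bits have already been mapped to +1/-1, and that a fresh reference segment is drawn for every bit. The function names are hypothetical.

```python
def chebyshev_sequence(length, x0=0.1):
    """Generate chaotic samples by iterating x -> 1 - 2*x**2 from x0."""
    samples, x = [], x0
    for _ in range(length):
        x = 1.0 - 2.0 * x * x
        samples.append(x)
    return samples

def dcsk_modulate(bits, beta=8, x0=0.1):
    """Map each bit in {+1, -1} to 2*beta samples: reference, then data."""
    chaos = chebyshev_sequence(beta * len(bits), x0)
    out = []
    for i, s in enumerate(bits):
        ref = chaos[i * beta:(i + 1) * beta]
        out.extend(ref)                  # reference segment
        out.extend(s * x for x in ref)   # data segment: copy or inverted copy
    return out

tx = dcsk_modulate([1, -1, 1])   # 3 bits -> 3 * 16 samples
```

With `beta=8` every bit occupies 16 samples, which matches the spreading factor of 16 used in the paper.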

Channel
When the modulated signal is transmitted over the channel, it can be corrupted by channel noise, modeled as additive white Gaussian noise (AWGN). Such noise is statistically random and has a wide frequency spectrum. The hardware implementation of this channel, with a seed value of 512, is illustrated in Figure 9.
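For software experiments, the channel can be approximated by adding seeded Gaussian noise to each sample. The sketch below is an assumption-laden stand-in: the function name `add_awgn` is hypothetical, the noise standard deviation `sigma` must be chosen by the user, and Python's generator seeded with 512 does not reproduce the sequence of the Xilinx noise block.

```python
import random

def add_awgn(signal, sigma, seed=512):
    """Return the signal plus zero-mean Gaussian noise of std. dev. sigma."""
    rng = random.Random(seed)  # fixed seed makes the run repeatable
    return [s + rng.gauss(0.0, sigma) for s in signal]

noisy = add_awgn([0.5, -0.25, 0.75, -1.0], sigma=0.1)
```

Fixing the seed, as the hardware design does, makes BER measurements repeatable from run to run.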

DCSK demodulation
The received signal passes to an S2P block that converts the 16 serial samples of each bit into parallel form so that they can be despread into a single bit. Figure 10 shows the details of the Xilinx SG S2P block, which consists of 16 latches and their corresponding delay blocks. The 16 output samples of the S2P block are connected to a correlator, in which the reference samples are multiplied by the corresponding data samples. The correlator output for the $i$th bit is the $i$th decision variable, $D_i$ [19]. The $i$th bit is demodulated with a zero threshold, that is, by taking the sign of the final correlator output. The Xilinx SG DCSK demodulator is shown in Figure 11.
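The correlate-and-threshold rule described above can be sketched in Python as follows. The function name `dcsk_demodulate` is hypothetical, and the sketch assumes the received samples are aligned to bit boundaries, with `beta` reference samples followed by `beta` data samples per bit.

```python
def dcsk_demodulate(received, beta=8):
    """Non-coherent DCSK detection: correlate reference and data halves."""
    decisions, soft = [], []
    frame = 2 * beta
    for start in range(0, len(received) - frame + 1, frame):
        ref = received[start:start + beta]
        data = received[start + beta:start + frame]
        d = sum(r * x for r, x in zip(ref, data))  # decision variable D_i
        soft.append(d)
        decisions.append(1 if d >= 0 else -1)      # zero threshold
    return decisions, soft

# Two noiseless bits with beta = 2: the first sent as +1, the second as -1.
rx = [0.5, -0.25, 0.5, -0.25, 0.1, 0.9, -0.1, -0.9]
bits, d_values = dcsk_demodulate(rx, beta=2)
```

The list `soft` keeps the unquantized correlator outputs, which is the kind of soft information a soft-decision LDPC decoder can take as input.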

Decoders
The Prob. Domain, Log Domain and Min-Sum algorithms are soft-decision algorithms that work by passing messages between the check nodes (CNs) and the variable nodes (VNs). They correct the received bits to obtain the 10-bit symbols that represent the original signal after decoding. The demodulated bits are first fed to an S2P converter block, which converts the stream of 20 bits into parallel bits for initialization. In general, these algorithms proceed in three main steps:
− Initialization
− Horizontal step
− Vertical step and decoding/estimation
The details of each step are described below for each algorithm.
Initialization (Prob. Domain): The messages sent from VN n to CN m are initialized to the channel probabilities $P(1)$ and $P(0)$, obtained from (9) and (10). Each message is a pair, written here as $q_{m,n}(0)$ and $q_{m,n}(1)$, indicating the belief that the received bit rx is zero or one, respectively [20]. $P(1)$ and $P(0)$ are the posterior probabilities, whose values are found from the received signal, and $N_0$ represents the noise variance.
Horizontal step: The horizontal step computation depends on the number of 1s across the columns of the ten rows of the matrix, according to (11)-(13). It is built from the basic Add-Sub and Mult Xilinx SG blocks, which take the product of the differences between $q_{m,n'}(0)$ and $q_{m,n'}(1)$ over the connected bits $n'$, excluding the bit being updated. The messages written here as $r_{m,n}(0)$ and $r_{m,n}(1)$, which represent the probability that check m is satisfied, are then obtained by using an Add-Sub block to add the product to one or subtract it from one and multiplying the result by 0.5 using a Xilinx SG block. Figures 12 and 13 illustrate the first-row computations of the horizontal step according to (11)-(13).
Vertical step: The messages from the VNs are updated using the scale factors $k_{in}(0)$ and $k_{in}(1)$, which are implemented using Mult Xilinx SG blocks. A decision for each bit is then made according to (15).
The Xilinx SG block of column twelve is described in Figure 14.
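The horizontal and vertical steps of a probability-domain decoder can be sketched for a single check node and a single variable node as below. This follows the standard textbook form of the sum-product algorithm and is not necessarily identical to (11)-(15); the function names are hypothetical, and one normalizing factor per message is used so that each output pair sums to one.

```python
from math import prod

def horizontal_step(q0, q1):
    """Check-node update: q0[j], q1[j] come from the j-th connected VN."""
    r0, r1 = [], []
    for j in range(len(q0)):
        # Product of (q(0) - q(1)) over all other connected bits.
        delta = prod(q0[i] - q1[i] for i in range(len(q0)) if i != j)
        r0.append(0.5 * (1.0 + delta))
        r1.append(0.5 * (1.0 - delta))
    return r0, r1

def vertical_step(p1, r0, r1):
    """Variable-node update for prior P(1) = p1 and incoming (r0, r1)."""
    q0, q1 = [], []
    for j in range(len(r0)):
        a0 = (1.0 - p1) * prod(r0[i] for i in range(len(r0)) if i != j)
        a1 = p1 * prod(r1[i] for i in range(len(r1)) if i != j)
        k = 1.0 / (a0 + a1)  # scale factor so that q0 + q1 = 1
        q0.append(k * a0)
        q1.append(k * a1)
    return q0, q1

r0, r1 = horizontal_step([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
q0, q1 = vertical_step(0.5, r0, r1)
```

The "add or subtract from one, then multiply by 0.5" structure of `horizontal_step` mirrors the Add-Sub and Mult blocks described for the hardware design.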

Log domain algorithm
Initialization: The Log Domain decoder uses the log-likelihood ratio (LLR) of the prior probabilities (the messages received from the channel) and of the posterior probabilities (the intermediate messages passed between CNs and VNs). The prior messages sent from bit node (VN) n to CN m are therefore the LLRs [21].
This process is applied serially to the 20 received bits. The absolute value and the sign (positive, negative or zero) are then calculated for each bit. The initialization block is implemented using Mult, Mux and Relational Xilinx SG blocks. The initialization block and the sign block are shown in Figures 15 and 16, respectively.
Horizontal step: The horizontal step computation depends on the number of 1s per column across the ten rows of the matrix. The extrinsic messages for each set of bits connected to CN m, excluding the bit n', are computed using the equation below.
Here the message from CN m to bit n represents, in LLR form, the probability that parity-check m is satisfied when bit n is assumed to be a 1. It is obtained in stages: first compute the summation of the incoming terms, excluding the bit n; then compute the intermediate quantity from this sum; next take the products of the $h$ terms, excluding the bit n; finally, compute the outgoing message.
To reduce the complexity of the Xilinx SG design related to the logarithm (log) operation in (19), a lookup table is proposed that takes the values of rx and applies (19), as shown in Table 1. Figure 17 shows the details of the Xilinx SG design for the Pi calculation according to (19) for each bit equal to one.

For example, the Xilinx SG design for the first row is implemented using Mult and Add-Sub blocks according to (20)-(23), as illustrated in Figure 18. The details of each block in Figure 18 are shown in Figures 19-21. Furthermore, to reduce the complexity of the Xilinx SG design related to the logarithm (log) operation in (21), a look-up table is proposed, as illustrated in Figure 20 and Table 2.
Vertical step: The vertical step yields the collective log-likelihood ratio for the n'th digit, and a decision is made for each bit according to (25). The design is implemented using Xilinx SG blocks (Add-Sub, Mux and Relational) to manage the value of the variable node. The Xilinx SG block for column twelve is described in Figure 22.
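The sign-and-magnitude structure of the Log Domain horizontal step can be sketched in Python with one common textbook formulation, based on Gallager's function $\phi(x) = -\ln\tanh(x/2)$. This is an assumption and may differ in detail from (19)-(23); the function names and the clamping limits are hypothetical choices made to keep the arithmetic finite.

```python
import math

def phi(x):
    """Gallager's function -ln(tanh(x/2)), clamped to avoid log(0)."""
    x = min(max(x, 1e-9), 30.0)
    return -math.log(math.tanh(x / 2.0))

def check_node_update_llr(llrs):
    """Return the extrinsic LLR sent to each bit connected to one check node."""
    out = []
    for j in range(len(llrs)):
        others = [v for i, v in enumerate(llrs) if i != j]
        # Product of signs of the other incoming messages.
        sign = -1.0 if sum(v < 0 for v in others) % 2 else 1.0
        # Magnitude: phi of the sum of phi(|L|) over the other messages.
        magnitude = phi(sum(phi(abs(v)) for v in others))
        out.append(sign * magnitude)
    return out

extrinsic = check_node_update_llr([1.2, -0.7, 2.5])
```

In hardware, `phi` is the natural candidate for a look-up table, since it is the only nonlinear operation in the update; the remaining work is additions and sign handling.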

Min-sum algorithm
The receiver side of the Min-Sum algorithm is very similar to that of the Log Domain algorithm except for the initialization process and the horizontal step. It is implemented by replacing (19)-(23), which belong to the Log Domain initialization and horizontal step blocks, with (26)-(29), respectively [22]-[24].
This process is implemented using Xilinx SG blocks (Mult, Mux and Relational) to manage the value of the variable node. The details are illustrated in Figures 23-28.
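The usual Min-Sum simplification of the horizontal step replaces the logarithmic magnitude computation with a minimum, as sketched below. This is the standard form of the approximation and is assumed, not confirmed, to correspond to (26)-(29); the function name is hypothetical.

```python
def check_node_update_min_sum(llrs):
    """Min-Sum check-node update: sign product times minimum magnitude."""
    out = []
    for j in range(len(llrs)):
        others = [v for i, v in enumerate(llrs) if i != j]
        # Sign is negative when an odd number of the other LLRs is negative.
        sign = -1.0 if sum(v < 0 for v in others) % 2 else 1.0
        out.append(sign * min(abs(v) for v in others))
    return out

extrinsic = check_node_update_min_sum([1.2, -0.7, 2.5])
```

Only comparisons and sign logic remain, which is consistent with the lower DSP and LUT usage reported for the Min-Sum decoder.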

Down sample
Finally, the output bits of the fully flexible LDPC decoder design are connected to a Xilinx SG down-sample block with a sampling rate of 320, which controls the rate of the output signal at the receiver. The output bits of the down-sample blocks are converted from parallel bits to a bit stream by passing through concat and P2S blocks, respectively, to obtain the information data, as seen in Figure 29. Once the Simulink models of the proposed system have been developed, the VHDL code can be generated automatically using the SG block.

SIMULATION RESULTS
The performance of the codes in the proposed system was measured by BER as a function of SNR. The simulation used the following parameters: number of ones in each column ($w_c$) = 3, number of iterations (iter) = 6, frame length (F) = 100, code rate = 1/2 and spreading factor ($2\beta$) = 16. The results in Figure 30 compare the communication system with and without LDPC coding.
These results make clear that hard-decision BF decoders perform worse than soft-decision decoders [25]. The results also show that Prob. Domain, Log Domain and Min-Sum perform closely, all giving approximately the same gain. Without coding, the system needs more than 13 dB SNR to approach a BER of $10^{-3}$. Using the soft-decision decoder algorithms (Prob. Domain, Log Domain and Min-Sum) improves the performance of the communication system at both low and high SNR. All results are summarized in Table 3 for a spreading factor of 16 at a BER of $10^{-3}$.

FPGA SYNTHESIS RESULTS
Table 4 shows the hardware resource utilization required for the Prob. Domain, Log Domain and Min-Sum algorithms, respectively. To measure the complexity of the overall system for each type of soft-decision decoder, Table 5 summarizes the resource utilization for each one. From these comparisons, the Min-Sum algorithm is the best decoding algorithm in terms of complexity and performance, owing to its BER performance and modest resource utilization. Figure 31 shows the Xilinx SG simulation result comparing the Bernoulli binary transmitted signal with the received signal; all decoder types recover the original signal with a delay of 960 ns. The hardware co-simulation results for the user data and the recovered data for the proposed algorithms are shown in Figure 32; the delay is due to the extraction process and to the latency of some Xilinx blocks.

CONCLUSION
In this work, a DCSK communication system with LDPC codes has been developed using the Prob. Domain, Log Domain and Min-Sum decoding algorithms and implemented on the Kintex-7 FPGA development kit using Xilinx SG tools. The simulation results show that Min-Sum is slightly outperformed by the other decoders, achieving an improvement gain of 8.8 dB, while Prob. Domain and Log Domain achieve approximately the same gain of about 9 dB, for a spreading factor of 16. On the FPGA, the Prob. Domain algorithm has the highest complexity in terms of resource utilization: it consumes 1% of the slice registers, 64% of the DSPs and 4% of the LUTs, while Log Domain consumes 3% of the slice registers, 27% of the DSPs and 41% of the LUTs. The Min-Sum algorithm consumes 1% of the slice registers, 23% of the DSPs and 2% of the LUTs, which makes it the best decoding algorithm given its BER performance and modest resource utilization. The hardware implementation confirms that the proposed system is suitable for future communications, especially in real-time applications.