Improving the data recovery for short length LT codes

ABSTRACT


INTRODUCTION
The word fountain describes the nature of these codes: a potentially endless stream of coded packets is sent over the channel without any knowledge of the channel conditions. With this way of coding, the code rate is not fixed in advance; it is determined while the received packets are processed. The first practical fountain codes are Luby transform (LT) codes [1] and Raptor codes [2]. These codes are designed for erasure channels, which match the nature of Internet communication systems [3]. Fountain-like codes can decode all source packets once a sufficient number of encoded packets has been received [1]. To encode an LT packet, a degree is first chosen at random from a known degree distribution. Then that many source packets are chosen uniformly at random and XORed to form the coded packet. This procedure continues until an acknowledgement is received declaring complete recovery of the information packets.
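The encoding steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the function name `lt_encode_packet`, the integer packet representation, and the uniform placeholder distribution are assumptions made only for the example.

```python
import random

def lt_encode_packet(source, degree_dist, rng):
    """Form one LT-coded packet from a list of integer source packets.

    degree_dist[d - 1] is the probability of degree d, for d = 1..k.
    Returns (indices, value): the chosen source indices and their XOR.
    """
    k = len(source)
    # Step 1: draw a degree from the known degree distribution.
    degree = rng.choices(range(1, k + 1), weights=degree_dist, k=1)[0]
    # Step 2: pick that many distinct source packets uniformly at random.
    indices = sorted(rng.sample(range(k), degree))
    # Step 3: XOR the chosen packets to form the coded packet.
    value = 0
    for i in indices:
        value ^= source[i]
    return indices, value

rng = random.Random(1)
source = [1, 0, 1, 1]                 # one-bit packets
uniform = [0.25, 0.25, 0.25, 0.25]    # placeholder distribution (assumption)
# A real sender would loop until an acknowledgement arrives; here we stop at 6.
packets = [lt_encode_packet(source, uniform, rng) for _ in range(6)]
```

Each returned pair carries the indices alongside the value, mirroring the fact that the receiver must know which data packets formed each coded packet.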
The universal form of the coded packets allows the receiver to collect any of them. With each coded packet, the receiver learns the degree and the indices of the data packets that form the coded packet. The means of sending this information are explained in [1]. Figure 1 illustrates the LT encoding and decoding operations with a bipartite graph representation. In this prototype code, the data length of , and code-word length of with a packet size of one bit per packet is represented. Figure 1-a is devoted to the encoding operation, whereas Figure 1-b through Figure 1-g illustrate the decoding operation. For decoding to succeed, the existence of a degree-one coded packet is vital: at every stage of the decoding operation such a packet has to be found. Once a degree-one coded packet is found, the information packet connected to it is recovered. Next, the recovered value updates the values of the neighboring coded packets. Finally, the connections of the recovered information packet are released. This procedure is repeated until all the information packets are recovered.
It follows from the decoding process that the generation of degrees is the core of successful data recovery. Many attempts have been made to modify the conventional scheme of degree generation. The first degree distributions proposed were the ideal soliton distribution (ISD) and the robust soliton distribution (RSD) [1]. The features of the RSD are suitable for bulk data files [1,2,4]. Esa Hyytiä et al. [5] used a sampling approach to determine an optimized degree distribution. They found that for large information sequences, i.e., thousands of packets, the RSD shows good performance. In [6] the authors investigated the optimization criteria of degree distributions for small file sizes. They concluded that in a well-designed distribution only a few parameters need to be controlled to ensure almost maximal performance. Chen Z and Zhou Q [7] suggested an LT code with a revised RSD in which degree generation is based on a Kent chaotic map and a pseudo-random number generator. Their idea is based on joining both probability functions, but with no attention to the parameter. The resulting degree generation produces more small degree values. C.M. Chen et al. [8] used the covariance matrix adaptation evolution strategy (CMA-ES) to tune the degree values for better performance. Their approach generates the degrees adaptively without changing the main parameters. In [9] the author presents another attempt to improve degree generation in order to obtain a better encoding-decoding framework. The distribution is forced to have an expected number of degree-one coded packets equal to a fraction of the data length. A state-of-the-art work on fountain codes was done in [10], where a memory-based rateless encoding method is proposed. In this data allocation method, the degree-one coded symbol is isolated and connected to the data packet with the highest number of connections.
The literature reviewed above shows the sustained interest in modifying the RSD toward an optimal degree distribution. As a new attempt, this paper introduces a deterministic manner for both the generation of the degrees and the choice of the data packets that form the coded packets. In the next section a review of degree generation approaches is presented. Section 3 is dedicated to our deterministic degree generation. The ability of this new degree generation to enhance the code performance is tested in section 4. The last section concludes the paper.

DEGREE GENERATION APPROACHES
LT code design mainly depends on the quality of the degree generation method. The design has two features: the degree distribution and the way data packets are selected. All conventional approaches to LT code design generate the degrees randomly and also select the data packets randomly. After Luby introduced the ideal soliton distribution and the robust soliton distribution in [1], many other studies [3,5,6,7,8,9] on the RSD were performed. All of these studies focused on improving the RSD, i.e., on improving the right-hand side distribution. In [10], by contrast, Khaled et al. introduced a memory-based rateless encoding operation that yields a non-uniform left-hand side distribution and shows better performance in terms of BER. In this paper, we propose a new approach to the rateless encoding operation. The proposed approach is based on a deterministic construction of the generator matrix with non-uniform left-hand and right-hand side distributions. Below we discuss the degree generation methods that are used as a basis of comparison for the proposed one.

Ideal soliton distribution
This distribution is ideal in theory but not practical, because at a certain point in the decoding process the degree-one coded symbol could be lost with high probability. It is defined as:
$$\rho(d) = \begin{cases} \dfrac{1}{k}, & d = 1 \\ \dfrac{1}{d(d-1)}, & d = 2, 3, \dots, k \end{cases} \qquad (1)$$
where $d$ is an integer representing the degree and $k$ is the length of the information message.
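As a quick numerical check of the ideal soliton distribution, the sketch below assumes the standard form from [1] (probability 1/k for degree one and 1/(d(d-1)) for degrees 2 to k). The function name `ideal_soliton` and the choice k = 32 are assumptions for illustration only.

```python
from fractions import Fraction

def ideal_soliton(k):
    """Return [rho(1), ..., rho(k)] as exact fractions."""
    rho = [Fraction(1, k)]                                      # degree one
    rho += [Fraction(1, d * (d - 1)) for d in range(2, k + 1)]  # degrees 2..k
    return rho

k = 32
rho = ideal_soliton(k)
total = sum(rho)                  # the probabilities sum to exactly one
expected_degree_one = k * rho[0]  # degree-one packets expected among k coded packets
```

With exact fractions the probabilities sum to one, and the expected number of degree-one packets among k coded packets is exactly one. That single expected packet is what makes the distribution fragile in practice: if it is erased or used up, decoding stalls.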

Robust soliton distribution
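In this modified degree distribution, the average number of degree-one coded packets is increased [1]. The robust soliton distribution uses $\rho(d)$ in (1) to construct its new distribution as:
$$\mu(d) = \frac{\rho(d) + \tau(d)}{\beta}, \qquad d = 1, 2, \dots, k \qquad (2)$$
where the normalization factor is $\beta = \sum_{d=1}^{k} \left( \rho(d) + \tau(d) \right)$ and $\tau(d)$ is given as:
$$\tau(d) = \begin{cases} \dfrac{R}{dk}, & d = 1, 2, \dots, \dfrac{k}{R} - 1 \\ \dfrac{R \ln(R/\delta)}{k}, & d = \dfrac{k}{R} \\ 0, & d = \dfrac{k}{R} + 1, \dots, k \end{cases} \qquad (3)$$
where $R = c \ln(k/\delta) \sqrt{k}$ is the average number of coded packets with degree one. The constant $c$ has a value between 0 and 1, while $\delta$ represents the probability of decoder failure.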
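The construction of the robust soliton distribution can be sketched numerically as follows, assuming the standard definition from [1]. The function name `robust_soliton`, the rounding of k/R to the nearest integer, and the parameter values c = 0.1 and delta = 0.5 are assumptions for illustration; they are not the values used in the simulations of this paper.

```python
import math

def robust_soliton(k, c, delta):
    """Return [mu(1), ..., mu(k)] for the robust soliton distribution."""
    R = c * math.log(k / delta) * math.sqrt(k)
    spike = int(round(k / R))  # assumption: k/R rounded to the nearest integer
    # Ideal soliton part; index 0 is unused padding so that rho[d] is degree d.
    rho = [0.0, 1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]
    # Extra mass on small degrees plus one spike at d = k/R.
    tau = [0.0] * (k + 1)
    for d in range(1, min(spike, k + 1)):
        tau[d] = R / (d * k)
    if 1 <= spike <= k:
        tau[spike] = R * math.log(R / delta) / k
    beta = sum(rho) + sum(tau)  # normalization factor
    return [(rho[d] + tau[d]) / beta for d in range(1, k + 1)]

mu = robust_soliton(k=32, c=0.1, delta=0.5)  # hypothetical parameters
```

For these parameters the probability of degree one, `mu[0]`, is larger than the ideal soliton value 1/32, which is the intended effect of the modification. The sketch does not guard against parameter choices where R/delta falls below one.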

Non-uniform data selection (NUDS)
Almost all previous research worked on improving the right-hand side distribution. In [10], however, a memory-based encoding approach was proposed that achieves better performance by tracking the degrees of the data packets in memory. The algorithm proposed in [10] is outlined below.
Algorithm 1 (NUDS)
1: Repeat
2:   Generate a degree using RSD.
3:   if the degree is one
4:     Select the data symbol with the largest degree, i.e., the data symbol with the largest number of connections.
5:   else
6:     Select data packets uniformly.
7:     XOR the selected data symbols to form the code symbol.
8:   end if
9: Until (acknowledgement signal is received from the destination).
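A minimal Python sketch of the memory-based selection idea is given below. It is not the implementation of [10]: the function name `nuds_encode`, the `draw_degree` callable, and the fixed number of coded packets (in place of waiting for an acknowledgement) are assumptions, and the degree-one test follows the description of [10] given in the introduction.

```python
import random

def nuds_encode(source, draw_degree, n_coded, rng):
    """Encode n_coded packets while tracking how often each data symbol is used."""
    k = len(source)
    connections = [0] * k  # memory: current degree of every data symbol
    coded = []
    for _ in range(n_coded):
        d = draw_degree(rng)  # in [10] the degree is drawn from the RSD
        if d == 1:
            # Assumption: a degree-one packet takes the most connected symbol
            # (ties resolve to the lowest index).
            chosen = [max(range(k), key=lambda i: connections[i])]
        else:
            chosen = rng.sample(range(k), d)  # uniform selection
        value = 0
        for i in chosen:
            value ^= source[i]
            connections[i] += 1
        coded.append((sorted(chosen), value))
    return coded, connections

degrees = iter([2, 1])  # fixed degree sequence, for the demonstration only
coded, connections = nuds_encode([1, 0, 1, 1], lambda r: next(degrees), 2,
                                 random.Random(0))
```

In the demonstration the second packet has degree one, so it is attached to a data symbol that the first packet already used.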

DETERMINISTIC DEGREE GENERATION (DDG)
RSD is considered an optimal degree distribution for large file sizes, but it may suffer from certain drawbacks when dealing with short data files [11]. The reason is the amount of extra coded packets that have to be collected to ensure successful decoding. This overhead for LT codes has been approximated to be , while it can grow for small message sizes to reach , [12]. Several attempts have been made to improve communication systems using channel coding [13][14][15][16][17][18]. Motivated by the pattern recognition (PR) decoding treatment [19][20][21][22][23][24][25], which resumes decoding when the decoder halts, and by the NUDS work introduced in [10], we propose a deterministic degree generation (DDG) encoding approach for LT-like codes. The proposed approach is designed especially for short frame lengths. In our deterministic degree generation, in case of having data packets, the degrees are generated following this formula: where is the repetition period value chosen to be between , m is an integer taking the values and is the size of the code length. For the information sequence consisting of symbols, i.e., for , with ( ) the coded symbols are formed using: It is clear from (5) that the generator matrix is ; however, the extra coded packets needed are generated using (4), repeating the same allocation of the data symbols illustrated in (5). To give this encoding scheme more immunity under bad channel conditions, and to avoid repeatedly losing the same data packet, a time-hopping encoding scheme is used. In this scheme, the encoding matrix alternates between two shapes: one is the original deterministic matrix and the other is a random shuffle of it, as shown in Figure 2.
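The alternation between the deterministic matrix and its shuffled version can be illustrated with the sketch below. Several points are assumptions, since the text does not fix them: the matrix `G` is an arbitrary placeholder and not the DDG generator matrix, the alternation happens once per block of coded packets, and the shuffle is taken to be a random permutation of the columns (the data-symbol positions).

```python
import random

def time_hopping_blocks(G, n_blocks, rng):
    """Yield encoding matrices, alternating between G and a column shuffle of G."""
    k = len(G[0])
    for b in range(n_blocks):
        if b % 2 == 0:
            yield G  # original deterministic matrix
        else:
            perm = list(range(k))
            rng.shuffle(perm)  # new random permutation for every shuffled block
            yield [[row[p] for p in perm] for row in G]

# Placeholder 3x4 matrix (rows = coded packets, columns = data symbols).
G = [[1, 0, 0, 0],
     [1, 1, 0, 0],
     [0, 1, 1, 1]]
blocks = list(time_hopping_blocks(G, 4, random.Random(0)))
```

Under the column-permutation assumption every coded packet keeps its degree, while the data symbols it covers change from block to block, so an erasure pattern that repeats across blocks is less likely to hit the same data packet each time.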

DECODING IN THE ABSENCE OF DEGREE-ONE CODE PACKET
LT-like codes suffer from the absence of degree-one coded packets during the decoding operation, especially for short messages. If no degree-one coded packet is found at some stage of the decoding operation, decoding failure is declared. Our work in [19] succeeded in breaking this decoder failure: even when there is no degree-one coded symbol, decoding can continue by searching for certain configuration patterns inside the matrix of remaining coded symbols. For this pattern recognition (PR) approach, assume that there is no degree-one coded symbol at some stage of the decoding operation but there are two code-words and which are formed as: (6) where it is obvious that the degree of the code-words can be reduced to '1' using: (7) which result is . Motivated by this approach, we added further search steps that try to reduce the degree of coded symbols that are contained in each other. Algorithm 3 illustrates this approach [19]:
Algorithm 3 (Decoding Stuck Removal Approach)
1: for iteration number = 1 : last   % last may be any number (2, 3, 4)
2:   Repeat
3:
4:
5:     for any other code-words other than
6:       if (deg ( ) equal 1), then
7:         ; flag
8:         break; resume BP decoding
9:       end if
10:    end-for
11:    , return to step 4
12:    if flag success repeat steps (4-10) to decrease the degrees and replace step 6 to (deg( ) < deg ( ))
13:    end if
14:  until (All degrees are tried.)
15: end-for
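The idea of resuming a stalled decoder by reducing coded symbols that are contained in each other can be sketched as follows. This is a simplified illustration and not Algorithm 3 itself: the function name `bp_decode`, the representation of a coded symbol as an (index set, value) pair, and the order in which pairs are searched are all assumptions.

```python
def bp_decode(coded, k, use_reduction=True):
    """Peeling (BP) decoder over an erasure channel with an optional stuck-removal step.

    coded: list of (indices, value) pairs. Returns a list of k recovered
    values, with None for data symbols that could not be recovered.
    """
    eqs = [(set(idx), val) for idx, val in coded]
    recovered = [None] * k
    progress = True
    while progress:
        progress = False
        # Update: remove already recovered data symbols from every coded symbol.
        for n, (idx, val) in enumerate(eqs):
            for i in list(idx):
                if recovered[i] is not None:
                    idx.discard(i)
                    val ^= recovered[i]
            eqs[n] = (idx, val)
        # Release: every degree-one coded symbol reveals one data symbol.
        for idx, val in eqs:
            if len(idx) == 1:
                (i,) = idx
                if recovered[i] is None:
                    recovered[i] = val
                    progress = True
        if progress or not use_reduction:
            continue
        # Stuck: if one coded symbol is contained in another, XOR them to
        # lower the degree of the larger one.
        for a in range(len(eqs)):
            for b in range(len(eqs)):
                ia, va = eqs[a]
                ib, vb = eqs[b]
                if ia and ia < ib:  # ia is a proper subset of ib
                    eqs[b] = (ib - ia, vb ^ va)
                    progress = True
    return recovered

# Data symbols (1, 0, 1); none of the three coded symbols has degree one.
coded = [({0, 1}, 1), ({0, 1, 2}, 0), ({1, 2}, 1)]
plain = bp_decode(coded, 3, use_reduction=False)
helped = bp_decode(coded, 3, use_reduction=True)
```

In this small example plain BP decoding recovers nothing because no degree-one coded symbol exists, whereas the reduction step turns the second coded symbol into a degree-one symbol and all three data symbols are then recovered.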

RESULTS AND ANALYSIS
The performance of the LT code under the BP algorithm is compared over an erasure channel for three types of degree generation: DDG, RSD and NUDS. For the RSD the parameters are chosen as , . The message length is chosen as 32 packets (the packet can have any length; in our simulation it is represented by 1 bit). For the binary erasure channel, erasure probability is employed. Computer simulations are run such that each rate point on the graph ( is checked until 100 erroneous frames are received. Figure 3 shows that the proposed method outperforms LT-RSD and NUDS in terms of the number of unrecovered data packets for all code rates.
With the proposed decoding stuck removal algorithm (BP-PR), the performance of the LT code improves for all distribution types under study, as shown in Figure 4. Figure 4 also shows that the improvement for LT-RSD and NUDS is larger than that for DDG. However, DDG still achieves the best performance. Another performance comparison is given in Figure 5, where success rate is defined as: Figure 5 shows that the DDG approach recovers all the information frames at a rate of , while the other distributions have approximately the same performance as each other and need to collect more overhead to match our DDG.

CONCLUSION
In this paper we proposed a time-hopping deterministic degree generation encoding approach for forming LT-like codes, especially for small data files. Simulation results show that the proposed approach is better than the classical LT-RSD and NUDS in terms of the number of unrecovered packets and the amount of overhead required for full recovery of the data packets. The flexibility of the proposed approach helps the design overcome the problem of the extra overhead needed by short length LT codes. For the decoding operation, the BP algorithm is employed, supported by the pattern recognition enhancement algorithm (BP-PR). It has been shown that BP-PR brings only a small improvement to the proposed DDG-LT code, compared to the clear improvement it brings to the classical RSD and NUDS LT codes.