An efficient hardware logarithm generator with modified quasi-symmetrical approach for digital signal processing

This paper presents a low-error, low-area FPGA-based hardware logarithm generator for digital signal processing (DSP) systems that require high-speed, real-time logarithm operations. The proposed logarithm generator employs a modified quasi-symmetrical approach for efficient hardware implementation. The error analysis and implementation results are also presented and discussed. The results show that the proposed approach reduces the approximation error and hardware area compared with traditional methods.


BINARY LOGARITHM HARDWARE APPROXIMATION
Firstly, without loss of generality, we consider an unsigned number N whose binary (base-2) logarithm is to be computed. N can be decomposed as

$$N = 2^{n}(1 + x) \tag{1}$$

where n is determined by detecting the position of the most significant '1' bit in the binary representation of N, and x is the fraction value, whose range is $0 \le x < 1$. The binary logarithm can then be rewritten as

$$\log_2 N = n + \log_2(1 + x) \tag{2}$$

Using (2), the binary logarithm of N is computed in two steps: first, the most significant '1' bit of N is detected to obtain the integer part n; then, the function log2(1 + x), which gives the fractional part of the result, is approximated. Here, log2(1 + x) is called the fundamental function, and much research has focused on efficient methods for approximating it.
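The decomposition in (1) and (2) can be sketched in a few lines of Python. This is a minimal software model, assuming N is a positive Python integer; the helper names `decompose` and `log2_from_parts` are hypothetical, and `int.bit_length` plays the role of the leading-one detector.

```python
import math


def decompose(N: int) -> tuple[int, float]:
    """Split N > 0 into (n, x) such that N = 2**n * (1 + x) with 0 <= x < 1."""
    if N <= 0:
        raise ValueError("N must be a positive integer")
    n = N.bit_length() - 1               # position of the most significant '1' bit
    x = (N - (1 << n)) / (1 << n)        # bits below the leading one, read as a fraction
    return n, x


def log2_from_parts(N: int) -> float:
    """Equation (2): integer part n plus the fundamental function log2(1 + x)."""
    n, x = decompose(N)
    return n + math.log2(1 + x)


print(decompose(12))          # 12 = 2**3 * (1 + 0.5)
print(log2_from_parts(12))
```

In hardware, only the fractional part log2(1 + x) needs an approximation; n comes directly from the bit position.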
The simplest method for approximating the fundamental function was proposed by J. N. Mitchell [14], using the linear approximation

$$\log_2(1 + x) \approx x$$

This approach leads to a very fast, low-complexity hardware implementation, at the cost of the absolute error function

$$E_L(x) = \log_2(1 + x) - x$$

whose maximum value, about 0.08607 at x = 1/ln 2 - 1 ≈ 0.4427, corresponds to an accuracy of only about 3.54 bits, which is too low for most DSP applications. Therefore, many error-correction techniques have been developed for Mitchell's method. Three methods are commonly used to improve the accuracy of this approximation: the LUT-based method, the piecewise linear interpolation method, and a combination of the two. In the LUT-based method, a LUT (look-up table) that stores an approximation of the residual error is added to Mitchell's approximation to reduce the Mitchell error. However, because the maximum of Mitchell's error function is large, this method requires a very large table. Another approach is the multipartite method presented in [15], in which tables and adders are used to reduce the table size significantly compared with the direct LUT-based method. S. Paul et al. [16] proposed a method using a LUT and multiplier-less linear interpolation, which requires less memory than some other LUT-based methods for the same accuracy.
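Mitchell's error E_L(x) = log2(1 + x) - x has a single maximum on [0, 1], where its derivative 1/((1 + x) ln 2) - 1 vanishes, i.e. at x = 1/ln 2 - 1 ≈ 0.4427. The sketch below evaluates this closed form (about 0.0861, roughly 3.5 bits of accuracy) and cross-checks it with a dense sweep; the helper name `mitchell_error` is our own.

```python
import math


def mitchell_error(x: float) -> float:
    """E_L(x) = log2(1 + x) - x, the error of Mitchell's approximation log2(1 + x) ~ x."""
    return math.log2(1 + x) - x


# dE_L/dx = 1 / ((1 + x) ln 2) - 1 vanishes at x = 1/ln 2 - 1, where E_L is largest
x_peak = 1 / math.log(2) - 1
max_err = mitchell_error(x_peak)
bits = -math.log2(max_err)               # accuracy expressed in bits

# Cross-check the closed form with a dense sweep over [0, 1]
sweep_max = max(mitchell_error(k / 100_000) for k in range(100_001))

print(f"peak at x = {x_peak:.4f}: max error = {max_err:.5f} ({bits:.2f} bits)")
```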
In the piecewise linear approximation methods, the range of x is divided into several regions. In each region, $E_L$ is approximated by a linear function, called a segment, which can be expressed as

$$E_L(x) \approx a_i x + b_i$$

Increasing the number of segments reduces the approximation error but leads to higher hardware complexity. Several methods for dividing the range of x into regions were proposed in [1], [17]-[24]. Papers [17]-[22] presented methods with 2, 4 and 6 regions and different values of the slopes $a_i$ and constants $b_i$, which were chosen by trial and error rather than by a detailed optimization method. Figure 1 shows the error function and the linear approximation method using 4 segments and a small error LUT proposed in [22]. In [23], [24], the authors proposed the quasi-symmetrical method to reduce the hardware complexity and approximation error. In [1], B.-G. Nam et al. proposed a logarithmic approximation with 24 segments. However, these methods need further improvement for high-accuracy applications. A method that combines piecewise linear approximation with LUT-based correction may be the most effective technique for logarithm approximation [21]-[24]. The basic idea is that, after the linear approximation, a LUT storing an approximation of the error between the fundamental function log2(1 + x) and the approximation function is added to the linear approximation, as described in Figure 2. Moreover, R. Gutierrez et al. [22] proposed a 4-region linear approximation with a 128×5-bit residual error LUT, which outperforms previous methods. However, the coefficients used in this method were selected by trial and error and may not be optimal.
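To illustrate why more segments reduce the error, the sketch below corrects Mitchell's approximation with one straight line per equal-width region. The coefficients are simply chords of $E_L$ through each region's endpoints, an assumption made for illustration; they are not the trial-and-error or optimized coefficients of [17]-[24]. The residual is measured on a uniform grid of x.

```python
import math


def e_l(x: float) -> float:
    """Mitchell error E_L(x) = log2(1 + x) - x."""
    return math.log2(1 + x) - x


def chord_segments(num_regions: int) -> list[tuple[float, float]]:
    """(a_i, b_i) per equal-width region: the chord of E_L through the region's endpoints."""
    segs = []
    for i in range(num_regions):
        x0, x1 = i / num_regions, (i + 1) / num_regions
        a = (e_l(x1) - e_l(x0)) / (x1 - x0)
        segs.append((a, e_l(x0) - a * x0))
    return segs


def max_residual(num_regions: int, samples: int = 4096) -> float:
    """Max |log2(1+x) - (x + a_i*x + b_i)| over x = k/samples, k = 0..samples-1."""
    segs = chord_segments(num_regions)
    worst = 0.0
    for k in range(samples):
        x = k / samples
        a, b = segs[min(int(x * num_regions), num_regions - 1)]  # segment for x's region
        approx = x + (a * x + b)             # Mitchell term plus the segment's correction
        worst = max(worst, abs(math.log2(1 + x) - approx))
    return worst


for r in (1, 2, 4, 8):
    print(f"{r} region(s): max residual = {max_residual(r):.6f}")
```

With a single region the chord is zero (E_L vanishes at both ends), so the residual equals Mitchell's error; each doubling of the region count shrinks it further, while the selection logic grows.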
Therefore, the objective of this research is to improve the quasi-symmetrical method in [23] by using a modified optimization algorithm to find optimal coefficients for the piecewise linear approximation.

PROPOSED APPROXIMATION METHOD AND IMPLEMENTATION RESULTS
Firstly, consider the fundamental function F(x) = log2(1 + x), which is plotted in Figure 3. Its graph is only slightly curved and is nearly a straight line. Therefore, it is promising to apply the piecewise linear approximation directly to this function instead of to $E_L$. The full range of x is divided into 4 regions so that a simple selection circuit can be used while keeping an acceptable accuracy. The approximation can be expressed as

$$\log_2(1 + x) \approx a_i x + b_i, \quad i \in \{1, 2, 3, 4\}$$

where i is the index of the region containing x. The slopes $a_i$ are chosen to be sums of power-of-two values ($2^k$), so that the multiplications can be implemented by simple shift operations. The error function caused by this approximation can then be expressed as

$$E(x) = \log_2(1 + x) - (a_i x + b_i)$$

A LUT stores the optimized values of the error function E(x), and the LUT output is added to the 4-segment linear approximation to further reduce the residual error. A larger LUT yields higher approximation accuracy but also increases the hardware complexity of the approximation circuit.
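The combination of a 4-segment approximation and a residual-error LUT can be modeled in floating point as below. The slopes are hypothetical sums of powers of two, and the offsets simply make each segment meet log2(1 + x) at its region's left edge; they are not the optimized values from this paper. Each LUT entry stores the midrange of E(x) over one equal-width cell of x (an assumed filling rule), so the sketch also shows the LUT size versus accuracy trade-off.

```python
import math
from typing import Optional

N_SAMPLES = 1 << 13      # assumed 13-bit grid for x in [0, 1)

# Hypothetical coefficients (NOT the paper's): slopes are sums of powers of two, and each
# offset makes its segment meet log2(1 + x) at the left edge of its quarter region.
SLOPES = [1 + 2**-2 + 2**-5, 1 + 2**-5 + 2**-6, 2**-1 + 2**-2 + 2**-3, 2**-1 + 2**-2 + 2**-6]
OFFSETS = [math.log2(1 + i / 4) - SLOPES[i] * i / 4 for i in range(4)]


def linear_approx(x: float) -> float:
    """4-segment approximation a_i * x + b_i of log2(1 + x)."""
    i = min(int(x * 4), 3)               # region index, i.e. the two MSBs of x
    return SLOPES[i] * x + OFFSETS[i]


# Residual error E(x) on the sample grid
ERRS = [math.log2(1 + k / N_SAMPLES) - linear_approx(k / N_SAMPLES) for k in range(N_SAMPLES)]


def build_lut(addr_bits: int) -> list[float]:
    """One entry per equal-width cell of x: the midrange of E(x) over that cell."""
    per_cell = N_SAMPLES >> addr_bits
    cells = (ERRS[j * per_cell:(j + 1) * per_cell] for j in range(1 << addr_bits))
    return [(max(c) + min(c)) / 2 for c in cells]


def max_error(addr_bits: Optional[int] = None) -> float:
    """Max |E(x) - LUT(x)|; with addr_bits=None no correction LUT is used."""
    if addr_bits is None:
        return max(abs(e) for e in ERRS)
    lut, per_cell = build_lut(addr_bits), N_SAMPLES >> addr_bits
    return max(abs(e - lut[k // per_cell]) for k, e in enumerate(ERRS))


for bits in (None, 3, 5, 7):
    label = "no LUT" if bits is None else f"{1 << bits}-entry LUT"
    print(f"{label}: MaxError = {max_error(bits):.6f}")
```

With these illustrative coefficients the worst-case error falls steadily as the LUT address width grows, while the table doubles for every extra address bit.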
In this paper, to reduce the final approximation error with a sufficiently small LUT, we use an algorithm to find optimal values of $a_i$ and $b_i$. Both the complexity of the approximation function and the size of the correction LUT must be considered. Therefore, we propose an improved 2-step optimization algorithm, based on the one in [23], to achieve a better trade-off between the approximation circuit complexity and the correction LUT size. The algorithm finds the optimal values of $a_i$ and $b_i$ for the 4 linear segments and reduces the LUT size as much as possible by minimizing the maximum value of the absolute error function |E(x)| (MaxError). The optimization is performed in MATLAB.
In the proposed algorithm, the range of x is first divided into two halves, and each half is processed independently. The left half (0 ≤ x ≤ 0.5) is divided into two equal regions, (0 ≤ x ≤ 0.25) and (0.25 ≤ x ≤ 0.5). Figure 4 describes the optimization algorithm for the left half, in which 2 linear segments are chosen independently. In step 1, we choose the ranges of offset1 and offset2, which represent the values of the approximation function at x = 0 and x = 0.5, respectively. These ranges are chosen to ensure an acceptable accuracy of the approximation. Then, an exhaustive search over the ranges of offset1 and offset2 is performed to find the optimal values of $a_1$ and $a_2$ that minimize MaxError. After that, in step 2, $a_1$ and $a_2$ are re-assigned to the adjacent values that are sums of power-of-two values, to simplify the multiplications, and one more search is performed to find the optimal values of $b_1$ and $b_2$ that minimize MaxError. The optimization for the right half (0.5 ≤ x < 1) is performed similarly; Figure 5 depicts the 2-step algorithm for the right half. Table 1 summarizes the optimization results of the proposed algorithm at each step for log2(1 + x). After step 2, MaxError increases slightly, but the LUT size does not change compared with the result of step 1. Hence, the approximation function can be expressed as (8).
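A simplified version of the two-step search can be sketched as follows. Unlike the algorithm described above, it searches each quarter region directly over real-valued slopes instead of through offset1 and offset2, and it relies on one standard fact: for a fixed slope a, the offset that minimizes the maximum absolute error is the midrange of log2(1 + x) - a·x over the region. Step 2 then moves the slope to the adjacent sums of at most three power-of-two terms (an assumed limit) and re-optimizes the offset. All helper names, grid resolutions and search ranges are hypothetical choices.

```python
import itertools
import math

REGIONS = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]


def region_samples(lo: float, hi: float, n: int = 256) -> tuple[list[float], list[float]]:
    """Sample x uniformly on [lo, hi] together with F(x) = log2(1 + x)."""
    xs = [lo + (hi - lo) * k / n for k in range(n + 1)]
    return xs, [math.log2(1 + x) for x in xs]


def best_offset(a: float, xs: list[float], fs: list[float]) -> tuple[float, float]:
    """For a fixed slope a, the minimax offset b is the midrange of F(x) - a*x."""
    d = [f - a * x for x, f in zip(xs, fs)]
    return (max(d) + min(d)) / 2, (max(d) - min(d)) / 2    # (b, MaxError)


def step1(xs, fs, a_lo=0.5, a_hi=1.5, a_step=2**-10):
    """Exhaustive search over real-valued slopes; returns (a, b, MaxError)."""
    best = None
    for k in range(round((a_hi - a_lo) / a_step) + 1):
        a = a_lo + k * a_step
        b, err = best_offset(a, xs, fs)
        if best is None or err < best[2]:
            best = (a, b, err)
    return best


def pow2_sums(max_terms: int = 3, max_shift: int = 8) -> list[float]:
    """All sums of 1..max_terms distinct terms 2**-k with 0 <= k <= max_shift."""
    vals = {sum(2.0**-k for k in ks)
            for r in range(1, max_terms + 1)
            for ks in itertools.combinations(range(max_shift + 1), r)}
    return sorted(vals)


def step2(a_real, xs, fs, candidates):
    """Move a to the adjacent power-of-two sums below/above it and re-optimize b."""
    below = max((c for c in candidates if c <= a_real), default=None)
    above = min((c for c in candidates if c >= a_real), default=None)
    options = [(c, *best_offset(c, xs, fs)) for c in (below, above) if c is not None]
    return min(options, key=lambda t: t[2])


CANDIDATES = pow2_sums()
results = []
for lo, hi in REGIONS:
    xs, fs = region_samples(lo, hi)
    a1, b1, e1 = step1(xs, fs)
    a2, b2, e2 = step2(a1, xs, fs, CANDIDATES)
    results.append((a1, e1, a2, b2, e2))
    print(f"[{lo}, {hi}): step 1 a={a1:.4f} err={e1:.5f} | step 2 a={a2} b={b2:.5f} err={e2:.5f}")
```

In this sketch every candidate slope used in step 2 also lies on the step-1 grid, so MaxError can only stay the same or grow after step 2, which mirrors the small increase after step 2 noted above.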
Step 2 of the optimization flowchart: re-assign the optimal slope1 and slope2 values from step 1 to the adjacent power-of-two values and find the optimal offset values.

Table 1. Optimization results of the improved 2-step optimization algorithm

Table 2 shows the results of the error analysis of the proposed method compared with other 4-segment linear approximation methods. As mentioned previously, MaxError is the maximum value of the absolute error function |E(x)|. MaxError(+) and MaxError(-) denote the maximum positive value and the minimum (most negative) value of the error function E(x), respectively. The mean error is the mean of |E(x)|. It can be seen that the proposed method achieves competitive results compared with the other methods. Moreover, Figure 6 shows the approximation error of the proposed method for two cases, without and with the 128×5-bit correction LUT, and the corresponding error results are summarized in Table 3. Table 3 shows that using the 128×5-bit LUT reduces the errors significantly compared with the case without a LUT.
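The four statistics used in the error comparison can be computed as below. The sign convention E(x) = log2(1 + x) - approx(x) and the 13-bit input grid are assumptions, and `error_stats` is a hypothetical helper. It is shown with Mitchell's approximation as a baseline, for which MaxError(-) is 0 because log2(1 + x) ≥ x on [0, 1).

```python
import math
from typing import Callable


def error_stats(approx: Callable[[float], float], frac_bits: int = 13) -> dict[str, float]:
    """Error statistics of approx(x) against log2(1 + x) for x = k / 2**frac_bits in [0, 1)."""
    n = 1 << frac_bits
    # Assumed sign convention: E(x) = log2(1 + x) - approx(x)
    errs = [math.log2(1 + k / n) - approx(k / n) for k in range(n)]
    return {
        "MaxError": max(abs(e) for e in errs),       # max |E(x)|
        "MaxError(+)": max(errs),                    # largest positive E(x)
        "MaxError(-)": min(errs),                    # most negative E(x)
        "MeanError": sum(abs(e) for e in errs) / n,  # mean |E(x)|
    }


stats = error_stats(lambda x: x)   # Mitchell's approximation as a baseline
for name, value in stats.items():
    print(f"{name}: {value:.5f}")
```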

Method | In [23] | In [21] | Proposed

The proposed hardware architecture of the logarithm generator for a 16-bit integer input N, producing an output with a 4-bit integer part and a 13-bit fraction part, is shown in Figure 7. The LODE (leading-one detector and encoder) block determines n from the input N and encodes it in binary form. The INV (inverter) block and a modified barrel shifter generate the fraction part x, as given in (1). Meanwhile, log2(1 + x) is approximated by the 4-segment linear approach described above. The two most significant bits of x are used as selection bits to choose one of the four regions of the piecewise linear approximation. The shifters perform right shifts of x, and 3 multiplexers select the terms of the slope $a_i$. The coefficients $b_i$ are stored in the Coef. LUT, and the Error LUT stores the residual error. A larger LUT would give better accuracy; however, to achieve a good trade-off between hardware complexity and accuracy, a 128×5-bit LUT is used. Finally, an adder sums these 5 components to produce the fraction value (F) of the binary logarithm result. For control purposes, a flag (z) indicates the special case of a zero input.
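A bit-level software model of the datapath in Figure 7 helps clarify how the blocks interact. It is a sketch under several assumptions: the INV block is not modeled (the fraction is obtained with a mask instead), the 15 bits below the leading one are truncated to 13 bits, the Error LUT is addressed by the top 7 bits of x, the slope terms and Coef. LUT values are hypothetical, and LUT entries are not limited to 5 bits. Sums of right shifts replace the multiplications by $a_i$, and the two MSBs of x select the region.

```python
import math

IN_BITS, FRAC = 16, 13     # 16-bit input N; 13-bit fraction output (from the text)
LUT_BITS = 7               # 128-entry Error LUT, assumed addressed by the top 7 bits of x
ONE = 1 << FRAC            # fixed-point 1.0
# Hypothetical slope terms: a_i * x == sum(x >> k for k in SLOPE_SHIFTS[i])
SLOPE_SHIFTS = [(0, 2, 5), (0, 5, 6), (1, 2, 3), (1, 2, 6)]


def lode(N: int) -> tuple[int, int]:
    """Leading-one detector/encoder: returns (n, z) where z flags a zero input."""
    return (N.bit_length() - 1, 0) if N else (0, 1)


def normalize(N: int, n: int) -> int:
    """Shift the leading one of N to bit 15, drop it, and keep FRAC bits of x."""
    x15 = (N << (IN_BITS - 1 - n)) & ((1 << (IN_BITS - 1)) - 1)
    return x15 >> (IN_BITS - 1 - FRAC)       # truncate 15 fraction bits to 13


def slope_terms(i: int, x: int) -> int:
    return sum(x >> k for k in SLOPE_SHIFTS[i])   # multiplier-less a_i * x


# Coef. LUT: hypothetical b_i so each segment meets log2(1 + x) at its region's left edge
COEF_LUT = [round((math.log2(1 + i / 4) - sum(2.0**-k for k in SLOPE_SHIFTS[i]) * i / 4) * ONE)
            for i in range(4)]


def build_error_lut() -> list[int]:
    """Midrange of the residual (in LSBs) over each 2**(FRAC - LUT_BITS)-wide cell of x."""
    step, lut = 1 << (FRAC - LUT_BITS), []
    for j in range(1 << LUT_BITS):
        res = [math.log2(1 + x / ONE) * ONE
               - (slope_terms(x >> (FRAC - 2), x) + COEF_LUT[x >> (FRAC - 2)])
               for x in range(j * step, (j + 1) * step)]
        lut.append(round((max(res) + min(res)) / 2))
    return lut


ERR_LUT = build_error_lut()


def log2_hw(N: int) -> tuple[int, int, int]:
    """Model of the generator: returns (n, F, z) with log2(N) ~ n + F / 2**FRAC."""
    n, z = lode(N)
    if z:
        return 0, 0, 1                        # special case: zero input
    x = normalize(N, n)
    i = x >> (FRAC - 2)                       # two MSBs of x select the region
    F = slope_terms(i, x) + COEF_LUT[i] + ERR_LUT[x >> (FRAC - LUT_BITS)]
    return n, min(max(F, 0), ONE - 1), 0      # saturate F to 13 bits (a sketch choice)


n, F, z = log2_hw(1000)
print(n, F, z, n + F / ONE, math.log2(1000))
```

Saturating F keeps the fraction within 13 bits near x = 1; a real design might instead carry into n, which this sketch does not attempt.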
The proposed 16-bit logarithm generator was modeled in VHDL and implemented on a Xilinx Spartan-3E FPGA device. Table 4 shows the area of the FPGA implementation, measured as the number of FPGA LUTs used. It can be seen that the proposed method yields significant improvements in both area and computation delay.

CONCLUSION
This paper presented an improved, modified quasi-symmetrical approach for implementing a low-error, low-area hardware logarithm generator for DSP systems that require high-speed, real-time logarithm operations. The error analysis and FPGA implementation results show that the proposed logarithm generator is suitable for emerging DSP systems. In particular, the proposed approximation method reduces the approximation error and hardware complexity compared with other methods. In future work, we will apply the proposed method to the implementation of a complete speech processing system for real-time applications.