FPGA Realizations of Walsh Transforms for Different Transform and Word Lengths into Xilinx and Altera Chips

ABSTRACT


INTRODUCTION
Discrete Fourier transform (DFT) techniques for analyzing periodic digital signals already exist. However, the method is complicated, which causes many problems during hardware implementation, and its use is justified only in complex systems. Walsh transforms (WT), based on Walsh functions, may also be used to analyze a signal in the frequency domain in particular cases. It has been shown that a periodic digital signal may also be represented as a series of Walsh functions. An attempt has been made to use this concept to form a spectrum of digital signals.
Fino et al. initially proposed how to realize Walsh transforms based on an addition and subtraction technique [1]. This idea attracted many scientists to develop hardware implementations of Walsh transforms. However, the method has the disadvantage that it requires addition and subtraction of samples at the word level. Later, a bit-level systolic array method was developed to increase the speed of Walsh transforms [2]. Nayak et al. then proposed a fully pipelined two-dimensional (2D) bit-level systolic architecture for a more efficient realization [3]. Amira et al. proposed a new way of realizing Walsh transforms based on Hadamard matrices, called the Fast Hadamard Transform (FHT) [4]. More intensive work has been carried out during the last two decades. For instance, a method for generating Walsh functions in four different orderings has been introduced [5]. Later, Chandrasekaran et al. proposed a power analysis of Walsh transforms [6]. Then, an efficient architecture for Walsh transforms was developed in 2008 by Meher et al. [7]. Many other designs have been published since.
The application of Walsh transforms to the addition and multiplication of two digital signals was proposed earlier [8], [9], and more intensive research has been published since. The majority of researchers focus their work on developing the Walsh transforms only. However, although less common, techniques for inverse Walsh transforms have also been elaborated [10]. A hardware implementation was also conducted recently to prove the addition concept using Walsh transforms and inverse Walsh transforms [11]. The primitive Spartan 3 was used in the implementation, and the results were captured using a logic analyzer at 20 MHz.
Alternatively, scientists have also developed Fourier transform algorithms by combining them with the Walsh transforms [12]-[14]. This concept is based on the simple calculation of Walsh transforms, which seems to have been ignored in previous works. In this algorithm, Walsh transforms were adopted through a factorization of the intermediate transforms T for calculating the DFT coefficients [12]. Monir et al. then proposed an effective combination of the DFT and Walsh computations. The technique is used to perform what is called the Fast Walsh Hadamard Transform (FWHT), and it was achieved by utilizing the Radix-4 method [13]. Next, an efficient algorithm for computing both the Walsh transforms and the DFT using the well-known Radix-2 method was also proposed [14].
The analysis and synthesis of periodic digital signals after obtaining a spectrum have therefore been demonstrated. Multiple signals are also conveniently generated. Further, the manipulation and processing of multiple signals from their digital spectrum have been shown [15], [16]. Therefore, there is a need to explore Walsh transform realizations further. This paper presents several previous works on Walsh transform realizations and some new results for a complete and comprehensive design. The realizations of Walsh transforms are targeted to state-of-the-art FPGAs from Xilinx and Altera, and a comparison of the FPGA realizations on Xilinx and Altera chips is presented. The design is carried out by exploring the properties of Walsh transforms based on products of Rademacher functions. This paper presents the complete realization of Walsh transforms for arbitrary waveform generation (AWG), signal addition/subtraction, multiplication of two signals, and processing of more than two signals. In the next section, some fundamental theories of Walsh transforms and Walsh functions are presented. Section 3 gives a short and precise design of how Walsh transforms are used for the realizations. The implementation of the design on FPGAs is covered in section 4, where the speed and static power dissipation of the various results are discussed and compared. Finally, some conclusions regarding the results are given at the end of this paper.

DESIGN OF WALSH TRANSFORMS FOR FPGA REALIZATION
As described in the introduction, the Walsh transforms may be realized directly or implemented in terms of products of Rademacher functions. The design of the Walsh transform applications here is based on the second method, since it is more convenient for hardware.

Design of WT and IWT
Walsh transforms are conceived here in terms of products of Rademacher functions. Figure 1(a) shows the previously proposed WT design for transform length N [10]. Input data X are passed to the circuit serially, controlled by the Enter signal, while the transformed output coefficients Y are produced in parallel. The Walsh circuit, which works on the basis of products of Rademacher functions, is used to control the data buffers and accumulators. Figure 1(b) shows the proposed inverse Walsh transform (IWT) design for transform length N [10]. The N inputs (coefficients) C are passed into the circuit in parallel, controlled by Enter, while the outputs H are produced serially. Every time Enter goes high, Cn or -Cn (the negative value of Cn) is passed to the data buffers through multiplexers. At the same time, the data inside the data buffers are passed to the output buffer. The multiplexers select Cn or -Cn based on the output signals of the Walsh circuit, which controls the data buffers and accumulators.
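
The serial-in, parallel-out WT and the parallel-in, serial-out IWT described above can be modeled in software. The following Python sketch is only a behavioral illustration and not the circuit of [10]. It assumes the natural (Hadamard) ordering of the Walsh functions, an unnormalized forward transform, and a division by N in the inverse; all function names are hypothetical.

```python
def rademacher(k, i):
    """Sign (+1 or -1) of the Rademacher function tied to bit k of sample index i."""
    return -1 if (i >> k) & 1 else 1


def walsh(m, i, n_bits):
    """Walsh function m at sample i, formed as a product of Rademacher functions.

    Assumption: natural (Hadamard) ordering, where the set bits of m select
    which Rademacher functions enter the product.
    """
    sign = 1
    for k in range(n_bits):
        if (m >> k) & 1:
            sign *= rademacher(k, i)
    return sign


def _log2_length(n):
    """Return log2(n), rejecting lengths that are not a power of two."""
    n_bits = n.bit_length() - 1
    if n == 0 or (1 << n_bits) != n:
        raise ValueError("transform length must be a power of two")
    return n_bits


def walsh_transform(x):
    """Serial-in, parallel-out model: one accumulator per coefficient."""
    n = len(x)
    n_bits = _log2_length(n)
    acc = [0] * n
    for i, sample in enumerate(x):                   # one sample per Enter pulse
        for m in range(n):
            acc[m] += walsh(m, i, n_bits) * sample   # add or subtract the sample
    return acc


def inverse_walsh_transform(c):
    """Parallel-in, serial-out model: each step sums +Cn or -Cn over all n."""
    n = len(c)
    n_bits = _log2_length(n)
    out = []
    for i in range(n):                               # one output per Enter pulse
        total = sum(walsh(m, i, n_bits) * c[m] for m in range(n))
        out.append(total // n)                       # assumed 1/N scaling
    return out


x = [-6, -2, 3, 7]
coeffs = walsh_transform(x)
recovered = inverse_walsh_transform(coeffs)
```

Feeding the coefficients of an integer signal back through the inverse recovers the signal exactly, because every sum is then a multiple of N. A different Walsh ordering (sequency or Paley) would only permute the coefficients.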

Walsh Transforms Applications
Walsh transforms can be applied to AWG, addition/subtraction, processing of several signals, and multiplication systems. The AWG system is realized by combining the WT and IWT: the output of the WT becomes the input of the IWT, so the system can generate a signal continuously [17]. The addition or subtraction system converts both input signals into the frequency domain using the WT. The resulting values are called the coefficients of the input signals. The coefficients of the two transformed signals are then added to or subtracted from each other. The result (another set of coefficients) is converted back to the time domain using the IWT and is taken as the output of the addition or subtraction process [10]. Similarly, the multiplication of two signals is performed by transforming the signals to the frequency domain (giving their coefficients) and transforming back to the time domain after processing. The coefficients of the first signal are multiplied by the coefficients of the second signal, resulting in another set of coefficients. These last coefficients are then transformed back to the time domain and taken as the output.
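
The AWG and addition/subtraction data flows described above can be sketched as follows. This Python example is an illustration under stated assumptions, not the design of [10] or [17]: it swaps in the well-known butterfly form of the fast Walsh-Hadamard transform (Hadamard ordering, additions and subtractions only) in place of the Rademacher-based circuit, leaves the forward transform unnormalized, and divides by N in the inverse. The function names are hypothetical, and the multiplication system is not covered.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (Hadamard ordering)."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("transform length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = data[j], data[j + h]
                data[j], data[j + h] = a + b, a - b   # butterfly: sum and difference
        h *= 2
    return data


def ifwht(coeffs):
    """Inverse transform: the same butterflies followed by division by N."""
    n = len(coeffs)
    return [v // n for v in fwht(coeffs)]


def awg(x):
    """AWG data flow: the WT output is fed straight into the IWT."""
    return ifwht(fwht(x))


def add_signals(x, g, subtract=False):
    """Add or subtract two signals through their Walsh coefficients."""
    cx, cg = fwht(x), fwht(g)
    ch = [a - b if subtract else a + b for a, b in zip(cx, cg)]
    return ifwht(ch), ch   # time-domain result and its coefficients


total, total_coeffs = add_signals([1, 2, 3, 4], [4, 3, 2, 1])
difference, _ = add_signals([1, 2, 3, 4], [4, 3, 2, 1], subtract=True)
```

Addition and subtraction work in the transform domain because the Walsh transform is linear: the coefficients of a sum are the sum of the coefficients.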

Negative Circuits
Walsh Circuits

Word Lengths Design
To reduce circuit usage, particular attention is required in choosing suitable word lengths. The word length of the input signal is denoted WI, and the word length for representing the output of the Walsh transform is denoted WO, which can be calculated from (1). Since the input of the IWT in these realizations is the output of the WT, the word length of the IWT input is denoted WIC. The word length of the inverse Walsh transform output is labeled WOC and is given by (2). This label distinguishes it from the word length of the input signal because, in some applications, the word lengths of the input and output signals are equal [10].
In the AWG design, the word lengths are the same as those of the WT and IWT, since the AWG is a combination of them. The word length of the transformed signal is WO=WIC, because the processed signal is transformed back again. The word length of the AWG output is equal to that of the input, so it is labeled WI. In the other applications, such as addition, subtraction and multiplication, all word lengths are labeled in the same way as in the AWG application. The word length of the addition or subtraction result, WOO, is formulated in (3). The word length of the multiplication result is given by (4), and that of its coefficients by (5). Table 1 summarizes all word lengths required for the designed systems for transform length N and input word length WI or WIC (only for IWT). These word lengths have been calculated in detail to minimize circuit usage. Detailed calculations of these formulas have been discussed explicitly, and the optimized word lengths were obtained by analyzing the word-length behavior using MATLAB [10].
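
Equations (1) to (5) are not reproduced in this text. The Python sketch below therefore uses growth rules that are only inferred from the numerical examples quoted in section 4 (12-bit coefficients, a 9-bit sum, a 15-bit product and 23-bit product coefficients for N=16 and WI=8, and WIC=6 for N=4). These rules are assumptions and may differ from the formulas in [10]; the function and key names are hypothetical.

```python
def word_lengths(n, wi):
    """Estimate word lengths (in bits) for transform length n and input width wi.

    All rules below are ASSUMED, chosen only because they reproduce the
    figures quoted in the text for N=16, WI=8.
    """
    log2_n = n.bit_length() - 1
    if n < 2 or (1 << log2_n) != n:
        raise ValueError("transform length must be a power of two")
    wo = wi + log2_n                      # assumed: N accumulations add log2(N) bits
    return {
        "WO": wo,                         # WT coefficients (also WIC for the IWT)
        "WOO_add_sub": wi + 1,            # assumed: one carry bit for a sum or difference
        "W_mul": 2 * wi - 1,              # assumed: product of two signed samples
        "W_mul_coeff": 2 * wo - 1,        # assumed: product of two signed coefficients
    }


def fits_signed(value, bits):
    """True if value is representable as a two's-complement number of this width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


lengths = word_lengths(16, 8)
```

A helper such as `fits_signed` can be used to check simulated values against a chosen width before committing to it in hardware.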

FPGA REALIZATIONS
The realizations are performed and displayed for Walsh transforms, inverse Walsh transforms, arbitrary waveform generation, signal addition, signal subtraction, signal multiplication and processing of several signals. The FPGA implementations are targeted to Xilinx and Altera chips. Xilinx ISE is used to simulate either behavior or timing, to synthesize, and to estimate the static power consumption of the Xilinx chips. Meanwhile, Quartus is used, with the help of ModelSim, to simulate the design for implementation on Altera chips.

Walsh Transforms
The Walsh transform designed in section 2 has been implemented on Xilinx and Altera chips for transform lengths N=4 and N=16 and input word lengths WI=4 and WI=8. The input signal passes through the system serially, but the results are arranged in parallel. Figure 3 displays close-up simulation results for Xilinx and Altera. Figure 3(a) gives a close examination at the point where the third input value is already in the system. There is a delay (called the clock-to-pad delay in Xilinx) of about 6.4 ns from the rising edge of the Enter signal to the change of the output. Figure 3

Inverse Walsh Transforms
The inverse Walsh transform works in the opposite direction to the Walsh transform. Therefore, in this realization, the output of the WT is used as the input of the IWT. The inverse Walsh transform designed in section 2 has been implemented on Xilinx and Altera chips for transform length N=4 and input word length WIC=6. The input signal passes into the system in parallel, but the results are arranged serially.

Arbitrary Waveform Generation
Arbitrary waveform generation is designed by combining Walsh transforms and inverse Walsh transforms [17]. The AWG has been implemented on Xilinx and Altera chips for transform length N=16 and input word length WI=8. The input signal passes into the system serially, and the results are also serial. Both are formatted as 8-bit signed numbers. Figure 7

Addition System
The addition system has been implemented on Xilinx and Altera chips for transform length N=16 and input word length WI=8. Input signals x[8:1] and g[8:1] are passed into the system serially, and the result signal h[9:1] and its coefficients are also serial. Both input signals are formatted as 8-bit signed numbers, the addition result is formatted as a 9-bit signed number based on Equation (3), and the coefficients of the output signal are formatted as 12-bit signed numbers according to Equation (1). Figure 8 shows

Subtraction System
The subtraction system has been implemented on Xilinx and Altera chips for transform length N=16 and input word length WI=8. Input signals x and g are passed into the system serially, and the result signal h and its coefficients are also passed out serially. Both input signals are formatted as 8-bit signed numbers, the subtraction result h is formatted as a 9-bit signed number based on (3), and the coefficients of the output signal are formatted as 12-bit signed numbers based on (1). Figure 9 shows the Altera timing simulation of the subtraction system for N=16 and WI=8. The subtraction result h[8:1] and the coefficients of signal x[8:1] are shown in Figure 9(a). Figure 9(b) shows the output and the coefficients of signal g[8:1]. The coefficients of the output signal are shown in Figure 9(c). The result signal h is calculated by subtracting signal g from signal x. Detailed values of the input and output signals are listed below.

Multiplication System
The multiplication system has been implemented on Xilinx and Altera chips for transform length N=16 and input word length WI=8. Input signals x and g are passed into the system serially, and the result signal h and its coefficients are also passed out serially. Both input signals are formatted as 8-bit signed numbers, the output of the multiplication system is formatted as a 15-bit signed number according to (4), and the coefficients of the output signal are formatted as 23-bit signed numbers based on (5). Figure 10 shows the Altera timing simulation of the multiplication system for N=16 and WI=8. The multiplication result h and the coefficients of signal x are shown in Figure 10(a). Figure 10(b) shows the output and the coefficients of signal g. The coefficients of the output signal h are shown in Figure 10(c). Details of the input, output and coefficient values are tabulated in Table 2.

Processing Several Signals
Realization of the WT is also implemented for a system that processes several signals. In this case, a system of h=x+g-j has been realized, where h refers to the output signal and the rest refer to input signals. This process has been implemented on Xilinx and Altera chips for transform length N=4 and input word length WI=4. Input signals are passed into the system serially, and the results are also passed out serially. The word length of the output signal would be WOO=8 bits: the system requires 2 bits more, plus 2 bits for processing three signals. This number is the maximum value to be preserved. However, based on the discussion in section 2 and an analysis of the word-length behavior, WOO=6 is enough. Figure 11 shows a realization of the processing of h=x+g-j. All input signals are formatted as 4-bit signed numbers, and the output signal h has to be in at least 6-bit format. Signal x[4:1]={-6,-2,3,7}, signal g[4:1]={6,6,5,-5}, and signal j[4:1]={-5,5,-7,1} are passed into the system serially on the rising edge of Enter. The output signal h[6:1]={5,-1,15,1} is available when Pass is in the high state. All of the coefficients are passed out at coeffs[6:1]: the first four numbers represent the coefficients of signal x[4:1], the second four numbers are the coefficients of signal g[4:1], and the last four numbers represent the coefficients of signal j[4:1].

Figure 11. Xilinx behavior simulation of processing several signals x+g-j for N=4 and WI=4
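
The numbers quoted for h=x+g-j can be checked with a short software model. The Python sketch below is illustrative only: it assumes the natural (Hadamard) ordering and an unnormalized forward transform with division by N in the inverse, so the order of its coefficients may differ from the one shown in Figure 11. The helper names are hypothetical.

```python
def walsh_row(m, n):
    """Row m of the N x N Walsh-Hadamard matrix (assumed natural ordering)."""
    return [-1 if bin(m & i).count("1") % 2 else 1 for i in range(n)]


def wt(x):
    """Unnormalized Walsh transform by direct summation."""
    n = len(x)
    return [sum(s * v for s, v in zip(walsh_row(m, n), x)) for m in range(n)]


def iwt(c):
    """Inverse transform: same matrix, followed by division by N."""
    n = len(c)
    return [v // n for v in wt(c)]


def fits_signed(value, bits):
    """True if value fits in a two's-complement word of the given width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


# Input signals quoted in the text (N=4, WI=4).
x = [-6, -2, 3, 7]
g = [6, 6, 5, -5]
j = [-5, 5, -7, 1]

cx, cg, cj = wt(x), wt(g), wt(j)
ch = [a + b - c for a, b, c in zip(cx, cg, cj)]   # combine in the Walsh domain
h = iwt(ch)                                       # h equals [5, -1, 15, 1]

# Width check for this particular data set against a 6-bit signed format.
h_fits = all(fits_signed(v, 6) for v in h)
coeffs_fit = all(fits_signed(v, 6) for v in cx + cg + cj)
```

For this data set the result reproduces the output h={5,-1,15,1} given above, and both h and the coefficients of the three input signals fit in 6-bit signed words.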

Speed Comparisons
The realizations of Walsh transforms for the designed systems have been demonstrated in the previous sections on various Xilinx and Altera chips. Xilinx ISE and Quartus are the primary tools for those simulations, alongside other software such as ModelSim for displaying the simulation results. To estimate the speed, the design has been synthesized to obtain a timing summary. For instance, a timing summary was produced with Xilinx ISE for the fastest Spartan 3 chip (speed grade: 5). It shows that the minimum period of Clock is 27.14 ns, corresponding to a maximum frequency of 36.864 MHz, with a minimum input arrival time of 7.917 ns and a maximum output required time after clock of 6.216 ns. However, the clock period of Enter is 4.815 ns, so it might reach 207.693 MHz. Most of the delay is due to routing, which accounts for about 2/3 of the total delay. To make a fair comparison of the realizations, the designs have been implemented on the Virtex-4 chip using Xilinx ISE and on the Stratix IV using Quartus. Table 4 lists the speeds of the Stratix IV realizations, which are almost similar to those of the Xilinx implementations (Table 3). For transform length N=16, the fastest system is the Walsh transform, followed by the inverse Walsh transform, at about 293 MHz and 170 MHz, respectively. The slowest system is the multiplication of two signals, at only 38 MHz. Comparisons with other designs have been made previously for the Walsh transform [10] and the AWG system [17].

Static Power Comparisons
The realizations have also been used to estimate the static power consumption. The designed systems have been synthesized (Xilinx ISE) and power analyzed (Quartus) to estimate the static power consumption. For instance, a power summary was produced with Xilinx ISE for the fastest Spartan 3 chip (speed grade: 5); it shows a power consumption of 37 mW. This power estimation uses Vccint 1.2 V, Vccaux 2.5 V, quiescent Vccint 1.2 V and quiescent Vccaux 2.5 V under a 10 mA current. The estimate assumes a junction temperature of 26 °C, an ambient temperature of 25 °C, a case temperature of 26 °C and a theta J-A between 31 and 32 °C/W.
The Quartus power analyzer has been used to analyze the static power consumption of the Altera chips. Table 5 lists the power dissipation of the Spartan 3 and Cyclone IV GX chips when the systems are implemented with various transform lengths and word lengths. There is no significant difference in power dissipation among the various Walsh transform systems on either chip. The power dissipation of the Spartan 3 is 56 mW for the WT (N=16, WI=8) and IWT (N=16, WIC=12) realizations; the remaining systems all dissipate 37 mW. Meanwhile, the realizations on the Altera Cyclone chip require power from 80.9 mW up to 12.80 mW. Again, the WT and IWT systems for N=16 consume more power than the other systems. In general, unlike the speed results, the Cyclone IV GX consumes about twice as much power as the Spartan 3. However, this is not a like-for-like comparison, since the two chips are built on different platforms.

CONCLUSION
Realizations of Walsh transforms demonstrating AWG, addition/subtraction, multiplication, and processing of several signals have been carried out successfully on various FPGA chips. The Walsh transforms are realized in terms of products of Rademacher functions. The realizations use transform lengths N=4 and N=16; higher transform lengths can conveniently be addressed later. Real systems nowadays use word lengths of 32 or 64 bits. However, in this paper, smaller word lengths are chosen to simplify the simulations. Walsh transforms are not limited to the applications realized here; they can potentially be used for other applications as well.