Design and implementation of dual-core MIPS processor for LU decomposition based on FPGA

Received Mar 26, 2020; Revised Jun 29, 2020; Accepted Jul 11, 2020

Many systems, such as control and communication systems, require matrix inversion. Inversion involves many operations, which makes it hard or impossible to meet real-time constraints. Several methods exist to address this problem; one of them is LU decomposition, which is a good alternative to matrix inversion. LU decomposition produces two matrices: L, a lower triangular matrix, and U, an upper triangular matrix. In this paper, a dual-core processor is designed as the hardware platform, and software is written so that the two cores work simultaneously, one computing the L matrix and the other computing the U matrix. The results are compared with those of designs that use a single-core processor, and the dual-core design requires considerably less time than the single-core design. The dual-core processor is described in VHDL.


INTRODUCTION
Many systems, such as control and communication systems, require matrix inversion. The time required to invert a matrix increases as the matrix grows. Hence, alternative methods are needed in order to work in real time; one of these methods is LU decomposition [1].
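In the LU decomposition method, the coefficient matrix A is factored into the product of a lower triangular matrix L and an upper triangular matrix U [2,3]. The following is the set of equations for a 4x4 matrix.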
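A common way to write these equations, assuming the Doolittle convention (unit diagonal in L), which the text does not state explicitly, is sketched below. The symbols a_{ij}, l_{ij} and u_{ij} denote the entries of A, L and U and are notation chosen for this sketch.

```latex
\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14}\\
a_{21} & a_{22} & a_{23} & a_{24}\\
a_{31} & a_{32} & a_{33} & a_{34}\\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0\\
l_{21} & 1 & 0 & 0\\
l_{31} & l_{32} & 1 & 0\\
l_{41} & l_{42} & l_{43} & 1
\end{bmatrix}
\begin{bmatrix}
u_{11} & u_{12} & u_{13} & u_{14}\\
0 & u_{22} & u_{23} & u_{24}\\
0 & 0 & u_{33} & u_{34}\\
0 & 0 & 0 & u_{44}
\end{bmatrix}
\]
\[
u_{ij} = a_{ij} - \sum_{k=1}^{i-1} l_{ik}\,u_{kj} \quad (i \le j),
\qquad
l_{ij} = \frac{1}{u_{jj}}\Bigl(a_{ij} - \sum_{k=1}^{j-1} l_{ik}\,u_{kj}\Bigr) \quad (i > j)
\]
```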
If a system of equations has the form [A][X] = [B], LU decomposition makes the solution easier because only triangular matrices are involved. With [A] = [L][U], the system becomes [L][U][X] = [B], which is solved in two triangular steps [4][5][6][7].

The objective of this paper is to program and build a 32-bit MIPS processor that performs LU decomposition, and then to design and implement a dual-core MIPS processor. The results of the two designs are compared; each system is designed and implemented in VHDL [8][9][10].
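As a software reference for solving [A][X] = [B] through LU decomposition, the following minimal sketch may help. It assumes the Doolittle convention with no pivoting and uses floating-point arithmetic for readability; the function names and the example matrix are illustrative and do not come from the paper.

```python
def lu_decompose(a):
    """Doolittle LU decomposition without pivoting (assumes nonzero pivots)."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lower[i][i] = 1.0  # unit diagonal in L (Doolittle convention)
        # Row i of U
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        # Column i of L, below the diagonal
        for r in range(i + 1, n):
            partial = sum(lower[r][k] * upper[k][i] for k in range(i))
            lower[r][i] = (a[r][i] - partial) / upper[i][i]
    return lower, upper


def lu_solve(lower, upper, b):
    """Solve L U x = b by forward substitution, then back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward: L y = b
        y[i] = b[i] - sum(lower[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):  # backward: U x = y
        tail = sum(upper[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - tail) / upper[i][i]
    return x


# Illustrative 4x4 example (not the matrix used in the paper)
A = [[2, 1, 1, 0],
     [4, 3, 4, 1],
     [2, 4, 9, 4],
     [0, 2, 6, 6]]
B = [7, 26, 53, 46]
L, U = lu_decompose(A)
X = lu_solve(L, U, B)
```

Once L and U are available, each additional right-hand side costs only the two triangular solves, which is what makes the decomposition attractive compared with explicit inversion.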

MIPS PROCESSOR
MIPS is a reduced instruction set computer (RISC) architecture that originated in the early 1980s and was commercialized by MIPS Technologies. In a single-cycle implementation, every instruction completes in one clock cycle, so the slowest instruction limits the cycle time. In this paper, single-core and dual-core MIPS processors are designed and implemented to meet the mathematical requirements of LU decomposition [8].

MIPS instruction set architecture (ISA)
This paper covers the 32-bit MIPS architecture, in which operands are either registers or memory locations, as shown in Table 1. Memory is byte-addressable, so the processor accesses each word through its byte address [9,11,12].

Instruction formats
MIPS has three instruction formats: R-type, I-type and J-type. Table 2 shows the instruction formats for the MIPS processor [13][14][15][16].
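To make the three formats concrete, the sketch below packs fields into 32-bit words using the standard MIPS32 field widths (6-bit opcode; 5-bit rs, rt, rd and shamt; 6-bit funct; 16-bit immediate; 26-bit jump target). This is an assumption: the processor designed here may use its own opcode assignment, and the helper names are hypothetical.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-type: opcode | rs | rt | rd | shamt | funct."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rs, rt, imm):
    """I-type: opcode | rs | rt | 16-bit immediate (two's complement)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(opcode, target):
    """J-type: opcode | 26-bit word-address target."""
    return (opcode << 26) | (target & 0x03FFFFFF)


# Standard MIPS32 examples (register numbers: $t0=8, $s1=17, $s2=18, $s3=19)
add_word = encode_r(0x00, 17, 18, 8, 0, 0x20)  # add $t0, $s1, $s2
lw_word = encode_i(0x23, 19, 8, 32)            # lw  $t0, 32($s3)
j_word = encode_j(0x02, 0x100000)              # j   to word address 0x100000
```

The R-type format suits the register-to-register arithmetic that dominates LU decomposition, while the I-type format covers the loads and stores that move matrix elements between data memory and the register file.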

Single-core MIPS processor design
The MIPS processor is a 32-bit processor with 32 registers, each 32 bits wide [17][18][19][20][21][22][23]. The main part of the MIPS processor is the control unit (CU), which consists of some registers and the arithmetic logic unit (ALU). The instructions required for calculating the LU decomposition (LUD) were designed and implemented [24][25][26]; Table 3 lists them. The designed instruction set is suitable for performing LUD, as shown in Table 4. Figure 1 shows the internal architecture of the control unit, and Figure 2 shows the schematic of the circuits required to implement the LU decomposition on the single-core processor.

Dual-core MIPS processor design
The dual-core processor consists of two cores, each responsible for a specific function. Both cores share the same data memory, while each core has its own instruction memory, register file and control unit. The first core computes the lower (L) matrix and the second core computes the upper (U) matrix of the LU decomposition (factorization) [13,27]. Figure 3 shows the designed dual-core MIPS processor: the Lower core computes the L matrix while the Upper core computes the U matrix, so both cores work simultaneously and produce the LU matrices in less time than the single-core design, which gives a high level of parallelism.
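The split between a Lower core and an Upper core can be modelled in software as two independent tasks that read the same input matrix. The sketch below is only one possible arrangement and is not the instruction schedule used in the design: it assumes that each task keeps a private working copy (standing in for a core's register file) and repeats the elimination itself, so neither task waits for the other. The function names are hypothetical, and Python threads only illustrate the task split, not true hardware parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def lower_core(a):
    """Task for the Lower core: return L (unit diagonal), no pivoting."""
    n = len(a)
    work = [list(map(float, row)) for row in a]  # private working copy
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            m = work[i][k] / work[k][k]  # elimination multiplier = l_ik
            lower[i][k] = m
            for j in range(k + 1, n):
                work[i][j] -= m * work[k][j]
    return lower


def upper_core(a):
    """Task for the Upper core: return U, no pivoting."""
    n = len(a)
    work = [list(map(float, row)) for row in a]  # private working copy
    for k in range(n):
        for i in range(k + 1, n):
            m = work[i][k] / work[k][k]
            work[i][k] = 0.0  # entry below the pivot is eliminated
            for j in range(k + 1, n):
                work[i][j] -= m * work[k][j]
    return work


def dual_core_lu(a):
    """Run both tasks concurrently and collect L and U."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        l_future = pool.submit(lower_core, a)
        u_future = pool.submit(upper_core, a)
        return l_future.result(), u_future.result()


# Illustrative 4x4 input (not the matrix used in the paper)
A = [[2, 1, 1, 0],
     [4, 3, 4, 1],
     [2, 4, 9, 4],
     [0, 2, 6, 6]]
L, U = dual_core_lu(A)
```

The trade-off in this arrangement is redundant arithmetic in exchange for independence: both tasks perform the same elimination steps, but neither has to synchronise on intermediate results held by the other.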

DATA REPRESENTATION
Fixed-point data representation is chosen in this paper because it simplifies the design. The alternative, floating-point representation, is excluded from this work because it requires much more hardware [28][29][30]. Figure 4 shows the 32-bit data format.
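The arithmetic that a 32-bit fixed-point format implies can be sketched as follows. The split between integer and fraction bits is an assumption made for illustration (16 fraction bits, often written Q16.16); the actual split is the one defined in Figure 4, and the helper names are hypothetical.

```python
FRAC_BITS = 16            # assumed number of fraction bits (see Figure 4 for the real format)
SCALE = 1 << FRAC_BITS
MASK32 = 0xFFFFFFFF


def to_fixed(x):
    """Encode a real number as a 32-bit two's-complement fixed-point word."""
    return int(round(x * SCALE)) & MASK32


def to_signed(word):
    """Interpret a 32-bit word as a signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word


def from_fixed(word):
    """Decode a 32-bit fixed-point word back to a real number."""
    return to_signed(word) / SCALE


def fixed_mul(a, b):
    """Multiply two fixed-point words; the product is rescaled by FRAC_BITS."""
    return ((to_signed(a) * to_signed(b)) >> FRAC_BITS) & MASK32


def fixed_div(a, b):
    """Divide two fixed-point words; the dividend is pre-scaled by FRAC_BITS.

    Python's // rounds toward negative infinity; hardware dividers often
    truncate toward zero, so results may differ by one unit for negatives.
    """
    return ((to_signed(a) << FRAC_BITS) // to_signed(b)) & MASK32


# The multiplier l = a / u used in LU decomposition, in fixed point
ratio = fixed_div(to_fixed(3.0), to_fixed(2.0))
```

Addition and subtraction need no rescaling in this format, which is part of why fixed point keeps the ALU simpler than a floating-point unit would.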

SIMULATION RESULT OF SINGLE-CORE
The single-core processor is implemented on a Spartan-6 FPGA development board, and the simulation results are obtained from the Xilinx ISim simulator. A set of instructions is executed to compute the LUD; the matrix and its LUD are shown in (15) for a 4x4 matrix, and the approach can also be extended to a 6x6 matrix. The time required to perform the LU decomposition is 3070 ns (3.07 µs) at a frequency of 50 MHz. The results are identical to the theoretical results for the 4x4 matrix. Figure 5 and Figure 6 show the test-bench waveform simulation for matrix A and its LUD, and Figure 7 shows the resources needed for the executed design.

SIMULATION RESULT OF DUAL-CORE
The proposed dual-core processor is coded in VHDL for the Xilinx Spartan-6, with sets of instructions that compute the LU decomposition. A test bench was created to apply the same 4x4 matrix, as shown in Figure 8, Figure 9 and Figure 10, with the required resources shown in Figure 11. The time required to compute the L matrix on the dual-core processor is 850 ns (0.85 µs) at a frequency of 50 MHz, using 41 instructions, as shown in Table 5. The time required to compute the U matrix is 1170 ns (1.17 µs) at the same frequency, using 57 instructions, as shown in Figure 12.

Figure 8. Dual-core processor test bench of register file 1
Figure 9. Dual-core processor test bench of register file 2
Figure 10. Dual-core processor test bench of data memory
Figure 11. Summary of the FPGA resources of the dual-core processor
Figure 12. Simulation of LU decomposition using the dual-core processor
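The reported figures can be cross-checked with a short calculation. At 50 MHz one clock cycle lasts 20 ns, so a single-cycle core would need 820 ns for 41 instructions and 1140 ns for 57 instructions; both reported times are 30 ns longer. What this offset corresponds to (for example, start-up time in the test bench) is not stated and is left open here. The sketch also assumes that the dual-core run finishes when the slower core finishes, and it takes the single-core time of 3070 ns from the previous section; the function names are hypothetical.

```python
CLOCK_HZ = 50_000_000
CYCLE_NS = 1e9 / CLOCK_HZ  # 20 ns per cycle at 50 MHz


def ideal_time_ns(instructions):
    """Execution time if every instruction takes exactly one clock cycle."""
    return instructions * CYCLE_NS


def speedup(single_ns, lower_ns, upper_ns):
    """Single-core time divided by the time of the slower of the two cores."""
    return single_ns / max(lower_ns, upper_ns)


lower_offset = 850 - ideal_time_ns(41)   # reported L time minus ideal time
upper_offset = 1170 - ideal_time_ns(57)  # reported U time minus ideal time
gain = speedup(3070, 850, 1170)          # roughly 2.6x
```

Because the U computation is the longer of the two programs, it determines the overall completion time of the dual-core run.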

CONCLUSION
A single-core and a dual-core processor were designed to perform LU decomposition of a 4x4 matrix, for teaching purposes in a master's-level MIPS architecture course. Both processors were designed and implemented with the instructions each one needs to carry out the LU decomposition. The single-core processor took 3.07 µs to compute the LU matrices of the 4x4 matrix at a frequency of 50 MHz. In the dual-core design, the first core computes the L matrix and the second core computes the U matrix, which brings the time down to 1.17 µs. The dual-core processor consumes the most resources; however, it gives higher performance.