Cache coherency controller for MESI protocol based on FPGA

Received Apr 15, 2020; Revised Sep 3, 2020; Accepted Sep 15, 2020

In modern processor design, manufacturers place more than one processor on a single integrated circuit (chip). Each processor is called a core, and such chips are called multi-core processors. This design lets the cores work simultaneously on different jobs, or lets all cores work in parallel on the same job. All cores have the same design and each core has its own cache memory, while all cores share the same main memory. So when one core requests a block of data from main memory into its cache, a protocol is needed to track the state of this block in main memory and in the other cores. This is called cache coherency (or cache consistency) in multi-core processors. In this paper a dedicated circuit is designed in very high speed integrated circuit hardware description language (VHDL) and implemented with the Xilinx ISE software. The protocol used in this design is the modified, exclusive, shared and invalid (MESI) protocol. Test results obtained with a test bench show that all states of the protocol work correctly.


INTRODUCTION
Processors today are manufactured as multi-core processors: all cores are on one chip and have the same design, each core has its own cache memory, and all cores share one main memory located outside the chip [1][2][3][4]. In these systems the cache plays a very important part in processor design. The processor first asks for data from its cache; if the data is not present there, the processor fetches it from main memory and places it in the cache. The same data may also exist in another core's cache, and that core may work on the data and change its value. The copy in main memory is then stale and must be updated. Hence a cache coherency protocol must be applied to the caches of the cores [5][6][7][8].
A snooping protocol should therefore be applied to ensure that no core uses stale data [9]. Different protocols have been used in different systems. One of these is the MESI protocol, which was used in the Pentium processor [10][11][12]. In the modified, exclusive, shared and invalid (MESI) protocol, each block in a cache is in one of four states: Modified, Exclusive, Shared or Invalid. A cache controller snoops the bus transactions of all the cores' caches and updates the state of each block, for example from Invalid to Shared or from Shared to Modified [13,14].
The problem in this kind of system is therefore to design a cache coherency circuit that snoops the system and applies the chosen protocol, which in this design is the MESI protocol [15]. The design is implemented in very high speed integrated circuit hardware description language (VHDL) and then integrated on a field-programmable gate array (FPGA) [15]. The results for the different parts of the processor are presented as test bench waveforms, the architecture of the system is described, and the results match the theoretical behavior.

RESEARCH METHOD
2.1. The cache
The cache is built from static RAM (SRAM), while main memory uses dynamic RAM (DRAM). The cache is about 8 to 10 times faster than the DRAM of main memory, but it is much smaller. Storing one bit in SRAM requires 6 transistors, while DRAM requires only one transistor. For this reason designers place a small amount of the faster memory (the cache) inside the processor and a large amount of DRAM outside the processor chip, on the motherboard. The processor always asks for data from the cache first. If the data exists in the cache, the processor fetches it in one bus cycle (2 clocks); this case is called a read hit. If the data does not exist in the cache, the processor must fetch it from main memory, which takes extra clock cycles; this case is called a read miss [16][17][18][19][20].
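The cost of hits and misses can be summarized with the standard average memory access time formula. In the worked line below, only the 2-clock hit time comes from the text; the miss rate $m = 0.05$ and the 20-clock miss penalty are hypothetical values chosen purely for illustration, not measurements of this design.

```latex
\[
t_{\text{avg}} = t_{\text{hit}} + m \, t_{\text{penalty}}
\qquad\Rightarrow\qquad
t_{\text{avg}} = 2 + 0.05 \times 20 = 3\ \text{clocks}
\]
```

Even a small miss rate noticeably raises the average access time, which is why keeping frequently used data in the fast cache matters.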

Architecture
There are three types of cache organization: direct-mapped, set-associative and fully associative. In this paper the direct-mapped organization is used because it is simple to design [21,22]. Figure 1 shows a simple direct-mapped cache organization designed for testing the cache coherence protocol. As shown in Figure 1, the main memory address is divided into three parts: the offset, the index and the tag. In the designed cache the 32-bit address is partitioned as follows: 4 bits for the offset, with bits 0-1 for byte select and bits 2-3 for word select; bits 4 and 5 for the index; and the remaining 26 bits for the tag [23][24][25]. Figure 2 shows the design of the direct-mapped cache used.
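The address split described above can be sketched as a small behavioral model. This is a minimal Python sketch, not the authors' VHDL: it assumes 4-byte words (implied by the 2-bit byte select), a 4-line cache (from the 2-bit index), and a hypothetical `memory` dictionary keyed by 16-byte block number. The names `split_address`, `Line` and `DirectMappedCache` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

OFFSET_BITS = 4  # bits 0-1: byte select, bits 2-3: word select
INDEX_BITS = 2   # bits 4-5: line index, so the cache has 4 lines
TAG_BITS = 26    # bits 6-31
WORDS_PER_LINE = 4


def split_address(addr: int) -> Tuple[int, int, int, int]:
    """Split a 32-bit address into (tag, index, word, byte) fields."""
    byte = addr & 0b11
    word = (addr >> 2) & 0b11
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = (addr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, word, byte


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    words: List[int] = field(default_factory=lambda: [0] * WORDS_PER_LINE)


class DirectMappedCache:
    def __init__(self, memory: Dict[int, List[int]]):
        # hypothetical memory: 16-byte block number (addr >> 4) -> its four words
        self.memory = memory
        self.lines = [Line() for _ in range(1 << INDEX_BITS)]

    def read_word(self, addr: int) -> Tuple[int, bool]:
        """Return (word, hit). On a miss the whole line is refilled from memory."""
        tag, index, word, _ = split_address(addr)
        line = self.lines[index]
        if line.valid and line.tag == tag:
            return line.words[word], True  # read hit
        block = addr >> OFFSET_BITS
        line.words = list(self.memory.get(block, [0] * WORDS_PER_LINE))
        line.tag, line.valid = tag, True
        return line.words[word], False  # read miss, line refilled
```

For example, with `mem = {0: [10, 11, 12, 13], 4: [40, 41, 42, 43]}`, reading `0x04`, `0x08`, `0x40` and `0x04` in turn gives a miss, a hit, a miss that evicts line 0, and a conflict miss, because `0x00` and `0x40` both map to index 0 (with tags 0 and 1). Each address can live in exactly one line, which is what keeps the direct-mapped hardware simple.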

MESI PROTOCOL
This protocol consists of four states: invalid (I), shared (S), exclusive (E) and modified (M). It extends the earlier MSI protocol by adding the exclusive (E) state in order to reduce the number of bus messages. The Exclusive state means that the block of data is valid in this cache and consistent with main memory, and is not valid in any other cache; this lets the processor modify its cached copy without snooping the other caches [26][27][28][29]. Figure 3 shows a state transition diagram for the MESI protocol. The left side of the figure represents the processor requests and the corresponding actions of the cache controller circuit, while the right side represents the bus requests and the corresponding actions [30,31]. Figure 4 shows a simplified MESI state diagram with the transition states [32].
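The two halves of the state diagram (processor requests and snooped bus requests) can be sketched as two transition functions. This is a textbook-style Python sketch, not a transcription of Figures 3 and 4: the bus transaction names `BusRd`, `BusRdX` and `BusUpgr`, the processor operation names `PrRd`/`PrWr`, and the `others_have_copy` flag (standing in for a shared signal line) are assumed conventions and may differ from the signals in the authors' VHDL.

```python
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def on_processor(state: State, op: str, others_have_copy: bool) -> Tuple[State, Optional[str]]:
    """Local request ('PrRd' or 'PrWr'): return (next state, bus transaction or None)."""
    if op == "PrRd":
        if state is State.I:
            # read miss: E if no other cache holds the block, otherwise S
            return (State.S if others_have_copy else State.E), "BusRd"
        return state, None  # read hit in M, E or S needs no bus traffic
    if op == "PrWr":
        if state is State.I:
            return State.M, "BusRdX"  # write miss: read with intent to modify
        if state is State.S:
            return State.M, "BusUpgr"  # invalidate the other shared copies
        return State.M, None  # E -> M silently (the benefit of E); M stays M
    raise ValueError(f"unknown processor op {op!r}")


def on_snoop(state: State, bus_op: str) -> Tuple[State, bool]:
    """Transaction from another core seen on the bus: return (next state, flush?)."""
    if state is State.I:
        return State.I, False  # no copy here, nothing to do
    if bus_op == "BusRd":
        return State.S, state is State.M  # a dirty copy is written back, then shared
    if bus_op in ("BusRdX", "BusUpgr"):
        return State.I, state is State.M  # another core will write: invalidate
    raise ValueError(f"unknown bus op {bus_op!r}")
```

The silent `E -> M` path in `on_processor` is exactly the bus message that the Exclusive state saves compared with MSI, where a write to a clean copy always needs a bus transaction.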

VHDL top_level implementation
The VHDL components of a MIPS processor designed in [33] are combined with the VHDL components of this design. Using Xilinx ISE Design Suite 14.1, all these components are connected together to compose the top level; a test bench is then written and used to set the 2-bit cache size controller input and execute a test program. Figure 5 shows the top level and Figure 6 shows the schematic view of the top-level components of the designed system. It consists of three parts: the pipelined MIPS, the data memory system and the instruction memory system.

Design cache coherency
In this work two microprocessors, MIPS1 and MIPS2, were designed, each with a separate cache. A cache coherency controller is designed which consists of two parts: the coherency tag and the coherence controller, implemented as a finite-state machine (FSM). All these components are connected together on chip and share the main memory. Figure 7 shows the design for the cache coherency protocol. Tag cache: the data tag cache has 28 bits for each data cache line (26 tag bits and 2 bits for the MESI protocol). The MESI bits are reset when the machine restarts. The instruction tag cache contains 27 bits; it is similar to the instruction tag cache of a single core [34]. Figure 8 shows the RTL of the coherence tag and the coherence controller.
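A 28-bit data tag entry (26 tag bits plus 2 MESI bits) can be modelled as a packed integer. The bit placement (tag in bits 27..2, state in bits 1..0) and the 2-bit state encoding below are hypothetical, since the text gives only the field widths; the encoding is chosen so that the all-zero value produced by a reset reads as Invalid.

```python
from typing import Tuple

TAG_BITS = 26
# Hypothetical 2-bit MESI encoding; a cleared (reset) entry decodes as Invalid.
MESI_CODE = {"I": 0b00, "S": 0b01, "E": 0b10, "M": 0b11}
CODE_TO_STATE = {code: s for s, code in MESI_CODE.items()}


def pack_tag_entry(tag: int, state: str) -> int:
    """Build a 28-bit entry: tag in bits 27..2, MESI code in bits 1..0."""
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag must fit in 26 bits")
    return (tag << 2) | MESI_CODE[state]


def unpack_tag_entry(entry: int) -> Tuple[int, str]:
    """Split a 28-bit entry back into (tag, MESI state)."""
    return (entry >> 2) & ((1 << TAG_BITS) - 1), CODE_TO_STATE[entry & 0b11]


RESET_ENTRY = 0  # all bits cleared on restart -> Invalid under this encoding
```

Making Invalid the all-zero code means a plain reset of the tag RAM is enough to start every line in a safe state.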
In this paper 7 bits are used to indicate the different states of the MESI protocol: two bits for M (Modified), two bits for E (Exclusive), one bit for S (Shared) and two bits for I (Invalid). Table 1 shows the MESI states. Figure 9 shows the coherency tag with 7 bits for the MESI states. Table 2 shows the different states of the MESI protocol.

RTL schematic of multi-core MIPS processor design
Two single-core MIPS processors are combined to form a multicore MIPS processor. Each core is a pipelined MIPS processor with its own L1 cache memory, and both cores share the main memory. A multicore processor exploits the parallelism available in a program to let its cores work together on the same job; therefore a parallel program is needed to obtain better performance from a multicore processor. Since each core has its own L1 cache, the same memory address may be cached in both cores, which can cause consistency problems in the data cache. The instruction cache does not have this problem because the processor cannot modify program instructions. In this paper snooping-based coherency is used as the cache coherency mechanism, and the coherency protocol used is MESI. Figure 10 shows the RTL cache coherency controller for the MESI protocol.
Figure 10. RTL cache coherency controller for MESI protocol
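The consistency problem for two cores sharing one block can be illustrated with a tiny snooping simulation. This is a hypothetical Python model, not the authors' controller: it tracks a single one-word block, uses write-back behavior (memory is updated only when a Modified copy is flushed), and the class names `CoreCache` and `SnoopingBus` are illustrative.

```python
from typing import List, Optional


class CoreCache:
    """One core's cached copy of a single shared block (hypothetical model)."""

    def __init__(self, name: str):
        self.name = name
        self.state = "I"
        self.value: Optional[int] = None


class SnoopingBus:
    """Shared bus: every request is seen (snooped) by all other caches."""

    def __init__(self, caches: List[CoreCache], memory_value: int):
        self.caches = caches
        self.memory = memory_value  # main-memory copy of the block

    def _others(self, requester: CoreCache) -> List[CoreCache]:
        return [c for c in self.caches if c is not requester and c.state != "I"]

    def read(self, cache: CoreCache) -> Optional[int]:
        if cache.state != "I":
            return cache.value  # read hit
        others = self._others(cache)
        for c in others:  # the read request is snooped by the other caches
            if c.state == "M":
                self.memory = c.value  # write back the dirty copy first
            c.state = "S"
        cache.value = self.memory
        cache.state = "S" if others else "E"
        return cache.value

    def write(self, cache: CoreCache, value: int) -> None:
        if cache.state in ("S", "I"):
            for c in self._others(cache):  # other copies are invalidated
                if c.state == "M":
                    self.memory = c.value
                c.state, c.value = "I", None
        cache.state, cache.value = "M", value  # write-back: memory not yet updated
```

A possible run with `mips1`, `mips2` and a block holding 5: MIPS1 reads (E), MIPS2 reads (both S), MIPS2 writes 9 (MIPS2 M, MIPS1 I, memory still 5), then MIPS1 reads again and receives 9 because MIPS2 flushes its Modified copy before both caches return to Shared. Without the invalidation step, MIPS1 would have kept reading the stale value 5.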

RESULTS AND DISCUSSIONS
To verify the design built in this paper, several programs were written to examine how the coherency controller implements the MESI protocol. The results matched the 15 different states designed into the protocol; Figure 11 shows the test bench results for some of these states.
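One property any such test can check after every step is the MESI single-writer rule for each block. The Python function below is a hypothetical checker that a simulation or test bench harness might apply to the per-core states of one block; it is not part of the authors' test bench.

```python
from collections import Counter
from typing import Sequence


def is_coherent(states: Sequence[str]) -> bool:
    """Check one block's MESI states across all cores (single-writer rule)."""
    if any(s not in ("M", "E", "S", "I") for s in states):
        raise ValueError("states must be 'M', 'E', 'S' or 'I'")
    counts = Counter(states)
    owners = counts["M"] + counts["E"]
    if owners > 1:
        return False  # two caches claim exclusive ownership
    if owners == 1 and counts["S"] > 0:
        return False  # an M or E copy cannot coexist with a shared copy
    return True
```

For the two-core system, `["E", "I"]`, `["S", "S"]` and `["M", "I"]` are legal, while `["M", "S"]` and `["M", "E"]` indicate a coherence bug.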

CONCLUSION
In this paper, a MIPS processor previously designed in the department was used; a second, identical processor was built from it to obtain two cores, and a cache was built for each core using direct mapping for ease of design. A coherence controller circuit was then built to control the reads and writes that each processor performs on the shared memory. All the designed parts were linked together, and several programs were written to operate and examine the MESI protocol. The results were consistent with the design and with the purpose of this research, which is for scientific use by master's students in the advanced computer technique lab.