Improvements in space radiation-tolerant FPGA implementation of land surface temperature-split window algorithm

The trend in satellite remote sensing missions has consistently been toward hardware devices with more flexibility, smaller size, and higher computational power. Field programmable gate array (FPGA) technology is therefore often used by scientific and equipment developers to run different satellite remote sensing algorithms. This article explains the hardware implementation of the land surface temperature split window (LST-SW) algorithm on an FPGA. To obtain high-speed, real-time processing, the LST-SW algorithm was designed in the VHSIC hardware description language (VHDL). The paper presents the benefits of the radiation-tolerant Virtex-4QV FPGA used. The experimental results revealed that the suggested implementation of the algorithm on the Virtex-4QV achieved a throughput of 435.392 Mbps and a processing time of 2.95 ms. Furthermore, a comparison between the proposed implementation and existing work demonstrated that the proposed implementation has better area utilization: a 1.17% reduction in the number of slices used and a 1.06% reduction in the number of LUTs. A further significant area advantage is that no block RAMs are used, whereas the existing work uses three block RAMs. Finally, the comparison shows improvements of 2.28% higher frequency, 3.66× higher throughput, and 1.19% faster processing time.


INTRODUCTION
Rapid prototyping of high-complexity digital circuits is now possible thanks to the density of current programmable circuits, such as field programmable gate arrays (FPGAs) [1]. New architectural concepts can be tested quickly: the complete implementation of a processor on an FPGA is now within reach, offering more evaluation possibilities than software simulators. Moreover, the reprogrammability of some FPGA chips has opened new research avenues, such as design methodologies for reconfigurable systems that can evolve or adapt to varying environments or constraints. These developments open new opportunities, particularly in satellite remote sensing. Digital sensors mounted on board remote sensing satellites examine massive regions of the Earth's surface day and [...] achieved with a higher frequency while consuming fewer resources. Moreover, the proposed implementation can also serve satellite systems designed with similar thermal infrared channels. Section 2 of this paper presents the purpose of using radiation-tolerant FPGAs in satellite remote sensing applications. Section 3 gives an overview of the LST-SW algorithm. Section 4 illustrates the proposed FPGA hardware architecture for the LST-SW algorithm. Finally, Section 5 analyses the implementation results, including the resources used, throughput, and timing.

PURPOSE OF USING RADIATION-TOLERANT FPGA IN SATELLITE REMOTE SENSING
The use of reconfigurable hardware in space-based remote sensing applications is of increasing interest [19], [20]. Integrating FPGAs in a spacecraft enables application-specific hardware with smaller size, lower cost, greater flexibility, and higher computational capacity. Because downlink bandwidth has not kept pace with the rising sensor resolution of remote sensing space payloads, a processing bottleneck arises [21]. Operators need on-board processing so that satellites send down processed information rather than just raw data. This is a challenge for the approximately 100 remote sensing satellites launched each year. FPGA technology is now seen as a vital alternative: prior to downlink, remote sensing data can be processed and interpreted in orbit, instead of storing and transmitting all of the captured data/images, resulting in a significant bandwidth reduction. Consequently, the calculations carried out at ground stations become easier and faster [22].
Electronic devices outside the Earth's atmosphere are usually subjected to a different radiation environment than the one on Earth. High levels of radiation may disturb or interrupt the operation of a conventional semiconductor system. Electronic circuits can, however, be built with specialized manufacturing techniques that tolerate high radiation levels. With the increasing use of programmable logic in space applications, numerous researchers have examined the suitability of commercially available FPGAs for radiation environments [23]. Xilinx SRAM-based FPGAs and Microsemi flash-based FPGAs currently address this problem by combining high-speed signal processing with built-in radiation mitigation methods that keep the devices operational in harsh radiation environments. Furthermore, these FPGAs maintain low static power and significantly reduce dynamic power requirements. This latest class of radiation-tolerant FPGAs offers over 150,000 logic elements and performance of up to 300 MHz, with a substantially larger proportion of resources, including combinatorial logic, DSP math blocks, and transceivers, than other radiation-resistant FPGA technologies.
SRAM-based FPGAs offer high performance, high logic density, and low non-recurring engineering (NRE) costs compared with other FPGA technologies. They can also be reconfigured a virtually unlimited number of times after the initial power-on configuration. In many applications, these advantages of SRAM-based FPGAs are considered decisive. Flash technology, by contrast, is under industry scrutiny for its total ionizing dose (TID) limitations and potential charge leakage, and it does not currently support dynamic partial reconfiguration. Among the SRAM options, the Xilinx Virtex-QV is found to offer the greatest logic density, performance, and radiation tolerance, combining high-speed signal processing with special built-in radiation mitigation techniques that keep systems operational in harsh radiation environments. Radiation-hardened FPGAs are, in fact, in high demand for military and space applications [24]. A further reason for this choice is that Xilinx has introduced many radiation-tolerant FPGAs, including the space-grade Virtex-QV line of high-reliability FPGA chips with million-gate densities that support the high throughput requirements of remote sensing applications [25].
The space-grade Virtex-QV uses hardware and package hardening techniques. These reliability-enhancing techniques are: i) SRAM transistor cells for configuration memory and latches, ii) triple modular redundancy (TMR) for configuration control, iii) a single event transient (SET) filter option for the configurable logic blocks (CLBs), and iv) an epitaxial layer and protection layers. On the other hand, the radiation hardness of the Virtex-QV comes at the price of higher power consumption [26].
In this work, we use the Xilinx space-grade Virtex-4QV XQR4VSX55 FPGA [27] for the proposed implementation of the LST-SW algorithm. The Virtex-4QV is a high-performance, radiation-hardened, reconfigurable FPGA for processing-intensive space systems. The device offers one of the highest levels of density, performance, and integration, enabling more complex and capable systems than radiation-hardened ASICs, which carry high development costs and long lead times. The Virtex-4QV has been thoroughly tested for radiation tolerance and has been shown to tolerate a total dose in the range of 300 krad, which is more than adequate for many space applications.

LST-SW ALGORITHM
Land surface temperature (LST) plays an important role in characterizing the land surface at local and global scales and is one of the key variables in land surface biophysical processes [28]. LST is derived from satellite measurements in the thermal infrared (TIR) spectral bands using either empirical or physical algorithms. Numerous LST algorithms have been proposed and described in the literature over time [29]-[34]. Several Earth observation satellites (e.g., NOAA/AVHRR, TERRA/MODIS, LANDSAT/TM, and ENVISAT/AATSR) carry thermal sensors with TIR channels from which LST can be derived.
The estimation of LST from satellite data is mainly affected by the atmosphere and by surface emissivity [35]. To estimate this parameter accurately, we apply the operational LST-SW algorithm proposed by Sobrino and Raissouni [12], given in (1), where Ts is the LST; T4 and T5 are the brightness temperatures measured in AVHRR channels 4 and 5, respectively; and ε = 0.5(ε4 + ε5) and Δε = ε4 − ε5 are, respectively, the average effective emissivity in channels 4 and 5 and the spectral variation of emissivity [36]. To calculate the total amount of atmospheric water vapor, W (g cm-2), we used the split-window covariance-variance ratio (SWCVR) approach [37], which allows W to be estimated from satellite data alone [38].
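A floating-point software model of (1) is a useful reference against which hardware outputs can be compared. Because equation (1) itself is not reproduced in this text, the coefficient values below (1.40, 0.32, 0.83, 57, 5, 161, 30) are an assumption: they are the values commonly quoted for the Sobrino-Raissouni AVHRR split-window formula and should be checked against [12] before use. The function names are hypothetical.

```python
# Floating-point reference model of a split-window LST formula.
# ASSUMPTION: coefficients are those commonly quoted for the
# Sobrino-Raissouni AVHRR algorithm [12]; verify them before relying on them.


def mean_and_delta_emissivity(eps4, eps5):
    """Return (eps, d_eps) with eps = 0.5*(eps4 + eps5) and d_eps = eps4 - eps5."""
    return 0.5 * (eps4 + eps5), eps4 - eps5


def lst_split_window(t4, t5, w, eps, d_eps):
    """LST (K) from brightness temperatures t4, t5 (K), water vapour w (g cm-2),
    mean emissivity eps and spectral emissivity difference d_eps."""
    dt = t4 - t5
    return (t4 + 1.40 * dt + 0.32 * dt ** 2 + 0.83
            + (57.0 - 5.0 * w) * (1.0 - eps)
            - (161.0 - 30.0 * w) * d_eps)


if __name__ == "__main__":
    eps, d_eps = mean_and_delta_emissivity(0.98, 0.97)
    print(f"{lst_split_window(300.0, 298.0, 1.5, eps, d_eps):.2f}")  # 304.99
```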

THE PROPOSED HARDWARE FPGA ARCHITECTURE FOR THE LST-SW ALGORITHM
Parallel processing on FPGAs is a key research direction for the high-efficiency, fast-computation community. Its computation speed depends on factors such as the logic resources available in the chosen FPGA and how well the algorithm is optimized [39]. Figure 1 displays the block diagram of the proposed Xilinx Virtex-4QV FPGA LST-SW implementation: i) the inputs are T4, T5, W, and Epsilon, corresponding to the T4, T5, W, and ε satellite data, and ii) the output of the LST-SW algorithm is LST. Considering both the FPGA hardware architecture (i.e., four FIFOs instead of five, since a constant Δε needs no input stream of its own) and the study area characteristics (see § 5.1), we set Δε = 0.005 in our case. Accordingly, the proposed architecture consists of two parts: a. LST1 (Part 1), as shown in (2), which computes the main part of the LST algorithm (1), and b. LST2 (Part 2), as shown in (3), which computes the correction part of the LST algorithm (1).
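The two-part decomposition can be checked in software. The sketch below assumes the same commonly quoted coefficients as in the reference model (equations (2) and (3) are not reproduced here) and groups the terms according to the inputs of each part: Part 1 uses only T4 and T5, Part 2 only W and ε, with Δε fixed at 0.005. The function names are hypothetical.

```python
# Sketch of the LST1/LST2 split with a fixed emissivity difference.
# ASSUMPTION: commonly quoted Sobrino-Raissouni coefficients; the grouping
# follows the Part 1 (T4, T5) and Part 2 (W, eps) inputs described in the text.

D_EPS = 0.005  # fixed spectral emissivity difference adopted in the text


def lst1(t4, t5):
    """Part 1: terms that depend only on the brightness temperatures."""
    dt = t4 - t5
    return t4 + 1.40 * dt + 0.32 * dt * dt + 0.83


def lst2(w, eps, d_eps=D_EPS):
    """Part 2: emissivity and water-vapour correction."""
    return (57.0 - 5.0 * w) * (1.0 - eps) - (161.0 - 30.0 * w) * d_eps


def lst(t4, t5, w, eps):
    """Final result: the sum of the two parts."""
    return lst1(t4, t5) + lst2(w, eps)
```

Because Δε is a constant, Part 2 only needs the W and ε streams, which is what allows four input FIFOs instead of five.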
LST is then computed as the sum of LST1 and LST2, as shown in (4):

$$\mathrm{LST} = \mathrm{LST1} + \mathrm{LST2} \quad (4)$$

Figure 2 shows the corresponding general hardware architecture. Satellite data/raw images [i.e., T4 (in K, ×10), T5 (in K, ×10), W (in g cm-2, ×1000), and ε (×1000); see § 5.1 for more details on the data format] are stored on the hard disk and transferred to each part [i.e., LST1 (Part 1) and LST2 (Part 2)] via the corresponding FIFOs and variables. FIFO1 and FIFO2 transmit T4 and T5, respectively, to LST1 (Part 1), while FIFO3 and FIFO4 transmit W and ε, respectively, to LST2 (Part 2). Finally, LST1 and LST2 are retransmitted via FIFO1 and FIFO2, respectively, to be added. Figure 3 provides the equivalent structural description of the proposed FPGA LST-SW hardware in VHDL. For brevity, it includes only the instantiations of the main components and omits some components, signal definitions, and interconnection details. For computational reasons (i.e., to preserve the integer format and word size while keeping the decimal precision of the temperature data in kelvin, T4 and T5), we (i) divide the LST1 coefficients by 10, as shown in (5), and (ii) divide the LST2 coefficients by 1000, as shown in (6). Figure 4 shows the architecture used in the proposed design to implement the LST1 and LST2 modules. Throughout the procedure, the image data are stored separately in memory and fed into the system pixel by pixel. The first LST pixel is computed as follows: FIFO1 and FIFO2 read the first pixel of T4 and T5, respectively, and pass them to Part 1, which is computed as shown in Figure 4(a) following (2); at the same time, FIFO3 and FIFO4 read the first pixel of W and Epsilon, respectively, and pass them to Part 2, which is computed as shown in Figure 4(b) following (3). Finally, as in Figure 2, the two parts, LST1 and LST2, are added.
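The effect of the scaled input formats can be checked in software. In the sketch below, which again assumes the commonly quoted coefficients and uses hypothetical function names, the ×10 and ×1000 scale factors are folded into the coefficients so that the stored integers can be used directly. For the squared temperature difference the factor becomes 10² = 100, since both operands carry the ×10 scale.

```python
# Folding the input scale factors into the coefficients (sketch).
# ASSUMPTION: commonly quoted Sobrino-Raissouni coefficients.
# T4, T5 are stored as kelvin x 10; W and eps are stored x 1000.

D_EPS = 0.005


def lst1_scaled(t4_x10, t5_x10):
    """Part 1 on integer inputs in kelvin x 10 (e.g. 3000 -> 300.0 K)."""
    dt_x10 = t4_x10 - t5_x10                    # difference, still scaled by 10
    return (0.1 * t4_x10                        # T4 = t4_x10 / 10
            + (1.40 / 10) * dt_x10              # linear coefficient divided by 10
            + (0.32 / 100) * dt_x10 * dt_x10    # squared term: divided by 10**2
            + 0.83)


def lst2_scaled(w_x1000, eps_x1000, d_eps=D_EPS):
    """Part 2 on integer inputs scaled by 1000 (e.g. 1500 -> W = 1.5)."""
    return ((57.0 - (5.0 / 1000) * w_x1000) * (1.0 - eps_x1000 / 1000)
            - (161.0 - (30.0 / 1000) * w_x1000) * d_eps)
```

For example, `lst1_scaled(3000, 2980)` and `lst2_scaled(1500, 975)` give the same values (304.91 and 0.6575) as the unscaled model with T4 = 300.0 K, T5 = 298.0 K, W = 1.5, and ε = 0.975.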
The computation described above for the first pixel is repeated, line by line, until all pixels have been processed.
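The pixel-by-pixel, line-by-line dataflow can be modeled behaviorally with queues standing in for the FIFOs. This is a sketch of the ordering only, not of clock timing; the coefficient values are the same assumption as in the earlier sketches, and for clarity it uses separate output queues rather than reusing FIFO1 and FIFO2 as the hardware does. All names are hypothetical.

```python
# Behavioural sketch of the FIFO-based dataflow: pixels enter in raster
# order, Part 1 and Part 2 consume their own FIFOs in the same step, and
# their outputs are added. Models ordering only, not clock timing.
from collections import deque

D_EPS = 0.005  # ASSUMPTION: same coefficients as the reference sketches


def lst1(t4, t5):
    dt = t4 - t5
    return t4 + 1.40 * dt + 0.32 * dt * dt + 0.83


def lst2(w, eps, d_eps=D_EPS):
    return (57.0 - 5.0 * w) * (1.0 - eps) - (161.0 - 30.0 * w) * d_eps


def stream_lst(t4_img, t5_img, w_img, eps_img):
    """Process co-registered images line by line, pixel by pixel."""
    result = []
    for t4_row, t5_row, w_row, eps_row in zip(t4_img, t5_img, w_img, eps_img):
        fifo1, fifo2 = deque(t4_row), deque(t5_row)   # inputs to Part 1
        fifo3, fifo4 = deque(w_row), deque(eps_row)   # inputs to Part 2
        out1, out2 = deque(), deque()                 # Part 1 / Part 2 results
        row = []
        while fifo1:
            out1.append(lst1(fifo1.popleft(), fifo2.popleft()))
            out2.append(lst2(fifo3.popleft(), fifo4.popleft()))
            row.append(out1.popleft() + out2.popleft())  # final adder
        result.append(row)
    return result
```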
Figure 4. Hardware architecture used to implement the two modules LST1 and LST2: (a) hardware architecture adopted to implement the LST1 module; (b) hardware architecture adopted to implement the LST2 module.

The fixed-point package was chosen for the arithmetic logic units to reduce area utilization while preserving precision. Because the type REAL defined in package STANDARD has limited synthesis support, the newer fixed-point facilities were used in the proposed implementation, through the fixed_pkg library in the VHDL code of the two modules LST1 and LST2. In the arithmetic logic units, we therefore represent the decimal numbers of the proposed algorithm ((18) and (19)) in fixed point rather than floating point. Fixed-point arithmetic is valuable because it yields faster and smaller functional units; however, if not carefully designed, it can produce less accurate results. Floating-point arithmetic, on the other hand, is costly in hardware and leads to inefficient designs, especially on FPGAs.
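The precision trade-off of fixed-point arithmetic can be illustrated with a rough software analogue: a real value is stored as an integer scaled by 2^frac_bits. This is not the fixed_pkg API; the rounding and saturation choices and the word widths are assumptions for illustration, since the text does not state the formats used.

```python
# Rough software analogue of signed fixed-point values (not the fixed_pkg API).
# A real number is stored as an integer scaled by 2**frac_bits.
# Rounding: Python's round() (nearest, ties to even); overflow: saturation.


def to_fixed(x, width, frac_bits):
    """Quantize x to a signed two's-complement integer of `width` bits."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return max(lo, min(hi, round(x * (1 << frac_bits))))


def from_fixed(q, frac_bits):
    """Convert the stored integer back to a real value."""
    return q / (1 << frac_bits)


def fixed_mul(a, b, width, frac_bits):
    """Multiply two values in the same format (frac_bits >= 1), then rescale.

    The raw product has 2*frac_bits fractional bits; adding half an LSB
    before the arithmetic right shift rounds to nearest (ties upward).
    """
    q = (a * b + (1 << (frac_bits - 1))) >> frac_bits
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return max(lo, min(hi, q))


if __name__ == "__main__":
    # Quantization error of an emissivity value for a few fraction widths.
    for f in (4, 8, 12):
        q = to_fixed(0.975, 16, f)
        print(f, q, abs(from_fixed(q, f) - 0.975))
```

With round-to-nearest, the quantization error is at most half an LSB, 2^-(frac_bits+1), so each extra fractional bit halves the worst-case error at the cost of wider adders and multipliers.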
The FPGA LST-SW implementation requires addition, subtraction, multiplication, and division operations, as shown in Figure 4, and the resources consumed grow with the number of calculations. Some of these operations, such as addition, subtraction, and multiplication, are straightforward; division, however, is complex and can significantly affect precision. We therefore handled these operations carefully in the proposed implementation, using the functions provided in the fixed_pkg library, to prevent overflow, which would lead to incorrect results. For the division operation, instead of dividing by 10, we multiply by its reciprocal, 0.1. This has a positive effect, as the multiplication does not consume many resources and has only a small effect on precision. Figure 5 shows the register transfer level (RTL) view of the FPGA LST-SW implementation and the connections between the different blocks, with a 16-bit data width for the LST result sent out of the system. Figure 5 also shows the input signals T4, T5, W, and Epsilon. The outputs of LST1 and LST2 are connected to FIFO1 and FIFO2, respectively, and each component of the proposed implementation (LST1 (Part 1), LST2 (Part 2), and the FIFOs) is controlled by clock, reset, read, and write signals.
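Replacing a division by a multiplication works because the reciprocal can be precomputed as a fixed-point constant; the remaining scale by a power of two is only a bit shift in hardware. The sketch below assumes, for illustration only, that the reciprocal is stored with 16 fractional bits. It shows that the constant 0.1 is itself quantized, so the precision of the result depends on the width chosen for it.

```python
# Replacing x / 10 with a multiply by a fixed-point reciprocal (sketch).
# ASSUMPTION: the reciprocal is stored with SHIFT = 16 fractional bits;
# in hardware, the final division by 2**SHIFT is just a bit shift.

SHIFT = 16
RECIP_10 = round((1 << SHIFT) / 10)  # 6554, i.e. about 0.1 * 2**16


def div10_exact(x):
    return x / 10


def div10_by_mult(x):
    """Approximate x / 10 with one integer multiply and a scale by 2**-SHIFT."""
    return (x * RECIP_10) / (1 << SHIFT)


if __name__ == "__main__":
    # Worst-case error over a range of temperatures stored as kelvin x 10.
    worst = max(abs(div10_by_mult(x) - div10_exact(x)) for x in range(0, 4001))
    print(RECIP_10, worst)
```

Here the stored constant equals 0.100006103515625, so the error grows linearly with x and stays below 0.025 K for inputs up to 400.0 K; each additional fractional bit in the constant roughly halves it.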

Study area
The Pathfinder AVHRR Land (PAL) satellite dataset was used to study the Mediterranean basin. The images were calibrated following the recommendations of NOAA [40], [41]. Table 1 lists the characteristics of the cloud-free images used in this application, which correspond to July 1982.

FPGA LST-SW simulation results
In this section, we present experimental results on the computational efficiency of the proposed FPGA LST-SW implementation. The hardware architecture defined in Section 4 was described in VHDL, and the entire system was specified in the Xilinx ISE 14.7 environment. Table 2 shows the resources consumed by the proposed LST-SW design on the space-grade Virtex-4QV XQR4VSX55. This FPGA provides a total of 24,576 slices, 49,152 slice flip-flops, and 49,152 four-input LUTs, together with heterogeneous resources such as DSP48 blocks, of which it has 512. We chose this device (XQR4VSX55) because the LST algorithm involves many arithmetic operations and therefore needs a large number of DSP blocks, which this device provides for highly efficient signal processing. We used these resources to optimize the proposed design. Slice registers were used to implement the FIFOs, so no block RAMs were needed, and the remaining slices implement the LST-SW algorithm itself. The number of DSP blocks used is relatively high because of the arithmetic in each part (LST1 and LST2), which consumes much of the FPGA resources. Timing analysis of the implementation gives a maximum frequency of 190.484 MHz. Figure 6 displays the LST images computed for July 1982 using the proposed implementation. Consistent with the biodiversity of the study area, a large LST variability is observed over the area and period.

Comparison of performance between the proposed and existing LST-SW implementations
We evaluate the performance of the proposed implementation by comparing it with the LST-SW implementation previously reported in [18], as summarized in Table 3. The evaluation metrics are the resources used, the frequency, the throughput, the throughput per slice (TPS), and the processing time; other features, such as power consumption, are not evaluated in this work. Table 3 presents the performance comparison between the proposed architecture and the existing architecture [18]. As can be seen, the proposed implementation consumes fewer hardware resources than the existing implementation: the proposed design uses 1124 of 24,576 slices and 2438 of 49,152 slice LUTs, compared with 1169 of 7200 slices and 2590 of 28,800 slice LUTs in [18]. The FIFOs are also implemented differently: in the proposed architecture they are built from slices, whereas in [18] they are based on block RAMs, which gives the proposed architecture an advantage in terms of resource optimization. In addition, Table 3 shows that the frequency of the proposed architecture is higher than that obtained in [18], which leads to faster response times and higher-speed calculation of the LST algorithm. Another important metric is the throughput, which is the number of bits processed per unit time, expressed in Gbps or Mbps. The throughput is calculated using (7).
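Because the two designs target devices of different sizes, it can help to report utilization percentages next to the absolute counts. The short calculation below uses only the figures quoted in the text; the dictionary layout and function name are illustrative.

```python
# Absolute resource counts versus utilization of each target device,
# using the figures quoted in the text for the proposed design and [18].

designs = {
    "proposed": {"slices": (1124, 24576), "luts": (2438, 49152)},
    "ref18": {"slices": (1169, 7200), "luts": (2590, 28800)},
}


def utilization_pct(used, total):
    """Percentage of a device resource occupied by a design."""
    return 100.0 * used / total


for name, res in designs.items():
    for kind, (used, total) in res.items():
        pct = utilization_pct(used, total)
        print(f"{name:8s} {kind:6s} {used:5d}/{total:<6d} {pct:6.2f}%")
```

This gives about 4.57% of the slices and 4.96% of the LUTs for the proposed design, against about 16.24% and 8.99% for [18]. The absolute counts (45 fewer slices and 152 fewer LUTs) are the more direct comparison; the percentages mainly show how much of each device remains free for other algorithms.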

$$\text{Throughput} = \frac{\text{Number of bits processed} \times F_{\max}}{\text{Latency}} \quad (7)$$

In the proposed architecture, the number of bits processed is 16, Fmax is the maximum clock frequency reported by the tool, and Latency is the number of clock cycles after which an output is generated, which equals seven clock cycles. The proposed design therefore achieves a throughput of 435.392 Mbps (16 × 190.484 MHz / 7) on the Virtex-4QV with a maximum clock frequency of 190.484 MHz, higher than that of reference [18]. To measure the hardware resource cost associated with the resulting throughput of each implementation, the TPS metric was computed using (8):

$$\text{TPS} = \frac{\text{Throughput}}{\text{Number of slices}} \quad (8)$$

where slices are the logic elements that make up the configurable logic blocks (CLBs). As shown in Table 3, the proposed design uses 1124 slices, so its TPS is about 0.387 Mbps/slice, which is higher than that of the implementation in [18]. Finally, Table 4 shows the processing times obtained with the software version, the implementation in [18], and the proposed implementation. As can be observed from Table 4, the proposed architecture is faster than both the software version and [18]. Another important aspect of the hardware implementation is arithmetic precision. As discussed in Section 4, we paid special attention to this problem in our design: the fixed-point representation, implemented with the fixed_pkg library in VHDL, reduces processing time and uses fewer logic resources in the proposed implementation of the LST-SW algorithm. As a result, the proposed implementation is a significant step toward using the LST-SW algorithm in scenarios requiring real-time processing.
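Equations (7) and (8) can be evaluated directly with the values given in the text (16 bits per output, Fmax = 190.484 MHz, a latency of 7 cycles, and 1124 slices). The function names below are illustrative; with Fmax in MHz, the throughput comes out in Mbit/s.

```python
def throughput_mbps(bits, fmax_mhz, latency_cycles):
    """Equation (7): bits per output x clock rate / cycles per output.
    With fmax in MHz, the result is in Mbit/s."""
    return bits * fmax_mhz / latency_cycles


def tps(throughput, slices):
    """Equation (8): throughput per occupied slice."""
    return throughput / slices


thr = throughput_mbps(16, 190.484, 7)
print(f"throughput = {thr:.3f} Mbps, TPS = {tps(thr, 1124):.3f} Mbps/slice")
# throughput = 435.392 Mbps, TPS = 0.387 Mbps/slice
```

This reproduces the reported 435.392 Mbps and 0.387 Mbps/slice.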

CONCLUSION
From the perspective of remote sensing systems, the reconfigurability of FPGA systems opens up many innovative possibilities, such as selecting from the Earth control station, immediately after the sensor has collected the data, which of the available data processing algorithms to run on board. Because of their compact size and low weight, radiation-hardened FPGAs can easily be mounted or embedded in the sensor, which will greatly benefit current sensor design practices. This paper discussed the role of FPGAs in remote sensing missions and the benefits of using FPGAs, such as radiation hardening, presented experimental results of the FPGA implementation of the LST-SW algorithm, and compared the performance of the proposed architecture with previous work. The results showed that the suggested implementation of the LST-SW algorithm using the Virtex-4QV achieved a throughput of 435.392 Mbps and a processing time of 2.95 ms. In addition, the proposed implementation shows better area utilization than the existing work: a 1.17% reduction in the number of slices used, a 1.06% reduction in the number of slice LUTs, and total elimination of block RAMs. Furthermore, compared with the existing work, the proposed implementation has a 2.28% higher frequency, a 3.66× higher throughput, and a 1.19% faster processing time, which could allow other algorithms to be implemented on the same device. This promising performance makes the implementation suitable for on-board LST-SW computation in future CubeSats. Moreover, since the FPGA used is radiation-hardened, the proposed implementation can be adopted, and reprogrammed, on CubeSat platforms.