Super-linear speedup for real-time condition monitoring using image processing and drones

ABSTRACT


INTRODUCTION
Many real-time applications, including video processing, need an algorithm to be executed in parallel on a multicore or multiprocessor system. Multicore or multiprocessor systems combined with parallel programming are used to improve performance. To achieve such improvements, efficient utilization of thread-level parallelism is essential. In fact, dividing tasks among the cores of a multicore or multiprocessor system can yield sub-linear, linear, or super-linear speedups. A multicore system adds processing power with minimal latency, which delivers significant performance benefits for software. This trend is shaping the future of software development toward parallel programming [1]. The benefit is clearest in applications that have huge input data and work in real time. Parallelism can be used at the system level by spreading the workload of handling requests among the processors and disks. Data-level parallelism (DLP) enables parallel reads and writes by distributing data across many disks. Taking advantage of instruction-level parallelism (ILP) within an individual processor is also critical to achieving high performance, and pipelining is the simplest way to do this. Parallelism can also be employed at the level of detailed digital design; for example, in modern all-optical arithmetic logic units (ALUs) that use carry-lookahead, or in set-associative caches [2].

The principle of locality is one of the most important program properties. Programs tend to reuse instructions and data they have used recently; as a rule of thumb, a program spends 90% of its execution time in only 10% of the code. Locality means that the instructions and data a program will use in the near future can be predicted from its accesses in the recent past. Locality has two types. Spatial locality says that items whose addresses are near one another tend to be referenced close together in time. Temporal locality says that recently accessed items are likely to be accessed again in the near future [2].

In parallel processing, speedup is estimated by comparing the run time of the best sequential program with the run time of the parallel program [3], as defined in (1).
However, a speedup metric is defined by Amdahl's law in (2), which indicates that the speedup depends on two factors. The first is the fraction of the computation time that can be converted to take advantage of the enhancement ($Fraction_{enhanced}$). The second is the improvement gained by the enhanced execution mode ($Speedup_{enhanced}$), which is equal to the time of the original mode over the time of the enhanced mode.
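Equations (1) and (2) are not reproduced in this text. Standard forms that are consistent with the definitions above are given here as an assumption, not as a copy of the paper's typeset equations:

```latex
\begin{align}
\text{Speedup} &= \frac{T_{\text{sequential}}}{T_{\text{parallel}}} \tag{1} \\
\text{Speedup}_{\text{overall}} &=
  \frac{1}{\left(1 - \text{Fraction}_{\text{enhanced}}\right)
  + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}} \tag{2}
\end{align}
```

As a worked example with illustrative numbers, if 90% of the run time can use an enhancement that is 4 times faster, the overall speedup is 1 / (0.1 + 0.9/4) = 1 / 0.325, or about 3.08.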
In Amdahl's law, if an enhancement is usable only for a fraction of a task, the task speedup cannot exceed the reciprocal of 1 minus that fraction. Amdahl's law can therefore serve as a guide to how much enhancement can be achieved; the goal is to spend resources in proportion to where time is spent. The speedup achieved with n cores depends on the proportion of the program or task executed in parallel versus in serial, so the speedup from parallelizing any computing problem is limited by the size of the serial portion. Gustafson's observation is that as the available processing power increases, the problem size also tends to increase. A drastic increase in the ratio of parallel to serial tasks in the computational load brings an equally dramatic increase in the processing requirements; in other words, once the computing resources increase, the problem size also increases, and thus the serial portion becomes much smaller [4]. Gustafson modified Amdahl's law by proposing that the size of the overall problem should increase in proportion to the number of processors (n) while the size of the serial portion (s) remains constant as the number of cores increases, as given by (3).
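The two laws can be compared numerically with a minimal sketch. It assumes the usual textbook forms: Amdahl's law with a parallel fraction p spread over n cores, and Gustafson's scaled speedup n - s(n - 1) for a serial fraction s. The function names and the 0.9 / 0.1 fractions are illustrative and do not come from the paper.

```python
def amdahl_speedup(parallel_fraction, n):
    """Amdahl's law: fixed problem size, only the parallel part is sped up by n."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


def gustafson_speedup(serial_fraction, n):
    """Gustafson's law: problem size grows with n, serial part s stays constant."""
    return n - serial_fraction * (n - 1)


if __name__ == "__main__":
    # Illustrative fractions: 90% parallel, 10% serial.
    for n in (1, 2, 4):
        print(n, round(amdahl_speedup(0.9, n), 2), round(gustafson_speedup(0.1, n), 2))
```

Under both models the speedup never exceeds n, which is why a measured speedup above n has to be explained by effects outside these models, such as cache behavior.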
Superlinear speedup occurs when a computation using n processors runs more than n times faster than the same computation on a uniprocessor [5]; that is, the speedup is greater than n. Several factors can lead to this superlinear phenomenon: the increase in aggregate cache size, since each processor has its own local level 1 or level 2 cache; hidden latency in communications; the different memory speeds inherent in distributed-memory ensembles; the shift in the fraction of time spent on tasks of different speeds [6]; the more efficient utilization of resources that comes hand in hand with parallelization [7]; and fitting the data in the caches of multiple data nodes by partitioning the data.
In this work, we use infrared and charge-coupled device (CCD) videos for defect inspection in a solar power system. Infrared images have been used in a wide range of applications, including medical imaging, nondestructive testing, and quality control. Other applications include helping firefighters and police find warm bodies in search areas. As image acquisition technology has developed, image quality, for example resolution, has improved. However, this also increases the demands on memory and time. High-resolution images extracted from videos at 60 frames per second require a multicore system in order to be processed in real time [8]. We combine the two videos and use several image processing algorithms to inspect solar panels in real time.
In the literature, several techniques that use multiprocessing for image processing have been presented. In most cases, images require some pre-processing for noise removal, extraction of certain features, and/or segmentation, which leads to even more tasks to be accomplished. For example, image segmentation is one of the primary steps in extracting different objects or regions. The larger the image, the higher the computational time of the segmentation process [9]. Happ et al. [9] accelerated the segmentation of an image using a multicore processor, and their results show a speedup.
The segmentation algorithm by Baatz [10] was improved in [9] using parallel processing, where the image is divided into tiles (regions). Using the sequential algorithm, one thread performs local region growing for each tile [9]. Dividing the image into tiles and the work into threads may affect the final segmentation results. The number of threads should always be equal to the number of available cores. Three input image sizes were tested (2800x2800, 2000x2000, and 1000x1000) on an Intel Core 2 Quad at 2.40 GHz with 2 GB of RAM. The results show speedups of around 1.5 times and 2.5 times using 2 threads and 4 threads respectively [9]. Saxena et al. [11] implemented sequential image processing algorithms, such as segmentation, histogram equalization, and noise reduction, in parallel on a multicore processor. The input images are divided into a number of tiles equal to the number of threads or cores. Each core or thread processes its own tile, with attention paid to synchronization within the processor. The input image resolutions are 256x256, 256x768, and 128x843. The testing environment was an Intel Core i3-2350M processor at 2.30 GHz with 3 GB of RAM, a 320 GB hard disk drive, and a 64-bit operating system. They also used matrix laboratory (MATLAB) R2011a and Java JDK 1.6.0_21. The results show that parallel processing is better than sequential processing by 1 time, and that for some algorithms the improvement reached 2 times [11].
Liu and Gao used parallel programming tools, for example OpenMP and Threading Building Blocks (TBB), to implement cubic convolution interpolation of images on a multicore processor [12]. They also compared the sequential and parallel implementations. The results show that the cubic convolution algorithm improved by 200% and 400% on dual-core and quad-core processors respectively, compared with the sequential implementation [12].
Kamalakannan et al. [13] proposed multithreaded color image processing using a fuzzy method for edge detection and contrast enhancement. The entire image is partitioned into equal blocks, and the blocks are processed simultaneously on separate cores [13]. Their work was tested on 10 input images of different pixel sizes using a quad-core Core i5. The results show that a four-thread model improved the performance 3.4 times over a sequential method.

RESEARCH METHOD
In the proposed system, we use the videos acquired from both the thermal and CCD cameras. In Python, using OpenCV, we determine the length and the number of frames of the input video in offline processing. Figure 1 shows the main steps of the video segmentation process, which allows each segment to be processed by an individual process simultaneously. FFmpeg is used for the video partitioning process through a command that is embedded in the Python code.
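The exact FFmpeg command used by the authors is not reproduced in this text. One plausible way to cut a time slice from Python, given only as a sketch, is shown next. It assumes Python 3 and an `ffmpeg` binary on the PATH; the function names and file names are hypothetical, and `-ss`, `-t`, `-i`, `-c copy`, and `-y` are standard FFmpeg options.

```python
import subprocess


def build_cut_command(src, start_s, length_s, dst):
    """Build an ffmpeg argument list that copies one time slice of src into dst."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it already exists
        "-ss", str(start_s),   # seek to the segment start (seconds)
        "-t", str(length_s),   # read this many seconds from the input
        "-i", src,
        "-c", "copy",          # stream copy, no re-encoding
        dst,
    ]


def cut_segment(src, start_s, length_s, dst):
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(build_cut_command(src, start_s, length_s, dst), check=True)
```

With stream copy the cut points snap to keyframes, so segment boundaries may be slightly off the requested times; re-encoding instead of `-c copy` trades speed for exact cuts.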
The segmentation of the thermal and CCD videos is started simultaneously in a while loop. The starting time is initialized to zero and then increased by the cutting period on each iteration, as shown in (5), while the cutting period is subtracted from the remaining duration of the input video, as shown in (6).
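The loop described above can be sketched as follows. Equations (5) and (6) are not reproduced in this text, so the update rules here are an interpretation of the prose: the start time grows by the cutting period and the remaining duration shrinks by it. The function name and the handling of a shorter final segment are assumptions.

```python
def segment_bounds(duration_s, cut_s):
    """Return (start, length) pairs that cover a video of duration_s seconds."""
    if cut_s <= 0:
        raise ValueError("cut_s must be positive")
    bounds = []
    start = 0.0                      # starting time, initialized at zero
    remaining = float(duration_s)
    while remaining > 0:
        length = min(cut_s, remaining)   # the last segment may be shorter
        bounds.append((start, length))
        start += cut_s               # advance the start by the cutting period
        remaining -= cut_s           # subtract the cutting period from the duration
    return bounds
```

For a 100-second video cut into 4 parts, `segment_bounds(100, 25)` yields four 25-second segments starting at 0, 25, 50, and 75 seconds.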
After the video is divided, multiple segments are generated and stored in a specific path. In Python, the number of processes is initialized using the multiprocessing module. Each process opens its assigned video segment using OpenCV, and all processes start running simultaneously; Figure 2 shows the running diagram for the multiprocessing module in Python. A while loop in each process reads the frames from its video segment. As the frames are read, the image processing operations of the fault detection algorithm are applied in each process. All processes run simultaneously with the same operations, and each process exits after completing its own task without waiting for any other process.
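A minimal sketch of this process layout is given below. It assumes Python 3, uses `multiprocessing.Pool` (the paper may instead have started individual `Process` objects), and leaves the detection step as a placeholder comment. The names `count_frames` and `run_parallel` are hypothetical.

```python
import multiprocessing as mp


def count_frames(path):
    """Worker: read every frame of one video segment and return the frame count."""
    import cv2  # imported here so each worker process loads OpenCV itself

    capture = cv2.VideoCapture(path)
    frames = 0
    while True:
        ok, frame = capture.read()
        if not ok:              # no more frames in this segment
            break
        # ... run the fault detection algorithm on `frame` here ...
        frames += 1
    capture.release()
    return frames


def run_parallel(paths, worker, n_procs):
    """Apply worker to every segment path using n_procs processes."""
    if n_procs == 1:            # sequential baseline, no pool needed
        return [worker(p) for p in paths]
    with mp.Pool(processes=n_procs) as pool:
        return pool.map(worker, paths)
```

A call such as `run_parallel(segment_paths, count_frames, n_procs=4)` would process four segments at once. On Windows, which the paper's test environment uses, the call has to sit under an `if __name__ == "__main__":` guard because worker processes are spawned rather than forked.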
In this paper, image processing algorithms are used to detect defects in the PV module and to determine the longitude and latitude of the solar panel's location. Different types of

Morphological transformation with canny edge detector
In computer vision applications, Canny edge detection is used to extract useful structural information from different objects, which reduces the amount of data to be processed [14], and a Canny detector is used to obtain accurate information about the target object [15]. In this paper, the input image is first converted to a binary image by applying a threshold to each frame. The threshold value, Th, is determined adaptively, and in some experiments it was re-estimated for each frame. A kernel (structuring element) is then assigned to implement the morphological transformations [16], followed by the Canny edge detection algorithm to detect the defective cells in the solar panel. The Canny algorithm provides excellent results in many practical problems and is considered an optimal edge detection algorithm [17].
In this paper, the Canny algorithm is applied to identify significant intensity discontinuities in the image. The main idea is to find the direction of the gradient at each pixel. This is done by computing the first derivative in the horizontal and vertical directions using the Sobel filter. Equations (7) and (8) give the edge gradient and the angle for each pixel respectively [18]. The gradient direction is perpendicular to the edges; its value is rounded to one of four angles representing the vertical, horizontal, and two diagonal directions [18].
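Equations (7) and (8) are not reproduced in this text. The usual forms for the Canny gradient step, with $G_x$ and $G_y$ denoting the horizontal and vertical Sobel derivatives, are assumed to be:

```latex
\begin{align}
G &= \sqrt{G_x^{2} + G_y^{2}} \tag{7} \\
\theta &= \tan^{-1}\!\left(\frac{G_y}{G_x}\right) \tag{8}
\end{align}
```

For instance, a pixel with $G_x = 3$ and $G_y = 4$ has an edge gradient of 5 and an angle of about 53 degrees, which would be rounded to the nearest of the four quantized directions.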
After computing the image gradients, the image is scanned to remove unwanted pixels that do not constitute an edge [18]. The last step is thresholding the edges using two values, a minimum ($Th_{min}$) and a maximum ($Th_{max}$). The computed gradients are compared with these two threshold values, and edges are identified under the conditions in (9). The use of a morphological transformation with the Canny edge algorithm to monitor the real-time operation of solar panels and detect faults is introduced in [19].
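The conditions in (9) are not reproduced in this text. In the standard Canny formulation, which is assumed here, a gradient above the maximum threshold is an edge, one below the minimum is discarded, and one in between is kept only if it connects to an accepted edge. The sketch below simplifies connectivity to a 1-D row of gradient values; the function names are hypothetical.

```python
def classify_gradient(g, th_min, th_max):
    """Label one gradient magnitude under double (hysteresis) thresholding."""
    if g >= th_max:
        return "strong"   # accepted as an edge outright
    if g < th_min:
        return "none"     # discarded
    return "weak"         # kept only if connected to an accepted edge pixel


def hysteresis_1d(gradients, th_min, th_max):
    """Resolve weak pixels along a 1-D run: keep those linked to a strong pixel."""
    labels = [classify_gradient(g, th_min, th_max) for g in gradients]
    keep = [label == "strong" for label in labels]
    n = len(labels)
    for i in range(1, n):              # propagate acceptance from the left
        if labels[i] == "weak" and keep[i - 1]:
            keep[i] = True
    for i in range(n - 2, -1, -1):     # propagate acceptance from the right
        if labels[i] == "weak" and keep[i + 1]:
            keep[i] = True
    return keep
```

In a real image the connectivity test uses the 8-neighborhood of each pixel rather than left and right neighbors only.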

SLIC super-pixel algorithm
K-means clustering is used to implement spatial localization, which is the main concept of the simple linear iterative clustering (SLIC) super-pixel technique. Recently, superpixel algorithms have been widely used in computer vision and multimedia applications, such as in [20], to close all the contours and preserve coherence across image boundaries. In addition, SLIC is used on hyperspectral images (HSI) to solve the small-sample problem [21]. Using SLIC, the image can be decomposed into small homogeneous regions, providing a perceptual understanding of the content by locally grouping the pixels. The image complexity is reduced from many thousands of pixels to only a few hundred super-pixels [22]. To minimize outliers that would skew the SLIC results, a Gaussian smoothing filter is applied as a preprocessing phase.
SLIC was proposed by Achanta et al. [23] to generate super-pixels efficiently. The main parameter of the SLIC algorithm is k, the desired number of approximately equally sized superpixels. The first step is to initialize the cluster centers ($C_k$) by sampling pixels at a regular grid step given by (10), where N is the number of pixels. Each cluster center is then moved to the lowest gradient position in a 3x3 neighborhood, which becomes the seed location. Next, (11) is used to calculate the distance between the cluster center and each pixel in the 2Sx2S region around that cluster center ($C_k$).
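Equation (10) is not reproduced in this text. In the SLIC formulation of Achanta et al. the grid interval is S = sqrt(N / k), which is assumed in this short sketch; the function name is hypothetical.

```python
import math


def slic_grid_step(width, height, k):
    """Grid interval S = sqrt(N / k) for N = width * height pixels and k superpixels."""
    n_pixels = width * height
    return math.sqrt(n_pixels / k)
```

For a 336x256 thermal frame, and reading the 50 and 200 segments reported in the results as values of k (an assumption), the grid step is roughly 41 and 21 pixels respectively, so each cluster center searches a window of about 83x83 or 41x41 pixels.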
SLIC clusters pixels in the labxy space, where the color and spatial distances are calculated using (12) and (13) respectively. They are combined in (14), which normalizes the color and spatial proximities by their respective maximum distances within a cluster, $N_c$ and $N_s$.
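Equations (12) to (14) are not reproduced in this text. The forms below follow the SLIC formulation of Achanta et al. and are assumed to match them; $i$ indexes a pixel, $j$ a cluster center, $(l, a, b)$ are the CIELAB color components, $(x, y)$ the pixel coordinates, and $N_c$, $N_s$ the maximum color and spatial distances within a cluster.

```latex
\begin{align}
d_c &= \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2} \tag{12} \\
d_s &= \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2} \tag{13} \\
D' &= \sqrt{\left(\frac{d_c}{N_c}\right)^{2} + \left(\frac{d_s}{N_s}\right)^{2}} \tag{14}
\end{align}
```

Dividing each term by its maximum keeps either distance from dominating simply because it is measured on a larger scale.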
The sampling interval $S$ is taken as the maximum spatial distance $N_s$ within a given cluster. Because the color distance can differ from image to image and from cluster to cluster, the constant m in (11) is used as the maximum color distance $N_c$. Once each pixel has been assigned to its nearest cluster, new cluster centers are computed and the distances are recalculated; this repeats until the residual error between the new and the previous cluster centers is less than the threshold value. The use of SLIC to monitor the real-time operation of solar panels and detect faults is introduced in [24].
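With the maximum spatial distance set to S and the maximum color distance set to the constant m, the SLIC paper simplifies the combined distance to D = sqrt(d_c^2 + (d_s / S)^2 m^2). Equation (11) is assumed to have this form. A minimal sketch, with a hypothetical function name and pixels given as (l, a, b, x, y) tuples:

```python
import math


def slic_distance(pixel, center, m, s):
    """Distance D between a pixel and a cluster center, both given as (l, a, b, x, y)."""
    # Color distance in CIELAB space.
    d_c = math.sqrt(sum((p - c) ** 2 for p, c in zip(pixel[:3], center[:3])))
    # Spatial distance in the image plane.
    d_s = math.sqrt(sum((p - c) ** 2 for p, c in zip(pixel[3:], center[3:])))
    # m weights spatial proximity against color similarity; s is the grid step S.
    return math.sqrt(d_c ** 2 + (d_s / s) ** 2 * m ** 2)
```

A larger m makes superpixels more compact and regular, while a smaller m lets them follow color boundaries more closely.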

Hot pixels seeds based for segmentation
The process of dividing an image into its constituent parts or objects is called segmentation [25]. Segmenting the image enables many operations, such as object classification and recognition, cluster identification, and the detection of similarity or discontinuity features between pixels, such as edges and lines [25]. The first step of the proposed segmentation method is to determine a seed pixel (hot pixel) in the input image. Low contrast makes thresholding more difficult; this is addressed by pre-processing the input images with a Gaussian filter and histogram equalization. After pre-processing, the value of the highest pixel is set using (15). Equation (16) is then used to determine whether each neighboring pixel is linked to the hot pixel, in which case it is assigned as a seed pixel, or belongs to the background pixels.
The mean value µ for each seed pixel is calculated using (17): for each seed region, formed by the seed pixel with its 8 neighboring pixels, the mean value is computed.
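Equations (15) and (17) are not reproduced in this text, so the following is only a sketch of the two operations the prose names: locating the highest-valued pixel and averaging a pixel's 3x3 neighborhood. Whether the paper's mean includes the center pixel, and how it treats image borders, are assumptions here, as are the function names. The image is a plain list of rows to keep the example dependency-free.

```python
def neighborhood_mean(image, row, col):
    """Mean intensity of the 3x3 window centered on (row, col), clipped at borders."""
    rows, cols = len(image), len(image[0])
    values = [
        image[r][c]
        for r in range(max(0, row - 1), min(rows, row + 2))
        for c in range(max(0, col - 1), min(cols, col + 2))
    ]
    return sum(values) / len(values)


def hottest_pixel(image):
    """Return (row, col) of the maximum-intensity pixel, the first one if tied."""
    return max(
        ((r, c) for r in range(len(image)) for c in range(len(image[0]))),
        key=lambda rc: image[rc[0]][rc[1]],
    )
```

On a real frame the same operations would normally be done with NumPy or OpenCV array functions for speed.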
At the same time, the average value of all hot pixels is computed for each thermal frame. For the CCD frames, however, this hot-pixel value is fixed at 127, a value that worked well in most cases. An adaptive method for selecting these parameters should be investigated and developed in future work. The actual seed pixel is determined using (18).
The standard deviation is computed using (20) in order to estimate the minimal deviation distance (MDD), based on (19), for each actual hot pixel.
Whether a pixel is selected as defective is decided by (22). For each background pixel and its 8 neighbors, the mean value µ is estimated, and then the delta value δ is computed using (21). The pixel is marked as defective if the MDD value is greater than δ; otherwise it is treated as a zero pixel. The use of hot-pixel seed-based segmentation to monitor the real-time operation of solar panels and detect faults is introduced in [19].

RESULTS AND DISCUSSION
The proposed system shows that using multicore processors reduces the execution time required for real-time operation. The results show that a multicore processor with parallel processing in Python reduces the inspection time for monitoring large-scale solar systems and detecting hazards. The system has two cameras. The FLIR Vue Pro is a thermal camera with a resolution of 336x256 pixels, which is high enough to show defects on solar panels, and an NTSC frame rate. The GoPro Hero 4 Black is the CCD camera used in the system; it has a maximum video resolution of 3840x2160 and an effective photo resolution of 12.0 MP. Both cameras are mounted on the Yuneec Q500 quadcopter. The input data were processed offline using Python 2.7 and the Eclipse IDE on Windows 10, on an Intel(R) Core(TM) i5-4210M CPU at 2.60 GHz with 8 GB of RAM. Other Python modules and libraries used are the multiprocessing module, matplotlib, NumPy, and Pillow; the third-party packages were installed with pip. OpenCV is used with Python to provide multiple modules for image processing.
A multicore system was used to process the thermal and CCD videos simultaneously in order to detect defects in the solar panel with a reduced execution time. The defect detection results themselves are explained in previous work [19], [24]. The inspection was carried out in real experiments in which the drone flew over panels with imposed internal and external defects. The experiments were conducted outdoors in the daytime, although the thermal camera would also be able to detect the defects at night. The drone flew in normal mode without a specified angle, and the altitude varied across scenarios. Thermal and CCD frames of the same panels are processed at the same time. The results are recorded for different fault detection scenarios in PV systems, using 1, 2, or 4 processes.
Table 1 shows the input thermal and CCD videos and the number of processed frames. The number of frames differs because the input videos have different sizes. Python's multiprocessing module is used to process the input videos, which significantly reduces the execution time and improves the performance of the whole system. The processing time is recorded after the segmentation process is completed.

Table 2 presents the processing time of the morphological transformation with Canny edge detector, with which faults are detected in solar panels using thermal and CCD videos. The algorithm was executed with 1, 2, and 4 processes, and the speedups are illustrated in Figure 3. The processing time improved by 3.5 and 4.2 times using 2 and 4 processes respectively.

Table 3 presents the processing time of SLIC super-pixel for segment sizes of 50 and 200 with a maximum of 10 k-means iterations, with which defects are detected in the solar panel using thermal and CCD videos. The algorithm was executed with 1, 2, and 4 processes, and the speedups are shown in Figure 4. The processing time improved by 3.2 and 8.2 times using 2 and 4 processes respectively.

Table 4 presents the processing time of the hot-pixel seed-based segmentation algorithm, with which defects are detected in solar panels using thermal and CCD videos. The achieved speedup is shown in Figure 5 for 1, 2, and 4 processes. Using the multiprocessing module improved the execution time by 2.7 and 6.4 times using 2 and 4 processes respectively.
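The speedups quoted above can be checked against the superlinear criterion (speedup greater than the number of processes) with a short sketch. The six values are the ones reported in the text; the dictionary layout and function names are illustrative.

```python
reported = {
    # algorithm: {processes: speedup}, values quoted from the text above
    "morphological + Canny": {2: 3.5, 4: 4.2},
    "SLIC super-pixel": {2: 3.2, 4: 8.2},
    "hot pixel seeds": {2: 2.7, 4: 6.4},
}


def is_superlinear(speedup, n_procs):
    """A speedup is superlinear when it exceeds the number of processes."""
    return speedup > n_procs


def mean_speedup(n_procs):
    """Average the reported speedups over the three algorithms."""
    values = [by_n[n_procs] for by_n in reported.values()]
    return sum(values) / len(values)
```

All six reported values exceed their process count, and the means round to 3.1 for 2 processes and 6.3 for 4 processes, matching the averages quoted in the conclusion.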

CONCLUSION
Real-time condition monitoring of a large-scale solar system requires considerable processing time to monitor and detect faults. The inspection system is implemented using thermal and CCD cameras, and without parallel processing the processing time was very long. This paper addresses the problem by processing the captured videos with multiple simultaneous processes, which reduces the execution time. The speedup achieved with the image processing algorithms is significant: on average, the processing time improved by 3.1 times with 2 processes and 6.3 times with 4 processes. Several reasons contribute to this. The problem size (the number of processed frames) is large and the execution time for each frame is long, so running processes simultaneously resulted in a superlinear speedup. The results show that when the problem is divided into portions that are executed by simultaneous processes, the execution time is significantly reduced and a superlinear speedup results. In addition, computer resources are used more effectively once the problem is divided into portions; for example, the cache effect takes place once the problem is split across more than one process on a multicore CPU and run simultaneously.


Int J Elec & Comp Eng, Vol. 12, No. 2, April 2022: 1548-1557, ISSN: 2088-8708

defects in the PV modules are detected by implementing the different proposed algorithms. Different input data are processed separately by the different detection algorithms, and each algorithm is briefly presented in the following sections to show the computing demands.

Figure 4. Speedup for SLIC super-pixel for thermal and CCD videos using multicore

Figure 5. Speed up for hot pixels based for segmentation for thermal and CCD videos using multicore

Table 1. Input of thermal and CCD videos for defects detection

Table 2. Processing time for morphological and canny edge detection execution for thermal and CCD videos using multicore

Figure 3. Speedup results of using morphological with canny edge detection algorithm for thermal and CCD videos using multicore

Table 3. Processing time for SLIC super-pixel execution for thermal and CCD videos using multicore

Table 4. Processing time for hot pixels based for segmentation for thermal and CCD videos using multicore