Optimal coding unit decision for early termination in high efficiency video coding using enhanced whale optimization algorithm

ABSTRACT


INTRODUCTION
In recent times, the demand for higher-definition video services has increased in applications like digital broadcast and internet streaming [1], [2]. To meet the need for transmission and storage of higher-resolution videos, a new video coding standard named high efficiency video coding (HEVC) was developed [3]-[5]. HEVC is closely related to earlier video coding standards such as moving picture experts group (MPEG)-2, H.264, and MPEG-4 part 2 [6], [7]. In HEVC, the compression improvement is largely based on the implementation of new encoding methodologies like asymmetric motion partition, additional intra-prediction modes, and so on. Among the available methodologies, the flexible quad-tree partitioning of the coding tree unit (CTU) is particularly efficient [8]. When the CTU size is 64 × 64, the coding unit (CU) size is 64 × 64, 32 × 32, 16 × 16, or 8 × 8, with a corresponding depth of 0, 1, 2, or 3. The CUs in the CTUs are partitioned into four blocks based on the depth range. In HEVC, the optimum CU partition is selected according to the rate distortion (RD) costs [9], [10]. A CU is further partitioned into several prediction units (PUs). Hence, the optimal PU modes are the modes with minimal RD cost among the several inter and intra PU modes. For higher-resolution videos, this mode decision process ensures compression efficiency. However, the search over the large space of PU and CU decisions results in high computational time and complexity, which limits the usage of HEVC encoders in real-time applications [11]-[13]. Some of the conventional methods used for video compression are convolutional neural networks (CNN) [14] and adaptive switching neural networks [15]. Duvar et al.
[16] presented an effective decision algorithm for reducing the encoding time of HEVC. Initially, intra block similarity was computed at the PU level by using integral images. In addition, an early termination mode was developed at the CU level. Hence, the developed fast inter mode decision algorithm significantly bypasses PU modes in the PU phase and also removes unnecessary checks in the CU phase at lower depths. In this literature study, the efficacy of the developed algorithm was investigated by means of bit rate and peak signal to noise ratio (PSNR). The experimental outcomes demonstrated that the developed algorithm significantly improves the coding efficacy with low system complexity. In addition, the developed algorithm delivers a good trade-off between time savings and coding efficacy relative to earlier approaches. However, negative side lobes and edge artifacts were generated by the developed algorithm, which affects the system performance. Cen et al. [17] developed a new fast CU depth decision framework for decreasing the computational complexity of HEVC. The developed framework includes a CU depth range determination and a new CU depth comparison algorithm. In this study, the CU depth range was identified based on the CU depth distribution in similar sequences. The experimental results confirmed that the computational complexity of the developed framework was low compared to existing works. However, the developed algorithm fails to retain higher-quality videos at the receiver side.
At dissimilar levels of coding abstraction, Jiang and Nooshabadi [18] presented a series of optimization methods for multi-view HEVC. In this literature, optimized resource-scheduled wavefront parallel processing and quantization-parameter-based early termination of the CTU were performed for disparity estimation and parallel motion estimation. From the experimental investigation, the developed optimization methods achieved better results than previous research work in light of PSNR and bit error rate. The developed algorithm effectively reduces the system complexity, but it did not address the major issue of poor video resolution. Bouaafia et al. [19] presented a deep convolutional neural network (DCNN) and support vector machine (SVM) in inter-mode HEVC for optimizing the complexity allocations at the CU level. Initially, an SVM-based fast CU model was developed for decreasing the HEVC complexity, and the DCNN model was further utilized for predicting the CU partition. The experimental outcome indicates that the developed online SVM and DCNN models achieved better results in light of time saving and bit rate. In contrast, the developed algorithm reduces the importance of the color components in the compressed videos.
Ma et al. [20] introduced a new faster intra-coding algorithm for speeding up the encoding mechanism. At first, a faster CU-size decision model was implemented for selecting dissimilar depth decision algorithms for every coding unit. Then, a faster directional mode decision technique was employed, which compares the directional modes of the parent units. The best directional mode of the parent units and the RD cost of the first directional mode were integrated for efficiently selecting the best directional mode for the current unit. The experimental outcome shows that the developed algorithm attained good performance in video encoding in light of Bjontegaard delta bit rate (BDBR) and time saving. However, the developed algorithm was not able to handle massive workloads at higher speeds. Kuanar et al. [21] implemented a new CNN model for effective CU mode selection in HEVC. The extensive experimental investigation showed that the developed CNN model significantly decreased the encoding time relative to other state-of-the-art machine learning models, but it was computationally expensive.
Hassan et al. [22] developed a surgical telemonitoring system based on HEVC by implementing a shallow CNN model. The experimental investigation confirmed that the shallow CNN model maintains higher visual quality with a better bit rate. Compared to the state-of-the-art models, the developed shallow CNN model was effective and efficient for surgical telemonitoring systems. He et al. [23] developed a new fuzzy-based SVM classifier for improving the compression efficiency of HEVC. In addition, the fuzzy-based SVM classifier was improved by utilizing an information entropy measure for handling outliers and the negative impact of data noise. However, the undertaken CNN model and the fuzzy-based SVM classifier were computationally complex and needed high-end specification systems. Imen et al. [24] integrated a modified AlexNet and a modified LeNet-5 for predicting the HEVC's CU partition. The experimental analysis states that the developed model was computationally complex. The key contributions of this research paper are given below:
a. Proposed an enhanced whale optimization algorithm (EWOA) to decrease the computational time and complexity of HEVC, which selects the bit-streams in the luma coding tree block for effectively determining the CU neighbors. The EWOA is effective in optimization problems relative to conventional optimization algorithms such as the puzzle optimization algorithm [25] and the stochastic Komodo algorithm [26].
b. Implemented the discrete cosine transform (DCT) for generating the residuals by subtracting the prediction values from the input values. The efficacy of the EWOA is analyzed in light of ∆BR, time saving, and ∆PSNR.
This paper is organized in this manner: the methodology details, results and discussion, and the conclusion of the EWOA are presented in sections 2 to 4, respectively.

RESEARCH METHOD
In this research, the efficacy of the EWOA is tested on a few online videos: PeopleOnStreet, Traffic, Kimono, ParkScene, Cactus, BQTerrace, FourPeople, PartyScene, BasketballDrive, Johnny, BasketballDrill, BQMall, RaceHorses, BasketballPass, BQSquare, BlowingBubbles, and KristenAndSara. The sample video frames are graphically indicated in Figure 1. The proposed framework includes three major steps: optimal bit-stream selection in HEVC using the EWOA, inter- and intra-prediction in HEVC, and data transformation by DCT. The workflow of the proposed framework is represented in Figure 2. Initially, the frames are extracted from the videos, and the separated frames are given to the HEVC for predicting the motion from the video sequences. The basic design of HEVC is very similar to H.264. In this scenario, the block-based coding approach significantly exploits both spatial and temporal statistical dependencies. Generally, HEVC utilizes flexible and adaptive quad-tree coding block partitions for effective coding, transformation, and prediction. The basic information about HEVC is given as follows.

Prediction structure
Generally, the quad-tree block partition works based on the CTU structure, which is similar to the macro-block. A sequence of frames is called a video, and in HEVC, every coded video frame is divided into slices and CTUs. Further, the CTUs are sub-divided into square regions named CUs. In HEVC, a CU is predicted using inter- or intra-prediction, and the first frame of the video sequence at each random access point is coded using intra-prediction. The residual video frames are coded by performing inter-prediction, and the residual frames are further transformed into transform units (TUs) by implementing the DCT algorithm. Usually, a CTU is made up of two chroma coding tree blocks (CTBs), quad-tree syntax, and a luma CTB, where every chroma CTB has a block size of (N/2) × (N/2) and the luma CTB has a block size of N × N. The CTB size is the same as the size of the coding blocks (CBs), where the CTB contains several CUs and is associated with the TUs and PUs. The inter-prediction, intra-prediction, and coding modes are selected at the CU level, where N represents the bit-stream size and takes the values 64, 32, 16, or 8.
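The RD-cost-driven quad-tree CU partition described above can be sketched as a small recursion. This is an illustrative sketch, not the paper's encoder: the function names (`partition_cu`, `rd_cost`) and the toy cost model are assumptions introduced for illustration.

```python
# Illustrative sketch (not the paper's encoder): recursive quad-tree CU
# partitioning. A CU is split into four sub-CUs whenever the summed RD
# cost of the children is lower than coding the block whole, down to 8x8.
def partition_cu(x, y, size, rd_cost, min_size=8):
    """Return (leaves, cost): the (x, y, size) leaf CUs chosen by RD cost
    and their total cost. rd_cost(x, y, size) is an assumed callback
    returning the rate-distortion cost of coding that block as one CU."""
    whole_cost = rd_cost(x, y, size)
    if size <= min_size:
        return [(x, y, size)], whole_cost

    half = size // 2
    children, split_cost = [], 0.0
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        leaves, cost = partition_cu(x + dx, y + dy, half, rd_cost, min_size)
        children.extend(leaves)
        split_cost += cost

    if split_cost < whole_cost:          # splitting is cheaper: recurse
        return children, split_cost
    return [(x, y, size)], whole_cost    # keep the CU whole

# Toy cost model: a flat region is cheap to code as one big CU, so the
# 64x64 CTU is kept whole here.
leaves, total = partition_cu(0, 0, 64, lambda x, y, s: s * s * 0.01 + 10)
```

Exactly this comparison of "whole" versus "split" cost at every node is what makes exhaustive CU decision expensive, and what early-termination schemes such as the proposed EWOA try to shortcut.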
In this research manuscript, the bit-streams are chosen by implementing an effective meta-heuristic algorithm named EWOA, which follows the behavior of humpback whales; here, the error rate is considered as the objective function. After creating an initial population, a humpback whale improves its location based on the encircling method, which is mathematically defined in (1) and (2) [27], [28]:

D = |B ⊙ P′(t) − P(t)| (1)

P(t + 1) = P′(t) − A ⊙ D (2)

where t indicates the iteration number, D indicates the distance between the prey P′(t) and the humpback whale's position P(t), and A and B state coefficient vectors, which are determined in (3) and (4):

A = 2a ⊙ r − a (3)

B = 2r (4)

where r represents a random vector, which usually ranges between 0 and 1, and a represents the linearly decreasing control parameter, which ranges between 2 and 0. On the other hand, the bubble-net method is accomplished based on shrinking encircling and spiral position updating, as shown in (5) and (6):

D́ = |P′(t) − P(t)| (5)

P(t + 1) = D́ ⊙ e^(bl) ⊙ cos(2πl) + P′(t) (6)

where ⊙ indicates the element-by-element multiplication process, b represents a constant value that determines the shape of the logarithmic spiral, l denotes a random value, which ranges between [−1, 1], and D́ represents the distance between the humpback whale and the prey. The two update rules are combined as:

P(t + 1) = { P′(t) − A ⊙ D, if p ≥ 0.5; D́ ⊙ e^(bl) ⊙ cos(2πl) + P′(t), if p < 0.5 }

where p ∈ [0, 1] indicates the probability of choosing the shrinking encircling method or the spiral method to adjust a whale's position. The humpback whales search for their prey in the exploration phase. The position of a humpback whale is updated by computing random search agents and then moving relative to the selected agent. This process is mathematically indicated in (7) and (8) [29], [30]:

D = |B ⊙ P_rand − P(t)| (7)

P(t + 1) = P_rand − A ⊙ D (8)

where P_rand indicates a random position, which is determined from the current population. Due to the lack of prior knowledge, updating the positions of the search agents is trapped in local optima in the existing WOA. Therefore, a novel cosine function is added to the control parameter a for controlling the whales' positions. The inclusion of the cosine function in the control parameter provides a better balance of exploitation and exploration, and it is mathematically indicated in (9). During the search process, two correlation factors c1 and c2 are used for regulating the movement of the search agents, so (7) and (8) are updated as in (10) and (11). The assumed parameters of the EWOA are: the number of search agents is 100, t indicates the current iteration, c1 = 2.5, c2 = 1.5, and the maximum iteration count is t_max = 100. Once the maximum iteration is reached, the EWOA automatically terminates.
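One iteration of the whale position update can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function and variable names are ours, and the cosine decay used for the control parameter `a` is an assumed concrete form of the paper's cosine-modified Eq. (9), not its exact definition.

```python
import math
import random

# Illustrative sketch of one WOA/EWOA position update (encircling,
# bubble-net spiral, and random exploration). The cosine decay of `a`
# is an assumed form of the paper's cosine-modified control parameter.
def woa_step(positions, best, t, t_max, b=1.0):
    dim = len(best)
    a = 2.0 * math.cos(0.5 * math.pi * t / t_max)  # assumed: decays 2 -> 0
    new_positions = []
    for pos in positions:
        p = random.random()
        if p >= 0.5:                                 # shrinking encircling
            A = [2 * a * random.random() - a for _ in range(dim)]
            B = [2 * random.random() for _ in range(dim)]
            if max(abs(x) for x in A) < 1:           # exploit near the best
                target = best
            else:                                    # explore a random whale
                target = random.choice(positions)
            D = [abs(B[i] * target[i] - pos[i]) for i in range(dim)]
            new = [target[i] - A[i] * D[i] for i in range(dim)]
        else:                                        # bubble-net spiral update
            l = random.uniform(-1, 1)
            Dp = [abs(best[i] - pos[i]) for i in range(dim)]
            new = [Dp[i] * math.exp(b * l) * math.cos(2 * math.pi * l) + best[i]
                   for i in range(dim)]
        new_positions.append(new)
    return new_positions
```

In the paper's setting, `best` would be the position with the lowest error rate (the objective), and the loop would run for the stated 100 iterations over 100 agents.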

Inter-prediction in HEVC
In HEVC, inter-prediction supports the division of prediction blocks (PBs) in relation to intra-prediction. Generally, the inter-coded PUs have numerous motion parameters that include reference image indexes, usage flags, motion vectors, and reference image lists. The CU is indicated as one PU while the CU is coded with a skip mode, and it has no effective motion parameters or transformation coefficients other than those obtained by merging modes. The encoder utilizes explicit transmission or the merge mode of motion parameters for every PU among the inter-coded PUs. Hence, the merge model is employed in both skip mode and inter-coded PUs. In HEVC, the merge mode is utilized for identifying the neighboring inter-coded PUs. Inter-prediction in HEVC uses motion vectors with units of one-quarter of the distance between luma samples and one-eighth of the distance between chroma samples.
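The quarter-sample and eighth-sample motion-vector units mentioned above can be made concrete with a short sketch. This is illustrative only; the function names are assumptions, and it assumes 4:2:0 chroma subsampling, where the chroma grid is half the luma grid.

```python
# Illustrative sketch: HEVC motion vectors are stored in quarter-sample
# units for luma; with 4:2:0 subsampling the same vector addresses the
# half-resolution chroma grid in one-eighth-sample units.
def mv_to_luma_offset(mv_quarter):
    """Split a quarter-pel MV into (integer samples, 1/4-pel fraction)."""
    return mv_quarter >> 2, mv_quarter & 3

def mv_to_chroma_offset(mv_quarter):
    """For 4:2:0, the chroma grid is half the luma grid, so the same
    quarter-pel luma vector has one-eighth-pel chroma precision."""
    return mv_quarter >> 3, mv_quarter & 7

mv_to_luma_offset(9)    # 2 integer luma samples plus a 1/4-pel fraction
mv_to_chroma_offset(9)  # 1 integer chroma sample plus a 1/8-pel fraction
```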

Intra-prediction in HEVC
In HEVC, the intra units generally exploit the spatial correlation between a PU and its neighborhood image pixels for effective prediction. New features like the TU, PU, CU, and CTU are defined in HEVC for achieving higher compression and removing spatial redundancy. On the other hand, rate-distortion optimization is carried out to identify the best prediction mode of every CU. The RD cost function of intra-prediction in HEVC is defined in (12):

J = SSD + λ × R (12)

where R indicates the bit-rate, SSD represents the sum of squared distances between the original and reconstructed pixels, and λ denotes the Lagrange multiplier, which is derived from the quantization parameter. Additionally, HEVC uses a recursive quad-tree structure for CU splitting. Every CU is categorized into four PUs, and intra-prediction is further carried out for every PU. The CU size ranges from 8 × 8 to 64 × 64 pixels and the PU size ranges from 4 × 4 to 64 × 64 pixels. Subsequently, HEVC performs the intra block predictions for 4 × 4 to 64 × 64 pixels. Generally, HEVC supports 35 intra-prediction modes, which include 33 angular predictions. Two reference sample arrays, built from the reconstructed PUs, are used for intra-prediction in HEVC. The current image pixel P_{x,y} is projected towards the reference image pixels with a fixed displacement parameter that defines the angularity of the vertical and horizontal prediction modes. Once the reference samples R_i and R_{i+1} are determined, interpolation is carried out at an accuracy of 1/32 sample, as mathematically represented in (13).
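The RD cost of Eq. (12) can be sketched as follows. This is an illustrative sketch: the function name is ours, and the λ(QP) expression is an assumed, commonly used approximate model, not necessarily the exact value any particular encoder uses.

```python
# Illustrative sketch of the intra RD cost J = SSD + lambda * R, with
# lambda derived from QP via an assumed approximate model (the constant
# 0.85 and the 2^((QP-12)/3) growth are assumptions for illustration).
def rd_cost(original, reconstructed, bits, qp):
    ssd = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    lam = 0.85 * 2 ** ((qp - 12) / 3.0)   # assumed lambda(QP) model
    return ssd + lam * bits

# At high QP, lambda is large, so a mode that spends more bits is
# penalized more heavily even when its distortion is slightly lower.
```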
P_{x,y} = ((32 − w) × R_i + w × R_{i+1} + 16) ≫ 5 (13)

where w is the 1/32-sample fractional weight. In HEVC, the prediction of the angular modes delivers effective intra-prediction when many edges are present. The DC prediction is extensively used for predicting flat surfaces. In the planar prediction, the block prediction is generated by a weighted average of four reference samples, which is determined in (14).
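The 1/32-accuracy interpolation of Eq. (13) is a two-tap weighted average with integer rounding, and can be sketched directly. The function and argument names are assumptions for illustration.

```python
# Illustrative sketch of HEVC's angular intra interpolation, Eq. (13):
# each predicted pixel is a 1/32-accuracy weighted average of the two
# nearest reconstructed reference samples ref[i] and ref[i + 1].
def angular_predict(ref, i, w):
    """ref: reference sample array; i: integer sample index;
    w: fractional weight in 1/32 units, 0 <= w < 32 (names assumed)."""
    return ((32 - w) * ref[i] + w * ref[i + 1] + 16) >> 5

ref = [100, 132, 164, 196]
angular_predict(ref, 0, 0)    # w = 0 copies ref[0] exactly
angular_predict(ref, 0, 16)   # w = 16 gives the midpoint of ref[0], ref[1]
```

The `+ 16` followed by `>> 5` implements rounding to the nearest integer after the divide by 32, so the whole prediction stays in integer arithmetic.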

Transformation
In the transformation procedure, the residuals are transformed into TUs utilizing the DCT. In video compression, the DCT is an extensively utilized transformation technique, which is effective in energy compaction, computational efficiency, and correlation reduction. The family of discrete trigonometric transforms includes 16 members, and the one-dimensional DCT of a 1 × N vector f(x) is determined in (15) and (16):

F(u) = α(u) Σ_{x=0}^{N−1} f(x) cos[(2x + 1)uπ / 2N] (15)

α(u) = √(1/N) for u = 0, and α(u) = √(2/N) for u ≠ 0 (16)

where u = 0, 1, 2, …, N − 1. The original feature vector f(x) is reconstructed from the DCT coefficients F(u) utilizing the inverse DCT operation, which is mathematically denoted in (17):

f(x) = Σ_{u=0}^{N−1} α(u) F(u) cos[(2x + 1)uπ / 2N] (17)

Then, the DCT is extended to the transformation of an image, which is achieved by transforming the individual rows and columns of the two-dimensional image. The two-dimensional DCT and its inverse are indicated in (18) and (19):

F(u, v) = α(u) α(v) Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x, y) cos[(2x + 1)uπ / 2N] cos[(2y + 1)vπ / 2N] (18)

f(x, y) = Σ_{u=0}^{N−1} Σ_{v=0}^{N−1} α(u) α(v) F(u, v) cos[(2x + 1)uπ / 2N] cos[(2y + 1)vπ / 2N] (19)

The DCTs represented in (18) and (19) are orthonormal and perfectly reconstruct the signal at infinite precision. At last, the reconstructed samples are obtained from the inverse transformation, and the reconstructed CTUs are assembled to construct the final image. The experimental results of the EWOA are depicted in section 3.
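The 1-D DCT of Eqs. (15)-(17) and its perfect-reconstruction property can be sketched as follows. The function names are ours; this is a direct (O(N²)) evaluation for illustration, not the fast integer transform a real encoder would use.

```python
import math

# Illustrative sketch of the orthonormal 1-D DCT-II and its inverse,
# Eqs. (15)-(17); at floating-point precision the round trip is exact
# up to rounding error.
def dct_1d(f):
    N = len(f)
    return [math.sqrt((1 if u == 0 else 2) / N) *
            sum(f[x] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                for x in range(N))
            for u in range(N)]

def idct_1d(F):
    N = len(F)
    return [sum(math.sqrt((1 if u == 0 else 2) / N) * F[u] *
                math.cos((2 * x + 1) * u * math.pi / (2 * N))
                for u in range(N))
            for x in range(N)]

residual = [12.0, -3.0, 5.0, 0.0, -7.0, 2.0, 1.0, -1.0]
coeffs = dct_1d(residual)      # energy compacts into a few coefficients
restored = idct_1d(coeffs)     # matches the input up to float error
```

Applying `dct_1d` first to every row and then to every column of a block gives the separable 2-D DCT of Eq. (18).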

RESULTS AND DISCUSSION
In this research, the EWOA is implemented using MATLAB R2020a software. The simulation is performed on an i7 processor system with 8 GB random access memory and a 1 TB hard disk. This research study mainly uses HEVC/H.265 for motion estimation. The performance of the EWOA is analyzed in light of ∆BR, time saving ∆T, and ∆PSNR. Additionally, the effectiveness of the EWOA is compared to a prior research model: online SVM+DCNN [19]. The most crucial performance measures of fast encoding, time saving ∆T and ∆BR, are mathematically denoted in (21) and (22):

∆T = (T_ref − T_EWOA) / T_ref × 100% (21)

∆BR = (BR_EWOA − BR_ref) / BR_ref × 100% (22)

where T_EWOA and BR_EWOA denote the computational time and bit rate of the EWOA, and T_ref and BR_ref denote the computational time and bit rate of the existing models. Similarly, ∆PSNR is utilized for measuring the quality difference between the original and compressed frames, which is mathematically denoted in (23):

∆PSNR = PSNR_EWOA − PSNR_ref (23)

where the PSNR value of the EWOA is indicated as PSNR_EWOA and the PSNR value of the existing model is specified as PSNR_ref.
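The evaluation metrics can be sketched directly from their definitions. This is an illustrative sketch: the function names and the reference/proposed argument naming are assumptions, and the PSNR helper assumes 8-bit samples (peak value 255).

```python
import math

# Illustrative sketch of the fast-encoding metrics of Eqs. (21)-(23):
# time saving and bit-rate change relative to a reference encoder, plus
# PSNR between an original and a reconstructed frame (names assumed).
def time_saving(t_ref, t_prop):
    return (t_ref - t_prop) / t_ref * 100.0       # Delta-T, in percent

def delta_bitrate(br_ref, br_prop):
    return (br_prop - br_ref) / br_ref * 100.0    # Delta-BR, in percent

def psnr(original, reconstructed, peak=255.0):
    mse = sum((o - r) ** 2
              for o, r in zip(original, reconstructed)) / len(original)
    return float('inf') if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

time_saving(100.0, 60.0)       # a 40% reduction in encoding time
delta_bitrate(1000.0, 1010.0)  # a +1% bit-rate cost
```

∆PSNR as used in the tables is then simply the PSNR of the proposed encoder minus that of the reference encoder on the same sequence.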

Quantitative analysis
By viewing Table 1, the effectiveness of the EWOA is validated against the existing models such as deep CNN [19], online SVM [19], and the conventional WOA by means of ∆BR. Here, the EWOA's performance is evaluated on seventeen real-time videos. From the experimental investigation, the overall performance shows that the EWOA outperforms the online SVM [19], deep CNN [19], and WOA in terms of ∆BR, as shown in Table 1. It implies that the EWOA is more robust in diminishing the complexity of inter-mode HEVC relative to the other models: online SVM [19], deep CNN [19], and WOA. Correspondingly, in Table 2, the experimental investigation of the EWOA is done in terms of the ∆PSNR value. From the inspection, the ∆PSNR value of the EWOA is higher than that of the prior models: deep CNN [19], online SVM [19], and WOA. In this scenario, the EWOA showed almost 0.006-0.012 dB higher values than the existing models on the real-time videos. As represented in Table 3, seventeen online real-time videos are utilized for investigating the effectiveness of the EWOA. In Table 3, the EWOA's efficacy is validated in light of time saving ∆T. From the inspection, the EWOA achieved better results compared to the online SVM [19], deep CNN [19], and WOA in light of time saving ∆T on the real-time videos, which addresses the major problem highlighted in the literature section.

Discussion
In the present decade, HEVC has achieved better coding efficiency because of the rapid growth of video coding technology. However, the encoding complexity of HEVC increases while improving the RD performance. In addition, the emerging HEVC uses new coding structures, which are characterized by the TU, PU, and CU. These enhance the coding efficiency considerably but increase the computational complexity of deciding the optimal TU, PU, and CU sizes. Computational complexity remains a vital problem, and it should be considered in the optimization task. As discussed in the previous sections, we proposed an EWOA with quad-tree coding and DCT for fast CU partitioning that considerably decreases the complexity of HEVC at inter-mode. The proposed framework achieved a good trade-off between the RD performance and complexity reduction. The EWOA with quad-tree coding and DCT not only predicts the HEVC CU partition at inter-mode but also reduces the HEVC complexity with a minimal error value. In this manuscript, seventeen online real-time videos are used for analyzing the effectiveness of the proposed framework in light of ∆PSNR, time saving ∆T, and ∆BR.

CONCLUSION
In this study, efficient video compression is achieved by implementing HEVC with an optimization algorithm: the EWOA. The EWOA is utilized for estimating the motion from the video sequences. In the proposed framework, the quad-tree coding block is employed for partitioning the structures, and the DCT is applied to the extracted video frames for improving the coding efficiency. The performance of the EWOA is investigated by comparing the input video sequences with the decompressed video sequences in terms of ∆BR, ∆T, and ∆PSNR. The simulation analysis concluded that the EWOA attained better performance in video compression, showing 0.006-0.012 dB higher PSNR than the existing models on the real-time videos BasketballPass, BQTerrace, BasketballDrive, RaceHorses, BQMall, BlowingBubbles, Cactus, FourPeople, PartyScene, PeopleOnStreet, Johnny, Kimono, KristenAndSara, Traffic, ParkScene, BasketballDrill, and BQSquare. The proposed framework is specifically suited for surveillance or conversational videos, where it largely reduces the bandwidth without degrading the visual quality. Future studies will focus on video coding optimization or perceptual-based medical image coding. Additionally, a novel algorithm can be developed for fast mode selection based on the pattern directions of the neighboring PUs.

AUTHOR CONTRIBUTIONS
The paper conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing - original draft preparation, writing - review and editing, and visualization have been done by the first author. The supervision and project administration have been done by the second author.



Figure 3. Quad-tree structure of the CUs

Table 1. Performance investigation of the EWOA and the existing models in light of ∆BR

Table 2. Performance investigation of the EWOA and the existing models in light of ∆PSNR

Table 3. Performance investigation of the EWOA and the existing models in light of ∆T