Propose shot boundary detection methods by using visual hybrid features

Shot boundary detection is a fundamental technique that plays an important role in a variety of video processing tasks such as summarization, retrieval, and object tracking. This technique segments a video sequence into shots, each of which is a sequence of interrelated temporal frames. This paper introduces and compares two methods for detecting cut shot boundaries using visual hybrid features. The comparison enhances shot detection performance by selecting the strongest features. The first method uses hybrid features that include the statistics histogram of the hue-saturation-value (HSV) color space and the grey level co-occurrence matrix (GLCM). The second method uses hybrid features that include the discrete wavelet transform (DWT) and the grey level co-occurrence matrix. The frame size was reduced, which had the advantage of reducing the computation time. Local adaptive thresholds were also used, which enhanced the methods' performance. The tested videos were obtained from the BBC archive, which included BBC Learning English and BBC News. Experimental results indicate that the second method achieved an average F-measure of 97.618%, which was higher than that of the first method and of the other methods compared.


INTRODUCTION
A video structure is a composition of scenes, shots, and frames. A video scene comprises a sequence of shots that are made up of interrelated events recorded at various camera positions. A video shot contains a sequence of interrelated frames taken by a single camera action. A frame is the smallest unit in a video, where each frame represents a single image [1]-[4]. Shot boundary detection (SBD) in video, also called shot segmentation, is a technique for segmenting a video sequence into shots, which are smaller temporal units. SBD is a fundamental step that plays an important role in video processing tasks such as video analysis, summarization, retrieval, indexing, object tracking, search, and content-based methods. Therefore, an efficient SBD method improves video processing [5]. This paper proposes a technique for shot boundary detection based on hybrid features.
Video SBD is an essential technique in video processing tasks that divides video sequences into smaller temporal parts called shots. A shot contains a sequence of interrelated frames of actions taken by a single camera. The main concept of the SBD technique is to extract efficient visual features from frames, then compare the similarity between frames against a predefined threshold value, and finally detect a shot boundary when the feature difference is greater than the threshold [6]. There are two types of shot boundaries, showed the average F-measure for the cut shot was 93.26%. Sulaiman and Mahmood [27] proposed an approach for SBD based on mean shift and dynamic time warping (DTW). The method first performs preprocessing, which includes converting the frames into YCbCr color space and dividing each frame into blocks. DTW is then used as a distance measure to calculate the differences between successive frames, and the result is normalized. After that, a mean shift technique is used for shot boundary detection. Finally, for each shot, the key frames that have higher content change are extracted. The experimental results, evaluated on random videos obtained from the open video project (OVP) dataset, showed an average F-measure of 97.5%. Idan et al. [28] proposed a method for SBD that achieves speed and accuracy based on moments and the support vector machine (SVM) algorithm. The mechanism selects an active area that contains the important information, which reduces the computation time, and then computes the moments for those active areas. The squared Tchebichef-Krawtchouk polynomials were used to extract features, together with an adaptive threshold. Furthermore, SVM was used to detect the cut boundary. The experimental results, evaluated on the TRECVID 2001, 2005, 2006, and 2007 datasets, showed an average F-measure of 96.15%.
They also suggested applying their method to different types of shot boundaries and to various applications in the future. Table 1 compares the related work in terms of methodology, datasets, and the average value of the evaluation measure for video shot boundary detection methods.

RESEARCH METHOD
This paper proposes and compares two methods for cut shot boundary detection based on visual hybrid features, which enhance shot detection performance by selecting the strongest features. Each proposed method is described in detail below, with a general block diagram and an algorithm.
The first method for detecting cut shot boundaries in video uses hybrid features, namely the statistics histogram of the HSV color space and the grey level co-occurrence matrix (GLCM). Each step is described in this section. The first step is preprocessing, which extracts the frames from the video and then resizes them to 256×256. This step has the advantage of reducing the computation time.
The next step is extracting the first feature, which is a fundamental step in detecting shot boundaries. The first feature is a visual color feature, obtained by extracting the chrominance feature from the HSV color space. This feature is not affected by local motion. Histograms of the chrominance feature are then computed and normalized. The histogram describes the distribution of color and disregards spatial relationships, so it is robust to changes in scale, rotation, and camera movement. Statistical features are then extracted from these histograms, namely the mean, median, standard deviation, skew, and entropy. Each frame is thus represented as a vector of five features.
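The first feature can be illustrated with a minimal sketch. It assumes that the five statistics are taken over the normalized bin values of the hue histogram and that entropy is the Shannon entropy of that histogram; the text does not fix these details, and the function names and the bin count are hypothetical.

```python
import colorsys
import math
import statistics


def hue_histogram(pixels, bins=16):
    """Normalized hue histogram for a list of (r, g, b) pixels in 0..255."""
    counts = [0] * bins
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # h lies in [0, 1]; clamp so the top edge falls in the last bin.
        counts[min(int(h * bins), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def histogram_statistics(hist):
    """Return [mean, median, std, skew, entropy] of a normalized histogram.

    Assumption: the statistics describe the bin values themselves.
    """
    mean = statistics.fmean(hist)
    median = statistics.median(hist)
    std = statistics.pstdev(hist)
    # Population skewness; set to 0 for a flat histogram (std == 0).
    if std == 0:
        skew = 0.0
    else:
        skew = sum((x - mean) ** 3 for x in hist) / (len(hist) * std ** 3)
    entropy = -sum(p * math.log2(p) for p in hist if p > 0)
    return [mean, median, std, skew, entropy]


# Toy "frame" of four pixels: two red, one green, one blue.
frame = [(255, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
feature_vector = histogram_statistics(hue_histogram(frame, bins=4))
```

In practice the pixel list would come from a decoded and resized frame; the toy frame above only shows the shape of the resulting five-element vector.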
After that, the second feature is extracted to represent texture. This includes converting the frames into grayscale and extracting the GLCM feature, which is a powerful texture descriptor. The GLCM counts how frequently pairs of pixel values occur as neighbors at angles of 0°, 45°, 90°, and 135°, at a distance of 1. The GLCM is then normalized, and the correlation is computed.
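The texture feature can be sketched as follows. The sketch assumes a non-symmetric GLCM and keeps one correlation value per angle; whether the four values are kept separately or combined is not stated in the text, so this is only one possible reading. The names `OFFSETS`, `glcm`, `glcm_correlation`, and `texture_feature` are hypothetical.

```python
import math

# (row, column) offsets for distance 1 at angles 0, 45, 90 and 135 degrees.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, levels, offset):
    """Normalized co-occurrence matrix of a 2-D list of integer gray levels."""
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    return [[n / total for n in row] for row in counts]


def glcm_correlation(p):
    """Correlation of a normalized co-occurrence matrix p."""
    idx = range(len(p))
    mu_i = sum(i * p[i][j] for i in idx for j in idx)
    mu_j = sum(j * p[i][j] for i in idx for j in idx)
    var_i = sum((i - mu_i) ** 2 * p[i][j] for i in idx for j in idx)
    var_j = sum((j - mu_j) ** 2 * p[i][j] for i in idx for j in idx)
    if var_i == 0 or var_j == 0:
        return 1.0  # constant texture: treated as fully correlated in this sketch
    cov = sum((i - mu_i) * (j - mu_j) * p[i][j] for i in idx for j in idx)
    return cov / math.sqrt(var_i * var_j)


def texture_feature(image, levels):
    """One correlation value per angle, giving a four-element vector."""
    return [glcm_correlation(glcm(image, levels, OFFSETS[a]))
            for a in (0, 45, 90, 135)]


# A 3x3 checkerboard with two gray levels.
checker = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
checker_feature = texture_feature(checker, levels=2)
```

On the checkerboard, horizontal and vertical neighbors always differ while diagonal neighbors always match, so the correlation alternates between strongly negative and strongly positive across the four angles.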
A similarity matching step is then performed, which measures the similarity between the features of consecutive frames. Similarity matching is calculated separately for the first and second features using the Euclidean distance. After that, the average of the two matching vectors is computed and returned as the average matching vector.
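The matching step can be sketched as below. The sketch assumes that both features are given as one vector per frame and that the two distance vectors are averaged without rescaling, since no normalization of the distances is described; the function names are hypothetical.

```python
import math


def consecutive_distances(features):
    """Euclidean distance between each frame's feature vector and the next."""
    return [math.dist(a, b) for a, b in zip(features, features[1:])]


def hybrid_match(first_feature, second_feature):
    """Element-wise average of the two per-feature distance vectors."""
    d1 = consecutive_distances(first_feature)
    d2 = consecutive_distances(second_feature)
    return [(x + y) / 2 for x, y in zip(d1, d2)]


# Three frames: the first feature jumps between frames 0 and 1,
# the second feature changes between frames 1 and 2.
first = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]
second = [[1.0], [1.0], [2.0]]
match = hybrid_match(first, second)
```

For N frames the result has N - 1 entries, where entry i describes the change between frame i and frame i + 1.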
Finally, the local adaptive threshold is calculated, which performs better than a global threshold. Using one global threshold for all frames is not efficient because video content changes dramatically, making it hard to find a single threshold that fits all frames, whereas local thresholds vary along with the content of the video frames. Therefore, in the proposed method, local thresholds are calculated from the mean and standard deviation (STD) of the average matching vector over a window of 250 frames. The local thresholds are computed using (1), where the value of c is set to 3.7, obtained experimentally by tuning until the best performance was achieved. The average matching vector of the frames is then compared with the threshold value. If an entry is greater than the threshold value, the frame that corresponds to its index is considered a cut shot boundary. Figure 1 illustrates the general block diagram of the first proposed method. Algorithm 1 gives the proposed cut SBD procedure using hybrid features, which include the statistics histogram of the HSV color space and the GLCM.
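The thresholding step can be illustrated with a short sketch. It assumes that (1) has the common form threshold = mean + c × STD, evaluated over non-overlapping windows of the average matching vector; this form, the windowing scheme, and the function names are assumptions made for illustration and are not confirmed by the text.

```python
import statistics


def local_thresholds(match, window=250, c=3.7):
    """One threshold per window, assumed to be mean + c * STD of that window."""
    thresholds = []
    for start in range(0, len(match), window):
        chunk = match[start:start + window]
        t = statistics.fmean(chunk) + c * statistics.pstdev(chunk)
        thresholds.extend([t] * len(chunk))
    return thresholds


def detect_cuts(match, window=250, c=3.7):
    """Indices i whose matching value exceeds the local threshold.

    If match[i] compares frame i with frame i + 1, a detection at
    index i places the cut between those two frames.
    """
    thresholds = local_thresholds(match, window, c)
    return [i for i, (m, t) in enumerate(zip(match, thresholds)) if m > t]


# Toy matching vector: small differences with one sharp peak at index 7.
toy_match = [0.1] * 20
toy_match[7] = 5.0
cuts = detect_cuts(toy_match, window=20)
```

With a population standard deviation, a single peak in a window of n values can exceed the mean by at most sqrt(n - 1) standard deviations, so c = 3.7 only makes sense for windows of at least 15 values; the window of 250 frames used in the method is well above that.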
Algorithm 1. Cut SBD using statistics histogram and GLCM hybrid features.
Input: Video
Output: Frames that represent cut SBD
Process:
Step 1: Load video.
Step 2: Extract frames from video.
Step 3: Resize the frames.
Step 4.2: Compute histograms for H of HSV and perform normalization.
Step 4.3: Compute statistics features for histograms that include: mean, median, standard deviation, skew, and entropy.
Step 5.2: Compute GLCM then perform normalization and correlation.
Step 6: Compute similarity matching for first and second features.
Step 7: Compute the average of similarity matching (Hybirdmatch[i]) for hybrid features.
Step 10: Then it is considered as cut SBD.
Step 11: Get frames that correspond to the cut SBD.
Step 12: End.
The second method for detecting cut shot boundaries in video employs hybrid features that include the discrete wavelet transform (DWT) and the GLCM. The details are described in this section. The steps of the second method are the same as those of the first method except for the extraction of the first feature, where the DWT is used rather than the statistics histogram of the HSV color space. The first feature is a texture feature. The frames are converted into grayscale, and the DWT is then computed using the Haar wavelet function. This feature is efficient, flexible, and robust. Then, for each frame, the low-low (LL) sub-band is extracted from the DWT, which contains the most important feature information. Figure 2 illustrates the general block diagram of the second proposed method. Algorithm 2 gives the proposed cut SBD procedure using hybrid features that include the DWT and the GLCM.
Input: Video
Output: Frames that represent cut SBD
Process:
Step 1: Load video.
Step 2: Extract frames from video.
Step 3: Resize the frames.
Step 4: Convert frames to grayscale.
Step 5: Extract first feature:
Step 5.1: Compute DWT and extract LL.
Step 6: Extract second feature:
Step 6.1: Compute GLCM then perform normalization and correlation.
Step 7: Compute similarity matching for first and second features.
Step 11: Then it is considered as cut SBD.
Step 12: Get frames that correspond to the cut SBD.
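The first feature of the second method can be sketched as follows. The sketch assumes a single decomposition level and frames with even dimensions (as with 256×256 frames); the number of levels is not stated in the text, and the function name `haar_ll` is hypothetical. With the orthonormal Haar filters, each LL coefficient equals the sum of a 2×2 block divided by 2.

```python
def haar_ll(gray):
    """LL sub-band of a one-level 2-D Haar DWT for an even-sized 2-D list."""
    rows, cols = len(gray), len(gray[0])
    return [
        [
            (gray[r][c] + gray[r][c + 1]
             + gray[r + 1][c] + gray[r + 1][c + 1]) / 2.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# A 4x4 grayscale patch made of four constant 2x2 blocks.
patch = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
ll = haar_ll(patch)
# Flatten the sub-band so it can be compared with the Euclidean distance.
ll_vector = [value for row in ll for value in row]
```

The LL sub-band halves each dimension, so a 256×256 frame yields a 128×128 approximation that keeps the coarse structure of the frame.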

RESULTS AND DISCUSSION
This section presents the experimental results that show the capability of the proposed shot boundary detection techniques. It describes the tested videos that were used to evaluate the performance of the proposed techniques and compares the results with some previous techniques. All the tested video materials were downloaded from the BBC archive, which includes BBC Learning English and BBC News. Table 2 contains the details of the tested video files, namely the video duration, the number of frames, and the number of ground-truth cut shot boundaries. To evaluate the performance of the proposed shot boundary detection techniques, the evaluation metrics precision, recall, and F-measure were applied. These metrics were evaluated by computing the number of true, false, and missed detections compared with the ground truth of shots. Higher values of these metrics indicate better performance. Tables 3 and 4 present the performance of the first and second proposed SBD techniques, respectively, using precision, recall, and F-measure applied to the test videos.
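The three metrics follow the standard definitions in terms of true, false, and missed detections, and can be sketched as below. The function name and the counts in the example are made up for illustration and are not taken from the tables.

```python
def evaluate(true_detections, false_detections, missed_detections):
    """Return (precision, recall, f_measure) from detection counts."""
    detected = true_detections + false_detections
    actual = true_detections + missed_detections
    precision = true_detections / detected if detected else 0.0
    recall = true_detections / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative counts only: 90 correct cuts, 10 false alarms, 5 misses.
precision, recall, f_measure = evaluate(90, 10, 5)
```

The F-measure is the harmonic mean of precision and recall, so it is high only when both false and missed detections are few.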
Tables 3 and 4 show high values of the evaluation metrics, which indicate that the proposed methods achieved high accuracy. The first proposed method achieved an average F-measure of 93.85%, while the second proposed method achieved an average F-measure of 97.618%. The second method, using DWT and GLCM hybrid features, therefore achieved higher accuracy than the first method, using the statistics histogram of the HSV color space and GLCM hybrid features. Table 5 compares the performance of the proposed method with other SBD methods according to the average value of the F-measure.
A high F-measure indicates accurate performance. Therefore, according to Table 5, the proposed method performed better than the others. The main challenge in an SBD method is obtaining the optimal threshold value. According to the experimental results, computing local adaptive thresholds instead of a single global threshold improved the performance of the method.

CONCLUSION
This paper has introduced and compared two strategies for cut shot boundary detection from video files based on visual hybrid features, and has presented a general framework and an algorithm for each strategy. Furthermore, a comparison of the proposed technique with other techniques was presented. The frame size was reduced, which had the advantage of reducing the computation time. Using local adaptive thresholds improved the method's performance compared to global thresholds. Evaluation metrics such as precision, recall, and F-measure were used to evaluate the performance of the proposed techniques. The tested videos were obtained from the BBC archive, which includes BBC Learning English and BBC News. Experimental results have indicated that selecting strong features, as in the second method based on DWT and GLCM hybrid features, achieved higher accuracy than the first method based on the statistics histogram of HSV and GLCM, and also higher than the other methods compared.