Natural language processing based advanced method of unnecessary video detection

In this study, we describe a process for identifying unnecessary video using an advanced combined method of natural language processing and machine learning. The system also includes a framework with analytics databases that helps to measure statistical accuracy and can detect, accept, or reject unnecessary and unethical video content. In our video detection system, we extract text data from video content in two steps: first from video to MPEG-1 audio layer 3 (MP3), and then from MP3 to WAV format. We use the text-processing part of natural language processing to analyze and prepare the data set. We use both Naive Bayes and logistic regression classification algorithms in this detection system to determine which gives the best accuracy. In our research, the MP4 video data was converted to plain text using an advanced Python library function. This brief study discusses the identification of unauthorized, unsocial, unnecessary, unfinished, and malicious videos from spoken video record data. By analyzing our data sets with this advanced model, we can decide which videos should be accepted or rejected for further action.


INTRODUCTION
Over the last few years, the popularity of video-sharing websites has led to a massive increase in the number of videos uploaded and accessed by vast segments of the population, including vulnerable social groups, e.g., children, teenagers, and adults. The need to protect those vulnerable groups from accessing offensive content, along with the inherent difficulty of manually annotating huge volumes of video data, means that effective, automated, content-based violence detectors need to be developed. Fillipe et al. [1] first introduced a spatiotemporal feature-based violence detection concept to make public spaces more secure and to filter unwanted content. In real-life videos, common violent behaviors can be regarded as the effect of sexuality, terrorism, and criminality; video content that is detrimental to our social and personal lives is likewise considered harmful video content. In our research, we developed a natural language processing (NLP) based harmful video content detection system in which we selected four genres of video as harmful content: sexual, political, terrorism, and criminal. We used NLP and machine learning to distinguish the genre of videos from any web portal and to determine whether the video contents are harmful or not. We also used logistic regression, a large video dataset, a text-format dataset, and Python to evaluate the accuracy of our system, which helped us find its errors. Our main goal is to build a video identification system that can detect harmful videos that are not appropriate for our society and our lives.
ISSN: 2088-8708. Int J Elec & Comp Eng, Vol. 11, No. 6, December 2021: 5411-5419.
The main contributions of this research are summarized as follows: i) We have used NLP to identify harmful video content and reject unnecessary and unethical content; ii) We have proposed and designed a new rapid video detection approach (RVDA) model and application programming interface (API) for video detection; iii) Our main innovation is converting the video into a text format and then analyzing it with a combination of several machine learning algorithms to identify harmful content. In this study we propose a creative back-end Python framework. It is used for video management tasks such as uploading, receiving, and rejecting videos. We built this API model on an NLP and ML architecture. It works efficiently to detect any of the selected types of video content (political, criminal, sexual, terrorist) and makes an automatic decision.

LITERATURE REVIEW
In the age of the internet, people spend more of their leisure time online watching various micro audio-video content. The authors of [2] studied more than 20 domestic internet video platforms, collecting text from online micro videos to review its sentiment value. They crawled, processed, and finally analyzed the review text, and automated this process using a sentiment dictionary algorithm and framework [2]. An automatic technique for detecting sentiment in natural audio-visual streams available in YouTube videos has also been presented. That work develops a text-based maximum entropy (ME) sentiment detection model, which detects sentiment from automatic speech recognition (ASR) transcripts of the video content [3], [4]. A novel approach to multimodal sentiment analysis on social media platforms takes web audio-video content from Facebook and Instagram as input. To detect the sentiment genre across these diverse modalities, the model uses Naive Bayes, support vector machine (SVM), and EIM algorithms to generate textual features from the audio and video [5]. Humans often express their sentiment in their own language and speech rather than in text or images. The authors of [6] designed a model based on Chinese microblog text. They first extract the text and its relevant features using transcriber software. After generating text from a video clip, they use a support vector machine as the classifier to analyze the sentiment. They also recognize visual features from the video and extract text from the audio using openEAR. Finally, they obtain the sentiment genres positive, negative, and neutral [6].
Microblogs prominently carry sentiment in product reviews, movie reviews, and people's opinions, as well as in video content on various social issues. The authors of [7] proposed a model for finding emotions and sentiment using both unsupervised and supervised approaches. Their work focuses on unstructured social media text in order to recognize the genre automatically. They divide their data into two categories: subjective data, which reflects positive and negative sentiment, and neutral data, which carries no sentiment [7].

PROPOSED MODEL
Our model focuses mainly on the initial processing stage, which covers video upload and the chunking process (C). We use the RVDA data processing feature, whose steps are shown in Figure 1. At the end, this process returns the output and the accuracy on the target data so that we can make a decision. In step (D), we use machine learning algorithms for the data training and testing process. The last step (I) is the final outcome: accepting or rejecting the upload of the tested video. The diagram describes how the whole model works.

METHOD AND ALGORITHM ANALYSIS
In this section, we highlight our proposed method and its implementation. In our proposed system, we train on the data using supervised learning methods and extract text data from video content in two steps: first from video to MP3, and then from MP3 to waveform audio format (WAV). We apply the NLP text chunk approach to prepare the data for analysis. The Naive Bayes and logistic regression algorithms then classify the analyzed data, giving separate precision values for each of the genres described above.

Text extraction
Text extraction is one of the main components of this experiment. We use videos from four genres (sexual, political, terrorism, and criminal) for text extraction. We convert video data to text data using the Python "Speech-Identification" library. In the first stage, we convert the video to MP3 format, and then we convert the MP3 to WAV format. The conversion from MP3 to WAV is needed to obtain better-quality text output [8], [9]. Figure 2 gives the pseudocode procedure for obtaining the basic text from MP3 and WAV.
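The two conversion steps and the final transcription can be sketched in Python as follows. This is only an illustration under stated assumptions: the source does not say which tools were used, so the `ffmpeg` command-line program, the third-party SpeechRecognition package, the 16 kHz mono WAV settings, and all function names here are our own choices rather than the authors' procedure from Figure 2.

```python
import subprocess
from pathlib import Path
from typing import List


def mp4_to_mp3_cmd(video: Path, mp3: Path) -> List[str]:
    """Build an ffmpeg command that drops the video stream and writes MP3 audio."""
    return ["ffmpeg", "-y", "-i", str(video), "-vn", str(mp3)]


def mp3_to_wav_cmd(mp3: Path, wav: Path) -> List[str]:
    """Build an ffmpeg command that decodes MP3 to WAV (16 kHz mono is an assumption)."""
    return ["ffmpeg", "-y", "-i", str(mp3), "-ar", "16000", "-ac", "1", str(wav)]


def wav_to_text(wav: Path) -> str:
    """Transcribe a WAV file; assumes the SpeechRecognition package is installed."""
    import speech_recognition as sr  # imported lazily because it is optional

    recognizer = sr.Recognizer()
    with sr.AudioFile(str(wav)) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    return recognizer.recognize_google(audio)  # online recognizer, needs network


def video_to_text(video: Path) -> str:
    """Run video -> MP3 -> WAV, then transcribe the WAV file."""
    mp3, wav = video.with_suffix(".mp3"), video.with_suffix(".wav")
    subprocess.run(mp4_to_mp3_cmd(video, mp3), check=True)
    subprocess.run(mp3_to_wav_cmd(mp3, wav), check=True)
    return wav_to_text(wav)
```

Calling `video_to_text(Path("clip.mp4"))` on a hypothetical file would leave `clip.mp3` and `clip.wav` next to the video and return the recognized text. Long recordings would normally be split into shorter segments before recognition, which this sketch does not do.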

Natural language processing text chunk approach
In natural language processing, text chunking is an advanced feature-extraction step that prepares text for analysis. This approach is also used for analyzing textual data in text mining and NLP [10]. It is required when working with unstructured text. Chunking is an approach for extracting phrases from unstructured text. We applied this method to identify constituents such as verb groups, noun groups, and verbs in unstructured sentences. Chunking specifies neither the internal structure of these constituents nor their role in the main sentence.
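To make the idea of flat noun and verb groups concrete, here is a minimal hand-rolled chunker. It is a simplification written for illustration, not the chunker used in the study: it assumes the tokens are already tagged with Penn Treebank part-of-speech tags, and the grouping rules and the example sentence are hypothetical.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (word, part-of-speech tag)


def label(tag: str) -> str:
    """Map a POS tag to a chunk type: noun group, verb group, or outside."""
    if tag in ("DT", "JJ") or tag.startswith("NN"):
        return "NP"
    if tag == "MD" or tag.startswith("VB"):
        return "VP"
    return "O"


def chunk(tagged: List[Tagged]) -> List[Tuple[str, List[str]]]:
    """Group consecutive tokens of the same type into flat chunks."""
    chunks: List[Tuple[str, List[str]]] = []
    for word, tag in tagged:
        kind = label(tag)
        same_chunk = chunks and chunks[-1][0] == kind and kind != "O"
        if same_chunk and tag != "DT":  # a determiner always opens a new noun group
            chunks[-1][1].append(word)
        else:
            chunks.append((kind, [word]))
    return chunks


# Hypothetical hand-tagged sentence.
sentence = [("the", "DT"), ("police", "NN"), ("have", "VBP"), ("arrested", "VBN"),
            ("a", "DT"), ("young", "JJ"), ("suspect", "NN"), ("in", "IN"),
            ("the", "DT"), ("city", "NN")]
for kind, words in chunk(sentence):
    print(kind, " ".join(words))
```

The output lists each group without nesting, which reflects the point made above: chunking marks where the constituents are but says nothing about their internal structure or their role in the sentence.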

Multinomial logistic regression
When a dependent variable is categorical but not ordinal, multinomial logit models are a suitable alternative. The distribution of the dependent variable is called multinomial because it follows the multinomial distribution [11]. When a logistic regression model has more than two possible outcomes (K outcomes), the problem can be treated as fitting K-1 independent binary logit models, in which one of the possible outcomes is chosen as a pivot and the other K-1 outcomes are regressed against the pivot outcome.
Exponentiating both sides of these K-1 equations yields the probabilities. In the first iteration, the model contains only the intercept, with no regressors. The next iteration includes the regressors in the model.
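The equations referred to here are not reproduced in this text. For reference, the standard pivot formulation of multinomial logistic regression can be written as below. The symbols are assumptions introduced for illustration: Y is the outcome, x the regressor vector, beta_k the coefficient vector of outcome k, and outcome K the pivot.

```latex
% K-1 binary logit models, each regressed against the pivot outcome K
\ln\frac{\Pr(Y=k)}{\Pr(Y=K)} = \boldsymbol{\beta}_k \cdot \mathbf{x},
\qquad k = 1, \dots, K-1

% Exponentiating both sides
\Pr(Y=k) = \Pr(Y=K)\, e^{\boldsymbol{\beta}_k \cdot \mathbf{x}}

% Requiring the K probabilities to sum to one
\Pr(Y=K) = \frac{1}{1 + \sum_{j=1}^{K-1} e^{\boldsymbol{\beta}_j \cdot \mathbf{x}}},
\qquad
\Pr(Y=k) = \frac{e^{\boldsymbol{\beta}_k \cdot \mathbf{x}}}
                {1 + \sum_{j=1}^{K-1} e^{\boldsymbol{\beta}_j \cdot \mathbf{x}}}
```

With the four genres used in this work, K = 4, so three coefficient vectors would be estimated against one pivot genre.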

Naive Bayes classifier
Another method for classifying textual data is Naive Bayes, a classification method based on Bayes' rule [12]. We need a bag-of-words representation of each document before applying this classifier [13], and we built this representation for our data set. We apply Bayes' rule to a document d and a class c as P(c|d)=P(d|c)P(c)/P(d) (2).
We represent each document as a set of tokens x1, x2, x3, …, xn, so that P(d|c) can be rewritten as P(x1, x2, x3, …, xn | c), where P(c) is the overall probability of a class. Overall, we found sustainability, reliability, and better accuracy with the Naive Bayes classifier in our NLP-based work, owing to its simple design and efficient approach. Lyrics-based classification of textual data has also been performed using Naive Bayes [14].
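The following sketch shows how a multinomial Naive Bayes classifier over bag-of-words tokens can be written from scratch. It is not the authors' implementation: the conditional-independence factorization P(x1, …, xn | c) = P(x1|c)···P(xn|c) is the usual Naive Bayes assumption, add-one smoothing is our own choice, and the four-document corpus is a hypothetical toy example.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train_nb(docs: List[Tuple[List[str], str]]):
    """Estimate log P(c) and per-class token counts from (tokens, label) pairs."""
    class_docs = Counter(label for _, label in docs)
    token_counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens, label in docs:
        token_counts[label].update(tokens)
    vocab = {t for tokens, _ in docs for t in tokens}
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    return log_prior, token_counts, vocab


def predict_nb(model, tokens: List[str]) -> str:
    """Return the class maximizing log P(c) + sum_i log P(x_i | c)."""
    log_prior, token_counts, vocab = model
    scores = {}
    for c, prior in log_prior.items():
        total = sum(token_counts[c].values()) + len(vocab)  # add-one smoothing
        scores[c] = prior + sum(
            math.log((token_counts[c][t] + 1) / total)
            for t in tokens if t in vocab  # unseen words are ignored
        )
    return max(scores, key=scores.get)


# Hypothetical toy corpus, for illustration only.
toy_docs = [
    (["bomb", "attack", "city"], "terrorism"),
    (["election", "vote", "party"], "political"),
    (["attack", "hostage", "bomb"], "terrorism"),
    (["party", "minister", "vote"], "political"),
]
model = train_nb(toy_docs)
print(predict_nb(model, ["bomb", "hostage"]))
```

The denominator P(d) from Bayes' rule is left out because it is the same for every class and does not change which class scores highest. Working in log space avoids numerical underflow when documents are long.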

DATA ANALYSIS
In this section, we briefly discuss the whole process of data analysis. In the first step, the data is cleaned and unnecessary data is removed. Then we partition the data into training and testing sets for our proposed model. We use the Natural Language Toolkit (NLTK) and scikit-learn to categorize and process the text obtained from the video content into its respective genres: criminal, sexual, political, and terrorism.

Data processing
In our model, data processing is considered the most advanced part. We used several steps during data processing. Using the string lower method, we converted all the textual data to lowercase. Stop words were removed from the dataset, because we need to focus on the words that carry the actual meaning of the text. A finite state automaton can be used to identify and replace terms when removing stop words [15]. The data arrays are divided into training and test data. This provides a better assessment and the stronger baseline that is needed to capture features when working with text data [16]. Term frequency-inverse document frequency (TF-IDF) and count-vectorizer features were applied in this model. TF-IDF is a statistical measure that indicates how significant a word is to a document in a collection or corpus, and TF-IDF features can extract words when needed [17]. We often encounter errors during training and testing. These can have various causes, such as shortcomings of the model or unnecessary data [18]. We found some discrepancies when categorizing the training data. The count vectorizer provides a simple and effective way both to tokenize the text collection and to build a vocabulary of known words.
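The preprocessing steps above (lowercasing, stop-word removal, term counting, and TF-IDF weighting) can be sketched in plain Python as follows. This is an illustration rather than the study's code: the stop-word list is a tiny made-up subset, tokens are split on whitespace only, and the weight uses the textbook form tf × log(N/df), which differs from the smoothed variant that scikit-learn applies by default.

```python
import math
from collections import Counter
from typing import Dict, List

# Illustrative subset only; a real stop-word list is much longer.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in"}


def preprocess(text: str) -> List[str]:
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def count_vector(tokens: List[str]) -> Counter:
    """Bag-of-words counts for one document."""
    return Counter(tokens)


def tf_idf(corpus: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by (count / document length) * log(N / document frequency)."""
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = count_vector(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Two hypothetical transcripts.
corpus = [preprocess("The minister is in the city"),
          preprocess("The attack in the city")]
weights = tf_idf(corpus)
print(weights[0])
```

In this toy corpus the word "city" appears in both documents, so its weight is zero, while "minister" and "attack" each receive a positive weight. That is the sense in which TF-IDF indicates how significant a word is to one document within a collection.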

Dataset used for accuracy evaluation
We selected four genres: political, criminal, terrorist, and sexual. From these categories, we converted the testing videos into word format to obtain a text-based dataset. To enlarge our dataset, we collected word-based data from universally available sources on the internet, according to genre, as training data. Table 1 shows the distribution of the number of words for each genre in our training data. We collected textual information from video content and prepared it as our test dataset. We experimented with ten thousand short video contents. After training and testing on the dataset, our goal is to find the genre of each video. Multinomial logistic regression and Naive Bayes classifiers help to extract information from textual data [19], [20]. We applied both classifiers separately to the textual data of our video content. We measured the accuracy for sample video contents, and Table 2 shows the percentage for each video content by genre. In this experiment, Precision=tp/(tp+fp), where tp is the number of true positives and fp is the number of false positives, and Recall=tp/(tp+fn), where fn is the number of false negatives. The F1-score is the harmonic mean of precision and recall: F1=2×Precision×Recall/(Precision+Recall).
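As a worked illustration of these metrics, the sketch below counts true positives, false positives, and false negatives for one genre and derives precision, recall, and F1 (computed as the harmonic mean of precision and recall). The label lists are hypothetical and are not taken from Table 2; the function names are our own.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[str], y_pred: List[str], genre: str) -> Tuple[int, int, int]:
    """Return (tp, fp, fn) for one genre, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == genre and p == genre for t, p in pairs)
    fp = sum(t != genre and p == genre for t, p in pairs)
    fn = sum(t == genre and p != genre for t, p in pairs)
    return tp, fp, fn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision = tp/(tp+fp), recall = tp/(tp+fn), F1 = harmonic mean of the two."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical true and predicted genres for five videos.
y_true = ["political", "political", "criminal", "sexual", "political"]
y_pred = ["political", "criminal", "political", "sexual", "political"]

tp, fp, fn = confusion_counts(y_true, y_pred, "political")
print(precision_recall_f1(tp, fp, fn))
```

For the "political" genre in this toy example, tp = 2, fp = 1, and fn = 1, so precision, recall, and F1 all equal 2/3. The zero-denominator guards return 0.0 when a genre is never predicted or never present, which is a convention chosen here for convenience.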

RESULTS AND DISCUSSION
With the development of technology, we are also making a lot of improvements. The amount of video sharing in online media, online promotion, YouTube, and social sites is increasing day by day. However, not all videos are suitable or beneficial for users, because videos can contain crime, terrorism, political, or sexual content that carries a bad message for our society. Our research aims to see whether we can detect such content in videos in advance [21]. To grade this unauthorized content, we propose three levels: normal, moderate, and extreme. Figure 3 illustrates the range from normal to extreme video content. At the normal level, unethical content accounts for 0 to 35%. From 36% to 55%, the system considers the content to be at the moderate level. At 56% or above, the content is at the extreme level and violates all the rules and regulations. After any video content is uploaded to our web framework, our back-end algorithms categorize the content and determine which of the selected genres (sexual, political, terrorism, and criminal) discussed earlier it belongs to.
We allow video content on our website if its overall percentage falls in the normal level according to our classification. We also allow content at the moderate level, but we display a warning message to the uploader. If any video content exceeds the normal and moderate levels, we mark it as extreme and reject it for further processing. Figure 4(a) shows the results for three video contents, each reaching a different level.
For video number 1, the overall maximum accuracy is 60% and 58% with the Naive Bayes and multinomial logistic regression classifiers, respectively. This meets the conditions of the extreme level, so the content will be rejected. For video number 2, the maximum final accuracy is 45% and 44%, which corresponds to the moderate level. We therefore allow this video content but display a warning message to the uploader. Figure 4(b) shows the error rate and accuracy rate of our proposed model. The system correctly classified 2317 contents and misclassified 435 contents, an error rate of about 15.8% (435 of 2752). The figure shows that the error rate is quite low and the rate of accurate classification is quite satisfactory. We evaluated the overall performance of each classifier used to classify the genre of video content in our proposed model [22], [23]. Table 3 shows the accuracy we obtained: 82% for Naive Bayes and 79% for multinomial logistic regression.
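The three-level decision rule can be written as a small function. This is a sketch of the rule as described in the text, with two assumptions of our own: fractional percentages between the stated integer bounds are assigned to the lower level, and when two classifiers give different percentages the larger one is used. The function and label names are hypothetical.

```python
def detection_level(percent: float) -> str:
    """Map a detection percentage to normal (0-35), moderate (36-55), or extreme (56+)."""
    if percent <= 35:
        return "normal"
    if percent <= 55:
        return "moderate"
    return "extreme"


def upload_decision(percent: float) -> str:
    """Accept normal content, accept moderate content with a warning, reject extreme content."""
    decisions = {
        "normal": "accept",
        "moderate": "accept_with_warning",
        "extreme": "reject",
    }
    return decisions[detection_level(percent)]


# Percentages for two classifiers on one video; the larger value drives the decision.
naive_bayes_percent, logistic_percent = 60, 58
print(upload_decision(max(naive_bayes_percent, logistic_percent)))
```

With the values 60 and 58 the function returns "reject", and with 45 and 44 it returns "accept_with_warning".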

FRAMEWORK FOR PROPER OUTPUT
In our work, we have created a model framework. It reduces redundant tasks, cuts development time, and helps developers focus on the application logic rather than routine work. We use Python and Django to develop our model framework. Django is one of the best open-source Python frameworks and follows the model-view-controller (MVC) architectural pattern [24]. As Figure 5(a) shows, this window appears when someone wants to upload a video. In this window, they can upload a video. The video is then processed using the algorithmic approach and model described above. All the processes run automatically following the algorithm. This API is a test API for this research, and all of the background code applies our analytical algorithm methods. A multimodal genre-based video classification technique that reduces dependency on user-provided captions in online videos and identifies video content automatically is discussed in [25].
The video is uploaded and marked as selected if it is suitable for further use. Figure 5(b) shows that if a video meets all the criteria, it appears as selected and is allowed for further use. When the video does not meet the criteria, it is classified as rejected and is no longer released for further use.

CONCLUSION
Online video promotion is a great advantage for the popularity and branding of commercial, social, and educational areas. Among video platforms such as various web portals, YouTube is the most popular site for people to publish their own views and share ideas. As technology advances, we sometimes face unethical videos that are politically unsound or that contain unauthorized sexual or criminal content or support for terrorism. Such content is published openly and affects users around the world psychologically and morally. To sort out and reject harmful video content in this domain, such as sexual, political, criminal, and terrorist content, we have presented unauthorized video genre classification for different web content through our proposed model and framework. The main contribution of this paper is the classification of the genres sexual, political, criminal, and terrorism with good accuracy using both classifiers, logistic regression and Naive Bayes. For text extraction we use NLP, which provides accurate classification. The experimental results show the robustness of the proposed system and are assessed and compared in terms of accuracy, precision, recall, and F1-score.

ACKNOWLEDGEMENT
Daffodil International University helped us to complete our research work. We used the NLP and Machine Learning lab to obtain better results. We received no special funding to complete this research.