Improved feature extraction process to detect seizures using the CHB-MIT dataset

Received Jul 27, 2020 Revised Jul 24, 2020 Accepted Aug 22, 2020

Epilepsy is one of the most dangerous neurological diseases worldwide. For a fraction of a second, nerves in the brain start firing impulses, i.e., electrical discharges that are higher than normal pulsing. Many researchers have investigated this problem and proposed numerous methodologies. Our methodology, however, gives effective results in feature extraction: we use a much larger number of statistical-moment features, whereas existing approaches rely on only a few statistical moments in time and frequency. The proposed system makes it easy to find the seizure-affected part of the brain using the time domain system (TDS), the frequency domain system (FDS), correlation, and graph representation. The resulting values show a large difference between normal and seizure-affected brains, and they also expose hidden features of the brain.


LITERATURE SURVEY
Some previous papers did not perform feature extraction; instead, they trained deep learning models on raw EEG signals [5][6][7]. Over the last few years, machine learning (ML) and deep learning (DL) have entered computer vision. DL and ML can perform multiple tasks at once, such as object detection with sequence learning [8]. Feature extraction, by contrast, is an easy way to provide input to a classifier, and it eliminates the more complex process of giving raw EEG samples to the classifier. Several issues with deep learning techniques still need to be addressed. The major challenge in seizure detection with these techniques is finding the correlation between the randomly chosen EEG timestamps fed to the classifier and the actual EEG timeline; as a result, they suffer from feature extraction ambiguity and fail to recognize temporal signal patterns. EEG contains a set of multiple channels for detecting seizures [9]. Nowadays, epilepsy research is based on machine learning procedures; one study captured the emotional condition of the brain using a Bayes classifier (BC). It used 1902 statistical structures (features) from 23 EEG signals, and patients aged 10 to 15 were diagnosed. Features were extracted from each patient using a 4-level Daubechies-4 (db4) wavelet decomposition with Shannon entropy. After the signal is obtained, feature selection is performed and the result is passed on to feature extraction. The authors also developed software that records changes in brain activity. This technique obtained an accuracy of 75%. The main shortcoming we noticed is that they did not select universal features for classification among the statistical structures, because of variation in individual signals [10].
Automatic seizure prediction has been proposed and investigated since the 1970s, and closed-loop systems have been developed to detect epileptic seizures. The main intention of such a system is to determine the absence or presence of ongoing seizures and to provide rapid therapy at their clinical onset. The algorithm collects appropriate features, or quantitative values, from EEG, ECG, EDA, movement, and other biomarkers, and uses them to recognize the presence or absence of an ongoing seizure. For classification, it collects data from an external device, on which the training and supervision phases are implemented. In the analysis phase, the transmitted data pass through a preprocessor for filtering and feature computation; if the result is abnormal, the device sends an alert message to the responsible physicians, and the resulting intervention is intended to reduce the abnormal electrical activity of the brain. The main disadvantage of this device is the very low quality of the transmitted EEG signal. Continuing from the automatic seizure prediction work mentioned above, automatic seizure detection focuses on two major factors, feature extraction and classification, with feature extraction as the key. The goal of feature extraction is to differentiate EEG patterns, so it directly affects EEG classification. For example, M. Ahmad divided the features into two categories: spectral domain features and temporal domain features. Although the implementation of such handcrafted features differs between technologies, it always requires an expert to derive many representations from the data [11].
Magnetic resonance imaging (MRI) captures the inner structure of the body. In previous research, MRI scanning has been a main source of information about epilepsy, since it images the inside of the brain. Two texture features were extracted from MRI: the gray-level co-occurrence matrix (GLCM) and the gray-level run-length matrix (GLRM). For classification, radial basis function (RBF) and PSVM classifiers were used, which report the absorption time and the amount of error captured during transmission. This technique accepts only matrix input for its expression solver.
After many years of investigation, handcrafted features came into use for EEG analysis, notably based on the Fourier transform [12], which offers various techniques for analyzing EEG. It involves knowledge of relations in the data that other papers did not exploit. Using this signal processing technique, we can easily extract features of brain abnormality, which is reflected in increased amplitude [13]; the Fourier transform was used to overcome this problem. The proposed system uses the Fourier transform together with some advanced features to extract features by signal processing. In this paper, we use universal statistical features, which overcome the problem in current methodologies. Recently, the authors of [14] introduced a hybrid electroencephalogram (EEG) classification method based on grey wolf optimizer (GWO)-improved SVMs, called GWO-SVM, for automatic seizure detection. In [15], a 13-layer deep convolutional neural network (DCNN) is implemented to identify the seizure, normal, and preictal classes; it obtained a specificity, sensitivity, and accuracy of 90.00%, 95.00%, and 88.67%, respectively. In [16], the authors developed a method to automatically extract and categorize semiological patterns from facial expressions, addressing a limitation of computer-based analytical methods for epilepsy monitoring, in which facial movements have been ignored.
[17] presents an overview of seizure prediction and detection and gives insights into the challenges; it then covers several state-of-the-art seizure prediction and detection algorithms and compares them. The review in [18] covers two types of automated techniques for analyzing epileptic EEG recordings: those aimed at detecting interictal spikes and analyzing epileptic seizures, and those characterizing abnormal EEG activity in long-term recordings. In [19], the author surveys the definitions and methods for graph clustering, that is, finding sets of vertices in a graph. The survey reviews several definitions of a cluster in a graph and measures of cluster quality, then presents global algorithms that produce a clustering of the entire vertex set of the input graph, and finally discusses the task of finding a cluster for a specific seed vertex using local computation. [20] presents a novel clustering-based least-squares SVM (LS-SVM) method for classifying EEG signals, in which decisions are made in two phases. In the first phase, a clustering technique (CT) is used to extract representative features of the EEG data. In the second phase, LS-SVM is applied to the extracted features to classify the EEG signals into two classes.
[21] aims to identify a technique that classifies sleep stages automatically and with a high degree of accuracy. That study also has three phases: feature selection from the EEG signals, feature extraction, and classification of the signals. In [22], graph analysis is applied to neural network (NN) models and to functional and anatomical connectivity based on MEG, fMRI, and EEG. These studies propose that the human brain can be modelled as a complex network with a small-world structure at both the anatomical and the functional connectivity level. In [23], the author addresses the issue from the aspect of updating feature extractors and introduces an adaptive feature extractor, adaptive common spatial patterns (ACSP). Deep learning (DL) is a newer research direction within machine learning (ML) that automatically learns the features and inherent rules of sample data. As hardware computational power and available data continue to grow, DL has been applied to increasingly complex applications with higher accuracy [24,25].
In [26], the author studies the effect of OM chanting on the brain. To gain insight into the brain, the author recorded EEG signals from ten subjects before and after OM chanting and used a complexity measure based on fractal analysis to compare them. The time-domain (TD) fractal dimension was computed using Higuchi's fractal dimension (HFD). [27] discusses the recurrence phenomenon and, using this concept, presents a new, simple, and user-friendly technique to identify neurological disorders from EEG signals. Many researchers have also started to use EEG for emotion detection; some methods use predefined, standard signal-processing techniques, and some record EEG with fewer channels. [28] introduced an emotion detection technique based on time-domain statistical features. In [29], the author attempts to classify two different emotional states from a single EEG channel. This work continues a prior study [30] in which the beta band was found suitable for analyzing hand movement. The discrete wavelet transform (DWT) was used to separate the beta band of the EEG signal for feature extraction, and a probabilistic neural network (PNN) was investigated as the classifier for right- and left-hand movement EEG signals and compared with a back-propagation NN.

PROPOSED METHODOLOGY
3.1. Evaluation of EEG
In this research, the EEG data were downloaded from the CHB-MIT Scalp EEG Database [4], which is freely available on PhysioNet.org. The total duration of the EEG recordings is 983 h, and the seizure onset and offset times of the ictal activity in each EEG epoch were annotated manually by clinical experts. The EEG signals were collected at Children's Hospital Boston from 23 pediatric patients with intractable seizures (5 males, ages 3-22; 17 females, ages 1.5-19; 1 with missing age/gender data) to assess their candidacy for surgical intervention. Most files contain 23 EEG signals; a few contain 24 or 26. Case chb21 was recorded 1.5 years after case chb01 from the same female subject. The age and gender of each subject are listed in the SUBJECT-INFO file. All EEG signals are sampled at 256 samples/s with 16-bit resolution, and the electrodes are placed according to the International 10-20 system. Across the 24 cases, the signals are divided into epochs that are mostly 1 hour long, although some epochs last 2 or 4 hours. In all 24 cases the montage also changes frequently during the recordings: channels are removed or added from one epoch to another. Therefore, after downloading the dataset, we select channels: we keep only the channels that are consistently available across all recordings, so that the selected channels remain interchangeable between folds during cross-validation. The main goal of this step is to cope with the heterogeneity of the data. Of the 24 channels, 18 are stable: T7-F7, FP1-F3, C3-P3, FP2-F4, F4-C4, P3-O1, C4-P4, FP2-F8, T8-P8, FZ-CZ, T7-P7, CZ-PZ, FP1-F7, F3-C3, F8-T8, P8-O2, P7-O1, P4-O2.
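The channel-selection step amounts to keeping the channel labels that are common to every recording. The following is a minimal sketch, assuming each file's channel labels have already been read into a list (for example from the EDF headers); the function name `common_channels` and the file names in the example are hypothetical, not part of the dataset or the authors' code.

```python
def common_channels(recordings):
    """Return the channel labels present in every recording.

    recordings: dict mapping a file name to its list of channel labels
    (e.g. read from the EDF headers). Order follows the first recording;
    repeated labels are kept once.
    """
    label_lists = list(recordings.values())
    # Labels that appear in all recordings.
    shared = set.intersection(*(set(labels) for labels in label_lists))
    ordered = []
    for label in label_lists[0]:
        if label in shared and label not in ordered:
            ordered.append(label)
    return ordered


# Toy example with hypothetical file names and montages.
recs = {
    "file_a.edf": ["FP1-F7", "T7-P7", "FZ-CZ", "ECG"],
    "file_b.edf": ["FZ-CZ", "FP1-F7", "T7-P7", "VNS"],
}
print(common_channels(recs))  # ['FP1-F7', 'T7-P7', 'FZ-CZ']
```

Applied to all files of all cases, the same intersection would yield the stable channel set used for cross-validation.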
In this study, we take the 18 bipolar raw EEG channels directly from the dataset files without applying any artifact or noise rejection technique, so no further preprocessing is performed.

EEG segmentation
In this paper, before the signals are given to the feature extractor, the raw EEG signal is broken into 5 s intervals. At 256 samples/s, each interval contains 5 × 256 = 1280 samples per channel, which are passed on for further processing.
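The 5 s segmentation can be sketched with NumPy as a reshape of a (channels, samples) array. This assumes the 18 selected channels are already loaded as an array sampled at 256 Hz and that segments do not overlap; the helper name `segment` and the choice to drop an incomplete trailing window are assumptions, not details given in the paper.

```python
import numpy as np

FS = 256         # CHB-MIT sampling rate (samples/s)
SEGMENT_S = 5    # segment length used in this paper (s)


def segment(eeg, fs=FS, seconds=SEGMENT_S):
    """Split a (channels, samples) array into non-overlapping windows.

    Returns shape (n_segments, channels, fs * seconds). Trailing samples that
    do not fill a whole window are dropped (an assumption, not from the paper).
    """
    win = fs * seconds                       # 256 * 5 = 1280 samples
    n_ch, n_samples = eeg.shape
    n_seg = n_samples // win
    trimmed = eeg[:, : n_seg * win]
    # (channels, n_seg, win) -> (n_seg, channels, win)
    return trimmed.reshape(n_ch, n_seg, win).transpose(1, 0, 2)


# 62 s of synthetic 18-channel data -> 12 full segments; the last 2 s are dropped.
x = np.random.default_rng(0).standard_normal((18, 62 * FS))
segs = segment(x)
print(segs.shape)  # (12, 18, 1280)
```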

Feature extraction
Figure 1 shows the proposed system architecture. The raw EEG signals of each recording are first split into 5-second segments. Using the annotations, each segment is assigned an ictal, preictal, or interictal class. Adjacent segments are then checked, and segments that overlap a seizure, i.e., ictal segments, are discarded, because they contain actual seizure activity, which is not needed for binary classification. Binary classification is performed between preictal and interictal segments, and these segments are given as input to feature extraction. Feature extraction produces a large 642 × 1 feature vector comprising cross-correlation, graph theory, time-domain, and frequency-domain features. The EEG analysis involves estimating the cross-correlation between channels and extracting further information from graphs.
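The labelling rule described above can be sketched as follows. The paper does not state how far before a seizure onset a segment counts as preictal, so `PREICTAL_S` is a hypothetical parameter, and treating post-seizure segments as interictal is a simplification of this sketch rather than the authors' rule; `label_segments` is likewise a hypothetical name.

```python
PREICTAL_S = 30 * 60   # hypothetical preictal horizon (s); the paper does not state one


def label_segments(n_segments, seizures, seg_s=5, preictal_s=PREICTAL_S):
    """Label consecutive seg_s-second segments and discard ictal ones.

    seizures: list of (onset_s, offset_s) pairs, e.g. taken from the CHB-MIT
    summary annotations. Returns (segment_index, label) for kept segments.
    """
    kept = []
    for k in range(n_segments):
        start, end = k * seg_s, (k + 1) * seg_s
        # Any overlap with a seizure interval -> ictal -> discarded.
        if any(start < off and end > on for on, off in seizures):
            continue
        # Entirely inside the preictal_s seconds before some onset -> preictal.
        if any(on - preictal_s <= start and end <= on for on, _ in seizures):
            kept.append((k, "preictal"))
        else:
            kept.append((k, "interictal"))   # includes post-seizure segments here
    return kept


labels = dict(label_segments(30, [(100, 120)], preictal_s=20))
print(labels[15], labels[16], 20 in labels)  # interictal preictal False
```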

TDS
Previous time domain system (TDS) studies of seizure detection considered only the variance and the mean value; in addition to these, we use the number of zero crossings, the difference between amplitude extremes, and other measures. Time-domain analysis examines a mathematical function or physical signal with respect to time, i.e., the signal is measured over a given time span and its components are analyzed. In this analysis we use five components: statistical moments, standard deviation, zero crossings, peak-to-peak voltage, and total signal area. Over the length of the signal, we estimate the mean and variance with respect to time. Auto-correlation makes it easy to find the density and repetition of a signal, and the time domain gives a good measure of synchronization (the similarity between a pair of signals).

Statistical moments are used to find the central tendency of a set of data; mean, variance, skewness, and kurtosis are all moments. Statistical moments describe the location, variability, and shape of a distribution. For a segment $x(i)$, $i = 1, \ldots, N$, the n-th moment is
$$m_n = \frac{1}{N}\sum_{i=1}^{N} x(i)^n, \qquad n = 1, 2, 3, \ldots$$

a. Mean ($F_{t\_mean}$) The mean is the first moment:
$$F_{t\_mean} = \mu = \frac{1}{N}\sum_{i=1}^{N} x(i)$$

b. Variance ($F_{t\_variance}$) After the average of the signal is found, the variance checks whether an irregular signal is present and how widely the signal spreads over a window. Because EEG signals are changeable, we need to estimate this variability. The total variance is
$$F_{t\_variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x(i) - \mu\right)^2$$

c. Skewness ($F_{t\_skewness}$) Skewness describes the distribution shape of the signal values. Next, we note how many peaks the distribution has and whether it is symmetric or asymmetric. Skewness is estimated as
$$F_{t\_skewness} = \frac{1}{N}\sum_{i=1}^{N} \left(\frac{x(i) - \mu}{\sigma}\right)^3 \qquad (4)$$

d. Kurtosis ($F_{t\_kurtosis}$) Kurtosis is a shape descriptor complementary to skewness. It uses an even power of the deviations, so it is a symmetric measure of variability that reflects how heavy-tailed the distribution is:
$$F_{t\_kurtosis} = \frac{1}{N}\sum_{i=1}^{N} \left(\frac{x(i) - \mu}{\sigma}\right)^4$$
e. Peak-to-peak voltage The peak-to-peak voltage is the difference between the highest and lowest values of the signal in the segment; it measures the signal magnitude, i.e., the amplitude.
f. Zero crossing The zero-crossing count is used to reduce sensitivity to noise. It counts how many times the waveform crosses zero, i.e., how often consecutive samples $s(i)$ and $s(i+1)$ change sign:
$$F_{t\_zc} = \sum_{i=1}^{N-1} \mathbb{1}\left[s(i)\,s(i+1) < 0\right]$$
g. Total signal range (TSR) This feature indicates the area of high seizure likelihood: if the signal strength is high, it indicates a high level of seizure activity. We compute it with the trapezoidal method. If $E(n)$, $n = 1, \ldots, N$, is the segmented EEG signal, $|E(n)|^2$ is the energy carried by the signal, and the total energy range (with unit sample spacing) is
$$F_{TSR} = \sum_{n=1}^{N-1} \frac{|E(n)|^2 + |E(n+1)|^2}{2}$$
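The time-domain features a-g can be computed per channel segment with NumPy, as in the following sketch. It follows the formulas as written in this section (population variance, non-excess kurtosis, strict sign changes for zero crossings, trapezoidal area of the squared signal with unit sample spacing); `time_domain_features` is a hypothetical helper name, and the authors' exact normalisation may differ.

```python
import numpy as np


def time_domain_features(x):
    """Time-domain (TDS) features of one channel segment (1-D array)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()                                    # a. mean
    var = np.mean((x - mu) ** 2)                     # b. variance (1/N normalisation)
    sd = np.sqrt(var)                                # standard deviation
    z = (x - mu) / sd if sd > 0 else np.zeros_like(x)
    energy = x ** 2                                  # |E(n)|^2
    return {
        "mean": mu,
        "variance": var,
        "std": sd,
        "skewness": np.mean(z ** 3),                 # c. third standardised moment
        "kurtosis": np.mean(z ** 4),                 # d. fourth standardised moment (not excess)
        "peak_to_peak": x.max() - x.min(),           # e. highest minus lowest value
        # f. strict sign changes between consecutive samples (exact zeros not counted)
        "zero_crossings": int(np.count_nonzero(x[:-1] * x[1:] < 0)),
        # g. trapezoidal area under the energy curve, unit sample spacing
        "total_area": float(np.sum((energy[:-1] + energy[1:]) / 2)),
    }


f = time_domain_features([1.0, -1.0, 1.0, -1.0])
print(f["zero_crossings"], f["peak_to_peak"], f["total_area"])  # 3 2.0 3.0
```

Applying the function to each of the 18 channels of a 1280-sample segment gives the TDS part of the feature vector.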

FDS
The frequency domain system (FDS) extracts the energy percentage and the total energy spectrum using rhythmic bands. We compute the energy percentage per band: delta (0.5-3 Hz), theta (4-8 Hz), alpha (9-13 Hz), beta (14-30 Hz), and gamma (30-55 Hz). The features of each EEG signal are extracted using the wavelet transform and the discrete Fourier transform. A 7-level decomposition is used, with the mother wavelet selected for the 256 Hz sampling frequency of the CHB-MIT dataset. The main advantage of this choice is that 7 levels is the minimum depth needed to separate the 0-1 Hz band, which is occupied by artifacts. a. Fast Fourier transform power spectral density This analysis is performed on specific, predefined time intervals of the EEG data. The parameters observed in this computation are reactivity and relative power. Because EEG signals are highly non-stationary, the FFT is applied over these short intervals. The power spectral density is used to extract features that give the energy percentage of each band. Waveforms are the easiest way to see changes happening in the brain. They are categorized by amplitude, frequency, and position (where the electrodes are placed on the scalp), and the following bands are used in the proposed system to measure the energy percentage.
-Delta Band This is the waveform with the highest amplitude and the slowest frequency. It is found during sleep at all ages, mainly in the 3rd and 4th sleep stages, and experts rarely observe it otherwise.
-Alpha Band It occurs at all ages but is found mainly in adults. It can be identified in a person who is awake and relaxing with closed eyes. It arises in both the dominant and the non-dominant side of the brain. Most significantly, this band level is linked to cortisol production. When a person is involved in judgement or decision making, the beta band occurs.
-Gamma Band This band is associated with consciousness. It occurs when a person is hyper-alert and has good awareness. It could be recorded only after digital EEG was introduced, because earlier recordings were restricted to frequencies below 25 Hz.
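The band energy percentages can be sketched with a plain FFT periodogram. The band limits are those listed in this section, treated as half-open intervals; using an unwindowed periodogram of a single segment (rather than, say, Welch averaging) and expressing each band as a percentage of the total spectral power are assumptions of this sketch, and `band_energy_percent` is a hypothetical name.

```python
import numpy as np

# Band limits (Hz) as listed in the text; treated here as half-open [lo, hi).
BANDS = {
    "delta": (0.5, 3.0),
    "theta": (4.0, 8.0),
    "alpha": (9.0, 13.0),
    "beta": (14.0, 30.0),
    "gamma": (30.0, 55.0),
}


def band_energy_percent(x, fs=256, bands=BANDS):
    """Percentage of total spectral power that falls in each EEG band."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                             # remove DC so it does not dominate
    power = np.abs(np.fft.rfft(x)) ** 2          # one-sided periodogram (unscaled)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)  # 0.2 Hz resolution for 1280 samples
    total = power.sum()
    return {
        name: 100.0 * power[(freqs >= lo) & (freqs < hi)].sum() / total
        for name, (lo, hi) in bands.items()
    }


# A pure 10 Hz sine over one 5 s segment puts all its power in the alpha band.
t = np.arange(1280) / 256
p = band_energy_percent(np.sin(2 * np.pi * 10 * t))
print(round(p["alpha"], 3))  # 100.0
```

Because the listed bands leave gaps (for example 3-4 Hz) and exclude power above 55 Hz, the five percentages need not sum to 100 for real EEG.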
b. Discrete wavelet transform The DWT plays a vital role in our paper: it is the process we use to reveal hidden features or information in the brain signals. It decomposes the signal with oscillating functions so that both frequency-domain and time-domain features can be estimated, producing translated and scaled versions of the signal. To build the feature vector of every EEG segment, we integrated the DWT into our method. The DWT is obtained by
$$W_{j,k} = \sum_{n} x(n)\,\psi_{j,k}(n), \qquad \psi_{j,k}(n) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{n - b}{a}\right) = 2^{-j/2}\,\psi\!\left(2^{-j} n - k\right),$$
where $W_{j,k}$ represents the wavelet coefficient, $j$ the level, $k$ the location, $\psi_{j,k}(n)$ the dyadic wavelet, and $a = 2^{j}$ and $b = k\,2^{j}$ are the scaling and translation parameters.
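A 7-level dyadic decomposition and its per-level energy can be sketched without any wavelet library by using the Haar wavelet. The paper does not name its mother wavelet, so Haar is a stand-in chosen only for brevity, and the helper names are hypothetical. With fs = 256 Hz, the level-7 approximation covers 0-1 Hz, the low-frequency range the text associates with artifacts.

```python
import numpy as np


def haar_dwt(x, levels=7):
    """Multilevel orthonormal Haar DWT, returned as [cA_L, cD_L, ..., cD_1]."""
    a = np.asarray(x, dtype=float)
    if a.size % 2 ** levels:
        raise ValueError("length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))  # detail (high-pass) coefficients
        a = (even + odd) / np.sqrt(2)              # approximation (low-pass) coefficients
    return [a] + details[::-1]


def wavelet_energy_percent(x, fs=256, levels=7):
    """Relative energy (%) of each coefficient set, keyed by its nominal band."""
    coeffs = haar_dwt(x, levels)
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    pct = 100.0 * energies / energies.sum()
    # cD_j nominally covers fs/2**(j+1) .. fs/2**j; cA_L covers 0 .. fs/2**(L+1).
    names = [f"A{levels} (0-{fs / 2 ** (levels + 1):g} Hz)"] + [
        f"D{j} ({fs / 2 ** (j + 1):g}-{fs / 2 ** j:g} Hz)" for j in range(levels, 0, -1)
    ]
    return dict(zip(names, pct))


x = np.random.default_rng(0).standard_normal(1280)   # one 5 s segment (1280 = 2**8 * 5)
print(list(wavelet_energy_percent(x))[:2])  # ['A7 (0-1 Hz)', 'D7 (1-2 Hz)']
```

If PyWavelets is available, `pywt.wavedec(x, wavelet, level=7)` returns coefficients in the same [cA7, cD7, ..., cD1] order and could replace `haar_dwt` to try other mother wavelets.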

Correlation
a. Cross-correlation
Cross-correlation checks the similarity between two different EEG signals and the time delay between them. The peak cross-correlation value between two signals measures their functional brain connectivity, and the lag at which it occurs gives the time delay between them. This process evaluates the connection between global and local areas of the brain. For channels $i$ and $j$ with means $\bar{x}_i, \bar{x}_j$ and standard deviations $\sigma_i, \sigma_j$, the normalized cross-correlation at lag $\tau$ and the connectivity between the two channels (their maximum cross-correlation) are
$$C_{ij}(\tau) = \frac{1}{N\sigma_i\sigma_j}\sum_{n}\left(x_i(n) - \bar{x}_i\right)\left(x_j(n+\tau) - \bar{x}_j\right), \qquad W_{ij} = \max_{\tau}\left|C_{ij}(\tau)\right| \qquad (1)$$
b. Decorrelation The decorrelation time of a signal is found with a simple method: the signal is compared with itself at a continuously increasing offset, and we note how long it takes for the correlation to fall below a threshold.
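The connectivity measure and the decorrelation time can be sketched with NumPy. This assumes the biased (1/N) normalised cross-correlation, a lag search limited to `max_lag` samples, and a decorrelation threshold of 0.5 on the autocorrelation; the threshold value and the helper names are hypothetical, since the text does not specify them.

```python
import numpy as np


def max_cross_correlation(xi, xj, max_lag):
    """Peak |normalised cross-correlation| between two channels and its lag.

    A positive lag means xj lags xi by that many samples.
    """
    xi = np.asarray(xi, dtype=float)
    xj = np.asarray(xj, dtype=float)
    n = xi.size
    zi = (xi - xi.mean()) / xi.std()
    zj = (xj - xj.mean()) / xj.std()
    # np.correlate(a, v, "full")[n - 1 + k] == sum_t a[t + k] * v[t]
    c = np.correlate(zj, zi, mode="full") / n
    lags = np.arange(-(n - 1), n)
    window = np.abs(lags) <= max_lag
    c, lags = c[window], lags[window]
    best = np.argmax(np.abs(c))
    return float(abs(c[best])), int(lags[best])


def decorrelation_time(x, fs=256, threshold=0.5):
    """Seconds until the autocorrelation first drops below `threshold` (None if never)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    n = z.size
    for lag in range(1, n):                      # continuously increasing offset
        if np.dot(z[:-lag], z[lag:]) / n < threshold:
            return lag / fs
    return None


rng = np.random.default_rng(1)
a = rng.standard_normal(1280)
b = np.roll(a, 10)                               # b is a delayed by 10 samples
peak, lag = max_cross_correlation(a, b, max_lag=64)
print(lag)                    # 10
print(decorrelation_time(a))  # 0.00390625
```

Evaluating `max_cross_correlation` for every channel pair would give a symmetric matrix of weights that can serve as the edge weights of the connectivity graph.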

Evaluation of EEG using graphs
Conventional network analysis uses graph-theory measurements to describe the functional connectivity between pairs of EEG signals. Here, a graph is defined as a set of channels (nodes) and weighted edges, where $W_{xy}$ is the weight of the edge connecting channels x and y. The graph is constructed using the cross-correlation values defined in the correlation section above as connectivity measurements. From it we extract a predefined set of global and local graph features. We also introduce the clustering coefficient, which compares the connections of each node with the connections among its neighboring nodes. Neighbors are taken to be distinct from the node itself, which we expect to give good performance in this process.
-Clustering coefficient The clustering coefficient measures the degree to which the nodes of a graph tend to form clusters, i.e., groups of nodes that are connected to each other. -Local efficiency Local graph features include local efficiency (the average shortest path length of a channel), equilibrium (the fraction of shortest paths between two channels that pass through another channel), and peculiarity (the longest path from a root node to any other node).
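-Betweenness centrality and eccentricity The betweenness centrality of node i is the number of shortest paths between two nodes that pass through node i, divided by the total number of shortest paths between those two nodes, summed over all node pairs. The eccentricity of node i is the longest shortest path from node i to any other node. -Global efficiency Global graph features include global efficiency (the overall efficiency of the network), the radius and diameter of the graph, and path-length features. Global efficiency is based on the shortest path length $d_{ij}$ between each pair of nodes, i.e., the minimum number of edges traversed from i to j; averaging its inverse over all pairs of the N channels gives
$$E_{glob} = \frac{1}{N(N-1)}\sum_{i \neq j} \frac{1}{d_{ij}}$$
The final feature vector f concatenates the time-domain, frequency-domain, correlation, and graph features:
$$f = \left[f_{TDS},\ f_{FDS},\ f_{corr},\ f_{graph}\right] \in \mathbb{R}^{642}$$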
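Several of the graph features above can be sketched on a binary graph obtained by thresholding a symmetric channel-by-channel connectivity matrix, such as the maximum cross-correlation values. The threshold of 0.5, the unweighted (hop-count) path lengths, and the helper names are assumptions of this sketch; betweenness centrality and local efficiency are omitted to keep it short.

```python
from collections import deque

import numpy as np


def bfs_distances(adj, src):
    """Hop counts from src in an unweighted graph (np.inf if unreachable)."""
    n = adj.shape[0]
    dist = np.full(n, np.inf)
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(adj[u]):
            if np.isinf(dist[v]):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def graph_features(W, threshold=0.5):
    """Binary-graph features from a symmetric connectivity matrix W (hypothetical threshold)."""
    A = (np.asarray(W) >= threshold).astype(int)
    np.fill_diagonal(A, 0)                               # no self-loops
    n = A.shape[0]
    deg = A.sum(axis=1)
    triangles = np.diag(A @ A @ A) / 2                   # closed triangles through each node
    possible = deg * (deg - 1) / 2
    clustering = np.divide(triangles, possible, out=np.zeros(n), where=possible > 0)
    D = np.array([bfs_distances(A, i) for i in range(n)])
    off = ~np.eye(n, dtype=bool)
    inv = np.zeros_like(D)
    np.divide(1.0, D, out=inv, where=off)                # 1/inf = 0 for unreachable pairs
    ecc = D.max(axis=1)                                  # inf if the graph is disconnected
    return {
        "clustering": clustering.mean(),
        "global_efficiency": inv[off].mean(),            # E_glob = mean of 1/d_ij, i != j
        "char_path_length": D[off].mean(),
        "eccentricity": ecc,
        "radius": ecc.min(),
        "diameter": ecc.max(),
    }


# Path graph 0-1-2-3 as a toy connectivity matrix.
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
feats = graph_features(W)
print(feats["radius"], feats["diameter"])  # 2.0 3.0
```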

EXPERIMENTAL RESULT
The proposed system integrates TDS, FDS, graph, and correlation features. EEG plays a vital role in the detection of epilepsy. We take seizure and non-seizure data from the CHB-MIT dataset for the feature extraction process. The main aim of this study is to identify seizures in the brain, to check how fast they are identified, and to check reliability. Many authors have proposed solutions for epilepsy prediction, but the problem is time: existing systems are time consuming. Our proposed methodology shows how quickly seizures can be detected during extraction and how easily the extracted features express different states of mind. To increase speed and reduce time, it includes TDS, FDS, graph representation, and correlation. These features show a significant difference between seizure and non-seizure signals. In the figures, blue graphs indicate non-seizure data and red graphs indicate seizure patients.

Resultant value of TDS
During feature extraction, we observed a marked difference between normal (non-seizure) and seizure-state individuals. For TDS, we used statistical moments (mean, variance, kurtosis, skewness), zero crossing, peak-to-peak voltage, and total area. Each of these features shows a difference between the seizure state and normal individuals. Figures 2-9 compare the normal and seizure states obtained from multiple sets of EEG from the CHB-MIT dataset.
The mean, total signal area, peak-to-peak voltage, and variance of the signal decrease sharply in the seizure state compared with the normal state. Among the 8 TDS features, skewness and kurtosis show the most significant difference. Figure 10 shows the difference between the normal state and the seizure state for TDS.

Resultant value of FDS
Distinguishing seizure from non-seizure patients is not an easy job. The frequency domain system is one of the most useful and fastest techniques for detecting seizures. Using the FFT with 7 energy bands, we can study every part of the brain and recognize the difference between seizure and non-seizure patients. We found that the power in the seizure data is higher than in the non-seizure data. Previous research has applied the discrete wavelet transform only to recognize non-stationary signals; in the proposed method, by contrast, features are derived from the DWT with a selected mother wavelet. Figures 11-13 show the results.

Resultant value of correlation
Randomly chosen signals from the dataset are used to show the mean cross-correlation and decorrelation performance, and all of the signal outputs show a significant increase. This makes it easy to find the similarity between two samples, and the lag at which the cross-correlation reaches its peak gives the time delay between two signals. The cross-correlation values of all signals and the increasing threshold are plotted in Figures 14 and 15: comparing normal brain signals with seizure-affected signals shows a large difference, and the decorrelation value shows how much time the signal takes to fall below the threshold.

Resultant value of graphs
Plotting graphs is one of the easiest ways to understand the output and find the abnormal state of the brain. Figures 16-22 show the efficiency, diameter, radius, and local and global ranges for both normal and seizure-affected brains. The graph radius of the seizure-affected brain decreases compared with the normal brain. Clustering makes the process very fast, which greatly increases performance. The clustering value of the patient brain (with seizure) decreases compared with the normal brain, and the seizure-affected brain also shows decreased efficiency, diameter, and radius. Graph clustering therefore helps to find the part of the brain that has fewer nodes, which is the seizure-affected part of the brain.

CONCLUSION
EEG plays a very important role in the detection of epilepsy. Some existing techniques use trained data and machine learning techniques, which results in high cost, long processing times, and low signal quality. We therefore introduced a new process in this paper using the CHB-MIT EEG dataset, which contains data from 23 seizure-affected patients. For feature extraction, we used the time domain (statistical moments, standard deviation, peak-to-peak voltage, zero crossing), the frequency domain (DWT, FFT, energy bands), correlation, and graph estimation. The resulting graphs show good signal quality and expose hidden characteristics of the brain. In future work, we will use these features in various models, such as CNN, random forest, and XGBoost, to predict epilepsy with high accuracy.