Classification of emotions induced by horror and relaxing movies using single-channel EEG recordings

Recent studies have observed that the corticolimbic Theta rhythm in EEG recordings is associated with the perception of fear or threatening scenes during the neural processing of visual stimuli. In addition, patterns of neural oscillations in the Theta, Alpha and Beta sub-bands also play an important role in the brain's emotional processing. Inspired by these findings, in this paper we attempt to classify two different emotional states by analyzing single-channel EEG recordings. A video clip that can evoke three different emotional states (neutral, relaxation and fear) was shown to 19 college-aged subjects, who were asked to score their emotional outcome by giving a number between 0 and 10 (where 0 means not scary at all and 10 means the most scary). First, the recorded EEG data were preprocessed by a stationary wavelet transform (SWT) based artifact removal algorithm. Then the power distribution in the joint time-frequency domain was analyzed using the short-time Fourier transform (STFT), followed by calculating the average power during each 0.2 s time-segment for each brain sub-band. Finally, 46 features, given by the mean power of the frequency bands between 4 and 50 Hz during each time-segment and containing 689 instances per subject, were collected for classification. We found that the relaxation and fear emotions evoked while watching relaxing and scary movies can be classified with an average classification rate of 94.208% using K-NN with the methods and materials proposed in this paper. We also classified the dataset using SVM and found that the K-NN classifier (when k = 1) outperforms SVM in classifying EEG dynamics induced by horror and relaxing movies; however, for k > 1, SVM has a better average classification rate.


INTRODUCTION
In recent years, emotional interaction between humans and machines has been established as an interesting research area in brain-computer interface (BCI) applications [1]. Among the many studies on emotion recognition, two major categories are most common: one relies on tracking facial expressions [2,3], while the other records brain signals from the central nervous system, during the experience of different emotional states, with recording techniques such as electroencephalography (EEG), electrocorticography (ECoG), and functional magnetic resonance imaging (fMRI). Among these techniques, EEG is one of the most suitable because it is non-invasive, easy to use, and low-cost compared to the others, and it has also been shown to yield informative features in response to particular emotional states [4][5][6].
Frontal brain activity has been found to be responsible for emotional responses [7]. Another study reported similar findings, namely that frontal EEG asymmetry can serve as both a moderator and a mediator of emotions [5]. Inspired by these findings, advanced engineering techniques such as signal processing, machine learning and human-computer interaction have begun to be applied to investigate EEG-based emotion recognition [8,9]. Such investigations broadly focus on two aspects: studying the EEG signal's dynamics through different emotional experiences [7,10], and processing or classifying EEG signals during each emotional state [11,12]. Different emotions such as sadness, happiness, fear, disgust, anger, joy, and relaxation have been studied by tracking EEG dynamics during these emotional states. Some studies group emotions into two broad classes: (1) negative emotions (i.e. sadness, disgust, anger, fear); and (2) positive emotions (i.e. joy, happiness, relaxation) [13][14][15]. Of these emotions, fear and relaxation have been considered in only a few recent studies [10,16,17].
In order to establish emotional interaction with smart TVs or computers, emotions must be detected with reasonable accuracy by a reliable system that is robust against artifacts and other interference. Several studies have concentrated on EEG band power for the classification of different emotions. Iacoviello et al. proposed a real-time EEG-based algorithm for classifying self-induced emotions [11]: the subjects were shown a group of symbols to induce different emotions such as disgust, fear, anger, and hunger, and an average classification accuracy of 90% was achieved using a support vector machine (SVM). In [18], a movie induction experiment was performed which spontaneously led the subjects into real emotional states, and a classification accuracy of 87.53% using SVM with a linear kernel was reported. Naji et al. performed another emotion classification experiment through a music listening task by recording EEG from the forehead, reaching an average classification rate of 87.05% using SVM, K-NN and CFNN; with CFNN, the best accuracy of 93.66% was obtained [12]. Using a deep learning network as the classifier in [19], along with principal component analysis (PCA) for feature extraction, the authors reported classification accuracies of 46.03% and 49.52% for separating valence and arousal, respectively. On the other hand, Mu et al. used a 62-channel EEG recorder for classifying between happiness and sadness on 10 subjects, evoked by images of smiling and crying facial expressions, and using SVM they achieved classification accuracies of 93.5% ± 6.7% for 3 s trials and 93.0% ± 6.2% for 1 s trials. They also observed that the gamma band is more suitable for EEG-based emotion classification [20].
In this paper, we used an EEG recorder based on a single dry electrode placed over the left frontal cortex (as the simplest low-cost solution) to recognize fear and relaxation during movie screening. We then attempted to classify the EEG dynamics related to fear and relaxation with an acceptable accuracy. We also propose an artifact removal algorithm based on the stationary wavelet transform, which was found to improve the final classification rate in the emotion recognition task.

BACKGROUND
2.1. Short-time Fourier transform
In order to see how frequency content varies in time, we need a mathematical tool that describes the frequency changes during a specific time interval. The time-frequency representation of a signal can be obtained in different ways. The short-time Fourier transform (STFT) extracts short durations of the signal with a window function and then computes their frequency representation:

$$\mathrm{STFT}_x(t, f) = \int_{-\infty}^{+\infty} x(\tau)\, h(\tau - t)\, e^{-j 2\pi f \tau}\, d\tau$$

where $h(t)$ refers to the sliding analysis window of the STFT. For a finite-energy window, the energy distribution (spectrogram) is given by $|\mathrm{STFT}_x(t, f)|^2$.

ISSN: 2088-8708

Consequently, the STFT can be considered a tool that decomposes the signal into waveforms and is used to find the energy distribution in time-frequency space. Since the STFT is easy to use and to understand [21], in this study it is used for decomposing the EEG sub-bands and presenting their time-frequency dynamics.
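As a sketch of how STFT-style band-power features can be extracted, the snippet below splits a short synthetic signal into non-overlapping 0.2 s segments and computes the mean sub-band power per segment with a plain FFT (the synthetic signal, rectangular window and band edges are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

fs = 512                        # sampling rate used in the paper (Hz)
seg = int(0.2 * fs)             # 0.2 s time-segment -> 102 samples
t = np.arange(0, 4, 1 / fs)     # 4 s of synthetic data
# synthetic "EEG": a 6 Hz theta component plus a weaker 25 Hz beta component
x = np.sin(2 * np.pi * 6 * t) + 0.5 * np.sin(2 * np.pi * 25 * t)

# split into non-overlapping 0.2 s segments and take the FFT of each
n_seg = len(x) // seg
frames = x[: n_seg * seg].reshape(n_seg, seg)
spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # power per (segment, bin)
freqs = np.fft.rfftfreq(seg, d=1 / fs)

# mean power of the theta band (4-8 Hz) in each time-segment
theta = (freqs >= 4) & (freqs < 8)
theta_power = spec[:, theta].mean(axis=1)
print(theta_power.shape)        # one mean-power value per 0.2 s segment
```

Repeating the band mask for each EEG sub-band between 4 and 50 Hz yields one power feature per band per segment, which is the kind of feature vector used later for classification.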

2.2. Stationary wavelet transform
The recorded EEG data are contaminated by artifacts of electrooculographic (EOG) and electromyographic (EMG) origin. In order to minimize these artifacts, the stationary wavelet transform was applied. In recent years, wavelet-based denoising methods have attracted much attention in biomedical signal analysis because they are suitable for non-linear and non-stationary signals [22] such as EEG. The choice of the wavelet transform over other methods (e.g. independent component analysis (ICA), empirical mode decomposition (EMD), adaptive filtering, etc.) is motivated by its capability to decompose single-channel EEG data into different frequency bands with high temporal resolution, followed by a simpler denoising step [22]. Another advantage of wavelet-based methods is their lower computational complexity compared to ICA and EMD. In addition, unlike adaptive filtering or regression-based techniques, they do not require a separate reference artifact channel. There are several types of wavelet transform, such as the discrete wavelet transform (DWT), continuous wavelet transform (CWT), wavelet packet transform (WPT) and stationary wavelet transform (SWT). DWT is the simplest wavelet transform technique; however, we chose SWT for its advantage over DWT due to its translation-invariance property (a small shift in the signal does not cause significant changes in the wavelet coefficients or large variations in the distribution of energy across wavelet scales) [23]. As a result, no (or minimal) distortion of the signal occurs during reconstruction [24][25][26].
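A minimal sketch of the undecimated-transform idea, using an à trous Haar decomposition written directly in NumPy rather than a wavelet library; the additive normalization and circular boundary handling here are simplifying assumptions:

```python
import numpy as np

def haar_swt(x, levels):
    """Undecimated (a trous) Haar decomposition: no downsampling, so every
    coefficient array keeps the signal's length (translation invariance)."""
    a = np.asarray(x, dtype=float)
    details = []
    for lev in range(levels):
        step = 2 ** lev                      # filter holes double each level
        shifted = np.roll(a, -step)          # circular boundary handling
        details.append((a - shifted) / 2.0)  # high-pass: detail coefficients
        a = (a + shifted) / 2.0              # low-pass: approximation
    return a, details

x = np.random.randn(512)                     # 1 s of synthetic "EEG" at 512 Hz
approx, details = haar_swt(x, levels=6)

# with this additive scheme the signal is recovered exactly
x_rec = approx + sum(details)
print(np.allclose(x, x_rec))                 # True
```

Because nothing is downsampled, shifting the input by one sample merely shifts the coefficient arrays, which is the translation-invariance property that motivates choosing SWT over DWT.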

2.3. Classification
2.3.1. K-nearest neighbors
The nearest-neighbors search problem is the following: given a set S of n data points in the feature space $\mathbb{R}^D$, the task is to process these points so that, for any query point $q \in \mathbb{R}^D$, the K nearest points $\{s_1, s_2, ..., s_K\} \subseteq S$ to q can be reported as quickly as possible. If x is an arbitrary instance defined by the feature vector $\langle a_1(x), a_2(x), a_3(x), ..., a_n(x) \rangle$, where $a_r(x)$ denotes the value of the r-th attribute of x, then the distance between two instances is

$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \left( a_r(x_i) - a_r(x_j) \right)^2}$$

Given labeled training data, K-NN partitions the feature space into polygon-like cells, where each cell contains all the points nearest to a specific training point. The algorithm can be refined by assigning each neighbor a weight coefficient based on its distance from the query point. Dudani introduced such a weighted voting method for K-NN, called the distance-weighted k-nearest neighbor rule (WKNN) [27]. In WKNN, closer neighbors are weighted more heavily than farther ones.
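The distance-weighted rule can be sketched compactly; the toy data are made up, and the simple inverse-distance weights are an assumption (Dudani's original WKNN uses a linear distance weighting instead):

```python
import numpy as np

def wknn_predict(X_train, y_train, q, k=3):
    """Distance-weighted k-NN: the k nearest neighbours vote for their
    class, with closer neighbours weighted more heavily (1/distance)."""
    d = np.sqrt(((X_train - q) ** 2).sum(axis=1))   # Euclidean distances
    idx = np.argsort(d)[:k]                         # k nearest neighbours
    w = 1.0 / (d[idx] + 1e-12)                      # closer -> heavier weight
    votes = {}
    for label, weight in zip(y_train[idx], w):
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)

# hypothetical 2-D training instances labeled like the paper's classes
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
y = np.array(['Relaxing', 'Relaxing', 'Scary', 'Scary'])
print(wknn_predict(X, y, np.array([0.05, 0.05]), k=3))   # -> 'Relaxing'
```

With k = 1 this degenerates to the exact nearest-neighbor rule that gives the paper's best reported accuracy.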

2.3.2. Support vector machine
The support vector machine (SVM) is a promising method for classifying both linear and non-linear datasets. It uses a non-linear mapping to transform the data into a higher dimension. The idea is to find the maximum marginal hyperplane (MMH), or widest street [28], between the separable regions among an infinite number of possible separating lines, in order to minimize the classification error. The area between the two separating lines is called the margin, and the term hyperplane refers to the ideal decision boundary [29]. The hyperplane can be written as $W \cdot X + b = 0$, where W is the weight vector, X is a training tuple and b is the offset parameter, as shown in Figure 1.
Considering b as an additional weight, the points that lie above and below the separating hyperplane satisfy

$$y_i \left( W \cdot X_i + b \right) \ge 1$$

where $y_i$ is +1 if the point lies above the hyperplane and -1 when it falls below it. Using the Lagrangian formulation, the MMH can be written as a decision boundary:

$$d(X^T) = \sum_{i=1}^{l} y_i \alpha_i \, X_i \cdot X^T + b_0$$
where $X_i$ is a support vector, $X^T$ is the test tuple, $\alpha_i$ is a Lagrange multiplier, $b_0$ is a bias term and $y_i$ is the class label of the i-th support vector. Given the test tuple $X^T$, the output tells on which side of the hyperplane the test instance falls: if the result corresponds to a black point, it lies above the hyperplane and belongs to the black class, and if it corresponds to a white point, it lies below the hyperplane and belongs to the white class. When instances are not linearly separable, SVM transforms the data into a new dimension in which the instances of each class become linearly separable. The complexity of the learned classifier is characterized by the support vectors rather than the size of the data; hence, SVM is less prone to over-fitting [29].
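The decision rule above can be evaluated directly; the support vectors, multipliers and bias below are illustrative made-up values, not the output of an actual training run:

```python
import numpy as np

# illustrative (untrained) parameters of the decision boundary
sv = np.array([[1.0, 1.0], [2.0, 2.0]])   # support vectors X_i
y_sv = np.array([1.0, -1.0])              # class labels y_i of the SVs
alpha = np.array([1.0, 1.0])              # Lagrange multipliers alpha_i
b0 = 1.0                                  # bias term

def decision(x_test):
    """sign( sum_i y_i * alpha_i * (X_i . x_test) + b0 ) gives the class."""
    return np.sign((y_sv * alpha * (sv @ x_test)).sum() + b0)

print(decision(np.array([0.0, 0.0])))     # 1.0  (falls on the +1 side)
print(decision(np.array([3.0, 3.0])))     # -1.0 (falls on the -1 side)
```

A non-linear kernel simply replaces the dot product $X_i \cdot X^T$ with a kernel evaluation, which is how the Pearson VII kernel used later in the paper enters the same formula.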
Several investigations show that, in a competition between sophisticated algorithms (such as neural networks and support vector machines) and simple ones (for example, k-nearest neighbors), the clear winner is not necessarily the more elaborate one [30,31]. In this study we classify emotions using both SVM and K-NN, so as to find which algorithm performs better in emotion classification.

METHODS AND MATERIALS
3.1. Participants and environmental protocol
Nineteen healthy college/university students aged 19-32 years (14 males, 5 females) participated in this EEG recording experiment. The data from 11 subjects were later excluded from the analysis, either due to the presence of excessive noise or major movement artifacts, such as touching the forehead and/or the EEG equipment (6 subjects), or because the subject reported having already watched the movies before, so that the fear emotion was not elicited at all (5 subjects). Before starting the experiment, all participants were asked to sign a consent form after reading it carefully, and they were then requested to sit still. In order to avoid bias, no technical or general information about the experiment was given to the participants. This differs from some other investigations, in which subjects were asked to focus on the emotion corresponding to each picture or movie clip, or their feeding behavior was controlled for 24 hours before data acquisition [33,14]. Our final cohort comprised 8 subjects (6 males and 2 females). The experiment was carried out in a small dark room using an LCD monitor placed 50 cm away from the subject. The room was free of noise and interruptions. The subjects were requested not to stand up or remove the EEG headset immediately after finishing the experiment. At the end, the subjects themselves evaluated their level of fear during the horror movie and their general relaxation state during the relaxing movie by giving a number between 0 and 10, where 0 means no fear at all and 10 means the highest level of fear.

3.2. Stimuli
Acquiring meaningful data is critical in any signal processing application. However, acquiring data corresponding to a particular emotional state is challenging due to the subjective nature of emotions, as well as the cognitive dependency of physiological signals, which requires that a specific emotion be elicited internally in the participants [34]. Thus, in order to stimulate the fear and relaxation emotional states in the subjects, a set of combined movie clips with a total duration of 288 s was shown without any break during the experiment. This is consistent with a few previous studies in which video clips were used as emotional stimuli [35][36][37].
As shown in Table 1, the video shown to the subjects includes 2 movie clips which elicit the two target emotional states: positive (relaxation) and negative (fear). The selection criteria for the movie clips were: (a) the movie must be easily understood without explanation; and (b) each movie must elicit a single desired target emotion. Before the EEG recording and data acquisition, several pilot arrangements were carried out in order to find the best conditions for obtaining the best possible data. Through these pilot runs, we found that during the first minute of the test, subjects tend to make movements that cause major artifacts, such as trying to touch the EEG device, strong head movements and rubbing their eyes. In order to get past this phase, we decided to show a neutral movie during the first 90 seconds, which arouses neither negative nor positive emotion. The neutral movie is followed by the relaxing (60 seconds) and horror (138 seconds) movies. The process of our experiment is depicted in Figure 2.

3.3. EEG recording
One of the issues in EEG-related experiments is that the setup of the EEG recorder is not straightforward when wet electrodes are used. We therefore used a wireless EEG headset [38] from NeuroSky with a single dry electrode attached to the forehead at position Fp1 (as shown in Figure 3) and a reference electrode on the ear clip; the sampling frequency was 512 Hz.

3.4. Methods
The recorded data were analyzed by a step-by-step process including feature smoothing, signal processing, feature extraction and finally classification of the emotional state, as depicted in Figure 4. One purpose of this study is also to evaluate the effect of the artifact removal algorithm on the classification stage. To do so, two types of training data were collected: (1) artifactual and (2) artifact-free. The artifact-free data were obtained by correcting the stereotyped artifacts, including eye blinks, eye movements, muscle artifacts and line noise, through denoising with the stationary wavelet transform (SWT). A 6-level SWT decomposition (with Haar as the mother/basis wavelet) was applied to the recorded raw signal. The output of the SWT is a set of detail coefficients (d1-d6) and final approximate coefficients (a6), representing non-overlapping high and low frequency bands respectively, as shown in Table 2. A modified universal threshold proposed by [40] was then applied to the coefficients at the different decomposition levels in order to separate potential artifacts from actual EEG rhythms. During denoising, the threshold value and threshold function were carefully selected according to the work by Islam et al. [41].
The threshold value at level i is calculated as

$$T_i = K \cdot \sigma_i \cdot \sqrt{2 \ln N}$$

where N is the length of the data and $\sigma_i$ is the estimated noise variance for $W_i$, the wavelet coefficients at the i-th decomposition level ($W_i = a_i$ for the approximation coefficients and $W_i = d_i$ for the detail coefficients). The garrote threshold function at each decomposition level i is

$$\delta_i(x) = \begin{cases} x - \dfrac{T_i^2}{x}, & |x| > T_i \\[4pt] 0, & |x| \le T_i \end{cases}$$

where x is the value of a wavelet coefficient. During denoising, the choice of the threshold parameter K is critical in order to ensure that no distortion occurs in the desired signal of interest [41]. This is because the distinct frequency bands represent different EEG rhythms, and some coefficients are more likely to contain artifacts than others. Finally, the artifact-reduced EEG signal is reconstructed from the new set of wavelet coefficients by applying the inverse SWT. The complete process flow of the artifact removal is shown in Figure 5.

Using the STFT, the mean signal power was then calculated for each frequency band during each time-segment. The self-reported forms from the subjects were used to exclude data related to the parts of the horror movie that did not elicit any fear emotion. Finally, 46 features, given by the mean power of the frequency bands between 4 and 50 Hz during each time-segment and containing 689 instances (388 for the scary and 301 for the relaxing movie) for each subject, labeled with either 'Scary' or 'Relaxing', formed two training datasets (artifact-free and artifactual).
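The thresholding step can be sketched as follows; the noise estimate via the median absolute deviation is an assumption (the paper follows the parameter choices of Islam et al. [41]), and the input coefficients are made-up values:

```python
import numpy as np

def garrote_threshold(w, K=1.0):
    """Apply the garrote threshold to one level of wavelet coefficients w.
    Coefficients below the universal threshold T are zeroed; coefficients
    above it are shrunk by T^2 / w."""
    N = len(w)
    sigma = np.median(np.abs(w)) / 0.6745        # robust noise estimate
    T = K * sigma * np.sqrt(2.0 * np.log(N))     # universal threshold T_i
    out = np.zeros_like(w, dtype=float)
    keep = np.abs(w) > T
    out[keep] = w[keep] - T**2 / w[keep]         # garrote shrinkage
    return out

# hypothetical detail coefficients: small noise plus two large EEG peaks
w = np.array([0.1, -0.2, 5.0, 0.05, -4.0, 0.15])
print(garrote_threshold(w, K=1.0))
```

Applying this per decomposition level (with K tuned per level) and then taking the inverse SWT yields the artifact-reduced signal described above.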

EXPERIMENTAL RESULTS AND DISCUSSIONS
K-nearest neighbors with the Euclidean distance was used for the emotion classification task. In order to avoid over-fitting, 10-fold cross-validation was applied to the datasets. In k-fold cross-validation, the dataset is segmented into k equal portions and then, through k iterations, each fold is used once as the validation set while the remaining k-1 folds are used for training. We tested the classification rate for K ranging between K = 1 and K = √n, where n is the number of instances in the dataset [42]. We also chose K to be an odd number in order to avoid ties [43]. We compared the results and found K = 1 to be the best K-NN parameter for both datasets, with and without the artifact removal algorithm, as shown in Table 4 and Table 5. For 7 subjects, applying the SWT algorithm improved the classification rate by between 1.32% and 9.43%; for 1 subject an insignificant decrease in the classification rate (-0.58%) was observed. Although the datasets are unbalanced (388 instances of the scary movie and 301 instances of the relaxing movie), the confusion matrix shows an almost balanced distribution of wrongly classified instances between the 'Scary' and 'Relaxing' classes. From the confusion matrix, the true positive (TP) and false positive (FP) rates can be calculated as well. We show detailed results for both datasets (artifact-free and artifactual) for the K-NN algorithm, since it gives the best classification result. For SVM the results are shown only for the artifact-free dataset; however, using the SWT increased the classification rate for SVM as well. The best result was obtained for K = 1, although we also checked the results for all K between K = 1 and K = √n, weighted by 1/distance. Tables 6-11 show the results for K = 3, K = 5 and K = 7. For K = 1, K = 3, K = 5 and K = 7, using the SWT yielded better average classification accuracies by +3.06%, +4.771%, +3.983% and +3.261%, respectively.
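The cross-validation loop described above can be sketched as follows (the fold count matches the paper's 10-fold setup; the random seed is an arbitrary choice):

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Split n instance indices into k folds for cross-validation.
    Each fold serves once as the validation set; the rest train."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)            # shuffle the instance indices
    folds = np.array_split(idx, k)      # k (nearly) equal portions
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# 689 instances, as in the paper's per-subject dataset
for train, val in kfold_indices(689, k=10):
    assert len(train) + len(val) == 689
```

Each of the 10 iterations would train the classifier on `train` and score it on `val`; averaging the 10 scores gives the reported classification rate.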
For all subjects except subject 1, the classification rate was raised after artifact removal. The best improvement was obtained for subject 7, with an 11.3% increase (for K = 7) in the classification rate; the classification rates before and after using the SWT for artifact removal [39] are shown in Figures 6-9. In order to find the best power spectrum feature for emotion classification, we repeated the classification task for each of the features listed in Table 3 for all subjects, using the artifact-free datasets with K = 1. As shown in Table 12, the best accuracy was obtained when a combination of all power spectrum features was used. Furthermore, the worst and best average classification accuracies were obtained by the Theta and Gamma bands, with 65.368% and 94.208% respectively. This finding is consistent with other studies reporting that high frequency bands play a major role in emotional activity [18,20]. To obtain the classification accuracy with SVM, the Pearson VII function-based universal kernel [44] was used along with 10-fold cross-validation and cost parameters $C \in \{10^{-5}, 10^{-4}, ..., 10^{2}\}$. Fear and relaxation were classified with a 90.816% average classification rate, as shown in Table 13.
The classification rate of K-NN for K = 1 is higher than that of SVM, although we do not claim that the exact nearest-neighbor rule is the best classifier for emotion classification. The main concern is the laziness of K-NN and its performance when the size (or the dimensionality) of the training dataset is large. One way to overcome this laziness problem is to use an approximate nearest neighbors (ANN) method, such as locality-sensitive hashing (LSH), in order to reduce the search space and consequently obtain a fast instance-based classifier [45].

CONCLUSION
In this paper, we investigated the characteristics of EEG dynamics for the classification of relaxation and fear emotions. We conducted a set of experiments by designing a sequence of 3 joined video clips with a total duration of 288 seconds. The participants were neither informed about the nature of the experiment nor asked to focus on a specific mental task, and we imposed no mental or physical limitations during data acquisition; the experiment was carried out without interruption between the films. In the end, data from 8 subjects, including 2 females and 6 males, were collected. The short-time Fourier transform was applied to the raw EEG data in order to find the power distribution of each frequency band, and the stationary wavelet transform was used for artifact removal. Finally, two datasets (artifactual and artifact-free) were obtained from each subject and classified by K-NN and SVM. The best result was obtained by K-NN with K = 1, and using the SWT increased the average classification rate for both K-NN and SVM. Using k-nearest neighbors as an instance-based classification algorithm, we managed to classify the relaxation and fear emotions with an average accuracy of 94.208% when K = 1. To find the best EEG feature for classifying fear and relaxation, we compared the classification results for the power spectrum features; the best single-band results were obtained using the Gamma band, although an even better result was achieved using the combination of all EEG frequency bands between 4 and 50 Hz. The classification accuracy was also estimated for SVM, which achieved an average classification accuracy of 90.816%.
Future development of this study will focus on carrying out the experiment with more participants and more trials. We would also like to apply other classification algorithms and compare their results with those obtained by SVM, so as to find the most adequate classifier.