Online multiclass EEG feature extraction and recognition using modified convolutional neural network
(IJECE, http://ijece.iaescore.com)

Abstract (2021)
Many techniques have been introduced to improve both brain-computer interface (BCI) stages: feature extraction and classification. One of the emerging trends in this field is the implementation of deep learning algorithms, yet only a limited number of studies have investigated their application to electroencephalography (EEG) feature extraction and classification. This work applies deep learning to both stages: feature extraction and classification. This paper proposes a modified convolutional neural network (CNN) feature extractor-classifier algorithm to recognize four different EEG motor imagery (MI) gestures. In addition, a four-class linear discriminant analysis (LDA) classifier model was built and compared to the proposed CNN model. The proposed model achieved very good results, with 92.8% accuracy for one four-class EEG MI set and 85.7% for another set. The results showed that the proposed CNN model outperforms multi-class LDA with an accuracy increase of 28.6% and 17.9% for the two MI sets, respectively. Moreover, majority voting over five repetitions introduced an accuracy advantage of 15% and 17.2% for the two EEG sets, compared with single trials. This confirms that increasing the number of trials for the same MI gesture improves the recognition accuracy.

However, the proposed model was a binary classifier. Djamal et al. [21] combined wavelet transformation and CNN to identify stroke vs. no-stroke occurrence (a binary classifier) with 90% accuracy. Ali et al. [22] used deep learning to diagnose autism spectrum disorder, reaching 80% accuracy. Djamal et al. [23] combined wavelet transformation with a recurrent neural network (RNN): the wavelet transform extracts time-frequency features and the RNN serves as the classifier. The goal was to classify four directional movements of a drone in addition to the focus vs. non-focus status. The obtained accuracy was 79.6%.
Schirrmeister et al. [24] made an intensive study of deep convolutional neural networks with different structures and showed that deep learning classification performed as well as filter bank common spatial patterns (FBCSP) in terms of accuracy (84% vs. 82.1%). Tabar and Halici [25] implemented a combined system of a CNN and a stacked autoencoder, using time, frequency and location information to classify a two-class EEG dataset of BCI competition IV. The dataset was obtained by capturing electrocorticography (ECoG) signals during the flexion of five fingers and reached 90% accuracy, because ECoG has a higher signal-to-noise ratio than EEG. Asim et al. [26] used combined machine-learning feature extraction and an SVM classifier to detect Alzheimer's disease (binary classification) and achieved 94% accuracy. Ajaj et al. [27] implemented the short-time Fourier transform (STFT) to reduce the size of the data entering the neural network, reaching 99.8% accuracy using a weighted KNN approach. However, the tested approach was a two-class problem (focus vs. non-focus). Niroshana et al. [28] used fine-grained segments in multi-scale entropy along with a CNN to determine the sleep stage of the subject; the proposed method reached an average accuracy of 92.2%. Doborjeh et al. [29] used spiking neural networks (SNN) to classify customers' attention to a stimulus as a binary classification problem, with 89.9% accuracy. Hajinoroozi et al. [30] used a novel channel-wise CNN to detect the driver's attention status, reaching 84.4% accuracy. Hussein et al. [31] used an LSTM along with a fully connected NN to detect the occurrence of epileptic seizures, reaching 100% accuracy. Ieracitano et al. [32] converted the power spectral densities (PSD) of 19-channel EEG signals into a gray-scale image and then applied a CNN to classify the severity stage of dementia. They reached 89.8% accuracy in binary classification and 83.3% accuracy in three-class classification.
Deep learning techniques aim to extract high-level features from a large number of time samples; such features cannot be extracted directly from an input matrix with a high number of samples, and the classification task relies mainly on the accuracy of those features. Deep learning is usually applied to image classification and speech recognition. The CNN is one major branch of deep learning techniques that has been widely used in image processing, but it has only recently been introduced to the field of EEG-based BCI. Therefore, this paper proposes a modified CNN model to extract EEG features and then feed these features to a neuro-classifier to classify unknown multi-class EEG gestures. This paper introduces a modified CNN feature extraction and classification algorithm able to classify four different EEG gestures. This work used online-captured EEG as the data source, which guarantees that the proposed model will integrate efficiently with BCI applications, especially demanding real-time applications. This paper is organized as follows. In section 2.1, the proposed modified CNN feature extractor structure is explained. Then, in section 2.2, the shallow neural network classifier structure is described. For the purpose of performance comparison with EEG classification algorithms that solve more than two classes, the multi-class LDA is revised in section 2.3. In section 3, the EEG session recording process is described; in addition, the CNN classification results are discussed and compared to the multi-class LDA algorithm.

2. RESEARCH METHOD
2.1. Proposed CNN model
As mentioned in section 1, deep learning has been used in image processing and classification. In brief, the CNN is composed of the following layers [19]:
− Convolution layer(s)
− Pooling layer(s)
− Shallow neural network layer(s)
The convolutional layer (CL) uses a kernel filter, which is convolved with the original array elements using the dot product operation. In computer vision, the data elements represent the pixels of the image, which is a two-dimensional array (for each of the red, green, and blue channels), while the kernel filter is 3×3 or 5×5 [32]. The function of the 2-D kernel filter is to detect both the edges and the points of change in the image domain. This is done by applying different kernel filter types, such as averaging, root mean square (RMS) and sharpening filters. Each of these kernel filters has one goal: to emphasize the differences between adjacent pixels and hence to extract a reduced-size set of features for an image. However, in EEG the structure of the data is different: the multi-channel readings are composed of T trials of M rows × N columns [33], where T is the number of EEG trials (for the same EEG gesture), M is the number of channels and N is the number of time samples per channel. The convolution layer applies a mean filter to each channel, as in (1):

x̄ = (1/n) Σ_{i=1..n} x_i     (1)

where x_i is the i-th time sample and n is the total number of samples over which the mean is calculated. The filter in (1) is applied to a window of size W_C, and the filter is then shifted by a step size (stp). Therefore, the column size N_2 of the convolved EEG input is given by:

N_2 = (N − W_C)/stp + 1     (2)

where W_C is the window size (in samples). In this paper, W_C is set to 128 samples. For a sampling rate of 128 samples/sec, this window is equivalent to 1 second. This time window was found (through experimental trials) to be sufficient to capture the EEG activity feature for a single channel.
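As an illustration, the sliding mean-filter convolution of (1) can be sketched in Python as follows. The step-size parameter `stp` and its default value are assumptions for illustration only; the paper fixes W_C = 128 but does not state the step explicitly.

```python
import numpy as np

def mean_conv(eeg, wc=128, stp=64):
    """Slide a mean filter of width wc over each channel (row) of the
    EEG array, shifting by stp samples each time, as in (1)-(2).
    eeg: (channels, samples) array for one trial.
    Note: stp=64 is an assumed default, not taken from the paper."""
    m, n = eeg.shape
    n2 = (n - wc) // stp + 1          # column size of the convolved output
    out = np.empty((m, n2))
    for j in range(n2):
        out[:, j] = eeg[:, j * stp : j * stp + wc].mean(axis=1)
    return out
```

For a 5-second trial at 128 samples/sec (640 samples per channel), the output has (640 − 128)/stp + 1 columns per channel.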
The output size of this layer is T × M × N_2. The second layer, Pooling layer 1, extracts the RMS value of the elements of each row within a window of size W_P = 64, according to (3):

x_rms = sqrt( (1/n) Σ_{i=1..n} x_ci² )     (3)

where x_ci is the i-th element in the output of the convolution layer and n is the selected window size, which is 64. The number of samples is thus reduced by a factor of 64, and the column size N_3 of the pooling layer output can be formulated as:

N_3 = N_2 / W_P     (4)

where N_2 is the column size of the convolutional layer output and W_P is the pooling layer window size. Therefore, the output size of Pooling layer 1 is 5×14×2. In the next layer, Pooling layer 2, the mean value of each window of two samples is calculated. The final feature matrix size for each class is 5×14, which is converted to a 1-D vector of 70 elements. The complete structure of the deep learning layers is shown in Figure 1. The output of Pooling layer 2 is fed to a shallow neural network as a training set. Each set is a vector of 70 feature elements, and the corresponding output labels are 0, 1, 2 and 3 for Class0, Class1, Class2 and Class3, respectively. The NN layer architecture is described in the next section. The hidden layer size might be reduced after weight updates, according to the eliminated weights. The nodes of the hidden layer are fully connected to both the input layer and the output layer nodes, with random initial weights. Because the desired output is either 0 or 1, each neuron has a unipolar activation function, the log-sigmoid, defined as:

f(x) = 1 / (1 + e^(−x))

Training phase: the neural network weight update is based on the back-propagation (BP) with momentum algorithm, where the weights are updated based on the error in the previous iteration (epoch). The weight change for the connections of each layer is calculated as [34]:

ΔW_n = η σ x^T + α ΔW_{n−1}

where η is the learning rate, σ is the error signal vector of a layer, x^T is the transposed input vector to that layer, and α is the rate at which the previous weight change (ΔW_{n−1}) contributes to the current update (ΔW_n).
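A minimal sketch of the RMS pooling described in (3), assuming non-overlapping windows (the paper does not state whether the pooling windows overlap):

```python
import numpy as np

def rms_pool(x, wp=64):
    """Pooling layer 1: RMS over non-overlapping windows of wp columns,
    applied independently to each row, as in (3)-(4)."""
    m, n = x.shape
    n_out = n // wp                           # N_3 = N_2 / W_P
    x = x[:, : n_out * wp].reshape(m, n_out, wp)
    return np.sqrt((x ** 2).mean(axis=2))
```

With a 14×128 convolved input per trial and W_P = 64, each row is reduced to 2 RMS values, matching the 5×14×2 output size stated above.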
To choose the best values for η and α, the learning phase is repeated for different values and the convergence time is observed. The learning error is calculated as the mean square error between the desired output and the NN classifier output. Then, upon evaluating the learning error, the following values are chosen based on the best error performance and shortest convergence time: η = 0.3, α = 0.6 and error_goal = 10^−2. In the training phase, the NN weights are updated at the end of each epoch until the error reaches the minimum stopping condition or a maximum of 1000 epochs is reached. Afterwards, the trained neural network weights are saved and used in the next stage, classification. In this stage, the unknown-class feature vector (70 elements) extracted by the deep learning stage is applied to the trained NN to obtain the classification result.
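The training procedure above can be sketched as follows. This is a single-layer illustration of the BP-with-momentum update using the stated η = 0.3, α = 0.6, error_goal = 10⁻² and 1000-epoch limit; the paper's network also contains a hidden layer, which is omitted here for brevity.

```python
import numpy as np

def logsig(x):
    """Unipolar log-sigmoid activation: f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + np.exp(-x))

def train(X, T, eta=0.3, alpha=0.6, error_goal=1e-2, max_epochs=1000, seed=0):
    """BP with momentum for one log-sigmoid layer (illustrative sketch).
    X: (features, samples) inputs; T: (outputs, samples) 0/1 targets."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(T.shape[0], X.shape[0]))  # random initial weights
    dW_prev = np.zeros_like(W)
    mse = np.inf
    for epoch in range(max_epochs):
        Y = logsig(W @ X)
        err = T - Y
        mse = float((err ** 2).mean())         # learning error per epoch
        if mse <= error_goal:                  # minimum stopping condition
            break
        sigma = err * Y * (1 - Y)              # error signal through f'(x)
        dW = eta * (sigma @ X.T) + alpha * dW_prev   # momentum update
        W, dW_prev = W + dW, dW
    return W, mse
```

The returned weights would then be saved and reused in the recognition stage without retraining.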
Recognition phase: after the NN is trained with the training set, the final weights are stored so that no training is needed for new inputs (unknown class). The unknown-class data set is fed as input to the first layer (the convolutional layer), and the same process is repeated (features extracted using deep learning). The final output is a feature vector of size 70×1. Each trial feature vector belongs to a predefined class; the trials are fed to the trained NN (within a loop), and the output of each loop represents the classification result, rounded to either 0 or 1. Since the NN is a binary classifier, a binary-tree classification strategy is proposed because the recognition task has more than two classes. The binary-tree classifier is depicted in Figure 2.
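The binary-tree strategy can be sketched as below. The grouping of classes at each node ({0,1} vs. {2,3}) is an assumption for illustration; the actual tree used is the one depicted in Figure 2 of the paper.

```python
def tree_classify(clf_root, clf_left, clf_right, x):
    """Combine three binary classifiers (each returning 0 or 1) into a
    four-class decision via a binary tree.
    Assumed grouping: root splits {0,1} vs {2,3} (illustrative only)."""
    if clf_root(x) == 0:                  # root: {0,1} vs {2,3}
        return 0 if clf_left(x) == 0 else 1
    return 2 if clf_right(x) == 0 else 3
```

Each `clf_*` stands for one trained binary NN evaluated on the 70-element feature vector.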

2.3. Multi-class LDA classifier
For the purpose of comparison with the proposed CNN model, the binary LDA is revised and then a multi-class LDA classifier model is derived according to [34]. In classification problems, the samples are normally not linearly separable; therefore, a projection onto another space is needed. The LDA calculates the projection vector w according to the generalized Rayleigh quotient [35]:

J(w) = (w^T S_B w) / (w^T S_W w)

where S_B is the between-class covariance matrix and S_W is the within-class covariance matrix. To solve for the w that maximizes J(w), the Fisher linear discriminant is used:

w = S_W^(−1) (m_1 − m_0)

where m_1 and m_0 are the means of class1 and class0, respectively. The binary LDA used with two-class EEG shows a good accuracy of 90% [36]. The LDA can be extended to m-class problems, where class_k is selected based on the minimum Euclidean distance:

class_k = argmin_k ‖ w^T x − w^T m_k ‖

where x is the unknown-class feature vector, w is the projection vector and m_k is the mean of class_k.
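A minimal sketch of the Fisher projection and the minimum-distance multi-class rule, with the within-class scatter estimated as the sum of the per-class covariances via `numpy.cov`:

```python
import numpy as np

def fisher_lda(X0, X1):
    """Binary Fisher LDA projection vector w = Sw^{-1} (m1 - m0).
    X0, X1: (n_samples, n_features) arrays for class0 and class1."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    return np.linalg.solve(Sw, m1 - m0)

def lda_classify(w, means, x):
    """Multi-class rule: choose the class whose projected mean is
    closest (Euclidean distance) to the projected sample."""
    d = [abs(w @ x - w @ mk) for mk in means]
    return int(np.argmin(d))
```

For more than two classes, the projection can be computed pairwise or via the multi-class scatter matrices; the minimum-distance rule above stays the same.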

3. RESULTS AND DISCUSSION
The EMOTIV EPOC+ headset is used to collect raw EEG signals. The headset supports 14 channels, each with a sampling rate of 128 samples/sec [37]. The raw data is recorded using MATLAB 2019b. The output data from the headset is already filtered with a built-in 5th-order digital sinc band-pass filter (0.16–43 Hz), with two digital notch filters at 50 Hz and 60 Hz. Each of the proposed classifiers (CNN and LDA) operates in two phases. The training phase: the test subject is asked to rest his arms and whole body for 15 minutes before the recording session begins. A cue on the screen indicates to the subject the start of a record. Each EEG trial record lasts 5 seconds. During the trial capture period, the subject performs an EEG gesture according to Class0 (rest). A stopping cue indicates the end of the trial recording. The subject then repeats the same class gesture for the remaining four trials using the same procedure, for a total of five recorded trials of the same class. These trials are used as the training data for Class0, which is fed to the classification algorithm. The same recording procedure applies to the remaining three classes: Class1, Class2 and Class3. The recognition phase: the same as the training phase, except that the subject picks one of the four gestures on a random basis, for three repetitions only. The recorded data is fed to the classification algorithm as an unknown-class gesture. The EEG training and recognition trial recording procedures are summarized in Table 2. To calculate the classification accuracy, the subject performs a set of random actions; a total of 100 trials are tested. The performance accuracy of the classification algorithm is calculated as:

accuracy (%) = (number of correctly classified trials / total number of trials) × 100

Figure 3 shows the proposed CNN model classification accuracy, calculated for each set of EEG gestures: arm rest, both-eyes blinking, left-eye blink and right-eye blink. Each class has 4 trial sets; each set is classified while the other three sets are used as training sets.
Within the same set, the classification accuracy is calculated for three options: single, three and five trials. The class/trials with the highest accuracy are shown. Finally, the average accuracy is calculated over all four classes, with the best set accuracy chosen. This accuracy is calculated for single, majority-3 and majority-5 repetitions, respectively. The same description applies to Figure 4, using the multi-class LDA.
A comparison between Figures 3 and 4 shows that the CNN algorithm outperforms the LDA for the same set of EEG gestures, with a 27.1% enhancement in average accuracy. To validate the performance of the proposed CNN model, another MI motion set is chosen (arm rest, eyebrows up and down, blinking both eyes, and both hands opening and closing). The CNN still performs better than the multi-class LDA, with a margin of 15.55%, as shown in Figures 5 and 6, respectively. This complies with the fact that LDA is best suited for binary classification problems, while a CNN can be efficiently adapted and trained for multi-class BCI applications. In addition, increasing the number of trial repetitions for the same gesture significantly increases the classification accuracy. For example, in Figure 3, there is an average accuracy increase of 15% when five gesture repetitions are used against a single repetition. Majority voting [20], [38] is used to decide for the class with the most frequent occurrence in each 5 consecutive trials. Therefore, there is a very good accuracy advantage when using repeated gestures combined with majority voting; for example, for an output vector in which class 2 occurs in three of the five trials, the winning output class is 2 according to the majority voting rule. When comparing the individual class accuracies of Figures 3 and 4, the right-eye EEG gesture has the highest accuracy compared to the other gestures. This is caused by the fact that the test subject's EEG signal related to the right eye is stronger than the EEG related to the other gestures, which leads to a better feature vector and, consequently, more accurate classification. In addition, in Figure 5, both-eyes blinking has the highest average accuracy. The average computation time is calculated using the MATLAB command cputime, which returns the total CPU time elapsed by the algorithm code. The average computation time for the CNN classifier is 7.9 seconds, compared to 0.35 seconds for the LDA.
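The majority-voting rule over repeated trials can be sketched as:

```python
from collections import Counter

def majority_vote(outputs):
    """Winning class = most frequent label among the repeated-trial
    classifier outputs (e.g. five consecutive trials)."""
    return Counter(outputs).most_common(1)[0][0]
```

For instance, if class 2 appears in three of five trial outputs, `majority_vote` returns 2.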
Although the LDA classifier has a lower computation time, the CNN classifier provides better accuracy for the same EEG data set. Table 3 shows that the proposed algorithm outperforms the solutions proposed by the most related works in the field that use the same number of classes and implement a CNN technique. Table 4 compares the average accuracies of both the proposed CNN model and the multi-class LDA, using the two MI sets. It shows that the CNN performance is better than the LDA for the same set and the same number of trials. This table also indicates that the performance using MI set 1 is better than set 2.

4. CONCLUSION
In this paper, a multi-class EEG convolutional neural network classification algorithm for motor imagery is proposed. This classifier is adapted from image classification to the EEG recognition field, with major modifications so that it can fit the stochastic nature of EEG signals. This work used the EMOTIV EPOC headset to capture online EEG data, instead of using publicly available EEG datasets. This is an important feature that makes this model suitable for online BCI applications, such as moving a disabled person's wheelchair or manipulating an artificial arm. In addition, this model achieved a small computation time, which is an advantage in real-time applications; it could be further reduced using more powerful on-board processing units. The algorithm is capable of labeling four different classes with a very good average accuracy of 92.8% for one EEG set and 85.7% for another set. This result outperforms many similar proposed DL techniques in terms of accuracy and number of classes. In addition, a four-class LDA classifier model was implemented based on an extension of the binary LDA, and its accuracy was compared to the proposed CNN model. The results show that the CNN classifier outperforms the LDA with a margin of 27.1% and 15.5% for set 1 and set 2, respectively. In addition, it has been shown that majority voting combined with repeated trials significantly improves the classification accuracy: the improvement is 15% and 17.2% for the two sets when using majority-5 voting against a single trial.
In addition, it has been found that certain EEG MI gestures provide higher individual classification accuracy than others. For example, right-eye blinking (for set 1) and both-eyes blinking (for set 2) give higher recognition rates than the other gestures. This outcome corresponds to the test subject's EEG nature and might differ for other subjects; therefore, a future investigation on more subjects is required to determine a general selection rule for the best EEG gestures. Although the CNN classifier model's computation time is larger than that of the LDA, the CNN classifier model has better classification accuracy for the same EEG set, which makes it better suited for highly accurate BCI application demands. The importance of a multi-class EEG classifier over a binary classifier is that it allows the implementation of varied BCI application tasks. For example, the algorithm presented in this work can drive a robot in four directions: forward, backward, left, and right.
The proposed algorithm, like other BCI-EEG classification methods, suffers from performance degradation when the number of classes increases. This can be mitigated by using combined time-frequency features of selected EEG channels (channels with maximum entropy values could be used); those features are then fed to the CNN algorithm described earlier, with minor modifications. Another limitation of the algorithm is that the training EEG data for each class must be recorded and stored in advance. This must be done only once for each subject; if a new person wants to use the same EEG-MI system, the training operation must be repeated. However, the algorithm is still capable of classifying new EEG data for the same subject as long as the same user is using the trained BCI system. As a future improvement to the current work, common spatial patterns (CSP) could be used to select the channels with the maximum signal-to-noise ratio. This could enhance the overall accuracy and allow the number of recognized classes to be increased. In addition, channel mutual information could be used as another criterion for channel selection. Furthermore, combined time-frequency features can be extracted and used with the same proposed CNN structure with minor modifications; the wavelet transform can also be used in this context. Then, the channels with the highest mutual information can be selected, and the other channels that have a noise-like contribution can be neglected.