A Novel Use of Discrete Wavelet Transform Features in the Prediction of Epileptic Seizures from EEG Data
Cyrille Feudjio (a,*), Victoire Djimna Noyum (a), Younous Perieukeu Mofendjou (a), Rockefeller (b), and Ernest Fokoué (c,*)

(a) School of Mathematical Sciences, African Institute for Mathematical Sciences, Crystal Gardens, Limbe, Cameroon
(b) School of Mathematical Sciences, Stellenbosch University, South Africa
(c) School of Mathematical Sciences, Rochester Institute of Technology, Rochester, NY 14623
(*) Corresponding author
Keywords: Feature Extraction, DWT, MFCC, EEG signals, Epileptic seizures
Abstract
This paper demonstrates the predictive superiority of the discrete wavelet transform (DWT) over previously used methods of feature extraction in the diagnosis of epileptic seizures from EEG data. Classification accuracy, specificity, and sensitivity are used as evaluation metrics. We specifically show the immense potential of two combinations (DWT-db4 combined with SVM, and DWT-db2 combined with RF) as compared to others when it comes to diagnosing epileptic seizures, on either the balanced or the imbalanced dataset. The results also highlight that MFCC performs worse than all the DWT variants used in this study, and that the mean differences are statistically significant in both the imbalanced and balanced datasets. Finally, in both the balanced and the imbalanced datasets, the feature extraction techniques, the models, and the interaction between them have a statistically significant effect on the classification accuracy.
1. Introduction
Nowadays, people face various kinds of stress in their daily lives, and many people around the world suffer from a range of neurological disorders. Epilepsy affects up to 1% of the population, making it, alarmingly, the third most common encephalopathy (Usman et al., 2017). It can affect males and females of all races, ethnic backgrounds, and ages. Approximately 50 million people worldwide suffer from epilepsy, and 90% of them are from developing countries (Kandar et al., 2012). It is not one disorder but rather a syndrome with widely divergent symptoms involving episodic abnormal electrical activity within the brain. Patients with epilepsy can be treated with medication or surgical procedures (Guenot, 2004). However, these methods are not fully effective. Unfortunately, seizures that cannot be fully treated medically limit the patient's active life; in these cases, patients cannot work independently or perform certain activities. This ends in social isolation and economic difficulties. However, early prediction of epileptic seizures can provide sufficient warning time before they occur. Considerable effort has been put in place by researchers and institutions to make this possible, but interestingly, the main cause of epilepsy remains a mystery. Only an early diagnosis poses as a secure and plausible way to treat it. Therefore, several methods have been developed to detect an epileptic seizure before it starts. Machine learning models are used for this task, which incorporates electroencephalography (EEG) signal acquisition and preprocessing, feature extraction from the signals, and finally, classification between different seizure states. Electroencephalography may be a useful method to watch the nonlinear electrical function of the brain's nerve cells; hence, it is a valuable tool for epilepsy evaluation and treatment (Wang et al., 2013).

Feature extraction, which involves tidying the data, is usually said to represent where 80% of the time is spent working on a data science project. In this case, for instance, preprocessing and feature extraction from EEG signals have an excellent effect on maximizing prediction time.

⋆ This document is the result of a research project funded by AIMS CAMEROON with the help of the Mastercard Foundation. In this work, we demonstrate the predictive superiority of the discrete wavelet transform over previously used methods of feature extraction in the diagnosis of epileptic seizures from EEG data.
∗ Corresponding authors: [email protected] (C. Feudjio); [email protected] (V.D. Noyum); [email protected] (Y.P. Mofendjou); [email protected] (Rockefeller); [email protected] (E. Fokoué)
The literature (Rasekhi et al., 2013; Teixeira et al., 2014; Bandarabadi et al., 2015; Zandi et al., 2013) states that no machine learning model provides a universally reliable method for pre-processing and feature extraction. Instead, each of these processes is tailored to specific problems, which makes them indispensable before building the model. Therefore, this project aims to look at the predictive effect of feature extraction methods on the EEG dataset, especially in the case of epileptic seizures.

Some researchers conducted studies to detect the phase of seizures through EEG signal processing, as reported in the study of (Ullah et al., 2018). In that study, the seizure phases, called pre-ictal, ictal, inter-ictal, and post-ictal, were classified in order to analyze the differences in characteristics of each phase. The signal processing was done in the time, frequency, or time-frequency domains. Furthermore, another important study addressed early detection of the phase before seizures to supply an alarm to epilepsy patients, as reported in the study of (Saputro et al., 2019). In that research, they detected the kind of seizures as opportunities and challenges to help the neurologist in classifying the seizure from EEG recordings.

(Golmohammadi et al., 2017) conducted an epileptic EEG signal processing simulation to differentiate the types of seizures. The seizure types studied in that research were generalized non-specific seizure, non-specific seizure, and tonic-clonic seizure. The methods utilized were Mel Frequency Cepstral Coefficients (MFCC), the Hjorth Descriptor, and Independent Component Analysis (ICA) for feature extraction, while a Support Vector Machine (SVM) was used as the classifier.

(Mursalin et al., 2017) presented a hybrid approach where features from the time and frequency domains were analyzed to detect epileptic seizures from EEG signals. They started by applying an Improved Correlation-based Feature Selection (ICFS) method to capture relevant features from the time domain, the frequency domain, and entropy-based features. Then, the classification of the selected features was done by an ensemble of Random Forest (RF) classifiers. Results revealed that the proposed method performed better than the conventional correlation-based method and some other state-of-the-art methods of epileptic seizure detection.

An automatic epilepsy diagnosis framework based on the combination of multi-domain feature extraction and nonlinear analysis of EEG signals was proposed by (Wang et al., 2017). EEG signals were pre-processed using the wavelet threshold method to remove the artifacts, and representative features in the time domain, frequency domain, time-frequency domain, and nonlinear analysis features were extracted. The optimal combination of the extracted features was identified and evaluated via different classifiers. Experimental results demonstrated that the proposed epileptic seizure detection method can achieve a high average accuracy of 99.25%.

Keeping in mind the fact that preprocessing of the EEG signals can improve prediction sensitivity and average anticipation time, (Saputro et al., 2019) proposed an efficient machine learning method for epilepsy prediction. In their research, they classified three sorts of seizures: Generalized Non-Specific Seizure (GNSZ), Focal Non-Specific Seizure (FNSZ), and Tonic-Clonic Seizure (TCSZ).
They used a combination of three feature extraction methods: Mel Frequency Cepstral Coefficients (MFCC), the Hjorth Descriptor, and Independent Component Analysis (ICA). The most effective result was obtained by combining MFCC and Hjorth descriptors, which detected the seizure type with 91.4% average accuracy.

(Paul, 2018) proposed a method for automatic seizure detection based on the mean and minimum value of energy. The algorithm was tested on the CHB-MIT database on three subjects, with 60% and 40% of the data used as training and test data, respectively. They obtained an average detection accuracy of 99.81%.

(Bandarabadi et al., 2015) proposed an algorithm to predict epileptic seizures which can extend the lifetime of epilepsy-affected patients. They extracted spectral power features, and after an appropriate selection, these features were passed into Support Vector Machines for classification. They observed a sensitivity of 75.8% and concluded that reducing the proposed feature subset can improve seizure prediction performance.

(Teixeira et al., 2014) proposed a model for the prediction of epileptic seizures by choosing six channels of EEG signals and extracting 22 linear univariate features for each channel. They tested their model for prediction by varying multiple combinations of electrodes and also with four different pre-ictal state durations. They used three classifiers and approximately predicted every seizure. After selecting suitable features, the training data was fed into a Support Vector Machine for training, then the test data was passed in to determine classification accuracy and sensitivity. They observed a sensitivity of 75.8% for detecting seizures.

Many researchers used EEG signals to detect the beginning of the pre-ictal state of epilepsy. However, only a few
have reliably detected it. (Rasekhi et al., 2013) proposed an algorithm for seizure prediction with the help of univariate linear features. They used six EEG channels in their proposed model and extracted 22 univariate linear properties. A Support Vector Machine was used as a classifier to classify the pre-ictal and ictal states of EEG signals. On average, the prediction sensitivity after applying this algorithm was 73.90%.

(Gadhoumi et al., 2012) used a wavelet method for the prediction of seizures. They extracted features including wavelet energy and wavelet entropy. Two or three channels were selected for testing purposes on a dataset of six patients. Sensitivity was reported as 88%, with a mean anticipation time of twenty-two minutes.

(Bhople and Tijare, 2012) proposed an epileptic seizure detection method using the Fast Fourier Transform (FFT). The FFT-based features were extracted and fed to neural networks. A Multilayer Perceptron (MLP) and a Generalized Feed-Forward Neural Network (GFFNN) were used as classifiers. The algorithm was tested on the Bonn database, and the results show they were able to achieve 100% accuracy.

(Acharya et al., 2012) designed a method for the detection of three states of the EEG signal (normal, pre-ictal, and ictal conditions) from recorded EEG signals. They combined features from two domains, the time domain and the frequency domain, and found that this combined-features method performs well in situations where the signal has a nonlinear and non-stationary nature.
Early detection is an important step in assisting people with epilepsy to take preventive measures against the upcoming manifestation of the disease/disorder, such as finding a secure place before the seizures occur. Classification of seizures could be a significant milestone in the journey toward a potential or proper treatment and, if possible, prognosis prediction. In this regard, several automatic methods for detecting epileptic activity have been proposed recently (Wang et al., 2017). Most of them use Fourier spectral analysis for the extraction of EEG signals under the assumption that EEG signals are stationary (Polat and Güneş, 2007), allowing the transformation of signals from the time domain to the frequency domain. Also, wavelet transformation approaches for time-frequency estimation are generally interesting. For instance, the Discrete Wavelet Transform (DWT) method, a classical method of time-frequency analysis similar to the Short-Term Fourier Transform, has been used to extract features from EEG signals (Acharya et al., 2012). In addition to the extraction of time-frequency characteristics, nonlinear analysis of EEG signals has also received considerable attention for detecting seizures, which can be considered as a transition of the human brain (Gajic et al., 2015). There are also several discrete wavelet transformations using multi-domain characteristics and nonlinear analysis to improve the performance of EEG seizure detection. Besides that, other feature extraction methods such as Mel Frequency Cepstral Coefficients (MFCC), the Hjorth descriptor, and Independent Component Analysis (ICA) could also be genuinely used. All of these methods are designed to remove redundant and irrelevant features so that the classification of new instances becomes more accurate. Researchers continue to explore these methods because the accuracy or sensitivity of classification models is highly dependent on the features used for prediction. Therefore, our contribution is to establish, or at least intelligently speculate on, the predictive powers of the feature extraction methods used. Tackling this will help us to:

• Learn about different machine learning methods that can be combined with the feature extraction process and interpret the outcomes, to build a kind of hybrid model that could hopefully generalize well.

• Potentially build a whole method that works well on the EEG dataset and could be extended to other domains where time series or wave signals are used.

• In the long run, build a package to make the whole process (feature extraction, fitting, and evaluating models) easy to use through dialogue boxes, both for medical purposes and for social good.

Contextually, the focus throughout the study will be on MFCC and the three best wavelets as feature extraction methods.
2. Feature Extraction Techniques
Feature extraction techniques are methods that select and/or combine variables into relevant features, effectively reducing the quantity of information that has to be processed, while still accurately and completely describing the original dataset. In this section, we present two feature extraction techniques used on the EEG dataset, namely the Wavelet Transform and the Mel Frequency Cepstral Coefficient (MFCC).
2.1. Wavelet Transforms
A wavelet $\psi(t)$ is a small wave, which must be oscillatory in some way to discriminate between different frequencies (Merry, 2005). It allows complex information content to be decomposed into elementary form at different positions and scales, and subsequently reconstructed back again with high accuracy. Figure 1 shows some examples of possible wavelets.

Figure 1: Examples of wavelet functions
The Wavelet Transform can operate continuously (CWT) or discretely (DWT). Given a time-domain signal $f(t)$ and a wavelet function $\psi(t)$, the Continuous Wavelet Transform is defined by equation (1):

$$\Psi_f^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{+\infty} f(t)\, \psi^{*}\!\left(\frac{t-\tau}{s}\right) dt, \qquad (1)$$

where $\tau$ and $s$ represent the translation and scale parameters respectively, while $\psi(t)$ is called the mother wavelet. The symbol $*$ indicates that, in the case of a complex wavelet, the complex conjugate is used. By discretizing these parameters, the DWT is obtained (al Qerem et al., 2020).

Several transforms, namely the Discrete Wavelet Transform (DWT), the Discrete Fourier Transform (DFT), Singular Value Decomposition (SVD), Empirical Mode Decomposition (EMD), and their variants, are widely used for seizure detection and prediction applications. Although many other time-frequency feature engineering approaches are prevailing for signal processing, such as EMD, SVD, ICA, and PCA (al Qerem et al., 2020), DWT-based wavelet feature analysis is identified as effective for time-frequency domain analysis because of its multiscale approximation feature. Another highlight of DWT-based feature engineering is that it is employed both for signal noise reduction as pre-processing and for feature extraction. The main characteristic of DWT, which makes it the most effective method for the analysis of EEG signals, is its resolution in frequency and time. This property leads to optimality status for frequency-time resolution (Chen et al., 2017). However, there exist many families of DWT, described below.

There are many types of DWT, which are considered as mathematical and statistical functions. These types are divided into families according to frequency components. Seven different types of common wavelets appear in the literature (John Martin et al., 2018): Discrete Meyer (dmey), Reverse biorthogonal (rbio), Biorthogonal (bior), Daubechies (db), Symlets (sym), Coiflets (coif), and Haar (haar) (al Qerem et al., 2020).

Four main factors have a direct impact on discrete wavelet transform (DWT) performance: the DWT coefficient feature, the mother wavelet, the frequency band, and the decomposition level. As mentioned in (Zhang et al., 2017), based on classification accuracy and computational time, it was found that the Coiflet of order 1 (coif1) is the best wavelet family for the analysis of EEG signals. According to (John Martin et al., 2018), this argument is being challenged by many researchers. They therefore recommend the Haar and the second- and fourth-order Daubechies (db2, db4) wavelets for signal preprocessing and feature extraction, since these wavelets provided better accuracy in recent classifications. Figure 2 presents the different wavelet families and their associated mother wavelets.
Figure 2:
Wavelets Families (al Qerem et al., 2020)
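As a quick orientation, these families can be enumerated programmatically. The sketch below assumes the PyWavelets library; the paper does not name its toolkit, so this is an illustrative choice:

```python
import pywt

# Print the built-in wavelet families and a few members of each.
for family in pywt.families():
    print(f"{family}: {pywt.wavelist(family)[:5]}")

# Inspect the three wavelets compared in this study.
for name in ("db2", "db4", "coif1"):
    w = pywt.Wavelet(name)
    print(name, "- filter length:", w.dec_len, "- orthogonal:", w.orthogonal)
```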
The DWT-based methodology comprises three critical steps, which are described below.

1. Wavelet Threshold De-Noising:
Generally, physiological signals are nonlinear and contaminated (Wang et al., 2017): this can be due to background noise around the facilities, the disposition of the electrodes, the mobility of the patient during the recording, etc. Removing noise is, therefore, an important step. The wavelet threshold method can perform well in denoising non-stationary EEG signals (John Martin et al., 2018). The word "noise" is a standard term in signal processing, but in EEG signal processing noise takes the form of sharp waves that are not significant for identification (John Martin et al., 2018). Thus, getting rid of some frequency bands appearing within the decomposed bands by the use of the wavelet threshold becomes a critical step to unveil the relevant features of the raw signal. The threshold is expressed as (Gilda and Slepian, 2019):

$$\lambda = \sigma \sqrt{2 \log N}, \qquad (2)$$

where $\lambda$ is the wavelet threshold, $\sigma$ is the standard deviation of the noise, and $N$ is the length of the sample signal. The denoised EEG signal facilitates the extraction of more distinguishable features than the original signal, especially for epileptic event detection (John Martin et al., 2018). A minimal de-noising sketch is given below.
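The sketch assumes PyWavelets; the median-based estimate of the noise level $\sigma$ from the finest detail band is a common convention (Donoho's rule) rather than something specified in the paper:

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db4", level=4):
    """De-noise a 1-D signal with the universal threshold of Eq. (2)."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Estimate sigma from the finest detail coefficients (MAD rule;
    # an assumed choice, since the paper does not specify the estimator).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    lam = sigma * np.sqrt(2 * np.log(len(signal)))          # Eq. (2)
    # Soft-threshold every detail band; keep the approximation untouched.
    denoised = [coeffs[0]] + [pywt.threshold(c, lam, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[: len(signal)]
```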
2. Wavelet Decomposition:

The process of wavelet decomposition is described below:

• The raw EEG signal first goes through a filter pair, a combination of a high-pass filter (HPF) and a low-pass filter (LPF), to obtain the requested result. This process is categorized as the first level, which produces two corresponding sets of coefficients: the Approximation (A) and the Detail (D).

• The output of the low-pass filter (LPF) goes into another filter pair, and so on, up to level 4, as shown in Figure 3.

Note that the process continues over multiple levels, each level operating on the approximation coefficients from the previous one. At each level, the frequency resolution is doubled by the filters while the time resolution is halved, as the signal is down-sampled by two.
Figure 3: Four-level EEG signal decomposition (Gilda and Slepian, 2019)

3. Features Extraction:
Multi-Resolution Analysis (MRA) is used to extract feature vectors from the signal data. Commonly, when DWT is used as a feature extraction method, the extracted features for classification include the mean average value, standard deviation, energy, and spectral entropy. Below are the formulas to compute these quantities (Ahammad et al., 2014).

• The variance can be defined as the deviation of the signal from its mean. It is given as

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \qquad (3)$$

• The mean signal energy of seizure data generally tends to be higher than that of normal data, due to higher amplitudes. It is given as

$$E = \frac{1}{N} \sum_{i=1}^{N} x_i^2 \qquad (4)$$

• The power spectral density is calculated in two steps: first, by finding the fast Fourier transform $X(w_i)$ of the time series, and then taking the squared modulus of the FFT coefficients:

$$P(w_i) = \frac{|X(w_i)|^2}{N} \qquad (5)$$

From this, the maximum and minimum values are used.

• Entropy is the measure of randomness and the information content of a signal. To calculate the entropy of a given EEG signal, the Shannon entropy formula is used:

$$ENT = -\sum_{i=1}^{N} x_i \log(x_i) \qquad (6)$$

• The interquartile range of the EEG signal gives the statistical dispersion of the signal, which is a measure of how squeezed or stretched a distribution is. To calculate the interquartile range, the signal is divided into two parts, one containing values lower than the median and another containing values higher than the median; the interquartile range is then the difference between the median of the upper half and that of the lower half.

• Kurtosis is used to measure how much weight the tail of the probability distribution has. Hence, the presence of spikes is indicated by increased kurtosis. Kurtosis can be calculated by

$$K = \frac{\sum_{i=1}^{N} (x_i - \mu)^4}{N \sigma^4} \qquad (7)$$

where $x_i$ denotes the time series of an EEG data set and $N$ the number of samples in the signal. A minimal sketch of these per-band computations is given below. The second feature extraction method that we are going to use in our implementation is the Mel Frequency Cepstral Coefficient (MFCC), described in the next subsection.
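Before turning to MFCC, here is a minimal sketch of how such per-band features could be computed with NumPy/SciPy and PyWavelets; the exact feature set and normalizations used in the study may differ (the use of normalized magnitudes as pseudo-probabilities in the entropy is one common reading of Eq. (6)):

```python
import numpy as np
import pywt
from scipy import stats

def dwt_features(signal, wavelet="db4", level=4):
    """Compute per-sub-band statistics of the kind listed above (Eqs. 3-7)."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)   # [A4, D4, D3, D2, D1]
    features = []
    for band in coeffs:
        p = np.abs(band) / np.sum(np.abs(band))           # pseudo-probabilities
        features += [
            np.var(band),                                 # variance, Eq. (3)
            np.mean(band ** 2),                           # mean energy, Eq. (4)
            -np.sum(p * np.log(p + 1e-12)),               # Shannon entropy, Eq. (6)
            stats.iqr(band),                              # interquartile range
            stats.kurtosis(band),                         # kurtosis, Eq. (7)
            band.max(), band.min(),                       # extrema of the coefficients
        ]
    return np.array(features)
```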
2.2. Mel Frequency Cepstral Coefficient
MFCC was originally designed to study real speech registered by human ears, but it can process quasi-stationary signals in the form of sound signals and EEG signals (Nguyen et al., 2012). This method is widely used in EEG signal processing with high accuracy (Othman et al., 2009). The methodology for its implementation is described below (Figure 4).
Figure 4:
Extraction flowchart of MFCC coefficients (Gong et al., 2015)
According to (Ren et al., 2018), the steps of the MFCC are as follows:

1. Pre-emphasis and Framing: The signal is passed through a filter that emphasizes higher frequencies and is thereafter divided into $N$ frames.

2. Hamming window: The Hamming window is applied to each frame to obtain windowed frames, and can be calculated as follows:

$$y(k) = x(k) \times H(k), \qquad (8)$$

where $H(k)$ is given by

$$H(k) = a - b \cos\left(\frac{2\pi k}{N-1}\right), \qquad k = 0, 1, \ldots, N-1, \qquad (9)$$

$N$ is the number of points in a frame, and $a$, $b$ denote the parameters of the Hamming window ($a = 0.54$, $b = 0.46$). $x(k)$ and $y(k)$ are respectively the input and output signals, while $H(k)$ is the Hamming window.

3. Changing from the time domain to the frequency domain:
The output of this step is called a spectrum. The frequency spectrum $F(w)$ of the $i$-th frame $x_i(n)$ can be calculated using the Fast Fourier Transform. The short-time power spectrum $|F(w)|^2$ can then be computed and filtered by a Mel-filter bank $B_{Mel}$. The mapping from a linear frequency $f$ to the Mel-frequency $f_{Mel}$ is

$$f_{Mel} = \delta \ln\left(1 + \frac{f}{\nu}\right), \qquad (10)$$

where $f$ and $f_{Mel}$ denote the linear frequency and the Mel-frequency respectively, and $\delta$, $\nu$ are parameters ($\delta = 2595$, $\nu = 700$).
4. Mel-frequency wrapping:
The log amplitude of the spectrum is mapped onto the Mel scale using a triangular filter bank. The output of the short-time power spectrum $|F(w)|^2$ passed through this Mel-filter bank is

$$\theta(M_m) = \ln\left[\sum_{k=1}^{N} |F(w_k)|^2 H_m(k)\right], \qquad m = 1, 2, \ldots, M, \qquad (11)$$

where $H_m(k)$ is the filter bank.

5. Cepstrum:
The Mel-spectrum coefficients are transformed using the Discrete Cosine Transform (DCT), producing fourteen cepstral coefficients for every frame. The Mel-Frequency Cepstrum $c_n$ can be calculated by applying the Inverse Discrete Cosine Transform (IDCT) in the Mel-frequency coordinate spectrum, which can be described by the following formula:

$$c_n = \sum_{k=1}^{M} M_k \cos\left(\frac{n (k - 0.5) \pi}{M}\right), \qquad n = 1, 2, \ldots, p, \qquad (12)$$

where $p$ is the dimension of the MFCC, $c_n$ denotes the $n$-th MFCC, and $p$ is less than the number $M$ of Mel filters.
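A compact way to reproduce this pipeline is the `librosa` library, assumed here for illustration (the paper's implementation is not named); the frame and hop lengths are illustrative choices, not values from the paper:

```python
import numpy as np
import librosa

fs = 173.61                  # sampling rate of the Bonn recordings (Hz)
eeg = np.random.randn(4097)  # placeholder for one 23.6 s EEG segment

# 14 cepstral coefficients per frame, as in step 5 above.
mfcc = librosa.feature.mfcc(y=eeg, sr=fs, n_mfcc=14, n_fft=256, hop_length=128)
print(mfcc.shape)            # (14, number_of_frames)
```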
3. Classification Techniques and Performance Evaluation
Classification is a supervised learning technique for categorizing a given set of data (structured or unstructured) into classes. The main goal of a classification problem is to identify the category/class under which a new observation will fall. In this section, seven classifiers will be used: Linear and Quadratic Discriminant Analysis, Naive Bayes, Support Vector Machine, K-Nearest Neighbor, Random Forest, and Gradient Boosting. The coming subsections present how they work as well as how their performances are assessed.
3.1. Linear and Quadratic Discriminant Analysis

Let us assume that the data set is defined by

$$\{(x_i, y_i)\}_{i=1}^{n}, \qquad (13)$$

where $n$ is the sample size, and $x_i$ and $y_i$ represent respectively the different feature vectors and the class labels. For simplicity, we note that $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. The objective of the following methods is to classify the data into $k$ classes using Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA).

To better understand how these two methods work, let us start by considering one-dimensional data, which means $x \in \mathbb{R}$, and just two classes, denoted $\mathcal{C}_1$ and $\mathcal{C}_2$. Let us denote by $G_1(x)$ and $G_2(x)$ the cumulative distribution functions of these two classes. We can derive the corresponding probability density functions by

$$g_1(x) = \frac{\partial G_1(x)}{\partial x} \qquad (14)$$

$$g_2(x) = \frac{\partial G_2(x)}{\partial x} \qquad (15)$$

Let us assume that the two classes have normal (Gaussian) distributions (Ghojogh and Crowley, 2019) and that the mean of the first class is smaller than that of the second ($\mu_1 < \mu_2$). Let us denote by $x^*$ the point where the probabilities of the two classes are equal. We can write $\mu_1 < x^* < \mu_2$, since we know that $\mu_1 < \mu_2$. This means that

$$\begin{cases} \text{if } x < x^*, & \text{then } x \text{ belongs to } \mathcal{C}_1 \\ \text{if } x > x^*, & \text{then } x \text{ belongs to } \mathcal{C}_2 \end{cases}$$

The probability $P_e$ of error in estimating the class to which $x$ belongs can be written as

$$P_e = P(x > x^*, x \in \mathcal{C}_1) + P(x < x^*, x \in \mathcal{C}_2) \qquad (16)$$
Using the conditional probability identity (17),

$$P(A, B) = P(A \mid B) \, P(B), \qquad (17)$$

we can rewrite (16) as

$$P_e = P(x > x^* \mid x \in \mathcal{C}_1) P(x \in \mathcal{C}_1) + P(x < x^* \mid x \in \mathcal{C}_2) P(x \in \mathcal{C}_2) \qquad (18)$$

The aim of these methods is to minimize $P_e$ by finding $x^*$. Using the definition of the cumulative distribution function, we can write

$$\begin{cases} P(x > x^* \mid x \in \mathcal{C}_1) = 1 - G_1(x^*) \\ P(x < x^* \mid x \in \mathcal{C}_2) = G_2(x^*) \end{cases} \qquad (19)$$

Denoting the prior probabilities of the two classes by $\sigma_1$ and $\sigma_2$, we have

$$\begin{cases} P(x \in \mathcal{C}_1) = \sigma_1 \\ P(x \in \mathcal{C}_2) = \sigma_2 \end{cases} \qquad (20)$$

By replacing (19) and (20) in (18), we obtain

$$P_e = \left[1 - G_1(x^*)\right] \sigma_1 + G_2(x^*) \sigma_2 \qquad (21)$$

Let us now take the derivative of $P_e$ for the sake of minimization:

$$\frac{\partial P_e}{\partial x^*} = -g_1(x^*) \sigma_1 + g_2(x^*) \sigma_2$$

If we set this derivative equal to zero, we obtain the relation

$$g_1(x^*) \sigma_1 = g_2(x^*) \sigma_2 \qquad (22)$$

where $g_i(x^*)$ and $\sigma_i$ are the likelihood (class-conditional) and prior probabilities, respectively. Now let us suppose that the data is multivariate with dimensionality $d$. The probability density function of the multivariate Gaussian distribution is given by

$$g(x) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left(-\frac{(x - \mu)^T \Sigma^{-1} (x - \mu)}{2}\right), \qquad (23)$$

where $x \in \mathbb{R}^d$, $\mu \in \mathbb{R}^d$ is the mean, $\Sigma \in \mathbb{R}^{d \times d}$ is the covariance matrix, and $|\cdot|$ is the determinant of the matrix. Replacing (23) in (22), we obtain

$$\frac{1}{\sqrt{(2\pi)^d |\Sigma_1|}} \exp\left(-\frac{(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)}{2}\right) \sigma_1 = \frac{1}{\sqrt{(2\pi)^d |\Sigma_2|}} \exp\left(-\frac{(x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2)}{2}\right) \sigma_2 \qquad (24)$$

This last equation is used for both the LDA and QDA methods.

3.1.1. Linear Discriminant Analysis (LDA)

In this case, we assume that the two classes have equal covariance matrices ($\Sigma_1 = \Sigma_2 = \Sigma$) (Ghojogh and Crowley, 2019). Therefore, equation (24) becomes

$$\exp\left(-\frac{(x - \mu_1)^T \Sigma^{-1} (x - \mu_1)}{2}\right) \sigma_1 = \exp\left(-\frac{(x - \mu_2)^T \Sigma^{-1} (x - \mu_2)}{2}\right) \sigma_2$$
By taking the logarithm of both sides, we obtain

$$-\frac{(x - \mu_1)^T \Sigma^{-1} (x - \mu_1)}{2} + \ln(\sigma_1) = -\frac{(x - \mu_2)^T \Sigma^{-1} (x - \mu_2)}{2} + \ln(\sigma_2) \qquad (25)$$

The quadratic form on the left-hand side can be expanded and rewritten as

$$(x - \mu_1)^T \Sigma^{-1} (x - \mu_1) = x^T \Sigma^{-1} x + \mu_1^T \Sigma^{-1} \mu_1 - 2 \mu_1^T \Sigma^{-1} x, \qquad (26)$$

because $\mu_1^T \Sigma^{-1} x = x^T \Sigma^{-T} \mu_1$ and $\Sigma^{-T} = \Sigma^{-1}$, since $\Sigma^{-1}$ is symmetric. We can do the same with the right-hand side and obtain

$$(x - \mu_2)^T \Sigma^{-1} (x - \mu_2) = x^T \Sigma^{-1} x + \mu_2^T \Sigma^{-1} \mu_2 - 2 \mu_2^T \Sigma^{-1} x \qquad (27)$$

By replacing (26) and (27) in (25) and rearranging, we obtain

$$2 \left(\Sigma^{-1} (\mu_2 - \mu_1)\right)^T x + (\mu_1 + \mu_2)^T \Sigma^{-1} (\mu_1 - \mu_2) + 2 \ln\left(\frac{\sigma_2}{\sigma_1}\right) = 0 \qquad (28)$$

This last equation is the equation of a line. Thus, if we consider Gaussian distributions for the two classes where the covariance matrices are assumed to be equal, the decision boundary is linear. Because of this linearity, which discriminates the two classes, the method is called Linear Discriminant Analysis (LDA). If we define $\Gamma(x) : \mathbb{R}^d \longmapsto \mathbb{R}$ such that

$$\Gamma(x) = 2 \left(\Sigma^{-1} (\mu_2 - \mu_1)\right)^T x + (\mu_1 + \mu_2)^T \Sigma^{-1} (\mu_1 - \mu_2) + 2 \ln\left(\frac{\sigma_2}{\sigma_1}\right), \qquad (29)$$

then the class of an instance $x$ is estimated as

$$\hat{C}(x) = \begin{cases} \mathcal{C}_1 & \text{if } \Gamma(x) < 0 \\ \mathcal{C}_2 & \text{if } \Gamma(x) > 0 \end{cases} \qquad (30)$$

3.1.2. Quadratic Discriminant Analysis (QDA)

In this case, we do not assume that the two classes have equal covariance matrices, so $\Sigma_1 \neq \Sigma_2$ (Ghojogh and Crowley, 2019). Taking the natural logarithm of both sides of equation (24), we obtain

$$-\frac{1}{2} \ln(|\Sigma_1|) - \frac{(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)}{2} + \ln(\sigma_1) = -\frac{1}{2} \ln(|\Sigma_2|) - \frac{(x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2)}{2} + \ln(\sigma_2) \qquad (31)$$

According to (26), we can rewrite (31) as

$$-\frac{1}{2} \ln(|\Sigma_1|) - \frac{1}{2} x^T \Sigma_1^{-1} x - \frac{1}{2} \mu_1^T \Sigma_1^{-1} \mu_1 + \mu_1^T \Sigma_1^{-1} x + \ln(\sigma_1) = -\frac{1}{2} \ln(|\Sigma_2|) - \frac{1}{2} x^T \Sigma_2^{-1} x - \frac{1}{2} \mu_2^T \Sigma_2^{-1} \mu_2 + \mu_2^T \Sigma_2^{-1} x + \ln(\sigma_2) \qquad (32)$$

Let us multiply (32) by 2. After some rearrangement, we obtain

$$x^T \left(\Sigma_1^{-1} - \Sigma_2^{-1}\right) x + 2 \left(\Sigma_2^{-1} \mu_2 - \Sigma_1^{-1} \mu_1\right)^T x + \left(\mu_1^T \Sigma_1^{-1} \mu_1 - \mu_2^T \Sigma_2^{-1} \mu_2\right) + \ln\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right) + 2 \ln\left(\frac{\sigma_2}{\sigma_1}\right) = 0,$$

which is in quadratic form. Hence, if we consider Gaussian distributions for the two classes, the decision boundary of the classification is quadratic, which is why this method is called Quadratic Discriminant Analysis (QDA). Let us define $\Gamma(x) : \mathbb{R}^d \longmapsto \mathbb{R}$ such that

$$\Gamma(x) = x^T \left(\Sigma_1^{-1} - \Sigma_2^{-1}\right) x + 2 \left(\Sigma_2^{-1} \mu_2 - \Sigma_1^{-1} \mu_1\right)^T x + \left(\mu_1^T \Sigma_1^{-1} \mu_1 - \mu_2^T \Sigma_2^{-1} \mu_2\right) + \ln\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right) + 2 \ln\left(\frac{\sigma_2}{\sigma_1}\right) \qquad (33)$$

For the estimation of the class of an instance $x$, we use equation (30).
Preprint submitted to Elsevier
Page 10 of 28 .1.3. LDA and QDA for Multi-Class Classification
In this general case, we consider multiple classes (possibly more than two) indexed by $k \in \{1, \ldots, |C|\}$. According to equations (22) and (23), we have

$$g_k(x)\, \sigma_k = \frac{1}{\sqrt{(2\pi)^d |\Sigma_k|}} \exp\left(-\frac{(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)}{2}\right) \sigma_k \qquad (34)$$

If we take the logarithm of (34), we have

$$\ln\left(g_k(x)\, \sigma_k\right) = -\frac{d}{2} \ln(2\pi) - \frac{1}{2} \ln(|\Sigma_k|) - \frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) + \ln(\sigma_k)$$

Let us drop the first term because it is the same for all the classes. So we have

$$\Gamma_k(x) = -\frac{1}{2} \ln(|\Sigma_k|) - \frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) + \ln(\sigma_k)$$

$\Gamma_k(x)$ is the scaled posterior of the $k$-th class. In QDA, the class of an instance $x$ is estimated as

$$\hat{C}(x) = \arg\max_k \Gamma_k(x) \qquad (35)$$

In LDA, we assume that $\Sigma_1 = \Sigma_2 = \ldots = \Sigma_{|C|} = \Sigma$. Therefore, $\Gamma_k(x)$ becomes

$$\Gamma_k(x) = -\frac{1}{2} \ln(|\Sigma|) - \frac{1}{2} x^T \Sigma^{-1} x - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \mu_k^T \Sigma^{-1} x + \ln(\sigma_k)$$

We drop the first and second terms of the right-hand side because they are the same for all the classes. Hence, we have

$$\Gamma_k(x) = \mu_k^T \Sigma^{-1} x - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \ln(\sigma_k)$$

The class of the instance $x$ is again determined by (35)
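In practice, both classifiers are available off the shelf. A minimal sketch with scikit-learn (an assumed toolkit, on stand-in data) follows:

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

# X: feature matrix (n_samples x d); y: 0 = non-seizure, 1 = seizure (stand-in data).
X, y = np.random.randn(200, 10), np.random.randint(0, 2, 200)

lda = LinearDiscriminantAnalysis().fit(X, y)     # shared covariance -> linear boundary
qda = QuadraticDiscriminantAnalysis().fit(X, y)  # per-class covariance -> quadratic boundary
print(lda.score(X, y), qda.score(X, y))
```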
3.2. Naive Bayes

The Naive Bayes classifier is based on Bayes' theorem, with the naive assumption that, given their belonging to a specific class, the features are independent of each other. Let us see how the model can be derived.
Definition 3.3.
Bayes' Theorem (Stuart et al., 1994). Given a feature vector $X = (x_1, x_2, \ldots, x_n)$ and a class variable $C_k$, Bayes' theorem states that

$$P(C_k \mid X) = \frac{P(X \mid C_k)\, P(C_k)}{P(X)}, \qquad k = 1, 2, \ldots, K, \qquad (36)$$

where $P(C_k \mid X)$ is called the posterior probability, $P(X \mid C_k)$ the likelihood, $P(C_k)$ the prior probability of the class, and $P(X)$ the prior probability of the predictor.

We are interested in calculating the posterior probability from the likelihood and the prior probabilities. Using the chain rule, the likelihood $P(X \mid C_k)$ can be decomposed as

$$P(X \mid C_k) = P(x_1 \mid x_2, \ldots, x_n, C_k)\, P(x_2 \mid x_3, \ldots, x_n, C_k) \cdots P(x_{n-1} \mid x_n, C_k)\, P(x_n \mid C_k) \qquad (37)$$

The above sets of probabilities can be hard and expensive to compute. However, we can use the naive independence assumption, which is given by

$$P(x_i \mid x_{i+1}, \ldots, x_n, C_k) = P(x_i \mid C_k) \qquad (38)$$
Using (38) in (37), we have

$$P(X \mid C_k) = P(x_1, \ldots, x_n \mid C_k) = \prod_{i=1}^{n} P(x_i \mid C_k) \qquad (39)$$

Therefore, the posterior probability (36) can be written as

$$P(C_k \mid X) = \frac{P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)}{P(X)} \qquad (40)$$

Knowing that the prior probability of the predictor $P(X)$ is constant given the input, we can write (40) as

$$P(C_k \mid X) \propto P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k), \qquad (41)$$

where $\propto$ means "proportional to". The Naive Bayes classification problem is: for the different class values $C_k$, find the maximum of $P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)$. Mathematically, we can formulate this problem as

$$\hat{C} = \arg\max_{C_k} P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k) \qquad (42)$$

The prior probability of the class, $P(C_k)$, can be estimated as the relative frequency of class $C_k$ in the training data.

Advantages:

• When the assumption of independent predictors holds true, a Naive Bayes classifier performs better compared to other models.

• Naive Bayes requires only a small amount of training data to estimate its parameters, so the training period takes less time.

• It can be used for both binary and multi-class classification problems.

Limitations:

• The main limitation of Naive Bayes is the assumption of independent predictor features. Naive Bayes implicitly assumes that all the attributes are mutually independent. In real life, it is almost impossible to get a set of predictors that are completely independent of one another.

• If a categorical variable has a category in the test dataset that was not observed in the training dataset, the model will assign it a zero probability and will be unable to make a prediction.

3.3. Support Vector Machine

The Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems. In the SVM algorithm, we plot each data item as a point in $n$-dimensional space (where $n$ is the number of features), with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyperplane that best differentiates the classes. It can be used for binary classification and multi-class classification. Below we present the mathematics behind binary classification. Let us assume that the dataset is defined by

$$\{(x_i, y_i)\}_{i=1}^{n}, \qquad (43)$$
Preprint submitted to Elsevier
where $n$ is the sample size, and $x_i$ and $y_i$ represent respectively the feature vectors and the class labels. We note that $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$. According to (Nguyen et al., 2012), the SVM algorithm finds the optimal hyperplane given by (44),

$$f(x) = w^T \Phi(x) + b, \qquad (44)$$

to separate the training data by solving the optimization problem (45),

$$\min_{w, b, \psi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \psi_i, \qquad (45)$$

subject to the constraints (46):

$$y_i \left(w^T \Phi(x_i) + b\right) \geq 1 - \psi_i \quad \text{and} \quad \psi_i \geq 0, \qquad i = 1, \ldots, n. \qquad (46)$$

The optimization problem (45) maximizes the hyperplane margin while minimizing the cost of errors, where the $\psi_i$, $i = 1, \ldots, n$, are non-negative slack variables introduced to relax the constraints of the separable data problem into the constraints (46) of the non-separable data problem. For an error to occur, the corresponding $\psi_i$ must exceed unity, so $\sum_i \psi_i$ is an upper bound on the number of training errors. Hence an extra cost $C \sum_i \psi_i$ for errors is added to the objective function, where $C$ is a parameter chosen by the user.

For nonlinear classification (the case where the data cannot be separated by a hyperplane), the kernel function $K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$ is introduced, and the optimal hyperplane becomes

$$f(x) = \sum_{i=1}^{n} \alpha_i y_i K(s_i, x) + b, \qquad (47)$$

where $s_i$ is the $i$-th support vector. The function $\Phi : x \longmapsto \Phi(x)$ is a map from the data space to the feature space, such that the data are linearly separable in the feature space. Note that there are several kernel functions, such as the polynomial kernel, the Gaussian kernel, and the sigmoidal kernel.

3.4. K-Nearest Neighbor

K-Nearest Neighbor (KNN) is the simplest classification algorithm. The approach is to plot all data points in space and, for any new sample, observe its $k$ nearest points and make a decision based on majority voting. Thus, the KNN algorithm involves no training, and it takes the least calculation time when implemented with an optimal value of $k$. The steps of the KNN algorithm are as follows (Sutton, 2012):

• For a given instance, find its distance from all other data points, using an appropriate distance metric for the problem instance.

• Sort the computed distances in increasing order. Depending on the value of $k$, observe the nearest $k$ points.

• Identify the majority class among the $k$ points, and declare it as the predicted class.

Choosing an optimal value of $k$ is a challenge in this approach. Most often, the process is repeated for several different trials of $k$; the evaluation scores are then plotted to find the optimal value of $k$ (see the sketch below).
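A minimal sketch of both classifiers with scikit-learn follows; the RBF kernel, the value of C, and the candidate values of k are illustrative choices on stand-in data:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = np.random.randn(200, 10), np.random.randint(0, 2, 200)  # stand-in data

# Soft-margin SVM with a Gaussian (RBF) kernel; C is the error-cost
# parameter of Eq. (45).
svm = SVC(kernel="rbf", C=1.0)

# KNN: try several k and keep the one with the best cross-validated accuracy.
best_k = max(range(1, 16, 2),
             key=lambda k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                           X, y, cv=5).mean())
print("best k:", best_k, "SVM CV accuracy:", cross_val_score(svm, X, y, cv=5).mean())
```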
3.5. Random Forest

A random forest can be defined as a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The steps for building a random forest classifier are as follows (Cutler et al., 2012); a minimal usage sketch follows Figure 5:

• Select a subset of features from the dataset.

• From the selected subset of features, using the best-split method, pick a node.

• Continue the best-split method to form child nodes from the subset of features.

• Repeat the steps until all nodes are used as splits.

• Iteratively create $n$ trees using steps 1-4 to form a forest.
Figure 5:
Steps of Random Forest (Cutler et al., 2012)
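A minimal scikit-learn sketch of this procedure, with illustrative hyperparameters on stand-in data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X, y = np.random.randn(200, 10), np.random.randint(0, 2, 200)  # stand-in data

# n_estimators = number of trees grown by repeating steps 1-4 above;
# max_features controls the random feature subset examined at each split.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt").fit(X, y)
print(rf.score(X, y))
```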
3.6. Gradient Boosting

Gradient Boosting is one of the techniques for performing supervised machine learning tasks, like classification and regression. Like Random Forests, it is an ensemble learner: it creates a final model based on a collection of individual models. The magic of this model is described in the name: "Gradient" plus "Boosting". Boosting builds the ensemble from individual models in an iterative way. In boosting, the individual models are not built on completely random subsets of data and features, but sequentially, by putting more weight on instances with wrong predictions and high errors. The general idea behind this is that instances which are hard to predict correctly will be focused on during learning, so that the model learns from past mistakes. When each ensemble member is trained on a subset of the training set, the method is called Stochastic Gradient Boosting, which can help improve the generalizability of the model (Natekin and Knoll, 2013). Similar to how neural networks utilize gradient descent to optimize ("learn") weights, the gradient is used to minimize a loss function. In each round of training, the weak learner is built and its predictions are compared to the correct outcome that we expect. The error of the model is estimated by the distance between prediction and truth, which is used to calculate the gradient. The gradient is basically the partial derivative of the loss function; thus, it describes the steepness of the error function (Natekin and Knoll, 2013).
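A minimal scikit-learn sketch; the learning rate and subsampling fraction are illustrative, with subsample < 1.0 giving the stochastic variant described above:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

X, y = np.random.randn(200, 10), np.random.randint(0, 2, 200)  # stand-in data

# learning_rate scales each gradient step; subsample < 1.0 trains every
# tree on a random fraction of the data (stochastic gradient boosting).
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                subsample=0.8).fit(X, y)
print(gb.score(X, y))
```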
3.7. Performance Evaluation

To train and evaluate the performance of the different models, we will use K-fold cross-validation with 5 replications, followed by the confusion matrix, which is described below.
Cross-validation, also known as rotation estimation, is the statistical practice of partitioning a sample of data into subsets such that the analysis is initially performed on a single subset, while the other subset(s) are retained for subsequent use in confirming and validating the initial analysis. The initial subset of data is called the training set; the other subset(s) are called validation or testing sets. In K-fold cross-validation, the original sample is partitioned into K sub-samples. Of the K sub-samples, a single sub-sample is retained as the validation data for testing the model, and the remaining $K - 1$ sub-samples are used as training data. The cross-validation process is then repeated K times (the folds), with each of the K sub-samples used exactly once as the validation data. The K results from the folds can then be averaged to produce a single estimate.

• The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. The variance of the resulting estimate is reduced as K is increased.
• The disadvantage of this method is that the training algorithm has to be re-run from scratch K times, which means it takes K times as much computation to make an evaluation.
For performance evaluation, confusion matrix metrics are often used. The criteria usually employed comprise three parts: sensitivity (the proportion of the total number of positive cases that are correctly classified), specificity (the proportion of the total number of negative cases that are correctly classified), and classification accuracy (the proportion of the total number of EEG signals that are correctly classified). A sketch of these computations is given below.
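A minimal sketch of how these three metrics can be obtained from cross-validated predictions, assuming scikit-learn and stand-in data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

X, y = np.random.randn(200, 10), np.random.randint(0, 2, 200)  # stand-in data

# Out-of-fold predictions from 10-fold cross-validation.
y_pred = cross_val_predict(RandomForestClassifier(), X, y, cv=10)
tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()

sensitivity = tp / (tp + fn)    # proportion of positives correctly classified
specificity = tn / (tn + fp)    # proportion of negatives correctly classified
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(sensitivity, specificity, accuracy)
```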
4. Experimental Results and Analysis
In this section, we apply the different methods presented above to the EEG dataset. As feature extraction methods, we implement the Discrete Wavelet Transform (DWT) and the Mel Frequency Cepstral Coefficient (MFCC), and as classification methods, we implement QDA, LDA, RF, NB, GB, KNN, and SVM. We also make a comparison not only between the different feature extraction methods but also of the interaction between feature extraction and classifier, based on the final scores of the classifiers. The overall process of classification is described by Figure 6.
Figure 6:
Process of EEG signal classification (Wen and Zhang, 2017)
5. Methodology
The flowchart of the proposed classification framework is shown in Figure 7.
The database used in this work was recorded at the University Hospital Bonn, Germany. It is composed of five different sections of EEG signals, represented by the symbols S, F, N, O, and Z, as shown in Table 1. Each of these sections consists of 100 signals, with a recording time of about 23.6 s each. In order to record the data in the most accurate way, an amplifier system with 128 signal channels was used, with a sampling rate of 173.61 Hz. The EEG samples in the O and Z datasets are derived from healthy volunteers with external surface electrodes, for open and closed eye conditions. The F and N datasets were acquired during seizure-free intervals, and the dataset S contains only seizure activity. The five data sets S, F, N, O, and Z are classified into two distinct groups in our study: the epileptic seizure class (S) is composed of the subset S, and the non-seizure class (FNOZ) is composed of the subsets F, N, O, and Z.
Figure 7:
The flowchart of the proposed classification framework
Table 1
The definitions and descriptions for the electroencephalographic (EEG) signals from the University of Bonn, Germany
Information          Dataset O                       Dataset Z                        Dataset F      Dataset N      Dataset S
State                Awake, eyes open (healthy)      Awake, eyes closed (healthy)     Seizure-free   Seizure-free   Seizure activity
Electrode type       Surface                         Surface                          Intracranial   Intracranial   Intracranial
No. of channels      100                             100                              100            100            100
Recording time (s)   23.6                            23.6                             23.6           23.6           23.6

In this study, we built and used two datasets from the raw database. The first is an imbalanced dataset with a positive-class prevalence of 0.2 (Figure 8), and the second is a balanced dataset with a positive-class prevalence of 0.5 (Figure 9).
Figure 8:
Flowchart for the building of the imbalanced dataset
Figure 9:
Flowchart for the building of the balanced dataset
Feature engineering is the process of converting raw data into features that better represent the underlying observations for predictive models. Thus, it can improve the accuracy of the model on unseen data. The flowchart of our feature engineering is described in Figure 10.
Figure 10:
The flowchart of Feature Engineering
For feature extraction, we implemented four methods, namely:

• the discrete wavelet transform DWT-db4,
• the discrete wavelet transform DWT-db2,
• the discrete wavelet transform DWT-coif1,
• the Mel Frequency Cepstral Coefficient (MFCC).

We follow the wavelet transform steps for the first three and the MFCC steps described in Section 2 for the last one.

•
Wavelet Threshold De-Noising :
We can see in Figure 11 one signal before and after de-noising.

•
Wavelet Decomposition: The DWT is used to split a signal into different frequency sub-bands, as many as needed or as many as possible. Figures 12-16 show the five decomposed bands, from one signal, that we are going to use.
Figure 11:
Signal before and after de-noising
Figure 12:
Sub-band signals using four-level wavelet decomposition (db4) from an original EEG signal: the approximation in 0-4 Hz.
Figure 13:
Sub-band signals using four-level wavelet decomposition (db4) from an original EEG signal: the detail in 4-8 Hz.

• Feature Extraction: After decomposition, we extract features in the time-frequency domain (mean average value, standard deviation, relative band power, spectral entropy), in the frequency domain (relative power spectral density estimated from the FFT coefficients), and in the time domain (mean, median, standard deviation, and total variation). Additionally, the maximum, minimum, and total variation of the DWT coefficients are also estimated, in order to describe the non-stationary signals.
Some of the original features extracted are correlated and redundant. To select an optimal feature subset from the original feature set, we use a dimensionality reduction method, Principal Component Analysis (PCA).
Figure 14:
Sub-band signals using four-level wavelet decomposition (db4) from an original EEG signal: the detail in 8-16 Hz.
Figure 15:
Sub-band signals using four-level wavelet decomposition (db4) from an original EEG signal: the detail in 16-32 Hz.
Figure 16:
Sub-band signals using four-level wavelet decomposition (db4) from an original EEG signal: the detail in 32-64 Hz.

The PCA algorithm is implemented to obtain a relatively low-dimensional but significantly discriminative feature set, which improves the classification performance.
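A minimal scikit-learn sketch of this reduction step; the 95% retained-variance target is an illustrative choice, not a value taken from the paper:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.randn(500, 40)   # stand-in for the extracted feature matrix

# Standardize, then keep enough components to explain 95% of the variance.
X_std = StandardScaler().fit_transform(X)
X_pca = PCA(n_components=0.95).fit_transform(X_std)
print(X.shape, "->", X_pca.shape)
```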
Seven classifiers are used after the four feature extraction methods to distinguish epileptic seizures from non-seizure signals. The different classifiers are:

• Linear Discriminant Analysis (LDA),
• Quadratic Discriminant Analysis (QDA),
• K-Nearest Neighbor (KNN),
• Naive Bayes (NB),
• Random Forest (RF),
• Gradient Boosting (GB),
• Support Vector Machine (SVM).

The classification performance, measured by sensitivity (SEN), specificity (SPE), and accuracy (ACC) using 10-fold cross-validation, is shown in the tables below. In addition, boxplots are used to compare the different models and feature extraction methods.
Table 2
The classification performance for 10-fold CV without feature extraction

         Imbalanced data            Balanced data
Model    ACC     SPE     SEN        ACC     SPE     SEN
LDA      69.60   94.50   40.00      86.80   94.00   44.00
QDA      92.90   99.75   86.00      96.92   100.00  85.00
KNN      75.50   100.00  42.00      88.48   100.00  52.00
NB       94.70   96.50   88.00      95.16   99.00   90.00
RF       94.30   99.00   81.00      95.44   95.00   96.00
GB       95.10   95.36   82.00

Table 3
The classification performance of 10-fold CV with the "db4" wavelet method

         Imbalanced data            Balanced data
Model    ACC     SPE     SEN        ACC     SPE     SEN
LDA      98.50   99.25   83.00      95.24   99.00   99.00
QDA      96.40   99.25   87.00      97.04   100.00  94.00
KNN      97.90   99.50   86.00      97.00   100.00  94.00
NB       91.80   93.75   87.00      92.24   99.00   95.00
RF
According to Figure 17, we can draw the conclusions below for the balanced dataset:

• Without feature extraction:

– The best model is Quadratic Discriminant Analysis (QDA), which has 96.92% accuracy, 100% specificity, and 85% sensitivity.

– LDA and KNN have the smallest accuracy and can be avoided when performing classification without feature extraction.
Table 4
The classification performance of 10-fold CV with the "db2" wavelet method

         Imbalanced data            Balanced data
Model    ACC     SPE     SEN        ACC     SPE     SEN
LDA      96.10   99.25   89.00      95.60   100.00  98.00
QDA      94.90   97.25   91.00      97.84   99.00   91.00
KNN      94.80   97.50   84.00      96.92   100.00  89.00
NB       92.70   94.00   84.00      91.88   93.00   92.00
RF       96.20   99.50   85.00
GB       96.10   98.75   93.00      98.48   98.00   97.00
SVM

Table 5
The classification performance of 10-fold CV with the "coif1" wavelet method

         Imbalanced data            Balanced data
Model    ACC     SPE     SEN        ACC     SPE     SEN
LDA      94.20   99.50   88.00      95.28   100.00  96.00
QDA      93.30   98.50   90.00      97.52   99.00   91.00
KNN      95.30   99.75   82.00      96.04   100.00  92.00
NB       92.80   95.00   84.00      92.40   91.00   95.00
RF       95.90   98.50   92.00      97.04   95.00   98.00
GB       96.20   98.25   94.00      97.20   95.00   98.25
SVM

Table 6
The classification performance of 10-fold CV with the MFCC method
         Imbalanced data            Balanced data
Model    ACC     SPE     SEN        ACC     SPE     SEN
LDA      93.70   99.00   82.00      95.24   97.00   90.00
QDA      94.00   99.05   83.00      94.00   95.00   93.00
KNN      90.70   98.25   78.00      94.20   95.50   93.00
NB       92.30   96.75   85.00      94.72   95.00   94.00
RF       94.40   98.50   77.00      95.80   96.50   94.00
GB

• Among the feature extraction methods:

– Wavelet db4 is the best method when it is used with SVM, which attains 98.99% accuracy, 99% sensitivity, and 99% specificity.

– Wavelet db2 associated with RF challenges db4 associated with SVM, with 98.56% accuracy, 97% sensitivity, and 98.10% specificity.

– Wavelet coif1 brings its best results when associated with SVM: 97.92% accuracy, 98% sensitivity, and 98% specificity.

– MFCC performs worse than all the DWT variants used here, but brings its best results when associated with SVM.
According to Figure 18, we can draw the conclusions below for the imbalanced dataset:
Figure 17:
Box-plot for model comparison in terms of accuracy in the balanced data
Figure 18:
Box-plot for model comparison in terms of accuracy in the imbalanced dataset

• Without feature extraction:

– The best model is GB, which has 95.1% accuracy, 82.00% sensitivity, and 95.36% specificity.

– LDA and KNN have the smallest accuracy and can be avoided when performing classification without feature extraction.

• Among the feature extraction methods:

– Wavelet db2 is the best method when associated with RF, which brings 98.99% accuracy, 99.25% specificity, and 95% sensitivity.

– Wavelet db4 can challenge wavelet db2 when used with SVM or RF, which here bring 98.90% accuracy, 99.10% specificity, and 98% sensitivity.
Table 7
ANOVA analysis for the imbalanced dataset

                   Df     Sum Sq     Mean Sq    F value   Pr(>F)
Models             6      8947.34    1491.22    60.10     0.0000
feat_extr          4      18879.49   4719.87    190.23    0.0000
Models:feat_extr   24     28916.51   1204.85    48.56     0.0000
Residuals          1715   42552.50   24.81
Table 8
Omega-squared effect sizes (imbalanced dataset)

– Wavelet coif1 brings its best results when associated with SVM: 97.80% accuracy, 99.25% specificity, and 93% sensitivity.

– MFCC performs worse than all the DWT variants used here, but brings its best results when associated with SVM.

Looking at the results presented above, we observe many differences in the predictive performances. In the next part, we check the statistical significance of the differences in predictive performances and the effect size.
There are two factors to evaluate: the feature extraction method and the model, with five and seven levels respectively. Therefore, the two-way Analysis Of Variance (ANOVA) is suitable for our analysis. Using the two-way ANOVA, we can simultaneously evaluate how the type of feature extraction and the model affect the accuracy of classification. Hence, we can test the three effects below on classification accuracy (a sketch of this analysis follows the list):

• the effect of feature extraction,

• the effect of the models,

• the effect of the interaction between feature extraction and models.
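A minimal sketch of this two-way ANOVA with statsmodels, on a synthetic stand-in for the results table (the factor names Models and feat_extr follow Tables 7 and 9; the real data frame must be built from the cross-validation results):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Stand-in results table: one accuracy value per CV replicate,
# crossed over the two factors of the study.
rng = np.random.default_rng(0)
models = ["LDA", "QDA", "KNN", "NB", "RF", "GB", "SVM"]
extractors = ["wfe", "mfcc", "coif1", "db2", "db4"]
rows = [(m, f, 90 + rng.normal()) for m in models for f in extractors for _ in range(50)]
df = pd.DataFrame(rows, columns=["Models", "feat_extr", "accuracy"])

# Two-way ANOVA with interaction, as in Tables 7 and 9.
fit = ols("accuracy ~ C(Models) * C(feat_extr)", data=df).fit()
print(sm.stats.anova_lm(fit, typ=2))
```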
From Table 7, the P-values obtained from the ANOVA analysis for feature extraction, models, and their interaction are statistically significant ($P \leq 0.05$). We conclude that the type of feature extraction, the type of model, and the interaction of both significantly affect the accuracy of classification. Each factor has an independent significant effect on classification accuracy. While it is good to know whether some models or feature extraction techniques have a statistically significant effect on the accuracy, it is just as important to know the size of the effect they have on the outcome. To check this, we can calculate the effect size, which is estimated by the omega-squared measures presented in Table 8. These estimate how much variance in the response variable is accounted for by the explanatory variables. The following interpretation of omega-squared is suggested by (Field, 2013):

• omega-squared 0-0.01: very small;

• omega-squared 0.01-0.06: small;

• omega-squared 0.06-0.14: medium;

• omega-squared > 0.14: large.
Table 9
ANOVA analysis for the balanced dataset

                   Df     Sum Sq     Mean Sq   F value   Pr(>F)
Models             6      4607.28    767.88    111.95    0.0000
feat_extr          4      2564.63    641.16    93.48     0.0000
Models:feat_extr   24     4953.55    206.40    30.09     0.0000
Residuals          1715   11762.96   6.86
Table 10
Omega-squared effect sizes (balanced dataset)

Table 11
HSD test for pairwise comparison of feature extraction methods (imbalanced dataset)

     term        comparison   estimate   conf.low   conf.high   adj.p.value
1    feat_extr   mfcc-wfe     5.31       4.29       6.34        0.00
2    feat_extr   coif1-wfe    7.33       6.30       8.36        0.00
3    feat_extr   db2-wfe      7.89       6.86       8.91        0.00
4    feat_extr   db4-wfe      9.49       8.46       10.51       0.00
5    feat_extr   coif1-mfcc   2.01       0.99       3.04        0.00
6    feat_extr   db2-mfcc     2.57       1.54       3.60        0.00
7    feat_extr   db4-mfcc     4.17       3.14       5.20        0.00
8    feat_extr   db2-coif1    0.56       -0.47      1.59        0.58
9    feat_extr   db4-coif1    2.16       1.13       3.19        0.00
10   feat_extr   db4-db2      1.60       0.57       2.63        0.00
According to this, the model has a medium effect on the mean accuracy, while feature extraction and the interaction between model and feature extraction have a large effect.
From Table 9, the P-values obtained from the ANOVA analysis for feature extraction, models, and their interaction are statistically significant ($P \leq 0.05$). We conclude that the type of feature extraction, the type of model, and the interaction of both significantly affect the accuracy of classification. As previously seen, each factor has an independent significant effect on the classification accuracy. For the effect size, the model and the interaction between model and feature extraction have a large effect on the mean accuracy, while feature extraction has a medium effect (Table 10).

Now, given that the feature extraction methods alongside the suitable models are statistically significant with a certain effect size, it is important to mention that ANOVA does not tell us which levels differ from one another. To identify the pairs of significantly different feature extraction methods and model types, we can perform a multiple pairwise comparison analysis.
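A minimal sketch of such a comparison with statsmodels' Tukey HSD test, reusing the stand-in results table df from the ANOVA sketch above:

```python
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Pairwise comparison of the feature extraction methods, as in Tables 11-12.
tukey = pairwise_tukeyhsd(endog=df["accuracy"], groups=df["feat_extr"], alpha=0.05)
print(tukey.summary())
```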
Table 12
HSD test for pairwise comparison of feature extraction methods (balanced dataset)

     term        comparison   estimate   conf.low   conf.high   adj.p.value
1    feat_extr   mfcc-wfe     1.45       0.91       1.99        0.00
2    feat_extr   coif1-wfe    2.66       2.12       3.20        0.00
3    feat_extr   db4-wfe      3.06       2.52       3.60        0.00
4    feat_extr   db2-wfe      3.22       2.68       3.76        0.00
5    feat_extr   coif1-mfcc   1.22       0.68       1.76        0.00
6    feat_extr   db4-mfcc     1.62       1.08       2.16        0.00
7    feat_extr   db2-mfcc     1.77       1.23       2.31        0.00
8    feat_extr   db4-coif1    0.40       -0.14      0.94        0.26
9    feat_extr   db2-coif1    0.55       0.01       1.09        0.04
10   feat_extr   db2-db4      0.15       -0.39      0.69        0.94
Table 13
HSD-test for pairwise comparison of models (imbalanced dataset)

     term    comparison   estimate   conf.low   conf.high   adj.p.value
1    Models  KNN-LDA          0.42      -0.90        1.74          0.97
2    Models  NB-LDA           2.44       1.12        3.76          0.00
3    Models  QDA-LDA          3.96       2.64        5.28          0.00
4    Models  RF-LDA           5.28       3.96        6.60          0.00
5    Models  GB-LDA           5.46       4.14        6.78          0.00
6    Models  SVM-LDA          5.92       4.60        7.24          0.00
7    Models  NB-KNN           2.02       0.70        3.34          0.00
8    Models  QDA-KNN          3.54       2.22        4.86          0.00
9    Models  RF-KNN           4.86       3.54        6.18          0.00
10   Models  GB-KNN           5.04       3.72        6.36          0.00
11   Models  SVM-KNN          5.50       4.18        6.82          0.00
12   Models  QDA-NB           1.52       0.20        2.84          0.01
13   Models  RF-NB            2.84       1.52        4.16          0.00
14   Models  GB-NB            3.02       1.70        4.34          0.00
15   Models  SVM-NB           3.48       2.16        4.80          0.00
16   Models  RF-QDA           1.32       0.00        2.64          0.05
17   Models  GB-QDA           1.50       0.18        2.82          0.01
18   Models  SVM-QDA          1.96       0.64        3.28          0.00
19   Models  GB-RF            0.18      -1.14        1.50          1.00
20   Models  SVM-RF           0.64      -0.68        1.96          0.78
21   Models  SVM-GB           0.46      -0.86        1.78          0.95
Table 14
HSD-test for pairwise comparison of models (balanced dataset)

     term    comparison   estimate   conf.low   conf.high   adj.p.value
1    Models  LDA-NB           0.35      -0.34        1.04          0.74
2    Models  KNN-NB           1.25       0.56        1.94          0.00
3    Models  QDA-NB           3.39       2.70        4.08          0.00
4    Models  GB-NB            3.40       2.71        4.09          0.00
5    Models  RF-NB            3.64       2.95        4.33          0.00
6    Models  SVM-NB           4.31       3.62        5.00          0.00
7    Models  KNN-LDA          0.90       0.20        1.59          0.00
8    Models  QDA-LDA          3.04       2.35        3.73          0.00
9    Models  GB-LDA           3.05       2.36        3.74          0.00
10   Models  RF-LDA           3.29       2.60        3.98          0.00
11   Models  SVM-LDA          3.96       3.27        4.65          0.00
12   Models  QDA-KNN          2.14       1.45        2.84          0.00
13   Models  GB-KNN           2.15       1.46        2.84          0.00
14   Models  RF-KNN           2.39       1.70        3.08          0.00
15   Models  SVM-KNN          3.06       2.37        3.76          0.00
16   Models  GB-QDA           0.01      -0.68        0.70          1.00
17   Models  RF-QDA           0.25      -0.44        0.94          0.94
18   Models  SVM-QDA          0.92       0.23        1.61          0.00
19   Models  RF-GB            0.24      -0.45        0.93          0.95
20   Models  SVM-GB           0.91       0.22        1.60          0.00
21   Models  SVM-RF           0.67      -0.02        1.36          0.06
6. Discussion and Conclusion
In this paper, four feature extraction techniques, namely DWT-db4, DWT-db2, DWT-coif1, and MFCC, were investigated and combined with seven machine learning classifiers for classifying epileptic seizures in both a balanced and an imbalanced dataset. Stochastic hold-out with 50 replications was used to generate the predictive performances. Two-way and one-way ANOVA tests were used for the statistical significance analysis of the differences in predictive performance and for effect size, and the Tukey HSD test was used for pairwise comparison of models and feature extraction methods.

The results indicate that, in the imbalanced dataset, without feature extraction, Gradient Boosting (GB) performed best with a classification accuracy of 95.1%, but this accuracy is only significantly higher in comparison with LDA and KNN. Among the feature extraction methods, DWT-db2 combined with Random Forest (RF) is the best combination, with a classification accuracy of 98.99%. However, this combination is closely challenged by DWT-db4 combined with Support Vector Machine (SVM) or Random Forest (RF), each with a classification accuracy of 98.90%.

In the balanced dataset, without feature extraction, Quadratic Discriminant Analysis (QDA) performed best with a classification accuracy of 96.92%, which is only significantly higher in comparison with LDA and KNN. Among the feature extraction methods, DWT-db4 combined with Support Vector Machine (SVM) is the best combination, with a classification accuracy of 98.99%. Nevertheless, this combination is challenged by DWT-db2 combined with Random Forest (RF), with a classification accuracy of 98.56%.

The results also highlight that MFCC performs worse than all the DWT variants used here, in both the balanced and the imbalanced dataset. The mean differences are statistically significant, with minimum mean differences of 2.01 and 1.22 when MFCC is compared to coif1 in the imbalanced and balanced datasets, respectively (Tables 11 and 12). Whether in the balanced or the imbalanced dataset, the feature extraction methods, the models, and the interaction between them have a statistically significant effect on the classification accuracy. In the imbalanced dataset, the model has a medium effect while feature extraction and the interaction between model and feature extraction have a large effect; in the balanced dataset, the model and the interaction have a large effect while feature extraction has a medium effect.

To improve on this work, we plan to analyze the mean differences of the different interactions between feature extraction methods and classifiers. We also plan to study larger databases, in order to assess whether the present results hold in any EEG dataset of epileptic seizures and to establish the magnitude of the difference in predictive performance between balanced and imbalanced datasets. Another direction is to extend the number of feature extraction methods and classifier models, and to test our approach on ECG datasets.
CRediT authorship contribution statement
Cyrille Feudjio:
Investigation, Conceptualization, Methodology, Software, Data curation, Experimentation, Result compilation, Writing - original draft.
Victoire Djimna Noyum:
Software, Result compilation, Writing - review.
Younous Perieukeu Mofendjou:
Software, Result compilation, Writing - review.
Rockefeller:
Supervision, Validation, Writing - review & editing.
Ernest Fokoué:
Conceptualization of this study, Supervision, Validation, Writing - review & editing.
References
Acharya, U.R., Sree, S.V., Ang, P.C.A., Yanti, R., Suri, J.S., 2012. Application of non-linear and wavelet based features for the automated identification of epileptic EEG signals. International Journal of Neural Systems 22, 1250002.
Ahammad, N., Fathima, T., Joseph, P., 2014. Detection of epileptic seizure event and onset using EEG. BioMed Research International 2014.
Bandarabadi, M., Teixeira, C.A., Rasekhi, J., Dourado, A., 2015. Epileptic seizure prediction using relative spectral power features. Clinical Neurophysiology 126, 237–248.
Bhople, A.D., Tijare, P., 2012. Fast Fourier transform based classification of epileptic seizure using artificial neural network. Int J Adv Res Comput Sci Softw Eng 2.
Chen, D., Wan, S., Xiang, J., Bao, F.S., 2017. A high-performance seizure detection algorithm based on discrete wavelet transform (DWT) and EEG. PLoS ONE 12.
Cutler, A., Cutler, D.R., Stevens, J.R., 2012. Random forests, in: Ensemble Machine Learning. Springer, pp. 157–175.
Field, A., 2013. Discovering Statistics Using IBM SPSS Statistics. Sage.
Gadhoumi, K., Lina, J.M., Gotman, J., 2012. Discriminating preictal and interictal states in patients with temporal lobe epilepsy using wavelet analysis of intracerebral EEG. Clinical Neurophysiology 123, 1906–1916.
Gajic, D., Djurovic, Z., Gligorijevic, J., Di Gennaro, S., Savic-Gajic, I., 2015. Detection of epileptiform activity in EEG signals based on time-frequency and non-linear analysis. Frontiers in Computational Neuroscience 9, 38.
Ghojogh, B., Crowley, M., 2019. Linear and quadratic discriminant analysis: Tutorial. arXiv preprint arXiv:1906.02590.
Gilda, S., Slepian, Z., 2019. Automatic Kalman-filter-based wavelet shrinkage denoising of 1D stellar spectra. Monthly Notices of the Royal Astronomical Society 490, 5249–5269.
Golmohammadi, M., Shah, V., Lopez, S., Ziyabari, S., Yang, S., Camaratta, J., Obeid, I., Picone, J., 2017. The TUH EEG seizure corpus, in: Proceedings of the American Clinical Neurophysiology Society Annual Meeting, p. 1.
Gong, S., Dai, Y., Ji, J., Wang, J., Sun, H., 2015. Emotion analysis of telephone complaints from customer based on affective computing. Computational Intelligence and Neuroscience 2015.
Guenot, M., 2004. Surgical treatment of epilepsy: outcome of various surgical procedures in adults and children. Revue Neurologique 160, 5S241–50.
John Martin, R., Sujatha, S., Swapna, S., 2018. Multiresolution analysis in EEG signal feature engineering for epileptic seizure detection. International Journal of Computer Applications 975, 8887.
Kandar, H., Das, S.K., Ghosh, L., Gupta, B.K., 2012. Epilepsy and its management: A review. Journal of PharmaSciTech 1, 20–26.
Merry, R., 2005. Wavelet theory and applications: a literature study. DCT Rapporten 2005.
Mursalin, M., Zhang, Y., Chen, Y., Chawla, N.V., 2017. Automated epileptic seizure detection using improved correlation-based feature selection with random forest classifier. Neurocomputing 241, 204–214.
Natekin, A., Knoll, A., 2013. Gradient boosting machines, a tutorial. Frontiers in Neurorobotics 7, 21.
Nguyen, P., Tran, D., Huang, X., Sharma, D., 2012. A proposed feature extraction method for EEG-based person identification, in: Proceedings of the International Conference on Artificial Intelligence (ICAI), The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), p. 1.
Othman, M., Wahab, A., Khosrowabadi, R., 2009. MFCC for robust emotion detection using EEG, in: 2009 IEEE 9th Malaysia International Conference on Communications (MICC), IEEE, pp. 98–101.
Paul, Y., 2018. Various epileptic seizure detection techniques using biomedical signals: a review. Brain Informatics 5, 6.
Polat, K., Güneş, S., 2007. Classification of epileptiform EEG using a hybrid system based on decision tree classifier and fast Fourier transform. Applied Mathematics and Computation 187, 1017–1026.
al Qerem, A., Kharbat, F., Nashwan, S., Ashraf, S., Blaou, K., 2020. General model for best feature extraction of EEG using discrete wavelet transform wavelet family and differential evolution. International Journal of Distributed Sensor Networks 16, 1550147720911009.
Rasekhi, J., Mollaei, M.R.K., Bandarabadi, M., Teixeira, C.A., Dourado, A., 2013. Preprocessing effects of 22 linear univariate features on the performance of seizure prediction methods. Journal of Neuroscience Methods 217, 9–16.
Ren, H., Qu, J., Chai, Y., Huang, L., Tang, Q., 2018. Cepstrum coefficient analysis from low-frequency to high-frequency applied to automatic epileptic seizure detection with bio-electrical signals. Applied Sciences 8, 1528.
Saputro, I.R.D., Maryati, N.D., Solihati, S.R., Wijayanto, I., Hadiyoso, S., Patmasari, R., 2019. Seizure type classification on EEG signal using support vector machine, in: Journal of Physics: Conference Series, IOP Publishing, p. 012065.
Stuart, A., Arnold, S., Ord, J.K., O'Hagan, A., Forster, J., 1994. Kendall's Advanced Theory of Statistics. Wiley.
Sutton, O., 2012. Introduction to k nearest neighbour classification and condensed nearest neighbour data reduction. University Lectures, University of Leicester, 1–10.
Teixeira, C.A., Direito, B., Bandarabadi, M., Le Van Quyen, M., Valderrama, M., Schelter, B., Schulze-Bonhage, A., Navarro, V., Sales, F., Dourado, A., 2014. Epileptic seizure predictors based on computational intelligence techniques: A comparative study with 278 patients. Computer Methods and Programs in Biomedicine 114, 324–336.
Ullah, I., Hussain, M., Aboalsamh, H., et al., 2018. An automated system for epilepsy detection using EEG brain signals based on deep learning approach. Expert Systems with Applications 107, 61–71.
Usman, S.M., Usman, M., Fong, S., 2017. Epileptic seizures prediction using machine learning methods. Computational and Mathematical Methods in Medicine 2017.
Wang, L., Xue, W., Li, Y., Luo, M., Huang, J., Cui, W., Huang, C., 2017. Automatic epileptic seizure detection in EEG signals using multi-domain feature extraction and nonlinear analysis. Entropy 19, 222.
Wang, Y., Zhou, W., Yuan, Q., Li, X., Meng, Q., Zhao, X., Wang, J., 2013. Comparison of ictal and interictal EEG signals using fractal features. International Journal of Neural Systems 23, 1350028.
Wen, T., Zhang, Z., 2017. Effective and extensible feature extraction method using genetic algorithm-based frequency-domain feature search for epileptic EEG multiclassification. Medicine 96.
Zandi, A.S., Tafreshi, R., Javidan, M., Dumont, G.A., 2013. Predicting epileptic seizures in scalp EEG based on a variational Bayesian Gaussian mixture model of zero-crossing intervals. IEEE Transactions on Biomedical Engineering 60, 1401–1413.
Zhang, Y., Liu, B., Ji, X., Huang, D., 2017. Classification of EEG signals based on autoregressive model and wavelet packet decomposition. Neural Processing Letters 45, 365–378.