16
Biomedical Signal Processing and Control 57 (2020) 101702 Contents lists available at ScienceDirect Biomedical Signal Processing and Control journal homepage: www.elsevier.com/locate/bspc A review of feature extraction and performance evaluation in epileptic seizure detection using EEG Poomipat Boonyakitanont a , Apiwat Lek-uthai a , Krisnachai Chomtho b , Jitkomut Songsiri a,a Department of Electrical Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand b Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand a r t i c l e i n f o Article history: Received 4 February 2019 Received in revised form 13 September 2019 Accepted 5 October 2019 Keywords: Seizure detection EEG Feature extraction classification a b s t r a c t Since the manual detection of electrographic seizures in continuous electroencephalogram (EEG) mon- itoring is very time-consuming and requires a trained expert, attempts to develop automatic seizure detection are diverse and ongoing. Machine learning approaches are intensely being applied to this problem due to their ability to classify seizure conditions from a large amount of data, and provide pre-screened results for neurologists. Several features, data transformations, and classifiers have been explored to analyze and classify seizures via EEG signals. In the literature, some jointly-applied features used in the classification may have shared similar contributions, making them redundant in the learn- ing process. Therefore, this paper aims to comprehensively summarize feature descriptions and their interpretations in characterizing epileptic seizures using EEG signals, as well as to review classification performance metrics. To provide meaningful information of feature selection, we conducted an exper- iment to examine the quality of each feature independently. The Bayesian error and non-parametric probability distribution estimation were employed to determine the significance of the individual fea- tures. Moreover, a redundancy analysis using a correlation-based feature selection was applied. The results showed that the following features variance, energy, nonlinear energy, and Shannon entropy computed on a raw EEG signal, as well as variance, energy, kurtosis, and line length calculated on wavelet coefficients were able to significantly capture the seizures. When compared with a baseline method of classifying all epochs as normal, an improvement of 4.77–13.51% in the Bayesian error was obtained. © 2019 Elsevier Ltd. All rights reserved. Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.1. Time-domain features (TDFs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.2. Frequency-domain features (FDFs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.3. Time-frequency-domain features (TFDFs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.4. Complexity of feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3. Epileptic seizure detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.1. Performance metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7 3.2. Selected features used in epileptic seizure detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.2.1. Time-domain features (TDFs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8 3.2.2. Frequency-domain features (FDFs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 Corresponding author. E-mail addresses: [email protected] (P. Boonyakitanont), [email protected] (A. Lek-uthai), [email protected] (K. Chomtho), [email protected] (J. Songsiri). https://doi.org/10.1016/j.bspc.2019.101702 1746-8094/© 2019 Elsevier Ltd. All rights reserved.

Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

Ae

PJa

b

a

ARR1A

KSEFc

C

aj

h1

Biomedical Signal Processing and Control 57 (2020) 101702

Contents lists available at ScienceDirect

Biomedical Signal Processing and Control

journa l homepage: www.e lsev ier .com/ locate /bspc

review of feature extraction and performance evaluation inpileptic seizure detection using EEG

oomipat Boonyakitanont a, Apiwat Lek-uthai a, Krisnachai Chomtho b,itkomut Songsiri a,∗

Department of Electrical Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, ThailandDepartment of Pediatrics, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand

r t i c l e i n f o

rticle history:eceived 4 February 2019eceived in revised form3 September 2019ccepted 5 October 2019

eywords:eizure detectionEGeature extractionlassification

a b s t r a c t

Since the manual detection of electrographic seizures in continuous electroencephalogram (EEG) mon-itoring is very time-consuming and requires a trained expert, attempts to develop automatic seizuredetection are diverse and ongoing. Machine learning approaches are intensely being applied to thisproblem due to their ability to classify seizure conditions from a large amount of data, and providepre-screened results for neurologists. Several features, data transformations, and classifiers have beenexplored to analyze and classify seizures via EEG signals. In the literature, some jointly-applied featuresused in the classification may have shared similar contributions, making them redundant in the learn-ing process. Therefore, this paper aims to comprehensively summarize feature descriptions and theirinterpretations in characterizing epileptic seizures using EEG signals, as well as to review classificationperformance metrics. To provide meaningful information of feature selection, we conducted an exper-iment to examine the quality of each feature independently. The Bayesian error and non-parametricprobability distribution estimation were employed to determine the significance of the individual fea-

tures. Moreover, a redundancy analysis using a correlation-based feature selection was applied. Theresults showed that the following features – variance, energy, nonlinear energy, and Shannon entropycomputed on a raw EEG signal, as well as variance, energy, kurtosis, and line length calculated on waveletcoefficients – were able to significantly capture the seizures. When compared with a baseline method ofclassifying all epochs as normal, an improvement of 4.77–13.51% in the Bayesian error was obtained.

© 2019 Elsevier Ltd. All rights reserved.

ontents

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22. Feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.1. Time-domain features (TDFs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32.2. Frequency-domain features (FDFs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.3. Time-frequency-domain features (TFDFs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.4. Complexity of feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3. Epileptic seizure detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3.1. Performance metrics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .73.2. Selected features used in epileptic seizure detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3.2.1. Time-domain features (TDFs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .83.2.2. Frequency-domain features (FDFs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

∗ Corresponding author.E-mail addresses: [email protected] (P. Boonyakitanont),

[email protected] (A. Lek-uthai), [email protected] (K. Chomtho),[email protected] (J. Songsiri).

ttps://doi.org/10.1016/j.bspc.2019.101702746-8094/© 2019 Elsevier Ltd. All rights reserved.

Page 2: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

2 P. Boonyakitanont, A. Lek-uthai, K. Chomtho et al. / Biomedical Signal Processing and Control 57 (2020) 101702

3.2.3. Time-frequency-domain features (TFDFs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.2.4. Multi-domain features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

4. Methods for feature evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114.1. Bayes error rate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .124.2. Correlation-based feature selection (CFS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

5. Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125.1. Feature significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135.2. Feature redundancy analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

6. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14Declaration of Competing Interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1

AcbasuTtwctftaerilfotfpdrrpeawRfisc

mtvfosmtotfhe

. Introduction

An epileptic seizure, as defined by the International Leaguegainst Epilepsy [1], is a temporary event of symptoms due to syn-hronization of abnormally excessive activities of neurons in therain. It has been estimated that approximately 65 million peopleround the world are affected by epilepsy [2]. Nevertheless, it istill a time-consuming process for neurologists to review contin-ous electroencephalograms (EEGs) to monitor epileptic patients.herefore, several researchers have developed different techniqueshat help neurologists to identify an epilepsy occurrence [3–5]. Thehole process of automated epileptic seizure analysis primarily

onsists of data acquisition, signal pre-processing, feature extrac-ion, feature or channel selection, and classification. This paperocuses on a selection of features commonly used in the litera-ure, including statistical parameters (mean, variance, skewness,nd kurtosis), amplitude-related parameters (energy, nonlinearnergy, line length, maximum and minimum values) and entropy-elated measures. These features can be categorized based on theirnterpretation or the domain from which the features are calcu-ated. While some studies have considered a particular group ofeatures applicable to their proposed classification method [6–8],thers have applied various groups of features extracted from theime, frequency, and time-frequency domains. For example, 55eatures were used with a support vector machine (SVM) and post-rocessing for neonatal seizure detection, which provided a goodetection rate of 89.2% with one false detection per hour [9]. Featureedundancy and relevance analysis were applied to 132 features toeduce the vector dimension [10]. Fast correlation based-filter pro-osed in [11], correlation-based selection (CFS) [12], and ReliefFstablished in [13], were utilized to select non-redundant featuresnd the filtered features were classified by an artificial neural net-ork (ANN). It turned out that 30-optimal selected features byeliefF achieved the best result with 91% sensitivity and 95% speci-city. These studies showed that there is the need to review featureelection as relevant features are directly related to the seizurelassification performance.

There have been a number of review articles focusing on auto-ated epileptic seizure detection (AESD). A brief investigation on

he AESD methods with some mathematical descriptions was pro-ided in [3]. The EEG analyses were categorized into time domain,requency domain, time-frequency domain, and nonlinear meth-ds. The authors then discussed the importance of analysis ofurrogate data and methods of epilepsy classifications of two (nor-al and ictal) and three (normal, interictal, and ictal) classes, where

he nonlinear features outperformed the others with an accuracyf more than 99%. However, implementation of statistical parame-

ers was not addressed in this paper. Applications of using entropyor epilepsy analysis were reported in [14]. Many types of entropyave been utilized in the AESD, both jointly and independently. Allntropies were mathematically stated in combination with their

advantages and disadvantages. The authors also showed differ-ences of each feature in the normal and ictal groups using the F-testfor checking the equality of variances. The three highest ranked fea-tures were Renyi’s, sample, and spectral entropies. Moreover, theseizure detection techniques of the whole process, including fea-ture extraction, feature selection, and classification methods, weresummarized [15]. Feature extraction methods were divided intotwo spectral domains, i.e., time-frequency analysis and temporaldomain. However, only a brief demonstration of each method wasdepicted; there was no explanation in the mathematical formu-lae and the AESD implementation was not described in the paper.Characteristics of focal and non-focal seizure activities observedfrom EEG signals were also reviewed in [16]. Focal EEG signals andnon-focal EEG signals were obtained from the epilepsy-affected andunaffected areas, respectively. Nonlinear features, such as Hjorthparameters, entropy, and fractal dimension, were used to char-acterize and compare the focal and non-focal EEG signals usingthe Bern-Barcelona EEG database [17]. Nevertheless, none of thesefeatures were expressed mathematically.

In addition, some studies have, in part, involved the AESD appli-cation. Studies on channel selection techniques for processing EEGsignals and reducing feature dimensions in seizure detection andprediction are needed because considering every channel maycause overfitting problems [18]. The channel selection techniquesof filtering, wrapper, embedded, hybrid, and human-based tech-niques were graphically explained via flowcharts and found toshow a high accuracy and low computational cost function. Uses ofdeep learning, including ANNs, recurrent neural networks (RNNs),and convolutional neural networks (CNNs), in biological signalprocessing, including EEG, electromyography, and electrocardiog-raphy signals, were also reviewed in [19]. The author summarizedthat there were 24 studies relevant to EEG signal analysis; twoof which were focused on AESD and its prediction. It was alsoreported that applying CNN on the AESD had an 86.9% accu-racy. Furthermore, methods for non-stationary signal classificationusing time-frequency and time-scale representations, and theirdifferences were discussed in depth in the seizure detection appli-cation [20]. Several features were extracted after time-frequencyand time-scale transformations: spectrogram, Extended ModifiedB-Distribution, compact kernel distribution (CKD), Wigner-Villedistribution (WVD), scalogram, Affine WVD (AWVD), pseudo-smoothed AWVD, and CKD with feature selection, fed to a randomforest (RF). The results showed that every time-frequency andtime-scale transformation achieved an accuracy of more than 80%,while CKD with 50 optimal selected features yielded an accuracyof 86.41%.

A review mainly focusing on using the wavelet-based method

for computer-aided seizure detection was described in [21]. Theauthors summarized approaches for seizure detection using differ-ent wavelets, i.e., discrete wavelet transform (DWT) and continuouswavelet transform (CWT), with the use of nonlinear-based fea-
Page 3: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

iomed

ttfaouesstgasbfm

gapofuonipt[fdatiuaTicErOtd

vilcaatsfdniaaawatTbe

P. Boonyakitanont, A. Lek-uthai, K. Chomtho et al. / B

ures and chaos-based measurements. The authors showed thathe Daubechies wavelet tap 4 was the most favorable waveletor use in seizure detection. However, there was no explanationbout the benefits of each wavelet, no mathematical expressionf wavelet transforms, and no comparison of the results betweensing different wavelet-based techniques. In addition, a review ofpileptic seizure prediction processes, including data collection,ignal pre-processing, feature extraction, feature selection, clas-ification, and performance evaluation, was reported in [22]. Inhe feature extraction, the authors categorized features into fourroups of linear univariate, linear bivariate, nonlinear univariate,nd nonlinear bivariate features. In this case, a nonlinear mea-ure was defined as an attribute related to dynamic states of therain. Moreover, univariate and bivariate features were extractedrom individual and multiple channels, respectively. However, no

athematical description of all the processes was given.From the past literature, we conclude that selecting distin-

uished features is required for applying machine learning (ML)pproaches to AESD, but previous studies still lack some conclusiveoints. Firstly, many previous papers have applied a combinationf features while providing neither a clear conclusion about whicheature has the highest contribution nor an intuitive meaning ofsing such features [9,23,24]. Secondly, mathematical expressionsf some complicated features, such as entropies, and Hurst expo-ent (HE), were not consistently stated and so may lead to an

ncorrect implementation [3,15]. Thirdly, most papers focused on aarticular group of features that use the classification accuracy ashe main indicator to conclude the effectiveness of those features24–26]. However, we believe that this cannot be a fair indicatoror a method comparison. To illustrate this, we consider using aefault classifier that regards all signals as normal. This methodlready yields a high accuracy if the data set in consideration con-ains mostly normal signals, known as imbalanced data sets. Hence,n our opinion, a performance comparison of all features is requiredsing the same experimental setting based on the Bayes classifiernd should be done with a well-adjusted classification indicator.his paper, therefore, aims to explore all those issues by provid-

ng a review of the feature extraction process and investigating theontribution of each feature used in the seizure classification inEG signals. We reported the accuracies and other related met-ics (if available) of previous studies on the same EEG databases.ur conclusions related to feature significance are drawn from

he experiments using commonly-used features and on a publicatabase reported in the literature.

Literature search. In this review, we mainly gathered articlesia Google Scholar to obtain publications from several databases,ncluding ScienceDirect and IEEE Xplore. Studies that were pub-ished between January 2010 and December 2018 were primarilyonsidered to reveal the current status of work. The first searchimed to find review articles using the following keywords: review,utomated epileptic seizure detection, EEG signal, channel selec-ion, feature selection, deep learning, and feature extraction. In theecond search, we primarily used the keywords: feature, nonlineareature, feature extraction, EEG signal, seizure detection, epilepsyetection, automated epileptic seizure detection, entropy, neuraletwork, and fractal dimension for finding research articles focus-

ng on the feature extraction. In addition, we also included morerticles from ScienceDirect and Mendeley recommender systemsnd references of the included studies. Studies that focused onnimals, devices, software, medical treatment, drugs, and surgeryere excluded. Only relevant studies that used only EEG signals,

pplied non-typical features or combinations of widely used fea-

ures, and had a well-described feature explanation were included.his led to a total of 55 papers, including nine review studies, toe thoroughly reviewed on the feature extraction. The other refer-nces were related to techniques used in feature extraction.

ical Signal Processing and Control 57 (2020) 101702 3

This manuscript is organized as follows. Section 2 describesthe features extracted from each domain and their computationalcomplexity. Features with detailed mathematical description arecategorized according to the domains from which they were cal-culated: time, frequency, and time-frequency domains. In Section3, we discuss evaluation metrics and summarize methods for theAESD based on the use of features. Section 4 explains the Bayesmethod for classification and CFS for assessing feature performanceand redundancy, and the results are compared in Section 5.

2. Feature extraction

This section describes the details of features commonly usedin the literature of EEG seizure detection, categorized by featuredomains: time, frequency, and time-frequency domains. Time-domain features (TDFs) are those calculated on raw EEG signals oron pre-processed signals done in the time domain, such as empiricalmode decomposition (EMD). Frequency-domain features (FDFs) arecomputed on discrete-Fourier transform of raw EEG signals. Time-frequency-domain features (TFDFs) are defined on transformedEEG signals that contain both time and frequency characteristics,such as short-time Fourier transform (STFT) spectrogram or DWT.In the last section, we briefly explain the computational complexityof each feature so that users can take this concern when per-forming a real-time implementation of AESD. To what follows, wedenote X = [x[0], x[1], . . ., x[N − 1]] a sequence of length N usedfor extracting a feature. For instance, X can be an epoch of a rawEEG segment, absolute values of a raw EEG segment, power spec-tral density (PSD), approximation or detail coefficients from anywavelet transform, or intrinsic mode functions (IMFs) from EMD.

2.1. Time-domain features (TDFs)

A TDF is calculated from a raw EEG signal or a decomposed signalperformed on a time domain. In the literature, one example of sig-nal transformation is EMD that decomposes a signal into IMFs [27]and is commonly applied to signals that apparently exhibit non-stationary properties. In this section, well-known TDFs are brieflydescribed and those involving an entropy concept are explainedwith mathematical expressions.

1 Groups of statistical parameters have been frequently used todiscriminate between ictal and normal patterns, because it isassumed that EEG statistical distributions during a seizure andnormal periods are different. These parameters are mean, vari-ance, mode, median, skewness (third moment describing dataasymmetry), and kurtosis (fourth moment determining tailed-ness of the distribution). The minimum and maximum values arealso used to quantify the range of data or the magnitude of signalbaseline. Other statistical parameters include coefficient of vari-ation (CV) defined as the ratio of the standard deviation (SD) tothe sample mean that explains the dispersion of the data in rela-tion to the population mean; first (Q1) and third quartiles (Q3)that quantify the data denseness; and interquartile range (IQR)that measures a deviation between the first and third quartiles.

2 Energy, average power, and root mean squared value (RMS) aremutually relevant to amplitude measurements. The energy is asummation of a squared signal, the average power is the signalmean square, and the RMS is the square root of the averagepower.

3 Line length, sometimes called curve length, is the total verticallength of the signal defined as

L(X) =N−1∑i=1

|x[i] − x[i − 1]|. (1)

Page 4: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

4 iomed

1(r).

P. Boonyakitanont, A. Lek-uthai, K. Chomtho et al. / B

It was originally presented in [28] as an approximation ofKatz’s fractal dimension.

4 Amplitude-integrated EEG (aEEG) is a visual inspection that iswidely used for seizure detection in hospitals [29]. The aEEG sig-nal is obtained by calculating the difference between adjacentmaximum and minimum values, i.e., the peak-to-peak recti-fication. Consequently, the rectified signal is passed throughthe process of interpolation and semi-logarithmic compressionproposed in [30]. The interpolated signal is linearly displayedin the range 0–10 �V and semi-logarithmically compressed inthe range over 10 �V. This feature is applied based upon theassumption that during seizure events the amplitude of signalsare shifted up from its baseline during normal events.

5 Nonlinear energy (NE), firstly established in [31], extends theconcept of energy (quadratic measure) to including indefiniteterms of shifted and lagged sequences, defined as

NE(X) =N−2∑i=1

(x2[i] − x[i + 1]x[i − 1]

). (2)

In [31], if a signal has a simple harmonic motion with theamplitude A and the oscillation frequency ω, it can be derivedthat the NE is proportional to A2ω2 when the sampling fre-quency is high. Hence, high values of NE can indicate bothshifted values in a high frequency of oscillation and amplitude.

6 Shannon entropy (ShEn) [32] reflects the uncertainty in randomprocess or quantities. It is defined as

ShEn(X) = −∑i

pi log pi, (3)

where pi is the probability of an occurrence of each of valuein X.

7 Approximate entropy (ApEn) [33] is a measure of the regu-larity and fluctuation in a time series derived by comparingthe similarity patterns of template vectors. The templatevector of size m is defined as a windowed signal: u[i] =[x[i] x[i + 1] · · · x[i + m − 1]

]T, and we first consider the

self-similarity of the template vector u[i] with a tolerance r,defined by

Cmi (r) = 1N − m + 1

N−m∑j=0

� (r − ||u[i] − u[j]||∞) ,

where � (x) is the Heaviside step function, i.e., � (x) is onewhen x ≥ 0, and zero otherwise. When X is mostly self-similar,then u[i] and u[j] sequences are very close and thus Ci is high.The ApEn aggregates the self-similarity indices over all shiftedpossibilities of template vectors given template length and tol-erance. The ApEn is defined as

ApEn (X, m, r) = 1N − m + 1

N−m∑i=0

log Cmi (r) − 1N − m

N−m−1∑i=0

log Cm+i

8 Sample entropy (SampEn) [34] is based upon a concept similarto the ApEn, where the SampEn compares the total number oftemplate vectors of size m and m + 1. The SampEn differs fromthe ApEn in that the self-similarity of all pairs of template vectorsu[i] and u[j] with a tolerance r is calculated by

�m(r) =N−m∑j=0,j /= i

N−m∑i=0

� (r − ||u[i] − u[j]||∞) .

ical Signal Processing and Control 57 (2020) 101702

(4)

If the signals are self-similar, �m(r) is high. The SampEn isdefined by

SampEn (X, m, r) = log �m(r) − log �m+1(r). (5)

9 Permutation entropy (PE) [35] is a measure of the local complex-ity in a signal. With the template vector u[i] and a permutationpattern �k of order m consisting of m! patterns, the permuta-tion pattern probability for all k = 1, 2, . . ., m! is defined as theprobability that a template vector has the same pattern as thepermutation pattern:

p(�k) = 1N − m + 1

N−m∑i=0

f (u[i], �k) , (6)

where f (u[i], �k) = 1 when u[i] and �k are the same pattern,and zero otherwise. In this case, the pattern is defined by theorder of u[i] corresponding to its element. Then, the PE is definedas

PE (X) = −m!∑k=1

p(�k) log p(�k). (7)

10 Weighted-permutation entropy (WPE) [36] is a measure of theprocess complexity extended from PE by putting the varianceof the template vector as a weight (wi) in computing the prob-ability of each permutation pattern, as calculated by

p(�k) =(N−m∑i=0

f (u[i], �k)wi

)/

(N−m∑i=0

wi

), (8)

where f (u[i], �k) = 1 when u[i] and �k are the same pattern,and zero otherwise. Finally, the WPE is calculated by

WPE (X) = −m!∑k=1

p(�k) log p(�k). (9)

11 Fuzzy entropy (FuzzEn) is another measure of the process uncer-tainty, where its calculation is based on a zero-mean sequence[37]. Consider a windowed signal with mean removed:

u[i] =[x[i] x[i + 1] · · · x[i + m − 1]

]T − x[i]1, where x[i] =(1/m)

∑m−1j=0 x[i + j], and 1 is a column vector of ones. Like other

entropy measures, the sequence u is used to compute a self-similarity index, �, but for FuzzEn, a Gaussian function is usedas defined by

�(m)(r) = 1(N − m)(N − m − 1)

N−m∑j=0,j /= i

N−m∑i=0

ed2ij/2r2

,

where dij = ||u[i] − u[j]||∞. If X is self-similar, then dij is smalland relative to the Gaussian curve with variance r2, and so wecan determine how high �(m)(r) is. The FuzzEn is then definedto reflect the change of � as the window length m is increasedto m + 1,

FuzzEn (X, m, r) = log �m(r) − log �m+1(r). (10)

12 Distribution entropy (DistEn) measures the uncertainty of a pro-

cess by estimating the probability distribution with a histogram[38]. With the zero-mean vector defined previously in FuzzEn,the distance between two template vectors (windowed signals)is dij = ||u[i] − u[j]||∞ and is used to construct a histogram with B
Page 5: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

iomed

1

1

1

P. Boonyakitanont, A. Lek-uthai, K. Chomtho et al. / B

bins. The DistEn is then defined to be the entropy of the randomprocess dij:

DistEn (X, m, B) = − 1log B

B∑i=1

pi log pi, (11)

where pi is the relative frequency of each bin. A high DistEnvalue reflects unpredictable natures of the original sequence X.

3 Singular Value Decomposition (SVD) entropy (SVDEn), first estab-lished in a brain-computer interface application, was proposedto measure the temporal and spatial complexity of a processbased on entropy [39]. Any rectangular matrix A has an SVDof A = U�VT, where � = diag(�1, �2, . . ., �m) containing the sin-gular values of A. The number of non-zero singular values of Adetermines its rank. In feature extraction, the normalized sin-gular value is given by �j = �j/

∑mi �i and the SVDEn is defined

as the entropy form of these normalized singular values:

SVDEn(X) = −∑j

�j log �j. (12)

For the temporal measurement, a delay method of time is used to construct each row of the matrix A. For the spatialSVDEn, each row of the matrix A is a sequence of the EEG signalfrom each channel. It is claimed that when the EEG signals arenoisy, potentially all singular values are non-zero (A is full rank).More importantly, it is observed that singular values associatedwith noisy signals are significantly smaller than the determinis-tic (noiseless) signal. Therefore, a low SVDEn value spatially andtemporally indicates patterns of epileptic seizure in EEG signalsbecause the singular values from the EEG signals from a seizureare higher than those of the normal signals.

4 Hurst exponent (HE) [40] indicates a degree of time series ten-dency. We call a partial time series of X as a chopped signal havinglength m, such that m < N, e.g. m = N/2. A cumulative deviate series,or so-called partial cumulative time series, is defined as

z[t] =t∑j=0

(x[j] − x) , (13)

where x = (1/m)∑m−1

i=0 x[i] is the average of the partial timeseries. With these auxiliary signals, we consider R(m) as a cumu-lative range: R(m) = max

tz[t] − min

tz[t], and S(m) as the SD of the

partial time series. Then, the HE is defined as the slope of thestraight line that fits log

(R(m)/S(m)

)as a function of log m by

the least-square method.

log(R(m)/S(m)

)= HE · log m.

With a fixed m, we observe how the signal is fluctuated fromits mean via z and the range of this fluctuation is explained inR, relative to the SD (normalized by S). Thus, the HE tends tocapture how this range grows as m increases and approximatesit as a linear growth in a log-scale.

5 Fractal dimension is a mathematical index for measuring sig-nal complexity [41]. The Higuchi dimension (DH) was proposedto calculate the fractal dimension for a time series in the timedomain [42]. A partial length, denoted as Lm(), is calculatedfrom a partial time series with a time interval and an initialtime m = 0, 1, 2, . . ., as

Lm() = 1

⎛⎝� N−m

�∑i=1

|x[m + i] − x[m + (i − 1)]|

⎞⎠ N − 1

�N−m �

. (14)

ical Signal Processing and Control 57 (2020) 101702 5

The slope of the straight-line fitting log L() as a function oflog is equal to −DH(X), where L() is the average of Lm() over .Moreover, there is another approach, the Minkowski dimensionor box-counting dimension (DB), that measures the complexityof images and signals by counting square boxes that cover theobject [41]. Here, DB(X) is defined as

DB(X) = limε→0

log N(ε)log(1/ε)

, (15)

where N(ε) is the number of boxes of size ε of each siderequired to cover the signal.

16 Local Binary Pattern (LBP), Local Neighbor Descriptive Pattern(LNDP), and Local Gradient Pattern (LGP) are encoding techniquesthat transform a partial time series into encoded patterns [5,43].For i = 0, 1, . . ., N − m − 1 when m is a positive even integer, theLBP of the partial time series is defined as the difference betweenthe partial time series and its center:

LBPi(X) =m/2−1∑j=0

�(x[i + j] − x[i + m/2]

)2j

+m−1∑j=m/2

�(x[i + j + 1] − x[i + m/2]

)2j, (16)

where �(x) is the Heaviside step function. The LNDP capturesthe relationship of signals from the neighbor sequences and isdefined by

LNDPi(X) =m−1∑j=0

�(x[i + j] − x[i + j + 1])2j. (17)

The LGP represents the changes in a signal as measured by thedifference between signal gradients and the average gradient,and is defined by

LGPi(X) =m/2−1∑j=0

(|x[i + j] − x[i + m/2]|

− 1m

m∑k=0

|x[i + k] − x[i]|)

2j

+m−1∑j=m/2

(|x[i + j + 1] − x[i + m/2]|

− 1m

m∑k=0

|x[i + k] − x[i]|)

2j. (18)

These three parameters are pattern indicators and their his-tograms can form into features. The main advantage of theseparameters is their invariant properties. The LBP is globallyinvariant, while the LNDP and LGP are not affected by local vari-ation in the signal. From the algorithms, it was observed thathistograms from normal and seizure EEG signals are different;the histograms of LBP, LNDP, and LGP from the abnormal signalshave a high frequency at some specific values.

17 Three Hjorth parameters (activity, mobility, and complexity) wereestablished to characterize the spectral properties of EEG signalsin the time domain [44]. The activity is the variance of a signal

and is high during seizure events in terms of a high amplitudevariation from its mean. The mobility is a measure of a quantityrelated to the SD of the signal PSD. The mobility is expressed asthe ratio of the SD of a signal derivative to the SD of the signal.
Page 6: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

6 iomed

1

1

2

2

2

ratsf

1

2

P. Boonyakitanont, A. Lek-uthai, K. Chomtho et al. / B

Complexity represents the difference between the signal and apure sine wave. The complexity is defined by the ratio of themobility of a signal derivative to the mobility of the signal.

8 Two bandwidths – amplitude modulation bandwidth (BAM) andfrequency modulation bandwidth (BFM) – calculated from theanalytic signal of IMF are measures of a frequency spread interms of changes in amplitude and average frequency [45]. Letz(t) be the analytic signal of IMF y(t) defined as

z(t) = y(t) + jyH(t) = A(t)ej�(t), (19)

where yH(t) = y(t) * 1/(�t) is the Hilbert transform of c(t), A(t)denotes the amplitude of the analytic signal, and �(t) representsthe instantaneous phase [46]. Calculations of BAM and BFM areas follows:

BAM = 1E

∫ (dA(t)dt

)2

dt, BFM = 1E

∫ (d�(t)dt

− ω)2

A2(t)dt,

(20)

where E is the analytic signal energy, and ω = 1E

∫d�(t)dt A

2(t)dtis the center frequency.

9 Detrended fluctuation analysis (DFA) is a method of examiningthe self-correlation of a signal, which is similar to the HE. Itscalculation is conducted by the following steps [47]. A partialcumulative time series, z[t], as defined in (13) is used to deter-mine a trend of the signal. Subsequently, z[t] is then segmentedinto a signal of length n. A local trend of the segment is estimatedby a least-square straight line z[t]. The mean-squared residualF[n] is then obtained from

F[n] =

√√√√ 1N

N−1∑t=0

(z[t] − z[t]

)2. (21)

DFA(X) is defined as the slope of the straight-line fittinglog F[n] as a function of log n in a regression setting. The resultindicates the fluctuation of the signal when its local trend isremoved.

0 Number of zero-crossings is an indirect measurement of thefrequency characteristics of a signal. If this number is large,it means the signal contains high frequency components andmore uncertainty.

1 Number of local extrema is the total number of local maxima andminima in a signal. It is similar to the number of zero-crossingsthat indirectly represents the frequency measurement of thesignal.

.2. Frequency-domain features (FDFs)

Frequency domain analysis is also crucial, since a frequencyepresentation of an EEG signal provides some useful informationbout patterns in the signal. The PSD and the normalized PSD (byhe total power) are mostly used to extract features that repre-ent the power partition at each frequency. This section describeseatures that are extracted from the PSD.

The energy is extracted from some specified frequency ranges,usually corresponding to normal EEG activities, to determine theEEG rhythmicity in each frequency range.

Intensity weighted mean frequency (IWMF), also known as themean frequency, gives the mean of the frequency distribution

using the normalized PSD, and is defined as

IWMF (X) =∑k

x[k]f [k], (22)

ical Signal Processing and Control 57 (2020) 101702

where x[k] is the normalized PSD of an EEG epoch at the fre-quency f[k].

3 Intensity weighted bandwidth (IWBW), or the SD frequency, is ameasure of the PSD width in terms of the SD, and is defined as

IWBW (X) =√∑

k

x [k] (f [k] − IMWF (X))2, (23)

where x[k] is the normalized PSD. According to the seizure pat-terns in typical EEG signals, the PSD is sharper during seizureactivities. Therefore, the IWBW is smaller during those activities.

4 Spectral edge frequency (SEF), denoted as SEF˛, is the frequencyat which the total power of the normalized PSD, x[k], is equalto percent. We define SEF as the frequency f[kSEF] such that∑kSEF

k=0x[k] = 0.01˛. By this definition, the median frequency is arelated feature, defined as the frequency at which the total poweris equally separated and is, therefore, equal to the SEF50.

5 Spectral entropy (SE) is a measure of the random process uncer-tainty from the frequency distribution. A low SE value means thefrequency distribution is intense in some frequency bands. Its cal-culation is similar to that for the ShEn but replaces the probabilitydistribution with the normalized PSD as follows:

SE(X) = −∑k

x[k] log x[k]. (24)

6 Peak frequency, also called dominant frequency, is the frequencyat which the PSD of the highest average power in its full-width-half-max (FWHM) band has the highest magnitude. Since thepeak frequencies of normal and seizure are located differently,it can be used to differentiate the two events.

7 Bandwidth of the dominant frequency is defined as the FWHMband corresponding to the peak frequency. Regarding the rhyth-micity of an EEG signal, it can be implied that, during seizureactivities, there are frequency ranges that are more intense.Hence, the bandwidth is supposedly small during the seizureactivities.

8 Power ratio of the current and background epochs at the same fre-quency range is determined to compare the powers. The power ofa seizure period is relatively higher than that of the background.

2.3. Time-frequency-domain features (TFDFs)

Since an seizure occurrence can be captured by both TDFsand FDFs, many transformation and decomposition techniquesthat give information in terms of time and frequency have beenwidely considered, e.g., STFT spectrogram and DWT analyses. In thissection, all the features are computed from the results of decompo-sition techniques. For example, when the DWT was used, featureswere calculated from the decomposition coefficients of some spe-cific levels.

1 Statistical parameters are also common in the time-frequencydomain. Like TDFs, these attributes are computed to distinguishstatistical characteristics of normal and seizure EEG signals ineach specified frequency sub-band. Features applied in previousstudies included the mean, absolute mean, variance, SD, skewness,kurtosis, absolute median, and minimum and maximum values.

2 The energy, average power, and RMS are used to observe the ampli-tudes of signals corresponding to specific frequency sub-bands.

3 The line length is used to estimate the fractal dimension of theDWT decomposition coefficients.

4 The ShEn, ApEn, PE, and WPE, entropy-based indicators, deter-

mine the uncertainties and complexities of decomposed signals.

5 Weighted multi-scale Rényi permutation entropy (WMRPE)is an extension of multi-scale Rényi permutation entropymeasuring the time-series complexity [48]. Considering a

Page 7: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

iomed

6

7

2

cmtttQllcvttlpsf

oa[FOpcu[m(iaitOc

npic

P. Boonyakitanont, A. Lek-uthai, K. Chomtho et al. / B

coarse-grained signal of x[n]: v[i] = (1/l)∑il

j=(i−1)l+1x[j], a multi-scale template vector of length m is denoted as u[i] =[

v[i] v[i + ] · · · v[i + (m − 1)]]T

, where is a time delay.The probability p(�k) of each permutation pattern �k is thendefined as in (8) and the WMRPE is finally calculated as

WMRPE(X) = 11 − q

log

(m!∑k=1

p(�k)

)q

, (25)

when q /= 1. If q = 1, the entropy eventually changes to weightedmulti-scale permutation entropy (WMPE) as computed in (9).

Additionally, HE and DH are applied to demonstrate randomnessin a time series corresponding to the DWT coefficients of eachsub-band.

Autoregressive (AR) and autoregressive-moving-average (ARMA)model parameters of a time series model are fitted to DWT coef-ficient sequences. It is assumed that the time-series modelsexplaining the dynamics of ictal and normal EEG patterns aredifferent. Time-series model parameters are then estimated foreach sub-band where stationarity assumption holds. However, itis claimed that ARMA parameters were significant features onlyin some sub-bands [49].

.4. Complexity of feature extraction

Suppose a raw EEG time series is split into epochs; each of whichontains N time samples that are extracted into a feature. Based onathematical expressions of each feature, we describe its computa-

ional complexity as a function of N samples. Firstly, simple featureshat require only O(N) are standard statistical parameters, excepthose which need a prior sorting process, including mode, median,1, Q3, and IQR, which require O(N log N), energy, NE, RMS, line

ength, Hjorth parameters, number of zero-crossing, and number ofocal extrema. The HE, DFA, and fractal dimension features involvealculating a partial time series and solving a regression with oneariable, so the first task is the more dominant. The complexity ofhese features is still linear in N, O(N). Similarly, the main calcula-ion of LBP, LNDP, and LGP are from computing partial time series ofength m. Since the computational complexity of computing eachartial time series is linear in m and there exists N − m partial timeeries where m « N, then the total complexity for calculation of theseeatures is O(N), or O(Nm) (to be more precise).

Complexities of entropy-related features differ in the meth-ds of approximating the probability distribution (a bubble sortlgorithm requires O(N2), while a quicksort takes O(N log N)50]). For exmaple, the ShEn needs O(N log N). Also, the ApEn,uzzyEn, DistEn, and SampEn normally require complexities of(N2) because there are approximately N(N − 1)/2 similarity com-

arisons between two template vectors [34,51]. Additionally, someomplexities of entropy calculation can be reduced to O(N log N)sing a sorting algorithm before computing the self-similarity52,53]. Moreover, the PE and WPE require comparison of m! per-

utation patterns and a template vector of length m. So there existm!)(N − m + 1) comparisons, making the computational complex-ty O((m!)N). On the other hand, the SVDEn needs SVD computationt the beginning followed by the entropy calculation. The complex-ty of calculating SVD is O(m2N + m3), where m is the number ofhe singular values and m < N and the entropy calculation requires(m). Therefore, the complexity of SVDEn is dominated by the SVD

omputation of O(m2N + m3).Computing FDFs firstly requires the Fourier transform that

eeds O(N log N) for the fast-Fourier transform (FFT). Subsequentrocesses in calculating energy, IWMF, IWBW, SEF are just linear

n N, so the complexity of the FDFs is O(N log N). Computationalomplexities of TFDFs can be dominated by two processes: signal

ical Signal Processing and Control 57 (2020) 101702 7

decomposition and feature extraction, depending on which stephas a higher complexity. For example, the DWT has the complexityof O(N), while the STFT has the complexity of O(N log m), wherem is a size of the moving window. If the energy feature is calcu-lated from STFT, then the overall complexity is O(N log m). On theother hand, the total complexity is O(N log N) when the ApEn iscalculated from DWT coefficients.

3. Epileptic seizure detection

3.1. Performance metrics

The method to assess AESD is critical for comparing detectionalgorithms. As a binary classification, seizure detection results froma classifier are comprised of true positive (TP), false positive (FP),false negative (FN), and true negative (TN). These quantities can beformed into a true positive rate (TPR) and a false positive rate (FPR),or as sensitivity (Sen), specificity (Spec) and accuracy (Acc) [54].The accuracy is the overall classification performance, where thesensitivity and specificity are regarded as how well the algorithmdetects seizure and normal outcomes, respectively. Currently, thereare two sets of evaluation metrics in the AESD: epoch-based andevent-based metrics [55].

Epoch-based metrics are indicators used to evaluate the detec-tion performance when each epoch segmented from a long EEGsignal is supposed to be a separate data sample. The epoch-basedmetrics are calculated from the numbers of TP, FP, FN, and TNevaluated on all samples. For example, accuracy, sensitivity, andspecificity have been reported in many studies [3,7,10]. Neverthe-less, these epoch-based metrics can lead to wrong diagnosis whenseizure events are incorrectly recognized. For instance, an algo-rithm wrongly classifying one totally short seizure event as normalstill achieves a high epoch-based performance since there are manyother epochs that are correctly detected.

Event-based metrics, on the other hand, are used to evaluatea classifier based on seizure events in EEG signals. Two commonmetrics, good detection rate (GDR) and FPR per hour (FPR/h), arecalculated based on the intersection of detection results and anno-tations [56–58]. Here, GDR is defined as the percentage of detectedseizure events that have an overlap with the annotations andFPR/h is the proportion of events declared as a seizure withoutany intersection with the annotations in one hour. A higher GDRindicates a higher number of correctly detected seizure events,while a small FPR/h refers to having a lower number of wronglyrecognized seizure events. However, care is required with thesehigh event-based metrics to avoid being misled into a conclu-sion of a correct detection when a duration is considered. Forexample, declaring an occurrence of seizure at the last second ofan actual seizure event is still counted as good detection eventhough the detection system nearly misses the whole seizureevent.

3.2. Selected features used in epileptic seizure detection

From the lists of common features described in Section 2, thissection summarizes how those features were applied to AESD inthe literature. Performances of the detection algorithms generallydepend on both feature selection and the classifier, so we reportthe classification method, features and categories by domains, andthe performance results in Tables 1 and 2 for studies using the CHB-MIT Scalp EEG database [59] and in Tables 3 and 4 for those usingthe Bonn University database [60]. The description of the CHB-MIT

Scalp EEG database is explained in Section 5. For the Bonn Univer-sity database, there are five sets of data, each of which contains100 single-channel EEG samples. Only one data set consisted ofseizure activities recorded intracranially from epileptic patients,
Page 8: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

8 P. Boonyakitanont, A. Lek-uthai, K. Chomtho et al. / Biomedical Signal Processing and Control 57 (2020) 101702

Table 1Summary of automated epileptic seizure detection using the CHB-MIT Scalp EEG database when single-domain features were used.

Domain Features Method Performance Ref.

Time Raw signal ANN Acc = 100% [26]aEEG Thresholding Sen = 88.50%, FPR/h = 0.18 [58]Line length, NE, variance, average power, max RBF SVM Acc = 95.17%, Sen = 66.35%, Spec = 96.91% [61]Absolute mean values, average power, SD, ratioof absolute mean values, skewness, kurtosis

MSPCA + EMD + RF Acc = 96.90% [23]

MSPCA + EMD + SVM Acc = 97.50% [23]MSPCA + EMD + ANN Acc = 96.90% [23]MSPCA + EMD + k-NN Acc = 94.90% [23]

DB RVM Sen = 97.00%, FPR/h = 0.24 [56]Frequency Energy RBF SVM Sen = 96.00%, FPR/h = 0.08 [57]*Time-frequency Spectrogram STFT + SSDA Acc = 93.82% [62]

Mean, ratio of variance, SD, skewness, kurtosis,mean frequency, peak frequency

DWT + ELM Acc = 94.83% [24]

Log of variance DWT + thresholding Acc = 93.24%, Sen = 83.34%, Spec = 93.53% [63]DWT + RBF SVM Acc = 96.87%, Sen = 72.99%, Spec = 98.13% [61]

Absolute mean, average power, SD, ratio ofabsolute mean, skewness, kurtosis

MSPCA1 + DWT + RF Acc = 100% [23]

MSPCA + DWT + SVM Acc = 100% [23]MSPCA + DWT + ANN Acc = 100% [23]MSPCA + DWT + k-NN Acc = 100% [23]MSPCA + WPD2 + RF Acc = 100% [23]MSPCA + WPD + SVM Acc = 100% [23]MSPCA + WPD + ANN Acc = 100% [23]MSPCA + WPD + k-NN Acc = 100% [23]

Energy HWPT3 + RVM Sen = 97.00%, FPR/h = 0.25 [56]Energy, DB HWPT + RVM Sen = 97.00%, FPR/h = 0.10 [56]

1 Multi-scale principal component analysis, 2 wavelet packet decomposition,3 harmonic wavelet packet transform. * Use all data records.

Table 2Summary of automated epileptic seizure detection using the CHB-MIT Scalp EEG database when multi-domain features were used.

Time Frequency Time-frequency Method Performance Ref.

Variance, RMS, skewness, kurtosis, SampEn Peak frequency,median frequency

LDA Sen = 70.00%, Spec = 83.00% [64]

QDA Sen = 65.00%, Spec = 92.00% [64]Polynomial classifier Sen = 70.00%, Spec = 83.00% [64]Logistic regression Sen = 79.00%, Spec = 86.00% [64]k-NN Sen = 84.00%, Spec = 85.00% [64]

wMsottwfintc

3

TsUfiusftp[p

hereas the others were regarded as non-epileptic EEG signals.ore data description is available in [60]. We have separated these

ummaries according to the databases so that a fair comparisonf performance results can be concluded. However, even thoughhese studies tested the methods on the same database, the selec-ion of data records in each paper can be varied or this informationas not available. In this case, the interpretation of performancegures should be performed with care, and we have put a foot-ote to such publications. The methods are described as a series ofransformations or pre-processing steps (if performed) followed bylassifiers.

.2.1. Time-domain features (TDFs)Previous studies using TDFs in both databases are depicted in

ables 1 and 3. The most obvious approach was to use the raw EEGignals as inputs but a deep learning process to extract the patterns.sing the raw signal with ANN [26] and one-dimensional CNN

ollowing the z-score normalization process [65] provided promis-ng results. Similarly, proposed studies of CNN-based classificationsing raw signals and IMFs from EMD achieved nearly perfect clas-ification [66]. However, data used in [26] were only chosen fromemale patients, where the records contained epileptic activities in

he frontal lobe and the activities were only simple and complexartial epilepsy. A thresholding technique using aEEG was used in58], where the threshold could be simultaneously updated. Theroposed method effectively detected a high-amplitude seizure,

DT Sen = 78.00%, Spec = 80.00% [64]Parzen classifier Sen = 61.00%, Spec = 86.00% [64]SVM Sen = 79.00%, Spec = 86.00% [64]

but it also responded to high-amplitude artifacts. Thus, it is unsuit-able for detecting a seizure when the EEG signals are contaminatedwith artifacts. Another disadvantage of the detection algorithmwas that it required the EEG signal to be normal at the beginningand ictal patterns must be adequately long. Hence, only 85 seizureevents were left for evaluation. When a single feature was extractedto detect epilepsy, amplitude or uncertainty-related features werecommonly used with a ML technique. Results of using LBP withk-nearest neighbor (k-NN) showed that histograms between nor-mal and epileptic patterns were different [5]. Similarly, LNDP andLGP were individually applied to k-NN, SVM, decision tree (DT),and ANN to distinguish their patterns in epileptic EEG signals [43].As a result, both LNDP and LGP achieved capabilities of analyzingseizures for any classifiers while k-NN had the fastest computation.In addition, PE was employed to distinguish seizures from EEG sig-nals where it was concluded that PE from a normal EEG signal washigher than that from a seizure EEG signal [4]. Furthermore, WPE,which was calculated from sub-segments of one-long epoch andthen concatenated into a feature vector, was used with three clas-sifiers: linear SVM, radial basis function kernel SVM (RBF SVM),and ANN to find the performance of each classifier [67]. CombiningWPE and RBF SVM outperformed the other two classifiers. Unlike

the ML approach, the results of using a thresholding method withApEn as a single feature showed that the seizure period obtainedlower ApEn than the normal period [68]. An error energy obtainedfrom a linear predictive filter was used in a thresholding method in
Page 9: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

P. Boonyakitanont, A. Lek-uthai, K. Chomtho et al. / Biomedical Signal Processing and Control 57 (2020) 101702 9

Table 3Summary of automated epileptic seizure detection using the Bonn University database when single-domain features were used.

Domain Features Method Performance Ref.

Time Raw signal CNN Acc = 88.70%, Sen = 95.00%, Spec = 90.00% [65]Acc = 98.80% [66]

IMFs EMD + CNN Acc = 98.80% [66]Linear prediction error energy Thresholding Sen = 92.00%, Spec = 94.50% [69]LBP k-NN Acc = 99.33% [5]LNDP ANN Acc = 98.72%, Sen = 98.30%, Spec = 98.82% [43]1D-LGP ANN Acc = 98.65%, Sen = 98.44%, Spec = 98.70% [43]FuzzyEn, DistEn QDA Acc = 92.80%, Sen = 90.67%, Spec = 92.80% [70]ApEn Thresholding Acc = 73.00% [68]WPE Linear SVM Acc = 91.63% [67]

RBF SVM Acc = 93.38% [67]ANN Acc = 91.86% [67]

BAM, BFM EMD + Wavelet SVM Acc = 99.75%, Sen = 100%, Spec = 99.69% [45]Frequency PSD DT Acc = 98.72%, Sen = 99.40%, Spec = 99.31% [73]

Coefficients DFT + CNN Acc = 99.40% [66]Peak amplitude, peak frequency PSD (ARMA) + GMM Sen = 70.00%, Spec = 53.33% [74]

PSD (ARMA) + ANN Sen = 91.67%, Spec = 83.33% [74]PSD (ARMA) + RBF SVM Sen = 96.67%, Spec = 86.67% [74]PSD (YW) + GMM Sen = 78.33%, Spec = 83.33% [74]PSD (YW) + ANN Sen = 98.33%, Spec = 90.00% [74]PSD (YW) + RBF SVM Sen = 96.67%, Spec = 86.67% [74]PSD (Burg) + GMM Sen = 75.00%, Spec = 93.33% [74]PSD (Burg) + ANN Sen = 98.33%, Spec = 96.67% [74]PSD (Burg) + RBF SVM Sen = 98.33%, Spec = 96.67% [74]

Time-frequency log of Fourier spectrum at scale 4 and 5 DT-CWT4 + k-NN Acc = 100% [75]Coefficients DWT + CNN Acc = 99.20% [66]Mean, variance, skewness, kurtosis, entropy EWT5 + RBF SVM Acc = 100% [25]Absolute coefficients, absolute mean, absolutemedian, absolute SD, absolute max, absolute min

RDSTFT6 + ANN Acc = 99.80%, Sen = 99.90%, Spec = 99.60% [76]

Absolute mean, average power, SD DWT + PCA + RBF SVM Acc = 98.75%, Sen = 99.00%, Spec = 98.50% [77]DWT + LDA + RBF SVM Acc = 100%, Sen = 100%, Spec = 100% [77]DWT + ICA + RBF SVM Acc = 99.50%, Sen = 99.00%, Spec = 100% [77]

Absolute max, absolute min, absolute mean, SD WPD + k-NN Acc = 99.45% [78]Line length DWT + ANN Acc = 97.77% [7]Fractional energy SPWVD7 + ANN Acc = 99.92% [79]ApEn MWT8 (GHM) + ANN Acc = 96.69%, Sen = 98.62%, Spec = 89.91% [80]

MWT (CL) + ANN Acc = 95.15%, Sen = 96.57%, Spec = 89.21% [80]MWT (SA4) + ANN Acc = 98.27%, Sen = 99.00%, Spec = 95.50% [80]DWT + Thresholding Acc = 96.00% [68]

WPE DWT + Linear SVM Acc = 86.50% [67]DWT + RBF SVM Acc = 88.25% [67]DWT + ANN Acc = 86.63% [67]

WMRPE FBS9 + DT Acc = 98.60% [81]HE, ARMA parameters DCT + RBF SVM Acc = 97.79%, Sen = 97.97%, Spec = 97.60% [49]HE, fractal dimension, PE DT-CWT + RBF SVM Acc = 98.87%, Sen = 98.20%, Spec = 100% [82]

DT-CWT + k-NN Acc = 97.80%, Sen = 97.20%, Spec = 98.60% [82]DT-CWT + DT Acc = 90.33%, Sen = 90.00%, Spec = 94.00% [82]DT-CWT + RF Acc = 98.13%, Sen = 98.20%, Spec = 98.40% [82]

4 Dual-tree complex wavelet transform, 5 empirical wavelet transform, 6 rational discrete STFT, 7 smoothed pseudo Wigner-Ville distribution, 8 multiwavelet transform, 9

Fourier-Bessel series. Note that all publications used every set of the data but none of them explained which segments were selected.

Table 4Summary of automated epileptic seizure detection using the Bonn University database when multi-domain features were used.

Time Frequency Time-frequency Method Performance Ref.

Raw signal, IMFs Coefficients Coefficients EMD + FFT + DWT + CNN Acc = 99.50% [66]Mean, energy, SD, max value Mean, energy, SD, max value DWT + HHT + NNE Acc = 98.78% [83]

DWT + HHT + ANN Acc = 88.00% [83]DWT + HHT + RNN Acc = 91.33% [83]DWT + HHT + SVM Acc = 94.67% [83]DWT + HHT + k-NN Acc = 95.33% [83]DWT + HHT + LDA Acc = 92.67% [83]

Max, min, mean, SD, kurtosis, skewness, Q1,Q3, IQR, median, mode, mobility, complexity,SampEn, HE, DFA

Max, min, mean, SD, ShEn DWT + RF Acc = 97.40%, Sen = 97.40%,Spec = 97.50%

[84]

D Energy HWPT + RVM Acc = 99.80%, Sen = 100%, [56]

N ich se

[n

t

B

ote that all publications used every set of the data but none of them explained wh

69], where the epileptic group had a higher error energy than theormal group.

On the other hand, making use of several TDFs applied to MLechniques was found in [23,70,45] (Tables 1 and 3). In [23], statisti-

Spec = 99.00%

gments were selected.

cal features, including absolute mean, SD, skewness, kurtosis, ratioof absolute mean, and average power of each IMF obtained fromEMD, were computed following an artifact removal using a multi-scale principal component analysis (MSPCA), and then passed to

Page 10: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

1 iomed

maanioatrtst(tcTttpTmwwawwe[f

3

dFDstcaomupSatuact

juqt[le

3

bFnsa

0 P. Boonyakitanont, A. Lek-uthai, K. Chomtho et al. / B

any classifiers, including SVM, ANN and k-NN. Applying FuzzEnnd DistEn in [70] to a multi-length segmentation approach and

quadratic discriminant analysis (QDA) showed that this combi-ation could identify seizures. However, their performance was

nferior to that in [43] which used only single features with ANNr k-NN. Furthermore, BAM and BFM computed from each IMF werenalyzed independently and fed to SVMs with many kernel func-ions including RBF and wavelet kernel functions [45]. The authorseported that using Morlet wavelet with those features could effec-ively separate seizure epochs from normal epochs with accuracy,ensitivity, and specificity of 99.75%, 100%, and 99.69%, respec-ively. Alternatively, among pioneer works in epilepsy detectionnot listed in the tables as the data were from different sources),here were studies considering a group of features applied to ariterion-based classifier as opposed to ML approaches [6,71,72].he average amplitude relative to the background, average dura-ion, and CV of durations of half-wave transformation were appliedo detect epilepsy [6]. A half-wave established in [71] was pro-osed to determine the amplitude and the duration of the wave.he results showed that the CV should be small, indicating aeasure of rhythmicity of the EEG signal, the average durationas limited in some specific range, and the average amplitudeas high. Similarly, after the half-wave transformation, duration,

mplitude, and sharpness (defined by the slope of the half-wave)ere extracted and then compared to its background activity [72],hich revealed that these features were in a specific range during

pileptic seizures. However, the specific range was not provided in72]. Energies from three IMFs performed by a thresholding methodrom the epilepsy period were higher compared to the normal [8].

.2.2. Frequency-domain features (FDFs)From Tables 1 and 3, commonly-used features in the frequency

omain were PSD, peak amplitude, peak frequency, and energy.rom the advantages of deep learning, coefficients computed fromFT and a CNN model were used to detect seizures [66]. The clas-

ification accuracy of this approach was 99.40% accuracy. In [73],he combination of PSD based on the Welch method and DT as alassifier achieved a high performance, both in terms of sensitivitynd specificity. The first four local extrema and their frequenciesbtained from the PSD of the signal computed by several PSD esti-ation methods (ARMA, Yule-Walker equation, and Burg) were

sed to identify epileptiform activities [74]. These features com-uted from any estimated PSD performed well with ANN and RBFVM but the performance dropped when using these features with

Gaussian mixture model (GMM). By means of filter bank analysis,he energy obtained from each frequency range of three consec-tive epochs was concatenated into a feature vector for temporalnd spectral information, and then applied to RBF SVM for classifi-ation [57]. As a result, the energies from the seizure group tendedo be higher than that of the normal group.

Furthermore, some features in the frequency domain were usedointly to detect epilepsy. Both criteria-based and ML methods weresed in the AESD. The peak frequency, bandwidth of the peak fre-uency, and power ratio of the current and background epochs inhe same frequency range were applied to the AESD in newborns85]. These features with appropriate thresholds could identify aarge part of seizures but still incorrectly classified some seizurevents, such as slow or spike waves.

.2.3. Time-frequency-domain features (TFDFs)Common transformed signals still containing information of

oth time and frequency domains were analyzed by STFT or DWT.

irstly, we describe the literature that regards the transformed sig-als as raw features. By using a deep learning approach, the STFTpectrogram of a raw EEG signal was considered as a feature with

modified stacked sparse denoising autoencoder (mSSDA) as a

ical Signal Processing and Control 57 (2020) 101702

classifier [62]. This combination was then compared to the otherclassifiers, where the result showed that the mSSDA could success-fully distinguish epilepsy from normality. Similarly, the logarithmsof the Fourier spectrum of dual-tree complex transform (DT-CWT)coefficients at scales 4 and 5 were performed with k-NN for detect-ing epilepsy [75]. This approach performed well with an accuracyof 100% and could be implemented in real time because of its lowcomputational complexity, consuming 14.4 ms for processing.

The other group of previous studies further extracted fea-tures from STFT or DWT sequences. Amplitude and uncertaintymeasurements were commonly used individually, whereas statis-tical parameters were jointly employed in general. Secondly, wedescribe studies that applied a single feature on transformed sig-nals. A logarithm of variance of each DWT sub-band coefficientsfrom a chosen channel were used independently with a thresh-old [63] to detect patient-specific epileptic seizure. The use ofthis feature obtained outstanding average performances of 93.24%accuracy, 83.34% sensitivity, and 93.53% specificity from the bestclassification result from each chosen patient. The author alsoapplied combinations of those features to SVM and thoroughlycompared the results with TDFs [61]. The TDFs were line length,NE, variance, power, and maximum value of raw EEG signals. As aresult, the best average detection performances from each patientusing SVM with wavelet-based features were average accuracy of96.87%, sensitivity of 72.99%, and specificity of 98.13%. On the otherhand, the best average outcomes using TDFs were 95.17% accu-racy, 66.35% sensitivity, and 96.91% specificity, respectively. Theenergy calculated from each transformation was commonly usedin seizure detection. The energy from each DWT sub-band coef-ficients with RBF SVM was used to present the spectrum in eachsub-band [86]. Similarly, fractional energies, energies in specificfrequency bands and time windows of a Smoothed pseudo WVD(SPWVD) were applied to principal component analysis (PCA) andANN, resulting in a high average accuracy of 95.00% [79].

Furthermore, energies from Cohen’s class transformation,including SPWVD, were employed to analyze EEG signals [87]. As aresult, the energy computed from any transformation was success-fully able to distinguish epileptic EEG from normal EEG. The linelength calculated from DWT coefficients with ANN could deter-mine seizures properly [7]. For the uncertainty-related features,the ApEn, calculated from DWT coefficients, was also applied toa criteria-based detection system [68] and the ApEn computedfrom multiwavelet transform (MWT) was used with ANN [80] inthe AESD. Three famous multiwavelets, namely Gernoimo-Hardin-Massopust (GHM), Chui-Lian (CL), and SA4, were exploited. Bothapproaches achieved an accuracy of 95-98%. In addition, as a resultin [68], the ApEn had small values in a seizure group and usingDWT could improve and provide some useful information for theAESD. When WPE was extracted from the DWT coefficients in [67],the work tested the methods of ANN, linear SVM, and RBF SVM,and the results indicated that RBF SVM could surpass the two otherclassifiers, but with lower accuracies than those of [80] evaluatedon the same data set. Moreover, WMRPE computed from Fourier-Bessel series (FBS) coefficients corresponding to five normal EEGrhythms was used to analyze epileptic seizures using three classi-fiers, namely, RF, SVM, and DT [81]. The authors reported that usingthe WMRPE with the DT achieved the highest accuracy of 98.60%.

Thirdly, we discuss TFDFs that were applied jointly. In thiscase, statistical parameters and amplitude-related measures werecommonly used. The absolute mean, SD, average power of DWTcoefficients of each level, and ratio of absolute mean values of theadjacent sub-bands were combined and applied to the dimension-

ality reduction techniques of PCA, independent component analysis(ICA), and linear discriminant analysis (LDA) to extract useful fea-tures and reject meaningless features [77]. The dimension-reducedfeature vector was classified by RBF SVM and the results showed
Page 11: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

iomed

taocas(cpetsepowa2sntcwlvita

tiiHmffetBmwAsc

3

adtwaSefTsnmrwfa

uS

P. Boonyakitanont, A. Lek-uthai, K. Chomtho et al. / B

hat this feature-based approach could firmly detect seizure withn accuracy of 98-100%. The mean, SD, skewness, kurtosis, and ratiof variance of the adjacent sub-bands computed from the DWToefficients of every decomposition level, and mean frequencynd peak frequency calculated from the PSD of each decompo-ition level were cooperated with an extreme learning machineELM) to classify an epileptic EEG signal [24]. Due to the effi-iency of computation in the ELM [88], this combination coulderform the detection accurately and quickly in a short time. Nev-rtheless, a total of 20 recordings from three patients was usedo evaluate the classification performance. The absolute mean, SD,kewness, kurtosis, ratio of absolute mean, and average power ofach decomposition level were applied with DWT and waveletacket decomposition (WPD) in the AESD [23]. A noise reductionf the multi-channel EEG signals was done by the MSPCA andhen using this with the RF, SVM, ANN, and k-NN classifiers, it

chieved an accuracy of 100%. However, the authors only used,000 8-second EEG segments, 1,000 segments for each class, asamples in the experiments. Moreover, the mean, variance, skew-ess, kurtosis, and ShEn of each sub-band in the empirical waveletransform (EWT) were extracted and the use of these featuresould potentially detect focal epilepsy automatically by RBF SVMith a reported accuracy of 100% [25]. The absolute mean, abso-

ute median, absolute maximum, absolute minimum, and absolutealues of coefficients from Rational discrete STFT (RDSTFT) werenvestigated with diverse classifiers [76]. The results indicated thathe ANN was the most optimal classifier, and the absolute mediannd some absolute coefficients were the most dominant features.

In addition to statistical parameters, combinations of uncer-ainty and similarity measurements were also established tondicate a periodic pattern of seizure. The use of TFDFs includ-ng energy and DB from improved eigenvalue decomposition ofankel matrix and Hilbert transform potentially distinguished seg-ents of seizure and seizure-free [89]. The PE, HE, and DH extracted

rom some DT-CWT sub-band coefficients were used as the featuresor RBF SVM, k-NN, DT and RF analyses [82]. These features fromach decomposition level could separate the seizure activities fromhe normal group. The HE estimated from discrete-time fractionalrownian motion process and discrete-time fractional Gaussianotion process and ARMA parameters from multi-rate filter bankere computed and applied to the RBF SVM to detect seizure [49].s a result, the HE of each sub-band was different according to thetationarity of each sub-band, and the HE and the ARMA parametersomputed from some sub-band coefficients were significant.

.2.4. Multi-domain featuresIn terms of the complementary information that cannot be

chieved in one domain, some studies have used features fromifferent domains combined into a feature vector. Table 4 showshat features from the time domain and time-frequency domainere commonly used together since the time-frequency domain

lso provides spectral information. A combination of the mean,D, energy, and maximum value from EEG signal and from thenvelope spectrum of some DWT sub-bands were also formed aseatures and applied to a neural network ensemble (NNE) [83].he authors reported that features extracted from the envelopepectrum of the decomposition levels of the seizure group wereormally higher than in the normal group [83]. The mean, SD, maxi-um and minimum values of EEG signal combined with the energy

atio of each detail and approximation coefficients from the DWTere employed with an Ant Colony classifier [90]. As a result, these

eatures were typically higher in epileptic seizure periods, which

greed with previous studies.

The mean, SD, skewness, kurtosis, maximum and minimum val-es, median, mode, Q1, Q3, IQR, mobility, complexity, HE, DFA,hEn, and SampEn were computed from an EEG signal, and the

ical Signal Processing and Control 57 (2020) 101702 11

mean, SD, maximum and minimum values were obtained fromeach sub-band of DWT [84]. Subsequently, a novel feature selection,the improved correlation-based feature selection, was proposed toselect significant features based on their correlation and then theselected features were classified by the RF. The results showed thatthe feature dimension was extremely reduced and the remainingfeatures could improve the detection system compared to featuresselected by the CFS. The variance, skewness, kurtosis, RMS, andSampEn calculated from each sub-band and complete bandwidthbased on the second order Butterworth filter, and peak frequencyand median frequency computed from the PSD of the bands werefed to several feature selection techniques and classifiers to findthe best combination [64]; see performance in Table 2. From thebest results reported in the article, the top five uncorrelated fea-tures via LDA a with backward search, a feature selection method,in each region were mostly the RMSs of some frequency range.Moreover, compared to several classifiers, k-NN performed the bestwith this approach. However, the authors selected only recordscontaining seizure activities to perform the experiments. In [56],the DB from raw EEG signals and energies from each Harmonicwavelet packet transform (HWPT) sub-bands used with a relevancevector machine (RVM) could be successfully applied in real-timesettings.

On the other hand, some literature also included features fromthree domains to obtain information from all different aspects. Adeep learning approach, namely, CNN, was exploited to extractfeatures and detect seizures [66]. Combining several transforma-tions, this method could successfully differentiate ictal and normalepochs with an accuracy of 99.50%. Twenty-one features fromdifferent domains, which were RMS, NE, line length, number ofzero-crossings, number of local extrema, three Hjorth parameters,ShEn, ApEn, SVDEn, and error from validating an AR model on othersegment obtained from an EEG signal, IWMF, IWBW, SEF90, thetotal power, peak frequency and bandwidth of the peak frequencycalculated from PSD, and energy extracted from a specific DWTsub-band that corresponded to 1.25–2.5 Hz, were tested with LDAto determine their abilities on neonatal seizure detection in the EEGsignal [91]. It showed that all the features combined achieved thebest outcome, although the RMS, number of local extrema, and linelength were three most significant features.

Based on our review, the first conclusive point is that a raw EEGsignal or coefficients of any transformation were typically com-bined with a deep learning technique. Most studies were in favorof a ML approach rather than a thresholding technique, since thethreshold of each feature can vary with subjects and baselines ofEEG signals in a normal period. Secondly, no statistical parame-ter was used individually in an AESD. In fact, statistical parameterswere always applied with other features that were related to ampli-tude or uncertainty measurement and with a ML technique. Thisimplies that statistical parameters alone do not attain sufficientinformation to distinguish the seizures from the normality. On theother hand, features, such as quantities of amplitude and similarity,were commonly used separately from other features. In particular,the energy was the most widely used feature that contributed tothe meaning of amplitude. Moreover, attributes of amplitude anduncertainty were also typically considered when being employedwith statistical parameters. This indicates that the amplitude-related features and similarity-uncertainty measurements have thecapability to distinguish the seizures from the normality in an EEGsignal.

4. Methods for feature evaluation

Our paper aims to examine the significance of a single featureto the classification performance through the Bayesian method.

Page 12: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

1 iomed

Hsdc

4

osmACid

e

wttret(

e

wm

ser

P

wtff

p

wKaesc

p

e

w

nt

rf

0)

2 P. Boonyakitanont, A. Lek-uthai, K. Chomtho et al. / B

owever, multiple features are commonly applied in practice,o a redundancy analysis is also needed. Therefore, this sectionescribes the brief concepts about the Bayes classifier and theorrelation-based feature selection.

.1. Bayes error rate

Common feature selection and dimensionality reduction meth-ds in AESD [10,79,92] do not provide information about how aingle feature independently improves the classification perfor-ances. Hence, to determine the significance of each feature on the

ESD, we employed the Bayes classifier. If we define x as a feature,(x) as a class in which feature x is classified, and Ci denotes a class

labeled from the data, then a misclassification error is generallyefined as

rr =∫ ∑

Ci /= C(x)

P (Ci|x)p(x)dx, (26)

here P (Ci|x) is the posterior probability of x in class i, and p(x) ishe probability density function of x. Intuitively, we can interprethat the error is the total joint probability that the feature is incor-ectly classified. The Bayes optimal classifier gives the minimumrror (the Bayes error rate) by choosing the class of which the pos-erior probability is the highest [93]. As a result, the Bayes error rateerrb) is obtained from

rrb =∫ ∑

Ci /= Cmax

P (Ci|x)p(x)dx, (27)

here Cmax is the class of which the posterior probability is maxi-um.

Practically, the distribution of a likelihood function is unknown,o a non-parametric distribution estimation was exploited in ourxperiment and the posterior probability was obtained by Bayes’ule:

(Ci|x) = p(x|Ci)P(Ci)p(x)

, (28)

here p(x|Ci) is the likelihood function, P(Ci) is the prior distribu-ion, and p(x) is the evidence. For all real x values, the likelihoodunction can be estimated using the non-parametric kernel smoothunction

(x|Ci) = 1Nih

Ni∑j=1

K(x − xj

h

),

here Ni is the sample size of class Ci, xj is a sample in the class,(x) is the Gaussian kernel and h is a bandwidth [94,95]. Regarding

classification problem, the prior P (Ci) is assumed to be binomialstimated by the size of class Ci divided by the total number ofamples. Finally, p(x) is obtained by the total probability and itompletes the calculation of P (Ci|x) in (28).

Our problem was a two-class classification (normal/seizure)roblem, so the Bayesian error was reduced to

rrb =∫

mini=1,2P (Ci|x)p(x)dx, (29)

here C1 and C2 stand for the seizure and normal classes. For the

on-parametric kernel function, h ≈ 1.06 �N− 1

5i

was chosen to be

he optimal bandwidth, where � is the sample SD of the sample.

We have observed that using conventional performance met-ics, such as accuracy, cannot give the significance of each individualeature. Hence, to evaluate the performance of individual features,

ical Signal Processing and Control 57 (2020) 101702

we proposed to use an improvement rate (rate) from a standardcondition (err0) as follows:

err0 =∫ ∞

−∞P (C2|x)p(x)dx = P (C2) , rate = err0 − errb

err0× 100%.(3

4.2. Correlation-based feature selection (CFS)

The CFS introduced in [96] is a feature selection method basedon the hypothesis that a good subset of features is highly correlatedwith the class, but uncorrelated with others. For a feature subset Fcontaining k features, an index called heuristic merit is exploitedto measure feature-feature and feature-class correlations and isdefined by

MeritF = krfc√k + k(k − 1)rff

, (31)

where rfc is the mean value of the correlation of feature and class,and rff is the average of the feature-feature correlation. To find thesubset F with the highest merit score, we applied the CFS algo-rithm provided in [97], where the correlations are estimated byconditional entropy. The algorithm initially assigns the subset F tobe empty. New feature subsets are constructed by adding anotherfeature that has not been previously selected to F. All new subsetsare then evaluated and the subset having the best merit score isused as the subset F in the next iteration. This process is repeateduntil F has m features. The final subset F contains features rankedin the descending order by the merit score.

5. Experimental results

The experiment was performed on the public CHB-MIT ScalpEEG database [59] that contains 24 EEG recordings from 23patients: five males aged 3-22 years, 17 females aged 1.5-19 years,and one anonymous subject. All EEG data were collected at theChildren’s Hospital Boston and the international 10-20 system wasused to locate electrode placements. All signals were recorded witha sampling frequency of 256 Hz digitized at 16-bit resolution andstored in an EDF file [98]. This full database is available downloadedat PhysioNet (https://physionet.org/physiobank/database/chbmit/).

We randomly chose two records from each case, subject to theinclusion condition that every record must contain at least oneseizure activity, so the records for assessing the improvement rateof the Bayes error rate of each feature are shown in Table 5. Thechannels were sequentially listed as follows: FP1-F7, F7-T7, T7-P7,P7-O1, FP1-F3, F3-C3, C3-P3, P3-O1, FP2-F4, F4-C4, C4-P4, P4-O2,FP2-F8, F8-T8, T8-P8, and P8-O2.

An EEG epoch was defined by segmenting a raw EEG signal inevery channel with a 4-second width. The next consecutive epochwas segmented from the raw EEG signal by moving the windowfor 1 s. These choices were selected from inspecting the processingstep using the commercial Persyst [99] software. After the processof segmenting the raw EEG signals, a feature was then computedfrom each epoch and each channel independently. Only commonlyused features in the literature were selected in this experiment(Table 6). For the ApEn and SampEn, the template length, m, andthe tolerance, r, were set to m = 2 and r = 0.2SD, where SD was thesample SD of the segment. All TDFs were calculated from a raw EEGsignal, whereas features from the frequency domain were extractedfrom PSD and TFDFs were computed from DWT coefficients withthe Daubechies wavelet tap 4 for five levels.

The EEG channel selection is still an open research questionand using multi-channel EEG signals may be redundant. Moreover,some commercial software also analyzes the seizure activity overthe left and right sides of the brain. For these reasons, we have not

Page 13: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

P. Boonyakitanont, A. Lek-uthai, K. Chomtho et al. / Biomedical Signal Processing and Control 57 (2020) 101702 13

Table 5List of all records used to evaluate features.

Records

chb01 04 chb01 16 chb02 16+ chb02 19 chb03 03 chb03 35 chb04 08 chb04 28chb05 06 chb05 13 chb06 01 chb06 04 chb07 13 chb07 19 chb08 02 chb08 05chb09 06 chb09 08 chb10 38 chb10 89 chb11 92 chb11 99 chb12 33 chb12 38chb13 19 chb13 55 chb14 04 chb14 18chb17a 04 chb17b 63 chb18 32 chb18 35

chb21 21 chb21 22 chb22 21 chb22 22

Table 6List of features for the Bayesian error rate evaluation and redundancy analysis.

Domains Features

Time Mean, variance, CV, skewness, kurtosis, max, min, energy,NE, line length, ShEn, ApEn, SampEn, number ofzero-crossing, number of local extrema, mobility,complexity

Frequency IWMF, IWBW, SE, peak frequency, peak amplitudeTime-frequency Mean, absolute mean, variance, skewness, kurtosis, max,

min, energy, line length

Table 7Bayes error (errb) and improvement rate of time-domain and frequency-domainfeatures.

Feature Left side Right side

errb rate errb rate

(a) Time-domain features

Mean 0.0174 0.00 0.0174 0.05Variance 0.0160 8.20 0.0166 4.78CV 0.0174 0.00 0.0174 0.00Skewness 0.0174 0.00 0.0174 0.00Kurtosis 0.0174 0.08 0.0174 0.17Max 0.0174 0.00 0.0174 0.00Min 0.0174 0.00 0.0174 0.00Energy 0.0160 8.18 0.0166 4.77NE 0.0157 10.07 0.0160 8.36Line length 0.0166 1.92 0.0167 1.16ShEn 0.0174 0.33 0.0165 5.64ApEn 0.0174 0.00 0.0174 0.00SampEn 0.0174 0.00 0.0174 0.00Local extrema 0.0174 0.00 0.0174 0.00Zero-crossing 0.0174 0.09 0.0174 0.09Mobility 0.0174 0.00 0.0174 0.00Complexity 0.0174 0.00 0.0174 0.00(b) Frequency-domain features

IWMF 0.0174 0.00 0.0174 0.00IWBW 0.0174 0.00 0.0174 0.00SE 0.0174 0.00 0.0174 0.00

eaxat

5

naiTTteh

Peak amplitude 0.0174 0.00 0.0174 0.00Peak frequency 0.0174 0.00 0.0174 0.00

xplored the channel selection topic but rather used a spatial aver-ging of features over the left and right sides of the brain, denoted asleft and xright, respectively. We then estimated the posterior prob-bility distributions and computed the Bayes error rates of usinghe left and right feature representatives.

.1. Feature significance

From the chosen records, there were 263,424 samples in theormal group and 4,677 epochs belonging to the seizure group,nd so err0 = 4,677/(263,424 + 4,677) = 0.0174. The features with anmprovement rate higher than 4.5% were considered as significant.able 7 shows the Bayes error rate and improvement rate of each

DF and FDF. The results from this data set showed that most fea-ures achieved almost the same Bayes errors that were close torr0, except for the variance, energy, NE, and ShEn that obtainedigh improvement rates of 4.77% to 10.07%, where NE reached the

chb15 06 chb15 15 chb16 17 chb16 18chb19 29 chb19 30 chb20 14 chb20 16chb23 06 chb23 09 chb24 04 chb24 11

highest improvement rate. Figure 1 displays the improvement ratesof all TFDFs in each wavelet decomposition level. Note that D1,D2, D3, D4, D5, and A5 represent sub-bands from which the fea-tures are extracted. Overall, the results showed that the varianceand energy of the wavelet coefficients in all decomposition lev-els yielded relatively high improvement rates, compared to otherfeatures computed on the coefficients of other levels. Specifically,energy from level D1 of the left half brain accomplished the high-est improvement rate of 13.51%. Line length from levels D5 andA5, and kurtosis from D1 of the left hemisphere also achieved sig-nificant improvement rates. We found that the most significantfeatures were related to amplitudes and variations of the signals,such as variance, energy, and NE. Additionally, features that cancapture changes in amplitude, frequency, and rhythmicity gainsome improvement, since there is continuous evolution of ampli-tude, frequency, and rhythms during seizure activities compared tothe background [100]. On the other hand, FDFs do not help improvethe performance from the baseline because, in this data set, thereare artifacts causing the probability of the seizure occurrence to beless than that of the other class.

5.2. Feature redundancy analysis

The significant features achieving the improvement rate higherthan 4.5% (in total, 34 features) were then applied to the CFS algo-rithm to examine their redundancy. Figure 2 shows that the meritscore initially increases as m (the number of features in the sub-set) increases until the feature subset contains five features, witha maximum score of 1.25 × 10−2. As m increases beyond five, themerit score decreases.

In addition, the features and their improvement rates in the finalfeature subset were ranked in the descending order by the CFS asshown in Figure 3, where L and R stand for features computed fromthe left side and right side, respectively. The features in the opti-mal subset were EnergyD1R, VarianceD5L, VarianceD1L, EnergyD5L,and VarianceD1R. Accordingly, these five features also achieved rel-atively high improvement rates among the significant features. Asshown in Figure 2, the merit scores of the optimal feature subsetof each size were not that different. This indicated that the resultsof feature significance from our experiments also agree with theoutputs of the CFS. The results in [10] showed that the classifica-tion accuracy was not increased when other features were addedto the optimal feature subset. Moreover, these five features requirejust only O(N) for the total computation. Hence, we suggest thatthese five features should be at least used as features in the AESDbecause of their high improvement rates and low computationalcomplexity.

6. Conclusions

This paper aimed to review details of features in epileptic

seizure detection using EEG signals. We provided mathemati-cal descriptions and computations of features in detail to reduceinconsistencies of some complicated feature definitions that haveappeared in the literature. We also summarized feature usages
Page 14: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

14 P. Boonyakitanont, A. Lek-uthai, K. Chomtho et al. / Biomedical Signal Processing and Control 57 (2020) 101702

Fig. 1. Improvement rate based on the Bayesian method of time-frequency

Fig. 2. Merit scores of feature subsets. The subset size achieving the highest meritscore is 5.

Fo

aautfn

[2] D. Thurman, E. Beghi, C. Begley, A. Berg, J. Buchhalter, D. Ding, D. Hesdorffer,

ig. 3. The features ranked by CFS and their improvement rates. All features in theptimal subset also obtained high improvement rates.

nd intuitive meanings for detecting seizures according to theirlgorithms. Based on our review, deep learning techniques weresed in combination with raw EEG signals or coefficients of a

ransformation, while shallow neural networks always requiredeatures as inputs. A single statistical parameter (as a feature) wasever used but combinations of them were applied. Such common

domain features calculated from DWT using Daubechies 4 wavelet.

combinations, including energy and entropy, were applied with aclassification from ML approaches, while other types of features,such as entropy and fractal dimension, had sometimes been usedindependently. It was evident that the feature most widely used toquantify a relation with amplitude was energy.

Furthermore, we conducted two experiments to verify the sig-nificance of each widely used feature from the time, frequency, andtime-frequency domains, and to analyze the redundancy of thosefeatures. Based on our experiment using the standard Bayes classi-fier, we concluded that variance, energy, NE, and ShEn extractedfrom raw EEG signals were useful for distinguishing betweenseizure and normal EEG signals, and could improve the detectionperformance from the baseline condition. However, other TDFs andFDFs gave insignificant outcomes for the detection of seizures. Fur-thermore, features extracted from the DWT coefficients from somespecific levels also provided intrinsically useful information aboutseizure that could not be achieved from raw EEG signals. Moreover,based on redundancy analysis, the top significant features selectedfrom CFS were the energy and variance from the DWT coefficientsof some decomposition levels. Therefore, the energy and variancecalculated from the DWT coefficients were typically recommendedto be used in both online and offline AESD due to their significanceand low computational complexity.

Acknowledgement

This work is financially supported by the 100th AnniversaryChulalongkorn University Fund for Doctoral Scholarship and the90th Anniversary of Chulalongkorn University Fund (Ratchada-phiseksomphot Endowment Fund).

Declaration of Competing Interest

The authors declare that they have no known competing finan-cial interests or personal relationships that could have appeared toinfluence the work reported in this paper.

References

[1] R. Fisher, C. Acevedo, A. Arzimanoglou, A. Bogacz, J. Cross, C. Elger, J.J. Engel,L. Forsgren, J. French, M. Glynn, et al., ILAE official report: a practical clinicaldefinition of epilepsy, Epilepsia 55 (4) (2014) 475–482.

W. Hauser, L. Kazis, R. Kobau, et al., Standards for epidemiologic studies andsurveillance of epilepsy, Epilepsia 52 (7) (2011) 2–26.

[3] U. Acharya, S. Sree, G. Swapna, R. Martis, J. Suri, Automated EEG analysis ofepilepsy: a review, Knowl-Based Syst. 45 (2013) 147–165.

Page 15: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

iomed

P. Boonyakitanont, A. Lek-uthai, K. Chomtho et al. / B

[4] J. Li, J. Yan, X. Liu, G. Ouyang, Using permutation entropy to measure thechanges in EEG signals during absence seizures, Entropy 16 (6) (2014)3049–3061.

[5] T. Kumar, V. Kanhangad, R. Pachori, Classification of seizure and seizure-freeEEG signals using local binary patterns, Biomed. Signal Process. Control 15(2015) 33–40.

[6] J. Gotman, Automatic recognition of epileptic seizures in the EEG,Electroencephalogr. Clin. Neurophysiol. 54 (5) (1982) 530–540.

[7] L. Guo, D. Rivero, J. Dorado, J. Rabunal, A. Pazos, Automatic epileptic seizuredetection in EEGs based on line length feature and artificial neuralnetworks, J. Neurosci. Methods 191 (1) (2010) 101–109.

[8] L. Orosco, E. Laciar, A. Correa, A. Torres, J. Graffigna, An epileptic seizuresdetection algorithm based on the empirical mode decomposition of EEG,Proceedings of the Annual International Conference of the IEEE Engineeringin Medicine and Biology Society, IEEE (2009) 2651–2654.

[9] A. Temko, E. Thomas, W. Marnane, G. Lightbody, G. Boylan, EEG-basedneonatal seizure detection with support vector machines, Clin.Neurophysiol. 122 (3) (2011) 464–473.

[10] A. Aarabi, F. Wallois, R. Grebe, Automated neonatal seizure detection: amultistage classification system through feature selection based onrelevance and redundancy analysis, Clin. Neurophysiol. 117 (2) (2006)328–340.

[11] L. Yu, H. Liu, Efficient feature selection via analysis of relevance andredundancy, J. Mach. Learn. Res. 5 (Oct) (2004) 1205–1224.

[12] H. Hall, Correlation-based Feature Selection for Machine Learning, Ph.D.Thesis, University of Waikato Hamilton, New Zealand, 1999.

[13] I. Kononenko, E. Simec, M. Robnik-Sikonja, Overcoming the myopia ofinductive learning algorithms with RELIEFF, Appl. Intell. 7 (1) (1997) 39–55.

[14] U. Acharya, H. Fujita, V. Sudarshan, S. Bhat, J. Koh, Application of entropiesfor automated diagnosis of epilepsy using EEG signals: A review,Knowl-Based Syst. 88 (2015) 85–96.

[15] M. Ahmad, M. Saeed, S. Saleem, A. Kamboh, Seizure detection using EEG: asurvey of different techniques, Proceedings of the 2016 InternationalConference on Emerging Technologies, IEEE (2016) 1–6.

[16] U.R. Acharya, Y. Hagiwara, S.N. Deshpande, S. Suren, J.E.W. Koh, S.L. Oh, N.Arunkumar, E.J. Ciaccio, C.M. Lim, Characterization of focal EEG signals: areview, Future Gener. Comput. Syst. 91 (2018) 290–299.

[17] R.G. Andrzejak, K. Schindler, C. Rummel, Nonrandomness nonlineardependence, and nonstationarity of electroencephalographic recordingsfrom epilepsy patients, Phys. Rev. E 86 (4) (2012), 046206.

[18] T. Alotaiby, F. El-Samie, S. Alshebeili, I. Ahmad, A review of channel selectionalgorithms for EEG signal processing, EURASIP J. Adv. Signal Process. 2015(1) (2015) 66–86.

[19] O. Faust, Y. Hagiwara, T. Hong, O. Lih, U. Acharya, Deep learning forhealthcare applications based on physiological signals: a review, Comput.Methods Programs Biomed. 161 (2018) 1–13.

[20] B. Boashash, S. Ouelha, Designing high-resolution time-frequency andtime-scale distributions for the analysis and classification of non-stationarysignals: a tutorial review with a comparison of features performance, Digit.Signal Process. 77 (2018) 120–152.

[21] O. Faust, U.R. Acharya, H. Adeli, A. Adeli, Wavelet-based EEG processing forcomputer-aided seizure detection and epilepsy diagnosis, Seizure 26 (2015)56–64.

[22] E. Assi, D. Nguyen, S. Rihana, M. Sawan, Towards accurate prediction ofepileptic seizures: a review, Biomed. Signal Process. Control 34 (2017)144–157.

[23] E. Alickovic, J. Kevric, A. Subasi, Performance evaluation of empirical modedecomposition, discrete wavelet transform, and wavelet packeddecomposition for automated epileptic seizure detection and prediction,Biomed. Signal Process. Control 39 (2018) 94–102.

[24] S. Ammar, M. Senouci, Seizure detection with single-channel EEG usingextreme learning machine, Proceedings of the 17th InternationalConference on Sciences and Techniques of Automatic Control and ComputerEngineering, IEEE (2016) 776–779.

[25] S. Anand, S. Jaiswal, P. Ghosh, Automatic focal epileptic seizure detection inEEG signals, Proceedings of the 2017 IEEE International WIE Conference onElectrical and Computer Engineering, IEEE (2017) 103–107.

[26] S. Chakraborti, A. Choudhary, A. Singh, R. Kumar, A. Swetapadma, A machinelearning based method to detect epilepsy, Int. J. Inf. Technol. (2018) 1–7.

[27] N. Huang, Hilbert-Huang Transform and its Applications, vol. 16, World Sci,2014.

[28] R. Esteller, J. Echauz, T. Tcheng, B. Litt, B. Pless, Line length: an efficientfeature for seizure onset detection, Proceedings of the 23rd AnnualInternational Conference of the IEEE Engineering in Medicine and BiologySociety, Vol. 2, IEEE (2001) 1707–1710.

[29] L. Hellström-Westas, L. de Vries, I. Rosén, An Atlas of Amplitude-integratedEEGs in the Newborn, CRC Press, 2008.

[30] D. Maynard, P. Prior, D. Scott, Device for continuous monitoring of cerebralactivity in resuscitated patients, BMJ 4 (5682) (1969) 545–546.

[31] J. Kaiser, On a simple algorithm to calculate the energy of a signal,Proceedings of the International Conference on Acoustics, Speech and Signal

Processing, IEEE (1990) 381–384.

[32] C. Shannon, A mathematical theory of communication, Bell Syst. Tech. J. 27(3) (1948) 379–423.

[33] S. Pincus, Approximate entropy as a measure of system complexity, Proc.Natl. Acad. Sci. 88 (6) (1991) 2297–2301.

ical Signal Processing and Control 57 (2020) 101702 15

[34] J. Richman, J. Moorman, Physiological time-series analysis usingapproximate entropy and sample entropy, Am. J. Physiol. Heart Circ. Physiol.278 (6) (2000) H2039–H2049.

[35] C. Bandt, B. Pompe, Permutation entropy: a natural complexity measure fortime series, Phys. Rev. Lett. 88 (17) (2002), 174102.

[36] B. Fadlallah, B. Chen, A. Keil, J. Príncipe, Weighted-permutation entropy: acomplexity measure for time series incorporating amplitude information,Phys. Rev. E 87 (2) (2013), 022911.

[37] W. Chen, Z. Wang, H. Xie, W. Yu, Characterization of surface EMG signalbased on fuzzy entropy, IEEE Trans. Neural. Syst. Rehabil. Eng. 15 (2) (2007)266–272.

[38] P. Li, C. Liu, K. Li, D. Zheng, C. Liu, Y. Hou, Assessing the complexity ofshort-term heartbeat interval series by distribution entropy, Med. Biol. Eng.Comput. 53 (1) (2015) 77–87.

[39] S. Roberts, W. Penny, I. Rezek, Temporal and spatial complexity measuresfor electroencephalogram based brain-computer interfacing, Med. Biol. Eng.Comput. 37 (1) (1999) 93–98.

[40] H. Hurst, Long-term storage capacity of reservoirs, Trans. Am. Soc. Civ. Eng.116 (1951) 770–799.

[41] K. Falconer, Fractal Geometry: Mathematical Foundations and Applications,John Wiley & Sons, 2004.

[42] T. Higuchi, Approach to an irregular time series on the basis of the fractaltheory, Phys. D Nonlinear Phenom. 31 (2) (1988) 277–283.

[43] A. Jaiswal, H. Banka, Local pattern transformation based feature extractiontechniques for classification of epileptic EEG signals, Biomed. Signal Process.Control 34 (2017) 81–92.

[44] B. Hjorth, EEG analysis based on time domain properties,Electroencephalography Clin. Neurophysiol. 29 (3) (1970) 306–310.

[45] V. Bajaj, R.B. Pachori, Classification of seizure and nonseizure EEG signalsusing empirical mode decomposition, IEEE Trans. Inf. Technol. Biomed. 16(6) (2011) 1135–1142.

[46] N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N. Yen, C.C.Tung, H.H. Liu, The empirical mode decomposition and the hilbert spectrumfor nonlinear and non-stationary time series analysis, Proceedings of theRoyal Society of London. Series A: Mathematical, Physical and EngineeringSciences 454 (1971) (1998) 903–995.

[47] R. Bryce, K. Sprague, Revisiting detrended fluctuation analysis, Sci. Rep. 2(2012) 315–320.

[48] S. Chen, P. Shang, Y. Wu, Weighted multiscale Rényi permutation entropy ofnonlinear time series, Phys. A: Stat. Mech. Appl. 496 (2018) 548–570.

[49] A. Gupta, P. Singh, M. Karlekar, A novel signal modeling approach forclassification of seizure and seizure-free EEG signals, IEEE Trans. Neural Syst.Rehabil. Eng. 26 (5) (2018) 925–935.

[50] T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms,MIT Press, 2009.

[51] G. Manis, Fast computation of approximate entropy, Comput. MethodsPrograms Biomed. 91 (1) (2008) 48–54.

[52] G. Fele-Zorz, A faster algorithm for calculating the sample entropy,Proceedings of the Middle-European Conference on Applied TheoreticalComputer Science, Slovenia (2013).

[53] G. Manis, M. Aktaruzzaman, R. Sassi, Low computational cost for sampleentropy, Entropy 20 (1) (2018) 61–75.

[54] D. Powers, Evaluation: from precision, recall and F-measure to ROC,informedness, markedness and correlation, J. Mach. Learn. Techonol. 2(2011) 37–63.

[55] A. Temko, E. Thomas, W. Marnane, G. Lightbody, G. Boylan, Performanceassessment for EEG-based neonatal seizure detectors, Clin. Neurophysiol.122 (3) (2011) 474–482.

[56] L. Vidyaratne, K. Iftekharuddin, Real-time epileptic seizure detection usingEEG, IEEE Trans. Neural Syst. Rehabil. Eng. 25 (11) (2017) 2146–2156.

[57] A. Shoeb, J. Guttag, Application of machine learning to epileptic seizuredetection, Proceedings of the 27th International Conference on MachineLearning (2010) 975–982.

[58] C. Satirasethawong, A. Lek-Uthai, K. Chomtho, Amplitude-integrated EEGprocessing and its performance for automatic seizure detection, Proceedingsof the 2015 IEEE International Conference on Signal and Image ProcessingApplications, IEEE (2015) 551–556.

[59] A. Goldberger, L. Amaral, L. Glass, J. Hausdorff, P. Ivanov, R. Mark, J. Mietus,G. Moody, C. Peng, H. Stanley, PhysioBank, PhysioToolkit, and PhysioNet,Circulation 101 (23) (2000) e215–e220.

[60] R. Andrzejak, K. Lehnertz, F. Mormann, C. Rieke, P. David, C. Elger,Indications of nonlinear deterministic and finite-dimensional structures intime series of brain electrical activity: Dependence on recording region andbrain state, Phys. Rev. E 64 (6) (2001), 061907.

[61] S. Janjarasjitt, Epileptic seizure classifications of single-channel scalp EEGdata using wavelet-based features and SVM, Med. Biol. Eng. Comput. 55 (10)(2017) 1743–1761.

[62] Y. Yuan, G. Xun, K. Jia, A. Zhang, A multi-view deep learning method forepileptic seizure detection using short-time fourier transform, Proceedingsof the 8th ACM International Conference on Bioinformatics, ComputationalBiology and Health Informatics, ACM (2017) 213–222.

[63] S. Janjarasjitt, Performance of epileptic single-channel scalp EEGclassifications using single wavelet-based features, Australas. Phys. Eng. Sci.Med. 40 (1) (2017) 57–67.

Page 16: Biomedical Signal Processing and Control · 2020. 8. 27. · detection EEG Feature performance extraction classification a b s t r a c t Since the manual detection of electrographic

1 iomed

[99] Seizure Detection, https://www.persyst.com/technology/seizure-detection/.(Accessed 25 April 2019).

6 P. Boonyakitanont, A. Lek-uthai, K. Chomtho et al. / B

[64] P. Fergus, A. Hussain, D. Hignett, D. Al-Jumeily, K. Abdel-Aziz, H. Hamdan, Amachine learning system for automated whole-brain seizure detection,Appl. Comput. Inf. 12 (1) (2016) 70–89.

[65] U. Acharya, S. Oh, Y. Hagiwara, J. Tan, H. Adeli, Deep convolutional neuralnetwork for the automated detection and diagnosis of seizure using EEGsignals, Comput. Biol. Med. 100 (1) (2018) 270–278.

[66] R. San-Segundo, M. Gil-Martín, L.F. D’Haro-Enríquez, J.M. Pardo,Classification of epileptic EEG recordings using signal transforms andconvolutional neural networks, Comput. Biol. Med. 109 (2019) 148–158.

[67] N. Tawfik, S. Youssef, M. Kholief, A hybrid automated detection of epilepticseizures in EEG records, Comput. Electr. Eng. 53 (2016) 177–190.

[68] H. Ocak, Automatic detection of epileptic seizures in EEG using discretewavelet transform and approximate entropy, Expert Syst. Appl. 36 (2)(2009) 2027–2036.

[69] S. Altunay, Z. Telatar, O. Erogul, Epileptic EEG detection using the linearprediction error energy, Expert Syst. Appl. 37 (8) (2010) 5661–5665.

[70] P. Li, C. Karmakar, J. Yearwood, S. Venkatesh, M. Palaniswami, C. Liu,Detection of epileptic seizure based on entropy analysis of short-term EEG,PLoS One 13 (3) (2018), e0193691.

[71] J. Gotman, P. Gloor, Automatic recognition and quantification of interictalepileptic activity in the human scalp EEG, Electroencephalography Clin.Neurophysiol. 41 (5) (1976) 513–529.

[72] A. Dingle, R. Jones, G. Carroll, W. Fright, A multistage system to detectepileptiform activity in the EEG, IEEE Trans. Biomed. Eng. 40 (12) (1993)1260–1268.

[73] K. Polat, S. Günes , Classification of epileptiform EEG using a hybrid systembased on decision tree classifier and fast Fourier transform, Appl. Math.Comput. 187 (2) (2007) 1017–1026.

[74] O. Faust, U. Acharya, L. Min, B. Sputh, Automatic identification of epilepticand background EEG signals using frequency domain parameters, Int. J.Neural Syst. 20 (02) (2010) 159–176.

[75] G. Chen, Automatic EEG seizure detection using dual-tree complexwavelet-Fourier features, Expert Syst. Appl. 41 (5) (2014) 2391–2394.

[76] K. Samiee, P. Kovacs, M. Gabbouj, Epileptic seizure classification of EEGtime-series using rational discrete short-time Fourier transform, IEEE Trans.Biomed. Eng. 62 (2) (2015) 541–552.

[77] A. Subasi, M. Gursoy, EEG signal classification using PCA, ICA, LDA andsupport vector machines, Expert Syst. Appl. 37 (12) (2010) 8659–8666.

[78] D. Wang, D. Miao, C. Xie, Best basis-based wavelet packet entropy featureextraction and hierarchical EEG classification for epileptic detection, ExpertSyst. Appl. 38 (11) (2011) 14314–14320.

[79] A. Tzallas, M. Tsipouras, D. Fotiadis, Automatic seizure detection based ontime-frequency analysis and artificial neural networks, Comput. Intell.Neurosci. 2007 (2007).

[80] L. Guo, D. Rivero, A. Pazos, Epileptic seizure detection using multiwavelettransform based approximate entropy and artificial neural networks, J.Neurosci. Methods 193 (1) (2010) 156–163.

[81] V. Gupta, R.B. Pachori, Epileptic seizure identification using entropy of FBSEbased EEG rhythms, Biomed. Signal Process. Control 53 (2019) 101569.

[82] M. Li, W. Chen, T. Zhang, Automatic epileptic EEG detection usingDT-CWT-based non-linear features, Biomed. Signal Process. Control 34(2017) 114–125.

ical Signal Processing and Control 57 (2020) 101702

[83] M. Li, W. Chen, T. Zhang, Classification of epilepsy EEG signals usingDWT-based envelope analysis and neural network ensemble, Biomed. SignalProcess. Control 31 (2017) 357–365.

[84] M. Mursalin, Y. Zhang, Y. Chen, N. Chawla, Automated epileptic seizuredetection using improved correlation-based feature selection with randomforest classifier, Neurocomputing 241 (2017) 204–214.

[85] J. Gotman, D. Flanagan, J. Zhang, B. Rosenblatt, Automatic seizure detectionin the newborn: methods and initial evaluation, ElectroencephalographyClin. Neurophysiol. 103 (3) (1997) 356–362.

[86] A. Shoeb, H. Edwards, J. Connolly, B. Bourgeois, S. Treves, J. Guttag,Patient-specific seizure onset detection, Epilepsy Behav. 5 (4) (2004)483–498.

[87] A. Tzallas, M. Tsipouras, D. Fotiadis, Epileptic seizure detection in EEGs usingtime-frequency analysis, IEEE Trans. Inf. Technol. Biomed. 13 (5) (2009)703–710.

[88] G. Huang, Q. Zhu, C. Siew, Extreme learning machine: theory andapplications, Neurocomputing 70 (1-3) (2006) 489–501.

[89] R.R. Sharma, R.B. Pachori, Time-frequency representation using IEVDHM-HTwith application to classification of epileptic EEG signals, IET Science, Meas.Technol. 12 (1) (2017) 72–82.

[90] O. Salem, A. Naseem, A. Mehaoua, Epileptic seizure detection from EEGsignal using discrete wavelet transform and ant colony classifier,Proceedings of the 2014 IEEE International Conference on Communications,IEEE (2014) 3529–3534.

[91] B. Greene, S. Faul, W. Marnane, G. Lightbody, I. Korotchikova, G. Boylan, Acomparison of quantitative EEG features for neonatal seizure detection, Clin.Neurophysiol. 119 (6) (2008) 1248–1261.

[92] M. Bandarabadi, C. Teixeira, J. Rasekhi, A. Dourado, Epileptic seizureprediction using relative spectral power features, Clin. Neurophysiol. 126 (2)(2015) 237–248.

[93] L. Devroye, L. Györfi, G. Lugosi, A Probabilistic Theory of Pattern Recognition,vol. 31, Springer Science & Business Media, 2013.

[94] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edition,Academic Press Inc., 1990.

[95] E. Parzen, On estimation of a probability density function and mode, Ann.Math. Stat. 33 (3) (1962) 1065–1076.

[96] M. Hall, L. Smith, Feature subset selection: a correlation based filterapproach, Proceedings of the 4th International Conference on NeuralInformation Processing and Intelligent Information Systems, Springer(1997) 855–858.

[97] Z. Zhao, F. Morstatter, S. Sharma, S. Alelyani, A. Anand, H. Liu, Advancingfeature selection research, ASU Feature. Sel. Repository (2010) 1–28.

[98] A. Shoeb, Application of Machine Learning to Epileptic Seizure OnsetDetection and Treatment, Ph.D. Thesis, Massachusetts Institute ofTechnology, 2009.

[100] F. Pauri, F. Pierelli, G. Chatrian, W. Erdly, Long-term EEG-video-audiomonitoring: computer detection of focal EEG seizure patterns,Electroencephalography Clin. Neurophysiol. 82 (1) (1992) 1–9.