Towards Memristive Deep Learning Systems for Real-time Mobile Epileptic Seizure Prediction
Corey Lammie, Wei Xiang, and Mostafa Rahimi Azghadi
College of Science and Engineering, James Cook University, Queensland 4814, Australia. Email: {corey.lammie, mostafa.rahimiazghadi}@jcu.edu.au
Department of Computer Science and Information Technology, La Trobe University, Victoria 3086, Australia. Email: [email protected]
Abstract: The unpredictability of seizures continues to distress many people with drug-resistant epilepsy. On account of recent technological advances, considerable efforts have been made using different hardware technologies to realize smart devices for the real-time detection and prediction of seizures. In this paper, we investigate the feasibility of using Memristive Deep Learning Systems (MDLSs) to perform real-time epileptic seizure prediction on the edge. Using the MemTorch simulation framework and the Children’s Hospital Boston (CHB)-Massachusetts Institute of Technology (MIT) dataset, we determine the performance of various simulated MDLS configurations. An average sensitivity of 77.4% and an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.85 are reported for the optimal configuration, which can process Electroencephalogram (EEG) spectrograms with 7,680 samples in 1.408 ms while consuming 0.0133 W and occupying an area of 0.1269 mm² in a 65 nm Complementary Metal–Oxide–Semiconductor (CMOS) process.
Index Terms: RRAM, Deep Learning, Seizure Prediction
I. INTRODUCTION
The backbone of smart healthcare is the Internet of Medical Things (IoMT), an amalgamation of medical devices and applications that connect through the internet to healthcare Information Technology (IT) [1] to overcome the shortcomings of traditional healthcare. The IoMT has the potential to give rise to many medical applications, including mobile epileptic seizure prediction, which is the primary focus of this paper.
IoMT edge devices can be used to perform computations locally, reducing latency and alleviating privacy concerns when sensitive medical data is processed. Moreover, they can be used to realize closed-loop systems, which are highly desirable for patient monitoring and treatments [2]. In Fig. 1, we depict three different application scenarios of our proposed seizure prediction system. To enable such a smart Deep Learning (DL)-based system to operate in real-time at the power-constrained edge, Resistive Random Access Memory (RRAM)-based in-memory DL computing architectures [3] could be used [2]. In this paper, we investigate the feasibility of using MDLSs to perform real-time epileptic seizure prediction at the edge to enable a mobile solution. Our specific contributions are as follows:
1) We are the first to investigate an in-memory DL approach to epileptic seizure prediction;
2) We explore a variety of weight-representation schemes while accounting for some device nonidealities, and compare the performance of our approach to other DL approaches;
3) We determine the power and area requirements for the optimal configuration, and investigate its feasibility for eventual hardware realization.
[Figure 1 labels: Mobile Epileptic Seizure Prediction Device; head-mounted EEG electrode array; SOP alarm; remote monitoring; preventative treatment.]
Fig. 1: Application scenarios of the proposed system, which is able to facilitate a variety of treatment types.
© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
II. RELATED WORK
To the best of our knowledge, all existing hardware implementations tasked with epileptic seizure detection and prediction have been realized using Field Programmable Gate Array (FPGA), CMOS, and Very-Large-Scale Integration (VLSI) technologies. Most existing hardware implementations detect epileptic seizures using traditional Machine Learning (ML) algorithms such as Linear Least Squares (LLS) [7], Support Vector Machines (SVMs) [8], and k-nearest neighbors (kNN) [9]. We refer the reader to [10] for a comprehensive survey of epileptic seizure detection and prediction systems. While Artificial Neural Networks (ANNs) have previously been used for epileptic seizure detection [11] and prediction [12] on FPGA, no previous work has investigated the use of memristors for the detection or prediction of epileptic seizures using DL, which could drastically improve the performance on the IoMT edge.
[Figure 2 panel labels: (a) Raw EEG Signals; (b) Sampled EEG Signals, ADC; (c) STFT, Spectrograms; (d) DAC, Memristive DL System (crossbar with word lines WL and bit lines BL); (e) k-of-n, Prediction (Preictal/Interictal); (f) Ictal EEG signal, SPH, SOP, step S.]
Fig. 2: A simplified block diagram (a–e) of the proposed system and (f) a depiction of the methodology used to generate synthetic preictal samples. Raw EEG signals (a) are measured using several electrodes, which are (b) sampled using Analog to Digital Converters (ADCs). CMOS circuits [4], [5] are used to filter and generate (c) spectrograms for each window, t, using the discrete Short Time Fourier Transform (STFT). A (d) MDLS is used to perform in-memory computation to predict the state of future samples that are to occur in the Seizure Occurrence Period (SOP) (preictal or interictal) during the Seizure Prediction Horizon (SPH). During training, (f) synthetic preictal samples are generated to balance the number of preictal and interictal samples. Extra preictal samples are generated by sliding a 30 second window along the time axis at every step, S, over preictal signals [6].
III. PRELIMINARIES
A. Seizure Forecasting Systems
There is emerging evidence [13] that the temporal dynamics of brain activity of people with epilepsy can be classified into 4 states: interictal (between seizures, or baseline), preictal (prior to seizure), ictal (seizure), and post-ictal (after seizures). Seizure forecasting or predictive systems aim to classify the preictal brain state.
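To make the forecasting objective concrete, the following minimal sketch checks whether an alarm counts as a correct prediction, using the criterion adopted later in this paper: the seizure onset must occur after the SPH and within the SOP. The function name, the use of minutes as the time unit, the boundary handling, and the example SOP of 30 minutes are assumptions made purely for illustration.

```python
def is_correct_prediction(alarm_min, onset_min, sph_min, sop_min):
    """Return True if the seizure onset lies inside the SOP that follows the SPH.

    All arguments are times or durations in minutes (assumed unit).
    """
    sop_start = alarm_min + sph_min   # the SOP begins once the SPH has elapsed
    sop_end = sop_start + sop_min     # the SOP lasts sop_min minutes
    # Boundary handling (open start, closed end) is an assumption.
    return sop_start < onset_min <= sop_end


if __name__ == "__main__":
    # SPH of 35 minutes as stated in this paper; the SOP of 30 minutes is hypothetical.
    print(is_correct_prediction(alarm_min=0, onset_min=50, sph_min=35, sop_min=30))
    print(is_correct_prediction(alarm_min=0, onset_min=20, sph_min=35, sop_min=30))
```

An onset that arrives before the SPH has elapsed, or after the SOP has closed, is treated as a missed prediction under this reading.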
B. Memristive DL Systems
Memristive devices can be arranged within crossbar architectures to perform Vector Matrix Multiplications (VMMs) in-memory, in $O(1)$ [14], which are used extensively in forward and backward propagations within Convolutional Neural Networks (CNNs) to compute the output of fully connected and unrolled convolutional layers. Scaled weight matrices can either be represented using two crossbars per layer, $g_{pos}$ and $g_{neg}$, to represent positive and negative weights, respectively, or using a single crossbar per layer with current mirrors, so that the effective conductance of each device is offset by a fixed value, $g_m$, that can be determined using (1) [15]
$$g_m = -[\ldots]/(\bar{R}_{\mathrm{ON}} + \bar{R}_{\mathrm{OFF}}), \quad (1)$$
where crossbar column currents can be multiplied by a layer-specific scaling parameter, $K$, to determine layer outputs. When a single device is used to represent each parameter, constant currents to mirror can easily be realized using a diode-connected NMOSFET by adjusting the NMOSFET channel width so that it has a passive conductance $g_m$. Given scalability issues, large crossbars can be split into smaller ones, referred to as either modular crossbar arrays or crossbar tiles [16], to compute the output of linear and convolutional layers with a large number of weights.
IV. PROPOSED SYSTEM
A simplified block diagram of the proposed system is provided in Fig. 2. We confine the scope of this paper solely to the memristive DL system component depicted in Fig. 2(d), and only consider instances where learning is performed offline.
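As a rough illustration of the two-crossbar weight representation described in Section III-B, the sketch below maps a signed weight matrix onto two non-negative conductance matrices and recovers the VMM from the difference of the column currents, scaled by a layer-specific factor K. It is an idealized NumPy model and not MemTorch's API; the function names, the linear weight-to-conductance mapping, and the resistance values (10 kΩ and 100 kΩ) are assumptions.

```python
import numpy as np


def weights_to_conductances(w, r_on, r_off):
    """Map signed weights onto two conductance matrices (hypothetical linear mapping)."""
    g_min, g_max = 1.0 / r_off, 1.0 / r_on
    scale = (g_max - g_min) / np.max(np.abs(w))     # siemens per unit weight
    g_pos = g_min + scale * np.clip(w, 0, None)     # encodes positive weights
    g_neg = g_min + scale * np.clip(-w, 0, None)    # encodes negative weights
    return g_pos, g_neg, 1.0 / scale                # last value is the scaling factor K


def crossbar_vmm(v, g_pos, g_neg, k):
    """Ideal crossbar read: column currents are v @ G; their difference is scaled by K."""
    return k * (v @ g_pos - v @ g_neg)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, 3))                 # illustrative weight matrix
    v = rng.standard_normal(4)                      # illustrative input voltages
    g_pos, g_neg, k = weights_to_conductances(w, r_on=1e4, r_off=1e5)
    print(np.allclose(crossbar_vmm(v, g_pos, g_neg, k), v @ w))
```

Because both crossbars share the same offset conductance, the offset cancels in the subtraction and only the scaled weights remain.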
A. Network Architecture
The network architecture used is summarized in Table I, where n is the number of electrodes that are used to sample EEG signals, t is the window size in seconds, and p can be determined using (2)
$$p = t f_s / k_s = 2t, \quad (2)$$
where $k_s$ denotes the number of overlapped samples, which for all cases in this paper is fixed to 128, i.e., half the sampling frequency, $f_s$. Batch normalization and the ReLU activation function are applied to the output of all convolutional layers and the first fully connected layer. The output of the last fully connected layer is fed through a Softmax activation function. In contrast to other architectures used in related works [6], [17], [18], our architecture uses only linear, 2d-convolutional, max pooling, and batch normalization layers.
B. Training and Validation Datasets
For training and validation of our MDLS, we used the CHB-MIT [19] dataset, which consists of EEG recordings from 22 pediatric subjects with intractable seizures. For our preliminary study reported in this paper, 5 random patients were chosen. We leave evaluation using all subjects from the CHB-MIT and other datasets, such as the American Society Seizure Prediction Challenge (ASSPC), to more exhaustive future works.
[Figure 3 panels: Patient 01, Patient 02, Patient 05, Patient 19, Patient 23; axes: Sensitivity (%) and FPR (/h) against σ.]
Fig. 3: The average sensitivity and False Prediction Rate (FPR) across all 5 folds for simulated double column MDLS configurations.
C. Preprocessing Steps
Within the CHB-MIT dataset, there are instances where multiple seizures occur in close proximity to each other. For seizure prediction, we are interested in predicting leading seizures. Consequently, seizures that occur ≤ T minutes after a previous seizure are not considered, where T denotes the SOP. All time-series EEG signals are translated into time-frequency signals using STFTs with a window length of t seconds (Fig. 2(e-f)). Similarly to [6], power line noise was removed by excluding components in the frequency ranges of 57–63 Hz and 117–123 Hz. The DC component (at 0 Hz) and components of frequencies above 114 Hz were also removed.
D. Training and Validation Methodologies
On account of the large class imbalance between preictal and interictal samples, we use an overlapped sampling technique, which was originally proposed in [6], to train the adopted network architecture. This is depicted in Fig. 2(f). Extra preictal samples are generated by sliding a t second window along the time axis at every step, S, over preictal samples, where S is chosen so that there are a similar number of samples per class (preictal or interictal). The Negative Log Likelihood (NLL) loss function was used in conjunction with the DiffGrad optimization algorithm, which has been shown to outperform other optimizers [20], to train the networks with an initial learning rate of […]e-[…] and a batch size of […] for 50 epochs, at which point performance stagnated. For a correct prediction, a seizure onset must be after the SPH and within the SOP, as depicted in Fig. 2. The metrics used to test the proposed approach are the accuracy, sensitivity, AUROC, and the FPR, as shown in Table II. For each subject, performance is reported using k = 5 stratified K-fold cross validation, where synthetic samples are discarded during evaluation. All implementations adopted the following parameters: T = […] minutes, t = 30 seconds, and a SPH of 35 minutes.

TABLE I: Network architecture employed. For each convolutional and pooling layer, f is the number of filters, k determines the filter size, and s denotes the stride length. For each fully connected layer, N denotes the number of output neurons.
Input: (n × p × […])
| Layer | Output Shape |
| Convolutional, f = […], k = ([…], […]), s = ([…], […]) | ([…] × [p − […]]/[…] × […]) |
| Max Pooling, k = ([…], […]) | ([…] × [p − […]]/[…] × […]) |
| Convolutional, f = […], k = ([…], […]), s = ([…], […]) | ([…] × [p − […]]/[…] × […]) |
| Max Pooling, k = ([…], […]) | ([…] × [p − […]]/[…] × […]) |
| Convolutional, f = […], k = ([…], […]), s = ([…], […]) | ([…] × [p − […]]/[…] × […]) |
| Max Pooling, k = ([…], […]) | ([…] × [p − […]]/[…] × […]) |
| Fully Connected, N = […] | ([…]) |
| Fully Connected, N = […] | ([…]) |

V. PERFORMANCE EVALUATION
The MemTorch [15] simulation framework was used to simulate RRAM devices during inference using the VTEAM [21] model. Performance metrics for our trained conventional and equivalent MDLS are reported in Table II. When predicting seizures from EEG, it is common to have isolated false positives during interictal periods [6]. In recent works, discrete-time Kalman filters and least-k-prediction post-processing techniques have been adopted; however, they introduce a significant hardware overhead.
In Fig. 3, we report the sensitivity and FPR for all simulated configurations adopting a double column weight-representation scheme, as the performance of all configurations adopting a single column weight-representation scheme is insignificant. Consequently, we determine the optimal configuration to be a network adopting a double column weight-representation scheme. We attribute the high FPR for all trained networks and simulated configurations to the omission of any data post-processing, which is out of the scope of this paper. For all devices, $\bar{R}_{\mathrm{ON}}$ = […] and $\bar{R}_{\mathrm{OFF}}$ = […] [22]. Two non-ideal device characteristics were modelled: device-to-device variability, and a finite number of discrete conductance states. Device-to-device variability was introduced stochastically by sampling $R_{\mathrm{ON}}$ and $R_{\mathrm{OFF}}$ for each device from a normal distribution with $\bar{R}_{\mathrm{ON}}$ = […] and $\sigma$, and $\bar{R}_{\mathrm{OFF}}$ = […] and $\sigma$, as $\bar{R}_{\mathrm{OFF}} \gg \bar{R}_{\mathrm{ON}}$ [15], for $\sigma$ = 0–500. As it has been demonstrated that the spacing between states is not critical [23], we simulated devices with between 2 and 10 uniformly distributed conductance states.

TABLE II: Patient information and performance metrics across all folds for our trained conventional CNNs and their equivalent MDLSs adopting a double-column parameter-representation scheme.
| Patient | Seizures | Interictal Duration (h) | S | Accuracy (%) | Sensitivity (%) | AUROC | FPR (/h) |
| 1 | 7 | 17.0 | 7.122 | 94.36 ± […] | […] ± […] | […] ± […] | […] ± […] |
| 2 | […] | […] | […] | […] ± […] | […] ± […] | […] ± […] | […] ± […] |
| 5 | […] | […] | […] | […] ± […] | […] ± […] | […] ± […] | […] ± […] |
| 19 | […] | […] | […] | […] ± […] | […] ± […] | […] ± […] | […] ± […] |
| 23 | […] | […] | […] | […] ± […] | […] ± […] | […] ± […] | […] ± […] |

From Fig. 3, it can be observed that for patients 1, 2, 5, and 23, the sensitivity and FPR decreased when the number of conductance states decreased and device-to-device variability increased. Interestingly, while the number of finite conductance states did not have a large influence on the reported sensitivity and FPR for these patients, device-to-device variability did. The sensitivity has a relatively sudden transition period at $\sigma$ > […], when the distributions of $R_{\mathrm{ON}}$ and $R_{\mathrm{OFF}}$ overlapped, causing the sensitivity to abruptly decrease. Conversely, the FPR was much more sensitive to device-to-device variability. It is noted that, for patient 19, we report an average accuracy of 96.33% and sensitivity of 54.72%. While this result cannot be clearly explained, it is not uncommon in the literature, and other DL works [6], [18] using the same dataset also report a high accuracy and low sensitivity, near […], for the same patient.
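The two non-idealities described above can be sketched in a few lines of NumPy. This is an illustrative model only and does not reproduce MemTorch's implementation: the function names, the separate standard-deviation arguments for the two resistance distributions, the nearest-level rounding rule, and all numeric values are assumptions.

```python
import numpy as np


def sample_device_resistances(shape, r_on_mean, r_off_mean, sigma_on, sigma_off, rng):
    """Draw a per-device R_ON and R_OFF from normal distributions (device-to-device variability)."""
    # Note: for large sigma the samples can become non-positive; this sketch does not guard against it.
    r_on = rng.normal(r_on_mean, sigma_on, size=shape)
    r_off = rng.normal(r_off_mean, sigma_off, size=shape)
    return r_on, r_off


def quantize_conductance(g, g_min, g_max, n_states):
    """Snap each conductance to the nearest of n_states uniformly spaced levels."""
    levels = np.linspace(g_min, g_max, n_states)
    nearest = np.abs(np.asarray(g)[..., None] - levels).argmin(axis=-1)
    return levels[nearest]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical means of 10 kOhm and 100 kOhm; sigma = 500 is the upper end of the simulated range.
    r_on, r_off = sample_device_resistances((128, 128), 1e4, 1e5, 500.0, 500.0, rng)
    g = rng.uniform(1e-5, 1e-4, size=(128, 128))    # illustrative programmed conductances
    g_q = quantize_conductance(g, 1e-5, 1e-4, n_states=4)
    print(r_on.std(), r_off.std(), np.unique(g_q).size)
```

With `n_states=2` every device is forced to one of the two extreme conductances, which is the coarsest case simulated in this paper.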
A. Comparison to Other DL Models
Since previous related works [6], [18] do not use a consistent testing methodology, we can only roughly compare our results to them using the sensitivity and FPR metrics from patients 1, 2, 5, 19, and 23 from Table II. Ref. [6] and [18] report total sensitivities of […] and […], and FPRs of 0.16/h and 0.14/h, respectively. In [18], clinical considerations were discarded and a zero SPH was used. Consequently, the reported performance is likely inflated. Nevertheless, as we did not perform any data post-processing, all of our networks have significantly larger FPRs than both works. In Table II, we report an average sensitivity of […], which is lower than that reported in [6] and [18]. Our result is still significant, because we use 2d-convolutional layers, max pooling, and fully-connected layers, and perform minimal data processing, while [6] used 3d-convolutional layers and [18] performed hyper-parameter optimization to obtain the lowest average validation loss over a 10-fold cross-validation.
B. Power, Area, and Delay Analysis

TABLE III: Power, area, and latency requirements of the optimal configuration using 128 × 128 crossbar tiles for TDM and parallelized implementations (Imp.).
| Imp. | Power (W) | Area (mm²) | Latency (ms) | Energy (mJ) |
| TDM | 0.0133 | 0.1269 | 1.408 | 0.0187 |
| Parallelized | 1.7 | 8.5089 | 0.011 | 0.0187 |
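As a quick consistency check on Table III, the energy per inference should equal the product of power and latency (W × ms = mJ). The short sketch below recomputes it from the tabulated power and latency values; the function and variable names are illustrative.

```python
def energy_mj(power_w, latency_ms):
    """Energy per inference in millijoules: watts multiplied by milliseconds."""
    return power_w * latency_ms


# (power in W, latency in ms, reported energy in mJ), copied from Table III.
TABLE_III = {
    "TDM": (0.0133, 1.408, 0.0187),
    "Parallelized": (1.7, 0.011, 0.0187),
}

if __name__ == "__main__":
    for name, (power_w, latency_ms, reported_mj) in TABLE_III.items():
        computed = energy_mj(power_w, latency_ms)
        print(f"{name}: computed {computed:.4f} mJ, reported {reported_mj:.4f} mJ")
```

Both rows reproduce the reported 0.0187 mJ to four decimal places. The latency ratio between the two implementations, 1.408/0.011, is 128, which is the number of columns per tile and is what one would expect if TDM reads the columns of a tile one after another.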
To determine the power and area requirements as well as the latency, which dictates the inference time of the optimal configuration, we map each layer of our deep network to modular 128 × 128 crossbar tiles with no shared weights between layers using parameters for 65 nm technology from [24]. The area and power of each ADC (8-bit) are, therefore, calculated to be 3 × 10^(−[…]) mm² and 2 × 10^(−[…]) W, and the area of each RRAM cell is estimated to be 1.69 × 10^(−[…]) mm². During inference, we assume constant operation at V = […] V per active cell, the largest voltage used to encode inputs, and an average cell resistance of $(\bar{R}_{\mathrm{OFF}} + \bar{R}_{\mathrm{ON}})/2$. All ADCs are assumed to operate at 5 MHz, and the number of tiles used for each network is assumed to be the exact number required to balance the latency among layers. RRAM read latency is considered negligible compared to ADC readout.
Table III shows the power, area, latency, and energy of our optimal configuration for implementations in which samples are continuously fed to the network from a First-In First-Out (FIFO) buffer. We compare the requirements of two implementations: one in which each tile contains one ADC and Time-Division Multiplexing (TDM) is used to read out column currents (denoted TDM), and one in which each tile contains one ADC per column to read out column currents in parallel (denoted Parallelized). Given the large window length used, further duplication of crossbar tiles to improve throughput was deemed unnecessary.
VI. CONCLUSION
We investigated the potential of memristors to contribute to the design of a DL-based seizure prediction device. Our findings demonstrate that MDLSs hold great promise for developing a compact epileptic seizure prediction architecture capable of low-power and real-time mobile operation. Our optimal configuration exhibits comparable performance to existing DL works in the literature while consuming significantly less power than current Mobile Graphics Processor Units (mGPUs) and edge processors [2]. In future, the longevity and reliability of such a system should be properly investigated.
REFERENCES
[1] D. V. Dimitrov, “Medical Internet of Things and Big Data in Healthcare,” Healthcare Informatics Research, vol. 22, pp. 156–163, Jul. 2016.
[2] M. Rahimiazghadi, C. Lammie, J. K. Eshraghian, M. Payvand, E. Donati, B. Linares-Barranco, and G. Indiveri, “Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications,” IEEE Transactions on Biomedical Circuits and Systems, vol. 14, no. 6, pp. 1138–1159, Dec. 2020.
[3] M. Rahimi Azghadi, Y.-C. Chen, J. K. Eshraghian, J. Chen, C.-Y. Lin, A. Amirsoleimani, A. Mehonic, A. J. Kenyon, B. Fowler, J. C. Lee et al., “Complementary metal-oxide semiconductor and memristive hardware for neuromorphic computing,” Advanced Intelligent Systems, vol. 2, no. 5, p. 1900189, 2020.
[4] T. Tsai, J. Hong, L. Wang, and S. Lee, “Low-Power Analog Integrated Circuits for Wireless ECG Acquisition Systems,” IEEE Transactions on Information Technology in Biomedicine, vol. 16, no. 5, pp. 907–917, Sep. 2012.
[5] H. K. Lin, P. H. Lin, and C. W. Liu, “Design of a High-Throughput and Area-Efficient Ultra-Long FFT Processor,” in Proceedings of the International Symposium on VLSI Design, Automation and Test (VLSI-DAT), Hsinchu, Taiwan, Aug. 2020.
[6] N. D. Truong, A. D. Nguyen, L. Kuhlmann, M. R. Bonyadi, J. Yang, S. Ippolito, and O. Kavehei, “Convolutional Neural Networks for Seizure Prediction using Intracranial and Scalp Electroencephalogram,” Neural Networks, vol. 105, pp. 104–111, Sep. 2018.
[7] T. Chen, C. Jeng, S. Chang, H. Chiueh, S. Liang, Y. Hsu, and T. Chien, “A Hardware Implementation of Real-time Epileptic Seizure Detector on FPGA,” in Proceedings of the IEEE Biomedical Circuits and Systems Conference (BioCAS), La Jolla, CA, Nov. 2011.
[8] H. Wang, W. Shi, and C. Choy, “Hardware Design of Real Time Epileptic Seizure Detection Based on STFT and SVM,” IEEE Access, vol. 6, pp. 67277–67290, Sep. 2018.
[9] A. Page, C. Sagedy, E. Smith, N. Attaran, T. Oates, and T. Mohsenin, “A Flexible Multichannel EEG Feature Extractor and Classifier for Seizure Detection,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 2, pp. 109–113, Dec. 2015.
[10] T. N. Alotaiby, S. A. Alshebeili, T. Alshawi, I. Ahmad, and F. E. Abd El-Samie, “EEG Seizure Detection and Prediction Algorithms: A Survey,” EURASIP Journal on Advances in Signal Processing, vol. 1, no. 183, Dec. 2014.
[11] M. U. Saleheen, H. Alemzadeh, A. M. Cheriyan, Z. Kalbarczyk, and R. K. Iyer, “An Efficient Embedded Hardware for High Accuracy Detection of Epileptic Seizures,” in Proceedings of the International Conference on Biomedical Engineering and Informatics (BMEI), Yantai, China, Oct. 2010.
[12] H. Daoud, P. Williams, and M. Bayoumi, “IoT based Efficient Epileptic Seizure Prediction System Using Deep Learning,” in Proceedings of the IEEE World Forum on Internet of Things (WF-IoT), Aug. 2020.
[13] S. J. M. Smith, “EEG in the Diagnosis, Classification, and Management of Patients with Epilepsy,” Journal of Neurology, Neurosurgery & Psychiatry, vol. 1, no. 76, pp. ii2–ii7, Jun. 2005.
[14] M. Hu, C. E. Graves, C. Li, Y. Li, N. Ge, E. Montgomery, N. Davila, H. Jiang, R. S. Williams, J. J. Yang, Q. Xia, and J. P. Strachan, “Memristor-Based Analog Computation and Neural Network Classification with a Dot Product Engine,” Advanced Materials, vol. 30, no. 9, p. 1705914, Jan. 2018.
[15] C. Lammie, W. Xiang, B. Linares-Barranco, and M. R. Azghadi, “MemTorch: An Open-source Simulation Framework for Memristive Deep Learning Systems,” ArXiv, vol. abs/2004.10971, Apr. 2020.
[16] D. J. Mountain, M. R. McLean, and C. D. Krieger, “Memristor Crossbar Tiles in a Flexible, General Purpose Neural Processor,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 8, no. 1, pp. 137–145, Oct. 2018.
[17] I. Kiral-Kornek, S. Roy, E. Nurse, B. Mashford, P. Karoly, T. Carroll, D. Payne, S. Saha, S. Baldassano, T. O’Brien, D. Grayden, M. Cook, D. Freestone, and S. Harrer, “Epileptic Seizure Prediction Using Big Data and Deep Learning: Toward a Mobile System,” EBioMedicine, vol. 27, pp. 103–111, Jan. 2018.
[18] H. Khan, L. Marcuse, M. Fields, K. Swann, and B. Yener, “Focal Onset Seizure Prediction Using Convolutional Networks,” IEEE Transactions on Biomedical Engineering, vol. 65, no. 9, pp. 2109–2118, Sep. 2018.
[19] A. H. Shoeb and J. V. Guttag, “Application of Machine Learning to Epileptic Seizure Detection,” in Proceedings of the International Conference on Machine Learning (ICML), Haifa, Israel, Jun. 2010.
[20] S. R. Dubey, S. Chakraborty, S. K. Roy, S. Mukherjee, S. K. Singh, and B. B. Chaudhuri, “diffGrad: An Optimization Method for Convolutional Neural Networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 11, pp. 4500–4511, Nov. 2019.
[21] S. Kvatinsky, M. Ramadan, E. G. Friedman, and A. Kolodny, “VTEAM: A General Model for Voltage-Controlled Memristors,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 8, pp. 786–790, Aug. 2015.
[22] E. Yalon, A. Gavrilov, S. Cohen, D. Mistele, B. Meyler, J. Salzman, and D. Ritter, “Resistive Switching in HfO2 Probed by a Metal–Insulator–Semiconductor Bipolar Transistor,” IEEE Electron Device Letters, vol. 33, no. 1, pp. 11–13, Nov. 2012.
[23] A. Mehonic, D. Joksas, W. H. Ng, M. Buckwell, and A. J. Kenyon, “Simulation of Inference Accuracy Using Realistic RRAM Devices,” Frontiers in Neuroscience, vol. 13, no. 593, Jun. 2019.
[24] Q. Wang, X. Wang, S. H. Lee, F. Meng, and W. D. Lu, “A Deep Neural Network Accelerator Based on Tiled RRAM Architecture,” in