Recognising Cardiac Abnormalities in Wearable Device Photoplethysmography (PPG) with Deep Learning
Stewart Whiting, Samuel Moreland, Jason Costello, Glen Colopy, Christopher McCann
snap40, 24 Forth Street, Edinburgh, Scotland, UK. [stewart,sam,jason,glen,christopher]@snap40.com
ABSTRACT
Cardiac abnormalities affecting heart rate and rhythm are commonly observed in both healthy and acutely unwell people. Although many of these are benign, they can sometimes indicate a serious health risk. ECG monitors are typically used to detect these events in electrical heart activity; however, they are impractical for continuous long-term use. In contrast, current-generation wearables with optical photoplethysmography (PPG) have gained popularity owing to their low cost, lack of wires and tiny size. Many cardiac abnormalities, such as ectopic beats and atrial fibrillation (AF), can manifest as both obvious and subtle anomalies in a PPG waveform as they disrupt blood flow. We propose an automatic method for recognising these anomalies in PPG signal alone, without the need for ECG. We train an LSTM deep neural network on 400,000 clean PPG samples to learn typical PPG morphology and rhythm, and flag PPG signal diverging from this as cardiac abnormalities. We compare the cardiac abnormalities our approach recognises with the ectopic beats recorded by a bedside ECG monitor for 29 patients over 47.6 hours of gold standard observations. Our proposed cardiac abnormality recognition approach recognises 60%+ of ECG-detected premature ventricular contractions (PVCs) in PPG signal, with a false positive rate of 23%, demonstrating the compelling power and value of this novel approach. Finally, we examine how cardiac abnormalities manifest in PPG signal for in- and out-of-hospital patient populations using a wearable device during standard care.
Many cardiac abnormalities affecting heart rate and rhythm are observed in both healthy and acutely unwell populations. These often present as arrhythmias, where the heart beat is either persistently or occasionally irregular, too fast or too slow. While there are many different types of arrhythmia, among the most common are tachycardia (i.e., >120 beats/min), bradycardia (i.e., <45 beats/min), atrial fibrillation (AF) and flutter (i.e., disordered and fluctuating heart beats), and ectopic beats such as premature ventricular and atrial contractions (i.e., PVCs/PACs).
Many arrhythmias are asymptomatic or benign, and can occur seemingly at random in otherwise healthy individuals. However, when they occur with increasing frequency in patients from high-risk groups, they may be a precursor to, or part of, an acute condition. AF has been shown to predispose a patient to stroke or heart failure, and PVCs have been shown to manifest with cardiomyopathy and myocardial infarction.
An electrocardiogram (i.e., ECG, or EKG), which measures electrical activity in the heart over time, is the main diagnostic approach for investigating cardiac abnormalities. An ECG examination requires a patient to have 2, 6 or 12 leads connected to electrodes correctly attached to the skin across their chest and limbs. Bedside and portable telemetry Holter ECG monitors are the clinical gold standard for accurate cardiac diagnosis. However, they are impractical or uncomfortable for long-term continuous use in ambulatory patients, so ECG investigations typically only take place over relatively short periods of time, for patients who have presented other symptoms which warrant the investigation. As a result, many rarer, asymptomatic or early-onset cardiac conditions can be missed.
In contrast, conveniently small, unobtrusive and inexpensive wearable devices such as smart watches and fitness trackers, which include a photoplethysmography (PPG) sensor to monitor the user's pulse, have become extremely popular. PPG is an optical sensing technique which transmits specific light wavelengths into well-perfused skin tissue and measures the amount of light reflected back, thereby measuring the changing volume of blood in the tissue over time following each pulse wave ejected from the heart. A typical pulsatile PPG waveform from a real wearable device is shown in Figure 1(a).
Figure 1: 6-second PPG signal samples. (a) is a regular rhythmic PPG signal, while (b), (c) and (d) contain cardiac anomalies.
Under normal conditions, the heart atria and ventricles contract sequentially to pump blood into the arterial system. Under abnormal conditions there can be a mistiming of contractions (e.g., a PVC), which can cause a faster rhythm and reduced cardiac output. Similarly, reduced or absent atrial contraction with erratic ventricular contraction (e.g., AF) can lead to a random heart rate and cardiac output.
As PPG only measures the output of the heart into the circulatory system, it cannot characterise the underlying heart activity which preceded it with the fidelity of an ECG. However, since many cardiac abnormalities affect the heart's pulse wave output, they can disrupt blood flow in various, sometimes subtle, ways and thus 'glitch' the rhythm and morphology of the subsequent PPG waveform. Real examples of brief cardiac anomalies disrupting the PPG waveform morphology are shown in Figures 1(b), (c) and (d). Accordingly, we posit that deeper analysis of PPG waveforms, beyond just pulse rate, can provide clues about cardiac function from a wearable device PPG sensor worn comfortably all the time. This has the potential to dramatically increase the clinical value of data from the current generation of PPG-based wearable devices. Firstly, cardiac abnormality clues identified in wearable PPG can be used to flag the patient for a thorough ECG investigation. Secondly, monitoring these types of events in broader populations may lead to new insights into their long-term, large-scale occurrence and impact in general populations. Finally, these clues may provide a salient signal of cardiac function which can augment health deterioration early warning algorithms, allowing them to make earlier and more accurate predictions for many serious health conditions.
Workshop on Machine Learning for Medicine & Healthcare, KDD 2018, London, UK
Finding "surprising/interesting/unexpected/novel" sub-sequences in time series is generally referred to as anomaly detection [6]. A wide body of literature originating in machine learning and statistics has developed generalisable and domain-specific anomaly detection and classification techniques, typically based on learning common patterns and expected statistical distributions [2].
Anomaly detection and classification approaches have long been applied to ECG and electroencephalography (EEG) signals. For example, [8] used deep learning to identify cardiac events in ECG. ECG data has a very distinctive morphology (i.e., the PQRST waveform complex), and many algorithms have been built into ECG monitors to automatically classify certain cardiac conditions, such as ST elevation, with high precision.
In contrast, PPG contains less time- and frequency-domain information, and is more susceptible to calibration and motion artefact noise, so it requires a different approach. Specifically for recognising PPG waveforms impacted by artefacts, Signal Quality Indices (SQI) identify good and bad PPG signals [3, 7]. SQI are heuristic-based models, based on the timing of pulse waves and known physiological distributions. They classify larger fragments of signal as binary good or bad, and are not designed to highlight specific anomalous regions of PPG signal.
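For contrast with our approach, a heuristic SQI of the kind described above can be sketched in a few lines. This is a minimal illustration, not the method of [3] or [7]: it assumes pulse peaks have already been detected, and the function name `heuristic_sqi`, the 35-180 beats/min plausibility range and the 0.3 variability limit are assumptions chosen for the example.

```python
import numpy as np

def heuristic_sqi(peak_times_s, min_bpm=35.0, max_bpm=180.0, max_cv=0.3):
    """Label a whole PPG fragment good (True) or bad (False) from pulse timing.

    Hypothetical rule: every inter-beat interval must imply a plausible pulse
    rate, and the intervals' coefficient of variation must stay below max_cv.
    Assumes peak times (in seconds) are strictly increasing.
    """
    peaks = np.asarray(peak_times_s, dtype=float)
    if peaks.size < 3:
        return False                              # too few beats to judge
    ibi = np.diff(peaks)                          # inter-beat intervals (s)
    bpm = 60.0 / ibi                              # instantaneous pulse rate
    in_range = bool(np.all((bpm >= min_bpm) & (bpm <= max_bpm)))
    cv = float(ibi.std() / ibi.mean())            # beat-to-beat variability
    return in_range and cv <= max_cv

print(heuristic_sqi(np.arange(0, 8, 0.8)))              # True: steady 75 beats/min
print(heuristic_sqi([0.0, 0.8, 1.6, 4.0, 4.8, 5.6]))    # False: 2.4 s pause
```

The output is a single label for the whole fragment; nothing in it says where the pause occurred, which is exactly the limitation that motivates localising anomalies instead.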
Artefacts in PPG can originate from many sources, including (i) physiological abnormalities (e.g., cardiovascular issues), (ii) motion corruption, and (iii) poor PPG calibration (i.e., how much light to transmit into the skin, which is affected by skin colour, circulation, adipose tissue and sensor contact pressure). As we are interested in recognising physiological artefacts, our approach must first rule out motion or calibration issues as the source of any recognised anomaly. To this end, the snap40 upper-arm wearable device employs various continuous proprietary calibration routines which quickly recalibrate the PPG signal if it is compromised, e.g., when the patient moves the device for comfort to a location on their arm with different tissue properties. Moreover, the snap40 device discards PPG signal data when wearer motion would irretrievably corrupt the PPG signal. Since cardiac abnormality analysis is very sensitive to small perturbations in the waveform caused by motion, we set the motion filtering level to 'very still', i.e., lying or sitting postures.
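The device's motion filtering is proprietary and not described here. Purely to illustrate the idea of gating PPG on wearer stillness, the sketch below keeps an 8-second PPG window only when the acceleration magnitude over the same window barely varies. The accelerometer input, the functions `still_window_mask` and `gate_ppg`, and the 0.02 g threshold are all hypothetical.

```python
import numpy as np

# Hypothetical stillness gate: the real snap40 motion filter is proprietary.
def still_window_mask(acc_xyz, fs_hz, window_s=8.0, max_std_g=0.02):
    """Return one boolean per non-overlapping window: True if 'very still'.

    acc_xyz: array of shape (n_samples, 3) of accelerometer readings in g.
    A window is kept only if the standard deviation of the acceleration
    magnitude within it is below max_std_g (an assumed threshold).
    """
    acc = np.asarray(acc_xyz, dtype=float)
    magnitude = np.linalg.norm(acc, axis=1)        # independent of device orientation
    win = int(round(window_s * fs_hz))
    n_windows = magnitude.size // win              # drop any trailing partial window
    windows = magnitude[: n_windows * win].reshape(n_windows, win)
    return windows.std(axis=1) < max_std_g

def gate_ppg(ppg, keep_mask, fs_hz, window_s=8.0):
    """Return the PPG windows whose time-aligned accelerometer window is still."""
    win = int(round(window_s * fs_hz))
    ppg = np.asarray(ppg, dtype=float)
    return [ppg[i * win:(i + 1) * win] for i, keep in enumerate(keep_mask) if keep]

fs = 25.0
n = int(16 * fs)                                   # 16 s, i.e. two 8-second windows
t = np.arange(n) / fs
acc = np.tile([0.0, 0.0, 1.0], (n, 1))             # lying still: gravity on the z axis
acc[n // 2:, 2] += 0.2 * np.sin(2 * np.pi * 1.5 * t[n // 2:])  # arm movement in window 2
ppg = np.sin(2 * np.pi * 1.2 * t)                  # placeholder PPG trace
mask = still_window_mask(acc, fs)
kept = gate_ppg(ppg, mask, fs)
print(mask, len(kept))                             # [ True False] 1
```

Using the magnitude rather than individual axes means a change of arm orientation while still does not trip the gate, whereas rhythmic movement does.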
Figure 2: Input PPG and autoencoded counterpart examples (input PPG signal vs. auto-encoded PPG signal). Note the autoencoder failing to reproduce anomalous regions.
Our proposed cardiac abnormality detection approach comprises two stages. Firstly, we train an autoencoder to learn what typical, normal PPG morphology and rhythm look like. To this end, we employ a deep recurrent neural network (LSTM) autoencoder which is capable of learning the time- and frequency-domain patterns found in clean PPG signal. Secondly, we use this autoencoder to encode and then reconstruct the input PPG signal from its reduced-dimensionality representation. By measuring the differences between the autoencoded and original PPG signals, we identify regions of the PPG signal that the constrained autoencoder representation fails to sufficiently reproduce, i.e., the specific anomalies. This unsupervised approach allows us to recognise anomalies without needing to explicitly label an anomaly training set at scale.
In abstract terms, an autoencoder learns, without supervision, an optimal reduced-dimensionality representation of its training examples given the permitted representational complexity (i.e., the number of neurons and hidden layers). In this application, an autoencoder provides an unsupervised method of learning the common and defining morphological and rhythm patterns of PPG signals. Accordingly, anything that is atypical in the input PPG signal is not faithfully reproduced in the autoencoder output.
Two examples of PPG signal including cardiac abnormalities, and their autoencoded counterparts, are presented in Figure 2. Note how the autoencoder adequately reproduces components with the time and frequency characteristics typically seen in regular PPG signal, but falters around unusual patterns; it is this faltering that allows us to automatically recognise and flag specific abnormal PPG regions.
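The principle that a capacity-limited autoencoder reproduces typical patterns but not atypical ones can be demonstrated without an LSTM. The sketch below swaps in the simplest possible autoencoder, a linear one equivalent to projecting onto the top principal components (PCA) of clean training windows. It is an illustration only, not the model used in this work; the toy pulse generator, the 25 Hz sample rate and the 4-dimensional code are assumptions.

```python
import numpy as np

FS = 25                       # assumed sample rate (Hz)
N = 2 * FS                    # 2-second toy windows

def synthetic_pulse(rate_hz, phase, n=N, fs=FS):
    """Toy 'clean PPG' window: a fundamental plus one harmonic (not real PPG)."""
    t = np.arange(n) / fs
    return (np.sin(2 * np.pi * rate_hz * t + phase)
            + 0.3 * np.sin(4 * np.pi * rate_hz * t + 2 * phase))

def fit_linear_autoencoder(clean_windows, k):
    """Linear (PCA) autoencoder: a stand-in for the LSTM, for illustration only."""
    mean = clean_windows.mean(axis=0)
    _, _, vt = np.linalg.svd(clean_windows - mean, full_matrices=False)
    return mean, vt[:k]       # top-k principal directions act as encoder and decoder

def reconstruct(x, mean, components):
    code = (x - mean) @ components.T      # encode into a k-dimensional code
    return mean + code @ components       # decode back to a full-length window

rng = np.random.default_rng(0)
train = np.stack([synthetic_pulse(1.2, p) for p in rng.uniform(0, 2 * np.pi, 500)])
mean, comps = fit_linear_autoencoder(train, k=4)

clean = synthetic_pulse(1.2, 0.7)
glitched = clean.copy()
glitched[20:26] += 1.5                    # crude stand-in for an abnormal beat

err_clean = np.abs(clean - reconstruct(clean, mean, comps))
err_glitched = np.abs(glitched - reconstruct(glitched, mean, comps))
print(err_clean.max(), err_glitched.max(), err_glitched.argmax())
```

Every toy training window lies in a 4-dimensional subspace, so the clean test window is reproduced almost exactly, while most of the added bump cannot be represented and the reconstruction error peaks inside the glitch. An LSTM autoencoder plays the same role but can also capture non-linear structure and varying pulse rates, which this linear code cannot.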
Figure 3: (a) and (b) show the windowed Pearson's r autoencoder reconstruction error for the PPG examples presented in Figure 2. Flagged PPG anomalies are shown in (c) and (d).
So that the autoencoder does not learn to represent and reproduce the anomaly patterns we wish to expose, it is essential that it is trained using only clean PPG signal samples without abnormalities. Accordingly, we constructed a dataset of approximately 400,000 8-second clean and regular PPG signal samples from 300+ real patients, bootstrapped by semi-supervised FFT-based frequency analysis of a random sample of around 2,000 PPG fragments. We do not explicitly label any specific anomalies. As human physiology often follows heavy-tailed distributions, and a deep-learnt model requires a lot of training examples to be effective, a training set of this scale is necessary to reliably include the clinically expected pulse rates of 35-180 beats/min, and various common physiologies such as high and low, but not abnormal, heart rate variability.
Digital signal processing (DSP) filtering was used to clean, downsample and normalise the PPG signal for the autoencoder. Since PPG is a time series, the autoencoder is a sequence-to-sequence model, implemented as a long short-term memory (LSTM) recurrent neural network in TensorFlow [1]. In preliminary experiments, we used a 2-layer LSTM (with 80 and 40 neurons). However, the optimal neural network architecture and training are entirely dependent on the device sensor characteristics, the DSP pipeline used, run-time constraints, available training computation, training data scale and desired sensitivity goals.
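The exact FFT-based bootstrapping rule is not specified. One plausible heuristic, sketched below under that assumption, z-normalises an 8-second window, finds the dominant frequency within the 35-180 beats/min range and measures how much spectral power sits at that frequency and its harmonics; windows with a highly concentrated spectrum become candidate clean samples. The function names, the harmonic tolerance and the 0.8 purity threshold are hypothetical, and in a semi-supervised setting such thresholds would be tuned against a small, manually reviewed sample.

```python
import numpy as np

def dominant_rate_and_purity(window, fs_hz, min_bpm=35.0, max_bpm=180.0,
                             n_harmonics=3, tol_hz=0.15):
    """Return (dominant pulse rate in beats/min, spectral purity in [0, 1]).

    Purity is the share of non-DC spectral power lying within tol_hz of the
    dominant frequency or its harmonics (a hypothetical screening statistic).
    """
    x = np.asarray(window, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)            # z-normalise the window
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs_hz)
    band = (freqs >= min_bpm / 60.0) & (freqs <= max_bpm / 60.0)
    f0 = freqs[band][np.argmax(power[band])]          # dominant frequency in the pulse band
    near = np.zeros_like(band)
    for h in range(1, n_harmonics + 1):
        near |= np.abs(freqs - h * f0) <= tol_hz
    purity = power[near].sum() / power[1:].sum()      # bin 0 (DC) excluded
    return 60.0 * f0, purity

def looks_clean(window, fs_hz, min_purity=0.8):
    """Candidate clean sample if the spectrum is concentrated at the pulse harmonics."""
    _, purity = dominant_rate_and_purity(window, fs_hz)
    return bool(purity >= min_purity)

fs = 25
t = np.arange(8 * fs) / fs                            # one 8-second window
regular = np.sin(2 * np.pi * 1.25 * t) + 0.3 * np.sin(2 * np.pi * 2.5 * t)
noisy = np.random.default_rng(1).normal(size=t.size)
print(dominant_rate_and_purity(regular, fs), looks_clean(regular, fs), looks_clean(noisy, fs))
```

A spectral screen like this is cheap to run over millions of windows, but it only judges the dominant rhythm; brief anomalies in an otherwise regular window may still pass, which is why the threshold would need tuning against reviewed examples.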
Anomalous PPG signal regions are identified where the original PPG signal and its autoencoded counterpart diverge, as measured by a difference metric. Importantly for healthcare, this approach supports explainability, as it explicitly flags specific irregular PPG signal regions for further automated analysis or manual human review. We found that absolute error as a difference metric is overly sensitive to occasional autoencoder underfitting of low-frequency modulation in the PPG signal due to respiratory and parasympathetic induced variation [4]. Future work on increasing the training data, using longer PPG samples and tuning the neural network will likely alleviate this. Instead, computing Pearson's r correlation coefficient over sliding windows of the input PPG signal and its autoencoded counterpart was sufficient to reliably recognise anomalous regions. The sensitivity of the anomaly detection is governed by both the size of these windows and the r selected as an anomaly threshold. We selected half-second windows, with an anomaly threshold of r < 0.6. Figures 3(a) and (b) show the windowed r for the two examples in Figure 2, and Figures 3(c) and (d) show the identified anomalies, defined as periods where r falls below 0.6. Figure 3(c) shows multiple similar anomalies manifesting in rapid succession, with two dominant (faster and slower) frequencies manifesting in the PPG signal. Meanwhile, Figure 3(d) shows a single anomaly which is prominent compared to its surrounding PPG context.
To evaluate and analyse the potential of our proposed approach in a preliminary study, we investigate clinical and wearable sensor data from hundreds of clinical study patients with a range of pathologies and acuities being cared for in different settings in the UK and USA (e.g., in hospital wards, during surgery and at home). Following recruitment, patients wore the snap40 device on the upper arm for between one hour and ten days while undergoing standard care. The snap40 device passively captured low-motion green PPG sensor data, and wirelessly transmitted high-fidelity waveforms for post-hoc analysis.
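Returning to the anomaly-flagging step, the sliding-window Pearson's r rule (half-second windows, with a region flagged anomalous where r < 0.6) can be sketched as follows. The stride of one sample, the treatment of perfectly flat windows and the merging of consecutive low-r windows into (start, end) regions in seconds are assumptions, as are the function names.

```python
import numpy as np

def windowed_pearson_r(x, y, win):
    """Pearson's r between x and y for every length-`win` sliding window (stride 1)."""
    xw = np.lib.stride_tricks.sliding_window_view(np.asarray(x, dtype=float), win)
    yw = np.lib.stride_tricks.sliding_window_view(np.asarray(y, dtype=float), win)
    xc = xw - xw.mean(axis=1, keepdims=True)
    yc = yw - yw.mean(axis=1, keepdims=True)
    denom = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (xc * yc).sum(axis=1) / denom
    # A perfectly flat window has undefined r; treat it as 'no evidence of anomaly'.
    return np.nan_to_num(r, nan=1.0)

def flag_anomalies(original, reconstructed, fs_hz, window_s=0.5, r_threshold=0.6):
    """Return (r per window start, list of (start_s, end_s) regions with r < threshold)."""
    win = max(2, int(round(window_s * fs_hz)))
    r = windowed_pearson_r(original, reconstructed, win)
    regions, start = [], None
    for i, is_low in enumerate(r < r_threshold):
        if is_low and start is None:
            start = i                                  # first low-r window of a region
        elif not is_low and start is not None:
            # the last low-r window started at i - 1 and covers win samples
            regions.append((start / fs_hz, (i - 1 + win) / fs_hz))
            start = None
    if start is not None:                              # region runs to the end
        regions.append((start / fs_hz, (len(r) - 1 + win) / fs_hz))
    return r, regions

fs = 25
t = np.arange(200) / fs                                # one 8-second window
reconstructed = np.sin(2 * np.pi * 1.25 * t)           # stand-in autoencoder output
original = reconstructed.copy()
original[100:113] *= -1                                # inverted segment as a toy anomaly
r, regions = flag_anomalies(original, reconstructed, fs)
print(regions)
```

Because Pearson's r ignores offset and scale within each window, slow baseline wander that the autoencoder underfits does not by itself push r down, which is the motivation given above for preferring it over absolute error.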
The clinical studies and their respective patient populations were as follows:
HDU (medical/surgical high-dependency unit): comprises 120 high-acuity, continuously monitored patients, wearing the device during the day.
AMU (acute medical unit): consists of 250 higher-acuity patients who have recently been admitted to hospital, wearing the device 24 hours a day for up to 10 days.
SURGERY: consists of 30 peri- and post-operative general surgery patients, wearing the device 24 hours a day for up to 2 days.
ED: consists of 250 emergency department patients of varying acuity, wearing the device for 30-240 minutes during the day.
HOME: consists of 8 heart bypass patients for 2 days post-discharge to home care, wearing the device 24 hours a day.
In Section 4.1, we first examine the accuracy and sensitivity of the proposed approach for recognising known cardiac abnormalities in PPG signal, compared to abnormalities (i.e., PVCs) recognised by a conventional bedside ECG monitor. Of course, PVCs are only one type of cardiac abnormality, so this methodology provides only an initial insight into performance, using ECG-based gold standard (GS) PVC detection as a first proxy for real abnormal cardiac events. Since few patients receive continuous ECG monitoring, even in hospital, this evaluation is based on the subset of HDU patients who had leads correctly attached for continuous ECG monitoring and for whom we have data available. For these patients, we have a once-a-minute count of the zero or more PVCs detected in their ECG over that minute. We align the available wearable PPG sensor data with each of those minutes, and compute set-based comparative evaluation measures (i.e., classification accuracy) based on the presence or absence of anomalies detected in the aligned PPG and of the respective GS PVCs. We retain only GS observations for which at least 30 seconds of high-quality aligned wearable PPG signal is available.
Following this, in Section 4.2, to understand how cardiac abnormalities manifest at large in PPG signal collected from diverse patient populations, we analyse the overall and per-patient frequency of the cardiac abnormalities detected with our proposed approach.
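The per-minute, set-based comparison described above can be sketched as follows. The `MinuteObservation` record and the `set_based_confusion` function are hypothetical; minutes with fewer than `min_pvcs` PVCs are treated as negatives, which is an assumption about how a stricter threshold (such as ≥ 2 PVCs/min) handles single-PVC minutes.

```python
from dataclasses import dataclass

@dataclass
class MinuteObservation:
    """One gold-standard (GS) minute aligned with wearable PPG (hypothetical schema)."""
    pvc_count: int            # PVCs counted by the bedside ECG monitor in this minute
    good_ppg_seconds: float   # seconds of high-quality, aligned PPG in this minute
    ppg_anomaly_found: bool   # whether any PPG anomaly was flagged in this minute

def set_based_confusion(observations, min_pvcs=1, min_ppg_seconds=30.0):
    """Count TP/FN/FP/TN from the presence or absence of PVCs and PPG anomalies."""
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for obs in observations:
        if obs.good_ppg_seconds < min_ppg_seconds:
            continue                              # too little PPG to compare
        has_pvc = obs.pvc_count >= min_pvcs       # below-threshold minutes count as negative
        if has_pvc and obs.ppg_anomaly_found:
            counts["tp"] += 1
        elif has_pvc:
            counts["fn"] += 1
        elif obs.ppg_anomaly_found:
            counts["fp"] += 1
        else:
            counts["tn"] += 1
    return counts

minutes = [
    MinuteObservation(0, 60.0, False),   # no PVC, nothing flagged
    MinuteObservation(0, 45.0, True),    # no PVC, but PPG flagged
    MinuteObservation(3, 50.0, True),    # PVCs, PPG flagged
    MinuteObservation(1, 40.0, False),   # single PVC, nothing flagged
    MinuteObservation(2, 20.0, True),    # excluded: under 30 s of good PPG
]
print(set_based_confusion(minutes))
print(set_based_confusion(minutes, min_pvcs=2))
```

Working at the level of whole minutes sidesteps the need to align individual beats between the ECG and the wearable, at the cost of not checking that a flagged PPG anomaly coincides with the PVC itself.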
Our test dataset contains 29 patients with a total of 2,852 GS observations of zero or more PVCs/min (i.e., 47.6 hours of ECG monitoring, averaging 98 minutes of observations per patient), where each GS observation has ≥ 30 seconds of aligned good-quality PPG signal. 2,465 (86.4%) of the GS observations are 0 PVCs/min, 387 (13.6%) are ≥ 1 PVCs/min and 195 (6.8%) are ≥ 2 PVCs/min. At the extreme, one patient has a GS observation with 7 PVCs/min, showing that the PVC distribution is heavily skewed.
In Table 1, we present the set-based detection accuracy confusion matrix for recognising GS PVCs (when there is ≥ 1 PVC/min) in PPG signal using our proposed approach. Our proposed approach successfully recognises around 60% of PVCs in the PPG signal alone, and incorrectly recognises 23% of PPG signals without a PVC as having a cardiac abnormality. This may be a genuine error of the approach, or it could be another type of cardiac abnormality. Additionally, because of our high-motion PPG filtering, we do not have complete PPG signal coverage over the gold standard periods, so this methodological limitation may cause us to miss some PVCs. Furthermore, the gold standard itself will also have PVC classification error: [5] found that of 22,509 arrhythmia alarms analysed, 27.4% were false alarms, rising to 91.4% for acute life-threatening alarms, with no events missed. This indicates a preference for false positives rather than false negatives in ECG monitor algorithms, which affects our evaluation metrics.
With a threshold of 1 or more PVCs/min in the GS, cardiac abnormalities are sparse within many positive GS observations; 192 of the 387 contain just a single PVC. Accordingly, we also compute the confusion matrix when there are 2 or more PVCs/min present in the GS. This increases true positives to 132 (68%), showing that when cardiac abnormalities are more prevalent, our PPG-based approach is increasingly effective at recognising them. Overall, these initial results are very encouraging, as they demonstrate that even with a limited evaluation methodology, a basic model can achieve reasonable sensitivity while maintaining specificity. Future work will investigate more robust methods to identify anomalies in autoencoder output and, if possible, classify the specific cardiac events which caused them.
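As a worked check of two of the rates quoted above: the false positive rate follows from the 2,465 PVC-free minutes and the 1,891 true negatives in Table 1, and the ≥ 2 PVCs/min sensitivity follows from 132 recognised out of 195 positive minutes. The helper names are illustrative.

```python
def sensitivity(tp, fn):
    """Share of PVC-positive GS minutes in which the PPG was flagged."""
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    """Share of PVC-free GS minutes in which the PPG was nonetheless flagged."""
    return fp / (fp + tn)

# >= 1 PVC/min analysis: 2,465 PVC-free minutes, 1,891 of them true negatives.
fpr = false_positive_rate(2465 - 1891, 1891)
# >= 2 PVCs/min analysis: 132 of the 195 positive minutes recognised.
sens = sensitivity(132, 195 - 132)
print(f"FPR: {fpr:.1%}, sensitivity (>= 2 PVCs/min): {sens:.1%}")
# FPR: 23.3%, sensitivity (>= 2 PVCs/min): 67.7%
```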
We use our proposed approach to identify cardiac abnormalities in over ten thousand hours of PPG data randomly sampled from several large-scale clinical studies using the snap40 wearable device. Analysis results are presented in Table 2.
Patient demographics and biases can explain many of the differences in cardiac abnormality occurrence between populations. As expected, the often older and higher-acuity patients in HDU and AMU had the most PPG samples with cardiac anomalies (i.e., 5.1% and 5.8%, respectively). In contrast, recently discharged patients at HOME had fewer PPG samples with cardiac abnormalities overall (i.e., 3.52%), with lower variability, indicating more stable cardiac health compared to in-patients. Likewise, ED has a wide range of patient acuities (indeed, many will go on to AMU), so while it has a lower average PPG-based cardiac abnormality occurrence (i.e., 3.1%), some patients have far more, as shown by the large variability (i.e., ± …).
Table 1: Set-based detection accuracy confusion matrix for recognising ECG-based GS PVCs (when there is ≥ 1 PVC/min) in aligned PPG signal using our proposed approach.
| | Anomaly in PPG ✓ | Anomaly in PPG ✗ |
| PVC in ECG ✓ | true positive | 156 (40.0%) false negative |
| PVC in ECG ✗ | false positive | 1,891 (76.7%) true negative |
Table 2: Frequency of cardiac abnormalities recognised in PPG samples, aggregated per patient, in each patient population.
| Population | % of PPG samples with anomalies, per patient: Avg (± Stdev) | Max |
| HDU | 5.10% (± …) | … |
| … | … | … |
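The per-patient aggregation reported in Table 2 can be sketched as follows, assuming each PPG sample carries a population label, a patient identifier and a binary anomaly flag. The input format and function name are hypothetical, and since the text does not state whether the reported standard deviation is the sample or population statistic, the sketch assumes the sample standard deviation.

```python
import statistics
from collections import defaultdict

def per_patient_summary(samples):
    """samples: iterable of (population, patient_id, has_anomaly), one per PPG sample.

    Returns {population: (mean %, stdev %, max %)}, computed over the
    per-patient percentages of PPG samples with anomalies.
    """
    total = defaultdict(int)
    flagged = defaultdict(int)
    for population, patient, has_anomaly in samples:
        total[(population, patient)] += 1
        flagged[(population, patient)] += int(has_anomaly)

    per_population = defaultdict(list)
    for (population, patient), n in total.items():
        per_population[population].append(100.0 * flagged[(population, patient)] / n)

    return {
        population: (
            statistics.mean(pcts),
            statistics.stdev(pcts) if len(pcts) > 1 else 0.0,  # sample stdev (assumed)
            max(pcts),
        )
        for population, pcts in per_population.items()
    }

data = ([("A", 1, i < 1) for i in range(10)]      # patient 1: 10% of samples flagged
        + [("A", 2, i < 3) for i in range(10)]    # patient 2: 30% of samples flagged
        + [("B", 3, False)] * 4)                  # patient 3: none flagged
print(per_patient_summary(data))
```

Aggregating per patient first, rather than pooling all samples, stops a few heavily monitored patients from dominating a population's average, and the spread across patients is what reveals outliers such as the ED patients discussed above.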
Cardiac abnormalities such as ectopic beats and AF manifest in a wearable device PPG waveform as they disrupt blood flow. Some of these abnormalities are benign, while others can be a serious health risk factor. Accordingly, identifying cardiac abnormalities in PPG signals provided by a conveniently practical wearable device, as opposed to conventional, inconvenient ECG monitors requiring multiple electrodes and leads, can be valuable.
We demonstrated that cardiac abnormalities, where the PPG signal deviates from typical morphology and rhythm, can be recognised in wearable PPG signal with an unsupervised deep-learnt autoencoder anomaly detection approach. Preliminary evaluation on a large ECG-based gold standard dataset showed that our approach recognises 60%+ of ECG-detected PVCs in PPG signal, with a false positive rate of 23%. As expected, analysis of several large clinical study datasets showed that the detected cardiac abnormalities are more frequent in higher-acuity patients. Future work will enhance accuracy and sensitivity, and explore specific cardiac event classification.
REFERENCES
[1] M. Abadi et al. TensorFlow: A system for large-scale machine learning. In OSDI'16, pages 265–283, Berkeley, CA, USA, 2016. USENIX Association.
[2] V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Comput. Surv., 41(3):15:1–15:58, July 2009.
[3] W. Karlen, K. Kobayashi, J. M. Ansermino, and G. A. Dumont. Photoplethysmogram signal quality estimation using repeated Gaussian filters and cross-correlation. Physiological Measurement, 33(10):1617, 2012.
[4] W. Karlen, S. Raman, J. M. Ansermino, and G. A. Dumont. Multiparameter respiratory rate estimation from the photoplethysmogram. IEEE TBME, 60:1946–1953, 2013.
[5] N. Kurka, T. Bobinger, B. Kallmünzer, J. Koehn, P. D. Schellinger, S. Schwab, and M. Köhrmann. Reliability and limitations of automated arrhythmia detection in telemetric monitoring after stroke. Stroke, 46(2):560–563, 2015.
[6] J. Lin, E. Keogh, S. Lonardi, and B. Chiu. A symbolic representation of time series, with implications for streaming algorithms. In ACM SIGMOD 2003, DMKD '03, pages 2–11, New York, NY, USA, 2003. ACM.
[7] C. Orphanidou, T. Bonnici, P. Charlton, D. Clifton, D. Vallance, and L. Tarassenko. Signal-quality indices for the electrocardiogram and photoplethysmogram: Derivation and applications to wireless monitoring. IEEE JBHI, 19(3):832–838, 2015.
[8] M. A. Rahhal, Y. Bazi, H. AlHichri, N. Alajlan, F. Melgani, and R. Yager. Deep learning approach for active classification of electrocardiogram signals.