Detection of Maternal and Fetal Stress from the Electrocardiogram with Self-Supervised Representation Learning

Abstract

In the pregnant mother and her fetus, chronic prenatal stress results in entrainment of the fetal heartbeat by the maternal heartbeat, quantified by the fetal stress index (FSI). Deep learning (DL) is capable of pattern detection in complex medical data with high accuracy in noisy real-life environments, but little is known about DL's utility in non-invasive biometric monitoring during pregnancy. A recently established self-supervised learning (SSL) approach to DL provides emotional recognition from electrocardiogram (ECG). We hypothesized that SSL will identify chronically stressed mother-fetus dyads from the raw maternal abdominal electrocardiograms (aECG), containing fetal and maternal ECG. Chronically stressed mothers and controls matched at enrolment at 32 weeks of gestation were studied. We validated the chronic stress exposure by psychological inventory, maternal hair cortisol and FSI. We tested two variants of SSL architecture, one trained on the generic ECG features for emotional recognition obtained from public datasets and another transfer-learned on a subset of our data. Our DL models accurately detect the chronic stress exposure group (AUROC=0.982+/-0.002), the individual psychological stress score (R2=0.943+/-0.009) and FSI at 34 weeks of gestation (R2=0.946+/-0.013), as well as the maternal hair cortisol at birth reflecting chronic stress exposure (0.931+/-0.006). The best performance was achieved with the DL model trained on the public dataset and using maternal ECG alone. The present DL approach provides a novel source of physiological insights into complex multi-modal relationships between different regulatory systems exposed to chronic stress. The final DL model can be deployed in low-cost regular ECG biosensors as a simple, ubiquitous early stress detection and monitoring tool during pregnancy. This discovery should enable early behavioral interventions.


Detection of Maternal and Fetal Stress from the Electrocardiogram with Self-Supervised Representation Learning

Pritam Sarkar†, Silvia Lobmaier†, Bibiana Fabre, Diego González, Alexander Mueller, Martin G. Frasch, Marta C. Antonelli, Ali Etemad

† Co-first authors. * Co-corresponding authors.

Abstract

Maternal chronic stress during pregnancy programs the fetal brain for altered developmental trajectories. We showed that in stressed mother-fetus dyads, this results in measurable synchronization of the fetal heartbeat by the maternal heartbeat, quantified by the fetal stress index (FSI) [1]. Can this biophysical phenomenon be scaled to an easily deployable biomarker of chronic stress in pregnant mothers to help guide early interventions which can reverse altered fetal developmental trajectories?

Deep learning (DL)-based approaches [2] to pattern detection in complex physiological data have shown high accuracy in noisy real-life environments [3, 4]. Nonetheless, little is known about their utility in the setting of non-invasive biometrics obtained during human pregnancy.

Here, we hypothesized that a DL approach to pattern recognition in maternal abdominal electrocardiograms (aECG) obtained in chronically stressed mothers and controls matched at enrolment at 32 weeks of gestation will detect chronic stress in mother-fetus dyads, i.e., a DL classification model (Figure 1).

We validated the exposure to stress by psychological inventory, molecular and biophysical biomarkers including maternal hair cortisol and FSI, respectively. Then, we tested the correlation between these exposure measures and the aECG and maternal ECG (mECG) features captured by the DL pipeline, i.e., DL regression model. We implemented the DL pipeline using the recently established self-supervised learning (SSL) approach that provides emotional recognition from ECG [5, 6].

We tested two variants of SSL architecture, one trained on the generic ECG features for emotion recognition obtained from public datasets and another transfer-learned on a subset of the composite aECG (which includes fetal ECG, fECG) or mECG data. Our studies of the model’s performance in regression tasks and with or without the inclusion of the fetal ECG signal reveal a rich structure correlating to psychological, molecular, and biophysical biomarkers of maternal and fetal stress exposure at 34 weeks of gestation and at birth.

There were no differences in age or in the total number of subjects between our cohort and the public datasets: the public datasets used for training comprised 103 subjects, compared to 107 in the FELICITy dataset (Table 1).

A clear difference existed in the gender composition, although its impact on the model performance remains uncertain. ECG duration was more variable in the public datasets than in the FELICITy dataset, and so was the sampling rate. However, it also remains uncertain whether this had any impact on the model performance, especially since all ECG was resampled at 256 Hz for the DL pipeline. It is possible that such variance in data quality and in the composition of the participants made the model more robust, but this conjecture would need to be tested in future work.

Table 1. Demographic and dataset characteristics.

Dataset              AMIGOS        DREAMER       WESAD   SWELL   FELICITy
No. of participants  40            23            15      25      107
Female/Male          13/27         9/14          3/12    8/17    107/0
Age                  28.3 (21-40)  26.6 ± [...]  [...]   [...]   [...]
Sampling rate (Hz)   256           256           700     2048    900

[Figure 1: study flowchart. Pregnant women at 32 weeks (n = 2000); recruitment by PSS-10 questionnaire (n = 728): stressed (PSS-10 ≥ 19, n = 151) and controls (PSS-10 < 19, n = 577), with stressed participants matched 1:1 to controls on enrollment; at 34 weeks, recording of abdominal ECG (aECG; n = 59 and n = 48), aECG separation into fetal and maternal ECG (fECG, mECG) and computation of the FSI; pre-delivery, maternal hair cortisol (n = 48 and n = 59) as a measure of chronic stress; deep learning prediction: stress classification and regression.]

Figure 1. Summary of the approach: the Prenatal Distress Questionnaire (PDQ) and the Perceived Stress Scale (PSS-10) were administered to women at 32 weeks of pregnancy, classifying them as the stressed group or as matched controls. At 34 weeks, abdominal ECG (aECG) was recorded and, prior to delivery, maternal hair was sampled for cortisol measurements reflecting chronic stress exposure over the past two months. The aECG was deconvoluted into fetal and maternal ECG (fECG, mECG), from which the Fetal Stress Index (FSI) was computed, reflecting joint maternal and fetal chronic stress exposure. Deep learning using a self-supervised learning framework was then applied to aECG and mECG (fECG did not qualify due to signal quality) to detect stress group status (i.e., classification) and the values of cortisol, FSI, PDQ, and PSS-10 (i.e., regression).

We compared the model performance for detecting stressed mother-fetus dyads as well as predicting maternal hair cortisol, FSI, PDQ, and PSS values depending on two factors: the source of ECG (aECG, mECG) and the source of the trained model (learning from the FELICITy dataset, the first SSL approach, or transfer learning from the public datasets, the second SSL approach) (Tables 2, 3).

Within the FELICITy dataset, the ECG source made no difference, but using the public dataset improved the F1 score, sensitivity, specificity and AUROC regardless of the ECG source (Table 2). In comparison to the FELICITy dataset, training on the public dataset while using aECG improved performance across all metrics except the accuracy. Accuracy was excellent overall and stood out as not being influenced by the ECG source or the origin of the trained model. Again in comparison to the FELICITy dataset, training on the public dataset while using mECG also improved performance overall, except accuracy, PPV, and NPV. This was because mECG in general boosted the performance regardless of how the model was trained, whether on the FELICITy or the public datasets. The best group classification performance overall, across all metrics, was achieved using the public dataset and mECG.

Table 2. Detection of stressed mothers by self-supervised learning trained on the FELICITy and public datasets.

[Table 2 body: for each training source (FELICITy dataset, public datasets) and each ECG source (aECG, mECG), the columns are Accuracy, F1 Score, Sensitivity, Specificity, PPV, NPV and AUROC, each given as value ± deviation; the numeric entries are not recoverable from the extracted text.]

∗ Public versus FELICITy dataset, Mann-Whitney U test.
mECG versus aECG within the same dataset, Mann-Whitney U test.
Statistical significance at p < 0.025, accounting for two comparisons (using Bonferroni-Holm correction).
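The 0.025 threshold quoted in the table footnote is what the Bonferroni-Holm step-down rule gives for the smaller of two p-values. The following is a minimal sketch of that rule, not the authors' code; the family-wise level `alpha = 0.05` is an assumption (it is implied by 0.025 for two comparisons but is not stated at this point in the text), and the function name `holm_reject` is hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Bonferroni-Holm step-down procedure.

    Returns a list of booleans, True where the null hypothesis is rejected.
    alpha = 0.05 is an assumed family-wise error level.
    """
    m = len(p_values)
    # Visit the p-values from smallest to largest, remembering their positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is tested against
        # alpha / (m - k + 1). The strict "<" mirrors the "p <" wording
        # of the footnote.
        if p_values[i] < alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# With two comparisons, the smaller p-value must be below 0.05 / 2 = 0.025;
# only then is the larger one tested, against 0.05.
print(holm_reject([0.010, 0.040]))
print(holm_reject([0.030, 0.040]))
```

In the second call neither hypothesis is rejected, because the smallest p-value already misses the 0.025 threshold.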

Recognizing the spread of PSS-10 scores, in the present study we also assessed the regression relationship between the scores and emotional recognition performance in our DL model (Table 3). The model performance results were similar for the regression analyses. We see overall similar improvements and best performance for all biomarkers when using mECG and the public dataset. When using the model trained on the FELICITy dataset, there was no difference in prediction for all biomarkers when using aECG or mECG. This suggests there is enough information in the mECG and the model trained on the FELICITy dataset. In contrast, using the model trained on the public dataset improved the performance regardless of the source of data, aECG or mECG.

For aECG on the FELICITy dataset, the model performed poorly for all biomarkers. Using mECG instead brought no significant improvement. When training on the public dataset, the performance improved on both aECG and mECG for cortisol, FSI, and PDQ, but not for PSS when using mECG, because it is already quite accurate when trained on the FELICITy dataset. In other words, the prediction of the PSS scores achieves highest performance when using the SSL pipeline trained on the FELICITy dataset and using mECG rather than the composite aECG, i.e., a signal containing maternal and fetal ECG combined.

For FSI and PDQ, it appears that the effect of the regression improvement by using the public dataset is dependent not on the biomarker, but on the data source, i.e., aECG versus mECG. This may be explained by the richer intrinsic structure of aECG compared to the uniquely maternal sourced mECG, which is better captured by the public dataset. The public dataset was also richer than the FELICITy dataset with regard to participants’ gender composition, ECG sampling rate and duration (Table 1).

Overall, using the raw aECG decreases the model performance on both classification and regression. Identification of the effects of chronic stress and a highly accurate prediction of its effects on cortisol, FSI, PDQ, and PSS is possible from maternal ECG alone using the SSL model trained on the public dataset, and using the FELICITy dataset does not improve this performance for either classification or regression. This is visualized in Figure 2.

Table 3. Prediction of biomarkers by self-supervised learning on the FELICITy and public datasets.

[Table 3 body: for each task (Cortisol, FSI, PDQ, PSS) and each ECG source (aECG, mECG), the columns are R2 for the FELICITy dataset and R2 for the public datasets, each given as value ± deviation; the numeric entries are not recoverable from the extracted text.]

∗ Public versus FELICITy dataset, Mann-Whitney U test.
mECG versus aECG within the same dataset, Mann-Whitney U test.
Statistical significance at p < 0.025, accounting for two comparisons (using Bonferroni-Holm correction).

[Figure 2: ROC curves; x-axis: False Positive Rate, y-axis: True Positive Rate. Legend: aECG FELICITy dataset (0.794±0.022), mECG FELICITy dataset (0.931±0.094), aECG public dataset (0.936±0.002), mECG public dataset (0.982±0.002), chance level.]

Figure 2. AUROC of SSL models trained on the public and FELICITy datasets to identify stressed and non-stressed mother-fetus dyads from aECG or mECG. Mean AUROC values are marked as solid lines and standard deviations across 5 folds are marked as shaded regions.

Chronic stress is one of the most common modifiers of fetal and postnatal development with lifelong lasting effects on health [7, 8]. First, confirming our hypothesis, we report a scalable and readily deployable approach using an SSL model of DL to identify chronically stressed mother-fetus dyads and predict their biochemical, biophysical, and psychological characteristics from a regular mECG with a high degree of accuracy. The excellent performance of the model trained on the public dataset suggests a high probability of generalizability of our findings to new data.

This is an important advance in early and non-invasive detection of chronic stress effects during pregnancy. The demonstration that mECG is sufficient translates into the ability to use conventional ECG devices, which are already widely available. This will enable wider utilization of ECG for studies of chronic stress effects on maternal, fetal, and postnatal health.

Another novel insight stems from two related observations. First, there was a high degree of accuracy in predicting individual characteristics of the mother-fetus dyads related to chronic stress (cortisol, FSI, PDQ, and PSS). Second, an exploration of the neural network’s latent space features strongly suggests that the entire ECG waveform structure is required and not only the temporal features of R-R peaks, i.e., heart rate variability (data not shown).

The deep neural network properties are important to consider for two reasons. First, there appears to be rich, intrinsic, integrated information about these distinct physiological properties contained in ECG. This information is retained after the temporal order is destroyed by permutation of ECG waveforms as done in this work. To our knowledge, this is the first demonstration of such a relationship. Second, most presently available wearables do not record continuous ECG, but, rather, use photoplethysmography (PPG) sensors to track heart rate triggered from the pulse waveform. A new DL approach suggests that a higher quality ECG signal can be derived from PPG using a generative adversarial network (GAN) architecture [9]. Further research is needed to validate whether this may pave the way to using present-day wearables for identifying stress. Meanwhile, a next generation of wearables is capable of continuous on-body ECG monitoring [10], while some readily available clinical-grade ECG trackers can be deployed for this purpose already [11].

Our study has limitations. First, the datasets are relatively small, with fewer than 200 subjects in both the public and FELICITy datasets, yet the model performance has shown satisfactory stability. Also, these datasets are the largest known to us so far to permit such investigation. Second, we abstained from complicating our model with the addition of ancillary features such as BMI. It has been suggested that such an approach may add bias by introducing unintended confounders in the causal inference sense, e.g., BMI may interact with ECG features or other features that interact with ECG in ways we don’t know, and the results may be biased or even meaningless as a result.

In conclusion, maternal-fetal early-life stress and its molecular and biophysical characteristics can be predicted with very good accuracy and reproducibility from regular ECG using a scalable SSL deep learning approach.

The complete experimental design can be found in [1]. Ethics approval was obtained from the Committee of Ethical Principles for Medical Research at the TUM (registration number 151/16S; ClinicalTrials.gov registration number NCT03389178). Briefly, in this prospective study, stressed mothers were matched with controls 1:1 for parity, maternal age, and gestational age at study entry. Recruited subjects were between 18 and 45 years of age, and were in their third trimester. The study ran for 22 months from July 2016 until May 2018, and subjects were selected from a cohort of pregnant women followed in the Department of Obstetrics and Gynecology at “Klinikum rechts der Isar” of the Technical University of Munich (TUM). This is a tertiary center of Perinatology located in Munich, Germany, which serves 2000 mothers/newborns per year. Figure 3 presents the recruitment flowchart for this dataset and the use of data in this study. Four exclusion criteria were applied, namely (a) serious placental alterations defined as fetal growth restriction according to Gordijn et al. [12]; (b) fetal malformations; (c) maternal severe illness during pregnancy; (d) maternal drug or alcohol abuse.

The Cohen Perceived Stress Scale questionnaire was administered to gauge chronic non-specific stress exposure (PSS-10) [13]. PSS-10 ≥ 19 categorized subjects as stressed, as established [1]. We applied inclusion and exclusion criteria after the questionnaires were returned. When a subject was categorized as stressed, the next screened participant matching for gestational age at recording with a PSS-10 score < 19 was entered into the study as control. In addition to PSS-10, the participants received the German version of the “Prenatal Distress Questionnaire” (PDQ) containing 12 questions on pregnancy-related fears and worries regarding pregnancy-related changes of the body weight and troubles, child’s health, delivery and pregnancy’s impact on the women’s relationship.

[Figure 3: recruitment flowchart. 2000 patients received the questionnaire; 728 patients returned it and were screened via PSS-10: 151 with PSS-10 ≥ 19 and 577 with PSS-10 < 19. Patients with exclusion criteria were excluded (17 among those with PSS-10 ≥ 19: 6 multiple pregnancies, 3 fetal malformation, 5 SGA/IUGR, 3 other), leaving 134 patients eligible; 54 of these were not included (23 refused participation, 7 not possible to approach, 24 organisational problems). Patients were matched 1:1 for parity, maternal age and gestational age at study entry. 80 patients were included into the SG and 86 into the CG; after 1 withdrawn consent in each group, 79 and 85 patients were included for analysis. After excluding subjects with missing labels, 59 and 48 participants remained. A further box reads "83 patients eligible".]

Figure 3. Recruitment flow chart for the FELICITy dataset: from screening to deep learning.

A transabdominal ECG (aECG) recording with a sampling rate of 900 Hz and a duration of at least 40 minutes was performed two and a half weeks after screening. The AN24 (GE HC/Monica Health Care, Nottingham, UK) was used. We calculated the signal quality index (SQI) [14] for aECG, in one-second windows, and subsequently discarded segments with an SQI lower than 0.5. Using the fetal and maternal ECG deconvolution algorithm SAVER [14], we extracted fetal ECG (fECG) and maternal ECG (mECG).

We utilized SQI to discard the noisy data, resulting in an average duration of mECG and aECG of 46.07 ± [...]

Bivariate Phase Rectified Signal Averaging

To analyze the relationship between two signals recorded synchronously, the maternal and fetal heart rates (mHR and fHR), we use the bivariate phase rectified signal averaging (BPRSA) method [18]. This method extends the “monovariate” PRSA method proposed for the analysis of fHR [19, 20]. The two signals in question in this study are the mHR (trigger signal) and the fHR (target signal). The BPRSA algorithm operates by first detecting a number of anchor points A, defined as decreases in mHR. Next, for the detected set of A, we interpolate the fHR with a sampling rate of 900 Hz to match the maternal ECG. We then detect the time of the anchor points in fHR, which we denote by A′. Then, around each anchor point A′ in fHR, a window of length 2L is selected. In this paper, we set L = 9000, resulting in a window of 20 seconds. Next, by aligning the anchor points, we obtain phase-rectified segments.
The resultant segments are then averaged to obtain the BPRSA signal X. Consequently, we can interpret deflections in X as coupling between mHR and fHR. Lastly, X is quantified within specific windows before and after the center of X. Accordingly, the designated windows are characterized as $L + S_1, \dots, L + S_2$ and $L - S_2, \dots, L - S_1$, where the offsets $S_1$ and $S_2$ are given in samples ([...] Hz).

Fetal stress index (FSI) is a parameter defined to analyze the coupling between mHR and fHR using the BPRSA. This index is defined as the difference between the means of the two windows mentioned above, as follows:

$$\mathrm{FSI} = \frac{1}{S_2 - S_1} \sum_{i = L + S_1}^{L + S_2} X(i) \; - \; \frac{1}{S_2 - S_1} \sum_{i = L - S_2}^{L - S_1} X(i), \tag{1}$$

where index L at the center of X corresponds to our anchor definition (within the maternal RR intervals). Accordingly, the response of the fetus to mHR decreases is measured by FSI.

We utilized an established self-supervised learning framework [5, 6] to learn robust representations from our collected ECG data, which were further used to classify the level of stress, as well as to perform regression analyses. The framework consisted of two stages of learning: the first stage consisted of learning ECG representations and the second stage consisted of learning affect attributes from the learned representations (see Figure 4).
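The BPRSA averaging and the FSI of Eq. (1) described above can be illustrated with a short sketch. It is not the authors' implementation and rests on several assumptions: both heart-rate series are already on the same uniform time grid (the interpolation step is skipped), an anchor is any sample where the trigger falls below its predecessor, the two windows are symmetric about the center, and the offsets `s1` and `s2` are hypothetical parameters. The function names are likewise hypothetical.

```python
def find_anchors(trigger):
    """Indices where the trigger signal (mHR) decreases relative to the previous sample."""
    return [i for i in range(1, len(trigger)) if trigger[i] < trigger[i - 1]]


def bprsa(trigger, target, L):
    """Average windows of length 2L of `target` (fHR) aligned on anchors found in `trigger` (mHR).

    Assumes both signals share one uniform time grid. Index L of the
    returned list corresponds to the anchor position.
    """
    if len(trigger) != len(target):
        raise ValueError("trigger and target must have the same length")
    # Keep only anchors whose full window [a - L, a + L) lies inside the signal.
    anchors = [a for a in find_anchors(trigger) if a - L >= 0 and a + L <= len(target)]
    if not anchors:
        raise ValueError("no anchor has a complete window")
    X = [0.0] * (2 * L)
    for a in anchors:
        for k in range(2 * L):
            X[k] += target[a - L + k]
    return [v / len(anchors) for v in X]


def fsi(X, L, s1, s2):
    """Mean of X in a window after the center minus the mean in a window before it.

    Half-open windows [L + s1, L + s2) and [L - s2, L - s1) are used so that
    each holds exactly s2 - s1 samples, matching the 1 / (S2 - S1) factor.
    """
    if not 0 <= s1 < s2 <= L:
        raise ValueError("need 0 <= s1 < s2 <= L")
    after = sum(X[L + s1:L + s2]) / (s2 - s1)
    before = sum(X[L - s2:L - s1]) / (s2 - s1)
    return after - before


# Toy example: the trigger drops every 10 samples and the target is high
# for the 5 samples following each drop.
trigger = [i % 10 for i in range(100)]
target = [1.0 if i % 10 < 5 else 0.0 for i in range(100)]
X = bprsa(trigger, target, L=5)
print(fsi(X, L=5, s1=1, s2=3))
```

A positive value in this toy case reflects a target that rises after trigger decreases; a target that does not respond to the trigger yields a value of zero.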

We utilized a multi-task convolutional architecture, henceforth referred to as the ‘transformation recognition network’, which consists of 3 convolutional blocks. Each block consists of two 1D convolution layers with leaky rectified linear unit (ReLU) activation functions, followed by a max pooling layer. Following the convolutional layers, a global max pooling is used. This is finally followed by several parallel fully connected (FC) layers. We applied dropout to reduce the possibility of overfitting. A detailed description of this network’s architecture is given in the supplementary material.

[Figure 4: schematic of the two networks. 1. Signal Transformation Recognition Network: raw ECG signals and automatically generated labels; three conv blocks; flattening layer; FC layers. 2. Affective Recognition Network: frozen network of three conv blocks; flattening layer; FC layers; classification task and regression tasks.]

Figure 4. Our deep learning approach using a self-supervised learning framework.

In order to learn the ECG representation, the model was trained in a self-supervised manner. Automatic labels were generated through the following transformations:

1. Noise addition: Random Gaussian noise is added to the raw ECG signal.
2. Scaling: The magnitude of the original ECG is scaled.
3. Negation: The original ECG signal is flipped vertically.
4. Temporal inversion: The original ECG signal is flipped horizontally.
5. Permutation: The raw ECG signal is first divided into smaller segments of equal length, which are then randomly shuffled across the time axis.
6. Time-warping: ECG signals are first divided into smaller segments, similar to the permutation operation. These segments are then stretched or squeezed across the time axis.

The parameters of the above-mentioned transformations were derived from our previous work [6]. Next, the transformed signals were stacked randomly to create the input matrix for the self-supervised network, while the corresponding labels of the transformations were stacked, in a similar order to the inputs, to create the output labels. Each of these transformation labels is used as an output to one of the FC layers to construct a multi-task network.
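The six transformations and the automatic labelling can be sketched in a few lines of plain Python. This is an illustrative sketch only: the parameter values (`sigma`, `factor`, `n_segments`, `stretch`), the alternating stretch/squeeze pattern, the linear interpolation, and the use of label 0 for the untransformed signal are all assumptions, not the settings of [6]; all function names are hypothetical. Signals are assumed to be at least `n_segments` samples long.

```python
import random


def add_noise(x, sigma=0.05, rng=random):
    """Noise addition: add zero-mean Gaussian noise to every sample."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def scale(x, factor=1.5):
    """Scaling: multiply the signal magnitude by a constant."""
    return [factor * v for v in x]


def negate(x):
    """Negation: flip the signal vertically."""
    return [-v for v in x]


def invert_time(x):
    """Temporal inversion: flip the signal horizontally."""
    return x[::-1]


def split_segments(x, n_segments):
    """Split x into contiguous segments; the last one absorbs any remainder."""
    step = len(x) // n_segments
    bounds = [i * step for i in range(n_segments)] + [len(x)]
    return [x[bounds[i]:bounds[i + 1]] for i in range(n_segments)]


def permute(x, n_segments=4, rng=random):
    """Permutation: shuffle the segments along the time axis."""
    segments = split_segments(x, n_segments)
    rng.shuffle(segments)
    return [v for seg in segments for v in seg]


def resample_linear(seg, new_len):
    """Linearly interpolate seg onto new_len evenly spaced points."""
    if new_len == 1 or len(seg) == 1:
        return [seg[0]] * new_len
    out = []
    for j in range(new_len):
        pos = j * (len(seg) - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, len(seg) - 1)
        frac = pos - lo
        out.append(seg[lo] * (1 - frac) + seg[hi] * frac)
    return out


def time_warp(x, n_segments=4, stretch=1.25):
    """Time-warping: alternately stretch and squeeze segments, then restore the length."""
    warped = []
    for i, seg in enumerate(split_segments(x, n_segments)):
        f = stretch if i % 2 == 0 else 1.0 / stretch
        warped.extend(resample_linear(seg, max(1, round(len(seg) * f))))
    return resample_linear(warped, len(x))


TRANSFORMS = [
    lambda x, rng: list(x),                # label 0: original signal (assumed)
    lambda x, rng: add_noise(x, rng=rng),  # label 1
    lambda x, rng: scale(x),               # label 2
    lambda x, rng: negate(x),              # label 3
    lambda x, rng: invert_time(x),         # label 4
    lambda x, rng: permute(x, rng=rng),    # label 5
    lambda x, rng: time_warp(x),           # label 6
]


def make_pretext_examples(x, seed=0):
    """Return shuffled (signal, label) pairs; labels stay paired with their inputs."""
    rng = random.Random(seed)
    examples = [(fn(x, rng), label) for label, fn in enumerate(TRANSFORMS)]
    rng.shuffle(examples)
    return examples
```

Because the labels are generated from the transformations themselves, no human annotation is needed at this stage; `make_pretext_examples` would be applied to each ECG window before training.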

In the second stage, affective attributes were learned using the learned ECG representations obtained from the self-supervised network. In this stage we classified stress, followed by regression analysis of maternal hair cortisol, FSI, PDQ, and PSS values. The affect recognition network contains convolutional layers similar to those used in the self-supervised network, followed by fully connected layers. The weights of the convolution layers are transferred from the signal transformation recognition network and kept frozen, and only the fully connected layers are trained. Detailed descriptions of the architectures are given in the supplementary material.

In order to explore the generalizability of the self-supervised method, we tackled this task in two different ways. Our first approach was to use the FELICITy dataset and train the framework from scratch. As the second approach, we utilized four publicly available datasets to train the signal transformation network for learning ECG representations, followed by using the FELICITy dataset to perform affect recognition by training the fully connected layers of the second network. The details of these two approaches are given below.

First Approach - Learning From FELICITy dataset:

As mentioned above, in our first approach, we utilized the FELICITy dataset to train the self-supervised network consisting of both the signal transformation recognition network responsible for learning to extract ECG representations, as well as the fully connected layers of the affect recognition network.

Second Approach - Transfer Learning From Public Datasets:

In order to explore the generalizability of the self-supervised learning, we used 4 publicly available datasets, namely AMIGOS [21], DREAMER [22], SWELL [23], and WESAD [24], to train the signal transformation recognition network, i.e., learn ECG representations. Next, we transferred the weights of the network to the affect recognition network, where we utilized the FELICITy dataset and collected labels to train the fully connected layers of the network so that stress can be classified and factors such as maternal hair cortisol, FSI, PDQ, and PSS values can be regressed. A brief description of the public datasets is provided in the supplementary material.

We performed minimal pre-processing on the raw data. We re-sampled ECG signals to a sampling frequency of 256 Hz, followed by segmentation into 10-second windows as proposed by [6]. Next, to remove the noisy parts of aECG and mECG data, we utilized the SQI values available with the segments. To this end, SQI < [...] ± [...] https://code.engineering.queensu.ca/17ps21/ssl-ecg-v2.

We used the Shapiro–Wilk test to evaluate for normal distribution. Medians and interquartile ranges were reported for skewed distributions, while the means and standard deviations are reported for Gaussian distributions. Where data are categorical, we present the absolute and relative frequencies. Groups are compared using t-test for independent samples, Mann–Whitney U test, and Pearson Chi-squared test.

All of the statistical tests were performed two-sided with statistical significance considered at p < [...]

Acknowledgements

We gratefully acknowledge the contribution of Dr. Hau-Tieng Wu’s lab with SAVER code-driven mECG/fECG extraction. The project was developed and performed with own resources of Frauenklinik/Klinikum rechts der Isar and funding from the Hans Fischer Senior Fellowship to MCA.

Disclosures

MGF has a patent pending on aECG signal separation (WO2018160890).

References

1. Silvia M Lobmaier, Alexander Müller, C Zelgert, C Shen, PC Su, G Schmidt, B Haller, G Berg, B Fabre, J Weyrich, et al. Fetal heart rate variability responsiveness to maternal stress, non-invasively detected from maternal transabdominal ECG. Archives of Gynecology and Obstetrics, 301(2):405–414, 2020.
2. Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
3. Pritam Sarkar, Kyle Ross, Aaron J Ruberto, Dirk Rodenburg, Paul Hungler, and Ali Etemad. Classification of cognitive load and expertise for adaptive simulation using deep multitask learning. In [...], pages 1–7, 2019.
4. Kyle Ross, Pritam Sarkar, Dirk Rodenburg, Aaron Ruberto, Paul Hungler, Adam Szulewski, Daniel Howes, and Ali Etemad. Toward dynamically adaptive simulation: Multimodal classification of user expertise using wearable devices. Sensors, 19(19):4270, 2019.
5. Pritam Sarkar and Ali Etemad. Self-supervised learning for ECG-based emotion recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 3217–3221, 2020.
6. Pritam Sarkar and Ali Etemad. Self-supervised ECG representation learning for emotion recognition. IEEE Transactions on Affective Computing, pages 1–1, 2020.
7. Martin G Frasch, Silvia M Lobmaier, Tamara Stampalija, Paula Desplats, María Eugenia Pallarés, Verónica Pastor, Marcela A Brocco, Hau-tieng Wu, Jay Schulkin, Christophe L Herry, et al. Non-invasive biomarkers of fetal brain development reflecting prenatal stress: An integrative multi-scale multi-species perspective on data collection and analysis. Neuroscience & Biobehavioral Reviews, 2018.
8. Paula Desplats, Ashley M Gutierrez, Marta C Antonelli, and Martin G Frasch. Microglial memory of early life stress and inflammation: susceptibility to neurodegeneration in adulthood. Neuroscience & Biobehavioral Reviews, 2019.
9. Pritam Sarkar and Ali Etemad. CardioGAN: Attentive generative adversarial network with dual discriminators for synthesis of ECG from PPG, 2020.
10. Hyoyoung Jeong, John A Rogers, and Shuai Xu. Continuous on-body sensing for the COVID-19 pandemic: Gaps and opportunities. Science Advances, 6(36):eabd4794, 2020.
11. Christophe L Herry, Helena MF Soares, Lavinia Schuler-Faccini, and Martin G Frasch. Heart rate variability monitoring identifies asymptomatic toddlers exposed to Zika virus during pregnancy. arXiv preprint arXiv:1812.05259, 2018.
12. SJ Gordijn, IM Beune, B Thilaganathan, A Papageorghiou, AA Baschat, PN Baker, RM Silver, K Wynia, and W Ganzevoort. Consensus definition of fetal growth restriction: a Delphi procedure. Ultrasound in Obstetrics & Gynecology, 48(3):333–339, 2016.
13. Sheldon Cohen, Tom Kamarck, and Robin Mermelstein. A global measure of perceived stress. Journal of Health and Social Behavior, pages 385–396, 1983.
14. Ruilin Li, Martin G Frasch, and Hau-Tieng Wu. Efficient fetal-maternal ECG signal separation from two channel maternal abdominal ECG via diffusion-based channel selection. Frontiers in Physiology, 8:277, 2017.
15. Gail AA Cooper, Robert Kronstrand, and Pascal Kintz. Society of hair testing guidelines for drug testing in hair. Forensic Science International, 218(1-3):20–24, 2012.
16. Silvia Iglesias, Darío Jacobsen, Diego Gonzalez, Sergio Azzara, Esteban M Repetto, Juan Jamardo, Sabrina Garín Gómez, Viviana Mesch, Gabriela Berg, and Bibiana Fabre. Hair cortisol: A new tool for evaluating stress in programs of stress management. Life Sciences, 141:188–192, 2015.
17. Diego Gonzalez, Dario Jacobsen, Carolina Ibar, Carlos Pavan, José Monti, Nahuel Fernandez Machulsky, Ayelen Balbi, Analy Fritzler, Juan Jamardo, Esteban M Repetto, et al. Hair cortisol measurement by an automated method. Scientific Reports, 9(1):1–6, 2019.
18. Axel Bauer, Petra Barthel, Alexander Müller, Jan Kantelhardt, and Georg Schmidt. Bivariate phase-rectified signal averaging - a novel technique for cross-correlation analysis in noisy nonstationary signals. Journal of Electrocardiology, 42(6):602–606, 2009.
19. SM Lobmaier, Evelyn Annegret Huhn, S Pildner von Steinburg, Alexander Müller, Tibor Schuster, JU Ortiz, Georg Schmidt, and KT Schneider. Phase-rectified signal averaging as a new method for surveillance of growth restricted fetuses. The Journal of Maternal-Fetal & Neonatal Medicine, 25(12):2523–2528, 2012.
20. Silvia M Lobmaier, Nico Mensing van Charante, Enrico Ferrazzi, Dino A Giussani, Caroline J Shaw, Alexander Müller, Javier U Ortiz, Eva Ostermayer, Bernhard Haller, Federico Prefumo, et al. Phase-rectified signal averaging method to predict perinatal outcome in infants with very preterm fetal growth restriction - a secondary analysis of TRUFFLE-trial. American Journal of Obstetrics and Gynecology, 215(5):630–e1, 2016.
21. Juan Abdon Miranda Correa, Mojtaba Khomami Abadi, Niculae Sebe, and Ioannis Patras. AMIGOS: A dataset for affect, personality and mood research on individuals and groups. IEEE Transactions on Affective Computing, 2018.
22. Stamos Katsigiannis and Naeem Ramzan. DREAMER: A database for emotion recognition through EEG and ECG signals from wireless low-cost off-the-shelf devices. IEEE Journal of Biomedical and Health Informatics, 22(1):98–107, 2017.
23. Saskia Koldijk, Maya Sappelli, Suzan Verberne, Mark A Neerincx, and Wessel Kraaij. The SWELL knowledge work dataset for stress and user modeling research. In Proceedings of the 16th International Conference on Multimodal Interaction, pages 291–298, 2014.
24. Philip Schmidt, Attila Reiss, Robert Duerichen, Claus Marberger, and Kristof Van Laerhoven. Introducing WESAD, a multimodal dataset for wearable stress and affect detection. In Proceedings of the 20th ACM International Conference on Multimodal Interaction, pages 400–408, 2018.
25. Shimmer ECG. [Online]. Available: [...]. [Accessed: 2020-09-17].
26. Tmsi-mobi. [Online]. Available: [...]. [Accessed: 2020-09-17].
27. Respiban professional. [Online]. Available: [...]. [Accessed: 2020-09-17].

Supplementary Materials

Network architectures

We use a common convention to describe the CNN architectures. For example, cks2-f denotes a convolution layer with kernel size 1 × k, stride 2, and f filters; mp8-s2 denotes a max-pool layer with a filter size of 8 and a stride of 2; and fcN denotes a fully connected layer with N hidden nodes. We use leaky-ReLU activation functions in all convolution and fully connected layers except the last layers, where sigmoid activation functions are used for the classification and recognition networks and direct logits are extracted for the regression tasks. Finally, P × [fcN] indicates P parallel branches in the multi-task networks. Using this convention, the details of our models are given below.

Signal Transformation Recognition Network: c32s2-32, c32s2-32, mp8-s2, c16s2-64, c16s2-64, mp8-s2, c8s2-128, c8s2-128, global-max-pool, 7 × [fc128-dropout, fc128-dropout, fc1].

Affect Recognition Network (classification): c32s2-32, c32s2-32, mp8-s2, c16s2-64, c16s2-64, mp8-s2, c8s2-128, c8s2-128, global-max-pool, fc512, fc512, fc1.
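The notation above is regular enough to be read mechanically. The following is a minimal sketch of a parser for it, written only to illustrate the convention; the function names and the dictionary layout are hypothetical and are not taken from the authors' code. It is applied to the Signal Transformation Recognition Network string given above.

```python
import re

def parse_layer(token):
    """Turn one token of the notation (e.g. 'c32s2-32') into a layer description."""
    token = token.strip()
    m = re.fullmatch(r"c(\d+)s(\d+)-(\d+)", token)
    if m:  # convolution: kernel size 1 x k, stride s, f filters
        k, s, f = (int(g) for g in m.groups())
        return {"type": "conv", "kernel": k, "stride": s, "filters": f}
    m = re.fullmatch(r"mp(\d+)-s(\d+)", token)
    if m:  # max-pool: filter size, stride
        return {"type": "maxpool", "size": int(m.group(1)), "stride": int(m.group(2))}
    m = re.fullmatch(r"fc(\d+)(-dropout)?", token)
    if m:  # fully connected layer with N hidden nodes, optionally followed by dropout
        return {"type": "fc", "units": int(m.group(1)), "dropout": m.group(2) is not None}
    if token == "global-max-pool":
        return {"type": "global_max_pool"}
    raise ValueError(f"unrecognized token: {token!r}")

def parse_network(spec):
    """Split a specification into shared layers and P parallel branches."""
    branches, head, shared_text = 1, [], spec
    m = re.search(r"(\d+)\s*[×x]\s*\[(.*)\]\s*$", spec)
    if m:  # trailing 'P × [ ... ]' block describes the multi-task branches
        branches = int(m.group(1))
        head = [parse_layer(t) for t in m.group(2).split(",") if t.strip()]
        shared_text = spec[:m.start()]
    shared = [parse_layer(t) for t in shared_text.split(",") if t.strip()]
    return {"shared": shared, "branches": branches, "head": head}

SIGNAL_TRANSFORMATION = (
    "c32s2-32, c32s2-32, mp8-s2, c16s2-64, c16s2-64, mp8-s2, "
    "c8s2-128, c8s2-128, global-max-pool, "
    "7 × [fc128-dropout, fc128-dropout, fc1]"
)

net = parse_network(SIGNAL_TRANSFORMATION)
print(len(net["shared"]), "shared layers,", net["branches"], "branches")
```

A specification without a trailing bracketed block is treated as a single-branch network whose fully connected layers are listed among the shared layers.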

Affect Recognition Network (regression): c32s2-32, c32s2-32, mp8-s2, c16s2-64, c16s2-64, mp8-s2, c8s2-128, c8s2-128, global-max-pool, 4 × [fc512, fc512, fc512, fc512, fc1].

Prediction of stress biomarkers: DL regression task

In addition to Table 3, we further calculate the mean absolute error (MAE) and root mean square error (RMSE) of our self-supervised framework in predicting cortisol, FSI, PDQ, and PSS (Table 4).
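For reference, the two error metrics follow their standard definitions: MAE is the mean of the absolute differences between predicted and measured values, and RMSE is the square root of the mean squared difference. The sketch below implements both in plain Python; the numbers in the example are made up for illustration and are not values from this study.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: square root of the average squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical measured and predicted scores, for illustration only.
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(mae(measured, predicted), rmse(measured, predicted))
```

Because RMSE squares each error before averaging, it is never smaller than MAE and penalizes occasional large errors more heavily.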

Table 4. MAE and RMSE values for prediction of biomarkers using self-supervised learning, on the FELICITy and public datasets.

Task      Source  FELICITy dataset                Public datasets
                  MAE             RMSE            MAE          RMSE
Cortisol  aECG    37 . ± . 284    66 . ± . 209    16 . ± . ∗   . ± . ∗
Cortisol  mECG    16 . ± . 068    40 . ± . 662    6 . ± . ∗    . ± . ∗
FSI       aECG    0 . ± . 025     0 . ± . 022     0 . ± . ∗    . ± . ∗
FSI       mECG    0 . ± . 138     0 . ± . 164     0 . ± . ∗    . ± . ∗
PDQ       aECG    3 . ± . 316     5 . ± . 290     1 . ± . ∗    . ± . ∗
PDQ       mECG    1 . ± . 624     2 . ± . 984     0 . ± . ∗    . ± . ∗
PSS       aECG    3 . ± . 410     6 . ± . 335     1 . ± . ∗    . ± . ∗
PSS       mECG    1 . ± . 772     3 . ± . 016     0 . ± . ∗    . ± . ∗

∗ Public versus FELICITy dataset, Mann Whitney U test. mECG versus aECG within the same dataset, Mann Whitney U test. Statistical significance at p < 0.025 accounting for two comparisons (using Bonferroni-Holm correction).
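The 0.025 threshold is what the Bonferroni-Holm procedure gives for the smallest of two p-values at an overall level of 0.05 (0.05 / 2). As a sketch of the general step-down procedure, assuming an overall significance level of 0.05 and using made-up p-values rather than any from this study:

```python
def holm_reject(p_values, alpha=0.05):
    """Bonferroni-Holm step-down: return a reject/keep decision per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject

# Hypothetical p-values for two comparisons.
print(holm_reject([0.03, 0.01]))
print(holm_reject([0.03, 0.04]))
```

With two comparisons the smallest p-value must not exceed 0.025; only if it passes is the second compared against 0.05.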

Description of Public Datasets

The key metrics of each dataset (AMIGOS [21], DREAMER [22], SWELL [23], and WESAD [24]) are summarized in Table 1 and are outlined in more detail below. It should be noted that all the public datasets contain ECG data and corresponding emotional ground truth labels. However, the emotional labels were not used in this study given the use of our self-supervised approach with automatically generated labels.
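To illustrate what automatically generated labels can look like in a transformation-recognition pretext task, the sketch below applies a few signal transformations to an unlabeled ECG window and uses the index of the applied transformation as the label. The specific transformations (identity, negation, time reversal, amplitude scaling) and all names are assumptions chosen for illustration; this section does not enumerate the transformations actually used.

```python
def identity(window):
    return list(window)

def negate(window):
    return [-v for v in window]

def reverse_time(window):
    return list(reversed(window))

def scale(window, factor=0.5):
    return [factor * v for v in window]

# Hypothetical transformation set; the label is the position in this list.
TRANSFORMS = [identity, negate, reverse_time, scale]

def make_pretext_examples(windows):
    """Build (transformed_window, label) pairs without any human annotation."""
    examples = []
    for window in windows:
        for label, transform in enumerate(TRANSFORMS):
            examples.append((transform(window), label))
    return examples

# A toy three-sample "ECG window", for illustration only.
examples = make_pretext_examples([[1.0, -2.0, 3.0]])
for signal, label in examples:
    print(label, signal)
```

Since the labels come from the transformation that was applied, the emotional annotations shipped with the public datasets are not needed at this stage.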

AMIGOS [21]:

This dataset comprises ECG and emotional labels from 40 participants. Participants were asked to watch different video clips (16 in total) in order to elicit their emotional states. Shimmer ECG sensors [25] were used to record ECG at a sampling rate of 256 Hz. Finally, subjective arousal and valence scores were recorded on a scale of 1 to 9 at the end of each session.

DREAMER [22]:

The DREAMER dataset comprises data from 23 participants. The emotional responses were elicited by watching emotional video clips. The clips induced different emotions such as amusement, calmness, anger, excitement, and disgust, among others. Similar to AMIGOS, DREAMER was also collected using Shimmer ECG sensors [25] at a sampling rate of 256 Hz. At the end of each session, Self-Assessment Manikins (SAM) were used to record arousal and valence scores on a scale of 1 to 5.

SWELL [23]:

This dataset comprises 25 participants. ECG data and affect scores were collected as participants performed different day-to-day office jobs, for example preparing reports, making presentations, and others. TMSI MOBI [26] devices were used in this study to collect ECG signals at a sampling rate of 2048 Hz. Finally, self-reported affect scores were collected on a scale of 1 to 9 at the end of each session.

WESAD [24]: