Exploring Automatic COVID-19 Diagnosis via Voice and Symptoms from Crowdsourced Data
Jing Han, Chloë Brown∗, Jagmohan Chauhan∗, Andreas Grammenos∗, Apinan Hasthanasombat∗, Dimitris Spathis∗, Tong Xia∗, Pietro Cicuta, Cecilia Mascolo
University of Cambridge
∗ Ordered alphabetically, equal contribution. This work was supported by ERC Project 833296 (EAR).
ABSTRACT
The development of fast and accurate screening tools, which could facilitate testing and prevent more costly clinical tests, is key to the current pandemic of COVID-19. In this context, some initial work shows promise in detecting diagnostic signals of COVID-19 from audio sounds. In this paper, we propose a voice-based framework to automatically detect individuals who have tested positive for COVID-19. We evaluate the performance of the proposed framework on a subset of data crowdsourced from our app, containing 828 samples from 343 participants. By combining voice signals and reported symptoms, an AUC of 0.79 has been attained, with improvements in both sensitivity and specificity over voice alone. We hope that this study opens the door to rapid, low-cost, and convenient pre-screening tools to automatically detect the disease.

Index Terms: COVID-19, Crowdsourced data, Speech analysis, Symptoms analysis
1. INTRODUCTION
On 11 March 2020, the World Health Organisation announced the COVID-19 outbreak as a global pandemic. At the time of writing this paper, more than 37 million confirmed COVID-19 cases and one million deaths have been reported globally. Nowadays, in addition to developing drugs and vaccines for treatment and protection [1, 2], scientists and researchers are also investigating primary screening tools that ideally should be accurate, cost-effective, rapid, and at the same time easily accessible to the masses.

Amongst the efforts towards rapid screening [3, 4], audio-based diagnosis appears promising, mainly due to its non-invasive and ubiquitous character, which would allow for individual pre-screening 'anywhere', 'anytime', in real time, and available to 'anyone' [5]. Many applications have been developed in recent times for monitoring health and wellbeing via intelligent speech and sound analysis [6, 7, 8].

COVID-19 is an infectious disease, and most people infected with COVID-19 experience mild to moderate respiratory illness [9]. More specifically, on the one hand, COVID-19 symptoms vary widely, including cough, dyspnea, fever, headache, loss of taste or smell, and sore throat [10]. On the other hand, many of these symptoms are associated with, and hence can be recognised via, speech and sound analysis. Such symptoms include shortness of breath, dry or wet cough, dysphonia, and fatigue, to name but a few. As a consequence, several research works have recently been published aiming to provide sound-based automatic diagnostic solutions [4, 11, 12].
In this paper, we propose machine learning models for voice-based COVID-19 diagnosis. More specifically, we analyse a subset of data from 343 participants crowdsourced via our app, and show the discriminatory power of voice for the diagnosis. We demonstrate how voice can be used as a signal to distinguish symptomatic positive-tested individuals from non-COVID-19 (tested) individuals who have also developed symptoms akin to COVID-19. We further show a performance improvement obtained by combining sounds and symptoms for the diagnosis, yielding improved specificity and an AUC of 0.79.
2. RELATED WORK
With the advent of COVID-19, researchers have started to explore whether respiratory sounds could be diagnostic [5]. For instance, in [4], breathing and cough sounds have been targeted, and researchers demonstrate that COVID-19 individuals are distinguishable from healthy controls as well as asthmatic patients. In [13], an interpretable COVID-19 diagnosis framework has been devised to distinguish COVID-19 cough from other types of cough. Likewise, in [12], a detectable COVID-19 signature has been found in cough sounds, which can help increase testing capacity.

However, none of the aforementioned efforts have analysed the potential of voice. Recently, the feasibility of COVID-19 screening using voice has been introduced in [14]. Similarly, in [15], significant differences in several voice characteristics are observed between COVID-19 patients and healthy controls. Moreover, in [16], speech recordings from hospitalised COVID-19 patients are analysed to categorise the health state of the patients. Our work differs from these works, as we utilise an entirely crowdsourced dataset, for which we have to deal with complexities of the data such as recordings in different languages and varied environmental noises. Furthermore, we jointly analyse the voice samples and symptom metadata, and show that better performance can be obtained by combining them. Our study confirms that even in the most challenging scenario of in-the-wild crowdsourced data, voice is a promising signal for the pre-screening of COVID-19.
3. METHODOLOGY
This section presents a comprehensive description spanning the data acquisition, preprocessing, and tasks of interest. We note that the data collection and study have been approved by the Ethics Committee of the Department of Computer Science and Technology at the University of Cambridge.

Fig. 1: Screenshots of the COVID-19 Sounds App when (a) reporting symptoms, and (b) recording voice samples.
3.1. Data Collection

The crowdsourced data is collected via our "COVID-19 Sounds App". It has three versions: web-based, Android, and iOS, with the aim of reaching a high number of users while maintaining their anonymity. When using the app, users are asked to record and submit their breathing, coughing, and voice samples, report symptoms if any, and provide some basic demographic and medical information. Moreover, the app also asks users if they have tested positive (or negative) for the virus, and if they are in hospital. For more details of our data collection framework, the reader is referred to [4]. Fig. 1 illustrates some symptom- and voice-collection screens from the iOS app.

As of 14 October 2020, data from 13,722 unique users (4,690 from the web app, 6,334 from Android, and another 2,698 from iOS) had been collected. In this study, we explore data from two groups of participants, i.e., users who declared having tested positive for COVID-19, and those who tested negative. As a consequence, data from 343 participants were selected for our analysis. In particular, 140 participants tested positive, 199 tested negative, one transitioned from being initially positive to negative later, and another three transitioned the other way round: negative to positive.

Note that in our selected subset of data, similar to the positive participants, negative participants declared their symptoms to varying extents as well. Likewise, there are asymptomatic positive participants who selected "None" when asked about their symptoms. A comparison of the percentage occurrence of 11 symptoms ("None" excluded) between positive and negative participants is depicted in Fig. 2. It appears that loss of smell or taste is more frequently reported among positive participants than negative ones, while the differences in percentage occurrence between positive and negative participants are rather small across the other reported symptoms.

3.2. Data Preprocessing

Recently, the potential of respiratory sounds for COVID-19 diagnosis has been explored in our previous work as well as by other researchers. However, few research works have yet investigated the possibility of detecting COVID-19 infection from voice. In this study, we focus on voice-based analysis for disease diagnostics; the data preprocessing workflow is detailed as follows.

First, all voice recordings from the selected users were converted to mono signals with a sampling rate of 16 kHz. Moreover, recordings that do not contain any speech signal were discarded. Then, we considered applying segmentation: each recording consists of multiple repetitions of the given sentence by the same user, varying from one to three times. However, in our preliminary analysis, we noticed that the effect of segmentation was negligible, and that segmentation might eliminate possible breathing differences and temporal dynamics between repetitions. For this reason, we retained only unsegmented samples for further analysis, while trimming the leading and trailing silence from each recording as in [4]. Lastly, audio normalisation was applied to each recording, aiming to eliminate the volume inconsistency across participants caused by varied devices or different distances between the mouth and the microphone.

After preprocessing, we obtained a total of 828 voice samples (326 positive and 502 negative) from 343 participants. They mostly come from the UK, Portugal, Spain, and Italy.
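The preprocessing steps above (16 kHz mono conversion, silence trimming, and peak normalisation) can be reproduced with standard audio tooling. Below is a minimal sketch assuming librosa and soundfile are used and a 30 dB trimming threshold is applied; neither the toolkit nor the threshold is specified in the text.

```python
import librosa
import numpy as np
import soundfile as sf

def preprocess_recording(in_path, out_path, top_db=30.0):
    """Convert to 16 kHz mono, trim leading/trailing silence, peak-normalise."""
    # Load as mono at 16 kHz (librosa downmixes and resamples on load).
    y, sr = librosa.load(in_path, sr=16000, mono=True)

    # Trim leading and trailing silence below `top_db` dB relative to the peak.
    y, _ = librosa.effects.trim(y, top_db=top_db)

    # Peak normalisation to remove volume differences across devices
    # and mouth-to-microphone distances.
    peak = np.max(np.abs(y))
    if peak > 0:
        y = y / peak

    sf.write(out_path, y, sr)
```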
3.3. Tasks

In this study, a series of binary classification tasks are developed. In particular, based on the dataset collected, we train models for the following clinically meaningful tasks:

• Task 1: Distinguish individuals who have declared that they tested positive for COVID-19 from individuals who have declared that they tested negative for COVID-19. This is a general scenario, and we refer to this task as 'Pos. vs. Neg.'.

• Task 2: Distinguish individuals who tested positive for COVID-19 recently, in the last 14 days, from individuals who tested negative for COVID-19, specifically those with a negative test and no reported symptoms. We refer to this task as 'newPos. vs. Neg. w/o sym.'. This case is set up following our previous work in [4], so as to compare the capability of voice samples with breathing and cough samples for COVID-19 diagnosis.

• Task 3: Distinguish asymptomatic individuals who tested positive for COVID-19 from individuals who tested negative, specifically healthy controls that do not have any symptom. This task is devised to investigate whether asymptomatic carriers of the disease can be identified from their voice. This is of concern given the high rate of asymptomatic infection reported in the population [17]; therefore, identifying asymptomatic individuals may play a significant role in controlling the ongoing pandemic [17]. We refer to this task as 'Pos. w/o sym. vs. Neg. w/o sym.'.

• Task 4: Distinguish symptomatic individuals who have declared that they tested positive for COVID-19 and have developed at least one symptom, from individuals who have declared that they tested negative though suffering from one or more symptoms. This task is considered with the aim of understanding the feasibility of voice analysis to differentiate COVID-19 from other diseases such as the common flu. We refer to this task as 'Pos. w/ sym. vs. Neg. w/ sym.'.

Fig. 2: Comparison of the percentage of occurrence across 11 symptoms (wet cough, tightness in chest, sore throat, loss of smell/taste, shortness of breath, muscle ache, headache, fever, dry cough, dizziness, and chills) between COVID-19 positive and negative participants.

In addition to voice-based analysis, we explore the symptoms to provide complementary information. In particular, for symptomatic individuals, their voice and their symptoms are integrated as inputs to the models. More specifically, another three tasks are investigated (a sketch of the two fusion strategies follows this list):

• S only: Distinguish symptomatic positive individuals from symptomatic negative users by using their symptoms only.

• (V+S) FF: Distinguish symptomatic positive individuals from symptomatic negative users via feature-level fusion, by concatenating voice features and symptom-based features as inputs to a single model.

• (V+S) DF: Distinguish symptomatic positive individuals from symptomatic negative users via decision-level fusion, by combining the predictions from a voice-based model and a symptom-based model. In our case, the final decision is the prediction from the model with the highest probability estimate for a given instance.
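To make the two fusion strategies concrete, here is a minimal sketch. The dimensionalities follow Section 4 (384 acoustic features, 11 symptom features), but the function names are hypothetical and the classifiers are assumed to expose scikit-learn-style predict/predict_proba interfaces.

```python
import numpy as np

# Hypothetical trained classifiers, e.g. SVC(kernel="linear", C=0.01,
# probability=True) fitted beforehand: voice_model on 384-dim acoustic
# features, symptom_model on 11-dim one-hot symptom vectors.

def fuse_feature_level(model, x_voice, x_sym):
    """(V+S) FF: concatenate both feature vectors and feed a single model."""
    x = np.concatenate([x_voice, x_sym], axis=-1)  # 384 + 11 = 395 dims
    return model.predict(x.reshape(1, -1))[0]

def fuse_decision_level(voice_model, symptom_model, x_voice, x_sym):
    """(V+S) DF: keep the prediction of the more confident model."""
    p_voice = voice_model.predict_proba(x_voice.reshape(1, -1))[0]
    p_sym = symptom_model.predict_proba(x_sym.reshape(1, -1))[0]
    # The model whose top class probability is higher wins, as described above.
    if p_voice.max() >= p_sym.max():
        return int(np.argmax(p_voice))
    return int(np.argmax(p_sym))
```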
4. EXPERIMENTS
In this section, a comprehensive evaluation is performed to investigate the performance of the tasks described in Section 3.3. We describe the features, the experimental setup, and the result analysis, respectively.
4.1. Features

In this study, we applied an established acoustic feature set, namely the INTERSPEECH 2009 Computational Paralinguistics Challenge (ComParE) set [18], extracted with the open-source openSMILE toolkit [19]. For each audio file, 12 functionals were applied to 16 frame-level descriptors and their corresponding delta coefficients, resulting in a total of 384 features. In particular, the 16 frame-level descriptors chosen are Zero-Crossing Rate (ZCR), Root Mean Square (RMS) frame energy, pitch frequency (F0), Harmonics-to-Noise Ratio (HNR), and Mel-Frequency Cepstral Coefficients (MFCCs) 1-12, covering prosodic, spectral, and voice quality features [18]. For more details about these features, please refer to [18].

Moreover, we combined the voice-based analysis with symptoms for COVID-19 diagnosis. Specifically, the 11 symptoms shown in Fig. 2 were chosen as the most common symptoms of COVID-19. To convert these symptoms into feature vectors, one-hot encoding was utilised, resulting in an 11-dimensional symptom-based feature vector for each sample. Each dimension of the vector indicates the presence (1) or absence (0) of a particular symptom.
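As an illustration, the sketch below builds the 11-dimensional symptom vector and invokes the openSMILE command-line tool. The symptom ordering, function names, and the IS09 configuration path (config/is09-13/IS09_emotion.conf, as shipped with openSMILE distributions) are assumptions; the paper does not state how the toolkit was invoked.

```python
import subprocess
import numpy as np

# The 11 symptoms compared in Fig. 2 (the ordering here is an arbitrary choice).
SYMPTOMS = [
    "wet cough", "tightness in chest", "sore throat", "loss of smell/taste",
    "short of breath", "muscle ache", "headache", "fever",
    "dry cough", "dizziness", "chills",
]

def encode_symptoms(reported):
    """One-hot (strictly, multi-hot) 11-dim vector: 1 if a symptom was reported."""
    return np.array([1.0 if s in reported else 0.0 for s in SYMPTOMS])

def extract_is09_features(wav_path, out_csv):
    """Call the openSMILE CLI with the 384-dim IS09 configuration."""
    subprocess.run(
        ["SMILExtract", "-C", "config/is09-13/IS09_emotion.conf",
         "-I", wav_path, "-O", out_csv],
        check=True,
    )
```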
4.2. Experimental Setup

Following feature extraction, we used Support Vector Machines (SVMs) with a linear kernel as the classifiers for all tasks, due to their widespread usage and the robust performance achieved in intelligent audio analysis [18, 20]. The complexity parameter C was set to 0.01 based on our preliminary research. Code was implemented using the scikit-learn library in Python.

Moreover, for each task, 5-fold cross-validation was performed while the subject-independent constraint was kept, ensuring that data points from the same participant do not appear in both the training and test splits. Further, to deal with the imbalanced data during training, data augmentation via the Synthetic Minority Oversampling Technique (SMOTE) [21] was carried out to create synthetic observations of the minority class.

To validate the recognition performance of the voice-based models for disease diagnosis under various scenarios, we selected the following standard evaluation metrics: sensitivity (also known as recall or true positive rate (TPR), calculated as TP/(TP + FN)), specificity (also referred to as true negative rate (TNR), calculated as TN/(TN + FP)), the area under the ROC curve (ROC-AUC), which measures performance by considering both sensitivity and specificity at various probability thresholds, and the area under the precision-recall curve (PR-AUC). Moreover, for each model, the mean and standard deviation across all five folds were computed separately. A minimal sketch of this pipeline follows.
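The sketch below assembles the pieces stated in the text: a linear-kernel SVM with C = 0.01, subject-independent 5-fold cross-validation, SMOTE on the training folds, and the four metrics. The use of scikit-learn's GroupKFold for the subject-independent split, imbalanced-learn's SMOTE implementation, and average precision as the PR-AUC estimate are assumptions about implementation details not stated in the paper.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.metrics import average_precision_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import GroupKFold
from sklearn.svm import SVC

def cross_validate(X, y, groups, n_splits=5, seed=42):
    """Subject-independent 5-fold CV; X, y, groups are numpy arrays."""
    results = []
    # GroupKFold keeps all samples of one participant in a single split.
    for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(X, y, groups):
        # Oversample the minority class on the training portion only.
        X_tr, y_tr = SMOTE(random_state=seed).fit_resample(X[train_idx], y[train_idx])
        clf = SVC(kernel="linear", C=0.01, probability=True).fit(X_tr, y_tr)

        y_pred = clf.predict(X[test_idx])
        y_score = clf.predict_proba(X[test_idx])[:, 1]
        tn, fp, fn, tp = confusion_matrix(y[test_idx], y_pred).ravel()
        results.append({
            "SE": tp / (tp + fn),  # sensitivity (recall / TPR)
            "SP": tn / (tn + fp),  # specificity (TNR)
            "ROC-AUC": roc_auc_score(y[test_idx], y_score),
            "PR-AUC": average_precision_score(y[test_idx], y_score),
        })
    # Report mean and standard deviation across the five folds.
    return {m: (np.mean([r[m] for r in results]), np.std([r[m] for r in results]))
            for m in results[0]}
```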
4.3. Results

Experimental results are presented in Table 1. For Task 1, when distinguishing positive-tested individuals from negative ones without taking their symptoms into account, the model achieves a sensitivity of 0.62, as reported in Table 1. Further, when distinguishing recently tested positive individuals from healthy controls without any symptoms (Task 2), the ROC-AUC and PR-AUC both increase, and the sensitivity and specificity improve as well. This indicates that voice signals carry a detectable COVID-19 signature. Besides, in [4], the analysis based on cough and breathing sounds achieved comparable sensitivity and ROC-AUC, albeit on a different subset of users. It can be seen that the performance obtained from human voice is quite comparable to that from cough and breathing for COVID-19 diagnosis; hence it would be interesting to analyse all three sounds jointly to obtain a comprehensive overview.

Next, when distinguishing asymptomatic patients from healthy controls (Task 3), we observe a noticeable decrease in sensitivity, indicating that a high rate of asymptomatic patients are misclassified as healthy participants; the ROC-AUC also drops. This is in alignment with findings in a recent study [12], where the ROC-AUC achieved when identifying COVID-19 coughs from asymptomatic individuals was likewise lower. It implies that with the current features and model, it is difficult to identify asymptomatic patients from their voice alone.

However, when distinguishing symptomatic COVID-19 patients from non-COVID-19 controls who have also developed similar symptoms (Task 4), our model achieves better performance than in Task 3. This demonstrates the potential of exploiting voice to serve as a primary screening tool; such a tool could rapidly pre-screen individuals before more costly clinical tests.

Table 1: Performance in terms of sensitivity (SE), specificity (SP), area under the receiver operating characteristic curve (ROC-AUC), and area under the precision-recall curve (PR-AUC) for the voice-based diagnosis. For each measurement, the mean and standard deviation across 5-fold cross-validation are reported.
Task                               | # Pos. | # Neg. | SE      | SP | ROC-AUC | PR-AUC
1. Pos. vs. Neg.                   | 326    | 502    | 0.62 ±  |    |         |
2. newPos. vs. Neg. w/o sym.       |        |        |         |    |         |
3. Pos. w/o sym. vs. Neg. w/o sym. |        |        |         |    |         |
4. Pos. w/ sym. vs. Neg. w/ sym.   |        |        |         |    |         |
Fig. 3: ROC curves of COVID-19 diagnosis from the combination of voice and reported symptoms, via (a) feature-level fusion and (b) decision-level fusion. In each figure, the curve of each fold under the 5-fold cross-validation is shown separately, together with the chance line, the mean ROC curve, and its ±1 std. dev. band. (a) Feature-level fusion: per-fold AUCs of 0.68, 0.82, 0.88, 0.73, and 0.75; mean ROC AUC = 0.77 ± 0.07. (b) Decision-level fusion: per-fold AUCs of 0.67, 0.82, 0.89, 0.78, and 0.77; mean ROC AUC = 0.79 ± 0.07.

Table 2: Results of COVID-19 diagnosis based on voice (V) and symptoms (S), where the selected positive and negative participants report at least one symptom. Performance in terms of sensitivity (SE), specificity (SP), area under the receiver operating characteristic curve (ROC-AUC), and area under the precision-recall curve (PR-AUC) is reported. For each measurement, the mean and standard deviation across 5-fold cross-validation are reported. Best performances are highlighted.
Method   | SE | SP | ROC-AUC     | PR-AUC
V only   |    |    |             |
S only   |    |    |             |
(V+S) FF |    |    | 0.77 ± 0.07 |
(V+S) DF |    |    | 0.79 ± 0.07 |

Compared with the single-modality models, fusing voice and symptoms yields improvements in ROC-AUC and PR-AUC, as well as in sensitivity and specificity. This shows the promise of combining voice and symptoms in our analysis. However, note that performance varies across folds, leading to a wide standard deviation for our models. This can also be seen from the ROC curves displayed in Fig. 3. We believe that this can be alleviated with more training data, as shown in our previous work [4].
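For reference, the mean ROC curve and its standard deviation band in Fig. 3 can be computed with the standard per-fold interpolation recipe sketched below; the function name and the 100-point FPR grid are illustrative choices, not details from the paper.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

def mean_roc_across_folds(fold_predictions):
    """fold_predictions: list of (y_true, y_score) pairs, one per CV fold.

    Returns a common FPR grid, the mean TPR curve, its std, and per-fold
    AUCs, i.e. the quantities plotted in Fig. 3.
    """
    mean_fpr = np.linspace(0.0, 1.0, 100)
    tprs, aucs = [], []
    for y_true, y_score in fold_predictions:
        fpr, tpr, _ = roc_curve(y_true, y_score)
        interp_tpr = np.interp(mean_fpr, fpr, tpr)  # resample onto common grid
        interp_tpr[0] = 0.0
        tprs.append(interp_tpr)
        aucs.append(auc(fpr, tpr))
    mean_tpr = np.mean(tprs, axis=0)
    mean_tpr[-1] = 1.0
    return mean_fpr, mean_tpr, np.std(tprs, axis=0), aucs
```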
5. CONCLUSIONS AND FUTURE WORK
In this paper, voice-based models are proposed to discriminate COVID-19 positive cases from healthy controls. The effectiveness of our models is evaluated on a crowdsourced dataset, and the results highlight the great potential of developing an early-stage screening tool based on voice signals for disease diagnosis. In addition to voice analysis, this work further explores fusion strategies to combine voice and reported symptoms, which yield encouraging results.

For future work, we plan to incorporate other sounds, such as breathing and coughing, alongside voice. In addition, we will investigate the impact of the disease on voice by analysing the correlation of voice characteristics before and after the infection. Furthermore, our data collection is ongoing, and we will improve the robustness of our models by training on a larger pool of users.

6. REFERENCES
[1] Nicole Lurie, Melanie Saville, Richard Hatchett, and Jane Halton, "Developing COVID-19 vaccines at pandemic speed," New England Journal of Medicine, vol. 382, no. 21, pp. 1969-1973, May 2020.
[2] James M Sanders, Marguerite L Monogue, Tomasz Z Jodlowski, and James B Cutrell, "Pharmacologic treatments for coronavirus disease 2019 (COVID-19): A review," JAMA, vol. 323, no. 18, pp. 1824-1836, Apr. 2020.
[3] Xueyan Mei, Hao-Chih Lee, Kai-yue Diao, et al., "Artificial intelligence-enabled rapid diagnosis of patients with COVID-19," Nature Medicine, vol. 26, no. 8, pp. 1224-1228, Aug. 2020.
[4] Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Jing Han, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, and Cecilia Mascolo, "Exploring automatic diagnosis of COVID-19 from crowdsourced respiratory sound data," in Proc. ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), San Diego, CA, 2020, pp. 3474-3484.
[5] Björn Schuller, Dagmar Schuller, Kun Qian, Juan Liu, Huaiyuan Zheng, and Xiao Li, "COVID-19 and computer audition: An overview on what speech & sound analysis could contribute in the SARS-CoV-2 corona crisis," arXiv preprint arXiv:2003.11117, 2020.
[6] Yanzhi Ren, Chen Wang, Jie Yang, and Yingying Chen, "Fine-grained sleep monitoring: Hearing your breathing with smartphones," in Proc. IEEE Conference on Computer Communications (INFOCOM), Hong Kong, China, 2015, pp. 1194-1202.
[7] Zixing Zhang, Jing Han, Kun Qian, Christoph Janott, Yanan Guo, and Björn Schuller, "Snore-GANs: Improving automatic snore sound classification with synthesized data," IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 1, pp. 300-310, Jan. 2020.
[8] Zhaocheng Huang, Julien Epps, and Dale Joachim, "Speech landmark bigrams for depression detection from naturalistic smartphone speech," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019, pp. 5856-5860.
[9] Wei-jie Guan, Zheng-yi Ni, Yu Hu, et al., "Clinical characteristics of coronavirus disease 2019 in China," New England Journal of Medicine, vol. 382, no. 18, pp. 1708-1720, Apr. 2020.
[10] Carole H Sudre, Karla Lee, Mary Ni Lochlainn, et al., "Symptom clusters in COVID-19: A potential clinical prediction tool from the COVID Symptom Study app," medRxiv, 2020.
[11] Ali Imran, Iryna Posokhova, Haneya N Qureshi, Usama Masood, Sajid Riaz, Kamran Ali, Charles N John, and Muhammad Nabeel, "AI4COVID-19: AI enabled preliminary diagnosis for COVID-19 from cough samples via an app," arXiv preprint arXiv:2004.01275, 2020.
[12] Piyush Bagad, Aman Dalmia, Jigar Doshi, Arsha Nagrani, Parag Bhamare, Amrita Mahale, Saurabh Rane, Neeraj Agarwal, and Rahul Panicker, "Cough against COVID: Evidence of COVID-19 signature in cough sounds," arXiv preprint arXiv:2009.08790, 2020.
[13] Ankit Pal and Malaikannan Sankarasubbu, "Pay attention to the cough: Early diagnosis of COVID-19 using interpretable symptoms embeddings with cough sound signal processing," arXiv preprint arXiv:2010.02417, 2020.
[14] Gadi Pinkas, Yarden Karny, Aviad Malachi, Galia Barkai, Gideon Bachar, and Vered Aharonson, "SARS-CoV-2 detection from voice," IEEE Open Journal of Engineering in Medicine and Biology, Sep. 2020, 8 pages.
[15] Maral Asiaee, Amir Vahedian-azimi, Seyed Shahab Atashi, Abdalsamad Keramatfar, and Mandana Nourbakhsh, "Voice quality evaluation in patients with COVID-19: An acoustic analysis," Journal of Voice, Oct. 2020, 7 pages.
[16] Jing Han, Kun Qian, Meishu Song, Zijiang Yang, Zhao Ren, Shuo Liu, Juan Liu, Huaiyuan Zheng, Wei Ji, Tomoya Koike, Xiao Li, Zixing Zhang, Yoshiharu Yamamoto, and Björn W. Schuller, "An early study on intelligent analysis of speech under COVID-19: Severity, sleep quality, fatigue, and anxiety," in Proc. INTERSPEECH, Shanghai, China, 2020, 5 pages.
[17] Daniel P Oran and Eric J Topol, "Prevalence of asymptomatic SARS-CoV-2 infection: A narrative review," Annals of Internal Medicine, vol. 173, no. 5, pp. 362-367, June 2020.
[18] Björn Schuller, Stefan Steidl, and Anton Batliner, "The INTERSPEECH 2009 emotion challenge," in Proc. INTERSPEECH, Brighton, UK, 2009, pp. 312-315.
[19] Florian Eyben, Martin Wöllmer, and Björn Schuller, "openSMILE - the Munich versatile and fast open-source audio feature extractor," in Proc. ACM International Conference on Multimedia (MM), Florence, Italy, 2010, pp. 1459-1462.
[20] Jing Han, Zixing Zhang, Fabien Ringeval, and Björn Schuller, "Prediction-based learning for continuous emotion recognition in speech," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, 2017, pp. 5005-5009.
[21] Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.