CareCall: a Call-Based Active Monitoring Dialog Agent for Managing COVID-19 Pandemic
Sang-Woo Lee, Hyunhoon Jung, SukHyun Ko, Sunyoung Kim, Hyewon Kim, Kyoungtae Doh, Hyunjung Park, Joseph Yeo, Sang-Houn Ok, Joonhaeng Lee, Sungsoon Lim, Minyoung Jeong, Seongjae Choi, SeungTae Hwang, Eun-Young Park, Gwang-Ja Ma, Seok-Joo Han, Kwang-Seung Cha, Nako Sung, Jung-Woo Ha
Abstract
Tracking suspected cases of COVID-19 is crucial to suppressing the spread of the COVID-19 pandemic. Active monitoring and proactive inspection are indispensable for mitigating the spread of COVID-19, though they require considerable social and economic expense. To address this issue, we introduce CareCall, a call-based dialog agent deployed for active monitoring in Korea and Japan. We describe our system through a case study with statistics to show how the system works. Finally, we discuss a simple idea that uses CareCall to support proactive inspection.
1. Introduction
The COVID-19 pandemic has become so serious that there were more than 7.5M patients and 420k deaths by early June 2020. In this situation, tracking the spread of patients is crucial to mitigating the pandemic. In particular, individual quarantine and large-scale active monitoring are known to be significantly effective in mitigating the pandemic (Organization et al., 2020; Weissleder et al., 2020). Maintaining social quarantine and gathering symptoms in real time, however, incurs enormous social and economic expense because of the considerable cost of testing, which is performed by a restricted number of medical staff with a limited supply of testing kits. Furthermore, continuous monitoring by humans might harm the mental health of medical staff (Greenberg et al., 2020).
Mobile applications (apps) are a prevalent solution for active monitoring and individual quarantine of large numbers of potential patients in most countries (Peak et al., 2020). Although mobile app-based monitoring is effective, it has some limitations. App-based monitoring assumes that the monitoring subjects are familiar with using apps on mobile devices. However, the fatality rate is much higher in older people, who are likely to be less familiar with mobile devices (Shim et al., 2020). It is also notable that only 37.8% of people over 70 in Korea had smartphones according to a 2018 survey. Thus, the assumption of familiarity with apps might hinder the efficacy of app-based solutions.
To address this issue, we design CareCall, a call-based artificial intelligence (AI) dialog agent system for monitoring people who have been in contact with infected patients. CareCall consists of three main modules: phone-based automatic speech recognition (ASR) (Chan et al., 2016; Ha et al., 2020), natural language understanding (NLU), and speech synthesis (Song et al., 2019). It calls active monitoring subjects twice per day to check and gather their core symptoms, namely whether they have a fever or respiratory pain, using simple yes-or-no questions. Because the user experience of CareCall is similar to a phone conversation with a human, our system can track the status of potential patients more effectively. To increase the success ratio, we employ a human-in-the-loop approach in operating our system, monitoring the conversations and adjusting uncertain utterance cases.
We have operated CareCall for three months to track active monitoring subjects who had contact with COVID-19 patients in Korea and Japan. In this paper, we introduce a case study in Seongnam-si, Korea, to show how the system works. In the case study, CareCall shows a false positive rate of 0.9% and a single false negative case. In addition, our human-in-the-loop process reduces the false positive rate from 1.95% to 0.72% when comparing calls from the first month with those from the next two months. Furthermore, we discuss some ideas on how to measure the spread of the pandemic in the local community based on our system, and describe the remaining or newly emerging technical challenges we face in improving the functionality of call-based AI dialog systems.
* Equal contribution (Sang-Woo Lee, Hyunhoon Jung). NAVER Corp., Seongnam City. Correspondence to: Nako Sung <[email protected]>, Jung-Woo Ha <[email protected]>.
Figure 1. Schematic flow of CareCall system for active monitoring of self-quarantining people.
2. Related Work
The COVID-19 pandemic has been a critical issue that significantly affects political, social, and economic situations around the world, beyond the health area. Because COVID-19 is caused by a coronavirus, it is not trivial to distinguish its symptoms from those of other respiratory diseases such as a cold. It is frequently reported that COVID-19 patients suffer from loss of taste and smell, which alleviates the difficulty of finding COVID-19 infected patients (Gane et al., 2020; Sungnak et al., 2020). However, while 65% of positive COVID-19 cases show loss of taste or smell, 22% of negative cases also suffer from the same symptom (Sungnak et al., 2020).
Since an infectious disease pandemic leads to enormous losses in various areas, medical scientists have tried to predict epidemics and pandemics for several decades. Recent advances in IT have also made it possible to predict diseases without medical equipment, for example by forecasting flu epidemics from Google search data (Helft, 2008; Dugas et al., 2012). However, a keyword search by itself does not guarantee that the searcher is infected with the disease.
Recently, mobile apps have become prevalent for tracking people who have been in contact with COVID-19 patients and who are subject to social quarantine or active monitoring (Menni et al., 2020). Menni et al. built an app for logging individuals’ health status, which 2.6 million people in the UK and the US used to regularly log their health status. Among these, 18 thousand people reported having had a test for coronavirus, with 7 thousand testing positive. This app-based tracking approach is more effective than conventional search-based methods because a report via the app is an explicit action regarding the disease. However, app-based systems might be ineffective for older people who are not familiar with using mobile devices. In contrast, CareCall uses phone-based communication incorporating speech-based AI dialog models, thus providing more accessibility to older people.
The user experience of call-based tracking systems differs from that of app-based solutions. Even if the data gathered via app-based systems are likely to be less noisy, a call-based system can cover more target subjects, including those who do not use the smart devices required to run an app. This matters most for older people. Also, call-based communication is much easier than using apps. However, a call-based system has higher operating costs because human communicators must be hired. In a pandemic, where the number of monitored people increases dramatically, the cost of tracking all subjects rises sharply. Recent advances in deep learning have remarkably improved the performance of automatic speech recognition (Nassif et al., 2019), natural language processing (Devlin et al., 2019; Adiwardana et al., 2020), and speech synthesis (Anumanchipalli et al., 2019), resulting in commercialized call-based AI systems (Leviathan & Matias, 2018). Such a call-based AI system can handle many subjects in a practical amount of time as an alternative to an app-based solution.
3. CareCall
Our system is a relatively simple task-oriented dialog (TOD) system. Our natural language understanding (NLU) model directly classifies binary slots, and the result corresponds to the system act. There are two explicit slots in our dialog: whether the callee has a fever, and whether the callee has any respiratory disease. The agent also asks about the specific type of symptoms when the callee reports any respiratory disease. In this case, however, the dialog system does not explicitly extract the slot and just sends a dialog log to public servants of the public health center. We use a method similar to the M2M approach to gather data for training our NLU model (Shah et al., 2018).
Why is such a simple dialog system sufficient? There are four reasons. First, the core information we need to extract is simple. The primary goal of our agent is to check the cases where the callee has no symptoms related to COVID-19. Second, in our service setting, if the callee reports any symptoms, a human inspector reads the automatic speech recognition (ASR) dialog log or calls the callee again to ask about the symptoms and to advise when and where to go for a medical check. Third, the system had to be built as soon as possible; we were asked to build the whole system within three days. Lastly, the number of false negative cases must be minimized: it is critical if the callee reports symptoms but the dialog system fails to catch them.

Table 1. Turn-level false negative and false positive rates.

                  March               April-June
                  Count    Ratio      Count    Ratio
False negative        0    0.00%          1    0.00%
False positive       88    1.95%        169    0.72%
Total turns       4,508  100.00%     23,300  100.00%
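
As a quick arithmetic check, each ratio in Table 1 is the count divided by the total number of turns in the same period. The short Python sketch below recomputes them from the counts in the table; the dictionary layout and the function name `rate_percent` are illustrative choices, not part of CareCall. Pooling both periods gives 257 false positives over 27,808 turns, about 0.92%. The April-June false positive ratio comes out at roughly 0.725%, which Table 1 lists as 0.72%.

```python
from typing import Dict

# Counts copied from Table 1 (turn-level).
COUNTS: Dict[str, Dict[str, int]] = {
    "March": {"false_negative": 0, "false_positive": 88, "total_turns": 4508},
    "April-June": {"false_negative": 1, "false_positive": 169, "total_turns": 23300},
}


def rate_percent(count: int, total: int) -> float:
    """Return count / total expressed as a percentage."""
    return 100.0 * count / total


def pooled_rate_percent(key: str) -> float:
    """Rate for `key` over all periods combined."""
    count = sum(period[key] for period in COUNTS.values())
    total = sum(period["total_turns"] for period in COUNTS.values())
    return rate_percent(count, total)


if __name__ == "__main__":
    for name, period in COUNTS.items():
        fp = rate_percent(period["false_positive"], period["total_turns"])
        fn = rate_percent(period["false_negative"], period["total_turns"])
        print(f"{name}: FP {fp:.3f}%  FN {fn:.3f}%")
    print(f"Pooled FP: {pooled_rate_percent('false_positive'):.3f}%")
```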
Figure 1 describes active monitoring using CareCall. Unlike in other TOD systems, it is extremely important to minimize false negative cases. In addition to the recognized reports of symptoms, we use the top-1 softmax value and the number of dialog turns as features for an additional rule-based uncertainty detector with a threshold. In other words, CareCall sends a log to public servants not only when the system recognizes that the callee reports symptoms, but also when the system judges its own inference to be uncertain. The log is kept for one month before being discarded.
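
The escalation rule described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the class `CallResult`, its field names, and both threshold values are hypothetical, since the paper does not give the actual features' encoding or the threshold used in production.

```python
from dataclasses import dataclass


@dataclass
class CallResult:
    """Hypothetical summary of one monitoring call (names are illustrative)."""
    fever: bool               # NLU slot: callee reported a fever
    respiratory: bool         # NLU slot: callee reported respiratory symptoms
    min_top1_softmax: float   # lowest top-1 softmax value over the NLU turns
    num_turns: int            # number of dialog turns in the call


# Assumed thresholds; the actual values are not stated in the paper.
CONFIDENCE_THRESHOLD = 0.9
MAX_EXPECTED_TURNS = 6


def is_uncertain(result: CallResult) -> bool:
    """Rule-based uncertainty check on the two features named in the text."""
    low_confidence = result.min_top1_softmax < CONFIDENCE_THRESHOLD
    too_many_turns = result.num_turns > MAX_EXPECTED_TURNS
    return low_confidence or too_many_turns


def should_send_log(result: CallResult) -> bool:
    """Send a log when symptoms are recognized OR the inference is uncertain."""
    symptom_reported = result.fever or result.respiratory
    return symptom_reported or is_uncertain(result)
```

Combining the two conditions with a logical OR trades extra false positives for fewer false negatives, which matches the priority stated in the text.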
It is challenging to build an extremely accurate dialog system using only the small dataset prepared in the system-building phase. For example, older people have speech patterns and pronunciation that differ from those of other groups, which makes speech recognition and language understanding challenging.
We can quickly boost the performance of our system by feeding the failed ASR and NLU results back through the human-in-the-loop process. The concept of active learning can be used here. As in active monitoring, utterance-level uncertainty, such as the top-1 softmax probability (Hendrycks & Gimpel, 2016) or Bayesian-style uncertainty (Siddhant & Lipton, 2018), and dialog-level uncertainty can be used to find data to be labeled. By labeling the small amount of data that active learning suggests, we can easily decrease the error of the ASR and NLU modules.
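
One simple form of the active learning idea above is least-confidence sampling: rank utterances by their top-1 softmax probability and send the least confident ones to human labelers. The Python sketch below illustrates only this utterance-level variant; the function names, the labeling budget, and the assumption that the NLU model exposes raw class logits are all hypothetical and not taken from the CareCall implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one utterance's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_for_labeling(
    utterances: Sequence[str],
    logits_per_utterance: Sequence[Sequence[float]],
    budget: int,
) -> List[str]:
    """Return the `budget` utterances with the lowest top-1 softmax probability."""
    scored = [
        (max(softmax(logits)), text)
        for text, logits in zip(utterances, logits_per_utterance)
    ]
    scored.sort(key=lambda pair: pair[0])  # least confident first
    return [text for _, text in scored[:budget]]
```

Free-form answers such as those of user U2 in Table 2 would typically receive lower top-1 probabilities than a plain "yes" or "no", so they would be the first candidates for labeling under this scheme.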
Table 2. Conversation examples between CareCall and subjects. U1 refers to cooperative people. U2 refers to people who are hard to interact with in the monitoring conversation.

Speaker    Utterance
System     Hello. I’m calling to check your symptoms regarding COVID-19 today. Have you got a minute to talk?
User U1    Hello?
User U2    Yeah, oh, you have already called me earlier today, huh? Yes, go ahead.
System     Hello again. Is this a good time to talk?
User U1    Yes.
User U2    Yes.
System     Do you have a fever now?
User U1    No.
User U2    Yeah. Nothing like that. I’ll let you know if there’s anything like that. Oh. Too stressful.
System     Okay. Do you have a cough or symptoms like shortness of breath now? Please answer yes or no.
User U1    No. I don’t
User U2    I am totally fine. Please do not worry.
System     Okay. Got it. When you want to go outside, be sure to wear your mask. If you think you have any suspect symptoms, please contact the public health center. Thank you.
4. CareCall Operation Analysis
We have released CareCall to monitor citizens whose COVID-19-related symptoms need to be checked. CareCall has been operating in Seongnam-si, Korea, since March 2020. Our system helps reduce the burden of monitoring work on nearly one-thirty of the total monitoring needs. Figure 2 shows the actual confirmed cases of COVID-19 in Seongnam-si. Since March 9th, 142 confirmed cases have been reported, and one of these patients has died. We analyzed data from CareCall to improve the performance of our dialog system.
All data from CareCall, including quantitative data and call logs, were analyzed to understand the interaction between the subjects and our system. The rate at which subjects hang up before completing the conversation is 14.6%, and the connection failure rate is 7.3%. These two rates are relatively low because monitored subjects are responsible for receiving the monitoring call.
We also investigated turn-level errors across all monitoring cases. Our target data were logs from a total of 13,904 calls. We analyzed the turn-level false negative (FN) and false positive (FP) cases (see Table 1). A false negative means that a monitored subject reports COVID-19 symptoms but CareCall does not confirm them. Such a case could be critical, but only one occurred in three months. This case was escalated to a human COVID-19 monitor in Seongnam-si. Conversely, a false positive means that CareCall detects symptoms although the monitored subject has none; the false positive rate (FPR) is 0.92%, which is also very low. Based on the data analysis, we could improve the performance of CareCall (see Table 1). The remarkable improvement in performance from April onward results from NLU and speech synthesis model updates and data refinement by our human-in-the-loop process.
CareCall asks monitored subjects polar questions, which they only need to answer with ‘yes’ or ‘no’. Most of the monitored subjects could easily interact with the voice agent of CareCall. However, since older people tended to respond more freely, it was difficult for the dialog system to classify their utterances. This is a challenging technical issue that we need to tackle. First, a voice-based dialog system must be able to understand unexpected types of user utterances, so the NLU module is crucial in this voice-based interface. Furthermore, handling the utterances of older people can be challenging because they readily expressed their emotions toward the system, even though the voice agent of CareCall was not a real human (Leviathan & Matias, 2018). Nevertheless, older people are an important group in COVID-19 monitoring. According to statistics in Korea until early June, people over 60 account for more than 23% of confirmed cases and 92% of deaths. Therefore, it is crucial to monitor older people, and they should not be excluded from the investigation as exceptional cases.
Figure 2. Confirmed cases in Seongnam-si until Jun 15th. March 9th was the critical period for monitoring subjects in Seongnam-si. CareCall was released at the right time to reduce the burden of monitoring cases.
5. Discussion
We discuss an idea for measuring the spread of the pandemic with our system, using simple probabilistic modeling. It is important to prevent community spread by tracking, and some countries are successfully preventing a serious situation. When local community spread is suspected, tracking and estimating the spread status is advantageous for deciding whether to extend a self-quarantine policy or take other helpful actions. Specifically, a statistically significant infection rate and an estimated number of infected patients would be helpful. To evaluate individual infection, a previous work uses linear regression with the person’s symptoms as features (Menni et al., 2020). However, the local community’s infection rate is also a critical factor in evaluating individual infection. To this end, our idea is to ask about symptoms not only among self-quarantined people but also among other randomly chosen people in the community, in order to estimate how severe the spread in the community is.
To aggregate the symptoms reported by callees, a Bayesian approach can be used to model the degree of spread in a local community. The Bayesian approach uses statistics of previous COVID-19 spread as the prior and estimates the posterior probability of the community infection rate from the symptoms of the people investigated. Figure 3 presents a simple example of Bayesian modeling. $T$ is a binary value, where 1 denotes that an infection case exists and 0 denotes that none exists. $0 \le q \le 1$ is a continuous value that denotes the infection probability. $q$ and $T$ are modeled as separate random variables because we want to model $p(q = 0)$ like a delta function. Otherwise, $\frac{p(q = 0)}{p(q > 0)} = 0$. The priors of $T$ and $q$ (i.e., $p(T)$ and $p(q \mid T)$) can be characterized by the statistics of previous COVID-19 spread.
Individual infection $z_n$ is a binary value that denotes whether individual $n$ is infected. A confirmed case can be considered as $z_n = 1$. $f_{n,v}$ is the $v$-th feature of the $n$-th individual; for example, a feature can be loss of smell and taste. We write $F = \{F_1, \cdots, F_n, \cdots, F_N\}$ and $F_n = \{f_{n,1}, \cdots, f_{n,v}, \cdots, f_{n,V}\}$. We can also define $p(z_n = 1 \mid q) = q$ and $p(F_n \mid z_n) = \prod_i p(f_{n,i} \mid z_n)$. In this formulation, calculating the posterior of the infection rate $p(q \mid F)$ or of an individual infection $p(z_n \mid F)$ would be one of the primary interests.
Figure 3. An example of PGM for modeling an infection rate $q$ and an individual infection $z_n$.
Extending CareCall to more general healthcare domains is natural because the system flow is not specific to COVID-19. First, our system can be extended to call-based examination for cold or flu without heavy additional effort. Based on the examination results, our system can recommend whether users should see a doctor and which department they need to visit. Furthermore, we can extend this system to cover more diseases by customizing the graph definition of dialog states and actions corresponding to intents, slots, human-to-human dialog utterances, and disease question-answering data. In particular, CareCall might be effective for managing chronic diseases such as diabetes, hypertension, and hyperlipidemia, from which many elderly people unfamiliar with mobile applications suffer. Our system can help elderly patients by monitoring their status and recommending actions corresponding to the patient state.
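
To make the Bayesian model of this section concrete, the following Python sketch computes the posterior of the community infection rate and of an individual infection on a discrete grid. It is a minimal illustration under stated assumptions, not the authors' implementation: the prior (a point mass at q = 0 for T = 0 plus a uniform grid for T = 1), the grid size, and the cap `q_max` are hypothetical choices. The only feature is loss of smell or taste, with the likelihoods 0.65 and 0.22 taken from the figures quoted in Related Work. The code uses the marginal likelihood p(F_n | q) = q p(F_n | z_n = 1) + (1 - q) p(F_n | z_n = 0), which follows from the definitions above.

```python
from typing import Dict, List, Sequence, Tuple

# (p(f = 1 | z = 1), p(f = 1 | z = 0)) for each binary feature.
# Values for loss of smell/taste are the 65% / 22% quoted in Related Work.
P_FEATURE_GIVEN_Z: Dict[str, Tuple[float, float]] = {
    "loss_of_smell_taste": (0.65, 0.22),
}


def feature_likelihood(features: Dict[str, int], z: int) -> float:
    """p(F_n | z_n): product over conditionally independent binary features."""
    prob = 1.0
    for name, value in features.items():
        p_pos, p_neg = P_FEATURE_GIVEN_Z[name]
        p_one = p_pos if z == 1 else p_neg
        prob *= p_one if value == 1 else 1.0 - p_one
    return prob


def build_prior(p_t: float, grid_size: int, q_max: float) -> List[Tuple[float, float]]:
    """Assumed prior: mass 1 - p_t at q = 0 (T = 0), uniform grid on (0, q_max] (T = 1)."""
    prior = [(0.0, 1.0 - p_t)]
    for i in range(1, grid_size + 1):
        prior.append((q_max * i / grid_size, p_t / grid_size))
    return prior


def posterior_q(
    prior: Sequence[Tuple[float, float]],
    reports: Sequence[Dict[str, int]],
) -> List[Tuple[float, float]]:
    """p(q | F) on the grid, marginalizing each z_n out of the likelihood."""
    weights = []
    for q, p_q in prior:
        weight = p_q
        for features in reports:
            weight *= (q * feature_likelihood(features, 1)
                       + (1.0 - q) * feature_likelihood(features, 0))
        weights.append(weight)
    total = sum(weights)
    return [(q, w / total) for (q, _), w in zip(prior, weights)]


def posterior_z(post_q: Sequence[Tuple[float, float]], features: Dict[str, int]) -> float:
    """p(z_n = 1 | F), where post_q already conditions on all reports including F_n."""
    like_pos = feature_likelihood(features, 1)
    like_neg = feature_likelihood(features, 0)
    return sum(
        p * (q * like_pos) / (q * like_pos + (1.0 - q) * like_neg)
        for q, p in post_q
    )
```

For a large number of callees the product of likelihoods can underflow, so a practical version would accumulate log-probabilities instead; the direct product is kept here for readability.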
6. Conclusion
We introduced CareCall, a call-based AI dialog agent system for monitoring people who have been in contact with infected patients, and showed how this system is built and works robustly. We also discussed the idea of applying our system to measure the spread of the pandemic in the local community with simple probabilistic modeling. We hope this kind of investigation plays an active role in tracking and preventing the spread of COVID-19.
Acknowledgements
The authors thank all members of Clova CIC for supporting this work. The authors also appreciate all the medical staff around the world for their devoted efforts to prevent COVID-19.
References
Adiwardana, D., Luong, M.-T., So, D. R., Hall, J., Fiedel, N., Thoppilan, R., Yang, Z., Kulshreshtha, A., Nemade, G., Lu, Y., et al. Towards a human-like open-domain chatbot. arXiv preprint arXiv:2001.09977, 2020.
Anumanchipalli, G. K., Chartier, J., and Chang, E. F. Speech synthesis from neural decoding of spoken sentences. Nature, 568(7753):493–498, 2019.
Chan, W., Jaitly, N., Le, Q., and Vinyals, O. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. In , pp. 4960–4964. IEEE, 2016.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186, 2019.
Dugas, A. F., Hsieh, Y.-H., Levin, S. R., Pines, J. M., Mareiniss, D. P., Mohareb, A., Gaydos, C. A., Perl, T. M., and Rothman, R. E. Google flu trends: correlation with emergency department influenza rates and crowding metrics. Clinical infectious diseases, 54(4):463–469, 2012.
Gane, S. B., Kelly, C., and Hopkins, C. Isolated sudden onset anosmia in covid-19 infection. a novel syndrome. Rhinology, 10, 2020.
Greenberg, N., Docherty, M., Gnanapragasam, S., and Wessely, S. Managing mental health challenges faced by healthcare workers during covid-19 pandemic. bmj, 368, 2020.
Ha, J.-W., Nam, K., Kang, J. G., Lee, S.-W., Yang, S., Jung, H., Kim, E., Kim, H., Kim, S., Kim, H. A., et al. Clovacall: Korean goal-oriented dialog speech corpus for automatic speech recognition of contact centers. arXiv preprint arXiv:2004.09367, 2020.
Helft, M. Google uses searches to track flu’s spread. The New York Times, 2008. URL .
Hendrycks, D. and Gimpel, K. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136, 2016.
Leviathan, Y. and Matias, Y. Google duplex: An ai system for accomplishing real-world tasks over the phone. Technical report, 2018.
Menni, C., Valdes, A. M., Freidin, M. B., Sudre, C. H., Nguyen, L. H., Drew, D. A., Ganesh, S., Varsavsky, T., Cardoso, M. J., Moustafa, J. S. E.-S., et al. Real-time tracking of self-reported symptoms to predict potential covid-19. Nature medicine, pp. 1–4, 2020.
Nassif, A. B., Shahin, I., Attili, I., Azzeh, M., and Shaalan, K. Speech recognition using deep neural networks: A systematic review. IEEE Access, 7:19143–19165, 2019.
Organization, W. H. et al. Critical preparedness, readiness and response actions for covid-19: interim guidance, 22 march 2020. Technical report, World Health Organization, 2020.
Peak, C. M., Kahn, R., Grad, Y. H., Childs, L. M., Li, R., Lipsitch, M., and Buckee, C. O. Individual quarantine versus active monitoring of contacts for the mitigation of covid-19: a modelling study. The Lancet Infectious Diseases, 2020.
Shah, P., Hakkani-Tür, D., Tür, G., Rastogi, A., Bapna, A., Nayak, N., and Heck, L. Building a conversational agent overnight with dialogue self-play. arXiv preprint arXiv:1801.04871, 2018.
Shim, E., Tariq, A., Choi, W., Lee, Y., and Chowell, G. Transmission potential and severity of covid-19 in south korea. International Journal of Infectious Diseases, 2020.
Siddhant, A. and Lipton, Z. C. Deep bayesian active learning for natural language processing: Results of a large-scale empirical study. arXiv preprint arXiv:1808.05697, 2018.
Song, E., Byun, K., and Kang, H.-G. Excitnet vocoder: A neural excitation model for parametric speech synthesis systems. In , pp. 1–5. IEEE, 2019.
Sungnak, W., Huang, N., Bécavin, C., Berg, M., Queen, R., Litvinukova, M., Talavera-López, C., Maatz, H., Reichart, D., Sampaziotis, F., et al. Sars-cov-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes. Nature medicine, 26(5):681–687, 2020.
Weissleder, R., Lee, H., Ko, J., and Pittet, M. J. Covid-19 diagnostics in context.