Ontology-based annotation and analysis of COVID-19 phenotypes
Yang Wang a,b, Fengwei Zhang a, Hong Yu a,b, Xianwei Ye a,b, Yongqun He c,1
a Guizhou University Medical College, Guiyang, Guizhou 550025, China.
b Department of Pulmonary and Critical Care Medicine, Guizhou Provincial People’s Hospital and NHC Key Laboratory of Immunological Diseases, People’s Hospital of Guizhou University, Guiyang, Guizhou 550002, China.
c University of Michigan Medical School, Ann Arbor, MI 48109, USA.
Abstract.
The COVID-19 epidemic has caused an unpredictable and devastating public health disaster in territories around the world. Common phenotypes include fever, cough, shortness of breath, and chills. As more cases have been investigated, other clinical phenotypes have gradually been recognized, for example, loss of smell and loss of taste. Compared with discharged or cured patients, patients with severe or fatal disease often have one or more comorbidities, such as hypertension, diabetes, and cardiovascular disease. In this study, we systematically collected and analyzed COVID-19-related clinical phenotypes from 70 articles. The 17 commonly occurring phenotypes were classified into different groups based on the Human Phenotype Ontology (HPO). Based on the HP classification, we systematically analyzed three nervous system phenotypes (loss of smell, loss of taste, and headache) and four abdominal phenotypes (nausea, vomiting, abdominal pain, and diarrhea) identified in patients, and found that patients from Europe and the USA tended to have higher rates of nervous system and abdominal phenotypes than patients from Asia. A total of 23 comorbidities were found to commonly exist among COVID-19 patients. Patients with comorbidities such as diabetes and kidney failure had worse outcomes than those without these comorbidities.
Keywords.
COVID-19, phenotype, ontology, comorbidity, CIDO, HPO.
1. Introduction
The COVID-19 pandemic has resulted in major losses worldwide. In the US alone, the pandemic had caused over 1.6 million confirmed cases and nearly 100 thousand deaths by May 25, 2020. The cause of the disease is SARS-CoV-2, a single-stranded positive-sense RNA virus. It belongs to the order Nidovirales, family Coronaviridae, and subfamily Coronavirinae [4]. According to the literature and news reports, the clinical symptoms in different areas are overall stable but also show differences that are often ignored. The common symptoms include fever, cough, shortness of breath or difficulty breathing, and chills [12]. However, physicians and pathologists have recognized other clinical phenotypes, for example, loss of smell and loss of taste. In many cases, symptoms of the digestive system are the primary phenotypes [12]. Early recognition of these symptoms facilitates early diagnosis and treatment, which can substantially improve patient outcomes.
Compared with discharged or cured COVID-19 patients, patients with severe or fatal disease are more likely to have comorbidities. Comorbidity here refers to the simultaneous presence of some disease(s) or condition(s) in a patient when COVID-19 occurs. Hypertension, diabetes, and cardiovascular diseases are a few examples of the most common comorbidities associated with COVID-19 [1]. However, a systematic investigation and classification of the relations between the comorbidities and disease outcomes has not been reported.
Ontologies have played a significant role in standard data representation, classification, integration, and analysis. Many ontologies have been widely used in the fields of medicine and pharmacy in recent years. The Human Phenotype Ontology (HPO) is a standardized vocabulary for phenotypic abnormalities in human disease. Each term in the HPO describes a phenotypic abnormality, such as Pneumonia. HPO currently contains over 13,000 terms [6]. It provides a computational link between genome biology and clinical medicine.
In this study, we systematically annotated COVID-19 clinical phenotypes and comorbidities from journal and preprint articles, and applied HP to classify these phenotypes and examine the internal patterns. Our study also identified many shared and differential phenotype patterns in COVID-19 patients from countries in different regions of the world.
1 Corresponding author: Yongqun Oliver He, University of Michigan Medical School, Ann Arbor, MI, USA. E-mail: [email protected].
2. Methods
We searched peer-reviewed journal articles and bioRxiv and medRxiv preprints to identify relevant articles.
The identified symptoms and comorbidities were mapped to terms in the Human Phenotype Ontology (HPO). Ontobee [13] was used to search the HP terms. Ontofox [16] was used to extract the specific sets of the phenotypes. The Protégé-OWL editor was used for display, classification, and analysis.
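The mapping step can be sketched as a simple lookup from free-text symptom names to HP terms. This is a minimal illustration and not the authors' pipeline: the function name `map_symptom`, the synonym table, and the normalization rule are assumptions, and only the three HP identifiers reported in this paper are included.

```python
from typing import Optional, Tuple

# HP terms as reported in this paper: key -> (label, ID).
HP_TERMS = {
    "anosmia": ("Anosmia", "HP:0000458"),
    "parageusia": ("Parageusia", "HP:0031249"),
    "headache": ("Headache", "HP:0002315"),
}

# Illustrative assumption about how symptoms are worded in case reports.
SYNONYMS = {
    "loss of smell": "anosmia",
    "loss of taste": "parageusia",
}


def map_symptom(text: str) -> Optional[Tuple[str, str]]:
    """Return (HP label, HP ID) for a reported symptom, or None if unmapped."""
    key = " ".join(text.lower().split())  # normalize case and whitespace
    key = SYNONYMS.get(key, key)          # resolve a synonym if one is known
    return HP_TERMS.get(key)


if __name__ == "__main__":
    for reported in ["Loss of  smell", "Headache", "rash"]:
        print(reported, "->", map_symptom(reported))
```

Unmapped strings return `None` so that they can be collected and reviewed by hand, for example by searching Ontobee for a suitable term.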
The mapped and extracted HP terms and subsets were imported into the Coronavirus Infectious Disease Ontology (CIDO) (http://github.com/CIDO-ontology/cido).
3. Results
We reviewed 70 papers published from December 2019 to date in PubMed and the preprint servers bioRxiv and medRxiv. The patients reported in these papers were from countries in Asia (e.g., China, South Korea, Singapore, Japan), the Middle East (e.g., Iran), Europe (e.g., Italy, France, Spain, Germany, Belgium, Switzerland), and North America (the USA). Ten representative articles are provided in Table 1.
Table 1. Ten representative articles annotated in this study
Report time   Country                                     Cases  Reference (PMID or DOI)
Feb 19, 2020  China                                       140    PMID: 32077115
Feb 28, 2020  China                                       1099   PMID: 32109013
Mar 27, 2020  Iran                                        10069  DOI: https://doi.org/10.1101/2020.03.23.20041889
Apr 17, 2020  France                                      114    PMID: 32305563
Apr 22, 2020  America                                     5700   PMID: 32320003
Apr 22, 2020  Italy                                       374    PMID: 32320008
Apr 24, 2020  America                                     169    PMID: 32329222
Apr 24, 2020  America                                     1299   PMID: 32329797
Apr 28, 2020  Britain                                     16749  DOI: https://doi.org/10.1101/2020.04.23.20076042
Apr 30, 2020  France, Italy, Spain, Belgium, Switzerland  1420   PMID: 32352202
Based on our literature search, we found a large number of COVID-19 case reports and analyses from December 2019 to date in different regions around the world. A total of 17 common clinical symptoms were found, including fever, cough, shortness of breath or difficulty breathing, chills, repeated shaking with chills, muscle pain, headache, sore throat, and new loss of smell or taste. HP was used to classify these 17 common symptoms (Figure 1). Overall, these symptoms are located in the abdominal system, nervous system, head, respiratory system, constitutional system, and blood. The nervous system abnormalities include parageusia (loss of taste, HP:0031249), anosmia (loss of smell, HP:0000458), and headache (HP:0002315). The abnormality of the head includes abnormality of the face and of the pharynx. Parageusia and anosmia are also abnormality of the face phenotypes. Pharyngitis belongs to the abnormality of the nasopharynx and pharynx.
Figure 1. Common phenotypes of COVID-19 based on HP classification.
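The multi-parent classification described above can be sketched as a small directed graph in which a phenotype such as anosmia sits under both the nervous system and the head. The parent links below are only those stated in the text, with the pharynx classes simplified to one label; the names `ancestors` and `members` are illustrative, and the real HP hierarchy contains intermediate classes that are omitted here.

```python
# Parent links transcribed from the text; labels are used instead of HP IDs.
PARENTS = {
    "parageusia": ["abnormality of the nervous system", "abnormality of the face"],
    "anosmia": ["abnormality of the nervous system", "abnormality of the face"],
    "headache": ["abnormality of the nervous system"],
    "pharyngitis": ["abnormality of the pharynx"],
    "abnormality of the face": ["abnormality of the head"],
    "abnormality of the pharynx": ["abnormality of the head"],
}

# Leaf phenotypes are terms that never appear as a parent of another term.
ALL_PARENTS = {p for ps in PARENTS.values() for p in ps}
LEAVES = [t for t in PARENTS if t not in ALL_PARENTS]


def ancestors(term: str) -> set:
    """Collect every class reachable from `term` by following parent links."""
    found = set()
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def members(group: str) -> list:
    """List the leaf phenotypes classified under `group`, sorted by name."""
    return sorted(t for t in LEAVES if group in ancestors(t))


if __name__ == "__main__":
    for group in ("abnormality of the nervous system", "abnormality of the head"):
        print(group, "->", members(group))
```

Because a term may have several parents, the same phenotype can be counted in more than one group; any group-level analysis should state which grouping it uses.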
Instead of focusing on individual phenotypes, we hypothesized that analyzing phenotypes as groups would yield new scientific insights. First, we analyzed the group of COVID-19-related nervous system phenotypes, which includes loss of smell (anosmia), loss of taste (parageusia), and headache (Figure 1). The combined analysis of these three phenotypes as a whole gave us a unique angle from which to study how the disease affects the nervous system.
As shown in Figure 2, all three nervous system phenotypes appeared at low rates in Asian patients. Ten groups of Chinese patients were analyzed. Among the three phenotypes, headache was relatively common in Chinese patients. However, except for one group reporting a low level of hyposmia and hypogeusia [11], the other nine groups did not report any cases of loss of smell or loss of taste. South Korean and Japanese patients showed a consistent pattern of lower incidence of loss of smell and loss of taste (Figure 2).
In contrast, in Iran, Italy, France, Germany, Spain, and the USA, higher proportions of cases had loss of smell or taste. In Europe in particular, there were two large questionnaire-based investigations [9] of loss of smell and taste in confirmed patients, including many doctors and nurses infected in hospital. According to the multicenter study, loss of smell was a key symptom in mild-to-moderate COVID-19 patients and was not associated with nasal obstruction or rhinorrhea. Females and young patients were more susceptible to smell and taste loss, whereas elderly individuals often presented with fever, fatigue, and loss of appetite. An obvious correlation between smell and taste disorders was also identified [9].
Figure 2. Cases with three nervous system phenotypes in different countries. Three symptoms (i.e., headache, loss of smell, and loss of taste) were analyzed. The occurrence of each symptom ranges from 0 to 100%.
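One way to compare a phenotype group across studies is to convert each study's counts into a percentage of occurrence per symptom. The sketch below assumes a hypothetical record layout; the cohort names and counts are made up for illustration and are not data from the annotated articles.

```python
NERVOUS_GROUP = ("headache", "loss of smell", "loss of taste")


def occurrence_percent(counts: dict, total_cases: int) -> dict:
    """Percentage of patients (0-100) reporting each symptom in one study."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    # Symptoms missing from `counts` are treated as zero cases (an assumption).
    return {s: 100.0 * counts.get(s, 0) / total_cases for s in NERVOUS_GROUP}


# Hypothetical studies: (label, number of cases, symptom counts).
STUDIES = [
    ("Cohort A", 200, {"headache": 20}),
    ("Cohort B", 400, {"headache": 100, "loss of smell": 300, "loss of taste": 280}),
]

if __name__ == "__main__":
    for label, n_cases, counts in STUDIES:
        pct = occurrence_percent(counts, n_cases)
        print(label, {s: round(p, 1) for s, p in pct.items()})
```

Treating an unreported symptom as 0% is itself an assumption: a study that never asked about smell or taste cannot be distinguished from one in which no patient had the symptom.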
Many mild-to-moderate COVID-19 patients had gastrointestinal disorders as primary symptoms. The primary abdominal phenotypes include nausea, vomiting, abdominal pain, and diarrhea. We analyzed these four abdominal phenotypes together and compared the cases from different countries and regions.
A retrospective case-control study in New York found that patients with gastrointestinal symptoms (defined as diarrhea or nausea/vomiting) were significantly more likely to test positive than negative for COVID-19 (61% vs. 39%, p=0.04) [12]. In 393 patients with COVID-19 in two hospitals in New York, the disease manifestations were in general similar to those in a large case series from China [7]; however, gastrointestinal symptoms appeared to be more common in New York than in China (where these symptoms occurred in 4 to 5% of patients) [5]. We also found that digestive symptoms, especially diarrhea, occurred in almost all countries (Figure 3). However, the accumulated percentages of digestive symptoms were significantly higher in the UK, France, New York, and California. Time also appears to be a factor: more digestive symptoms were reported in mid-to-late March and early April.
Figure 3. Cases with four abdominal system phenotypes in different countries. Four symptoms (i.e., nausea, vomiting, abdominal pain, and diarrhea) were analyzed. The occurrence of each symptom ranges from 0 to 100%.
Many factors, such as age, gender, and smoking status, have been found to significantly affect disease outcomes. For example, among reported COVID-19 cases, older men have a higher mortality rate [2]. Comorbidity (i.e., a medical condition already present when the patient is infected) is another important risk factor for outcomes. Hypertension, diabetes, cardiovascular diseases, and chronic pulmonary diseases are the most common comorbidities. Other comorbidities include chronic kidney disease, hepatitis B/C infection, chronic hepatic failure, cirrhosis, chronic neurological disease (e.g., seizures and dementia), and haematological system disease (e.g., abnormality of blood and blood-forming tissues) [5].
We identified 23 common comorbidities and used HP to classify them (Figure 4). Such ontology-based classification allows us to identify different groups of comorbidities and their hierarchical relations. Specifically, these comorbidities occur in various systems, such as the cardiovascular system, blood, immune system, metabolism, digestive system, nervous system, kidney, and respiratory system. Cirrhosis, viral hepatitis, and chronic hepatic failure belong to the digestive system. Dementia and seizure are subclasses of abnormality of the nervous system. Obesity and autoimmune deficiency status are also important risk factors for poor prognosis (Figure 4).
Figure 4. Hierarchical layout of 23 common comorbidity phenotypes based on HP classification.
To further study the relation between different comorbidity phenotypes and disease outcomes, we surveyed the disease data from the literature and compared the incidences of specific comorbidity phenotypes in severe and non-severe COVID-19 patients. From the long list of comorbidity phenotypes (Figure 4), we chose diabetes and kidney disease for further analysis (Figure 5). The results from a total of 7 papers were used for the study.
As shown in Figure 5, the incidence of the diabetes or kidney failure phenotype was generally higher in severe patients than in non-severe patients. In all regions except California, the incidence of kidney disease was higher in severe patients than in non-severe patients. In all records of COVID-19 patients, whether severe or non-severe, the incidence of diabetes was significantly higher than that of renal disease. The incidence of diabetes was significantly higher in severe patients than in non-severe patients (Figure 5). It has been reported that a cytokine storm might be activated in diabetic patients, leading to poor prognosis and death [3]. The exact mechanism deserves further investigation.
Figure 5. Correlation between two comorbidity phenotypes (diabetes and kidney failure) and two disease outcomes (severe or non-severe). In severe disease patients, the incidence of diabetes or kidney failure was higher than that in non-severe patient groups. The X-axis shows country/city, report date, and number of cases.
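The comparison in Figure 5 amounts to computing, for each study, the proportion of patients with a given comorbidity in the severe and non-severe groups. A minimal sketch follows; the counts are hypothetical placeholders rather than values from the seven annotated papers, and the ratio is a descriptive summary, not the authors' statistical method.

```python
from dataclasses import dataclass


@dataclass
class Group:
    with_comorbidity: int  # patients in the group who have the comorbidity
    total: int             # all patients in the group (must be positive)

    def proportion(self) -> float:
        return self.with_comorbidity / self.total


def compare(severe: Group, non_severe: Group) -> dict:
    """Proportions in both groups and their ratio (severe / non-severe)."""
    p_sev = severe.proportion()
    p_non = non_severe.proportion()
    # The ratio is undefined when no non-severe patient has the comorbidity.
    ratio = p_sev / p_non if p_non > 0 else float("inf")
    return {"severe": p_sev, "non_severe": p_non, "ratio": ratio}


if __name__ == "__main__":
    # Hypothetical counts for one study and one comorbidity (e.g., diabetes).
    print(compare(Group(25, 100), Group(25, 200)))
```

A ratio above 1 means the comorbidity is more frequent among severe patients in that study; with small groups the ratio is unstable, so it should be read alongside the raw counts.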
The Coronavirus Infectious Disease Ontology (CIDO; https://github.com/CIDO-ontology/cido) is a community-based ontology in the domain of coronavirus diseases with a specific focus on COVID-19. CIDO covers different coronavirus disease-related topics including coronaviruses, natural hosts of coronaviruses, phenotypes, comorbidities, drugs, and vaccines. To date, CIDO has more than 5,500 terms. For COVID-19 phenotype classification, CIDO has imported the HP representations of COVID-19-related phenotypes (Figure 1) and comorbidities (Figure 4).
Different viruses have the disposition to induce specific phenotypes in patients. The current study focuses on analyzing the COVID-19 phenotypes induced by SARS-CoV-2. CIDO also covers other coronaviruses such as SARS-CoV and MERS-CoV. To differentiate the relations between different viruses and phenotypes, we use a relation called ‘pathogen susceptible to induction of phenotype’ defined in the Ontology of Host-Pathogen Interactions (OHPI) [14]. With this relation, we can define a virus-phenotype relation such as the following:
SARS-CoV-2: ‘pathogen susceptible to induction of phenotype’ some headache
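Axioms of this form can be written out for each annotated phenotype. The sketch below only builds strings in the style of the axiom above; the helper name `induction_axiom` is an assumption, and a real workflow would use an OWL library or Protégé rather than string formatting.

```python
RELATION = "pathogen susceptible to induction of phenotype"


def induction_axiom(pathogen: str, phenotype: str) -> str:
    """Format an existential restriction linking a pathogen to a phenotype."""
    return f"{pathogen}: '{RELATION}' some {phenotype}"


if __name__ == "__main__":
    # The three nervous system phenotypes analyzed in this study.
    for phenotype in ("headache", "anosmia", "parageusia"):
        print(induction_axiom("SARS-CoV-2", phenotype))
```

Keeping the relation name in one constant makes it easy to reuse the same pattern for other coronaviruses covered by CIDO, such as SARS-CoV and MERS-CoV.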
4. Discussion
The worldwide epidemic of COVID-19 has caused serious damage and posed a serious threat to people's health and lives. Many specific drugs are still in clinical trials, and vaccines are being developed in various countries. With the outbreak of COVID-19, many different clinical phenotypes are occurring in patients, so we mapped these phenotypes and comorbidities to help doctors and CDC scientists better understand the clinical profiles of COVID-19.
Ontologies such as HPO and CIDO provide computer-interpretable bioinformatics resources for the analysis of phenotypes and their underlying causes. HPO provides not only a standard phenotype terminology but also a collection of disease-phenotype annotations. CIDO reuses HPO and focuses on identifying the causal relations between phenotypes and coronaviruses. These ontologies, together with ontology-based software programs and computational algorithms, can be combined to analyze large amounts of data, including clinical cases and basic experimental data. The aims are to fully understand the mechanisms underlying different phenotypes of COVID-19 and their relations with various genetic mutations in the viruses, and to support rational development of therapeutic and preventive measures against the pathological infections.
Based on the HP classification, we systematically analyzed 17 clinical phenotypes of COVID-19 in reported cases. We focused on three nervous system phenotypes (loss of smell, loss of taste, and headache) and four abdominal phenotypes (nausea, vomiting, abdominal pain, and diarrhea) identified in patients, and found that patients from Europe and the USA tended to have higher rates of nervous system and abdominal phenotypes than patients from Asia. A total of 23 comorbidities were found to commonly exist among COVID-19 patients. In our study, patients with comorbidities such as diabetes and kidney failure had worse outcomes than those without these comorbidities. Whether these results are related to the distribution of mutated virus strains in different regions and populations will be our next research direction. We will also investigate whether and how other conditions (such as temperature and season) are risk factors for the disease and infections.
Recently, children with COVID-19 in the United States and Europe presented with Kawasaki-like disease manifestations and systemic inflammatory response syndrome, which led to critical illness and even deaths [10; 15]. Other patients with COVID-19 have had skin lesions as the main symptom, especially on the toes, a presentation known as COVID-19 toes [8]. These newly occurring phenotypes deserve keen attention and require further careful monitoring and analysis.
References
[1] C. Chen, C. Chen, J.T. Yan, N. Zhou, J.P. Zhao, and D.W. Wang, [Analysis of myocardial injury in patients with COVID-19 and association between concomitant cardiovascular diseases and severity of COVID-19], Zhonghua Xin Xue Guan Bing Za Zhi (2020), E008.
[2] N. Chen, M. Zhou, X. Dong, J. Qu, F. Gong, Y. Han, Y. Qiu, J. Wang, Y. Liu, Y. Wei, J. Xia, T. Yu, X. Zhang, and L. Zhang, Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study, Lancet (2020), 507-513.
[3] S.A. Cole, H.A. Laviada-Molina, J.M. Serres-Perales, E. Rodriguez-Ayala, and R.A. Bastarrachea, The COVID-19 Pandemic during the Time of the Diabetes Pandemic: Likely Fraternal Twins?, Pathogens (2020).
[4] V.M. Corman, D. Muth, D. Niemeyer, and C. Drosten, Hosts and Sources of Endemic Human Coronaviruses, Adv Virus Res (2018), 163-188.
[5] P. Goyal, J.J. Choi, L.C. Pinheiro, E.J. Schenck, R. Chen, A. Jabri, M.J. Satlin, T.R. Campion, Jr., M. Nahid, J.B. Ringel, K.L. Hoffman, M.N. Alshak, H.A. Li, G.T. Wehmeyer, M. Rajan, E. Reshetnyak, N. Hupert, E.M. Horn, F.J. Martinez, R.M. Gulick, and M.M. Safford, Clinical Characteristics of Covid-19 in New York City, N Engl J Med (2020).
[6] T. Groza, S. Kohler, D. Moldenhauer, N. Vasilevsky, G. Baynam, T. Zemojtel, L.M. Schriml, W.A. Kibbe, P.N. Schofield, T. Beck, D. Vasant, A.J. Brookes, A. Zankl, N.L. Washington, C.J. Mungall, S.E. Lewis, M.A. Haendel, H. Parkinson, and P.N. Robinson, The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease, Am J Hum Genet (2015), 111-124.
[7] W.J. Guan, Z.Y. Ni, Y. Hu, W.H. Liang, C.Q. Ou, J.X. He, L. Liu, H. Shan, C.L. Lei, D.S.C. Hui, B. Du, L.J. Li, G. Zeng, K.Y. Yuen, R.C. Chen, C.L. Tang, T. Wang, P.Y. Chen, J. Xiang, S.Y. Li, J.L. Wang, Z.J. Liang, Y.X. Peng, L. Wei, Y. Liu, Y.H. Hu, P. Peng, J.M. Wang, J.Y. Liu, Z. Chen, G. Li, Z.J. Zheng, S.Q. Qiu, J. Luo, C.J. Ye, S.Y. Zhu, N.S. Zhong, and China Medical Treatment Expert Group for Covid-19, Clinical Characteristics of Coronavirus Disease 2019 in China, N Engl J Med (2020), 1708-1720.
[8] N. Landa, M. Mendieta-Eckert, P. Fonda-Pascual, and T. Aguirre, Chilblain-like lesions on feet and hands during the COVID-19 Pandemic, Int J Dermatol (2020), 739-743.
[9] J.R. Lechien, C.M. Chiesa-Estomba, D.R. De Siati, M. Horoi, S.D. Le Bon, A. Rodriguez, D. Dequanter, S. Blecic, F. El Afia, L. Distinguin, Y. Chekkoury-Idrissi, S. Hans, I.L. Delgado, C. Calvo-Henriquez, P. Lavigne, C. Falanga, M.R. Barillari, G. Cammaroto, M. Khalife, P. Leich, C. Souchay, C. Rossi, F. Journe, J. Hsieh, M. Edjlali, R. Carlier, L. Ris, A. Lovato, C. De Filippis, F. Coppee, N. Fakhry, T. Ayad, and S. Saussez, Olfactory and gustatory dysfunctions as a clinical presentation of mild-to-moderate forms of the coronavirus disease (COVID-19): a multicenter European study, Eur Arch Otorhinolaryngol (2020).
[10] A. Morand, D. Urbina, and A. Fabre, COVID-19 and Kawasaki Like Disease: The Known-Known, the Unknown-Known and the Unknown-Unknown, (2020), doi:10.20944/preprints202005.200160.v202001.
[11] E.J. Needham, S.H. Chou, A.J. Coles, and D.K. Menon, Neurological Implications of COVID-19 Infections, Neurocrit Care (2020).
[12] Y.R. Nobel, M. Phipps, J. Zucker, B. Lebwohl, T.C. Wang, M.E. Sobieszczyk, and D.E. Freedberg, Gastrointestinal Symptoms and COVID-19: Case-Control Study from the United States, Gastroenterology (2020).
[13] E. Ong, Z. Xiang, B. Zhao, Y. Liu, Y. Lin, J. Zheng, C. Mungall, M. Courtot, A. Ruttenberg, and Y. He, Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration, Nucleic Acids Res (2017), D347-D352.
[14] S. Sayers, L. Li, E. Ong, S. Deng, G. Fu, Y. Lin, B. Yang, S. Zhang, Z. Fa, B. Zhao, Z. Xiang, Y. Li, X.M. Zhao, M.A. Olszewski, L. Chen, and Y. He, Victors: a web-based knowledge base of virulence factors in human and animal pathogens, Nucleic Acids Res (2019), D693-D700.
[15] L. Verdoni, A. Mazza, A. Gervasoni, L. Martelli, M. Ruggeri, M. Ciuffreda, E. Bonanomi, and L. D'Antiga, An outbreak of severe Kawasaki-like disease at the Italian epicentre of the SARS-CoV-2 epidemic: an observational cohort study, Lancet (2020).
[16] Z. Xiang, M. Courtot, R.R. Brinkman, A. Ruttenberg, and Y. He, OntoFox: web-based support for ontology reuse, BMC Res Notes 3:175.