MeSH descriptors indicate the knowledge growth in the SARS-CoV-2/COVID-19 pandemic
Johannes Stegmann ∗†‡
Abstract
The scientific papers dealing with the novel betacoronavirus SARS-CoV-2 and the coronavirus disease 2019 (COVID-19) caused by this virus, published in 2020 and recorded in the database PUBMED, were retrieved on April 27, 2020. About 20% of the records contain Medical Subject Headings (MeSH), keywords assigned to records in the course of the indexing process in order to summarise the articles' contents. The temporal sequence of the first occurrences of the keywords was determined, thus giving insight into the growth of the knowledge base of the pandemic.
Keywords: SARS-CoV-2, COVID-19, PUBMED, Medical Subject Headings.
The rapid worldwide spread of COVID-19, the new epidemic disease caused by the virus SARS-CoV-2, with now more than 3.4 million confirmed cases and a fatality rate of almost 7% among confirmed cases (World Health Organization, 2020), requires fast and comprehensive efforts of states and societies to combat the disease effectively by means of practical and appropriate medical, administrative and economic actions. Moreover, the scientific community has the responsibility to bundle resources and manpower to develop tests, drugs and vaccines in order to gain control over the virus and the disease as quickly as possible.

An important research tool is immediate and unlimited access to the scientific literature. For the biomedical specialties, the freely available database PUBMED/MEDLINE is indispensable for comprehensive retrieval of the published papers on biomedical research questions. Besides the bibliographic metadata (such as author name(s), publication year, journal name and volume, etc.), PUBMED records are indexed with a controlled vocabulary of many thousands of descriptors, the Medical Subject Headings (MeSH). In addition to the words contained in the titles and abstracts of indexed papers, the MeSH descriptors assigned to PUBMED records are important for a thorough analysis of the papers' content.

In the study presented here, the publications on SARS-CoV-2 and COVID-19 were retrieved and downloaded from PUBMED. The MeSH descriptors were extracted from the records already annotated with MeSH. The keywords were ordered chronologically according to the publication date of their associated papers, and their first occurrences were determined.

∗ Member of the Ernst-Reuter-Gesellschaft der Freunde, Förderer und Ehemaligen der Freien Universität Berlin e.V.
† Former (now retired) employee of the Medical Library of the Free University Berlin and the Charité Berlin.
‡ Radebeul, Germany, [email protected]
Table 1: SARS-CoV-2/COVID-19 papers without and with MeSH terms retrieved* from PUBMED.
All papers | Papers with MeSH terms | Number of distinct MeSH terms
7366 | 1504 | 1769
* Date of retrieval: April 27, 2020
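The share of records carrying MeSH terms, given as "about 20%" in the abstract, follows directly from the counts in Table 1:

```latex
\frac{\text{papers with MeSH terms}}{\text{all papers}}
  = \frac{1504}{7366} \approx 0.204 \approx 20\,\%
```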
Methods
Papers published in 2020 were retrieved and downloaded from PUBMED on April 27, 2020, using the following search profile:
new coronavirus* OR novel coronavirus* OR ncov OR sars-cov OR covid* OR cov-2 OR cov-19
(the truncation asterisk "*" retrieves all terms beginning with that word stem).
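The paper does not say whether the PUBMED web interface or a programmatic interface was used for the download. As a sketch under the assumption that NCBI's E-utilities ESearch service is used instead, the same search profile, restricted to publication year 2020, could be expressed as a request URL. The helper `esearch_url` and the `retmax` value are hypothetical choices for illustration, not the author's procedure.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumption: API used instead of the web UI).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Search profile copied verbatim from the Methods section.
PROFILE = ("new coronavirus* OR novel coronavirus* OR ncov OR sars-cov "
           "OR covid* OR cov-2 OR cov-19")


def esearch_url(term: str, year: int, retmax: int = 10000) -> str:
    """Build an ESearch URL for `term`, limited to one publication year (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",          # filter on publication date
        "mindate": f"{year}/01/01",
        "maxdate": f"{year}/12/31",
        "retmax": retmax,            # maximum number of PMIDs to return
        "retmode": "json",
    }
    return ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url(PROFILE, 2020))
```

The returned PMIDs would then still have to be fetched as full records (for example in MEDLINE format) before any MeSH fields can be read.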
The Medical Subject Headings (MeSH) assigned to PUBMED records are contained in the MH fields. Controlled vocabulary is also contained in the RN fields (substance names with their registry numbers). The contents of both fields were extracted from the records. In addition, the unique record numbers and the MeSH indexing dates were extracted from the PMID and MHDA fields, respectively.
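The author used homemade Perl scripts for this step; the following is only a Python sketch of the same idea, assuming the records were saved in the MEDLINE text format (a 4-character tag, "- ", then the value; continuation lines indented by six spaces; records separated by blank lines). The function names are hypothetical, and treating "has at least one MH field" as "indexed with MeSH" is an assumption.

```python
from collections import defaultdict


def parse_medline(text: str) -> list[dict[str, list[str]]]:
    """Split MEDLINE-format text into records, each mapping a field tag to its values."""
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = defaultdict(list)
    last_tag = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line separates two records.
            if current:
                records.append(dict(current))
                current = defaultdict(list)
                last_tag = None
            continue
        if line.startswith("      ") and last_tag is not None:
            # Continuation line: extend the most recent value of the same field.
            current[last_tag][-1] += " " + line.strip()
            continue
        # Field line: 4-character tag, "- ", then the value, e.g. "MH  - Humans".
        last_tag = line[:4].strip()
        current[last_tag].append(line[6:].strip())
    if current:
        records.append(dict(current))
    return records


def mesh_records(records: list[dict[str, list[str]]]) -> list[tuple[str, str, list[str]]]:
    """Return (PMID, MHDA day, MH + RN terms) for every record indexed with MeSH."""
    result = []
    for rec in records:
        if "MH" not in rec:
            continue  # record not (yet) indexed with MeSH
        pmid = rec["PMID"][0]
        day = rec.get("MHDA", [""])[0][:10]  # "2020/02/06 06:00" -> "2020/02/06"
        result.append((pmid, day, rec["MH"] + rec.get("RN", [])))
    return result
```

Keeping the day part of MHDA as a "YYYY/MM/DD" string is convenient because such strings sort chronologically.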
Extraction of record field contents, clustering, data analysis, calculations and visualisation were done using homemade programs and scripts in Perl (version 5.26.1) and the software package R version 3.4.4 (R Core Team, 2018). All operations were done on a commercial PC running Ubuntu 18.04 LTS.
The search profile mentioned in the Methods paragraph retrieved 7366 publications for the period January to April 27, 2020 (Table 1). The daily distribution of the items is shown in Figure 1. In January and February 2020 few papers appeared, followed by …

Figure 1: Number of SARS-CoV-2/COVID-19 papers indexed daily* in PUBMED. [Plot: x-axis days (January to April); y-axes number of papers and cumulated number of papers (dashed line).]
* January to April 27, 2020. Days without papers are omitted.

Figure 2: Daily* number of SARS-CoV-2/COVID-19 papers indexed with MeSH terms in PUBMED. [Plot: x-axis days (February to April); y-axis number of papers with MeSH terms.]
* February to April 27, 2020. Days without papers are omitted.

Figure 3: Cumulated daily* number of SARS-CoV-2/COVID-19 papers with assigned MeSH terms and their distinct new MeSH terms. [Plot: x-axis days (February to April); y-axis cumulated numbers; curves: MeSH, papers.]
* February to April 27, 2020.
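Figures 1 to 3 plot per-day counts together with their running totals. The original plots were made with R; as a Python sketch, assuming one "YYYY/MM/DD" date string per paper (for example the day part of its MHDA field), the underlying numbers could be computed as follows. The function name is hypothetical.

```python
from collections import Counter
from itertools import accumulate


def daily_and_cumulated(days: list[str]) -> list[tuple[str, int, int]]:
    """Given one date string per paper, return (day, papers that day, running total).

    Only days with at least one paper appear, mirroring the figures,
    which omit days without papers.
    """
    per_day = Counter(days)
    ordered = sorted(per_day)  # "YYYY/MM/DD" strings sort chronologically
    counts = [per_day[d] for d in ordered]
    return list(zip(ordered, counts, accumulate(counts)))
```

For example, `daily_and_cumulated(["2020/02/20", "2020/02/06", "2020/02/20"])` returns `[("2020/02/06", 1, 1), ("2020/02/20", 2, 3)]`.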
Tables 2 and 3 show the addition of new MeSH terms to indexed papers by day: Table 2 gives the numbers, Table 3 examples of the terms. Table 2 lists the dates and the number of publications indexed with MeSH, as well as the number of new MeSH terms, i.e. MeSH terms not contained in the papers of the preceding dates. In Table 3, selected Medical Subject Headings are listed in the order of their appearance from February to April 2020. The MeSH terms assigned to papers in the first half of February 2020 indicate knowledge of a disease outbreak in China of pandemic proportions, caused by a betacoronavirus, and both the disease and the virus are already labelled (Table 3). The concomitant, possibly life-threatening, implications of the new disease, disease-spreading mechanisms, necessary diagnostic tools, assessment of especially vulnerable age groups, problems of health care systems, possible drug therapy schemes and other therapeutic approaches become evident from the information contained in the MeSH terms assigned to papers published in subsequent days, weeks and months. Although the fraction of papers with assigned MeSH terms is relatively low (see Table 1), the whole set of more than 1700 MeSH terms already assigned by the download date (see Table 2) may greatly benefit medical experts and others.
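The counting behind Table 2 (new terms per date, i.e. terms absent from all papers of earlier dates, plus cumulated counts) can be sketched in Python. This assumes the input is one (day, terms) pair per paper with MeSH, with days as "YYYY/MM/DD" strings; grouping by calendar day and the function name are assumptions, not the author's Perl/R implementation.

```python
from collections import defaultdict
from collections.abc import Iterable


def new_terms_by_date(indexed: Iterable[tuple[str, Iterable[str]]]):
    """Compute Table 2 style rows and the first-occurrence day of every term.

    Each row is (day, papers, cumulated papers, new terms, cumulated distinct terms).
    """
    papers_by_day: dict[str, list[set[str]]] = defaultdict(list)
    for day, terms in indexed:
        papers_by_day[day].append(set(terms))

    seen: set[str] = set()
    first_seen: dict[str, str] = {}
    rows = []
    cum_papers = 0
    for day in sorted(papers_by_day):  # chronological order for "YYYY/MM/DD"
        papers = papers_by_day[day]
        new = set().union(*papers) - seen  # terms absent from all earlier days
        for term in new:
            first_seen[term] = day
        seen |= new
        cum_papers += len(papers)
        rows.append((day, len(papers), cum_papers, len(new), len(seen)))
    return rows, first_seen
```

A day whose papers carry only already-known terms yields a row with zero new terms, as for 27 02 2020 in Table 2; sorting `first_seen` by value gives the temporal sequence of first appearances that Table 3 samples from.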
The short analysis of SARS-CoV-2/COVID-19 publications presented here shows that careful inspection of the assigned Medical Subject Headings is worthwhile, as it gives insight into the growth of the knowledge base of the pandemic.
References
Kousha, K., Thelwall, M. (2020): COVID-19 publications: Database coverage, citations, readers, tweets, news, Facebook walls, Reddit posts. arXiv:2004.10400. URL http://arxiv.org/abs/2004.10400.
R Core Team (2018): R: A language and environment for statistical computing.
… arXiv:2004.06721. URL http://arxiv.org/abs/2004.06721. (Also published as: Ritmo de crecimiento diario de la producción científica sobre Covid-19. Análisis en bases de datos y repositorios en acceso abierto [Daily growth rate of scientific production on Covid-19: analysis in open-access databases and repositories]. El profesional de la información, v. 29, n. 2, e290215. https://doi.org/10.3145/epi.2020.mar.15)
World Health Organization (May 4, 2020): WHO Coronavirus Disease (COVID-19) Dashboard.
URL https://covid19.who.int/.

Table 2: Temporal increase of distinct MeSH terms assigned to SARS-CoV-2/COVID-19 papers.
Date (dd mm yyyy) | Papers with MeSH: number | Papers with MeSH: cumulated | New MeSH terms: number | New MeSH terms: cumulated
06 02 2020 | 1 | 1 | 3 | 3
08 02 2020 | 1 | 2 | 16 | 19
11 02 2020 | 1 | 3 | 3 | 22
14 02 2020 | 1 | 4 | 4 | 26
18 02 2020 | 1 | 5 | 3 | 29
19 02 2020 | 1 | 6 | 7 | 36
20 02 2020 | 9 | 15 | 29 | 65
23 02 2020 | 1 | 16 | 1 | 66
25 02 2020 | 1 | 17 | 3 | 69
26 02 2020 | 1 | 18 | 1 | 70
27 02 2020 | 1 | 19 | 0 | 70
29 02 2020 | 1 | 20 | 5 | 75
03 03 2020 | 2 | 22 | 7 | 82
04 03 2020 | 1 | 23 | 2 | 84
07 03 2020 | 8 | 31 | 18 | 102
10 03 2020 | 3 | 34 | 6 | 108
11 03 2020 | 3 | 37 | 11 | 119
13 03 2020 | 4 | 41 | 9 | 128
14 03 2020 | 5 | 46 | 12 | 140
17 03 2020 | 60 | 106 | 137 | 277
18 03 2020 | 28 | 134 | 53 | 330
19 03 2020 | 96 | 230 | 155 | 485
20 03 2020 | 37 | 267 | 62 | 547
21 03 2020 | 46 | 313 | 54 | 601
24 03 2020 | 49 | 362 | 66 | 667
25 03 2020 | 23 | 385 | 36 | 703
27 03 2020 | 22 | 407 | 39 | 742
28 03 2020 | 45 | 452 | 57 | 799
28 03 2020 | 1 | 453 | 1 | 800
31 03 2020 | 4 | 457 | 5 | 805
01 04 2020 | 5 | 462 | 5 | 810
02 04 2020 | 34 | 496 | 50 | 860
03 04 2020 | 31 | 527 | 71 | 931
04 04 2020 | 53 | 580 | 70 | 1001
09 04 2020 | 132 | 712 | 134 | 1135
10 04 2020 | 50 | 762 | 22 | 1157
11 04 2020 | 93 | 855 | 79 | 1236
14 04 2020 | 56 | 911 | 49 | 1285
15 04 2020 | 69 | 980 | 60 | 1345
16 04 2020 | 58 | 1038 | 61 | 1406
17 04 2020 | 54 | 1092 | 56 | 1462
18 04 2020 | 53 | 1145 | 43 | 1505
21 04 2020 | 59 | 1204 | 37 | 1542
22 04 2020 | 66 | 1270 | 44 | 1586
23 04 2020 | 44 | 1314 | 37 | 1623
24 04 2020 | 108 | 1422 | 76 | 1699
25 04 2020 | 82 | 1504 | 70 | 1769

Table 3: SARS-CoV-2/COVID-19 papers 2020: Temporal sequence of MeSH term appearance (examples).