Getting Insights from a Large Corpus of Scientific Papers on Specialisted Comprehensive Topics -- the Case of COVID-19
Bernard Dousset (a), Josiane Mothe (b)
(a) IRIT, UMR5505, CNRS & Univ. Toulouse, France
(b) IRIT, UMR5505, CNRS & INSPEE UT2J, Univ. Toulouse
ABSTRACT
COVID-19 is one of the most important topics these days, specifically on search engines and in the news. While fake news is easily shared, scientific papers are reliable sources from which information can be extracted. With about 24,000 scientific publications on COVID-19 and related research on PubMed, automatic computer-assisted analysis is required. In this paper, we develop two methodologies to get insights into specific sub-topics of interest and into the latest research sub-topics. They rely on natural language processing and graph-based visualizations. We run these methodologies on two cases: the origin of the virus and the uses of existing drugs.
KEYWORDS
Topic analysis; Automatic mining of scientific publications; COVID-19; Automatic keyphrase extraction
1 INTRODUCTION
COVID-19 is one of the most important topics these days, specifically on search engines and in the news. It is a topic of interest shared worldwide. While news loops on TV, very few specialists have deep knowledge of COVID-19. On the Internet, a lot of fake news started to circulate and spread as fast as the virus itself. In such a situation, citizens and decision makers need reliable sources of information. Scientific papers are certainly such reliable sources; they could help citizens learn more about the disease and be informed in a way that is both reliable and accurate.
Scientific sources can be used not only to try to answer specific questions that arise, but also to identify the main researchers, groups or institutes that work on a specific sub-topic, what the collaborations are, and so on. Both in-depth analyses and overviews of a large quantity of research papers can help decision makers, or even citizens, to be better educated on the state of the art. For the general public, it can help set fake news aside. It can also help newcomers to the COVID-19 research field by first providing overviews of sub-topics (main publication venues, main authors, ...).
Indeed, the research in the domain is quite large, specifically if we consider other forms of COVID, Severe Acute Respiratory Syndrome, and Middle East Respiratory Syndrome. For example, the recently released collection named COVID-19 Open Research Dataset (https://pages.semanticscholar.org/coronavirus-research) consists of more than 40,000 articles. Some papers have started to provide reviews, which are indeed very useful. This paper aims at providing a systematic methodology to mine such a large publication set while focusing on specific topics of interest.
The rest of this paper is organized as follows: Section 2 presents related work; Section 3 describes the analysis framework, including the data description and the analysis methodology that was followed; Section 4 describes the methodology we developed to get insights into a specific sub-topic or into the latest research; Section 5 focuses on a deeper analysis of the origins of COVID-19; Section 6 focuses on the analysis of the terminology related to the latest research. Finally, Section 7 concludes this paper.
2 RELATED WORK
2.1 Publication analysis on COVID-19
Related work either reports studies on various topics or focuses on one or two topics. For example, the report from Nature Medicine focuses on clinical trials and human studies, preclinical studies, and epidemiology. Another distinction among related work is the level of automatic assistance the analysis benefits from. In most cases, it is difficult to know that level as it is not necessarily described. For example, one of these reports seems to be a manual analysis of a few papers on the topic, since its references are few. Harapan et al. present a literature review on Coronavirus disease 2019 (COVID-19). This study cites 74 references and presents a comprehensive state of the art on different sub-topics such as COVID-19 transmission, risk factors, diagnosis or treatments. Huang et al. reviewed 1,281 abstracts from which they identified 322 manuscripts relevant to 5 areas of interest for their study. The five topics they chose are as follows: antibody kinetics, correlates of protection, immunopathogenesis, antigenic diversity and cross-reactivity, and population seroprevalence. Dousset et al. present a short analysis of a larger dataset whose size is similar to the one that we are using in this study. However, their study does not go deep into the publication content but rather analyzes the collaborations between researchers and countries. They do not analyze any specific topic.
Unlike the previous analyses, this paper combines (a) the analysis of a large data set of about 25,000 references and (b) highly assisted analysis. While the methodology could be applied to various COVID-19 subtopics, we focus here on two main topics: the origin of the virus and the use of treatments for other diseases.
2.2 Mining scientific publications
Most of the work in automatic publication analysis is related to scientometrics, which has been defined as the “quantitative study of science, communication in science, and science policy”. In this paper, publication mining is not strictly a question of scientometrics; rather, the objective is to build knowledge from a large set of publications.
Shibata et al. use citation network analysis in order to detect emerging research. Small et al. also use direct citation and co-citation analysis in order to identify emerging topics. Unfortunately, this information is not necessarily provided in digital databases for large quantities of documents. Buscaldi et al. mine scholarly publications to build scientific knowledge graphs. As in our approach, the authors use existing natural language processing and mining tools; however, they focus on the textual parts of the publications. Ronzano and Saggion developed a platform to automatically extract and enrich structural and semantic aspects of scientific publications. Their approach also focuses on the textual content, and their applications are related to rhetorical sentence classification and extractive text summarization. Mothe et al. present a platform to mine scientific publications. In their paper, they focus on detecting the main collaborations and on the geographical structure of a domain.
3 ANALYSIS FRAMEWORK
3.1 COVID-19 scientific publication set
Recently, publishers have released the COVID-19 Open Research Dataset. It is available at https://pages.semanticscholar.org/coronavirus-research. This data set consists of multiple files. Among them, the metadata file (60 MB) is a CSV file corresponding to 44,270 research articles with links to PubMed, Microsoft Academic and the WHO COVID-19 database of publications. The fields of the records are as follows: title, DOI, abstract, date of publication, authors, journal, as well as internal document ids (PMC ID, PUBMED ID, Microsoft Academic Paper ID, WHO ID) and whether the full text is available or not. While the metadata file is a rich source of information, other pieces of information that are missing from it, such as the affiliation of the authors, can be very useful. For this reason, we also considered a set that is more complete with regard to the attributes provided.
We chose to focus on the documents from PubMed only, which is known to be a reliable source. It does not contain all the 44k scientific papers from the COVID-19 Open Research Dataset but about 24,000 papers. The query used to search the PubMed collection was the same as the one used for the aforementioned data set:
"COVID-19" OR "covid19" OR "Coronavirus" OR "Corona virus" OR "2019-nCoV" OR "SARS-CoV" OR "MERS-CoV" OR "Severe Acute Respiratory Syndrome" OR "Middle East Respiratory Syndrome"
3.2 A few elements of the document collection
A few statistics are presented in Table 1, and the top 7 authors (the ones that authored the largest number of publications in the analyzed collection) are listed in Table 2. On average, there are . authors per publication. Not all the publications come with an abstract (72% have an abstract, for a total of 17,116). Table 3 lists the most frequent venues where the scientific papers have been published.
Table 1. A few statistics on the data collection.
Label        Number   Occur. at least twice   Occur. at least 10 times
Publication  23,784
Abstract     17,116
Author       75,147   54                      48
Table 2. Authors (full author names) with more than 100 publications in the collection.
Authors
YUEN, KWOK-YUNG (Univ. of Hong-Kong)
PERLMAN, STANLEY (Univ. of Iowa)
DROSTEN, CHRISTIAN (Berlin Institute of Virology, Germany)
BARIC, RALPH S. (Univ. of North Carolina, USA)
MEMISH, ZIAD A (Alfaisal University, Saudi Arabia)
JIANG, SHIBO (New York Blood Center, USA)
ENJUANES, LUIS (Campus Univ. Autónoma de Madrid, Spain)
Table 3. Most frequent publication venues in the collection.
Venues
JOURNAL OF VIROLOGY
ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY
VIROLOGY
EMERGING INFECTIOUS DISEASES
BMJ (CLINICAL RESEARCH ED.)
JOURNAL OF MEDICAL VIROLOGY
THE JOURNAL OF GENERAL VIROLOGY
3.3 Topics of interest
While researchers have their topics of interest driven by their funding, projects and research, topics of interest on COVID-19 also come from civil society.
It is worth mentioning that NIST/TREC has formed a joint effort called TREC-COVID (https://ir.nist.gov/covidSubmit/). Like the other TREC tracks, TREC-COVID aims at gathering research teams in information retrieval to evaluate search engines on specific tasks. TREC, the Text Retrieval Conference (trec.nist.gov), is co-sponsored by the National Institute of Standards and Technology (NIST) and the U.S. Department of Defense and supports research for large-scale evaluation of text retrieval methodologies. In mid-April 2020, TREC-COVID released a set of 30 topics of interest. A topic of interest is, for example, “Coronavirus origin” or “early symptoms”.
These topics also echo the most popular questions web searchers are interested in. Google Trends mentions "Where did coronavirus start?" and "How to know if you have coronavirus" among the most asked questions.
Finally, these topics also echo the ones that have been considered in some related work papers, which generally target the COVID-19 origins, its symptoms, spreading, risk factors and treatments.
4 METHODOLOGY
4.1 Overview
The information we use is the data as collected. While automatic analysis helps in handling large quantities of publications, the conclusions drawn have to be handled with caution because there is no manual analysis of the content and no checking. Moreover, we did not resolve content anomalies such as spelling variants of entities (e.g. author names). There are also missing values that we neither considered nor resolved (see Table 1).
The methods we use are usual data mining tools such as frequency analysis, graph-based visualization and factorial analysis. We cross meta-data with content-based information from free text, as detailed later on. The analysis considers both meta-data, such as the source, publication date and author fields, and free-text data, such as the title and abstract fields. With regard to the titles and abstracts, we consider both single words and phrases that we automatically extracted.
4.2 Keyphrase extraction
Different methods have been developed to extract keyphrases from free text; among the most popular are graph-based extraction, co-occurrence-based methods and, more recently, embeddings.
In our approach, we use n-gram word extraction where we skip stop words. More precisely, we extract the most frequent n-grams after stop word removal, but without stemming, in order to keep more precise semantics. We also consider an initial lexicon of compound terms (as written by the authors, e.g. “anti-malaria” or “animal-origin”) as initial phrases, which are enriched by the extracted n-grams.
4.3 Graph-based visualization
Graphs are among the most used visualization tools in the literature, as linking concepts or objects is the most common mining technique. Graph-based visualizations are widely used to visualize bibliometric networks.
In this paper, we mainly use bipartite graphs. A bipartite graph is a graph whose vertices (nodes) can be divided into two disjoint and independent sets and where each edge connects a node of one set to a node of the other. A bipartite graph does not contain any odd-length cycle (Wikipedia). This type of representation is also widely used for document analysis and visualization. In this paper, bipartite graphs are used to visualize the results of crossing meta-data and keyphrases extracted from publications.
4.4 Process
In this paper, we developed two different processes. The first one can be applied to focus on any specific sub-topic of interest; this is the process we applied to the "Origin" of the virus sub-topic (see Section 5). The second process is related to the latest research, and we apply it to detect some topic clusters (see Section 6).
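Both processes take as input the keyphrases extracted as described in Section 4.2. As a point of reference, that extraction (frequent n-grams after stop word removal, without stemming) could be sketched as follows. This is only an illustrative sketch, not the tool used in this study: the stop-word list, the tokenizer, the function names and the choice to count each n-gram once per publication are all assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would need a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "for", "with", "from"}


def tokenize(text):
    """Lowercase the text and keep hyphenated compounds (e.g. 'bat-origin') as one token."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def extract_ngrams(texts, n=2, top_k=10):
    """Return the top_k most frequent n-grams after stop-word removal, without stemming."""
    counts = Counter()
    for text in texts:
        tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
        # Assumption: an n-gram is counted once per publication (document frequency).
        grams = {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        counts.update(grams)
    return counts.most_common(top_k)


# Toy titles, for illustration only.
titles = [
    "Bat origin of human coronaviruses",
    "A novel bat-origin coronavirus",
    "The bat origin of the SARS coronavirus",
]
top_bigrams = extract_ngrams(titles, n=2, top_k=1)
```

Because stop words are removed before the n-grams are formed, "bat origin of human" and "bat origin of the SARS" both contribute the bigram "bat origin", while the compound "bat-origin" written by the authors stays a single token.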
Getting insights on a specific topic
This process consists of three steps:
• Select the keyphrases related to the topic of interest. This can be computer-assisted by considering strings of characters (e.g. "ORIGIN" is a relevant character string to extract many relevant keyphrases such as "BAT-ORIGIN", "HUMAN-ORIGIN", ...); stems of the topic word(s) are a good start. The automatically obtained list should then be manually checked in order to remove non-relevant terms (e.g. "ORIGINALITY").
• Build a bipartite graph where vertices are keyphrases on the one hand and publication identifiers on the other hand. This first representation provides a quick overview of the use of the terms: are the terms shared among publications, or are they rather partitioned (each publication focusing on one of the aspects)? It is also a means to go directly to the associated publications.
• For one specific term, build the bipartite graph where the other vertices are author names. This graph can be weighted by the number of publications of each author that mention the term. This graph shows the most "important" authors related to that term (who are likely to be specialists). Such a graph can also be built considering several related terms at the same time when terms are not independent in publications (cf. previous step). The latter graph makes it possible to highlight the authors that tackle several aspects of the topic.
Other meta-data could also be crossed to include additional steps in that process (e.g. considering the authors' affiliations or affiliation countries). This step was not included in this paper.
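The three steps above can be illustrated with a minimal sketch using plain Python data structures. The records, publication identifiers, author names and function names below are hypothetical and only serve the illustration; they are not the data or the software used in this study.

```python
from collections import defaultdict

# Hypothetical toy records: (publication id, authors, keyphrases of title/abstract).
records = [
    ("PUB-1", ["AUTHOR A", "AUTHOR B"], {"BAT-ORIGIN", "ANIMAL-ORIGIN"}),
    ("PUB-2", ["AUTHOR A"], {"BAT-ORIGIN"}),
    ("PUB-3", ["AUTHOR C"], {"HUMAN-ORIGIN", "ORIGINALITY"}),
]


def select_terms(vocabulary, pattern, rejected=()):
    """Step 1: keep keyphrases containing the character string, minus manually rejected ones."""
    return {term for term in vocabulary if pattern in term and term not in rejected}


def term_publication_edges(records, terms):
    """Step 2: edges (keyphrase, publication id) of the bipartite graph."""
    return {(term, pub) for pub, _, phrases in records for term in phrases & terms}


def term_author_weights(records, terms):
    """Step 3: weighted bipartite graph; the weight of (term, author) is the
    number of publications by that author that mention the term."""
    weights = defaultdict(int)
    for _, authors, phrases in records:
        for term in phrases & terms:
            for author in authors:
                weights[(term, author)] += 1
    return dict(weights)


vocabulary = set().union(*(phrases for _, _, phrases in records))
terms = select_terms(vocabulary, "ORIGIN", rejected={"ORIGINALITY"})
edges = term_publication_edges(records, terms)
weights = term_author_weights(records, terms)

# Publications linked to more than one selected term (shared rather than partitioned use).
terms_per_pub = defaultdict(set)
for term, pub in edges:
    terms_per_pub[pub].add(term)
multi_term_pubs = {pub for pub, linked in terms_per_pub.items() if len(linked) > 1}
```

In this toy example, "ORIGINALITY" matches the character string but is removed by the manual check, and PUB-1 is the only publication linked to several ORIGIN terms.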
Getting the latest research topic of interest
This process also consists of three steps:
• Extract the latest terminology: these are the keyphrases that occur in the latest year(s) but not before. They are obtained by crossing the keyphrases and the year of the publications the keyphrases occur in. We also consider the occurrence frequency when selecting them.
• Extract highly connected sub-graphs (“communities” of keyphrases). This step results in sub-topics of interest consisting of terms of various natures but often used together in recent publications.
• Build bipartite graphs where vertices of the first type are the terms from one community and vertices of the second type are publication identifiers, in order to identify the relevant publications with regard to the group of terms.
This process is used to get the latest research topics and associated publications.
5 ORIGINS
The topic of interest we tackle in this section is related to the COVID-19 origins and uses the first process as described in Section 4.
Figure 1(a) displays the terms related to the ORIGINs of the virus (extracted from the publication titles and abstracts when available) as well as their frequency. For example, 33 publications study BAT-ORIGIN and 7 AVIAN-ORIGIN. We also used these terms to extract the associated publications. These terms are used in the graph from Figure 1(c) as blue nodes. Figure 1(b) displays the PMID of the publications that mention "BAT-ORIGIN".
Figure 1(c) is a bipartite graph where blue nodes are the ORIGIN terms while the red nodes are the publication identifiers. This graph is not fully linked since some publications mention only one of these "ORIGIN" terms. From this graph, we can also identify the publications that mention several of the ORIGIN terms, such as PMID-26689940, 31996437 and 23719724, which mention both "ANIMAL-ORIGIN" and "BAT-ORIGIN". Another example is PMID-16012012, which mentions both "HUMAN-ORIGIN" and "ANIMAL-ORIGIN" (see Figure 1(c)).
Figure 1. Focus on ORIGIN: (a) Terms related to the ORIGINs of the virus; (b) Publication PMID that mention BAT-ORIGIN; (c) Graph of the ORIGIN terms and associated PUB-ID.
Figure 2. Authors associated to (a) BAT-ORIGIN term (b) ORIGIN terms from the ones listed in Figure 1.
With regard to the publications associated with the BAT-ORIGIN term in Figure 1(b), we can mention Ren et al. (PMID-32004165), who report a novel bat-origin CoV causing severe and fatal pneumonia in humans. The identified virus is phylogenetically closest to a bat SARS-like CoV. PMID-31996413 employs a capture-based NGS approach for virus discovery. Since SARS-CoV and MERS-CoV both originated from bats, active surveillance is recommended in this paper. Yang et al. (PMID-31554686) study the potential cross-species transmissibility of SADS-CoV. PMID-30533848 studies the full-length genome sequence of a novel swine acute diarrhea syndrome coronavirus (SADS-CoV), CH/FJWT/2018, which is closely related to CN/GDWT/2017. A link is also made with bat-origin SADS-related coronaviruses. Wang et al. (PMID-29680361) report cross-species transmission due to a large number of mutations on the receptor-binding; here, novel bat-origin coronaviruses found in pigs are considered. Considering the publications that mention the BAT-ORIGIN of COVID, the oldest paper was published in 2006, 11 were published in 2020, 2 in 2019 and 4 in 2018.
We also looked at the associated authors (see Figure 2(a)). In this figure, the only blue node is the BAT-ORIGIN term, while the red nodes are authors of publications that mention this term. The value on the link indicates the number of publications a researcher authored that mention BAT-ORIGIN. Figure 2(b) shows, for those authors that mention BAT-ORIGIN, the other ORIGIN terms they also mention. The thickness of the links, as well as the size of the nodes, indicates the number of publications.
Figure 3. Examples of most recent terms and semantic network: (a) Chloroquine-treatment term network; (b) Obesity Risk Factor network.
6 LATEST RESEARCH
While in the previous section we did not consider the year of publication, in this section we focus on the latest research and the associated terminology. We consider only the publications that were published in the last 30 years (1991 to 2020). We then keep only the phrases or automatically extracted keywords from the titles and abstracts (see Section 3) that mainly occur in 2020. More precisely, to be kept, a keyphrase must have at least 80% of its occurrences in 2020. These keyphrases (there are about 1,500) are thus the keyphrases of current interest. We then built a graph where nodes (keyphrases) are linked when they co-occur in a publication. This is a weighted graph (the more two keyphrases co-occur, the higher the weight). We then extracted clusters of closely related terms as communities (nodes that are highly inter-connected and weakly connected with other nodes), considering different focuses. The example in Figure 3(a) focuses on the Chloroquine-treatment while Figure 3(b) is related to the Obesity risk factor. From these figures, we can also identify the related terms.
Figure 4. Various "ANTI-" uses.
COVID-19 is an infectious disease caused by SARS-CoV-2, and several papers investigate the use of existing anti-viral treatments. For example, some of the publications mention "anti-" terms (e.g. anti-flu, anti-malaria, ...). We thus had a closer look at the network related to treatments used for other diseases, considering various "anti-" terms as shown in Figure 4. As we did in Section 5 and illustrated in Figure 1(c), we look at the publications associated with these terms. We found 33 publications directly related to these terms. The PMID of these publications as well as related terms are presented in Figure 5.
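The selection of recent keyphrases and the clustering described at the start of this section can be sketched as follows. This is a minimal sketch under stated assumptions: the toy publications, the function names and the minimum-frequency parameter are hypothetical, and the clustering shown here (connected components of the graph restricted to its heaviest edges) is only a simple stand-in for community detection, not the algorithm used to produce the figures.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy input: (publication year, keyphrases of the publication).
publications = [
    (2020, {"chloroquine", "hydroxychloroquine", "obesity", "treatment"}),
    (2020, {"chloroquine", "hydroxychloroquine", "treatment"}),
    (2020, {"obesity", "risk factor"}),
    (2020, {"obesity", "risk factor"}),
    (2015, {"treatment", "vaccine"}),
    (2012, {"treatment"}),
]


def recent_keyphrases(publications, year=2020, min_share=0.8, min_count=2):
    """Keep keyphrases with at least min_share of their occurrences in `year`.
    min_count is an assumed minimum number of publications mentioning the keyphrase."""
    total, recent = Counter(), Counter()
    for pub_year, phrases in publications:
        total.update(phrases)
        if pub_year == year:
            recent.update(phrases)
    return {p for p in total
            if total[p] >= min_count and recent[p] / total[p] >= min_share}


def cooccurrence_graph(publications, keep):
    """Weighted graph: weight of (a, b) = number of publications where both occur."""
    weights = Counter()
    for _, phrases in publications:
        for a, b in combinations(sorted(phrases & keep), 2):
            weights[(a, b)] += 1
    return weights


def clusters(weights, min_weight=2):
    """Stand-in for community extraction: connected components over edges
    whose weight is at least min_weight."""
    adjacency = defaultdict(set)
    for (a, b), weight in weights.items():
        if weight >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        result.append(component)
    return result


keep = recent_keyphrases(publications)
weights = cooccurrence_graph(publications, keep)
communities = clusters(weights)
```

In this toy example, "treatment" is discarded because only half of its occurrences are in 2020, and the remaining keyphrases split into a chloroquine group and an obesity group. A modularity-based community detection method could replace `clusters` without changing the two other steps.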
Figure 5. PMID of the publications that mention an "ANTI-" term and the other related terms.
From Figure 5, we can see that some of the papers are more connected than others, suggesting they potentially mention more topics of interest. For example, PMID 32277367, which is marked with a black square in the bottom-right part of Figure 5, was published on the 10th of April in J Neurooncol. It results from a collaborative work between India, Russia, and the UK and concludes that “it may be cautiously recommended to continue glucocorticoids and other disease-modifying antirheumatic drugs (DMARDs) in patients receiving these therapies, with discontinuation of DMARDs during infections as per standard practice”. This paper also mentions various pieces of evidence that suggest potential benefits of various drugs. Conversely, we can see in the same figure that some keywords are more shared than others. As an example, anti-rheumatic is mentioned in 9 of the displayed publications.
Among these publications, we can mention Perricone et al. (PMID-32317220), who discuss the anti-viral aspect of immunosuppressants in the search for a potential treatment for SARS-CoV-2 infection. Lehrer et al. (PMID-32313883) discuss the effects of biguanides on influenza and coronavirus. Kumar et al. (PMID-32313660) study antibody therapy as an immediate strategy for emergency prophylaxis and SARS-CoV-2 therapy. Song et al. (PMID-32314010) report a case of COVID-19 pneumonia in a 61-year-old female patient with rheumatoid arthritis; she was treated with antiviral agents (lopinavir/ritonavir), and treatment with cDMARDs was discontinued except for hydroxychloroquine. Her symptoms gradually improved and, three weeks later, real-time PCR for COVID-19 showed negative conversion.
7 CONCLUSION
This paper presents two analytical processes to mine scientific papers, illustrated on COVID-19 scientific publications. The results are knowledge graphs of various natures that help in getting insights into specific subtopics or recent research topics. While scientific papers are reliable sources of knowledge on COVID-19, other sources of information, such as the World Health Organization reports, would also be worth analysing using the same type of methodology.
REFERENCES
Alsallakh, B., Micallef, L., Aigner, W., Hauser, H., Miksch, S., Rodgers, P., 2014. Visualizing sets and set-typed data: State-of-the-art and future challenges, in: Eurographics Conference on Visualization (EuroVis), pp. 1–22.
Beliga, S., 2014. Keyword extraction: a review of methods and approaches. University of Rijeka, Department of Informatics, Rijeka, 1–9.
Boudin, F., 2013. A comparison of centrality measures for graph-based keyphrase extraction, in: Proceedings of the sixth international joint conference on natural language processing, pp. 834–838.
Boudin, F., 2018. Unsupervised keyphrase extraction with multipartite graphs. arXiv preprint arXiv:1803.08721.
Buscaldi, D., Dessì, D., Motta, E., Osborne, F., Recupero, D.R., 2019. Mining scholarly publications for scientific knowledge graph construction, in: European Semantic Web Conference, Springer. pp. 8–12.
Carvalho, T., 2020. Covid-19 research in brief: 18 april to 24 april, 2020. Nature Medicine, News, 24 April.
Crossno, P.J., Wilson, A.T., Shead, T.M., Dunlavy, D.M., 2011. Topicview: Visually comparing topic models of text collections, in: 2011 ieee 23rd international conference on tools with artificial intelligence, IEEE. pp. 936–943.
Dousset, B., Mothe, J., 2020. A short study on covid-19 open research scientific papers. Internal Report, March.
van Eck, N.J., Waltman, L., 2014. Visualizing Bibliometric Networks. Springer International Publishing, Cham. chapter 13. pp. 285–320. URL: https://doi.org/10.1007/978-3-319-10377-8_13.
Feng, Q., Liu, Y., Qu, X., Deng, H., Ding, M., Lau, T.L., Yu, A.C.H., Chen, J., 2006. Baculovirus surface display of sars coronavirus (sars-cov) spike protein and immunogenicity of the displayed protein in mice models. DNA and cell biology 25, 668–673.
Harapan, H., Itoh, N., Yufika, A., Winardi, W., Keam, S., Te, H., Megawati, D., Hayati, Z., Wagner, A.L., Mudatsir, M., 2020. Coronavirus disease 2019 (covid-19): A literature review. Journal of Infection and Public Health.
Hasan, K.S., Ng, V., 2014. Automatic keyphrase extraction: A survey of the state of the art, in: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1262–1273.
Hess, D.J., 1997. Science studies: An advanced introduction. NYU press.
Hu, B., Ge, X., Wang, L.F., Shi, Z., 2015. Bat origin of human coronaviruses. Virology journal 12, 221.
Huang, A.T., Garcia-Carreras, B., Hitchings, M.D., Yang, B., Katzelnick, L., Rattigan, S.M., Borgert, B., Moreno, C., Solomon, B.D., Rodriguez-Barraquer, I., Lessler, J., Salje, H., Burke, D.S., Wesolowski, A., Cummings, D.A., 2020. A systematic review of antibody mediated immunity to coronaviruses: antibody kinetics, correlates of protection, and association of antibody responses with severity of disease. medRxiv.
Kaur, J., Gupta, V., 2010. Effective approaches for extraction of keywords. International Journal of Computer Science Issues (IJCSI) 7, 144.
Kumar, G., Jeyanthi, V., Ramakrishnan, S., 2020. A short review on antibody therapy for covid-19. New Microbes New Infect.
Lai, T.S.T., Keung Ng, T., Seto, W.H., Yam, L., Law, K.I., Chan, J., 2005. Low prevalence of subclinical severe acute respiratory syndrome-associated coronavirus infection among hospital healthcare workers in hong kong. Scandinavian journal of infectious diseases 37, 500–503.
Lehrer, S., 2020. Inhaled biguanides and mtor inhibition for influenza and coronavirus. World Academy of Sciences Journal.
Li, B., Si, H.R., Zhu, Y., Yang, X.L., Anderson, D.E., Shi, Z.L., Wang, L.F., Zhou, P., 2020. Discovery of bat coronaviruses through surveillance and probe capture-based next-generation sequencing. Msphere 5.
Li, K., Li, H., Bi, Z., Gu, J., Gong, W., Luo, S., Zhang, F., Song, D., Ye, Y., Tang, Y., 2018. Complete genome sequence of a novel swine acute diarrhea syndrome coronavirus, ch/fjwt/2018, isolated in fujian, china, in 2018. Microbiol Resour Announc 7, e01259–18.
Lorusso, E., Mari, V., Losurdo, M., Lanave, G., Trotta, A., Dowgier, G., Colaianni, M.L., Zatelli, A., Elia, G., Buonavoglia, D., et al., 2019. Discrepancies between feline coronavirus antibody and nucleic acid detection in effusions of cats with suspected feline infectious peritonitis. Research in veterinary science 125, 421–424.
Misra, D.P., Agarwal, V., Gasparyan, A.Y., Zimba, O., 2020. Rheumatologists’ perspective on coronavirus disease 19 (covid-19) and potential therapeutic targets. Clinical Rheumatology, 1–8.
Mothe, J., Chrisment, C., Dkaki, T., Dousset, B., Karouach, S., 2006. Combining mining and visualization tools to discover the geographic structure of a domain. Computers, environment and urban systems 30, 460–484.
Mothe, J., Ramiandrisoa, F., Rasolomanana, M., 2018. Automatic keyphrase extraction using graph-based methods, in: Proceedings of the 33rd Annual ACM Symposium on Applied Computing, pp. 728–730.
Osabe, Y., Jibu, M., 2018. Introductory chapter: Scientometrics. Scientometrics, 1.
Perricone, C., Triggianese, P., Bartoloni, E., Cafaro, G., Bonifacio, A.F., Bursi, R., Perricone, R., Gerli, R., 2020. The anti-viral facet of anti-rheumatic drugs: Lessons from covid-19. Journal of Autoimmunity, 102468.
Ren, L.L., Wang, Y.M., Wu, Z.Q., Xiang, Z.C., Guo, L., Xu, T., Jiang, Y.Z., Xiong, Y., Li, Y.J., Li, X.W., et al., 2020. Identification of a novel coronavirus causing severe pneumonia in human: a descriptive study. Chinese medical journal.
Ronzano, F., Saggion, H., 2016. Knowledge extraction and modeling from scientific publications, in: International workshop on semantic, analytics, visualization, Springer. pp. 11–25.
Sahrawat, D., Mahata, D., Zhang, H., Kulkarni, M., Sharma, A., Gosangi, R., Stent, A., Kumar, Y., Shah, R.R., Zimmermann, R., 2020. Keyphrase extraction as sequence labeling using contextualized embeddings, in: Jose, J.M., Yilmaz, E., Magalhães, J., Castells, P., Ferro, N., Silva, M.J., Martins, F. (Eds.), Advances in Information Retrieval, Springer International Publishing, Cham. pp. 328–335.
Shibata, N., Kajikawa, Y., Takeda, Y., Sakata, I., Matsushima, K., 2009. Detecting emerging research fronts in regenerative medicine by citation network analysis of scientific publications, in: PICMET’09-2009 Portland International Conference on Management of Engineering & Technology, IEEE. pp. 2964–2976.
Small, H., Boyack, K.W., Klavans, R., 2014. Identifying emerging topics in science and technology. Research policy 43, 1450–1467.
Song, J., Kang, S., Choi, S.W., Seo, K.W., Lee, S., So, M.W., Lim, D.H., 2020. Coronavirus disease 19 (covid-19) complicated with pneumonia in a patient with rheumatoid arthritis receiving conventional disease-modifying antirheumatic drugs. Rheumatology International, 1.
Van Eck, N.J., Waltman, L., 2017. Citation-based clustering of publications using citnetexplorer and vosviewer. Scientometrics 111, 1053–1070.
Wan, Y., Shang, J., Graham, R., Baric, R.S., Li, F., 2020. Receptor recognition by the novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars coronavirus. Journal of virology 94.
Wang, L., Su, S., Bi, Y., Wong, G., Gao, G.F., 2018. Bat-origin coronaviruses expand their host range to pigs. Trends in microbiology 26, 466–470.