Computational modeling of Human-nCoV protein-protein interaction network

Abstract

COVID-19 has created a global pandemic with high morbidity and mortality in 2020. Novel coronavirus (nCoV), also known as Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV2), is responsible for this deadly disease. The International Committee on Taxonomy of Viruses (ICTV) has declared that nCoV is highly genetically similar to the SARS-CoV of the 2003 epidemic (89% similarity). Only a limited amount of clinically validated Human-nCoV protein interaction data is available in the literature. Building on this similarity, the present work focuses on developing a computational model for the nCoV-Human protein interaction network, using the experimentally validated SARS-CoV-Human protein interactions. Initially, level-1 and level-2 human spreader proteins are identified in the SARS-CoV-Human interaction network using the Susceptible-Infected-Susceptible (SIS) model. These proteins are considered as potential human targets for nCoV bait proteins. A gene-ontology based fuzzy affinity function has been used to construct the nCoV-Human protein interaction network at 99.98% specificity threshold. This also identifies the level-1 human spreaders for COVID-19 in the human protein-interaction network. Level-2 human spreaders are subsequently identified using the SIS model. The derived host-pathogen interaction network is finally validated using 7 potential FDA listed drugs for COVID-19, with significant overlap between the known drug target proteins and the identified spreader proteins.


Sovan Saha, Anup Kumar Halder, Soumyendu Sekhar Bandyopadhyay, Piyali Chatterjee, Mita Nasipuri, and Subhadip Basu

Dr. Sudhir Chandra Sur Degree Engineering College, Computer Science and Engineering, Kolkata, 700074, India
Jadavpur University, Computer Science and Engineering, Kolkata, 700032, India
Netaji Subhash Engineering College, Computer Science and Engineering, Kolkata, 700152, India

* [email protected]
+ Equal Contribution, both shared 1st authorship

Introduction

COVID-19 evolved in the Chinese city of Wuhan (Hubei province). The first case of human infection by nCoV was observed on 31 December 2019. Soon it extended its adverse effect to almost all nations within a very short span of time. The World Health Organization (WHO) observed that the massive outbreak of nCoV is mainly due to mass community spreading and declared a global health emergency on 30 January 2020. After proper assessment, WHO presumed its fatality rate to be 4%, which urges global researchers to work together to discover a proper treatment for this pandemic [6, 7].

Coronaviridae is the family to which a coronavirus belongs. Besides human beings, it also infects birds and mammals. Although the common symptoms of coronavirus infection are common cold, cough, etc., it can be accompanied by severe acute respiratory disease along with multiple organ failure leading to human death. Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS) are the two major outbreaks, in 2003 and 2012 respectively, before SARS-CoV2. The source of origin of SARS was located in Southern China. Its fatality rate was within 14%-15%, with 774 deaths among 8804 affected cases. Saudi Arabia was marked as the base for the commencement of MERS, and 858 of 2494 infected persons died of it. It thus generated a much higher fatality rate of 34.4% compared to that of SARS. All three epidemic creators, SARS, MERS and SARS-CoV2, are biologically included in the genus Betacoronavirus under the family Coronaviridae. Both structural and non-structural proteins are involved in the formation of SARS-CoV2. Of the two, structural proteins like the envelope (E) protein, membrane (M) protein, nucleocapsid (N) protein and the spike (S) protein play a major role in transmitting the disease by binding with the receptors after entering the human body. No clinically proven and approved vaccine for nCoV is available to date, though researchers have been trying hard to develop one. So there is an urgent need to understand and analyse the mechanism of disease transmission of this new virus.

Host-pathogen protein-protein interaction networks (PPIN) are very significant for understanding the mechanism of transmission of infection, which is essential for the development of new and more effective therapeutics, including rational drug design. Progression of infection and disease results from the interaction of proteins between pathogen and host. The pathogen plays an active role in spreading infection. It has proved to be an acute threat to human lives as it has the mutative capability to adapt itself to drugs. Pathogen-host PPIN permits pathogenic microorganisms to utilize host capabilities by manipulating host mechanisms in order to escape the immune responses of the host. Detection of target proteins through the analysis of pathogen and host PPIN is the central point of this research study [14, 15]. Topologically significant proteins having a higher degree of interactions are generally found to be essential drug targets. However, proteins that have fewer interactions, or that are topologically not significant, may be involved in the mechanism of infection because of some biological pathway relevance. Clinically validated Human-nCoV protein interaction data is, however, limited in the current literature. This has motivated us to develop a new computational model for the nCoV-Human PPI network. We have subsequently validated the proteins involved in the host-pathogen interactions with respect to potential Food and Drug Administration (FDA) drugs for COVID-19 treatment. Key aspects of this research work are highlighted below:

• It has been reported that SARS-CoV has ∼89% genetic similarity with nCoV [16, 17].
• The SARS-CoV-Human protein-protein interaction network has also been studied widely and is available in the literature.
• Recently we have developed a computational model to identify potential spreader proteins in a Human-SARS CoV interaction network using the SIS model.
• Sequence information of some of the nCoV proteins has been released.
• Gene ontological (GO) information (Biological Process (BP), Molecular Function (MF), Cellular Component (CC)) of some of the nCoV proteins is also available [22, 23].
• Recently we have also developed a method to predict interaction affinity between proteins from the available GO graph.
• Assessment of interaction affinity between nCoV proteins and potential human target/bait proteins, which are susceptible to SARS-CoV infection, has been done.
• Fuzzy affinity thresholding is done to detect the high quality nCoV-Human PPIN. The selected human proteins are considered as level-1 human spreader nodes of nCoV.
• Level-2 spreader nodes in the nCoV-Human PPIN are detected using spreadability index and validated by the SIS model [21, 25].
• Validation of our developed model is done with respect to the target proteins of the potential FDA drugs for COVID-19 treatment.

Results

Our developed computational model of nCoV-Human PPIN contains high quality interactions (HQI), identified by fuzzy affinity thresholding, and spreader proteins, identified by spreadability index and validated by the SIS model. As in any computational model, the sources of input and the generated results play a crucial role in our proposed model.

Overview of the data sets

SARS-CoV-Human PPIN serves as a baseline for our model. The potential level-1 and level-2 human spreaders of SARS-CoV become the candidate set for selecting level-1 human spreaders of SARS-CoV2. Various datasets have been curated for this purpose, as outlined below:

Human PPIN

The dataset [27, 28] consists of all possible interactions between human proteins that are experimentally documented in humans. Human proteins are represented as nodes while the physical interactions between proteins are represented by edges. It is a collection of 21557 nodes and includes 342353 edges/interactions.
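The node-and-edge representation described above can be sketched as a plain adjacency map. This is a minimal illustration only: the function names and the toy protein labels are assumptions made for the example and are not taken from the dataset or from the authors' implementation.

```python
from collections import defaultdict


def build_ppin(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def count_nodes_and_edges(adjacency):
    """Return (number of proteins, number of distinct undirected interactions)."""
    n_nodes = len(adjacency)
    n_edges = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    return n_nodes, n_edges


# Toy edge list with made-up labels; the reversed duplicate collapses to one edge.
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4"), ("P2", "P1")]
toy_ppin = build_ppin(toy_edges)
print(count_nodes_and_edges(toy_ppin))  # (4, 4)
```

Applied to the full edge list, the same two counts would be expected to correspond to the node and edge totals quoted for the dataset.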

SARS-CoV PPIN

The dataset consists of interactions between SARS-CoV proteins. It contains 7 unique proteins and 17 interacting edges, of which only the densely connected proteins are considered rather than the isolated ones, since the former play a more active role in transmission of infection than the latter.

SARS-CoV-Human PPIN

The dataset comprises 118 interactions between SARS-CoV and human proteins. It is used to fetch the level-1 human interactions of SARS-CoV.

Figure 1.

Computational model for the selection of spreader nodes in Human-SARS CoV PPIN by spreadability index. Red colored nodes represent SARS-CoV proteins while blue colored nodes are the selected spreader nodes in it. Deep green colored nodes represent level-1 human connected proteins with SARS-CoV proteins while yellow colored nodes represent the selected human spreaders in it. Light green colored nodes represent level-2 human spreaders of SARS-CoV.

SARS-CoV2 Proteins

This data is collected from the pre-released dataset of available SARS-CoV2 proteins from UniProtKB (https://covid-19.uniprot.org/), which includes 14 reviewed SARS-CoV2 proteins.

GO Graph and Protein-GO annotations

Three types (CC, MF and BP) of GO graph are collected from the GO Consortium [23, 29] (http://geneontology.org/). The protein to GO-annotation map is retrieved from the UniProtKB database.
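As a rough illustration of how these two inputs fit together, the sketch below stores one GO graph as a child-to-parents map and one annotation map from proteins to GO terms, then derives term ancestors and the GO term pairs of a protein pair. All identifiers (`TOY_PARENTS`, `TOY_ANNOTATIONS`, the `GO:A` to `GO:E` labels, the function names) are hypothetical placeholders, and the plain common ancestors computed here are a simplification: they are not the disjunctive common ancestors used in the Methods.

```python
from itertools import product

# Hypothetical toy DAG for a single ontology: child term -> set of parent terms.
TOY_PARENTS = {
    "GO:B": {"GO:A"},
    "GO:C": {"GO:A"},
    "GO:D": {"GO:B", "GO:C"},
    "GO:E": {"GO:C"},
}

# Hypothetical protein -> annotated GO terms map for the same ontology.
TOY_ANNOTATIONS = {
    "viral_protein": {"GO:D"},
    "human_protein": {"GO:B", "GO:E"},
}


def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def common_ancestors(term_a, term_b, parents):
    """Terms that are ancestors of (or equal to) both input terms."""
    closure_a = ancestors(term_a, parents) | {term_a}
    closure_b = ancestors(term_b, parents) | {term_b}
    return closure_a & closure_b


def go_term_pairs(protein_a, protein_b, annotations):
    """Every (term of protein_a, term of protein_b) combination, in sorted order."""
    return list(product(sorted(annotations[protein_a]), sorted(annotations[protein_b])))


print(sorted(ancestors("GO:D", TOY_PARENTS)))
print(sorted(common_ancestors("GO:D", "GO:E", TOY_PARENTS)))
print(go_term_pairs("viral_protein", "human_protein", TOY_ANNOTATIONS))
```

In practice one such parent map would be kept per ontology (CC, MF and BP), so that term pairs are only compared within the same graph.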

Potential COVID-19 FDA drugs

Seven potential FDA drugs: Lopinavir, Ritonavir, Hydroxychloroquine [32, 33], Azithromycin, Remdesivir, Favipiravir [37, 38] and Darunavir have been identified from the DrugBank published white paper and are used for validation in our proposed model.

Selection of spreader nodes in Human-SARS CoV interaction network using spreadability index validated by SIS model:

SARS-CoV-Human PPIN (up to level-2) is formed by combining the SARS-CoV-Human and Human-Human PPIN datasets. The SARS-CoV-Human dataset gives the direct level-1 human interactions of SARS-CoV, while the Human-Human PPIN dataset is used to fetch the corresponding level-2 human interactions. Potential spreader nodes are identified using spreadability index, which has been validated by the SIS model (see Figure 1). The selected spreader nodes in SARS-CoV-Human PPIN are highlighted in Table S1, Table S2 and Table S3 in the supplementary document. The network view of SARS-CoV-Human PPIN at each level and at various selected thresholds is also available online (SARS-CoV Level-1 human spreaders, Level-1 & Level-2:high human spreaders at high threshold of spreadability index, and Level-1 & Level-2:low human spreaders at low threshold of spreadability index).

Figure 2.

Schematic diagram of Fuzzy PPI model. A) The fuzzy PPI model finds the interaction affinity between the SARS-CoV2 and human proteins (L1 and L2 spreaders of SARS-CoV) using gene ontological information. B) All GO pair-wise interaction affinities are assessed from three independent GO-relationship graphs (CC, MF and BP), and the fuzzy binding affinity of a protein pair is computed from all three pairwise scores of all GO-pair affinities. C) Heatmap representation of Fuzzy PPI score. D) Network representation of human and SARS-CoV2 proteins with Fuzzy PPI score from threshold 0.2 onward, at high specificity. Finally, high quality interactions are extracted to retrieve the potential human prey for SARS-CoV2 at 0.4 threshold.

Figure 3.

Network representation of HQIs (score ≥ 0.4) between SARS-CoV2 and human proteins. Blue and yellow spherical nodes represent the SARS-CoV2 and human proteins respectively. The edge width reflects the fuzzy score of binding affinity.

Identification of the nCoV-Human protein interactions using Fuzzy PPI model:

The GO information can be used to infer the binding affinity of any pair of interacting proteins using three different types of GO hierarchical relationship graph (CC, MF and BP). The fuzzy PPI model has been proposed to find the interaction affinity between the SARS-CoV2 and human proteins using GO based information (please see Figure 2 and Methods for details). To identify the human interactors of SARS-CoV2 using the Fuzzy PPI model, the L1 and L2 spreader nodes of SARS-CoV, selected using the SIS model (as depicted in Figure 1), are taken as the set of candidate proteins. The Fuzzy PPI model is constructed from the ontological relationship graphs by evaluating the affinity between all possible GO pairs annotated to any target protein pair. The fuzzy score of interaction affinity of the protein pair is then computed from these GO pair-wise interaction affinities and lies in the range [0,1] (details are discussed in the Methods). The heatmap representation of fuzzy interaction affinities (with score ≥ 0.2 for very high specificity, ~ 99%) is shown in supplementary Figure S1 and Table S4. The high quality interactions (HQI) are retrieved at threshold 0.4 (~ 99.98% specificity), which yields a total of 78 interactions between SARS-CoV2 and human (37 proteins). The interaction networks predicted from the Fuzzy PPI model are shown in Figure 3.

Identification of Human spreader proteins for nCoV

Human proteins present in the high quality interactions of nCoV-Human PPIN, fetched by applying the fuzzy affinity threshold, are considered as level-1 spreaders. From these level-1 spreaders, the corresponding level-2 human interactions are obtained using the Human-Human PPIN dataset. Spreadability index is then computed for these level-2 human proteins to identify the level-2 human spreader nodes. The selection is also verified by the SIS model. The selected spreader nodes in SARS-CoV2-Human PPIN are highlighted in Table S4, Table S5 and Table S6 in the supplementary document. A sample computational model of nCoV-Human PPIN under high threshold has been highlighted in Figure 4. The network view of SARS-CoV2-Human PPIN at each level and at various selected thresholds is also available online (SARS-CoV2 Level-1 human spreaders, Level-1 & Level-2:high spreaders at high threshold of spreadability index and Level-1 & Level-2:low human spreaders at low threshold of spreadability index).

Figure 4.

Derived nCoV-Human PPIN with human spreader proteins from the proposed computational model. Blue, yellow and green colored nodes denote nCoV spreaders and their human level-1 and level-2 spreaders respectively. Level-1 human spreaders are detected by applying fuzzy affinity thresholding while level-2 human spreaders are identified by spreadability index validated by the SIS model.

Figure 5.

Validation of our developed computational model with respect to the target proteins of the FDA accepted drugs for COVID-19 treatment. Yellow and green colored nodes denote level-1 and level-2 human spreaders of nCoV which act as the drug protein targets.
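The selection of level-1 spreaders and of level-2 candidates described in this section can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the score dictionary layout and the toy labels (`V1`, `H1`, ...) are hypothetical, the scores are made up, and only the 0.4 threshold is taken from the text. The subsequent ranking of level-2 candidates by spreadability index is not shown.

```python
HQI_THRESHOLD = 0.4  # fuzzy score threshold quoted in the text for HQIs


def high_quality_interactions(fuzzy_scores, threshold=HQI_THRESHOLD):
    """Keep (viral, human) pairs whose fuzzy affinity score meets the threshold."""
    return {pair: score for pair, score in fuzzy_scores.items() if score >= threshold}


def level1_spreaders(hqi):
    """Human proteins that take part in at least one high quality interaction."""
    return {human for (_viral, human) in hqi}


def level2_candidates(level1, human_ppin):
    """Human neighbours of level-1 spreaders that are not level-1 themselves."""
    candidates = set()
    for protein in level1:
        candidates |= human_ppin.get(protein, set())
    return candidates - level1


# Made-up fuzzy scores and a made-up human PPIN, for illustration only.
toy_scores = {
    ("V1", "H1"): 0.55,
    ("V1", "H2"): 0.25,
    ("V2", "H3"): 0.40,
    ("V2", "H1"): 0.10,
}
toy_human_ppin = {
    "H1": {"H2", "H4"},
    "H2": {"H1"},
    "H3": {"H4", "H5"},
    "H4": {"H1", "H3"},
    "H5": {"H3"},
}

hqi = high_quality_interactions(toy_scores)
l1 = level1_spreaders(hqi)
l2 = level2_candidates(l1, toy_human_ppin)
print(sorted(hqi), sorted(l1), sorted(l2))
```

Here the candidate set plays the role of the level-2 human interactions; in the described pipeline these candidates would then be ranked by spreadability index and checked with the SIS model.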

Validation using potential FDA drugs for COVID-19:

After proper assessment of all potential drugs mentioned in the DrugBank white paper, seven drugs, Lopinavir, Ritonavir, Hydroxychloroquine [32, 33], Azithromycin, Remdesivir, Favipiravir [37, 38] and Darunavir, are identified which have shown the expected results to some extent in the clinical trials conducted for SARS-CoV2 to date. All approved human protein targets for each of the five approved drugs are fetched from the advanced search section of DrugBank [40, 42]. These targets, when searched in our proposed model of nCoV-Human PPIN, are found to play an active role as spreader nodes. This reveals that the selected spreader nodes are of biological importance in transmitting infection in a network, which makes them the protein drug targets of the potential FDA drugs for COVID-19. The target protein hits in our nCoV-Human PPIN for each of the 7 potential FDA drugs are highlighted in Figure 5. It can be observed that 4 target protein hits are obtained for Hydroxychloroquine, 3 for Ritonavir, 2 for each of Lopinavir, Darunavir and Azithromycin, and 1 for each of Remdesivir and Favipiravir. Out of these protein targets, ACE2 is the most important one, since it is considered to be one of the crucial human receptors through which nCoV transmits infection deep inside the human cell.

Discussion

In any host-pathogen interaction network, identification of spreader nodes is crucial for disease prognosis. Not every protein in an interaction network has intense disease spreading capability. In this work, we have used the SARS-CoV-Human PPIN and identified its spreader nodes at both level-1 and level-2 using the SIS model. These spreader nodes are considered for computing the protein interaction affinity score to unmask the level-1 human spreaders of nCoV. GO annotations have also been taken into consideration along with PPIN properties to make this model more effective and significant. With the gradual progress of the work, it has been observed that the selected human spreader nodes, identified by our proposed model, emerge as the potential protein targets of the FDA approved drugs for COVID-19. The basic hypotheses of the work may be listed as follows:

• There is a genetic overlap of ~ 89% (as suggested by ICTV) between SARS-CoV and SARS-CoV2, which also leads to a significant overlap in spreader proteins between the human-SARS-CoV and human-SARS-CoV2 protein-interaction networks.
• The Fuzzy PPI approach can assess protein interaction affinities at very high specificity with respect to benchmark datasets, as shown in Figure 6. High specificity signifies a very low false positive rate at a given threshold. Thus, at 0.4 threshold (~ 99.9% specificity), the proposed model evaluates high quality positive interactions in Human-nCoV PPIN.

Figure 6.

Specificity at different thresholds (x-axis) of binding affinity obtained from the Fuzzy PPI model for the complete human proteome interaction network. From threshold 0.2 onward, it produces high specificity with respect to benchmark positive and negative interaction data. High quality interactions are extracted at 0.4 threshold with ~ 99.9% specificity.

Finally, we propose that the developed computational model effectively identifies Human-nCoV PPIs with high specificity. nCoV-Human interactions are inferred from another pandemic initiator, SARS-CoV, which is believed to be highly genetically similar to nCoV. We also identify the human spreader proteins (up to level-2) using spreadability index, validated through the SIS model. Due to the high network density of the human interaction network, the number of proteins increases with the transition from one level to another. So, our proposed model is also capable of identifying human spreader proteins in level-2 by using spreadability index, which is validated by the SIS model. Target proteins of the seven potential FDA drugs for COVID-19 mentioned in the DrugBank white paper, Lopinavir, Ritonavir, Hydroxychloroquine [32, 33], Azithromycin, Remdesivir, Favipiravir [37, 38] and Darunavir, are found to overlap with the spreader nodes of the proposed in silico nCoV-Human protein interaction model (see Figure 5). Although clinical trials for a COVID-19 vaccine are still under way in 2020, three of the seven drugs, i.e. Remdesivir, Hydroxychloroquine and Favipiravir, are found to be the most promising as well as effective ones, and their protein targets R1AB_SARS2, TLR9, ACE2, CYP3A4 and ABCB1 are also successfully identified as spreader nodes by our proposed model. This assessment reveals that these spreader nodes indeed have biological relevance to disease propagation.

Methods

Our developed computational model for nCoV-Human PPIN consists of two important methodologies: 1) identification of spreader nodes by spreadability index along with validation by the SIS model and 2) the Fuzzy PPI model.

Identification of spreader nodes by spreadability index along with the validation of SIS model:

In nCoV-Human PPIN, nCoV acts as the pathogen while the human host acts as bait. The transmission of infection starts when a pathogen enters a host body and starts infecting its proteins, which in turn affect their directly or indirectly connected neighborhood proteins. Considering this method of transmission, the PPINs of human and SARS-CoV are considered to detect spreader nodes. Spreader nodes are those node proteins which actually transmit the disease fast among their neighbors. But not all the nodes in a PPIN are spreaders. So, proper detection of spreader nodes is crucial. Thus, spreader nodes are identified by spreadability index, which measures the transmission capability of a node protein.
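As a worked check on how the index behaves, the values reported for the synthetic network in Table 1 can be reproduced numerically. The two relations used below are inferred from the tabulated numbers and are not stated explicitly in the text, so they should be read as an assumption: the edge ratio matches (e_out + 1) / (e_in + 1), and the spreadability index matches edge ratio × neighborhood density + node weight. The names `e_out` and `e_in` are labels chosen here for the two integer edge counts listed per node (edges leaving the node's ego network, and interconnections inside its neighborhood).

```python
# Per node: (e_out, e_in, neighborhood density, node weight, reported index),
# copied from Table 1 of the synthetic network example.
TABLE_1 = {
    "Node 3": (6, 3, 6.94, 2.83, 14.99),
    "Node 9": (5, 4, 7.07, 3.00, 11.48),
    "Node 6": (5, 2, 3.93, 2.60, 10.46),
    "Node 8": (6, 2, 2.27, 3.25, 8.55),
    "Node 1": (5, 4, 4.21, 3.40, 8.45),
}


def edge_ratio(e_out, e_in):
    """Relation inferred from Table 1: (e_out + 1) / (e_in + 1)."""
    return (e_out + 1) / (e_in + 1)


def spreadability_index(e_out, e_in, density, weight):
    """Relation inferred from Table 1: edge ratio * density + node weight."""
    return edge_ratio(e_out, e_in) * density + weight


# Largest gap between the recomputed and the reported index over all rows.
max_gap = max(
    abs(spreadability_index(e_out, e_in, density, weight) - reported)
    for e_out, e_in, density, weight, reported in TABLE_1.values()
)

for node, (e_out, e_in, density, weight, reported) in TABLE_1.items():
    computed = spreadability_index(e_out, e_in, density, weight)
    print(f"{node}: edge ratio {edge_ratio(e_out, e_in):.2f}, "
          f"index {computed:.2f} (reported {reported})")
```

The recomputed values agree with the reported ones to within the rounding of the tabulated inputs, which is why a small tolerance rather than exact equality is the appropriate comparison.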

Figure 7.

Synthetic protein-protein interaction network. It is comprised of 10 nodes and 25 edges.

Table 1.

Computation of spreadability index of Figure 7 along with validation of selected top 5 spreader nodes by SIS model.

Rank  Protein  E_out  E_in  Edge Ratio  Neighborhood Density  Node Weight  Spreadability Index  Sum of SIS infection rate of top 5 nodes
1     Node 3   6      3     1.75        6.94                  2.83         14.99
2     Node 9   5      4     1.20        7.07                  3.00         11.48
3     Node 6   5      2     2.00        3.93                  2.60         10.46
4     Node 8   6      2     2.33        2.27                  3.25         8.55
5     Node 1   5      4     1.20        4.21                  3.40         8.45

Compactness of PPIN and its transferal capability is evaluated using centrality analysis. Nodes having high centrality value are usually considered as spreader nodes or the most critical nodes in a network. Spreadability index is a centrality-based measure which combines three major topological neighborhood-based features of a network: 1) node weight, 2) edge ratio and 3) neighborhood density. Nodes having high spreadability index are considered as spreader nodes. The spreader nodes thus identified are also validated by the SIS model.

The SIS model is implemented to cast the SARS-CoV and SARS-CoV2 outbreak as a disease model in which proteins are classified by their present infection status. A protein can be in one of two states: 1) S: Susceptible, which means that the protein is not yet infected but is at risk of getting infected by the disease; every protein is initially susceptible. 2) I: Infected, which means that the protein is already infected by the disease. After recovering from the infected state, a protein becomes susceptible (S) again. The model is implemented in such a way that it generates the overall infection capability of a node after a certain number of iterations. The sum of the infection capability of the top selected spreader nodes is computed by this model and compared against the sum obtained for the top critical nodes selected by other existing centrality measures like Betweenness Centrality (BC), Closeness Centrality (CC), Degree Centrality (DC) and Local Average Centrality (LAC). Our proposed method of selecting spreader nodes has performed better in comparison to these existing state-of-the-art measures.

A synthetic PPIN is considered in Figure 7 to demonstrate the entire methodology of spreadability index. Computational analysis of spreadability index of our proposed model with one of the other methodologies, BC, has been highlighted from Table 1 to supplementary Table S7. $E_{out}$ is the total number of edges which are outgoing from the ego network of a node, whereas $E_{in}$ denotes the total number of interconnections in the neighborhood of the node. $E_{out}$ of node 3 is 6 while $E_{in}$ of node 3 is 3, which highlights the fact that node 3 has the highest transmission ability from its ego network to outside when compared to other nodes.
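The SIS dynamics used for validation can be sketched as a small discrete-time simulation. This is a minimal illustration and not the authors' implementation: the infection probability `beta`, the recovery probability `gamma`, the number of steps and runs, the scoring rule (average infected fraction per step) and the toy star network are all assumptions introduced for the example.

```python
import random


def sis_infection_score(adjacency, seed_node, beta=0.3, gamma=0.2,
                        steps=50, runs=200, rng_seed=0):
    """Average infected fraction per step for outbreaks started at seed_node.

    Each step, every infected node infects each susceptible neighbour with
    probability beta, and every node infected at the start of the step
    recovers (becomes susceptible again) with probability gamma.
    """
    rng = random.Random(rng_seed)
    n_nodes = len(adjacency)
    total = 0.0
    for _ in range(runs):
        infected = {seed_node}
        for _ in range(steps):
            newly_infected = set()
            # sorted() keeps the order of random draws reproducible
            for node in sorted(infected):
                for neighbour in sorted(adjacency[node]):
                    if neighbour not in infected and rng.random() < beta:
                        newly_infected.add(neighbour)
            recovered = {node for node in sorted(infected) if rng.random() < gamma}
            infected = (infected - recovered) | newly_infected
            total += len(infected) / n_nodes
    return total / (runs * steps)


# Hypothetical toy network: one hub connected to five leaves.
TOY_STAR = {
    "hub": {"a", "b", "c", "d", "e"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}, "e": {"hub"},
}

hub_score = sis_infection_score(TOY_STAR, "hub")
leaf_score = sis_infection_score(TOY_STAR, "a")
print(f"hub: {hub_score:.3f}, leaf: {leaf_score:.3f}")
```

Summing such scores over the top-ranked nodes of each centrality measure gives one way to compare rankings, in the spirit of the comparison described above; a well-connected seed would typically be expected to score higher than a peripheral one.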
Node 3 also has the highest spreadability index, but BC failed to rank node 3 in first position. The same scenario can be observed for some other nodes in the synthetic network too. Besides, the SIS validation result shows that the top-ranked spreader nodes selected by the proposed model have the highest infection capability in comparison to the other ranked nodes.

Fuzzy PPI Model for potential SARS-CoV2-Human interaction identification:

The binding affinity between any two interacting proteins can be estimated by combining the semantic similarity scores of the GO terms associated with the proteins [15, 24, 54–56]. A greater number of semantically similar GO annotations between any protein pair indicates higher interaction affinity. The Fuzzy PPI model is a hybrid approach that utilizes both the topological features of the GO graph and the information contents [56, 58, 59] of the GO terms. GO is organized in three independent directed acyclic graphs (DAGs): molecular function (MF), biological process (BP), and cellular component (CC). The nodes in each GO graph represent GO terms and the edges represent different hierarchical relationships. In this work, the two most important relations, 'is a' and 'part of', have been used for GO relations. The semantic similarity between any two proteins is estimated by considering the similarities between all pairs of their annotating GO terms belonging to a particular ontological graph. The similarity of a GO term pair is determined by considering certain topological properties (shortest path length) of the GO graph and the average information content (IC) of the disjunctive common ancestors (DCAs) [54, 55] of the GO terms, as proposed previously.

Fuzzy PPI first relies on a fuzzy clustering of the GO graph, where the selection of GO terms as cluster centers is based on the level of association of each GO term in the GO graph. The cluster centers are selected based on the proportion measure of GO terms. The proportion measure for any GO term $t$ is computed as $PM(t) = \left(|A(t)| + |D(t)|\right) / |N(O)|$, where $A(t)$ and $D(t)$ represent the ascendants and descendants of term $t$ and $|N(O)|$ is the total number of GO terms in ontology $O$. A higher value of the proportion measure signifies higher coverage of ascendants and descendants associated with the specific node. The GO terms for which this proportion measure is above a predefined threshold are selected as cluster centers. In this work, the cluster centers are selected based on the threshold values suggested in [15, 24].

After selecting the cluster centers, the degree of membership of a GO term to each of the selected cluster centers is calculated using its shortest path length to the corresponding cluster center. The membership of the GO term to a cluster decreases with increase in its shortest path length to the cluster center. The membership function is defined as $MF_{c}(t) = e^{-(x - c)^2 / 2k^2}$, where $c$ is the center and $k$ is the width of the membership function and $x$ is the shortest path length from $t$ to the cluster center. The differences in membership values between the GO terms $t_i$ and $t_j$ with respect to each cluster center are computed to find the weight parameter. This weight value determines how dissimilar two GO terms can be with respect to the cluster centers. Next, the shared information content (SIC) is computed using the average IC of the DCAs of the GO term pair $(t_i, t_j)$, determined from the three GO graphs. The SIC is defined as $SIC(t_i, t_j) = \sum_{a \in DCA(t_i, t_j)} IC(a) / |DCA(t_i, t_j)|$, where $DCA(t_i, t_j)$ represents the disjunctive common ancestors of GO terms $t_i$ and $t_j$.

The semantic similarity of a protein pair $(P_i, P_j)$ for each GO type (CC, MF and BP) is estimated by utilizing the maximum similarity of all possible GO pairs from the annotations of proteins $P_i$ and $P_j$ for that type of GO. The binding affinity of the protein pair $(P_i, P_j)$ is defined as the average of the CC, MF and BP based semantic similarities. The fuzzy score of binding affinity is computed by normalizing the binding affinity using max-min normalization. In this work, the fuzzy binding affinity score is computed between the protein pairs of SARS-CoV2 and human proteins using the available ontological information. Finally, with a high specificity threshold (please see Figure 6), high quality interactions are extracted for human-SARS-CoV2.

References

1. Wang, C., Horby, P. W., Hayden, F. G. & Gao, G. F. A novel coronavirus outbreak of global health concern. The Lancet.
2. World Health Organization. Coronavirus disease (COVID-19) outbreak.
3. World Map | CDC.
4. Statement on the second meeting of the International Health Regulations (2005) Emergency Committee regarding the outbreak of novel coronavirus (2019-nCoV).
5. Statement on the meeting of the International Health Regulations (2005) Emergency Committee regarding the outbreak of novel coronavirus 2019 (n-CoV) on 23 January 2020.
6. Huang, C. et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet.
7. Heymann, D. L. Data sharing and outbreaks: best practice exemplified. The Lancet.
8. Organization, W. H. et al. Update 49 - SARS case fatality ratio, incubation period. World Heal. Organ. Geneva (2003).
9. WHO | Middle East respiratory syndrome coronavirus (MERS-CoV).

10. Chen, Y., Liu, Q. & Guo, D. Emerging coronaviruses: genome structure, replication, and pathogenesis. J. medical virology 92, 418–423 (2020).
11. Dyer, M. D., Murali, T. & Sobral, B. W. Computational prediction of host-pathogen protein–protein interactions. Bioinformatics 23, i159–i166 (2007).
12. Dutta, P., Halder, A. K., Basu, S. & Kundu, M. A survey on Ebola genome and current trends in computational research on the Ebola virus. Briefings functional genomics 17, 374–380 (2018).
13. Dyer, M. D., Murali, T. & Sobral, B. W. Supervised learning and prediction of physical interactions between human and HIV proteins. Infect. Genet. Evol. 11, 917–923 (2011).
14. Saha, S., Sengupta, K., Chatterjee, P., Basu, S. & Nasipuri, M. Analysis of protein targets in pathogen–host interaction in infectious diseases: a case study on Plasmodium falciparum and Homo sapiens interaction network. Briefings functional genomics 17, 441–450 (2018).
15. Halder, A. K., Dutta, P., Kundu, M., Basu, S. & Nasipuri, M. Review of computational methods for virus–host protein interaction prediction: a case study on novel Ebola–human interactions. Briefings functional genomics 17, 381–391 (2018).
16. China releases genetic data on new coronavirus, now deadly | CIDRAP.
17. Chan, J. F.-W. et al. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerg. microbes & infections 9, 221–236 (2020).
18. Pfefferle, S. et al. The SARS-coronavirus-host interactome: identification of cyclophilins as target for pan-coronavirus inhibitors. PLoS pathogens.
19. Von Brunn, A. et al. Analysis of intraviral protein-protein interactions of the SARS coronavirus ORFeome. PloS one.
20. Fung, T. S. & Liu, D. X. Human coronavirus: Host-pathogen interaction. Annu. review microbiology 73, 529–557 (2019).
21. Saha, S., Chatterjee, P., Basu, S. & Nasipuri, M. Detection of spreader nodes and ranking of interacting edges in human-SARS-CoV protein interaction network. bioRxiv (2020).
22. Uniprot: the universal protein knowledgebase. Nucleic acids research 45, D158–D169 (2017).
23. Consortium, G. O. The gene ontology (GO) database and informatics resource. Nucleic acids research 32, D258–D261 (2004).
24. Dutta, P., Basu, S. & Kundu, M. Assessment of semantic similarity between proteins using information content and topological properties of the gene ontology graph. IEEE/ACM transactions on computational biology bioinformatics 15, 839–849 (2017).
25. Bailey, N. T. et al. The mathematical theory of infectious diseases and its applications (Charles Griffin & Company Ltd, 5a Crendon Street, High Wycombe, Bucks HP13 6LE., 1975).
26. Lucy Chin, Jordan Cox, Safiya Esmail, Mark Franklin, D. L. COVID-19: Finding the Right Fit. Identifying Potential Treatments Using a Data-Driven Approach. Drugbank White Pap. (2020).
27. Agrawal, M., Zitnik, M., Leskovec, J. et al. Large-scale analysis of disease pathways in the human interactome. In PSB, 111–122 (World Scientific, 2018).
28. BioSNAP: Network datasets: Human protein-protein interaction network.
29. Botstein, D. et al. Gene ontology: tool for the unification of biology. Nat genet 25, 25–29 (2000).
30. Harrison, C. Coronavirus puts drug repurposing on the fast track. Nat. biotechnology 38, 379–381 (2020).
31. Cao, B. et al. A trial of lopinavir–ritonavir in adults hospitalized with severe COVID-19. New Engl. J. Medicine (2020).
32. Food, Administration, D. et al.
33. Gautret, P. et al. Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial. Int. journal antimicrobial agents.
34. De Wit, E. et al. Prophylactic and therapeutic remdesivir (GS-5734) treatment in the rhesus macaque model of MERS-CoV infection. Proc. Natl. Acad. Sci.
35. Emergency Access to Remdesivir Outside of Clinical Trials.
36. Remdesivir Clinical Trials.
37. China approves antiviral favilavir to treat coronavirus - UPI.com.
38. Taiwan synthesizes anti-viral drug favilavir for COVID-19 patients - Focus Taiwan.
39. Efficacy and Safety of Darunavir and Cobicistat for Treatment of COVID-19 - Full Text View - ClinicalTrials.gov.
40. Wishart, D. S. et al. Drugbank: a knowledgebase for drugs, drug actions and drug targets. Nucleic acids research 36, D901–D906 (2008).
41. Advanced Search - DrugBank.
42. DrugBank.
43. Mourad, J.-J. & Levy, B. I. Interaction between RAAS inhibitors and ACE2 in the context of COVID-19. Nat. Rev. Cardiol.
44. ACE-2 is shown to be the entry receptor for SARS-CoV-2: R&D Systems.

Patel, A. B. & Verma, A. Covid-19 and angiotensin-converting enzyme inhibitors and angiotensin receptor blockers: what is the evidence?

Jama (2020).

Trial shows Covid-19 patients recover with Gilead’s remdesivir.

Clinical Trial of Favipiravir Tablets Combine With Chloroquine Phosphate in the Treatment of Novel Coronavirus Pneumonia - Full Text View - ClinicalTrials.gov.

Wang, S. & Wu, F. Detecting overlapping protein complexes in ppi networks based on robustness.

Proteome science

11, S18 (2013).

Samadi, N. & Bouyer, A. Identifying influential spreaders based on edge ratio and neighborhood diversity measures in complex networks. Computing 101, 1147–1175 (2019).

Anthonisse, J. The rush in a directed graph, stichting mathematisch centrum, amsterdam (1971).

Sabidussi, G. The centrality index of a graph.

Psychometrika

31, 581–603 (1966).

Jeong, H., Mason, S. P., Barabási, A.-L. & Oltvai, Z. N. Lethality and centrality in protein networks.

Nature

Li, M., Wang, J., Chen, X., Wang, H. & Pan, Y. A local average connectivity-based method for identifying essential proteins from the network level.

Comput. biology chemistry

35, 143–150 (2011).

Couto, F. M., Silva, M. J. & Coutinho, P. M. Semantic similarity over the gene ontology: family correlation and selecting disjunctive ancestors.

In Proceedings of the 14th ACM international conference on Information and knowledge management , 343–344 (2005).

Couto, F. M., Silva, M. J. & Coutinho, P. M. Measuring semantic similarity between gene ontology terms.

Data & knowledge engineering

61, 137–152 (2007).

Resnik, P. Using information content to evaluate semantic similarity in a taxonomy. arXiv preprint cmp-lg/9511007 (1995).

Jain, S. & Bader, G. D. An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology.

BMC bioinformatics

11, 562 (2010).

Lin, D. et al.

An information-theoretic definition of similarity.

In Icml, vol. 98, 296–304 (1998).

Jiang, J. J. & Conrath, D. W. Semantic similarity based on corpus statistics and lexical taxonomy . arXiv preprint cmp-lg/9709008 (1997).

Shannon, C. E. A mathematical theory of communication. acm sigmobile mob.

Comput. Commun. Rev

5, 3–55 (2001).

Acknowledgements

This work is partially supported by the CMATER research laboratory of the Computer Science and Engineering Department, Jadavpur University, India, and by the PURSE-II and UPE-II grants. Subhadip Basu acknowledges the Department of Biotechnology grant (BT/PR16356/BID/7/596/2016), Government of India. For research fellowship support, Anup Kumar Halder acknowledges the Visvesvaraya PhD Scheme for Electronics & IT, an initiative of the Ministry of Electronics & Information Technology (MeitY), Government of India. We acknowledge Prof. Jacek Sroka (Institute of Informatics, University of Warsaw) for his contribution toward the development of the fuzzy PPI methods.

Author contributions statement

S.S., A.K.H. and S.B. conceived the idea of the research and wrote the manuscript. S.S. and A.K.H. conducted the experiments. P.C., S.S.B., M.N. and S.B. analyzed the results. M.N., P.C. and S.B. reviewed the manuscript.

Additional information

Competing interests: