Analyzing Host-Viral Interactome of SARS-CoV-2 for Identifying Vulnerable Host Proteins during COVID-19 Pathogenesis
Jayanta Kumar Das (a), Swarup Roy (b, ∗), Pietro Hiram Guzzi (c, ∗)
(a) Department of Pediatrics, Johns Hopkins University School of Medicine, Maryland, USA
(b) Network Reconstruction & Analysis (NetRA) Lab, Department of Computer Applications, Sikkim University, Gangtok, India
(c) Department of Medical and Surgical Sciences, Magna Graecia University, Catanzaro, Italy
Abstract
The development of therapeutic targets for COVID-19 treatment depends on understanding the molecular mechanism of pathogenesis. Identifying the genes and proteins involved in the infection mechanism is key to shedding light on these complex molecular mechanisms. The combined effort of many laboratories throughout the world has produced a large body of data on both protein and genetic interactions. In this work, we integrate these available results and obtain a host protein-protein interaction network composed of 1432 human proteins. We calculate network centrality measures to identify key proteins, and then perform functional enrichment of the central proteins. We observed that the identified proteins are mostly associated with several crucial pathways, including cellular process, signal transduction, and neurodegenerative disease. Finally, we focused on proteins involved in causing disease in the human respiratory tract. We conclude that COVID-19 is a complex disease, and we highlight many potential therapeutic targets, including RBX1, HSPA5, ITCH, RAB7A, RAB5A, RAB8A, PSMC5, CAPZB, CANX, IGF2R, and HSPA1A, which are central and also associated with multiple diseases.
Keywords:
SARS-CoV-2, COVID-19, Protein-protein interaction, Centrality, Pathways, Disease
1. Introduction
The world is experiencing an unprecedented pandemic due to a massive outbreak of COVID-19, the viral disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). SARS-CoV-2 is a large enveloped coronavirus (family Coronaviridae, subfamily Coronavirinae) with a non-segmented, single-stranded, positive-sense RNA genome [1], and it transmits rapidly through human-to-human contact. Although SARS-CoV-2 is similar to other known coronaviruses, i.e. SARS-CoV and MERS-CoV [2, 3], it has demonstrated high rates of infection [4, 5]. There is therefore a need to understand the pathogenesis of SARS-CoV-2 in order to develop effective therapies and vaccines.
The SARS-CoV-2 virus is responsible for COVID-19, a disease that damages multiple organs as it progresses from an asymptomatic phase to a life-threatening condition [6]. Accurate molecular diagnosis of COVID-19 is therefore essential and relies on collecting the proper respiratory tract specimen [7]. In this context, the integrated analysis [8] of various datasets, including clinical and imaging data, may explain, and hopefully predict, the longitudinal effects of SARS-CoV-2 infection [9, 10]. In particular, many independent projects throughout the world have focused on the genomic and proteomic level [10] and then integrated these data with clinical ones. These works have produced data about the effect of the infection at a molecular scale, evidencing the role of genes and proteins, such as the interactions between viral and human proteins. Interactions between a host and its pathogen are primarily driven by interactions between host proteins and pathogen proteins, which together form a host-pathogen protein-protein interaction (PPI) network. The SARS-CoV-2 virus-host interactome has been studied with a focus on the various virulence factors influencing SARS-CoV-2 pathogenesis and on the interaction mechanisms [11, 12, 13, 14, 15, 16]. Furthermore, many recent works have used the host-viral protein-protein interaction network as an input to identify potential drug targets or repurposable drug molecules [17, 18, 19].
Host-pathogen protein interactions provide important insights into the molecular mechanisms of pathogenicity [20] and help in understanding the virulence factors influencing SARS-CoV-2 pathogenesis [21, 22]. SARS-CoV-2 is a newly discovered virus, and the role that its interacting human host proteins play in disease progression still needs to be investigated.
∗ Corresponding author.
arXiv preprint [q-bio.BM], Feb.
Protein-protein interactions (PPIs) are usually modelled and analysed with graph theory [23]. In this formalism, the interactions are modelled as a graph whose nodes are proteins (or genes) and whose edges are the interactions among them. Several studies have found that specific candidate proteins might play a crucial role [24, 25, 26, 27]. Protein-protein interaction networks are an essential ingredient for any systems-level understanding and modelling of cellular processes, and even for drug discovery [28, 29, 30, 31, 32]. The key genes/proteins involved in different biological pathways can give valuable insight for an in-depth characterisation of disease progression [33, 34, 35]. It is widely reported that viruses have evolved to target proteins that are central and exert strong control over the human interactome [36, 37, 38, 39, 40]. Exploring predicted interaction networks can suggest new directions for future experimental research and provide cross-species predictions for efficient interaction mapping [41, 34]. The complete workflow of the current study is shown in Figure 1.
Figure 1: The complete workflow of the current study.
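As a minimal sketch of the graph formalism described above (an illustration, not the pipeline used in this study), an undirected PPI network can be stored as adjacency sets, from which each protein's number of interaction partners and the best-connected candidate hubs follow directly. The protein names and interactions below are hypothetical toy data.

```python
from collections import defaultdict


def build_ppi_graph(interactions):
    """Undirected PPI graph as adjacency sets (duplicate edges and self-loops dropped)."""
    adj = defaultdict(set)
    for a, b in interactions:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def degrees(adj):
    """Number of direct interaction partners of each protein."""
    return {protein: len(partners) for protein, partners in adj.items()}


# Hypothetical toy interactions (not data from this study).
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P2", "P1")]
adj = build_ppi_graph(edges)
deg = degrees(adj)
# Rank by decreasing degree, breaking ties alphabetically; keep the top two.
hubs = sorted(deg, key=lambda p: (-deg[p], p))[:2]
print(deg, hubs)
```

The repeated edge ("P2", "P1") is absorbed by the set representation, so P1 ends up with three partners rather than four.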
This study aims to identify essential human host proteins through topological analysis of the protein-protein interaction network of SARS-CoV-2-interacting human host proteins. We performed functional enrichment of the identified proteins to shed light on cellular, signalling, and disease pathways.
2. Materials and Methods
We use recently reported host proteins whose physical interactions with SARS-CoV-2 were verified using affinity purification mass spectrometry [17, 21, 42]. The host-viral protein interactions used here are also available in BioGRID [43]. In total, we obtained 2489 host-viral interactions, involving 1432 unique host proteins and 37 SARS-CoV-2 viral proteins. Figure 2 shows the number of interacting host proteins for each viral protein. Most host proteins are targeted by a specific viral protein.
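The per-viral-protein abundance summarised in Figure 2 can be sketched as follows. This is a hypothetical illustration: it assumes the host-viral interactions are available as (viral protein, host protein) pairs and that the percentage is taken over all unique interactions; the host identifiers are toy placeholders, not data from this study.

```python
from collections import Counter, defaultdict

# Hypothetical (viral protein, host protein) pairs; toy data only.
pairs = [("Orf7b", "H1"), ("Orf7b", "H2"), ("Orf3a", "H2"),
         ("M", "H3"), ("Orf7b", "H1")]  # the last pair repeats an earlier one

unique_pairs = set(pairs)  # count each host-viral interaction once
per_viral = Counter(viral for viral, _ in unique_pairs)
total = sum(per_viral.values())
# Share (%) of all unique interactions attributed to each viral protein.
abundance = {viral: 100.0 * n / total for viral, n in per_viral.items()}

# Viral partners of each host protein, to see how many hosts have only one.
partners = defaultdict(set)
for viral, host in unique_pairs:
    partners[host].add(viral)
single_target_hosts = sorted(h for h, vs in partners.items() if len(vs) == 1)
```

Deduplicating first matters because the same interaction may be reported by more than one source.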
Starting from the human proteins that interact with the virus, we built a host PPI network by querying the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, version 10.0; http://string-db.org/) [44]. Topological analysis of the PPI network was performed using Cytoscape (http://apps.cytoscape.org), a general platform for complex network analysis and visualization [45].
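A minimal sketch of the network-curation step, assuming STRING-style edges that carry a combined confidence score on a 0-1 scale: edges below a chosen threshold are discarded and only the largest connected component (the giant component) is kept. The function names and the 0.7 cut-off are illustrative assumptions, not the threshold or code used in this study.

```python
from collections import deque


def filter_by_confidence(scored_edges, min_score):
    """Keep (a, b) pairs whose combined confidence score is >= min_score."""
    return [(a, b) for a, b, score in scored_edges if score >= min_score]


def largest_component(edges):
    """Return the node set of the largest connected component (giant component)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:  # breadth-first search over one component
            u = queue.popleft()
            for w in adj[u] - comp:
                comp.add(w)
                queue.append(w)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


# Toy scored edges on a 0-1 scale; the 0.7 cut-off is illustrative only.
scored = [("A", "B", 0.9), ("B", "C", 0.8), ("C", "D", 0.4), ("E", "F", 0.95)]
kept = filter_by_confidence(scored, 0.7)
giant = largest_component(kept)
giant_edges = [(a, b) for a, b in kept if a in giant]
```

Proteins whose only interactions fall below the threshold (D here) disappear entirely, which is one reason the giant component can be much smaller than the candidate list.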
In network analysis, centrality indicators identify the most critical nodes in a network [46]. Centrality measures are used to characterise each node and edge in the PPI network. Degree is the most intuitive measure for the topological analysis of a PPI network. Other important measures of a node's topological role are betweenness centrality, closeness centrality, clustering coefficient, topological coefficient, and neighbourhood connectivity.
Figure 2 (a, b): The abundance (percentage) of collected interacting human host proteins for different SARS-CoV-2 viral proteins. A host-viral interaction network pattern is also shown.
(i) Degree centrality: The degree centrality (simply degree) of a node $n$, denoted $D_n$, is the number of nodes directly connected to $n$. Highly connected nodes in a PPI network are considered hub nodes [47].
(ii) Betweenness centrality: Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest paths between two other nodes [48]. The betweenness centrality of a node $n$ is
$$C_b(n) = \sum_{s \neq n \neq t} \frac{\sigma_{st}(n)}{\sigma_{st}} \qquad (1)$$
where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$ and $\sigma_{st}(n)$ is the number of those paths that pass through $n$.
(iii) Closeness centrality: Closeness centrality detects nodes that are able to spread information very efficiently through the network [49]. It is calculated as
$$C_c(n) = \frac{1}{\operatorname{avg}\big(L(m,n)\big)} \qquad (2)$$
where $L(m,n)$ is the length of the shortest path between nodes $n$ and $m$, and $m$ ranges over all other nodes reachable from $n$.
(iv) Average shortest-path length: The shortest-path length between two nodes $n$ and $m$ is the minimum number of steps required to traverse from $n$ to $m$ [50]. The average shortest-path length $S_p$ of node $n$ is the average of the shortest-path lengths from $n$ to all other reachable nodes.
(v) Clustering coefficient: The clustering coefficient measures the degree to which nodes in a graph tend to cluster together [51]. In undirected networks, the clustering coefficient $C_n$ of a node $n$ is
$$C_n = \frac{2 e_n}{k_n (k_n - 1)} \qquad (3)$$
where $k_n$ is the number of neighbours of $n$ and $e_n$ is the number of connected pairs among the neighbours of $n$.
(vi) Topological coefficient: The topological coefficient is a relative measure of the extent to which a node shares neighbours with other nodes [52]. The topological coefficient $T_n$ of a node $n$ with $k_n$ neighbours is
$$T_n = \frac{\operatorname{avg}\big(J(n,m)\big)}{k_n} \qquad (4)$$
where $J(n,m)$ is defined for all nodes $m$ that share at least one neighbour with $n$, and its value is the number of neighbours shared between $n$ and $m$, plus one if there is a direct link between $n$ and $m$.
(vii) Neighbourhood connectivity: The neighbourhood connectivity $N_c$ of a node $n$ is the average connectivity (degree) of all neighbours of $n$ [53]. The neighbourhood connectivity distribution gives, for each value of $k$, the average neighbourhood connectivity of all nodes with $k$ neighbours.
We used NetworkAnalyzer [45] to calculate the above centrality scores. In NetworkAnalyzer, $C_c$ (closeness centrality) is calculated as the reciprocal of the average shortest-path length; hence a high $C_c$ (a highly central node) corresponds to a low $S_p$.
We performed enrichment analysis to find sets of genes/proteins that are significantly enriched in different functional and biological pathways. We used KEGG (Kyoto Encyclopedia of Genes and Genomes) [54] to assess the pathway enrichment of host proteins and Gene Ontology (GO) to assess protein functions [55]. KEGG is a database resource for understanding high-level functions and utilities of biological systems [56].
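The node-level measures (ii) to (vii) above can be sketched in plain Python for an unweighted, undirected graph. This is not NetworkAnalyzer's implementation but a minimal re-implementation under stated assumptions: betweenness uses Brandes' algorithm without normalisation and counts each unordered pair {s, t} once, averages are taken over reachable nodes only, and nodes with fewer than two neighbours receive a clustering and topological coefficient of 0.

```python
from collections import deque
from itertools import combinations


def build_adj(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops are ignored."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, src):
    """Hop distance from src to every node reachable from it (src included)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def avg_shortest_path(adj, n):
    """S_p: mean shortest-path length from n to the other reachable nodes."""
    d = [x for node, x in bfs_distances(adj, n).items() if node != n]
    return sum(d) / len(d) if d else 0.0


def closeness(adj, n):
    """C_c as in Eq. (2): reciprocal of S_p (0 for isolated nodes)."""
    sp = avg_shortest_path(adj, n)
    return 1.0 / sp if sp > 0 else 0.0


def betweenness(adj):
    """C_b as in Eq. (1) via Brandes' algorithm; each unordered pair counted once."""
    cb = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS counting shortest paths from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return {v: c / 2 for v, c in cb.items()}  # undirected: each pair seen twice


def clustering(adj, n):
    """C_n as in Eq. (3): 2 e_n / (k_n (k_n - 1)); 0 if k_n < 2."""
    k = len(adj[n])
    if k < 2:
        return 0.0
    e = sum(1 for a, b in combinations(adj[n], 2) if b in adj[a])
    return 2 * e / (k * (k - 1))


def topological_coefficient(adj, n):
    """T_n as in Eq. (4): avg J(n, m) / k_n over nodes m sharing a neighbour with n."""
    k = len(adj[n])
    others = {m for nb in adj[n] for m in adj[nb]} - {n}
    if k < 2 or not others:
        return 0.0  # assumption: nodes with fewer than two neighbours get 0
    j = [len(adj[n] & adj[m]) + (1 if m in adj[n] else 0) for m in others]
    return sum(j) / len(j) / k


def neighbourhood_connectivity(adj, n):
    """N_c: average degree of the neighbours of n (0 for isolated nodes)."""
    nbrs = adj[n]
    return sum(len(adj[m]) for m in nbrs) / len(nbrs) if nbrs else 0.0


adj = build_adj([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
bc = betweenness(adj)
```

On this toy graph (a triangle A-B-C with D attached to C), only C lies on shortest paths between other nodes (A-D and B-D), so its betweenness is 2 and all other nodes score 0.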
Complex diseases are caused by groups of genes known as disease genes, and a single gene often participates in several disease conditions [57, 58]. Knowledge of these gene-disease associations helps to unravel disease pathogenesis, which in turn supports disease diagnosis, treatment, and prevention. We obtained a gene-disease association network from the DisGeNET (v7.0) database, which contains 1,134,942 gene-disease associations (GDAs) between 21,671 genes and 30,170 diseases [59]. From this database, we considered curated gene-disease associations only.
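Filtering gene-disease associations to curated evidence and counting diseases per candidate gene can be sketched as below. The record layout (gene, disease, source label) and the label "CURATED" are simplifying assumptions for illustration and do not reproduce DisGeNET's actual file format; all genes and diseases are placeholders.

```python
from collections import defaultdict

# Hypothetical records: (gene, disease, source label). Placeholders only.
records = [
    ("GENE_A", "Disease 1", "CURATED"),
    ("GENE_A", "Disease 2", "CURATED"),
    ("GENE_A", "Disease 1", "CURATED"),   # repeated evidence for one pair
    ("GENE_B", "Disease 3", "CURATED"),
    ("GENE_B", "Disease 1", "INFERRED"),  # not curated: ignored
    ("GENE_X", "Disease 1", "CURATED"),   # not a candidate protein: ignored
]
candidates = {"GENE_A", "GENE_B", "GENE_C"}


def curated_associations(records, candidates, keep="CURATED"):
    """Map each candidate gene to the diseases it is linked to by curated evidence."""
    gene_to_diseases = defaultdict(set)
    for gene, disease, source in records:
        if source == keep and gene in candidates:
            gene_to_diseases[gene].add(disease)
    return dict(gene_to_diseases)


gda = curated_associations(records, candidates)
disease_count = {gene: len(diseases) for gene, diseases in gda.items()}
```

Candidates with no curated association (GENE_C here) simply drop out, which mirrors how only part of the candidate list survives this screening step.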
3. Results and Discussion
Here, we report the outcomes of the intermediate steps towards our objective of isolating key host proteins, followed by an analysis of their significance.
Our list of candidate host proteins, collected from the reported host-viral networks (Section 2), consists of 1432 distinct proteins that are targeted by SARS-CoV-2 during COVID-19. We rebuilt the PPI network centred on these candidate proteins using the STRING database; the derived PPI network contains 7076 edges. We curated the derived PPI network by keeping only the interactions whose confidence scores are at least 0.… We performed topological analysis of the giant component using NetworkAnalyzer [45]. The degree distribution of the candidate proteins in the giant component showed that the majority of its proteins exhibit a high degree of connectivity (Figure 4). The proteins with a degree of at least 50 (degree in parentheses) are CDK1 (73), PPP2R1A (65), NOP56 (60), POLR2B (60), RAB1A (59), RBX1 (58), SKIV2L2 (57), NAPA (57), RPS14 (56), STX5 (54), TGOLN2 (54), TCEB1 (53), DCTN2 (53), TCEB2 (52), HSPA9 (51), and GNB2L1 (50).
Histograms of all the centrality measures (described in Section 2.3) show non-random distributions (Figure S1). We performed a Pearson correlation analysis among all centrality scores (Table 1). The correlation between the degree centrality ($D_c$) and closeness centrality ($C_c$) scores was observed to be the highest (r = …) … $D_c$ and neighbourhood connectivity ($N_c$) is the third-highest (r = …) … $N_c$ and $B_c$ are less correlated (r = …) … ($D_c$, $B_c$, $C_c$) are quite close. We therefore selected these three measures for the subsequent analysis. We identified 373 highly central proteins, defined as proteins that score above the median for all three selected measures. When all measures are considered, only six proteins (GEMIN4, DDX20, GOLGA3, FKBP15, PMPCA, AK4) score above the median in every centrality measure, which is why we restricted our downstream analysis to three centrality measures.
Figure 3: The giant component of the PPI network, consisting of 1111 nodes and 7043 edges, obtained from the whole PPI network.
Figure 4: The degree distribution of all 1111 nodes (proteins) in the giant component of the PPI network. The x-axis indicates degree, whereas the y-axis shows relative frequency.
Table 1: Correlation analysis among all centrality parameters computed for 1111 proteins (Figure 3).
| | $D_c$ | $B_c$ | $C_{coef}$ | $T_c$ |
|---|---|---|---|---|
| $T_c$ | -0.32 | -0.283 | 0.451 | 1 |
Figure 5: The top 7 enriched pathways in each category of KEGG pathways. In each category, pathways are ordered by −log(p) value.
We performed KEGG pathway analysis of the 373 selected highly central proteins. We obtained a total of 84 enriched KEGG pathways at the significance level (adj-p < …) … cellular process (Endocytosis, Phagosome, Adherens junction, Tight junction, Cell cycle, Cellular senescence, Focal adhesion, Regulation of actin cytoskeleton, Lysosome), nine pathways in Environmental Information Processing: signal transduction (Ras signalling pathway, HIF-1 signalling pathway, Hippo signalling pathway, Apelin signalling pathway, MAPK signalling pathway, TGF-beta signalling pathway, AMPK signalling pathway, NF-kappa B signalling pathway), nine pathways from the human disease viral sub-category (Human immunodeficiency virus 1 infection, Human papillomavirus infection, Human cytomegalovirus infection, Hepatitis B, Human T-cell leukaemia virus 1 infection, Influenza A, Hepatitis C, Measles), and four pathways from the neurodegenerative disease sub-category (Huntington disease, Parkinson disease, Alzheimer disease, Prion diseases). A total of 141 distinct proteins (out of 373) were obtained from these pathways; we then ranked them by their presence in the selected enriched pathways and found that 79 proteins are associated with our candidate pathways. These proteins were then studied further for disease-gene associations, as described next.
Figure 6: The top 10 enriched terms in each category of gene ontology (BP: biological process, MF: molecular function, CC: cellular component). In each category, terms are ordered by log(combined score) value.
We also performed gene set enrichment analysis using Gene Ontology. Most of the selected genes are involved in biological process terms (Supplementary-B).
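The selection of highly central proteins described above (Pearson correlation among centrality scores, then keeping proteins that score above the median for $D_c$, $B_c$, and $C_c$) can be sketched as follows. The scores are hypothetical toy values, and the strict inequality at the median is an assumption about how ties are handled.

```python
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def above_median_in_all(scores, measures):
    """Proteins scoring strictly above the median in every listed measure."""
    cutoffs = {m: median(scores[m].values()) for m in measures}
    proteins = set.intersection(*(set(scores[m]) for m in measures))
    return sorted(p for p in proteins
                  if all(scores[m][p] > cutoffs[m] for m in measures))


# Hypothetical scores for five proteins (not values from this study).
scores = {
    "Dc": {"P1": 10, "P2": 4, "P3": 7, "P4": 2, "P5": 9},
    "Bc": {"P1": 0.30, "P2": 0.01, "P3": 0.12, "P4": 0.00, "P5": 0.05},
    "Cc": {"P1": 0.50, "P2": 0.30, "P3": 0.45, "P4": 0.25, "P5": 0.40},
}
central = above_median_in_all(scores, ["Dc", "Bc", "Cc"])
proteins = sorted(scores["Dc"])
r_dc_cc = pearson([scores["Dc"][p] for p in proteins],
                  [scores["Cc"][p] for p in proteins])
```

Requiring every measure at once is restrictive: here P3 and P5 each pass two of the three cut-offs but are excluded, which illustrates why adding more measures shrinks the selected set so quickly.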
The top ten terms in each category of gene ontology (BP, MF, CC) are shown in Figure 6. They include neutrophil mediated immunity (GO:0002446), neutrophil activation involved in immune response (GO:0002283), and viral process (GO:0016032) from the BP category; dolichyl-diphosphooligosaccharide-protein glycotransferase activity (GO:0004579), GDP binding (GO:0019003), cadherin binding (GO:0045296), and ATPase activity (GO:0016887) from the MF category; and focal adhesion (GO:0005925) from the CC category.
Figure 7: Comparison of three groups of disease categories (Cardiovascular, Respiratory, Immune system) using Venn diagrams. (a) Gene comparison, based on the number of proteins in each category; (b) disease comparison, based on the number of diseases associated (curated from the database) with the observed proteins in each category.
3.4. Analysis of Disease-gene associations
The 141 identified genes involved in the four significant pathway groups (cellular process, signal transduction, viral, and neurodegenerative) were further screened for their association with COVID-19-related diseases. We particularly focused on three disease groups that are highly influential during COVID-19, namely cardiovascular, respiratory tract [64, 65, 66, 67], and immune system diseases [68, 69]. To obtain disease-gene associations, we used the DisGeNET database [59] and selected the CURATED source only. We found a total of 64 proteins (out of 141) that play roles in various diseases, such as Asthma, Pneumonitis, Pneumonia, Influenza, Lung diseases, Cardiomyopathies, Coronary, Arteriosclerosis, Coronary Artery Disease, Heart failure, and HIV Infections (Supplementary-C). We compared the proteins involved in the three disease categories and the individual diseases in each category (Figure 7). A total of 119, 37, and 48 unique diseases, and 44, 17, and 24 distinct proteins, are associated with the Cardiovascular, Respiratory, and Immune system disease categories, respectively. Interestingly, a few proteins are associated with all three disease categories (AREG, CAV1, IFIH1, PARP1, PLAU, TGFB1), and several others with two of them (ATM, B2M, DDX58, ENO1, HSPA5, PRKDC, STAT6, TGFBR1, TGFBR2). The proteins with ten or more disease associations are PLAU (59), TGFB1 (29), CAV1 (17), PARP1 (17), TGFBR2 (13), ATP2A2 (11), AREG (10), FASN (10), IFIH1 (10), and ITGB1 (10). All 64 proteins and their associated quantitative parameters (degree, disease count (out of 204), disease type count (out of 3), and pathway count (out of 31)) are presented in Table 2.
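The Venn-diagram comparison of disease categories (Figure 7) reduces to set operations on the proteins annotated to each category. A minimal sketch with hypothetical gene sets, computing the proteins shared by all three categories, those shared by exactly one pair of categories, and those unique to a single category:

```python
from itertools import combinations

# Hypothetical gene -> disease-category annotations (toy data).
categories = {
    "Cardiovascular": {"G1", "G2", "G3", "G5"},
    "Respiratory": {"G1", "G3", "G4"},
    "Immune": {"G1", "G2", "G6"},
}

# Centre of the Venn diagram: proteins in every category.
in_all = set.intersection(*categories.values())
# Pairwise overlaps, excluding the proteins already counted in the centre.
pairwise = {
    (a, b): (categories[a] & categories[b]) - in_all
    for a, b in combinations(sorted(categories), 2)
}
# Proteins that appear in exactly one category.
only = {
    name: members - set().union(*(s for n, s in categories.items() if n != name))
    for name, members in categories.items()
}
```

The same operations apply unchanged when the sets hold disease identifiers instead of proteins, as in panel (b) of Figure 7.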
We then looked into the source network (Figure 2) to identify the viral proteins that target our 64 selected disease-associated proteins. We found 25 SARS-CoV-2 proteins that interact with these 64 proteins. Among the 25 SARS-CoV-2 viral proteins, eight are accessory proteins (Orf3a, Orf7b, Orf6, Orf7a, Orf7b, Orf8, Orf9b, Orf10), four are structural proteins (E, M, N, S), and thirteen are non-structural proteins (nsp1, nsp10, nsp12, nsp13, nsp14, nsp2, nsp3, nsp4, nsp5, nsp6, nsp7, nsp8, nsp9). Most host proteins interact with a single viral protein, and very few interact with more than one. The viral protein Orf7b has the largest number of target host proteins, followed by Orf3a and the M protein. Furthermore, five host proteins are targeted by both Orf3a and Orf7b.
We further looked for other viruses that target our 64 host proteins. We mined VirusMINT [70], a virus-host interaction database, to find related viral diseases. We found that several of the highlighted host proteins are also targeted by Hepatitis C virus genotype 1b, Poliovirus type 1, Human herpesvirus 1, Human papillomavirus types 16 and 31, Simian virus 40, Sendai virus, Human adenovirus 5 and 12, Epstein-Barr virus, Human SARS coronavirus, and Bovine papillomavirus type 1 (Table 2). These proteins may therefore be essential and deserve particular attention in the development of host-directed antiviral therapies for COVID-19.
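Identifying which viral proteins target the selected host proteins, and which hosts two viral proteins share (as for Orf3a and Orf7b above), can be sketched as a filter on the host-viral edge list. The edges and host identifiers below are hypothetical toy data, not the interactions in Figure 2 or Figure 8.

```python
from collections import defaultdict

# Hypothetical (viral protein, host protein) edges; toy data only.
edges = [("Orf7b", "H1"), ("Orf7b", "H2"), ("Orf7b", "H3"), ("Orf3a", "H2"),
         ("Orf3a", "H3"), ("M", "H4"), ("nsp5", "H9")]
selected_hosts = {"H1", "H2", "H3", "H4"}  # e.g. the disease-associated proteins

# Keep only edges that reach a selected host, grouped by viral protein.
hosts_of = defaultdict(set)
for viral, host in edges:
    if host in selected_hosts:
        hosts_of[viral].add(host)

# Viral proteins ordered by how many selected hosts they target.
ranking = sorted(hosts_of, key=lambda v: (-len(hosts_of[v]), v))
shared = hosts_of["Orf7b"] & hosts_of["Orf3a"]
```

Viral proteins whose partners all lie outside the selected set (nsp5 here) do not appear in the result at all.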
4. Conclusion
In this study, we have analysed the human host protein-protein interaction network during SARS-CoV-2 infection. We identified a set of proteins, including RBX1, HSPA5, ITCH, RAB7A, RAB5A, RAB8A, PSMC5, CAPZB, CANX, IGF2R, and HSPA1A, which might influence the whole PPI network. These proteins are enriched in cellular process, signalling, and neurodegenerative disease pathways, which may be involved in disease pathogenesis during COVID-19. Finally, we found 64 potential key SARS-CoV-2-interacting human host proteins associated with respiratory, cardiovascular, and immune system diseases. Many of them are also targeted by other viruses and may be important for the therapeutic treatment of COVID-19. We believe that the highlighted key proteins are promising targets that might play a crucial role during COVID-19 disease progression.
Figure 8: The interaction network of the most influential host proteins and viral proteins. The network consists of sixty-four human proteins interacting with twenty-five SARS-CoV-2 viral proteins. Yellow nodes represent viral proteins and green nodes represent host proteins.
Table 2: Sixty-four genes/proteins. Each protein is listed with its degree in the PPI network (Figure 3), disease count (out of 204), disease types (out of 3), and pathway count (out of 31). Other viruses known to target a protein are also reported, where available.
| Gene | Degree | Pathway count | Disease types | Disease count | Other known virus targets |
|---|---|---|---|---|---|
| ADAM17 | 10 | 2 | Immune | 1 | |
| ALDOA | 26 | 1 | Cardiovascular | 1 | |
| AP3B1 | 17 | 1 | Respiratory | 2 | |
| AREG | 20 | 2 | Respiratory, Cardiovascular, Immune | 10 | |
| ATM | 29 | 6 | Cardiovascular, Immune | 8 | |
| ATP2A2 | 10 | 1 | Cardiovascular | 11 | |
| ATP6 | 19 | 5 | Cardiovascular | 1 | Human SARS coronavirus, Bovine papillomavirus type 1, Human papillomavirus type 16 |
| ATR | 17 | 5 | Respiratory | 2 | Human adenovirus 5 |
| B2M | 25 | 4 | Cardiovascular, Immune | 4 | Hepatitis C virus genotype 1b (isolate Con1) |
| CANX | 31 | 2 | Cardiovascular | 1 | |
| CAPZB | 34 | 1 | Cardiovascular | 1 | |
| CAV1 | 22 | 2 | Respiratory, Cardiovascular, Immune | 17 | Poliovirus type 1 (strain Sabin) |
| CD44 | 19 | 1 | Immune | 1 | |
| COX2 | 9 | 3 | Cardiovascular | 1 | |
| CRKL | 11 | 5 | Cardiovascular | 4 | |
| DDX58 | 10 | 6 | Respiratory, Cardiovascular | 2 | |
| ENO1 | 13 | 1 | Cardiovascular, Immune | 2 | |
| EPHA2 | 9 | 2 | Cardiovascular | 5 | |
| FASN | 9 | 1 | Cardiovascular | 10 | |
| GAPDH | 23 | 2 | Cardiovascular | 1 | Hepatitis C virus genotype 1b (isolate Con1), Epstein-Barr virus (strain GD1) |
| GLA | 16 | 1 | Cardiovascular | 5 | |
| GNAQ | 14 | 5 | Cardiovascular | 1 | |
| GUSB | 16 | 1 | Immune | 1 | |
| HDAC2 | 12 | 5 | Respiratory | 2 | Human herpesvirus 1 (strain 17), Human papillomavirus type 16, Human papillomavirus type 31 |
| HLA-A | 15 | 8 | Immune | 5 | Epstein-Barr virus (strain GD1), Human papillomavirus type 16 |
| HLA-C | 14 | 8 | Immune | 7 | |
| HMGCR | 11 | 1 | Immune | 4 | |
| HSPA1A | 30 | 5 | Cardiovascular | 1 | Epstein-Barr virus (strain GD1) |
| HSPA5 | 46 | 1 | Respiratory, Cardiovascular | 3 | Epstein-Barr virus (strain GD1) |
| IFIH1 | 10 | 3 | Respiratory, Cardiovascular, Immune | 10 | Sendai virus (strain Fushimi) |
| IGF2R | 31 | 2 | Respiratory | 1 | |
| ITCH | 46 | 1 | Immune | 1 | Epstein-Barr virus (strain B95-8) |
| ITGA6 | 10 | 3 | Immune | 1 | |
| ITGB1 | 29 | 5 | Cardiovascular | 10 | Hepatitis C virus genotype 1b (isolate Con1) |
| JAK2 | 22 | 2 | Cardiovascular | 6 | |
| LDHA | 14 | 1 | Cardiovascular | 4 | |
| LDLR | 20 | 2 | Cardiovascular | 4 | |
| MET | 18 | 4 | Respiratory | 1 | |
| NDUFS2 | 20 | 3 | Cardiovascular | 3 | |
| NF1 | 16 | 2 | Cardiovascular | 3 | |
| NOTCH1 | 23 | 3 | Cardiovascular | 4 | Hepatitis C virus genotype 1b (isolate Con1) |
| NOTCH2 | 11 | 2 | Cardiovascular | 1 | |
| NOTCH3 | 15 | 3 | Cardiovascular | 2 | |
| PARP1 | 14 | 1 | Respiratory, Cardiovascular, Immune | 17 | Human herpesvirus 1 (strain 17) |
| PCNA | 25 | 3 | Immune | 1 | Human herpesvirus 1 (strain 17) |
| PDIA3 | 14 | 3 | Cardiovascular | 1 | |
| PLAU | 24 | 1 | Respiratory, Cardiovascular, Immune | 59 | |
| PPP1CB | 14 | 4 | Cardiovascular | 2 | |
| PRKDC | 15 | 1 | Respiratory, Immune | 3 | Human herpesvirus 1 (strain 17) |
| PSMC5 | 34 | 1 | Immune | 7 | Human adenovirus 5, Human adenovirus 12, Simian virus 40 |
| PSMD6 | 24 | 1 | Immune | 2 | |
| PTPN11 | 22 | 1 | Cardiovascular | 6 | |
| RAB5A | 40 | 3 | Cardiovascular | 1 | |
| RAB7A | 41 | 2 | Cardiovascular | 1 | |
| RAB8A | 40 | 3 | Immune | 1 | |
| RBX1 | 58 | 4 | Immune | 5 | |
| SERPINE1 | 16 | 4 | Cardiovascular | 8 | |
| SLC9A1 | 12 | 2 | Cardiovascular | 7 | |
| SORT1 | 14 | 1 | Cardiovascular | 3 | |
| STAT6 | 11 | 1 | Respiratory, Immune | 4 | |
| TGFB1 | 29 | 7 | Respiratory, Cardiovascular, Immune | 29 | Hepatitis C virus genotype 1b (isolate Con1) |
| TGFBR1 | 17 | 9 | Respiratory, Cardiovascular | 6 | |
| TGFBR2 | 16 | 8 | Respiratory, Cardiovascular | 13 | |
| XPO1 | 25 | 2 | Cardiovascular | 2 | |
References
[1] D. Wrapp, N. Wang, K. S. Corbett, J. A. Goldsmith, C.-L. Hsieh, O. Abiona, B. S. Graham, J. S. McLellan, Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation, Science 367 (2020) 1260–1263.
[2] S. Perlman, J. Netland, Coronaviruses post-SARS: update on replication and pathogenesis, Nature Reviews Microbiology 7 (2009) 439–450.
[3] R. J. de Groot, S. C. Baker, R. S. Baric, C. S. Brown, C. Drosten, L. Enjuanes, R. A. Fouchier, M. Galiano, A. E. Gorbalenya, Z. A. Memish, et al., Commentary: Middle East respiratory syndrome coronavirus (MERS-CoV): announcement of the Coronavirus Study Group, Journal of Virology 87 (2013) 7790–7792.
[4] Y. Liu, A. A. Gayle, A. Wilder-Smith, J. Rocklöv, The reproductive number of COVID-19 is higher compared to SARS coronavirus, Journal of Travel Medicine (2020).
[5] V. Surveillances, The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19) - China, 2020, China CDC Weekly 2 (2020) 113–122.
[6] K. Servick, For survivors of severe COVID-19, beating the virus is just the beginning, Science (2020).
[7] A. D. Whetton, G. W. Preston, S. Abubeker, N. Geifman, Proteomics and informatics for understanding phases and identifying biomarkers in COVID-19 disease, Journal of Proteome Research 19 (2020) 4219–4232.
[8] L. Antonelli, M. R. Guarracino, L. Maddalena, M. Sangiovanni, Integrating imaging and omics data: A review, Biomedical Signal Processing and Control 52 (2019) 264–280.
[9] Y.-W. Tang, J. E. Schmitz, D. H. Persing, C. W. Stratton, Laboratory diagnosis of COVID-19: current issues and challenges, Journal of Clinical Microbiology 58 (2020).
[10] J. K. Das, G. Tradigo, P. Veltri, P. H. Guzzi, S. Roy, Data science in unveiling COVID-19 pathogenesis and diagnosis: Evolutionary origin to drug repurposing, Briefings in Bioinformatics (2020). URL: https://doi.org/10.1093/bib/bbaa420. doi:10.1093/bib/bbaa420.