A mixed clustering coefficient centrality for identifying essential proteins ∗
Pengli Lu † and JingJuan Yu
School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, P.R. China
Abstract
Essential proteins play a crucial role in the life of the cell. Identifying essential proteins not only promotes the development of drug target technology but also contributes to understanding the mechanism of biological evolution. Many scholars have sought to discover essential proteins from the topological structure of protein networks and from biological information, yet the accuracy of essential protein recognition still needs to be improved. In this paper, we propose a method that integrates the clustering coefficient in protein complexes with topological properties to determine the essentiality of proteins. First, we define the In-clustering coefficient (IC) to describe the properties of protein complexes. Then we propose a new method, the complex edge and node clustering coefficient (CENC), to identify essential proteins. Two Protein-Protein Interaction (PPI) networks of Saccharomyces cerevisiae, MIPS and DIP, are used as experimental data. Experiments with a logistic regression model show that CENC improves the ability to recognize essential proteins compared with the existing methods DC, BC, EC, SC, LAC, NC and the recent method UC.

Keywords: Protein interaction network; Essential protein; Protein complex; Assessment method
∗ Supported by the National Natural Science Foundation of China (No. 11361033) and the Natural Science Foundation of Gansu Province (No. 1212RJZA029).
† Corresponding author. E-mail addresses: [email protected] (P. Lu), [email protected] (J. Yu).

1 Introduction

Proteins are a crucial component of all cells and organisms. Proteins that are necessary to maintain the life of an organism are called essential proteins. Essential proteins not only promote the development of drug target technology but also help the study of biological evolution mechanisms [1]. Removing an essential protein can lead to cell death or to an inability to replicate and reproduce [2]. The recognition and protection of essential proteins are the basis of drug development and provide valuable theories and methods for disease diagnosis, drug design, etc. [3].

In biology, the identification of essential proteins mainly relies on biological experiments, such as conditional knockouts [4], RNA interference [5], and single gene knockouts [6], coupled with testing the survival ability of the affected organisms. The results of these experiments are clear and effective, but they consume large amounts of time, money and resources. With the improvement of high-throughput technology, several protein-protein interaction (PPI) networks have been generated [7, 8]. Predicting essential proteins from the large body of experimental data by applying network theory to PPI networks has therefore become a crucial research direction in bioinformatics. The methods for identifying essential proteins can be divided into several categories.

According to the centrality-lethality rule put forward by Jeong H M et al., the essentiality of proteins is associated with the topological structure of PPI networks [10]. Accordingly, a large number of scholars have proposed indicators based on topological centrality [12, 16]. Some of them consider the topology of nodes in the network, such as degree centrality (DC), which considers the connected nodes [12, 17, 18], betweenness centrality (BC), which considers a global characteristic [13, 46], subgraph centrality (SC) [19], local average connectivity (LAC) [20], eigenvector centrality (EC) [21], information centrality (IC) [27], closeness centrality (CC) [15], and the topology potential-based method (TP) [42]. Others consider the topology of edges in the network, including the edge clustering coefficient (ECC) [43], the improved node and edge clustering coefficient (INEC) [44], integrated edge weights (IEC) [24] and network centrality (NC) [25]. CytoNCA, a Cytoscape app for analyzing centrality methods, has been a valuable tool for identifying the essentiality of proteins [26].

With the increase of high-throughput biological data, scholars have tried to incorporate biological information to improve the accuracy of identifying essential proteins. The PeC method constructs a weighted protein-protein network from the functional annotations of genes by combining edge clustering coefficients with correlation coefficients of gene expression data [32]. The esPOS method uses gene expression and subcellular localization information [29]. The SPP method is based on sub-network partition and prioritization with integrated subcellular localization [14]. The extended Pareto optimality consensus model (EPOC) combines neighborhood closeness centrality with orthology information [28]. GO term information is also used to predict essential proteins, for example in the RSG method [33].

Some studies recognize essential proteins from the perspective of complexes and functional modules. Hart G T et al. find that essentiality is an attribute of the protein complex and that protein complexes often determine the essential proteins [30]. Li et al. show that essential proteins occur more frequently in complexes than in the whole network [29, 48]. Luo J W et al. propose the LIDC method, which combines local interaction density and protein complexes to predict essential proteins [45]. Li et al. propose united complex centrality (UC), which takes into account the frequency with which a protein appears in complexes as well as edge properties [31].

In this paper, considering both protein complex information and topological properties, a new method, the complex edge and node clustering coefficient (CENC), is proposed to identify essential proteins. To assess the quality of CENC, two datasets of Saccharomyces cerevisiae, MIPS and DIP, are used. Comparison with seven existing methods, DC, BC, EC, SC, LAC, NC and UC, shows that our method is more effective in determining the essentiality of proteins than the existing measures.

2 New Centrality: CENC
A protein interaction network can be represented as an undirected simple graph $G(V, E)$, where the proteins form the node set $V$ and the interactions between pairs of proteins form the edge set $E$. In this study, we present a new method, the complex edge and node clustering coefficient (CENC), to judge the essentiality of proteins by combining the features of protein complexes with the topology of nodes and edges. The basic considerations behind CENC are as follows: (1) essential proteins appear in complexes more frequently; (2) the topology of both nodes and edges is an important factor affecting the essentiality of proteins.

First, we recall the classical clustering coefficient ($C$) [22]:

$$C(v) = \frac{2E_v}{k_v(k_v - 1)} \tag{2.1}$$

where $E_v$ is the actual number of edges among the neighbors of node $v$, and $k_v$ is the degree of node $v$.

The clustering coefficient of a node was then generalized to an edge by Radicchi et al. [23]. The edge clustering coefficient ($ECC$) is defined as [43]

$$ECC(v, u) = \frac{z_{v,u}}{\min(k_v - 1, k_u - 1)} \tag{2.2}$$

where $z_{v,u}$ is the number of triangles in the network that include the edge $e(v, u)$, and $k_v$ and $k_u$ are the degrees of node $v$ and node $u$, respectively.

Based on the number of edges incident to a node and the clustering coefficient of each edge, the sum of edge clustering coefficients $NC$ is proposed [25]:

$$NC(v) = \sum_{u \in N_v} ECC(v, u) \tag{2.3}$$

where $N_v$ denotes the set of all neighbors of node $v$.

Furthermore, we propose a new definition, the In-clustering coefficient ($IC$), which incorporates the features of complexes:

$$IC(v) = \sum_{i \in ComplexSet(v)} C(v)_i \tag{2.4}$$

where $ComplexSet(v)$ denotes the set of protein complexes that contain protein $v$, and $C(v)_i$ is the value of $C(v)$ for the $i$-th protein complex in $ComplexSet(v)$.

Now, based on the definitions above, we propose our new method, the complex edge and node clustering coefficient (CENC), for estimating the essentiality of proteins:
$$CENC(v) = a \cdot \frac{IC(v)}{IC_{max}} + b \cdot \frac{NC(v)}{NC_{max}} + c \cdot \frac{C(v)}{C_{max}} \tag{2.5}$$

where $IC_{max}$, $NC_{max}$ and $C_{max}$ denote the maximum values of $IC$, $NC$ and $C$ over all nodes, and $a$, $b$ and $c$ are adjustable factors ranging from 1 to 10. After extensive experiments, CENC gives the best results when $a = 10$, $b = 1$ and $c = 1$.

3 Experimental data and assessment methods
The experimental data come from Saccharomyces cerevisiae, whose protein data are relatively complete. Two PPI network datasets, MIPS [35] and DIP [34], are used. As preprocessing, all self-interactions and repeated interactions are removed from these PPI networks. The properties of the two networks are listed in Table 1. The MIPS network includes 4546 proteins and 12319 interactions, with a clustering coefficient of about 0.0879. The DIP network includes 5093 proteins and 24743 interactions, with a clustering coefficient of about 0.0973. The known essential proteins are derived from four databases: MIPS [47], SGD (Saccharomyces Genome Database) [41], SGDP (Saccharomyces Genome Deletion Project) [4], and DEG (Database of Essential Genes) [35]. The protein complex set is taken from the CM270 [47], CM425 [37], CYC408 and CYC428 datasets [38, 39], which can be obtained from [29], and contains 745 protein complexes (covering 2167 proteins).

Table 1: Data details of the two protein networks: DIP, MIPS

| Dataset | Proteins | Interactions | Average degree | Essential proteins | Clustering coefficient |
|---|---|---|---|---|---|
| MIPS | 4546 | 12319 | 5.42 | 1016 | 0.0879 |
| DIP | 5093 | 24743 | 9.72 | 1167 | 0.0973 |
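To illustrate how Eqs. (2.1)-(2.5) fit together on a network preprocessed as described above, the following is a minimal Python sketch rather than the authors' implementation. It assumes that $C(v)_i$ is the clustering coefficient of $v$ computed inside the subgraph induced by complex $i$, that any ratio with a zero denominator is taken as 0, and that the maxima in Eq. (2.5) are taken over all nodes. The function names (`build_graph`, `clustering`, `ecc`, `cenc_scores`) are hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected simple graph as {node: set(neighbors)}.

    Self-interactions are skipped and repeated interactions collapse
    automatically because neighbors are stored in sets.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # drop self-interaction
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering(adj, v, members=None):
    """Eq. (2.1). If `members` is given, only neighbors inside that set
    are considered (assumed reading of C(v)_i for a complex)."""
    nbrs = adj.get(v, set())
    if members is not None:
        nbrs = nbrs & members
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumption: undefined coefficient counts as 0
    links = sum(1 for x, y in combinations(nbrs, 2) if y in adj[x])
    return 2.0 * links / (k * (k - 1))


def ecc(adj, v, u):
    """Eq. (2.2): triangles through edge (v, u) over min(k_v - 1, k_u - 1)."""
    denom = min(len(adj[v]) - 1, len(adj[u]) - 1)
    if denom <= 0:
        return 0.0  # assumption: undefined coefficient counts as 0
    return len(adj[v] & adj[u]) / denom


def cenc_scores(adj, complexes, a=10.0, b=1.0, c=1.0):
    """Eq. (2.5) with the weights reported in the text as defaults."""
    c_val = {v: clustering(adj, v) for v in adj}
    nc_val = {v: sum(ecc(adj, v, u) for u in adj[v]) for v in adj}  # Eq. (2.3)
    ic_val = {  # Eq. (2.4)
        v: sum(clustering(adj, v, members=m) for m in complexes if v in m)
        for v in adj
    }

    def normalized(values):
        top = max(values.values(), default=0.0)
        return {v: (x / top if top > 0 else 0.0) for v, x in values.items()}

    ic_n, nc_n, c_n = normalized(ic_val), normalized(nc_val), normalized(c_val)
    return {v: a * ic_n[v] + b * nc_n[v] + c * c_n[v] for v in adj}
```

For a toy network consisting of a triangle A-B-C, an extra protein D attached only to A, and a single complex {A, B, C}, this sketch ranks B and C above A, because D dilutes the clustering coefficient of A, and gives D a score of 0.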
Proteins are sorted in descending order of their CENC values. A number of top-ranked proteins are selected as predicted essential proteins and compared with the known essential proteins, which gives the number of true essential proteins among them. From these counts, the sensitivity (SN), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), F-measure (F) and accuracy (ACC) can be calculated [28, 29]. The six statistical indicators are defined as follows.

Sensitivity:
$$SN = \frac{TP}{TP + FN}$$

Specificity:
$$SP = \frac{TN}{TN + FP}$$

Positive predictive value:
$$PPV = \frac{TP}{TP + FP}$$

Negative predictive value:
$$NPV = \frac{TN}{TN + FN}$$

F-measure:
$$F = \frac{2 \cdot SN \cdot PPV}{SN + PPV}$$

Accuracy:
$$ACC = \frac{TP + TN}{P + N}$$

Here $TP$ is the number of essential proteins correctly selected as essential, $FP$ is the number of nonessential proteins incorrectly selected as essential, $TN$ is the number of nonessential proteins correctly selected as nonessential, and $FN$ is the number of essential proteins incorrectly selected as nonessential. $P$ and $N$ are the total numbers of essential and nonessential proteins, respectively.

We follow the principle of "sorting-screening" to evaluate the performance of CENC. CENC is compared with seven previous measures on the MIPS and DIP datasets: degree centrality (DC) [12], betweenness centrality (BC) [13, 46], eigenvector centrality (EC) [21], subgraph centrality (SC) [19], local average connectivity (LAC) [20], network centrality (NC) [25] and united complex centrality (UC) [31]. Specifically, proteins are sorted in descending order by CENC and by each of the other seven measures. The top 100, 200, 300, 400, 500 and 600 proteins are then taken as predicted essential proteins. Finally, by comparison with the known essential proteins, the number of true essential proteins among the predicted ones is obtained. The experimental results are shown in Figs. 1-2.

Fig. 1: The number of true essential proteins identified by CENC and the other seven methods on the MIPS network.

From Fig. 1, the numbers of true essential proteins identified by CENC are 67, 124, 171, 209, 243 and 260 for the top 100 to the top 600, respectively, the best among the eight methods on the MIPS network. Although UC performs well on the yeast PPI network, it still performs poorly on the MIPS network. Among the seven existing methods, SC identifies the fewest essential proteins. Compared with SC, CENC improves by 86.56%, 86.29%, 83.04%, 77.03%, 74.49% and 68.46% for the top 100 to the top 600, respectively. Even compared with the best-performing existing method at each cutoff, CENC still achieves improvements of 53.73%, 52.42%, 51.46%, 46.41%, 38.68% and 30% in predicting essential proteins.

Fig. 2: The number of true essential proteins identified by CENC and the other seven methods on the DIP network.

From Fig. 2, it can be seen that CENC performs better than the existing methods DC, BC, EC, SC, LAC, NC and UC on the DIP network. Compared with the best result among these seven methods, the number of true essential proteins identified by CENC increases by 4, 16, 18, 16, 16 and 29 for the top 100 to the top 600, respectively. Moreover, CENC identifies many more essential proteins than DC, BC, SC and EC.

The six statistical indicators described in the Assessment methods section are used to evaluate CENC and the other seven identification measures. Proteins are sorted in descending order of the values given by each method. The top 20 percent of proteins are taken as predicted essential proteins, and the remaining 80 percent are treated as candidate nonessential proteins. The comparison between CENC and the other seven measures on the two networks is shown in Table 2. For the DIP network, all six statistics of CENC are higher than those of the previous measures, which shows that CENC has better prediction accuracy. For the MIPS network, the values of SN, SP, PPV, NPV, F-measure and ACC obtained by CENC are 0.317, 0.827, 0.368, 0.792, 0.341 and 0.704, respectively, higher than those of the previously proposed methods DC, BC, EC, SC, LAC, NC and UC. These results indicate that CENC performs better than the seven existing methods.

Table 2: Comparison of sensitivity (SN), specificity (SP), F-measure (F), positive predictive value (PPV), negative predictive value (NPV) and accuracy (ACC) of CENC and the other seven algorithms.

| Dataset | Method | SN | SP | PPV | NPV | F-measure | ACC |
|---|---|---|---|---|---|---|---|
| MIPS | DC | 0.254 | 0.803 | 0.291 | 0.772 | 0.271 | 0.671 |
| MIPS | BC | 0.197 | 0.796 | 0.278 | 0.716 | 0.231 | 0.629 |
| MIPS | EC | 0.139 | 0.773 | 0.163 | 0.738 | 0.150 | 0.620 |
| MIPS | SC | 0.138 | 0.773 | 0.162 | 0.739 | 0.149 | 0.620 |
| MIPS | LAC | 0.271 | 0.812 | 0.314 | 0.779 | 0.291 | 0.682 |
| MIPS | NC | 0.281 | 0.814 | 0.325 | 0.781 | 0.302 | 0.686 |
| MIPS | UC | 0.271 | 0.812 | 0.314 | 0.778 | 0.291 | 0.682 |
| MIPS | CENC | 0.317 | 0.827 | 0.368 | 0.792 | 0.341 | 0.704 |
| DIP | DC | 0.353 | 0.834 | 0.409 | 0.80 | 0.379 | 0.716 |
| DIP | BC | 0.308 | 0.823 | 0.361 | 0.785 | 0.333 | 0.70 |
| DIP | EC | 0.323 | 0.824 | 0.374 | 0.789 | 0.347 | 0.701 |
| DIP | SC | 0.316 | 0.822 | 0.366 | 0.787 | 0.339 | 0.698 |
| DIP | LAC | 0.405 | 0.852 | 0.472 | 0.815 | 0.436 | 0.743 |
| DIP | NC | 0.40 | 0.850 | 0.463 | 0.813 | 0.428 | 0.739 |
| DIP | UC | 0.391 | 0.850 | 0.458 | 0.811 | 0.422 | 0.737 |
| DIP | CENC | 0.422 | 0.858 | 0.491 | 0.820 | 0.454 | 0.… |

In addition, the precision-recall curve, a statistical method for evaluating stability, is used for CENC and the other seven measures. Precision and recall are defined as follows:

$$Precision(n) = \frac{TP(n)}{TP(n) + FP(n)}, \qquad Recall(n) = \frac{TP(n)}{TP(n) + FN(n)}$$

where $TP$, $FP$ and $FN$ are as defined in the Assessment methods section. The results are shown in Fig. 3 and Fig. 4. On the DIP network, CENC performs better than the other methods. The same holds on the MIPS network.

Fig. 3: Precision-recall curves of CENC and the other seven methods for the MIPS network.

Fig. 4: Precision-recall curves of CENC and the other seven methods for the DIP network.
Receiver Operating Characteristic (ROC) curve and AUC

The Receiver Operating Characteristic (ROC) curve is a valuable tool for assessing classification on imbalanced data [49]. It is used to evaluate the strengths and weaknesses of a binary classifier, and predicting essential proteins can be regarded as a binary classification problem. The true positive rate (TPR) and false positive rate (FPR) are defined as follows:

$$TPR(n) = \frac{TP(n)}{TP(n) + FN(n)}, \qquad FPR(n) = \frac{FP(n)}{FP(n) + TN(n)}$$

where $TP$, $FP$, $FN$ and $TN$ are as described in the Assessment methods section. As shown in Figs. 5-6, the ROC curve of CENC is slightly higher than those of the other seven methods, indicating that CENC is more effective.

To quantify the results of the ROC curves, the area under the ROC curve, generally called AUC, is used. The AUC results are shown in Table 3. CENC attains the highest AUC on the MIPS network (0.300). On the DIP network its AUC (0.340) equals those of SC and LAC and exceeds those of the remaining methods.

Fig. 5: ROC curves of CENC and the other seven methods for the MIPS network.

Fig. 6: ROC curves of CENC and the other seven methods for the DIP network.

Table 3: AUC values of CENC and the other seven methods in the MIPS and DIP networks

| Dataset | DC | BC | EC | SC | LAC | NC | UC | CENC |
|---|---|---|---|---|---|---|---|---|
| MIPS | 0.289 | 0.277 | 0.225 | 0.277 | 0.289 | 0.289 | 0.289 | 0.300 |
| DIP | 0.327 | 0.307 | 0.312 | 0.340 | 0.340 | 0.339 | 0.331 | 0.340 |
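As a sketch of the TPR/FPR definitions above, the following Python fragment traces the ROC points of a ranked protein list and integrates them with the trapezoidal rule. It is an illustration under assumptions, not the procedure used for Table 3: the text does not state how the AUC was computed or normalized, so values from this sketch need not be on the same scale as the table. The names `roc_points` and `auc` are hypothetical, and the ranking is assumed to contain unique protein names.

```python
def roc_points(ranking, essential):
    """(FPR, TPR) after each of the top n proteins, starting from (0, 0)."""
    positives = set(ranking) & set(essential)
    n_pos = len(positives)
    n_neg = len(ranking) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for protein in ranking:
        if protein in positives:
            tp += 1
        else:
            fp += 1
        fpr = fp / n_neg if n_neg else 0.0  # FP / (FP + TN)
        tpr = tp / n_pos if n_pos else 0.0  # TP / (TP + FN)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

With this convention a ranking that places every essential protein ahead of every nonessential one yields an area of 1.0.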
The jackknife methodology, developed by Holman et al., is an effective general-purpose method for assessing predictions [40]. The X-axis represents the number of top-ranked proteins selected as predicted essential proteins, and the Y-axis represents the number of true essential proteins among the selected proteins. First, proteins are sorted in descending order of their predicted values. Then predicted essential proteins are chosen from the top 0 to the top 800 in each dataset. Finally, the jackknife curve is drawn from the cumulative number of true essential proteins. From Fig. 7 and Fig. 8, we can see that the prediction efficiency of CENC is higher than that of the other seven centrality measures on the MIPS and DIP networks. Consequently, the jackknife curves show that CENC is an effective approach for predicting essential proteins.

Fig. 7: The performance of CENC and the other seven centrality measures on the DIP network, evaluated by the jackknife methodology.
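The jackknife curve described above is a cumulative count, which can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `jackknife_curve` is hypothetical, and the default limit of 800 simply follows the range quoted in the text.

```python
from itertools import accumulate


def jackknife_curve(ranking, essential, max_top=800):
    """curve[k] = number of true essential proteins among the top k proteins.

    `ranking` is a list of protein names sorted by predicted value in
    descending order; the curve starts at k = 0 and stops at max_top
    (or at the end of the ranking if it is shorter).
    """
    essential = set(essential)
    hits = (1 if protein in essential else 0 for protein in ranking[:max_top])
    return [0] + list(accumulate(hits))
```

Plotting `curve[k]` against `k` for each method on the same axes gives a figure of the kind described, with a higher curve indicating that more true essential proteins are found among the same number of top-ranked candidates.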
Fig. 8: The performance of CENC and the other seven centrality measures on the MIPS network, evaluated by the jackknife methodology.

Essential proteins are crucial for the survival and normal functioning of all organisms. Improving the accuracy of essential protein recognition is a challenging task. Many scholars have devoted themselves to identifying essential proteins from the topological features of the whole network, ignoring the importance of complexes and biological information. In this paper, on the basis of a mixed clustering coefficient for complexes and edge topology, a new method, CENC, is proposed and applied to two datasets, MIPS and DIP. The evaluation methods include the "sorting-screening" method, six statistical indicators, precision-recall curves, the ROC curve, AUC and the jackknife method. Using these evaluation methods, we compare CENC with seven existing methods: DC, BC, EC, SC, LAC, NC and UC. We find that the proposed CENC method improves the accuracy of predicting essential proteins.
References

[1] Fraser H B, Hirsh A E, et al., Evolutionary Rate in the Protein Interaction Network, Science, 296(5568):750-752, 2002.
[2] Xu B, Guan J, Wang Y, et al., Essential protein detection by random walk on weighted protein-protein interaction networks, IEEE/ACM Trans Comput Biol Bioinform, PP(99):1-1, 2017.
[3] Winzeler E A, Shoemaker D D, Astromoff A, Liang H, Anderson K, Andre B, et al., Functional characterization of the s. cerevisiae genome by gene deletion and parallel analysis, Science, 285(5429):901-906, 1999.
[4] Wang Y, Sun H, Du W, Blanzieri E, Viero G, Xu Y, et al., Identification of essential proteins based on ranking edge-weights in protein-protein interaction networks, PloS One, 9(9):e108716, 2014.
[5] Roemer T, Jiang B, Davison J, Ketela T, Veillette K, et al., Large-scale essential gene identification in Candida albicans and applications to antifungal drug discovery, Mol Microbiol, 50:167-181, 2003.
[6] Cullen L M, Arndt G M, Genome-wide screening for gene function using RNAi in mammalian cells, Immunol Cell Biol, 83:217-223, 2005.
[7] Estrada E, Virtual identification of essential proteins within the protein interaction network of yeast, Proteomics, 6(1):35-40, 2006.
[8] Peng W, Wang J, Wang W, et al., Iteration method for predicting essential proteins based on orthology and protein-protein interaction networks, BMC Systems Biology, 6(1), 2012.
[9] Giaever G, Chu A M, Ni L, et al., SGD: Functional profiling of the saccharomyces cerevisiae genome, Nature, 418(6896):387-391, 2002.
[10] Jeong H M, Mason S P, Albert B, et al., Lethality and centrality in protein networks, Nature, 411:41-42, 2001.
[11] Zhao B H, Wang J X, Li M, et al., Prediction of Essential Proteins Based on Overlapping Essential Modules, IEEE Transactions on Nanobioscience, 13(4):415-424, 2014.
[12] Hahn M W, Kern A D, Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks, Molecular Biology and Evolution, 22(4):803-806, 2005.
[13] Freeman L C, A set of measures of centrality based on betweenness, Sociometry, 40(1):35-41, 1977.
[14] Li M, Li W, Wu F X, et al., Identifying essential proteins based on sub-network partition and prioritization by integrating subcellular localization information, Journal of Theoretical Biology, 2018.
[15] Wuchty S, Stadler P F, Centers of complex networks, Journal of Theoretical Biology, 223(1):45-53, 2003.
[16] Batada N N, Hurst L D, Tyers M, Evolutionary and Physiological Importance of Hub Proteins, PLoS Computational Biology, 2(7):e88, 2006.
[17] Lin C C, Juan H F, Hsiang J T, Hwang Y C, Mori H, Huang H C, Essential core of protein-protein interaction network in escherichia coli, Journal of Proteome Research, 8(4):1925-1931, 2009.
[18] Liang H, Li W H, Gene essentiality, gene duplicability and protein connectivity in human and mouse, Trends in Genetics, 23(8):375-378, 2007.
[19] Estrada E, Juan A, Subgraph centrality in complex networks, Physical Review E, 71(5):1-9, 2005.
[20] Li M, Wang J, Chen X, et al., A local average connectivity-based method for identifying essential proteins from the network level, Computational Biology and Chemistry, 35(3):143-150, 2011.
[21] Bonacich P, Power and centrality: a family of measures, American Journal of Sociology, 92(5):1170-1182, 1987.
[22] Nie T, Guo Z, Zhao K, et al., Using mapping entropy to identify node centrality in complex networks, Physica A: Statistical Mechanics and its Applications, 453:290-297, 2016.
[23] Radicchi F, Castellano C, Cecconi F, et al., Defining and identifying communities in networks, Proceedings of the National Academy of Sciences of the United States of America, 101(9):2658-2663, 2003.
[24] Jiang Y, Wang Y, Pang W, et al., CytoNCA: Essential Protein Identification Based on Essential Protein-Protein Interaction Prediction by Integrated Edge Weights, The IEEE International Conference on Bioinformatics and Biomedicine, IEEE, 2014.
[25] Wang J, Li M, Wang H, et al., Identification of Essential Proteins Based on Edge Clustering Coefficient, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(4):1070-1080, 2012.
[26] Tang Y, Li M, Wang J, et al., CytoNCA: A cytoscape plugin for centrality analysis and evaluation of protein interaction networks, Biosystems, 127:67-72, 2015.
[27] Stephenson K, Zelen M, Rethinking centrality: methods and examples, Soc Networks, 11:1-37, 1989.
[28] Li G, Li M, Wang J, et al., United neighborhood closeness centrality and orthology for predicting essential proteins, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2018.
[29] Zhang Z P, Ruan J S, Gao J Z, et al., Predicting essential proteins from protein-protein interactions using order statistics, Journal of Theoretical Biology, 480:274-283, 2019.
[30] Hart G T, Lee I, Marcotte E M, A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality, BMC Bioinformatics, 8(1):236, 2007.
[31] Li M, Lu Y, Niu Z, et al., United complex centrality for identification of essential proteins from PPI networks, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(2):370-380, 2017.
[32] Li M, Zhang H H, Fei Y P, Essential protein discovery method based on integration of PPI and gene expression data, Journal of Central South University, 44(3):1024-1029, 2013.
[33] Lei X, Zhao J, et al., Predicting essential proteins based on RNA-Seq, subcellular localization and GO annotation datasets, Knowledge-Based Systems, 2018.
[34] Xenarios I, Lukasz S, et al., DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions, Nucleic Acids Research, 30(1):303-305, 2002.
[35] Zhang R, Lin Y, DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes, Nucleic Acids Res, 37(suppl 1):D455-D458, 2009.
[36] Wang J, Li M, Wang H, Pan Y, Identification of essential proteins based on edge clustering coefficient, Transactions on Computational Biology and Bioinformatics, 9(4):1070-1080, 2012.
[37] Friedel C C, Krumsiek J, Zimmer R, International Conference on Research in Computational Molecular Biology, Springer-Verlag, 2008.
[38] Pu S, Wong J, Turner B, Cho E, Wodak S J, Up-to-date catalogues of yeast protein complexes, Nucleic Acids Research, 37(3):825-831, 2009.
[39] Pu S, Vlasblom J, Emili A, et al., Identifying functional modules in the physical interactome of saccharomyces cerevisiae, Proteomics, 7(6):944-960, 2010.
[40] Holman A G, Davis P J, Foster J M, et al., Computational prediction of essential genes in an unculturable endosymbiotic bacterium, wolbachia of brugia malayi, BMC Microbiology, 9(1):1-14, 2009.
[41] Cherry J M, Adler C, Ball C A, et al., SGD: saccharomyces genome database, Nucleic Acids Research, 26(1):73-79, 1998.
[42] Li M, Lu Y, Wang J, et al., A Topology Potential-Based Method for Identifying Essential Proteins from PPI Networks, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12(2):372-383, 2015.
[43] Radicchi F, Castellano C, Cecconi F, et al., Defining and identifying communities in networks, Proceedings of the National Academy of Sciences of the United States of America, 101(9):2658-2663, 2003.
[44] Zhu Y, Wu C, Identification of essential proteins using improved node and edge clustering coefficient, Proceedings of the 37th Chinese Control Conference, 2018.
[45] Luo J W, Qi Y, Identification of essential proteins based on a new combination of local interaction density and protein complexes, PLOS ONE, 10(6):e0131418, 2015.
[46] Joy M P, Brock A, Ingber D E, et al., High-betweenness proteins in the yeast protein interaction network, Journal of Biomedicine and Biotechnology, 2005(2):96, 2014.
[47] Mewes H W, Amid C, Arnold R, et al., MIPS: analysis and annotation of proteins from whole genomes, Nucleic Acids Research, 34(Database issue):169-72, 2004.
[48] Pereira-Leal J B, Benjamin A, Peregrin-Alvarez J M, et al., An Exponential Core in the Heart of the Yeast Protein Interaction Network, Molecular Biology and Evolution, 2015.
[49] Bradley P, The Use of the Area Under the ROC Curve in the Evaluation of Machine Learning Algorithms, Pattern Recognition, 30:1145-1159, 1996.