A mixed clustering coefficient centrality for identifying essential proteins

Abstract

Essential proteins play a crucial role in cellular life. Identifying them not only promotes the development of drug-target technology but also contributes to understanding the mechanisms of biological evolution. Many researchers have sought to discover essential proteins from the topological structure of protein networks and from biological information, but the accuracy of identification still needs to be improved. In this paper, we propose a method that integrates the clustering coefficient within protein complexes with topological properties to determine the essentiality of proteins. First, we define the In-clustering coefficient (IC) to describe the properties of protein complexes. We then propose a new method, the complex edge and node clustering coefficient (CENC), to identify essential proteins. Two Protein-Protein Interaction (PPI) networks of Saccharomyces cerevisiae, MIPS and DIP, are used as experimental material. Experiments with a logistic regression model show that CENC improves the ability to recognize essential proteins compared with the existing methods DC, BC, EC, SC, LAC and NC and the recent method UC.


Pengli Lu† and JingJuan Yu
School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, P.R. China

Keywords: Protein interaction network; Essential protein; Protein complex; Assessment method

Supported by the National Natural Science Foundation of China (No. 11361033) and the Natural Science Foundation of Gansu Province (No. 1212RJZA029).
† Corresponding author.

Proteins are crucial components of all cells and organisms. Proteins that are necessary to maintain the life of an organism are considered essential proteins. Essential proteins not only promote the development of drug-target technology but also help the study of the mechanisms of biological evolution [1]. Removing an essential protein can lead to cell death or to an inability to replicate and reproduce [2]. The recognition and protection of essential proteins are the basis of drug development and provide valuable theories and methods for disease diagnosis, drug design and related applications [3].

In biology, essential proteins are mainly identified through experiments such as conditional knockouts [4], RNA interference [5] and single-gene knockouts [6], combined with tests of the survival ability of the affected organisms. These experimental results are clear and reliable, but they consume a great deal of time, money and resources. With advances in technology, several protein-protein interaction (PPI) networks have been generated [7, 8]. Predicting essential proteins from PPI networks, instead of relying solely on large numbers of biological experiments, has therefore become a crucial research direction in bioinformatics. The methods for identifying essential proteins can be divided into several categories.

According to the centrality-lethality rule put forward by Jeong H M et al., the essentiality of a protein is associated with its topological position in the PPI network [10]. Accordingly, many indicators based on topological centrality have been proposed [12, 16]. Some consider the topology of nodes, such as degree centrality (DC), which counts the direct neighbors of a node [12, 17, 18]; betweenness centrality (BC), which reflects global network structure [13, 46]; subgraph centrality (SC) [19]; local average centrality (LAC) [20]; eigenvector centrality (EC) [21]; information centrality (IC) [27]; closeness centrality (CC) [15]; and the topology potential-based method (TP) [42]. Others consider the topology of edges, including the edge clustering coefficient (ECC) [43], the improved node and edge clustering coefficient (INEC) [44], integrated edge weights (IEC) [24] and network centrality (NC) [25]. CytoNCA, a Cytoscape app for analyzing centrality methods, has been a valuable tool for identifying the essentiality of proteins [26].

With the growth of high-throughput biological data, researchers have combined network topology with biological information to improve the accuracy of identifying essential proteins. Considering the functional annotations of genes, the PeC method constructs a weighted protein-protein network by combining edge clustering coefficients with the correlation coefficients of gene expression data [32]. The esPOS method uses gene expression and subcellular localization information [29].

The SPP method is based on sub-network partition and prioritization that integrates subcellular localization [14]. The extended Pareto optimality consensus model (EPOC) combines neighborhood closeness centrality with orthology information [28]. GO term information is also used to predict essential proteins, as in the RSG method [33].

Other studies recognize essential proteins from the perspective of protein complexes and functional modules. Hart G T et al. found that essentiality is a property of the protein complex and that protein complexes often determine which proteins are essential [30]. Li et al. showed that essential proteins occur more frequently in complexes than in the network as a whole [29, 48]. Luo J W et al. proposed LIDC, which combines local interaction density with protein complexes to predict essential proteins [45]. Li et al. proposed united complex centrality (UC), which takes into account both how often a protein appears in complexes and its edge properties [31].

In this paper, considering both protein complex information and topological properties, we propose a new method, the complex edge and node clustering coefficient (CENC), to identify essential proteins. To assess the quality of CENC, two datasets of Saccharomyces cerevisiae, MIPS and DIP, are used. In comparison with seven existing methods, namely DC, BC, EC, SC, LAC, NC and UC, the experimental results show that our method determines the essentiality of proteins more effectively than existing measures.

2 New Centrality: CENC

An undirected simple graph $G(V, E)$ can be used to represent a protein interaction network: proteins are the nodes $V$, and interactions between pairs of proteins are the edges $E$. In this study, we present a new method, the complex edge and node clustering coefficient (CENC), to judge the essentiality of proteins by combining features of protein complexes with the topology of nodes and edges. The basic considerations behind CENC are as follows: (1) essential proteins appear more frequently in protein complexes; (2) the topology of both nodes and edges affects the essentiality of proteins.

First, we recall the classical clustering coefficient (C) [22]:

$$C(v) = \frac{2E_v}{k_v(k_v - 1)} \qquad (2.1)$$

where $E_v$ is the number of edges that actually exist among the neighbors of node $v$, and $k_v$ is the degree of node $v$.

Radicchi et al. [23] then generalized the clustering coefficient from nodes to edges. The edge clustering coefficient (ECC) is defined as follows [43]:

$$ECC(v,u) = \frac{z_{v,u}}{\min(k_v - 1,\, k_u - 1)} \qquad (2.2)$$

where $z_{v,u}$ is the number of triangles in the network that include the edge $e(v,u)$, and $k_v$ and $k_u$ are the degrees of nodes $v$ and $u$, respectively.

Based on the number of edges incident to a node and the clustering coefficient of each edge, the sum of edge clustering coefficients, NC, was proposed [25]:

$$NC(v) = \sum_{u \in N_v} ECC(v,u) \qquad (2.3)$$

where $N_v$ denotes the set of all neighbors of node $v$.

Furthermore, we propose a new definition, the In-clustering coefficient (IC), which incorporates features of protein complexes:

$$IC(v) = \sum_{i \in ComplexSet(v)} C(v)_i \qquad (2.4)$$

where $ComplexSet(v)$ is the subset of protein complexes that contain protein $v$, and $C(v)_i$ is the value of $C(v)$ for the $i$-th protein complex in $ComplexSet(v)$.

Based on the definitions above, we now propose our new method, the complex edge and node clustering coefficient (CENC), for estimating the essentiality of proteins.
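Before these quantities are combined, Eqs. (2.1)-(2.4) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The network is assumed to be an adjacency dictionary, and all function names are hypothetical. The text does not say how zero denominators in Eqs. (2.1) and (2.2) are handled, so they are assumed to yield 0. $C(v)_i$ is assumed to mean the clustering coefficient of $v$ in the subgraph induced by complex $i$.

```python
from itertools import combinations

# Hypothetical representation: protein -> set of interacting proteins.
Graph = dict[str, set[str]]


def build_graph(edges: list[tuple[str, str]]) -> Graph:
    """Build an undirected simple graph, dropping self-interactions and duplicates."""
    g: Graph = {}
    for a, b in edges:
        if a == b:
            continue  # self-interaction, removed during preprocessing
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)  # sets also discard repeated interactions
    return g


def clustering_coefficient(g: Graph, v: str) -> float:
    """Eq. (2.1): C(v) = 2 E_v / (k_v (k_v - 1))."""
    nbrs = g[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumption: C(v) = 0 when the denominator vanishes
    # E_v: number of edges that actually exist among the neighbours of v
    e_v = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
    return 2 * e_v / (k * (k - 1))


def edge_clustering_coefficient(g: Graph, v: str, u: str) -> float:
    """Eq. (2.2): ECC(v, u) = z_{v,u} / min(k_v - 1, k_u - 1)."""
    denom = min(len(g[v]) - 1, len(g[u]) - 1)
    if denom == 0:
        return 0.0  # assumption: ECC = 0 when either endpoint has degree 1
    z = len(g[v] & g[u])  # triangles through edge (v, u) = common neighbours
    return z / denom


def nc(g: Graph, v: str) -> float:
    """Eq. (2.3): sum of ECC over all edges incident to v."""
    return sum(edge_clustering_coefficient(g, v, u) for u in g[v])


def in_clustering_coefficient(g: Graph, v: str, complexes: list[set[str]]) -> float:
    """Eq. (2.4): sum of C(v) computed inside each complex that contains v."""
    total = 0.0
    for cx in complexes:
        if v not in cx or v not in g:
            continue
        # Assumed reading of C(v)_i: clustering coefficient in the induced subgraph
        sub = {p: g[p] & cx for p in cx if p in g}
        total += clustering_coefficient(sub, v)
    return total
```

On a toy network with edges A-B, A-C, B-C and C-D, this gives C(A) = 1, C(C) = 1/3 and NC(A) = 2. With complexes {A, B, C} and {C, D}, IC(C) = 1, because C is fully clustered only inside the first complex.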

$$CENC(v) = a \cdot \frac{IC(v)}{IC_{max}} + b \cdot \frac{NC(v)}{NC_{max}} + c \cdot \frac{C(v)}{C_{max}} \qquad (2.5)$$

where $IC_{max}$, $NC_{max}$ and $C_{max}$ are the maximum values of IC, NC and C over all proteins, and $a$, $b$ and $c$ are weighting factors ranging from 1 to 10. After extensive experiments, CENC gives the best results when $a$, $b$ and $c$ are 10, 1 and 1, respectively.

3 Experimental data and assessment methods

The experimental data are taken from Saccharomyces cerevisiae, whose protein data are relatively complete. Two PPI networks, MIPS [47] and DIP [34], are used. As preprocessing, all self-interactions and repeated interactions are removed from these PPI networks. The specific properties of the two networks are presented in Table 1. The MIPS network includes 4546 proteins and 12319 interactions, with a clustering coefficient of about 0.0879. The DIP network includes 5093 proteins and 24743 interactions, with a clustering coefficient of about 0.0973. The known essential proteins are collected from four databases: MIPS [47], SGD (Saccharomyces Genome Database) [41], SGDP (Saccharomyces Genome Deletion Project) [4] and DEG (Database of Essential Genes) [35]. The protein complex set is compiled from the CM270 [47], CM425 [37], CYC408 and CYC428 [38, 39] datasets, available from [29], and contains 745 protein complexes covering 2167 proteins.

Table 1: Data details of the two protein networks: DIP, MIPS

Dataset | Proteins | Interactions | Average degree | Essential proteins | Clustering coefficient
MIPS    | 4546     | 12319        | 5.42           | 1016               | 0.0879
DIP     | 5093     | 24743        | 9.72           | 1167               | 0.0973
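Every evaluation below starts by scoring proteins with Eq. (2.5) and sorting them. A minimal sketch of that step follows. It assumes the per-protein IC, NC and C values have already been computed and are stored in dictionaries keyed by protein. It also assumes $IC_{max}$, $NC_{max}$ and $C_{max}$ are maxima over all proteins in the network. The default weights are the best-reported $a = 10$, $b = 1$, $c = 1$. Breaking score ties by protein name is an arbitrary choice, and all names are hypothetical.

```python
def normalise(scores: dict[str, float]) -> dict[str, float]:
    """Divide every score by the maximum over all proteins (e.g. IC(v) / IC_max)."""
    peak = max(scores.values())
    return {p: (s / peak if peak > 0 else 0.0) for p, s in scores.items()}


def cenc_scores(
    ic: dict[str, float],
    nc: dict[str, float],
    cc: dict[str, float],
    a: float = 10.0,
    b: float = 1.0,
    c: float = 1.0,
) -> dict[str, float]:
    """Eq. (2.5): weighted sum of the max-normalised IC, NC and C values."""
    ic_n, nc_n, cc_n = normalise(ic), normalise(nc), normalise(cc)
    return {p: a * ic_n[p] + b * nc_n[p] + c * cc_n[p] for p in ic}


def rank_proteins(scores: dict[str, float]) -> list[str]:
    """Sort proteins by descending score; ties broken by name (arbitrary)."""
    return sorted(scores, key=lambda p: (-scores[p], p))


if __name__ == "__main__":
    scores = cenc_scores(
        {"P1": 2.0, "P2": 1.0, "P3": 0.0},  # IC
        {"P1": 1.0, "P2": 2.0, "P3": 2.0},  # NC
        {"P1": 0.5, "P2": 1.0, "P3": 0.0},  # C
    )
    print(rank_proteins(scores))  # -> ['P1', 'P2', 'P3']
```

With $a = 10$, the complex-based IC term dominates. For example, P2 has the largest NC and C values but still ranks below P1, whose IC is twice as large.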

Proteins are sorted in descending order of their CENC values. A number of top-ranked proteins are selected as predicted essential proteins and compared with the known essential proteins, which gives the number of true essential proteins among them. From these counts, the sensitivity (SN), specificity (SP), F-measure (F), accuracy (ACC), positive predictive value (PPV) and negative predictive value (NPV) can be calculated [28, 29]. The formulas for these six statistical indicators are:

Sensitivity: $SN = \frac{TP}{TP + FN}$

Specificity: $SP = \frac{TN}{TN + FP}$

Positive predictive value: $PPV = \frac{TP}{TP + FP}$

Negative predictive value: $NPV = \frac{TN}{TN + FN}$

F-measure: $F = \frac{2 \cdot SN \cdot PPV}{SN + PPV}$

Accuracy: $ACC = \frac{TP + TN}{P + N}$

TP is the number of essential proteins that are correctly predicted as essential.

FP is the number of nonessential proteins that are incorrectly predicted as essential.

TN is the number of nonessential proteins that are correctly predicted as nonessential.

FN is the number of essential proteins that are incorrectly predicted as nonessential. P and N are the total numbers of essential and nonessential proteins, respectively.

Fig. 1: The number of true essential proteins identified by CENC and the seven other methods in the MIPS network.

We follow the “sorting-screening” principle to evaluate the performance of CENC. CENC is compared on the MIPS and DIP datasets with seven previous measures: degree centrality (DC) [12], betweenness centrality (BC) [13, 46], eigenvector centrality (EC) [21], subgraph centrality (SC) [19], local average centrality (LAC) [20], network centrality (NC) [25] and united complex centrality (UC) [31]. Specifically, proteins are sorted in descending order of their values under CENC and under each of the other seven measures. The top 100, 200, 300, 400, 500 and 600 proteins are then selected as predicted essential proteins. Finally, comparing these predictions with the known essential proteins gives the number of true essential proteins among them. The results are shown in Figs. 1-2.

As Fig. 1 shows, the numbers of true essential proteins identified by CENC are 67, 124, 171, 209, 243 and 260 from the top 100 to the top 600, respectively, the best among the eight methods in the MIPS network. Although UC performs well in the yeast PPI network, it performs poorly in the MIPS network. Among the seven previous methods, SC identifies the fewest essential proteins. Compared with SC, CENC achieves improvement rates of 86.56%, 86.29%, 83.04%, 77.03%, 74.49% and 68.46% from the top 100 to the top 600, respectively. Even compared with the best of the other seven methods at each cutoff, CENC still obtains improvements of 53.73%, 52.42%, 51.46%, 46.41%, 38.68% and 30% in predicting essential proteins.

Fig. 2: The number of true essential proteins identified by CENC and the seven other methods in the DIP network.

Fig. 2 shows that CENC performs better than the existing methods DC, BC, EC, SC, LAC, NC and UC in the DIP network. Compared with the best result among these seven methods, CENC identifies 4, 16, 18, 16, 16 and 29 more true essential proteins from the top 100 to the top 600, respectively. Moreover, it identifies considerably more essential proteins than DC, BC, SC and EC.

The six statistical measures described in Section 3 are used to evaluate CENC and the seven other identification measures. Proteins are sorted in descending order of their values under each method. The top 20 percent are taken as predicted essential proteins, and the remaining 80 percent are treated as predicted nonessential proteins. Table 2 compares CENC with the seven other measures on the two networks. For the DIP network, all six statistics for CENC are higher than those of the previous measures, which shows that CENC has better prediction accuracy. For the MIPS network, the SN, SP, PPV, NPV, F-measure and ACC values of CENC are 0.317, 0.827, 0.368, 0.792, 0.341 and 0.704, respectively, all higher than those of the previous methods DC, BC, EC, SC, LAC, NC and UC. These results indicate that CENC performs better than the seven existing methods.

Table 2: Comparison of the sensitivity (SN), specificity (SP), F-measure (F), positive predictive value (PPV), negative predictive value (NPV) and accuracy (ACC) of CENC and the seven previous algorithms.

Dataset | Method | SN    | SP    | PPV   | NPV   | F-measure | ACC
MIPS    | DC     | 0.254 | 0.803 | 0.291 | 0.772 | 0.271     | 0.671
MIPS    | BC     | 0.197 | 0.796 | 0.278 | 0.716 | 0.231     | 0.629
MIPS    | EC     | 0.139 | 0.773 | 0.163 | 0.738 | 0.150     | 0.620
MIPS    | SC     | 0.138 | 0.773 | 0.162 | 0.739 | 0.149     | 0.620
MIPS    | LAC    | 0.271 | 0.812 | 0.314 | 0.779 | 0.291     | 0.682
MIPS    | NC     | 0.281 | 0.814 | 0.325 | 0.781 | 0.302     | 0.686
MIPS    | UC     | 0.271 | 0.812 | 0.314 | 0.778 | 0.291     | 0.682
MIPS    | CENC   | 0.317 | 0.827 | 0.368 | 0.792 | 0.341     | 0.704
DIP     | DC     | 0.353 | 0.834 | 0.409 | 0.80  | 0.379     | 0.716
DIP     | BC     | 0.308 | 0.823 | 0.361 | 0.785 | 0.333     | 0.70
DIP     | EC     | 0.323 | 0.824 | 0.374 | 0.789 | 0.347     | 0.701
DIP     | SC     | 0.316 | 0.822 | 0.366 | 0.787 | 0.339     | 0.698
DIP     | LAC    | 0.405 | 0.852 | 0.472 | 0.815 | 0.436     | 0.743
DIP     | NC     | 0.40  | 0.850 | 0.463 | 0.813 | 0.428     | 0.739
DIP     | UC     | 0.391 | 0.850 | 0.458 | 0.811 | 0.422     | 0.737
DIP     | CENC   | 0.422 | 0.858 | 0.491 | 0.820 | 0.454     | …

In addition, the precision-recall curve, a statistical method for evaluating stability, is applied to CENC and the seven previous measures. Precision and recall at a cutoff of the top n ranked proteins are defined as:

$$Precision(n) = \frac{TP(n)}{TP(n) + FP(n)}, \qquad Recall(n) = \frac{TP(n)}{TP(n) + FN(n)}$$

where TP, FP and FN are as defined in Section 3. The results are shown in Fig. 3 and Fig. 4.

Fig. 3: Precision and recall curves of CENC and the seven other methods for the MIPS network.

Fig. 4: Precision and recall curves of CENC and the seven other methods for the DIP network.

In the DIP network, CENC performs better than the other methods, and the same holds in the MIPS network.
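The six statistics at the top-20% cutoff and the precision/recall pair at a cutoff of n proteins can be sketched as below. This is an illustrative sketch, not the authors' evaluation code. It makes several assumptions. `ranked` is the list of network proteins sorted by descending score, and `essential` is the set of known essential proteins. The 20% cutoff is rounded down, and known essential proteins absent from the network are ignored. Any ratio with a zero denominator is set to 0. All function names are hypothetical.

```python
def confusion_at(ranked: list[str], essential: set[str], n: int) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when the top-n proteins are predicted essential."""
    predicted, rest = set(ranked[:n]), set(ranked[n:])
    tp = len(predicted & essential)
    fp = len(predicted - essential)
    fn = len(rest & essential)
    tn = len(rest - essential)
    return tp, fp, tn, fn


def ratio(num: float, den: float) -> float:
    """Safe division; 0 when the denominator is 0 (an assumption)."""
    return num / den if den else 0.0


def six_statistics(ranked: list[str], essential: set[str], fraction: float = 0.2) -> dict[str, float]:
    """SN, SP, PPV, NPV, F-measure and ACC with the top `fraction` predicted essential."""
    n = int(len(ranked) * fraction)  # top 20 percent, rounded down (assumption)
    tp, fp, tn, fn = confusion_at(ranked, essential, n)
    sn = ratio(tp, tp + fn)
    ppv = ratio(tp, tp + fp)
    return {
        "SN": sn,
        "SP": ratio(tn, tn + fp),
        "PPV": ppv,
        "NPV": ratio(tn, tn + fn),
        "F": ratio(2 * sn * ppv, sn + ppv),
        "ACC": ratio(tp + tn, len(ranked)),  # P + N = all proteins in the network
    }


def precision_recall(ranked: list[str], essential: set[str], n: int) -> tuple[float, float]:
    """Precision(n) and Recall(n) for the top-n cutoff."""
    tp, fp, _, fn = confusion_at(ranked, essential, n)
    return ratio(tp, tp + fp), ratio(tp, tp + fn)
```

Sweeping `n` from 1 to the network size in `precision_recall` traces one precision-recall curve of the kind plotted in Figs. 3-4.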

ROC curve and AUC

The receiver operating characteristic (ROC) curve is a valuable tool for evaluating classification under class imbalance [49]. It is used to assess the strengths and weaknesses of a binary classifier, and predicting essential proteins can be regarded as a binary classification problem. The true positive rate (TPR) and false positive rate (FPR) are defined as follows:

$$TPR(n) = \frac{TP(n)}{TP(n) + FN(n)}, \qquad FPR(n) = \frac{FP(n)}{FP(n) + TN(n)}$$

where TP, FP, FN and TN are as defined in Section 3. As shown in Figs. 5-6, the ROC curve of CENC lies slightly above those of the other seven methods, indicating that CENC is more effective.

To quantify the ROC results further, the area under the ROC curve, generally called AUC, is used. The AUC results are shown in Table 3. CENC obtains the highest AUC among the eight methods on both networks (0.300 on MIPS and 0.340 on DIP), although on DIP it ties with SC and LAC.

Fig. 5: ROC curves of CENC and the other seven methods for the MIPS network.

Fig. 6: ROC curves of CENC and the other seven methods for the DIP network.

Table 3: AUC values of CENC and the other seven methods in the MIPS and DIP networks

Network | DC    | BC    | EC    | SC    | LAC   | NC    | UC    | CENC
MIPS    | 0.289 | 0.277 | 0.225 | 0.277 | 0.289 | 0.289 | 0.289 | 0.300
DIP     | 0.327 | 0.307 | 0.312 | 0.340 | 0.340 | 0.339 | 0.331 | 0.340
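The TPR/FPR pairs and the AUC can be computed from a ranking as sketched below. The sketch traces one ROC point per cutoff n and integrates with the trapezoidal rule. This is a standard construction offered as an assumption. The text does not say how its ROC curves or AUC values were computed, so this sketch is not guaranteed to reproduce Table 3. Tied scores are ignored (the ranking is taken as strict), and the function names are hypothetical.

```python
def roc_points(ranked: list[str], essential: set[str]) -> list[tuple[float, float]]:
    """(FPR(n), TPR(n)) for every cutoff n = 0, 1, ..., len(ranked)."""
    pos = sum(1 for p in ranked if p in essential)
    neg = len(ranked) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both essential and nonessential proteins")
    tp = fp = 0
    points = [(0.0, 0.0)]  # n = 0: nothing predicted essential
    for p in ranked:
        if p in essential:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Under this construction, a ranking that places every essential protein first gives an AUC of 1, a random ranking gives about 0.5, and a fully reversed ranking gives 0.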

The jackknife methodology, developed by Holman et al., is an effective universal prediction method [40]. The X-axis represents the number of top-ranked proteins selected as predicted essential proteins, and the Y-axis represents the number of true essential proteins among the selected proteins. First, proteins are sorted in descending order of their predicted values. We then select the top 0 to top 800 proteins in each dataset as predicted essential proteins. Finally, the jackknife curve is drawn from the cumulative number of true essential proteins. Fig. 7 and Fig. 8 show that CENC predicts essential proteins more efficiently than the other seven centrality measures on the MIPS and DIP networks. The jackknife curves therefore indicate that CENC is an effective approach for predicting essential proteins.

Fig. 7: Performance of CENC and the other seven centrality measures on the DIP network, evaluated by the jackknife methodology.
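The jackknife curve, and the “sorting-screening” counts read from it, reduce to a running count of true essential proteins down the ranking. A minimal sketch follows, assuming `ranked` is the protein list sorted by descending score and `essential` is the set of known essential proteins. The function names are hypothetical.

```python
from itertools import accumulate


def jackknife_curve(ranked: list[str], essential: set[str], k_max: int = 800) -> list[int]:
    """Entry k = number of true essential proteins among the top-k proteins."""
    hits = (1 if p in essential else 0 for p in ranked[:k_max])
    return [0, *accumulate(hits)]  # entry 0 covers the empty selection (top 0)


def counts_at(curve: list[int], cutoffs: list[int]) -> list[int]:
    """Read off 'sorting-screening' counts, e.g. at the top 100, ..., 600."""
    return [curve[k] for k in cutoffs]
```

For example, `counts_at(jackknife_curve(ranked, essential), [100, 200, 300, 400, 500, 600])` returns the six per-cutoff counts of the kind plotted in Figs. 1-2, provided the ranking contains at least 600 proteins.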

Fig. 8: Performance of CENC and the other seven centrality measures on the MIPS network, evaluated by the jackknife methodology.

Essential proteins are crucial for the survival and normal functioning of all organisms, and improving the accuracy with which they are recognized is a challenging task. Many researchers have identified essential proteins from the topological features of the whole network while ignoring protein complexes and biological information. In this paper, we propose a new method, CENC, based on a mixed clustering coefficient that combines complex information with edge topology. Two datasets, MIPS and DIP, are used. The evaluation methods include the “sorting-screening” method, six statistical measures, precision-recall curves, ROC curves, AUC and the jackknife method. Using these evaluation methods, we compare CENC with seven previously proposed methods: DC, BC, EC, SC, LAC, NC and UC. The results show that CENC improves the accuracy of predicting essential proteins.

References

[1] Fraser H B, Hirsh A E, et al., Evolutionary Rate in the Protein Interaction Network, Science, 296(5568):750-752, 2002.
[2] Xu B, Guan J, Wang Y, et al., Essential protein detection by random walk on weighted protein-protein interaction networks, IEEE/ACM Trans Comput Biol Bioinform, PP(99):1-1, 2017.
[3] Winzeler E A, Shoemaker D D, Astromoff A, Liang H, Anderson K, Andre B, et al., Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis, Science, 285(5429):901-906, 1999.
[4] Wang Y, Sun H, Du W, Blanzieri E, Viero G, Xu Y, et al., Identification of essential proteins based on ranking edge-weights in protein-protein interaction networks, PloS One, 9(9):e108716, 2014.
[5] Roemer T, Jiang B, Davison J, Ketela T, Veillette K, et al., Large-scale essential gene identification in Candida albicans and applications to antifungal drug discovery, Mol Microbiol, 50:167-181, 2003.
[6] Cullen L M, Arndt G M, Genome-wide screening for gene function using RNAi in mammalian cells, Immunol Cell Biol, 83:217-223, 2005.
[7] Estrada E, Virtual identification of essential proteins within the protein interaction network of yeast, Proteomics, 6(1):35-40, 2006.
[8] Peng W, Wang J, Wang W, et al., Iteration method for predicting essential proteins based on orthology and protein-protein interaction networks, BMC Systems Biology, 6(1), 2012.
[9] Giaever G, Chu A M, Ni L, et al., SGD: Functional profiling of the Saccharomyces cerevisiae genome, Nature, 418(6896):387-391, 2002.
[10] Jeong H M, Mason S P, Albert B, et al., Lethality and centrality in protein networks, Nature, 411:41-42, 2001.
[11] Zhao B H, Wang J X, Li M, et al., Prediction of Essential Proteins Based on Overlapping Essential Modules, IEEE Transactions on Nanobioscience, 13(4):415-424, 2014.

[12] Hahn M W, Kern A D, Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks, Molecular Biology and Evolution, 22(4):803-806, 2005.
[13] Freeman L C, A set of measures of centrality based on betweenness, Sociometry, 40(1):35-41, 1977.
[14] Li M, Li W, Wu F X, et al., Identifying essential proteins based on sub-network partition and prioritization by integrating subcellular localization information, Journal of Theoretical Biology, 2018.
[15] Wuchty S, Stadler P F, Centers of complex networks, Journal of Theoretical Biology, 223(1):45-53, 2003.
[16] Batada N N, Hurst L D, Tyers M, Evolutionary and Physiological Importance of Hub Proteins, PLoS Computational Biology, 2(7):e88, 2006.
[17] Lin C C, Juan H F, Hsiang J T, Hwang Y C, Mori H, Huang H C, Essential core of protein-protein interaction network in Escherichia coli, Journal of Proteome Research, 8(4):1925-1931, 2009.
[18] Liang H, Li W H, Gene essentiality, gene duplicability and protein connectivity in human and mouse, Trends in Genetics, 23(8):375-378, 2007.
[19] Estrada E, Juan A, Subgraph centrality in complex networks, Physical Review E, 71(5):1-9, 2005.
[20] Li M, Wang J, Chen X, et al., A local average connectivity-based method for identifying essential proteins from the network level, Computational Biology and Chemistry, 35(3):143-150, 2011.
[21] Bonacich P, Power and centrality: a family of measures, American Journal of Sociology, 92(5):1170-1182, 1987.
[22] Nie T, Guo Z, Zhao K, et al., Using mapping entropy to identify node centrality in complex networks, Physica A: Statistical Mechanics and its Applications, 453:290-297, 2016.
[23] Radicchi F, Castellano C, Cecconi F, et al., Defining and identifying communities in networks, Proceedings of the National Academy of Sciences of the United States of America, 101(9):2658-2663, 2003.
[24] Jiang Y, Wang Y, Pang W, et al., CytoNCA: Essential Protein Identification Based on Essential Protein-Protein Interaction Prediction by Integrated Edge Weights, The IEEE International Conference on Bioinformatics and Biomedicine, IEEE, 2014.
[25] Wang J, Li M, Wang H, et al., Identification of Essential Proteins Based on Edge Clustering Coefficient, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(4):1070-1080, 2012.
[26] Tang Y, Li M, Wang J, et al., CytoNCA: A cytoscape plugin for centrality analysis and evaluation of protein interaction networks, Biosystems, 127:67-72, 2015.
[27] Stephenson K, Zelen M, Rethinking centrality: methods and examples, Soc Networks, 11:1-37, 1989.
[28] Li G, Li M, Wang J, et al., United neighborhood closeness centrality and orthology for predicting essential proteins, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2018.
[29] Zhang Z P, Ruan J S, Gao J Z, et al., Predicting essential proteins from protein-protein interactions using order statistics, Journal of Theoretical Biology, 480:274-283, 2019.
[30] Hart G T, Lee I, Marcotte E M, A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality, BMC Bioinformatics, 8(1):236-0, 2007.
[31] Li M, Lu Y, Niu Z, et al., United complex centrality for identification of essential proteins from PPI networks, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(2):370-380, 2017.
[32] Li M, Zhang H H, Fei Y P, Essential protein discovery method based on integration of PPI and gene expression data, Journal of Central South University, 44(3):1024-1029, 2013.
[33] Lei X, Zhao J, et al., Predicting essential proteins based on RNA-Seq, subcellular localization and GO annotation datasets, Knowledge-Based Systems, 2018.
[34] Xenarios I, Lukasz S, et al., DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions, Nucleic Acids Research, 30(1):303-305, 2002.

[35] Zhang R, Lin Y, DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes, Nucleic Acids Res, 37(suppl 1):D455-D458, 2009.
[36] Wang J, Li M, Wang H, Pan Y, Identification of essential proteins based on edge clustering coefficient, Transactions on Computational Biology and Bioinformatics, 9(4):1070-1080, 2012.
[37] Friedel C C, Krumsiek J, Zimmer R, International Conference on Research in Computational Molecular Biology, Springer-Verlag, 2008.
[38] Pu S, Wong J, Turner B, Cho E, Wodak S J, Up-to-date catalogues of yeast protein complexes, Nucleic Acids Research, 37(3):825-831, 2009.
[39] Pu S, Vlasblom J, Emili A, et al., Identifying functional modules in the physical interactome of Saccharomyces cerevisiae, Proteomics, 7(6):944-960, 2010.
[40] Holman A G, Davis P J, Foster J M, et al., Computational prediction of essential genes in an unculturable endosymbiotic bacterium, Wolbachia of Brugia malayi, BMC Microbiology, 9(1):1-14, 2009.
[41] Cherry J M, Adler C, Ball C A, et al., SGD: Saccharomyces Genome Database, Nucleic Acids Research, 26(1):73-79, 1998.
[42] Li M, Lu Y, Wang J, et al., A Topology Potential-Based Method for Identifying Essential Proteins from PPI Networks, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12(2):372-383, 2015.
[43] Radicchi F, Castellano C, Cecconi F, et al., Defining and identifying communities in networks, Proceedings of the National Academy of Sciences of the United States of America, 101(9):2658-2663, 2003.
[44] Zhu Y, Wu C, Identification of essential proteins using improved node and edge clustering coefficient, Proceedings of the 37th Chinese Control Conference, 2018.
[45] Luo J W, Qi Y, Identification of essential proteins based on a new combination of local interaction density and protein complexes, PLOS ONE, 10(6):e0131418, 2015.
[46] Joy M P, Brock A, Ingber D E, et al., High-betweenness proteins in the yeast protein interaction network, Journal of Biomedicine and Biotechnology, 2005(2):96, 2014.
[47] Mewes H W, Amid C, Arnold R, et al., MIPS: analysis and annotation of proteins from whole genomes, Nucleic Acids Research, 34(Database issue):169-72, 2004.
[48] Pereira-Leal J B, Benjamin A, Peregrin-Alvarez J M, et al., An Exponential Core in the Heart of the Yeast Protein Interaction Network, Molecular Biology and Evolution, 2015.
[49] P Bradley, The Use of the Area Under the ROC Curve in the Evaluation of Machine Learning Algorithms, Pattern Recognition, 30:1145-1159, 1996.