Two new methods for identifying proteins based on the domain protein complexes and topological properties ∗
Pengli Lu † and JingJuan Yu
School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, P.R. China
Abstract
The recognition of essential proteins not only helps to understand the mechanism of cell operation, but also helps to study the mechanism of biological evolution. At present, many scholars identify essential proteins according to the topological structure of protein networks and complexes, yet some essential proteins still cannot be recognized. In this paper, we propose two new methods, complex degree centrality (CDC) and complex in-degree and betweenness definition (CIBD), which integrate the local character of protein complexes and topological properties to determine the essentiality of proteins. First, we give the definitions of complex average centrality (CAC) and complex hybrid centrality (CHC), which both describe the properties of protein complexes. Then we propose the new methods CDC and CIBD based on the CAC and CHC definitions. In order to assess these two methods, different Protein-Protein Interaction (PPI) networks of Saccharomyces cerevisiae, namely DIP, MIPS and YMBD, are used as experimental materials. Experimental results on these networks show that CDC and CIBD can help to improve the precision of predicting essential proteins.
Keywords:
Protein interaction network; Essential protein; Topology; Protein complex
∗ Supported by the National Natural Science Foundation of China (No.11361033) and the Natural Science Foundation of Gansu Province (No.1212RJZA029).
† Corresponding author. E-mail addresses: [email protected] (P. Lu), [email protected] (J. Yu).
arXiv [q-bio.MN]

Protein is one of the main components of human life. An essential protein is defined as a protein whose removal by a knockout mutation leaves the organism unable to survive. Essential proteins are more conserved in biological evolution than non-essential proteins [1]. Essential proteins not only help us understand the growth control system of cells, and thereby the mechanism of life, but also support the study of biological evolution [2]. Removing essential proteins can lead to lethality or infertility [3]. Determining the essentiality of proteins is of great significance to research in systems biology, which provides valuable theories and methods for the diagnosis of diseases, drug design, etc. [4]. Therefore, identifying essential proteins is meaningful in biomedicine.

Previous methods for identifying essential proteins mainly relied on biological experiments, including conditional knockouts [5], RNA interference [6], and single gene knockouts [7], coupled with testing the survival ability of the affected organisms. However, these experimental processes not only consume considerable time and cost, but also require a lot of biological resources. Predicting essential proteins from the large body of biological experimental data by computational methods has therefore become a crucial research direction in bioinformatics.

Jeong H M et al. put forward that the essentiality of proteins is associated with the topological structure of protein interaction networks [8]. Studies of several species, including S. cerevisiae, E. coli, C. elegans and D. melanogaster, have shown that hubs in PPI networks are more likely to be essential proteins [9]. Thus, we investigate how the topological importance of a protein relates to its essentiality. On the basis of the network topology characteristics of nodes, there are many centrality measures for discovering essential proteins. Some of them are global network characteristics, like betweenness centrality (BC) [11,38], eigenvector centrality (EC) [19], information centrality (IC) [20] and closeness centrality (CC) [13]. Others are local network features, such as degree centrality (DC) [10,14,15], subgraph centrality (SC) [16], local average centrality (LAC) [17] and the topology potential-based method (TP) [34]. On the basis of the network topology characteristics of edges, there are also some measures, including the edge clustering coefficient (ECC) [35] and the improved node and edge clustering coefficient (INEC) [36]. In recent years, many scholars have been working to identify essential proteins by combining topology with protein information, such as PeC, which combines edge clustering coefficients with gene expression data correlation coefficients [24]; esPOS, which uses gene expression information and subcellular localization information [21]; SPP, which is based on sub-network partition and prioritization by integrating subcellular localization [12]; and the extended Pareto optimality consensus model (EPOC), which fuses neighborhood closeness centrality and orthology information [39]. GO term information can also be used to predict essential proteins, as in the RSG method [25].

Apart from analyzing the essentiality of proteins from the topological point of view and from protein information, analyzing it from the perspective of protein complexes has become another direction of study. Hart G T et al. found that essentiality is often determined by the protein complexes in which a protein is involved, rather than by a single protein [22]. Li et al. also showed that essential proteins appear more frequently in complexes than in the whole network [21,41]. For example, Luo J W et al. proposed the local interaction density combined with protein complexes (LIDC) for predicting essential proteins [37]. Qin C et al. put forward LBCC, a measure based on both network topology features and protein complexes [18]. Li et al. proposed united complex centrality (UC), which combines the edge clustering coefficient with the frequencies with which proteins appear in complexes [23]. The results of their experiments show that these methods perform better than purely topological methods.

Therefore, on the basis of combining protein complex information with topological properties, we propose two new methods, complex degree centrality (CDC) and complex in-degree and betweenness definition (CIBD). In order to describe the structural properties of protein complexes, we define CAC and CHC of a node $v$. Of the two indicators we put forward, CDC combines the properties of a node and its neighbors to describe the features of protein complexes, while CIBD is based on the features of protein complexes together with local features and global properties of the network.

To assess the quality of the CDC and CIBD methods, we apply them to different datasets of Saccharomyces cerevisiae: DIP, MIPS and YMBD. To evaluate the performance of our proposed methods, we compare them with several existing measures, including DC, BC, LAC, SC, LBCC, EC, SoECC and UC, which are described in [10], [11], [17], [16], [18], [19], [28] and [23], respectively. In terms of sensitivity, specificity, positive predictive value, negative predictive value, F-measure and accuracy, as well as the "sorting-screening" evaluation, the precision-recall curves and the jackknife methodology, the results show that our two methods are more effective in determining the essentiality of proteins than existing measures.
An undirected simple graph $G(V, E)$ can be used to express a protein interaction network. Proteins are regarded as the node set $V$ of the network and the connections between two proteins are regarded as the edge set $E$. The numbers of nodes and edges in a graph $G$ are denoted by $|V(G)|$ and $|E(G)|$, respectively. The neighbor set of node $v$ is denoted by $N_v$, and its size by $|N_v|$. $G[S]$ denotes the subgraph of $G$ induced by the node set $S$. There are some centralities we need to understand.

• Betweenness centrality (BC) [11]

$$BC(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}} \tag{2.1}$$

where $\sigma_{st}$ denotes the number of shortest paths between $s$ and $t$, and $\sigma_{st}(v)$ denotes the number of shortest paths from $s$ to $t$ that pass through the node $v$.

• In-degree centrality of complex (IDC) [21]

$$IDC(v) = \sum_{i \in ComplexSet(v)} \text{IN-Degree}(v)_i \tag{2.2}$$

where $ComplexSet(v)$ is the subset of protein complexes that contain protein $v$, and $\text{IN-Degree}(v)_i$ is the degree of node $v$ within the $i$th protein complex belonging to $ComplexSet(v)$.

• LBCC method [18]

$$LBCC(v) = a * \log Den_1(v) + b * \log Den_2(v) + c * \log IDC(v) + d * \log BC(v) \tag{2.3}$$

Specifically,

$$Den_1(v) = \frac{2|E(H_1)|}{|V(H_1)|\,(|V(H_1)| - 1)} \tag{2.4}$$

where $H_1$ is the induced subgraph $G[N_v \cup \{v\}]$, and

$$Den_2(v) = \frac{2|E(H_2)|}{|V(H_2)|\,(|V(H_2)| - 1)} \tag{2.5}$$

where $M_u = \bigcup_{u \in N_v} N_u$ and $H_2$ is the induced subgraph $G[M_u \cup N_v \cup \{v\}]$.

2.3 New Centrality: CDC and CIBD
The basic considerations behind CDC and CIBD are as follows: (1) Essential proteins appear in complexes more frequently. (2) Both the node itself and its neighbors affect essentiality. (3) The global topology is considered to be a factor in locating essential proteins. Consequently, we present two new definitions to judge the essentiality of proteins by combining the domain features of protein complexes and the topological properties.

First, we present a new complex average central definition (CAC) for the neighbors of a node $v$,

$$CAC(v) = \frac{\sum_{u \in N_v} IDC(u)}{|N_v|} \tag{2.6}$$

where $\sum_{u \in N_v} IDC(u)$ is the sum of the IDC values of all the neighbors of node $v$. The IDC centrality is defined in Eq. (2.2).

Then, we propose the complex hybrid central definition (CHC) by combining the number of complexes of a node $v$ with the complex average central definition CAC,

$$CHC(v) = N_{complex}(v) \cdot CAC(v) \cdot IDC(v) \tag{2.7}$$

where $N_{complex}(v)$ denotes the total number of complexes that contain node $v$.

Now, based on the two definitions above, we propose two new methods for estimating the essentiality of a node $v$. One is complex degree centrality (CDC), which combines the node with its neighbors to describe the properties of protein complexes,

$$CDC(v) = a * CAC(v) + b * IDC(v) \tag{2.8}$$

where $a$ and $b$ are parameters ranging from 1 to 10. After extensive experiments, CDC gives the best results when $a = 1$ and $b = 4$.

The other is the complex in-degree and betweenness definition (CIBD), which combines CHC, Den and BC: the structural property of the protein complexes is described by CHC, the local feature is described by Den (computed by Eq. (2.5), as listed in Table 1) and the global property is described by BC. Since the values of these measures differ greatly in scale, the data are normalized by a logarithmic transformation,

$$CIBD(v) = a * \log(CHC(v)) + b * \log(Den(v)) + c * \log(BC(v)) \tag{2.9}$$

where $a$, $b$ and $c$ are parameters ranging from 1 to 10. After extensive experiments, CIBD gives the best results when $a = 1$, $b = 3$ and $c = 1$.

The description of the CDC and CIBD algorithms is given in Table 1.
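As an illustration of Eqs. (2.2) and (2.6)-(2.8), the following is a minimal Python sketch that computes IDC, CAC, CHC and CDC on a toy network. The function names (`build_adjacency`, `in_degree_centrality`, `complex_scores`), the toy edges and the toy complexes are assumptions made for this example and do not come from the paper; isolated nodes are given CAC = 0 here, a case the definitions above do not address.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets; self-interactions are skipped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def in_degree_centrality(adj, complexes):
    """IDC(v), Eq. (2.2): degree of v inside each complex containing v, summed."""
    idc = defaultdict(int)
    for members in complexes:
        members = set(members)
        for v in members:
            idc[v] += len(adj.get(v, set()) & members)
    return idc


def complex_scores(adj, complexes, a=1, b=4):
    """Return IDC, CAC, CHC and CDC per node (a=1, b=4 as reported in the text)."""
    idc = in_degree_centrality(adj, complexes)
    n_complex = defaultdict(int)  # number of complexes containing each node
    for members in complexes:
        for v in set(members):
            n_complex[v] += 1
    scores = {}
    for v, nbrs in adj.items():
        # Eq. (2.6); assumption: CAC = 0 for a node without neighbors
        cac = sum(idc[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
        chc = n_complex[v] * cac * idc[v]   # Eq. (2.7)
        cdc = a * cac + b * idc[v]          # Eq. (2.8)
        scores[v] = {"IDC": idc[v], "CAC": cac, "CHC": chc, "CDC": cdc}
    return scores


# Hypothetical toy data: four proteins, two complexes
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
complexes = [{"A", "B", "C"}, {"C", "D"}]
scores = complex_scores(build_adjacency(edges), complexes)
ranking = sorted(scores, key=lambda p: scores[p]["CDC"], reverse=True)
```

On this toy input, protein C belongs to both complexes and has the highest IDC, so it is ranked first by CDC. CIBD would additionally need BC and Den values and a convention for zero-valued terms under the logarithm, which the text does not specify.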
In order to analyze the performance of the CDC and CIBD algorithms, experiments are conducted using the protein interaction data of Saccharomyces cerevisiae, because its protein data are more complete.

Three sets of PPI network data, YDIP, YMIPS and YMBD, are used. The DIP dataset is marked as the YDIP network [26]; the MIPS dataset is marked as the YMIPS network [25]; the YMBD network comes from the Mark Gerstein Lab website. As data preprocessing, all self-interactions and repeated interactions are deleted from these PPI networks. Specific properties of the three networks are presented in Table 2. The YDIP network contains 5093 proteins and 24743 interactions, with a clustering coefficient of about 0.0973. The YMIPS network includes 4546 proteins and 12319 interactions, with a clustering coefficient of about 0.0879. The YMBD network includes 2559 proteins and 11835 interactions, with a clustering coefficient of about 0.4445.

Table 1: Description of CDC and CIBD algorithms

CDC and CIBD algorithms
Input: Undirected graph G = (V(G), E(G)) stands for a PPI network; C = {C_i = (V(C_i), E(C_i)) | C_i ⊂ G} represents complexes
Output: The protein list sorted by CDC and by CIBD in descending order
For each vertex v ∈ V(G) do
    IDC(v) = 0
    For each C_i ∈ C do
        calculate IDC(v) = IDC(v) + IN-Degree(v)_i   // where IN-Degree(v)_i is the value of DC(v) in the i-th complex
For each vertex v ∈ V(G) do
    Find the neighbor nodes N_v of node v   // where N_v stands for the neighbor node set of node v
    calculate CAC(v) by Equation (2.6)
    For each vertex u ∈ N_v do
        Find the neighbor nodes N_u of node u   // where N_u stands for the neighbor node set of node u ∈ N_v
    calculate Den(v) by Equation (2.5)
For each vertex v ∈ V(G) do
    calculate CHC(v) by Equation (2.7)
    calculate and sort CDC(v) by Equation (2.8)
    calculate and sort CIBD(v) by Equation (2.9)

The known essential proteins are derived from four databases: MIPS [40], SGD (Saccharomyces Genome Database) [33], SGDP (Saccharomyces Genome Deletion Project) [4], and DEG (Database of Essential Genes) [27]. The protein complex set is taken from the CM270 [40], CM425 [29], CYC408 and CYC428 datasets [30,31], which can be obtained from [21], and contains 745 protein complexes (including 2167 proteins).

Table 2: Data details of the three protein networks: YDIP, YMIPS, YMBD

Dataset  Proteins  Interactions  Average degree  Essential proteins  Clustering coefficient
YDIP     5093      24743         9.72            1167                0.0973
YMIPS    4546      12319         5.42            1016                0.0879
YMBD     2559      11835         9.25            763                 0.4445

Proteins are sorted in descending order according to their values of CDC, CIBD and eight other prediction measures: DC, BC, EC, SC, LAC, LBCC, SoECC and UC. First, we choose a number of top-ranked proteins as predicted essential proteins and then compare them with the known essential proteins. This gives the number of true essential proteins among the predictions. From it, the sensitivity (SN), specificity (SP), F-measure (F), accuracy (ACC), positive predictive value (PPV) and negative predictive value (NPV) can be calculated [28,29].

The following are the formulas for calculating these six statistical indicators.

Sensitivity: $SN = \frac{TP}{TP + FN}$
Specificity: $SP = \frac{TN}{TN + FP}$
Positive predictive value: $PPV = \frac{TP}{TP + FP}$
Negative predictive value: $NPV = \frac{TN}{TN + FN}$
F-measure: $F = \frac{2 * SN * PPV}{SN + PPV}$
Accuracy: $ACC = \frac{TP + TN}{P + N}$

where TP stands for the number of true essential proteins which are correctly selected as essential proteins. FP is the number of nonessential proteins which are incorrectly selected as essential. TN is the number of nonessential proteins which are correctly selected as nonessential.
FN is the number of essential proteins which are incorrectly selected as nonessential. P and N stand for the total numbers of essential and nonessential proteins, respectively.

In this paper, to evaluate the efficiency and accuracy of different indicators in identifying essential proteins, we follow the principle of "sorting-screening", which is described as a flow chart in Fig. 1. Then we compare the CDC and CIBD methods with eight previous measures, DC, BC, EC, SC, LAC, LBCC, SoECC and UC, on the three datasets. The algorithm for LBCC was implemented according to [18], which used the same datasets as ours. The algorithms for DC, BC, EC, SC, LAC, SoECC and UC were implemented according to references [10], [11], [19], [16], [17], [28] and [23], respectively. These algorithms are also available in CytoNCA [42], a Cytoscape app for network centrality analysis. The BC and LBCC methods have been described in the Section Previously Proposed Centrality Measures. We now give a brief description of the other six indicators.

Fig. 1 "Sorting-screening" method.

• Degree centrality (DC) [10]

$$DC(v) = deg(v) \tag{4.1}$$

where $deg(v)$ denotes the degree of a node $v$.

• Local average connectivity centrality (LAC) [17]

$$LAC(v) = \frac{\sum_{u \in N_v} deg^{C_v}(u)}{|N_v|} \tag{4.2}$$

where $C_v$ is the subgraph of $G$ induced by the node set $N_v$ and $deg^{C_v}(u)$ is the number of neighbors in $C_v$ of a node $u \in N_v$.

• Subgraph centrality (SC) [16]

$$SC(v) = \sum_{k=0}^{\infty} \frac{\mu_k(v)}{k!} \tag{4.3}$$

where $\mu_k(v)$ denotes the number of closed walks of length $k$ which start and end at node $v$.

• Eigenvector centrality (EC) [19]

$$EC(v) = \alpha_{max}(v) \tag{4.4}$$

where $\alpha_{max}$ is the principal eigenvector, corresponding to the largest eigenvalue of the network adjacency matrix $A$, and $\alpha_{max}(v)$ is the $v$th component of $\alpha_{max}$.

• The sum of edge clustering coefficients (SoECC) [28]

$$ECC_{v,u} = \frac{z_{v,u}}{\min(k_v - 1, k_u - 1)} \tag{4.5}$$

where $z_{v,u}$ is the number of triangles that include the edge $e(v, u)$ in the network, and $k_v$ and $k_u$ are the degrees of node $v$ and node $u$, respectively.

$$SoECC(v) = \sum_{u \in N_v} ECC_{v,u} \tag{4.6}$$

where $N_v$ denotes the set of all neighbors of node $v$.

• United complex centrality (UC) [23]

$$UC(v) = \sum_{u \in N_v} \left( \frac{f_u + 1}{f_M + 1} \times ECC_{v,u} \right)$$

where $f_u$ denotes the frequency with which protein $u$ appears in the known protein complexes, and $f_M$ is the maximum frequency with which any protein appears in the known protein complexes.

Specifically, we compare CDC with the eight previous measures on the YDIP and YMIPS networks, and compare CIBD with the eight previous measures on the YMIPS and YMBD networks. In step one, we sort proteins in descending order of their values of CDC, CIBD and the eight previous measures. In step two, we choose the top 100, 200, 300, 400, 500 and 600 proteins as predicted essential proteins and compare them with the known essential proteins. Finally, we obtain the number of true essential proteins among these predicted essential proteins. The experimental results of these measures are shown in Figs. 2-4.

Fig. 2 The number of true essential proteins identified by CDC and eight previous methods on the YDIP network.
Fig. 3 The number of true essential proteins identified by CDC, CIBD and eight previous methods on the YMIPS network.
Fig. 4 The number of true essential proteins identified by CIBD and eight previous methods on the YMBD network.

From Fig. 2, the numbers of true essential proteins identified by CDC are 79, 152, 221, 272, 316 and 364 from the top 100 to the top 600, respectively, the best among the compared methods on the YDIP network. Besides CDC, LBCC also performs well, with 74, 135, 204, 261, 307 and 360 essential proteins correctly identified at the same levels. Compared with LBCC, CDC identifies 5, 17, 17, 11, 9 and 4 more true essential proteins, respectively. Compared with the other recent methods SoECC and UC, CDC also shows a clear improvement. Moreover, it identifies many more essential proteins than the earlier methods BC, SC and EC. Although LAC performs well, our proposed CDC still gives better results.

From Fig. 3, we can see that CIBD and CDC both perform better than DC, BC, SC, LAC, EC, SoECC and UC on the YMIPS network; the exception is LBCC. LBCC produces the best results at the top 200, 500 and 600. CIBD performs the same as LBCC at the top 100 and 300. At the top 400, CDC and CIBD both perform better than LBCC.

From Fig. 4, CIBD performs close to LBCC, which achieves the best performance at the top 100, 200, 400 and 600. CIBD attains the best performance at the top 300 and 500. We can also see that the classical methods (DC, BC, SC, EC) do not perform well on the YMBD network. Hence, our new methods CDC and CIBD can identify more true essential proteins in most cases.
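The "sorting-screening" comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `true_essential_in_top`, the toy scores and the toy essential set are assumptions for this example, and ties are broken by input order, a detail the text does not specify.

```python
def true_essential_in_top(scores, essential, cutoffs=(100, 200, 300, 400, 500, 600)):
    """Count known essential proteins among the top-k proteins for each cutoff k.

    scores    : dict mapping protein -> centrality value (e.g. CDC or CIBD)
    essential : set of known essential proteins
    """
    # Step one: sort proteins from high to low score
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Step two: screen the top k and compare with the known essential set
    return {k: sum(1 for p in ranked[:k] if p in essential) for k in cutoffs}


# Hypothetical toy data with small cutoffs
toy_scores = {"p1": 9.0, "p2": 7.0, "p3": 5.0, "p4": 3.0, "p5": 1.0}
toy_essential = {"p1", "p3", "p5"}
counts = true_essential_in_top(toy_scores, toy_essential, cutoffs=(2, 4))
```

Applied to two methods on the same network, the per-cutoff difference of the two result dictionaries gives the kind of improvement figures quoted for CDC over LBCC.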
To further assess CDC and CIBD alongside the eight other identification measures, the six statistical indicators defined in the Section Assessment methods are used. The formulas can be interpreted as follows. The sensitivity (SN) measures the ability of a classifier to identify essential proteins correctly; the larger the value, the better the classifier. The specificity (SP) measures the ability of a classifier to identify non-essential proteins correctly. The F-measure (F) is the harmonic mean of precision and sensitivity. The higher the accuracy (ACC), the better the classifier. In short, the values of these six indicators reflect the quality of a method.

Hence, we sort proteins in descending order of their values under each method. We then take the top 20 percent of proteins as predicted essential proteins, and regard the remaining 80 percent as candidates for nonessential proteins. By comparison with the known essential protein dataset, we obtain the values of TP, TN, FP and FN, from which the six statistical indicators are calculated. The comparison of CDC, CIBD and the eight other measures on the three networks is shown in Table 3.

For the YDIP network, the six statistical values for CDC are higher than those of the previous measures, which shows that CDC has better prediction accuracy. The values for BC are the lowest, indicating poor performance. For the YMIPS and YMBD networks, the six statistical values obtained by CIBD are similar to those of LBCC, which is also able to predict essential proteins accurately.

Table 3: Comparison of sensitivity (SN), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), F-measure (F) and accuracy (ACC) of CDC, CIBD and eight previous algorithms ([lost]: value not recoverable from the source text).

Dataset  Methods  SN      SP      PPV     NPV     F       ACC
YDIP     DC       0.363   0.825   0.416   0.789   0.388   0.706
         BC       0.281   0.798   0.354   0.738   0.313   0.652
         LAC      0.408   0.839   0.467   0.804   0.435   0.729
         SC       0.335   0.811   0.36    0.794   0.347   0.697
         LBCC     0.436   0.853   0.512   0.817   0.477   0.749
         EC       0.344   0.814   0.370   0.796   0.356   0.701
         SoECC    0.40    0.850   0.463   0.813   0.428   0.739
         UC       0.391   0.850   0.458   0.811   0.422   0.737
         CDC      0.448   0.868   0.515   0.835   0.487   [lost]
YMIPS    DC       0.274   0.821   0.305   0.797   0.289   0.699
         BC       0.197   0.796   0.278   0.716   0.231   0.629
         LAC      0.287   0.825   0.321   0.801   0.303   0.705
         SC       0.139   0.782   0.155   0.759   0.146   0.638
         LBCC     0.430   0.866   0.480   0.841   0.454   [lost]
         EC       0.123   0.774   0.155   0.723   0.137   0.610
         SoECC    0.281   0.814   0.325   0.781   0.302   0.686
         UC       0.271   0.812   0.314   0.778   0.291   0.682
         CDC      0.376   0.868   [lost]  [lost]  [lost]  [lost]
         CIBD     [lost]  [lost]  [lost]  [lost]  [lost]  [lost]
YMBD     DC       [lost]  [lost]  [lost]  [lost]  [lost]  [lost]
         BC       [lost]  [lost]  [lost]  [lost]  [lost]  [lost]
         LAC      [lost]  [lost]  [lost]  [lost]  [lost]  [lost]
         SC       [lost]  [lost]  [lost]  [lost]  [lost]  [lost]
         LBCC     0.373   0.910   0.617   0.789   0.465   [lost]
         EC       0.219   0.851   0.366   0.734   0.274   0.672
         SoECC    0.266   0.835   0.422   0.715   0.326   0.657
         UC       0.274   0.838   0.434   0.718   0.336   0.662
         CIBD     0.347   0.910   0.581   0.777   0.434   [lost]

In addition, the Precision-Recall curve, a statistical method for evaluating stability, can be used to evaluate the CDC and CIBD methods and the eight previous measures. Precision and recall are defined as follows:

$$Precision(n) = \frac{TP(n)}{TP(n) + FP(n)}$$

$$Recall(n) = \frac{TP(n)}{TP(n) + FN(n)}$$

where TP, FP and FN are defined in the Section Assessment methods and $n$ is the number of top-ranked proteins selected. The results are shown in Figs. 5-7. On the YDIP network, CDC performs better than the other methods. On the YMIPS and YMBD networks, the performance of CDC and CIBD is similar to that of LBCC.

Fig. 5 Precision and recall curves of CDC and eight other methods for the YDIP network.
Fig. 6 Precision and recall curves of CDC, CIBD and eight other methods for the YMIPS network.
Fig. 7 Precision and recall curves of CIBD and eight other methods for the YMBD network.

Holman et al. developed the jackknife methodology, which is an effective general-purpose evaluation method [32]. The X-axis represents the number of top-ranked proteins selected as predicted essential proteins, and the Y-axis represents the number of true essential proteins among the selected proteins. The area under the curve reflects the performance of each method: the larger the area under the curve, the better the centrality.

First, proteins are sorted in descending order of their predicted values. Then we choose the top 600 proteins as predicted essential proteins for each dataset. Last, the jackknife curve is drawn from the cumulative number of true essential proteins.
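The six statistical indicators and the precision-recall points follow directly from a ranked list and the known essential set. The following Python sketch is a hedged illustration: the names `assessment` and `precision_recall_points`, the toy ranking and the toy essential set are assumptions for this example, and returning 0.0 for a zero denominator is a convention chosen here, not one stated in the text.

```python
def _ratio(num, den):
    """Safe division; assumption: a ratio with zero denominator is reported as 0.0."""
    return num / den if den else 0.0


def assessment(ranked, essential, n):
    """SN, SP, PPV, NPV, F and ACC when the top n of `ranked` are predicted essential."""
    predicted, rest = set(ranked[:n]), set(ranked[n:])
    tp = len(predicted & essential)
    fp = len(predicted) - tp
    fn = len(rest & essential)
    tn = len(rest) - fn
    sn, ppv = _ratio(tp, tp + fn), _ratio(tp, tp + fp)
    return {
        "SN": sn,
        "SP": _ratio(tn, tn + fp),
        "PPV": ppv,
        "NPV": _ratio(tn, tn + fn),
        "F": _ratio(2 * sn * ppv, sn + ppv),
        "ACC": _ratio(tp + tn, tp + fp + tn + fn),  # (TP + TN) / (P + N)
    }


def precision_recall_points(ranked, essential):
    """(Precision(n), Recall(n)) for n = 1 .. len(ranked); these equal PPV and SN."""
    points = []
    for n in range(1, len(ranked) + 1):
        m = assessment(ranked, essential, n)
        points.append((m["PPV"], m["SN"]))
    return points


# Hypothetical toy ranking of ten proteins, four of them essential
toy_ranked = ["p%d" % i for i in range(1, 11)]
toy_essential = {"p1", "p2", "p5", "p9"}
top20 = assessment(toy_ranked, toy_essential, n=len(toy_ranked) // 5)  # top 20 percent
pr = precision_recall_points(toy_ranked, toy_essential)
```

With the top 20 percent cutoff on this toy ranking, both selected proteins are essential, so PPV and SP are 1.0 while SN is 0.5 because two essential proteins fall below the cutoff.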
Fig. 8 The performance of CDC and eight other centrality measures on the YDIP network, evaluated by the jackknife methodology.

From Fig. 8, it can be seen that the prediction efficiency of CDC is higher than that of the other centrality measures on the YDIP network. From Fig. 9, CDC and CIBD exhibit performance similar to that of LBCC and better than that of all the other methods, including DC, BC, LAC, SC, EC, SoECC and UC, on the YMIPS network. From the [...]

Fig. 9 The performance of CDC, CIBD and eight other centrality measures on the YMIPS network, evaluated by the jackknife methodology.
Fig. 10 The performance of CIBD and eight other centrality measures on the YMBD network, evaluated by the jackknife methodology.

[...] CDC and CIBD are both effective approaches for predicting essential proteins.
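The jackknife curve used in this evaluation is a cumulative count over the ranked list. A minimal Python sketch follows, assuming the hypothetical names `jackknife_curve` and `area_under_curve` and a toy ranking; the area is approximated here by a plain sum of curve values with unit step, which is one possible convention and not necessarily the one used for the figures.

```python
from itertools import accumulate


def jackknife_curve(ranked, essential, top=600):
    """Cumulative number of true essential proteins among the top 1..top ranked proteins."""
    hits = [1 if p in essential else 0 for p in ranked[:top]]
    return list(accumulate(hits))


def area_under_curve(curve):
    """Assumption: area approximated as the sum of curve values (unit step on the X-axis)."""
    return sum(curve)


# Hypothetical toy ranking: X-axis is the rank position, Y-axis the cumulative hits
toy_ranked = ["a", "b", "c", "d", "e"]
toy_essential = {"a", "c", "d"}
curve = jackknife_curve(toy_ranked, toy_essential, top=5)
area = area_under_curve(curve)
```

Comparing the `area` values of two methods on the same network mirrors the rule that a larger area under the jackknife curve indicates a better centrality.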
Identifying essential proteins in protein networks is an important task in the post-genomic era, and improving the recognition rate of essential proteins remains challenging. At present, plenty of centrality algorithms have been proposed to determine the essentiality of proteins, and most of them focus on the analysis and mining of node topology characteristics. In this paper, by combining the local features of protein complexes with topological properties, we propose two new methods, named CDC and CIBD. We apply them to the datasets YDIP, YMIPS and YMBD. Then we compare the number of true essential proteins predicted by CDC, CIBD and eight previously proposed methods: DC, BC, LAC, SC, LBCC, EC, SoECC and UC. The results show that CDC and CIBD perform well in most cases. Using the six statistical indicators, the precision-recall curve and the jackknife methodology, we find that CDC and CIBD are able to improve the accuracy of predicting essential proteins. In future work, deeper mining of the biological function and biological significance of proteins can be another direction for finding essential proteins.
References
[1] Fraser H B, Hirsh A E, et al., Evolutionary Rate in the Protein Interaction Network, Science, 296(5568):750-752, 2002.
[2] Xu B, Guan J, Wang Y, et al., Essential protein detection by random walk on weighted protein-protein interaction networks, IEEE/ACM Trans Comput Biol Bioinform, PP(99):1-1, 2017.
[3] Winzeler E A, Shoemaker D D, Astromoff A, Liang H, Anderson K, Andre B, et al., Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis, Science, 285(5429):901-906, 1999.
[4] Wang Y, Sun H, Du W, Blanzieri E, Viero G, Xu Y, et al., Identification of essential proteins based on ranking edge-weights in protein-protein interaction networks, PLoS One, 9(9):e108716, 2014.
[5] Roemer T, Jiang B, Davison J, Ketela T, Veillette K, et al., Large-scale essential gene identification in Candida albicans and applications to antifungal drug discovery, Mol Microbiol, 50:167-181, 2003.
[6] Cullen L M, Arndt G M, Genome-wide screening for gene function using RNAi in mammalian cells, Immunol Cell Biol, 83:217-223, 2005.
[7] Giaever G, Chu A M, Ni L, et al., SGD: Functional profiling of the Saccharomyces cerevisiae genome, Nature, 418(6896):387-391, 2002.
[8] Jeong H M, Mason S P, Albert B, et al., Lethality and centrality in protein networks, Nature, 411:41-42, 2001.
[9] Zhao B H, Wang J X, Li M, et al., Prediction of Essential Proteins Based on Overlapping Essential Modules, IEEE Transactions on Nanobioscience, 13(4):415-424, 2014.
[10] Hahn M W, Kern A D, Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks, Molecular Biology and Evolution, 22(4):803-806, 2005.
[11] Freeman L C, A set of measures of centrality based on betweenness, Sociometry, 40(1):35-41, 1977.
[12] Li M, Li W, Wu F X, et al., Identifying essential proteins based on sub-network partition and prioritization by integrating subcellular localization information, Journal of Theoretical Biology, 2018.
[13] Wuchty S, Stadler P F, Centers of complex networks, Journal of Theoretical Biology, 223(1):45-53, 2003.
[14] Lin C C, Juan H F, Hsiang J T, Hwang Y C, Mori H, Huang H C, Essential core of protein-protein interaction network in Escherichia coli, Journal of Proteome Research, 8(4):1925-1931, 2009.
[15] Liang H, Li W H, Gene essentiality, gene duplicability and protein connectivity in human and mouse, Trends in Genetics, 23(8):375-378, 2007.
[16] Estrada E, Juan A, Subgraph centrality in complex networks, Physical Review E, 71(5):1-9, 2005.
[17] Li M, Wang J, Chen X, et al., A local average connectivity-based method for identifying essential proteins from the network level, Computational Biology and Chemistry, 35(3):143-150, 2011.
[18] Qin C, Sun Y, Dong Y, A new method for identifying essential proteins based on network topology properties and protein complexes, PLOS ONE, 11(8):e0161042, 2016.
[19] Bonacich P, Power and centrality: a family of measures, American Journal of Sociology, 92(5):1170-1182, 1987.
[20] Stephenson K, Zelen M, Rethinking centrality: methods and examples, Soc Networks, 11:1-37, 1989.
[21] Zhang Z P, Ruan J S, Gao J Z, et al., Predicting essential proteins from protein-protein interactions using order statistics, Journal of Theoretical Biology, 480:274-283, 2019.
[22] Hart G T, Lee I, Marcotte E M, A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality, BMC Bioinformatics, 8(1):236-0, 2007.
[23] Li M, Lu Y, Niu Z, et al., United complex centrality for identification of essential proteins from PPI networks, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(2):370-380, 2017.
[24] Li M, Zhang H H, Fei Y P, Essential protein discovery method based on integration of PPI and gene expression data, Journal of Central South University, 44(3):1024-1029, 2013.
[25] Lei X, Zhao J, et al., Predicting essential proteins based on RNA-Seq, subcellular localization and GO annotation datasets, Knowledge-Based Systems, 2018.
[26] Xenarios I, Lukasz S, et al., DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions, Nucleic Acids Research, 30(1):303-305, 2002.
[27] Zhang R, Lin Y, DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes, Nucleic Acids Res, 37(suppl 1):D455-D458, 2009.
[28] Wang J, Li M, Wang H, Pan Y, Identification of essential proteins based on edge clustering coefficient, Transactions on Computational Biology and Bioinformatics, 9(4):1070-1080, 2012.
[29] Friedel C C, Krumsiek J, Zimmer R, International Conference on Research in Computational Molecular Biology, Springer-Verlag, 2008.
[30] Pu S, Wong J, Turner B, Cho E, Wodak S J, Up-to-date catalogues of yeast protein complexes, Nucleic Acids Research, 37(3):825-831, 2009.
[31] Pu S, Vlasblom J, Emili A, et al., Identifying functional modules in the physical interactome of Saccharomyces cerevisiae, Proteomics, 7(6):944-960, 2010.
[32] Holman A G, Davis P J, Foster J M, et al., Computational prediction of essential genes in an unculturable endosymbiotic bacterium, Wolbachia of Brugia malayi, BMC Microbiology, 9(1):1-14, 2009.
[33] Cherry J M, Adler C, Ball C A, et al., SGD: Saccharomyces genome database, Nucleic Acids Research, 26(1):73-79, 1998.
[34] Li M, Lu Y, Wang J, et al., A Topology Potential-Based Method for Identifying Essential Proteins from PPI Networks, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12(2):372-383, 2015.
[35] Radicchi F, Castellano C, Cecconi F, et al., Defining and identifying communities in networks, Proceedings of the National Academy of Sciences of the United States of America, 101(9):2658-2663, 2003.
[36] Zhu Y, Wu C, Identification of essential proteins using improved node and edge clustering coefficient, Proceedings of the 37th Chinese Control Conference, 2018.
[37] Luo J W, Qi Y, Identification of essential proteins based on a new combination of local interaction density and protein complexes, PLOS ONE, 10(6):e0131418, 2015.
[38] Joy M P, Brock A, Ingber D E, et al., High-betweenness proteins in the yeast protein interaction network, Journal of Biomedicine and Biotechnology, 2005(2):96, 2014.
[39] Li G, Li M, Wang J, et al., United neighborhood closeness centrality and orthology for predicting essential proteins, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2018.
[40] Mewes H W, Amid C, Arnold R, et al., MIPS: analysis and annotation of proteins from whole genomes, Nucleic Acids Research, 34(Database issue):169-72, 2004.
[41] Pereira-Leal J B, Benjamin A, Peregrin-Alvarez J M, et al., An Exponential Core in the Heart of the Yeast Protein Interaction Network, Molecular Biology and Evolution, 2015.
[42] Tang Y, Li M, Wang J, et al., CytoNCA: A cytoscape plugin for centrality analysis and evaluation of protein interaction networks, Biosystems, 127:67-72, 2015.