Meng-Yun Wu
Sun Yat-sen University
Publication
Featured research published by Meng-Yun Wu.
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2012
Meng-Yun Wu; Dao-Qing Dai; Yu Shi; Hong Yan; Xiao-Fei Zhang
Biomarker identification and cancer classification are two closely related problems. In gene expression data sets, the correlation between genes can be high when they share the same biological pathway. Moreover, gene expression data sets may contain outliers due to either chemical or electrical reasons. A good gene selection method should take group effects into account and be robust to outliers. In this paper, we propose a Laplace naive Bayes model with mean shrinkage (LNB-MS). The Laplace distribution, rather than the normal distribution, is used as the conditional distribution of the samples because it is less sensitive to outliers and has been applied in many fields. The key technique is the L_1 penalty imposed on the mean of each class to achieve automatic feature selection. The objective function of the proposed model is piecewise linear with respect to the mean of each class, and its optimal value can be found simply by evaluating it at the breakpoints. An efficient algorithm is designed to estimate the parameters in the model. A new strategy that uses the number of selected features to control the regularization parameter is introduced. Experimental results on simulated data sets and 17 publicly available cancer data sets attest to the accuracy, sparsity, efficiency, and robustness of the proposed algorithm. Many biomarkers identified with our method have been verified in biochemical or biomedical research. The analysis of biological and functional correlation of the genes based on Gene Ontology (GO) terms shows that the proposed method guarantees that highly correlated genes are selected simultaneously.
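The breakpoint argument can be illustrated with a minimal sketch for one gene in one class. It assumes a simplified objective, the sum of |x_i - mu| plus lam * |mu|, which is not necessarily the exact LNB-MS objective; the function name and the parameter `lam` are hypothetical. Because this objective is convex and piecewise linear in mu, some minimizer lies at a breakpoint, that is, at one of the observed values or at zero.

```python
def l1_shrunken_laplace_mean(values, lam):
    """Minimize sum(|x - mu| for x in values) + lam * |mu| over mu.

    Assumed toy objective (not taken from the paper). It is convex and
    piecewise linear in mu, so some minimizer is a breakpoint: one of
    the observations or 0.
    """
    def objective(mu):
        return sum(abs(x - mu) for x in values) + lam * abs(mu)

    breakpoints = sorted(set(values) | {0.0})
    # Evaluate the objective at every breakpoint and keep the best one.
    return min(breakpoints, key=objective)


expression = [1.0, 2.0, 3.0]
print(l1_shrunken_laplace_mean(expression, lam=0.0))   # no penalty: the median
print(l1_shrunken_laplace_mean(expression, lam=10.0))  # heavy penalty: shrunk to zero
```

A mean that is shrunk exactly to zero is what makes the penalty act as a feature selector in this toy setting: such a gene would carry no class-specific signal.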
PLOS ONE | 2012
Xiao-Fei Zhang; Dao-Qing Dai; Le Ou-Yang; Meng-Yun Wu
Revealing functional units in protein-protein interaction (PPI) networks is important for understanding cellular functional organization. Current algorithms for identifying functional units mainly focus on cohesive protein complexes, which have more internal interactions than external interactions. Most of these approaches do not handle overlaps among complexes, since they usually allow a protein to belong to only one complex. Moreover, recent studies have shown that non-cohesive structural functional units beyond complexes also exist in PPI networks. Thus, previous algorithms that focus only on non-overlapping cohesive complexes cannot fully represent biological reality. Here, we develop a new regularized sparse random graph model (RSRGM) to explore overlapping and various structural functional units in PPI networks. RSRGM is governed principally by two model parameters. One is used to define the functional units as groups of proteins that have similar patterns of connections to others, which allows RSRGM to detect non-cohesive structural functional units. The other is used to represent the degree to which proteins belong to the units, which allows a protein to belong to more than one revealed unit. We also propose a regularizer to control the smoothness between the estimators of these two parameters. Experimental results on four S. cerevisiae PPI networks show that the performance of RSRGM in detecting cohesive complexes and overlapping complexes is superior to that of previous competing algorithms. Moreover, RSRGM is able to discover biologically significant functional units besides complexes.
BMC Bioinformatics | 2015
Xiao-Fei Zhang; Le Ou-Yang; Yuan Zhu; Meng-Yun Wu; Dao-Qing Dai
Background: Recently, several studies have drawn attention to the determination of a minimum set of driver proteins that are important for the control of the underlying protein-protein interaction (PPI) networks. In general, the minimum dominating set (MDS) model is widely adopted. However, because the MDS model does not generate a unique MDS configuration, multiple different MDSs would be generated when using different optimization algorithms. Therefore, among these MDSs, it is difficult to determine which one represents the true driver set of proteins.
Results: To address this problem, we develop a centrality-corrected minimum dominating set (CC-MDS) model which includes heterogeneity in degree and betweenness centralities of proteins. Both the MDS model and the CC-MDS model are applied to three human PPI networks. Unlike the MDS model, the CC-MDS model generates almost the same sets of driver proteins when we implement it using different optimization algorithms. The CC-MDS model targets more high-degree and high-betweenness proteins than the uncorrected counterpart. The more central position allows CC-MDS proteins to be more important in maintaining the overall network connectivity than MDS proteins. To indicate the functional significance, we find that CC-MDS proteins are involved in, on average, more protein complexes and GO annotations than MDS proteins. We also find that more essential genes, aging genes, disease-associated genes and virus-targeted genes appear in CC-MDS proteins than in MDS proteins. As for the involvement in regulatory functions, the sets of CC-MDS proteins show much stronger enrichment of transcription factors and protein kinases. The results on topological and functional significance demonstrate that the CC-MDS model can capture more driver proteins than the MDS model.
Conclusions: Based on the results obtained, the CC-MDS model proves to be a powerful tool for the determination of driver proteins that can control the underlying PPI networks. The software described in this paper and the datasets used are available at https://github.com/Zhangxf-ccnu/CC-MDS.
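The non-uniqueness of dominating sets mentioned above is easy to see on a toy network. The sketch below uses a greedy heuristic, which is a swapped-in approximation and not the optimization-based MDS or CC-MDS solvers used in the paper; the function names and protein labels are hypothetical.

```python
def greedy_dominating_set(adjacency):
    """Greedy heuristic for a small dominating set.

    adjacency maps each node to the set of its neighbours (undirected).
    This is an approximation, not an exact MDS solver.
    """
    undominated = set(adjacency)
    chosen = []
    while undominated:
        # Pick the node whose closed neighbourhood covers the most
        # still-undominated nodes; ties go to the smallest node label.
        best = min(
            sorted(adjacency),
            key=lambda v: -len(({v} | adjacency[v]) & undominated),
        )
        chosen.append(best)
        undominated -= {best} | adjacency[best]
    return chosen


def is_dominating(adjacency, nodes):
    """Check that every node is in `nodes` or adjacent to a member of it."""
    selected = set(nodes)
    return all(v in selected or adjacency[v] & selected for v in adjacency)


# Toy path network P1 - P2 - P3 - P4 - P5 (hypothetical protein labels).
path = {
    "P1": {"P2"},
    "P2": {"P1", "P3"},
    "P3": {"P2", "P4"},
    "P4": {"P3", "P5"},
    "P5": {"P4"},
}
print(greedy_dominating_set(path))
print(is_dominating(path, ["P2", "P4"]), is_dominating(path, ["P2", "P5"]))
```

On this path, both {P2, P4} and {P2, P5} dominate every node with two proteins, so a solver's tie-breaking decides which set is reported. A centrality correction is one way to prefer some of these equally small sets over others.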
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2013
Yuan Zhu; Xiao-Fei Zhang; Dao-Qing Dai; Meng-Yun Wu
With the rapid development of high-throughput experimental techniques for protein-protein interaction (PPI) detection, a large amount of PPI network data is becoming available. However, the data produced by these techniques have high levels of spurious and missing interactions. This study assigns a new reliability indication to each protein pair via a new generative network model (RIGNM), in which the scale-free property of the PPI network is considered, to reliably identify both spurious and missing interactions in the observed high-throughput PPI network. The experimental results show that RIGNM is more effective and interpretable than the compared methods, which demonstrates that this approach has the potential to better describe PPI networks and drive new discoveries.
PLOS ONE | 2013
Meng-Yun Wu; Dao-Qing Dai; Xiao-Fei Zhang; Yuan Zhu
In cancer biology, it is very important to understand the phenotypic changes of the patients and discover new cancer subtypes. Recently, microarray-based technologies have shed light on this problem based on gene expression profiles, which may contain outliers due to either chemical or electrical reasons. These undiscovered subtypes may be heterogeneous with respect to underlying networks or pathways, and are related to only a few interdependent biomarkers. This motivates a need for robust gene expression-based methods capable of discovering such subtypes, elucidating the corresponding network structures and identifying cancer-related biomarkers. This study proposes a penalized model-based Student’s t clustering with unconstrained covariance (PMT-UC) to discover cancer subtypes with cluster-specific networks, taking gene dependencies into account and being robust against outliers. Meanwhile, biomarker identification and network reconstruction are achieved by imposing an adaptive penalty on the means and the inverse scale matrices. The model is fitted via the expectation maximization algorithm utilizing the graphical lasso. Here, a network-based gene selection criterion that identifies biomarkers not as individual genes but as subnetworks is applied. This allows us to implicate weakly discriminative biomarkers which play a central role in the subnetwork by interconnecting many differentially expressed genes, or which have cluster-specific underlying network structures. Experimental results on simulated datasets and one available cancer dataset attest to the effectiveness and robustness of PMT-UC in cancer subtype discovery. Moreover, PMT-UC is able to select cancer-related biomarkers which have been verified in biochemical or biomedical research and to learn the biologically significant correlation among genes.
Proteins | 2012
Meng-Yun Wu; Dao-Qing Dai; Hong Yan
Protein‐ligand docking is widely applied to structure‐based virtual screening for drug discovery. This article presents a novel docking technique, PRL‐Dock, based on hydrogen bond matching and probabilistic relaxation labeling. It deals with multiple hydrogen bonds and can match many acceptors and donors simultaneously. In the matching process, the initial probability of matching an acceptor with a donor is estimated by an efficient scoring function, and the compatibility coefficients are assigned according to the coexisting condition of two hydrogen bonds. After hydrogen bond matching, the geometric complementarity of the interacting donor and acceptor sites is taken into account for displacement of the ligand. This reduces to an optimization problem: calculating the optimal translation and rotation matrices that minimize the root mean square deviation between two sets of points, which can be solved using the Kabsch algorithm. In addition to the van der Waals interaction, the contribution of intermolecular hydrogen bonds in a complex is included in the scoring function to evaluate the docking quality. A modified Lennard‐Jones 12‐6 dispersion‐repulsion term is used to estimate the van der Waals interaction to make the scoring function fairly “soft” so that ligands are not heavily penalized for small errors in the binding geometry. The calculation of this scoring function is very convenient. The evaluation is carried out on 278 rigid complexes and 93 flexible ones where there is at least one intermolecular hydrogen bond. The experimental results on docking accuracy and prediction of binding affinity demonstrate that the proposed method is highly effective.
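The superposition step can be sketched in two dimensions, where the optimal rotation angle has a closed form. This is a simplified analogue and not the Kabsch algorithm itself, which handles the three-dimensional case through a singular value decomposition; the function name and the coordinates are hypothetical.

```python
import math


def superpose_2d(mobile, target):
    """Least-squares rigid fit of `mobile` onto `target` (paired 2D points).

    Returns (theta, (tx, ty), rmsd), where rotating `mobile` by theta about
    the origin and then translating by (tx, ty) minimizes the RMSD.
    """
    n = len(mobile)
    mcx = sum(x for x, _ in mobile) / n
    mcy = sum(y for _, y in mobile) / n
    tcx = sum(x for x, _ in target) / n
    tcy = sum(y for _, y in target) / n

    # Accumulate dot and cross products of the centred point pairs.
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(mobile, target):
        px, py, qx, qy = px - mcx, py - mcy, qx - tcx, qy - tcy
        dot += px * qx + py * qy
        cross += px * qy - py * qx

    theta = math.atan2(cross, dot)  # optimal rotation angle in 2D
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated mobile centroid onto the target centroid.
    tx = tcx - (c * mcx - s * mcy)
    ty = tcy - (s * mcx + c * mcy)

    sq = 0.0
    for (px, py), (qx, qy) in zip(mobile, target):
        fx, fy = c * px - s * py + tx, s * px + c * py + ty
        sq += (fx - qx) ** 2 + (fy - qy) ** 2
    return theta, (tx, ty), math.sqrt(sq / n)


# Hypothetical ligand sites, and the same sites rotated by 30 degrees and shifted.
ligand = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
angle = math.radians(30)
receptor = [
    (math.cos(angle) * x - math.sin(angle) * y + 3.0,
     math.sin(angle) * x + math.cos(angle) * y - 1.0)
    for x, y in ligand
]
theta, shift, rmsd = superpose_2d(ligand, receptor)
print(round(math.degrees(theta), 6), round(rmsd, 6))
```

Centring both point sets first separates the translation from the rotation, which is the same decomposition the three-dimensional method relies on.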
BMC Bioinformatics | 2016
Meng-Yun Wu; Xiao-Fei Zhang; Dao-Qing Dai; Le Ou-Yang; Yuan Zhu; Hong Yan
Background: To facilitate advances in personalized medicine, it is important to detect predictive, stable and interpretable biomarkers related to different clinical characteristics. These clinical characteristics may be heterogeneous with respect to underlying interactions between genes. Usually, traditional methods focus only on the detection of differentially expressed genes without taking the interactions between genes into account. Moreover, due to the typically low reproducibility of the selected biomarkers, it is difficult to give a clear biological interpretation for a specific disease. Therefore, it is necessary to design a robust biomarker identification method that can predict disease-associated interactions with high reproducibility.
Results: In this article, we propose a regularized logistic regression model. Unlike previous methods, which focus on individual genes or modules, our model takes gene pairs, which are connected in a protein-protein interaction network, into account. A line graph is constructed to represent the adjacencies between pairwise interactions. Based on this line graph, we incorporate the degree information in the model via an adaptive elastic net, which makes our model less dependent on the expression data. Experimental results on six publicly available breast cancer datasets show that our method can not only achieve competitive performance in classification, but also retain great stability in variable selection. Therefore, our model is able to identify the diagnostic and prognostic biomarkers in a more robust way. Moreover, most of the biomarkers discovered by our model have been verified in biochemical or biomedical research.
Conclusions: The proposed method shows promise in the diagnosis of disease pathogenesis with different clinical characteristics. These advances lead to more accurate and stable biomarker discovery, which can monitor the functional changes that are perturbed by diseases. Based on these predictions, researchers may be able to provide suggestions for new therapeutic approaches.
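The line graph construction described in the abstract can be sketched as follows. Only the graph step is shown: how the degrees enter the adaptive elastic net penalty is not stated in the abstract and is not reproduced here. The function name and the protein labels are hypothetical, and the edge list is assumed to contain no duplicates.

```python
from itertools import combinations


def line_graph(edges):
    """Build the line graph of an undirected graph given as a list of edges.

    Each original edge becomes a node; two such nodes are adjacent when the
    original edges share an endpoint.
    """
    nodes = [tuple(sorted(e)) for e in edges]
    adjacency = {e: set() for e in nodes}
    for e, f in combinations(nodes, 2):
        if set(e) & set(f):
            adjacency[e].add(f)
            adjacency[f].add(e)
    return adjacency


# Hypothetical PPI edges forming the path A - B - C - D plus the edge B - E.
ppi_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E")]
lg = line_graph(ppi_edges)
degree = {pair: len(neighbours) for pair, neighbours in lg.items()}
for pair in sorted(degree):
    print(pair, degree[pair])
```

In a line graph, the degree of the node for an interaction (u, v) equals deg(u) + deg(v) - 2 in the original network, so interactions that touch hub proteins receive high degree.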
Chinese Conference on Pattern Recognition | 2009
Yu Shi; Dao-Qing Dai; Chuan-Xian Ren; Meng-Yun Wu
Gene selection with interpretation is an important problem in the bioinformatics field. A novel approach called sparse maximal margin features is proposed in this paper for gene subset selection and visualization. By transforming a dense eigenvalue decomposition problem into the Elastic-Net regularized sparse regression framework, we introduce a sparsity constraint on the coefficients, which is useful for enhancing the interpretability of important variables. Moreover, the new method can simultaneously maximize between-class scatter while minimizing within-class scatter, and avoid the small sample size problem. The experimental results on gene expression data show that our method helps to select discriminant genes and thus provides an important foundation for cancer diagnosis.
BMC Bioinformatics | 2016
Xiao-Fei Zhang; Le Ou-Yang; Dao-Qing Dai; Meng-Yun Wu; Yuan Zhu; Hong Yan
Background: Several recent studies have used the Minimum Dominating Set (MDS) model to identify driver nodes, which provide the control of the underlying networks, in protein interaction networks. There may exist multiple MDS configurations in a given network, so it is difficult to determine which one represents the real set of driver nodes. Because these previous studies only focus on static networks and ignore the contextual information on particular tissues, their findings could be insufficient or even misleading.
Results: In this study, we develop a Collective-Influence-corrected Minimum Dominating Set (CI-MDS) model which takes into account the collective influence of proteins. By integrating molecular expression profiles and static protein interactions, 16 tissue-specific networks are established as well. We then apply the CI-MDS model to each tissue-specific network to detect MDS proteins. It generates almost the same MDSs when it is solved using different optimization algorithms. In addition, we classify MDS proteins into Tissue-Specific MDS (TS-MDS) proteins and HouseKeeping MDS (HK-MDS) proteins based on the number of tissues in which they are expressed and identified as MDS proteins. Notably, we find that TS-MDS proteins and HK-MDS proteins have significantly different topological and functional properties. HK-MDS proteins are more central in protein interaction networks, associated with more functions, evolving more slowly and subjected to a greater number of post-translational modifications than TS-MDS proteins. Unlike TS-MDS proteins, HK-MDS proteins significantly correspond to essential genes, ageing genes, virus-targeted proteins, transcription factors and protein kinases. Moreover, we find that besides HK-MDS proteins, many TS-MDS proteins are also linked to disease-related genes, suggesting the tissue specificity of human diseases. Furthermore, functional enrichment analysis reveals that HK-MDS proteins carry out universally necessary biological processes and TS-MDS proteins are usually involved in tissue-dependent functions.
Conclusions: Our study uncovers key features of TS-MDS proteins and HK-MDS proteins, and is a step towards a better understanding of the controllability of human interactomes.
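The abstract does not give the formula for collective influence. The sketch below assumes the definition commonly used in the network literature (Morone and Makse): the collective influence of node i at radius l is (k_i - 1) times the sum of (k_j - 1) over the nodes j at shortest-path distance exactly l from i, where k denotes degree. Whether CI-MDS uses exactly this form is an assumption, and the function name and protein labels are hypothetical.

```python
def collective_influence(adjacency, node, radius=1):
    """Collective influence of `node`, assuming the definition
    (k_node - 1) * sum(k_j - 1) over nodes j whose shortest-path
    distance from `node` is exactly `radius`."""
    # Breadth-first search, one distance shell at a time.
    visited = {node}
    frontier = {node}
    for _ in range(radius):
        frontier = {
            w for v in frontier for w in adjacency[v] if w not in visited
        }
        visited |= frontier
    return (len(adjacency[node]) - 1) * sum(
        len(adjacency[j]) - 1 for j in frontier
    )


# Toy path network P1 - P2 - P3 - P4 - P5 (hypothetical protein labels).
network = {
    "P1": {"P2"},
    "P2": {"P1", "P3"},
    "P3": {"P2", "P4"},
    "P4": {"P3", "P5"},
    "P5": {"P4"},
}
for protein in sorted(network):
    print(protein, collective_influence(network, protein, radius=1))
```

Under this definition a protein scores highly only when both it and the proteins at the chosen distance are well connected, so the score reflects more than the protein's own degree.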
BMC Bioinformatics | 2016
Le Ou-Yang; Xiao-Fei Zhang; Dao-Qing Dai; Meng-Yun Wu; Yuan Zhu; Zhiyong Liu; Hong Yan
Background: Protein complexes are the key molecular entities that perform many essential biological functions. In recent years, high-throughput experimental techniques have generated a large amount of protein interaction data. As a consequence, computational analysis of such data for protein complex detection has received increased attention in the literature. However, most existing works focus on predicting protein complexes from a single type of data, either physical interaction data or co-complex interaction data. These two types of data provide compatible and complementary information, so it is necessary to integrate them to discover the underlying structures and obtain better performance in complex detection.
Results: In this study, we propose a novel multi-view clustering algorithm, called the Partially Shared Multi-View Clustering model (PSMVC), to carry out such an integrated analysis. Unlike traditional multi-view learning algorithms that focus on mining either consistent or complementary information embedded in the multi-view data, PSMVC can jointly explore the shared and specific information inherent in different views. In our experiments, we compare the complexes detected by PSMVC from a single data source with those detected from multiple data sources. We observe that jointly analyzing multi-view data benefits the detection of protein complexes. Furthermore, extensive experimental results demonstrate that PSMVC performs much better than 16 state-of-the-art complex detection techniques, including ensemble clustering and data integration techniques.
Conclusions: In this work, we demonstrate that when integrating multiple data sources, using a partially shared multi-view clustering model can help to identify protein complexes which are not readily identifiable by conventional single-view-based methods and other integrative analysis methods. All the results and source code are available at https://github.com/Oyl-CityU/PSMVC.