Publication


Featured research published by Leyi Wei.


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2014

Improved and promising identification of human MicroRNAs by incorporating a high-quality negative set

Leyi Wei; Minghong Liao; Yue Gao; Rongrong Ji; Zengyou He; Quan Zou

MicroRNA (miRNA) plays an important role as a regulator in biological processes. Identification of (pre-) miRNAs helps in understanding regulatory processes. Machine learning methods have been designed for pre-miRNA identification. However, most of them cannot provide reliable predictive performances on independent testing data sets. We assumed this is because the training sets, especially the negative training sets, are not sufficiently representative. To generate a representative negative set, we proposed a novel negative sample selection technique, and successfully collected negative samples with improved quality. Two recent classifiers rebuilt with the proposed negative set achieved an improvement of ~6 percent in their predictive performance, which confirmed this assumption. Based on the proposed negative set, we constructed a training set, and developed an online system called miRNApre specifically for human pre-miRNA identification. We showed that miRNApre achieved accuracies on updated human and non-human data sets that were 34.3 and 7.6 percent higher than those achieved by current methods. The results suggest that miRNApre is an effective tool for pre-miRNA identification. Additionally, by integrating miRNApre, we developed a miRNA mining tool, mirnaDetect, which can be applied to find potential miRNAs in genome-scale data. MirnaDetect achieved a comparable mining performance on human chromosome 19 data as other existing methods.


Information Sciences | 2017

Local-DPP: An improved DNA-binding protein prediction method by exploring local evolutionary information

Leyi Wei; Jijun Tang; Quan Zou

Increased knowledge of DNA-binding proteins would enhance our understanding of protein functions in cellular biological processes. To handle the explosive growth of protein sequence data, researchers have developed machine learning-based methods that quickly and accurately predict DNA-binding proteins. In recent years, the predictive accuracy of machine learning-based predictors has significantly advanced, but the predictive performance remains unsatisfactory. In this paper, we establish a novel predictor named Local-DPP, which combines the local Pse-PSSM (Pseudo Position-Specific Scoring Matrix) features with the random forest classifier. The proposed features can efficiently capture the local conservation information, together with the sequence-order information, from the evolutionary profiles (PSSMs). We evaluate and compare the Local-DPP predictor with state-of-the-art predictors on two stringent benchmark datasets (one for the jackknife test, the other for an independent test). The proposed Local-DPP significantly improved the accuracy of the existing predictors, from 77.3% to 79.2% and 76.9% to 79.0% in the jackknife and independent tests, respectively. This demonstrates the efficacy and effectiveness of Local-DPP in predicting DNA-binding proteins. The proposed Local-DPP is now freely accessible to the public through the user-friendly webserver http://server.malab.cn/Local-DPP/Index.html.
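The local Pse-PSSM idea mentioned in this abstract can be illustrated with a rough sketch. It assumes, beyond what the abstract states, that the PSSM rows are split into `n_segments` consecutive segments of near-equal length and that each segment contributes its column means plus lagged squared-difference terms up to `max_lag`. The function names and both parameters are hypothetical, and any normalization of the raw PSSM scores is left out.

```python
from typing import List

Matrix = List[List[float]]  # one row per residue, one column per amino-acid type


def pse_pssm(pssm: Matrix, max_lag: int) -> List[float]:
    """Column means followed by lagged squared-difference terms."""
    length, width = len(pssm), len(pssm[0])
    # Composition part: average score of each column over the segment.
    features = [sum(row[j] for row in pssm) / length for j in range(width)]
    # Sequence-order part: mean squared difference between rows `lag` apart.
    for lag in range(1, max_lag + 1):
        pairs = length - lag
        for j in range(width):
            if pairs <= 0:
                features.append(0.0)  # segment too short for this lag
                continue
            total = sum((pssm[i][j] - pssm[i + lag][j]) ** 2 for i in range(pairs))
            features.append(total / pairs)
    return features


def local_pse_pssm(pssm: Matrix, n_segments: int, max_lag: int) -> List[float]:
    """Concatenate Pse-PSSM features computed on consecutive row segments."""
    length = len(pssm)
    if n_segments < 1 or length < n_segments:
        raise ValueError("need at least one row per segment")
    features: List[float] = []
    for s in range(n_segments):
        # Integer boundaries that split the rows as evenly as possible.
        start = s * length // n_segments
        end = (s + 1) * length // n_segments
        features.extend(pse_pssm(pssm[start:end], max_lag))
    return features
```

With a real 20-column PSSM, this layout yields `n_segments * 20 * (1 + max_lag)` features per protein, which could then be passed to a random forest or any other classifier.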


IEEE Transactions on Nanobioscience | 2015

Enhanced Protein Fold Prediction Method Through a Novel Feature Extraction Technique

Leyi Wei; Minghong Liao; Xing Gao; Quan Zou

Information on protein 3-dimensional (3D) structures plays an essential role in molecular biology, cell biology, biomedicine, and drug design. Protein fold prediction is considered an immediate step for deciphering protein 3D structures. Therefore, protein fold prediction is one of the fundamental problems in structural bioinformatics. Recently, numerous taxonomic methods have been developed for protein fold prediction. Unfortunately, the overall prediction accuracies achieved by existing taxonomic methods are not satisfactory, although much progress has been made. To address this problem, we propose a novel taxonomic method, called PFPA, which is featured by combining a novel feature set through an ensemble classifier. Particularly, the sequential evolution information from the profiles of PSI-BLAST and the local and global secondary structure information from the profiles of PSI-PRED are combined to construct a comprehensive feature set. Experimental results demonstrate that PFPA outperforms the state-of-the-art predictors. To be specific, when tested on the independent testing set of a benchmark dataset, PFPA achieves an overall accuracy of 73.6%, which is the leading accuracy ever reported. Moreover, PFPA performs well without significant performance degradation on three updated large-scale datasets, indicating the robustness and generalization of PFPA. Currently, a webserver that implements PFPA is freely available at http://121.192.180.204:8080/PFPA/Index.html.


IEEE Transactions on Nanobioscience | 2015

An Improved Protein Structural Classes Prediction Method by Incorporating Both Sequence and Structure Information

Leyi Wei; Minghong Liao; Xing Gao; Quan Zou

Protein structural classes information is beneficial for secondary and tertiary structure prediction, protein folds prediction, and protein function analysis. Thus, predicting protein structural classes is of vital importance. In recent years, several computational methods have been developed for low-sequence-similarity (25%-40%) protein structural classes prediction. However, the reported prediction accuracies are actually not satisfactory. Aiming to further improve the prediction accuracies, we propose three different feature extraction methods and construct a comprehensive feature set that captures both sequence and structure information. By applying a random forest (RF) classifier to the feature set, we further develop a novel method for structural classes prediction. We test the proposed method on three benchmark datasets (25PDB, 640, and 1189) with low sequence similarity, and obtain the overall prediction accuracies of 93.5%, 92.6%, and 93.4%, respectively. Compared with six competing methods, the accuracies we achieved are 3.4%, 6.2%, and 8.7% higher than those achieved by the best-performing methods, showing the superiority of our method. Moreover, due to the limitation of the size of the three benchmark datasets, we further test the proposed method on three updated large-scale datasets with different sequence similarities (40%, 30%, and 25%). The proposed method achieves above 90% accuracies for all the three datasets, consistent with the accuracies on the above three benchmark datasets. Experimental results suggest our method as an effective and promising tool for structural classes prediction. Currently, a webserver that implements the proposed method is available on http://121.192.180.204:8080/RF_PSCP/Index.html.


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2017

Fast prediction of protein methylation sites using a sequence-based feature selection technique

Leyi Wei; Pengwei Xing; Gaotao Shi; Zhi Liang Ji; Quan Zou

Protein methylation, an important post-translational modification, plays crucial roles in many cellular processes. The accurate prediction of protein methylation sites is fundamentally important for revealing the molecular mechanisms undergoing methylation. In recent years, computational prediction based on machine learning algorithms has emerged as a powerful and robust approach for identifying methylation sites, and much progress has been made in predictive performance improvement. However, the predictive performance of existing methods is not satisfactory in terms of overall accuracy. Motivated by this, we propose a novel random-forest-based predictor called MePred-RF, integrating several discriminative sequence-based feature descriptors and improving feature representation capability using a powerful feature selection technique. Importantly, unlike other methods based on multiple, complex information inputs, our proposed MePred-RF is based on sequence information alone. Comparative studies on benchmark datasets via vigorous jackknife tests indicate that our proposed MePred-RF method remarkably outperforms other state-of-the-art predictors, leading by a 4.5 percent average in terms of overall accuracy. A user-friendly webserver that implements the proposed method has been established for researchers’ convenience, and is now freely available for public use through http://server.malab.cn/MePred-RF. We anticipate our research tool to be useful for the large-scale prediction and analysis of protein methylation sites.
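The jackknife test mentioned in this abstract is commonly understood as leave-one-out validation: each sample is held out once, a model is trained on the remainder, and the held-out sample is scored. The sketch below illustrates that procedure only. A nearest-centroid learner stands in for the random forest used by MePred-RF, and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)
Predictor = Callable[[Sequence[float]], int]


def train_nearest_centroid(train: List[Sample]) -> Predictor:
    """Stand-in learner: predict the class whose mean vector is closest."""
    sums: dict = {}
    counts: dict = {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, value in enumerate(x):
            acc[k] += value
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(x: Sequence[float]) -> int:
        def sq_dist(label: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
        return min(sorted(centroids), key=sq_dist)

    return predict


def jackknife_accuracy(data: List[Sample],
                       train: Callable[[List[Sample]], Predictor]) -> float:
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i, (x, y) in enumerate(data):
        model = train(data[:i] + data[i + 1:])
        correct += int(model(x) == y)
    return correct / len(data)
```

Because every sample serves as a test case exactly once, the result does not depend on a random split, at the cost of training one model per sample.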


Biochimica et Biophysica Acta | 2014

Briefing in family characteristics of microRNAs and their applications in cancer research

Qicong Wang; Leyi Wei; Xinjun Guan; Yunfeng Wu; Quan Zou; Zhi Liang Ji

MicroRNAs (miRNAs) are endogenous, short, non-coding RNA molecules that are directly involved in the post-transcriptional regulation of gene expression. Dysregulation of miRNAs is usually associated with diseases. Since miRNAs in a family tend to have common functional characteristics, proper assignment of miRNA family becomes heuristic for better understanding of miRNA nature and their potential in the clinic. In this review, we will briefly discuss the recent progress in miRNA research, particularly its impact on protein and its clinical application in cancer research, from the perspective of miRNA families. This article is part of a Special Issue entitled: Computational Proteomics, Systems Biology & Clinical Implications. Guest Editor: Yudong Cai.


IEEE Transactions on Nanobioscience | 2017

PhosPred-RF: A Novel Sequence-Based Predictor for Phosphorylation Sites Using Sequential Information Only

Leyi Wei; Pengwei Xing; Jijun Tang; Quan Zou

Many recent efforts have been made to develop machine learning-based methods for fast and accurate phosphorylation site prediction. Currently, a majority of well-performing methods build their prediction models on hybrid information, such as evolutionary information, disorder information, and so on. Unfortunately, this type of method suffers from two major limitations: one is that it is of little help for protein phosphorylation site prediction when no obvious homology is detected; the other is that computing such complicated information is time-consuming, which probably limits the usage of predictors in practical applications. In this paper, we present a simple, fast, and powerful feature representation algorithm, which sufficiently explores the sequential information from multiple perspectives based only on primary sequences, and successfully captures the differences between true phosphorylation sites and non-phosphorylation sites. Using the proposed features, we propose a random forest-based predictor named PhosPred-RF for the prediction of phosphorylation sites in proteins. We evaluate and compare the proposed predictor with the state-of-the-art predictors on some benchmark data sets. The experimental results show that PhosPred-RF outperforms other existing predictors, demonstrating its potential to be a useful tool for protein phosphorylation site prediction. Currently, the proposed PhosPred-RF is freely accessible to the public through the user-friendly webserver http://server.malab.cn/PhosPred-RF.


Journal of Proteome Research | 2017

CPPred-RF: A Sequence-based Predictor for Identifying Cell-Penetrating Peptides and Their Uptake Efficiency

Leyi Wei; Pengwei Xing; Ran Su; Gaotao Shi; Zhanshan Sam Ma; Quan Zou

Cell-penetrating peptides (CPPs) have been proven as important drug-delivery vehicles, demonstrating their potential as therapeutic candidates. The past decade has witnessed a rapid growth in CPP-based research. Recently, many computational efforts have been made to develop machine-learning-based methods for identifying CPPs. Although much progress has been made, existing methods still suffer from low feature representation capability that limits further performance improvement. In this study, we propose a novel predictor called CPPred-RF, in which we integrate multiple sequence-based feature descriptors to sufficiently explore distinct information embedded in CPPs, employ a well-established feature selection technique to improve the feature representation, and, for the first time, construct a two-layer prediction framework based on the random forest algorithm. The jackknife results on benchmark data sets show that the proposed CPPred-RF is at least competitive with the state-of-the-art predictors. Moreover, we establish the first online Web server for predicting CPPs and their uptake efficiency simultaneously. It is freely available at http://server.malab.cn/CPPred-RF.
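One plausible reading of the two-layer framework in this abstract is a cascade: a first classifier decides whether a peptide is a CPP, and a second classifier estimates uptake efficiency only for peptides that pass the first layer. The sketch below shows that control flow under this assumption. The function name and the two classifier callables are hypothetical placeholders, not the trained random forests from the paper.

```python
from typing import Callable, Optional, Sequence, Tuple

Classifier = Callable[[Sequence[float]], bool]


def two_layer_predict(features: Sequence[float],
                      is_cpp: Classifier,
                      has_high_uptake: Classifier) -> Tuple[bool, Optional[bool]]:
    """Run layer 2 only for inputs that layer 1 accepts."""
    if not is_cpp(features):
        return False, None  # not predicted as a CPP, so uptake is not assessed
    return True, has_high_uptake(features)
```

Returning `None` for the second value keeps "not a CPP" distinct from "CPP with low uptake efficiency", which a single boolean could not express.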


Current Genomics | 2013

Computational Approaches in Detecting Non-Coding RNA

Chunyu Wang; Leyi Wei; Maozu Guo; Quan Zou

The important role of non-coding RNAs (ncRNAs) in the cell has made their identification a critical issue in biological research. However, traditional approaches such as RT-PCR and Northern blot are costly. With recent progress in bioinformatics and computational prediction technology, the discovery of ncRNAs has become realistically possible. This paper aims to introduce major computational approaches in the identification of ncRNAs, including homologous search, de novo prediction and mining in deep sequencing data. Furthermore, related software tools have been compared and reviewed along with a discussion on future improvements.


International Journal of Molecular Sciences | 2016

Recent Progress in Machine Learning-Based Methods for Protein Fold Recognition

Leyi Wei; Quan Zou

Knowledge on protein folding has a profound impact on understanding the heterogeneity and molecular function of proteins, further facilitating drug design. Predicting the 3D structure (fold) of a protein is a key problem in molecular biology. Determination of the fold of a protein mainly relies on molecular experimental methods. With the development of next-generation sequencing techniques, the discovery of new protein sequences has been rapidly increasing. With such a great number of proteins, the use of experimental techniques to determine protein folding is extremely difficult because these techniques are time consuming and expensive. Thus, developing computational prediction methods that can automatically, rapidly, and accurately classify unknown protein sequences into specific fold categories is urgently needed. Computational recognition of protein folds has been a recent research hotspot in bioinformatics and computational biology. Many computational efforts have been made, generating a variety of computational prediction methods. In this review, we conduct a comprehensive survey of recent computational methods, especially machine learning-based methods, for protein fold recognition. This review is anticipated to assist researchers in their pursuit to systematically understand the computational recognition of protein folds.

Collaboration


Dive into Leyi Wei's collaboration.

Top Co-Authors

Jijun Tang

University of South Carolina

Bing Wang

Chinese Academy of Sciences
