Renxiang Yan | Researchain

Publication

Featured research published by Renxiang Yan.

PLOS ONE | 2011

Prediction of Ubiquitination Sites by Using the Composition of k-Spaced Amino Acid Pairs

Zhen Chen; Yong-Zi Chen; Xiaofeng Wang; Chuan Wang; Renxiang Yan; Ziding Zhang

As one of the most important reversible protein post-translational modifications, ubiquitination has been reported to be involved in many biological processes and is closely implicated in various diseases. To fully decipher the molecular mechanisms of ubiquitination-related biological processes, an initial but crucial step is the recognition of ubiquitylated substrates and the corresponding ubiquitination sites. Here, a new bioinformatics tool named CKSAAP_UbSite was developed to predict ubiquitination sites from protein sequences. Built on a Support Vector Machine (SVM), CKSAAP_UbSite is distinguished by its input: the composition of k-spaced amino acid pairs surrounding a query site (i.e., any lysine in a query sequence). When trained and tested on the dataset of yeast ubiquitination sites (Radivojac et al., Proteins, 2010, 78: 365–380), a 100-fold cross-validation on a 1∶1 ratio of positive and negative samples revealed that the accuracy and MCC of CKSAAP_UbSite reached 73.40% and 0.4694, respectively. Extensive benchmarking also showed that CKSAAP_UbSite performs better than some existing predictors, suggesting that it can serve as a useful tool for the community. Currently, CKSAAP_UbSite is freely accessible at http://protein.cau.edu.cn/cksaap_ubsite/. Moreover, we found that the sequence patterns around ubiquitination sites are not conserved across different species. To ensure a reasonable prediction performance, the application of the current CKSAAP_UbSite should be limited to the proteome of yeast.
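
The encoding named in this abstract can be illustrated with a minimal sketch. It is not the CKSAAP_UbSite implementation: the function names, the flank width, the value of `k_max`, and the choice to truncate windows at the sequence ends (rather than pad them) are all assumptions made for illustration. For each spacing k, the sketch counts every ordered amino acid pair separated by k residues in a fragment and normalizes by the number of positions where such a pair can occur.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(fragment, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    Returns a flat list of 400 * (k_max + 1) frequencies.
    k_max=2 is an illustrative default, not a value from the paper.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        # Number of positions where a pair with k residues in between fits.
        n_positions = max(len(fragment) - k - 1, 0)
        for i in range(n_positions):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:  # skip non-standard residues
                counts[pair] += 1
        total = max(n_positions, 1)  # avoid division by zero
        features.extend(counts[p] / total for p in PAIRS)
    return features


def lysine_windows(sequence, flank=3):
    """Yield (position, fragment) for every lysine in the sequence.

    Windows are truncated at the sequence ends (a simplification).
    """
    for i, residue in enumerate(sequence):
        if residue == "K":
            start = max(i - flank, 0)
            yield i, sequence[start:i + flank + 1]
```

Each fragment produced by `lysine_windows` would be passed through `cksaap`, and the resulting vectors could then be fed to any classifier; the abstract states that an SVM was used.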

PLOS ONE | 2011

Predicting Residue-Residue Contacts and Helix-Helix Interactions in Transmembrane Proteins Using an Integrative Feature-Based Random Forest Approach

Xiaofeng Wang; Zhen Chen; Chuan Wang; Renxiang Yan; Ziding Zhang

Integral membrane proteins constitute 25–30% of genomes and play crucial roles in many biological processes. However, less than 1% of membrane protein structures are in the Protein Data Bank. In this context, it is important to develop reliable computational methods for predicting the structures of membrane proteins. Here, we present the first application of random forest (RF) for residue-residue contact prediction in transmembrane proteins, which we term TMhhcp. Rigorous cross-validation tests indicate that the RF models provide better prediction performance than two state-of-the-art methods, i.e., TMHcon and MEMPACK. Using a strict leave-one-protein-out jackknifing procedure, the models reached top L/5 prediction accuracies of 49.5% and 48.8% for two different residue contact definitions, respectively. The predicted residue contacts were further employed to predict interacting helical pairs, achieving Matthews correlation coefficients of 0.430 and 0.424 for the two residue contact definitions, respectively. To serve the academic community, the TMhhcp server has been made freely accessible at http://protein.cau.edu.cn/tmhhcp.
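
The "top L/5" accuracy quoted in this abstract is a standard contact-prediction metric: predicted residue pairs are ranked by score, the best L/5 are kept, and the fraction that are true contacts is reported. The sketch below shows that calculation. The function name and data layout are hypothetical, and L is simply whatever length the caller passes in; the abstract does not say how L was defined for transmembrane proteins.

```python
def top_l_over_n_accuracy(scored_pairs, true_contacts, seq_len, n=5):
    """Fraction of the top seq_len/n scored residue pairs that are true contacts.

    scored_pairs:  list of ((i, j), score) tuples for predicted pairs
    true_contacts: set of (i, j) tuples observed in the known structure
    seq_len:       the length L used to size the cutoff (caller-defined)
    """
    top_k = max(seq_len // n, 1)
    # Highest-scoring predictions first, then keep only the top L/n.
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:top_k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _score in ranked if pair in true_contacts)
    return hits / len(ranked)
```

For example, with `seq_len=10` the cutoff is two pairs; if one of the two highest-scoring pairs is a true contact, the function returns 0.5.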

BMC Bioinformatics | 2009

DescFold: A web server for protein fold recognition

Renxiang Yan; Jing-Na Si; Chuan Wang; Ziding Zhang

Background: Machine learning-based methods have proven powerful in developing new fold recognition tools. In our previous work [Zhang, Kochhar and Grigorov (2005) Protein Science, 14: 431-444], a machine learning-based method called DescFold was established by using Support Vector Machines (SVMs) to combine the following four descriptors: a profile-sequence-alignment-based descriptor using Psi-blast e-values and bit scores, a sequence-profile-alignment-based descriptor using Rps-blast e-values and bit scores, a descriptor based on secondary structure element alignment (SSEA), and a descriptor based on the occurrence of PROSITE functional motifs. In this work, we focus on improving DescFold by incorporating more powerful descriptors and setting up a user-friendly web server.

Results: In seeking more powerful descriptors, the profile-profile alignment score generated from the COMPASS algorithm was first considered as a new descriptor (i.e., PPA). When considering a profile-profile alignment between two proteins in the context of fold recognition, one protein is regarded as a template (i.e., its 3D structure is known). Instead of a sequence profile derived from a Psi-blast search, a structure-seeded profile for the template protein was generated by searching its structural neighbors with the assistance of the TM-align structural alignment algorithm. Moreover, the COMPASS algorithm was used again to derive a profile-structural-profile-alignment-based descriptor (i.e., PSPA). We trained and tested the new DescFold on a total of 1,835 highly diverse proteins extracted from SCOP version 1.73. Introducing the PPA and PSPA descriptors substantially improved the fold recognition performance of the new DescFold. Using the SCOP_1.73_40% dataset as the fold library, the DescFold web server based on the trained SVM models was further constructed. To provide a large-scale test for the new DescFold, a stringent test set of 1,866 proteins was selected from SCOP version 1.75. With the false positive rate controlled below 5%, the new DescFold is able to correctly recognize structural homologs at the fold level for nearly 46% of the test proteins. Additionally, we benchmarked the DescFold method against several well-established fold recognition algorithms using the LiveBench targets and the Lindahl dataset.

Conclusions: Extensive benchmarking showed that the new DescFold method performs very competitively against some well-established fold recognition methods, suggesting that it can serve as a useful tool to assist in template-based protein structure prediction. The DescFold server is freely accessible at http://202.112.170.199/DescFold/index.html.

BMC Bioinformatics | 2011

Outer membrane proteins can be simply identified using secondary structure element alignment

Renxiang Yan; Zhen Chen; Ziding Zhang

Background: Outer membrane proteins (OMPs) are frequently found in the outer membranes of gram-negative bacteria, mitochondria and chloroplasts and have been found to play diverse functional roles. Computational discrimination of OMPs from globular proteins and other types of membrane proteins helps to accelerate new genome annotation and drug discovery.

Results: Based on the observation that almost all OMPs consist of antiparallel β-strands in a barrel shape and that their secondary structure arrangements differ from those of other types of proteins, we propose a simple method called SSEA-OMP to identify OMPs using secondary structure element alignment. In extensive benchmark experiments, the proposed SSEA-OMP method performs better than some well-established OMP detection methods.

Conclusions: The major advantage of SSEA-OMP is its good prediction performance considering its simplicity. The web server implementing the method is freely accessible at http://protein.cau.edu.cn/SSEA-OMP/index.html.

BMC Structural Biology | 2009

TIM-Finder: A new method for identifying TIM-barrel proteins

Jing-Na Si; Renxiang Yan; Chuan Wang; Ziding Zhang; Xiao-Dong Su

Background: The triosephosphate isomerase (TIM)-barrel fold occurs frequently in the proteomes of different organisms, and the known TIM-barrel proteins have been found to play diverse functional roles. To accelerate the exploration of the sequence-structure protein landscape in the TIM-barrel fold, a computational tool that allows sensitive detection of TIM-barrel proteins is required.

Results: To develop a new TIM-barrel protein identification method in this work, we consider three descriptors: a sequence-alignment-based descriptor using PSI-BLAST e-values and bit scores, a descriptor based on secondary structure element alignment (SSEA), and a descriptor based on the occurrence of PROSITE functional motifs. Using a Support Vector Machine (SVM), the three descriptors were combined to obtain a new method with improved performance, which we call TIM-Finder. When tested on the whole proteome of Bacillus subtilis, TIM-Finder is able to detect 194 TIM-barrel proteins at a 99% confidence level, outperforming the PSI-BLAST search as well as one existing fold recognition method.

Conclusions: TIM-Finder can serve as a competitive tool for proteome-wide TIM-barrel protein identification. The TIM-Finder web server is freely accessible at http://202.112.170.199/TIM-Finder/.

Scientific Reports | 2015

Prediction of structural features and application to outer membrane protein identification

Renxiang Yan; Xiaofeng Wang; Lanqing Huang; Feidi Yan; Xiaoyu Xue; Weiwen Cai

Protein three-dimensional (3D) structures provide insightful information in many fields of biology. One-dimensional properties derived from 3D structures, such as secondary structure, residue solvent accessibility, residue depth and backbone torsion angles, are helpful for protein function prediction, fold recognition and ab initio folding. Here, we predict various structural features with the assistance of neural network learning. Based on an independent test dataset, protein secondary structure prediction achieves an overall Q3 accuracy of ~80%. Among the predicted features, relative solvent accessibility has the highest mean absolute error (0.164) and residue depth has the lowest (0.062). We further improve outer membrane protein identification by including the predicted structural features in a scoring function using a simple profile-to-profile alignment. The results demonstrate that the accuracy of outer membrane protein identification can be improved by ~3% at a 1% false positive level when structural features are incorporated. Finally, our methods are available as two convenient and easy-to-use programs: PSSM-2-Features, for predicting secondary structure, relative solvent accessibility, residue depth and backbone torsion angles, and PPA-OMP, for identifying outer membrane proteins from proteomes.
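
The two evaluation measures quoted in this abstract are simple to state in code. Q3 accuracy is the fraction of residues whose predicted three-state secondary structure (helix, strand or coil) matches the observed state, and mean absolute error is the average absolute difference between predicted and observed real values. The sketch below uses hypothetical function names and assumes that states are encoded as the characters H, E and C; it is not taken from PSSM-2-Features.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose three-state label (e.g. H/E/C) is correct."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed values."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - o) for p, o in zip(predicted, observed))
    return total / len(observed)
```

For instance, `q3_accuracy("HHEC", "HHCC")` gives 0.75, because three of the four residues are assigned the correct state.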

Scientific Reports | 2016

DephosSite: a machine learning approach for discovering phosphotase-specific dephosphorylation sites.

Xiaofeng Wang; Renxiang Yan; Jiangning Song

Protein dephosphorylation, the inverse process of phosphorylation, plays a crucial role in a myriad of cellular processes, including the mitotic cycle, proliferation, differentiation, and cell growth. Compared with tyrosine kinase substrate and phosphorylation site prediction, there is a paucity of studies focusing on computational methods for predicting protein tyrosine phosphatase substrates and dephosphorylation sites. In this work, we developed two elegant models for predicting the substrate dephosphorylation sites of three specific phosphatases, namely, PTP1B, SHP-1, and SHP-2. The first predictor, MGPS-DEPHOS, is modified from the GPS (Group-based Prediction System) algorithm and offers interpretability. The second predictor, CKSAAP-DEPHOS, combines a support vector machine (SVM) with the composition of k-spaced amino acid pairs (CKSAAP) encoding scheme. Benchmarking experiments using jackknife cross validation and 30 repeats of 5-fold cross validation tests show that MGPS-DEPHOS and CKSAAP-DEPHOS achieved AUC values of 0.921, 0.914 and 0.912, for predicting dephosphorylation sites of the three phosphatases PTP1B, SHP-1, and SHP-2, respectively. Both methods outperformed the previously developed kNN-DEPHOS algorithm. In addition, a web server implementing our algorithms is publicly available at http://genomics.fzu.edu.cn/dephossite/ for the research community.

Molecular BioSystems | 2014

Prediction of outer membrane proteins by combining the position- and composition-based features of sequence profiles

Renxiang Yan; Jun Lin; Zhen Chen; Xiaofeng Wang; Lanqing Huang; Weiwen Cai; Ziding Zhang

Locating the transmembrane regions of outer membrane proteins (OMPs) is highly important for deciphering their biological functions at both molecular and cellular levels. Here, we propose a novel method to predict the transmembrane regions of OMPs by employing the position- and composition-based features of sequence profiles. In addition, we develop a simple probability-based prediction model estimated from the secondary structures of structurally known OMPs. Because these two methods are both effective and complementary, we integrate them into a method called TransOMP, which is also capable of identifying OMPs. Furthermore, we develop an OMP identification measure, I_CScore, which considers the transmembrane regions predicted by TransOMP and the secondary structural topology predicted by SSEA-OMP. Our methods were benchmarked against state-of-the-art methods and assessed on the genome of Escherichia coli. Benchmark results confirmed that our methods were reliable and useful. We also constructed an OMP prediction web server, which can be used for OMP identification, transmembrane region location, and 3D model building.

Computational Biology and Chemistry | 2011

Comparison of linear gap penalties and profile-based variable gap penalties in profile-profile alignments

Chuan Wang; Renxiang Yan; Xiaofeng Wang; Jing-Na Si; Ziding Zhang

Profile-profile alignment algorithms have proven powerful for recognizing remote homologs and generating alignments by effectively integrating sequence evolutionary information into scoring functions. In comparison to scoring functions, the development of gap penalty functions has rarely been addressed in profile-profile alignment algorithms. Although indel frequency profiles have been used to construct profile-based variable gap penalties in some profile-profile alignment algorithms, there is still no fair comparison between variable gap penalties and traditional linear gap penalties to quantify the improvement in alignment accuracy. We compared two linear gap penalty functions, the traditional affine gap penalty (AGP) and the bilinear gap penalty (BGP), with two profile-based variable gap penalty functions, the Profile-based Gap Penalty used in SP(5) (SPGP) and a new Weighted Profile-based Gap Penalty (WPGP) developed by us, on some well-established benchmark datasets. Our results show that profile-based variable gap penalties yield only limited improvements over linear gap penalties, whether or not secondary structure information is incorporated. Secondary structure information appears to be less useful when incorporated into gap penalties than into scoring functions. Analysis of gap length distributions indicates that each gap penalty stably maintains a corresponding distribution of gap lengths in its alignments, but the difference between that distribution and the one in the reference alignments does not reflect the performance of the gap penalty. There is useful information in indel frequency profiles, but it is still not good enough to improve alignment accuracy when used in profile-based variable gap penalties. All of the methods tested in this work are freely accessible at http://protein.cau.edu.cn/gppat/.
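
For readers unfamiliar with the baseline compared in this abstract, the traditional affine gap penalty charges one cost for opening a gap and a smaller cost for each additional gapped position. The sketch below uses one common convention, in which the opening cost covers the first gapped position; the function name and the default costs are illustrative assumptions, not values from the paper. The bilinear and profile-based penalties (BGP, SPGP, WPGP) are not reproduced here because the abstract does not define them.

```python
def affine_gap_penalty(gap_length, gap_open=10.0, gap_extend=1.0):
    """Penalty for a gap of the given length under an affine model.

    Convention assumed here: gap_open is charged for the first gapped
    position and gap_extend for each further position. Other tools charge
    gap_open + gap_extend * gap_length instead.
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0.0
    return gap_open + gap_extend * (gap_length - 1)
```

Under this model one long gap costs less than several short gaps of the same total length: a single gap of length 4 costs 13.0 with the defaults above, whereas four separate gaps of length 1 cost 40.0.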

RSC Advances | 2017

Transmembrane region prediction by using sequence-derived features and machine learning methods

Renxiang Yan; Xiaofeng Wang; Lanqing Huang; Yarong Tian; Weiwen Cai

Membrane proteins carry out many important biological functions. In general, accurate knowledge of transmembrane (TM) regions facilitates ab initio folding and functional annotation of membrane proteins. Large-scale experimental location of TM regions in membrane proteins is therefore needed, but it is hampered by practical difficulties. In this context, in silico methods for TM prediction are highly desirable. Here, we present a TM region prediction method using machine learning algorithms and sequence evolutionary profiles. Hydrophobic properties were also assessed. Furthermore, a combined method using sequence evolutionary profiles and hydrophobicity measures was tested. The model was intensively trained on large datasets by means of neural network and random forest learning algorithms for TM region prediction. The proposed method can be directly applied to identify membrane proteins from proteome-wide sequences. Benchmark results suggest that our method is an attractive alternative for membrane protein prediction in real-world applications. The web server and stand-alone program of the proposed method are publicly available at http://genomics.fzu.edu.cn/nnme/index.html.
