Publication


Featured research published by Runtao Yang.


PLOS ONE | 2015

An ensemble method with hybrid features to identify extracellular matrix proteins.

Runtao Yang; Chengjin Zhang; Rui Gao; Lina Zhang

The extracellular matrix (ECM) is a dynamic composite of secreted proteins that play important roles in numerous biological processes such as tissue morphogenesis, differentiation and homeostasis. Furthermore, various diseases are caused by the dysfunction of ECM proteins. Therefore, identifying ECM proteins may assist in understanding the related biological processes and in drug development. Given the severe class imbalance in the training dataset, this paper develops a Random Forest-based ensemble method with hybrid features to identify ECM proteins. The hybrid features incorporate sequence composition, physicochemical properties, and evolutionary and structural information. The Information Gain Ratio and Incremental Feature Selection (IGR-IFS) methods are adopted to select the optimal features. The resulting predictor, termed IECMP (Identify ECM Proteins), achieves a balanced accuracy of 86.4% under 10-fold cross-validation on the training dataset, much higher than the results obtained by other methods (ECMPRED: 71.0%, ECMPP: 77.8%). Moreover, when tested on a common independent dataset, our method also achieves significantly improved performance over ECMPP and ECMPRED. These results indicate that IECMP is an effective method for ECM protein prediction, with a more balanced prediction capability for positive and negative samples. It is anticipated that the proposed method will provide useful information for deciphering the molecular mechanisms of ECM-related biological processes and for discovering candidate drug targets. For public access, we have developed a user-friendly web server for ECM protein identification that is freely accessible at http://iecmp.weka.cc.
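The IGR half of IGR-IFS scores each feature by its information gain ratio before the incremental search. A minimal sketch for one feature follows; it assumes the feature values are already discretized (the abstract does not say how continuous features were handled), and the function names are illustrative, not taken from IECMP.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain ratio of one discrete feature with respect to the labels.

    IG  = H(Y) - sum_v P(X = v) * H(Y | X = v)
    IGR = IG / H(X), where H(X) is the split information of the feature.
    """
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature_values).items():
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A constant feature has zero split information and carries no signal.
    return info_gain / split_info if split_info > 0 else 0.0


# A feature that separates the two classes perfectly
print(gain_ratio(["a", "a", "b", "b"], [1, 1, 0, 0]))  # 1.0
```

Ranking all features by this score gives the ordered list that an incremental feature selection loop then walks through.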


BMC Bioinformatics | 2016

Prediction of aptamer-protein interacting pairs using an ensemble classifier in combination with various protein sequence attributes

Lina Zhang; Chengjin Zhang; Rui Gao; Runtao Yang; Qing Song

Background: Aptamer-protein interacting pairs have a variety of physiological functions and therapeutic potential in organisms. Rapid and effective prediction of aptamer-protein interacting pairs is important for designing aptamers that bind to proteins of interest, which will give insight into the mechanisms of aptamer-protein interaction and support the development of aptamer-based therapies.

Results: In this study, an ensemble method is presented to predict aptamer-protein interacting pairs with hybrid features. The features for aptamers are extracted from Pseudo K-tuple Nucleotide Composition (PseKNC), while the features for proteins incorporate the Discrete Cosine Transform (DCT), disorder information, and the bi-gram Position Specific Scoring Matrix (PSSM). We investigate the predictive capabilities of various feature spaces. The proposed ensemble method obtains its best performance, a Youden’s Index of 0.380 under 10-fold cross-validation, using the hybrid feature space of PseKNC, DCT, bi-gram PSSM, and disorder information. The Relief-Incremental Feature Selection (IFS) method is adopted to obtain the optimal feature set. With the optimal feature set, the proposed method achieves a balanced performance on the training dataset, with a sensitivity of 0.753 and a specificity of 0.725, which indicates that it handles the imbalanced data problem effectively. To evaluate the prediction performance objectively, an independent testing dataset is used. Encouragingly, our method performs better than the previous study, with a sensitivity of 0.738 and a Youden’s Index of 0.451.

Conclusions: These results suggest that the proposed method can be a potential candidate for aptamer-protein interacting pair prediction, which may contribute to finding novel aptamer-protein interacting pairs and understanding the relationship between aptamers and proteins.
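Youden’s Index combines the two rates reported above as J = Sn + Sp - 1. The short sketch below (hypothetical helper names) applies it to the reported figures; the second helper simply rearranges the formula, assuming the independent-test J was computed the same way.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


def implied_specificity(youden, sensitivity):
    """Rearranging J = Sn + Sp - 1 gives Sp = J - Sn + 1."""
    return youden - sensitivity + 1


# Training set after feature selection: Sn = 0.753, Sp = 0.725
print(round(youden_index(0.753, 0.725), 3))  # 0.478
# Independent test: Sn = 0.738 and J = 0.451 together imply Sp of about 0.713
print(round(implied_specificity(0.451, 0.738), 3))  # 0.713
```

Because J weights both rates equally, a predictor that labels everything negative scores J = 0 no matter how imbalanced the data are, which is why it suits this setting.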


International Journal of Molecular Sciences | 2015

An Ensemble Method to Distinguish Bacteriophage Virion from Non-Virion Proteins Based on Protein Sequence Characteristics

Lina Zhang; Chengjin Zhang; Rui Gao; Runtao Yang

Bacteriophage virion proteins and non-virion proteins have distinct functions in biological processes, such as specificity determination for host bacteria, bacteriophage replication and transcription. Accurate identification of bacteriophage virion proteins from bacteriophage protein sequences is important for understanding the complex virulence mechanism in host bacteria and the influence of bacteriophages on the development of antibacterial drugs. In this study, an ensemble method for predicting bacteriophage virion proteins from bacteriophage protein sequences is put forward, with hybrid feature spaces incorporating CTD (composition, transition and distribution), bi-profile Bayes, PseAAC (pseudo-amino acid composition) and PSSM (position-specific scoring matrix). Under 10-fold cross-validation on the training dataset, the presented method achieves a satisfactory result, with a sensitivity of 0.870, a specificity of 0.830, an accuracy of 0.850 and a Matthews correlation coefficient (MCC) of 0.701. To evaluate the prediction performance objectively, an independent testing dataset is used. Encouragingly, our method performs better than previous studies on the independent testing dataset, with a sensitivity of 0.853, a specificity of 0.815, an accuracy of 0.831 and an MCC of 0.662. These results suggest that the proposed method can be a potential candidate for bacteriophage virion protein prediction, which may provide a useful tool for finding novel antibacterial drugs and understanding the relationship between bacteriophages and host bacteria. For the convenience of experimental scientists, a user-friendly and publicly accessible web server for the proposed ensemble method has been established.
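MCC is computed from the full confusion matrix, not from sensitivity and specificity alone, so class sizes matter. As an illustration only, the sketch below assumes 100 positive and 100 negative training samples (hypothetical counts; the abstract does not give class sizes). Under that assumption the reported sensitivity and specificity reproduce the reported accuracy and MCC.

```python
import math


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts: 100 positives and 100 negatives, matching
# Sn = 87/100 = 0.870 and Sp = 83/100 = 0.830.
tp, fn, tn, fp = 87, 13, 83, 17
print((tp + tn) / (tp + fn + tn + fp))  # accuracy: 0.85
print(round(mcc(tp, fn, tn, fp), 3))  # 0.701
```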


Journal of Theoretical Biology | 2016

Using the SMOTE technique and hybrid features to predict the types of ion channel-targeted conotoxins.

Lina Zhang; Chengjin Zhang; Rui Gao; Runtao Yang; Qing Song

Conotoxins targeting different ion channels have distinct physiological functions and therapeutic potential in organisms. Accurate identification of the types of ion channel-targeted conotoxins will provide significant clues for revealing the physiological mechanisms and pharmacological therapeutic potential of conotoxins. In this study, a random forest-based predictor called ICTCPred is proposed for predicting the types of ion channel-targeted conotoxins, using hybrid features that incorporate CTD (Composition, Transition, and Distribution), g-Gap DC (g-Gap Dipeptide Composition), PP (Physicochemical Properties), and SSI (Secondary Structure Information). To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is applied. With the individual feature spaces above, the average accuracy of ICTCPred lies in the range of 0.729-0.886, indicating the discriminative power of these features. ICTCPred yields its highest average accuracy, 0.895, using the hybrid feature space of CTD, g-Gap DC, PP and SSI. The Relief-IFS (Incremental Feature Selection) method is adopted to further improve the prediction performance of ICTCPred. On the training dataset, ICTCPred achieves satisfactory performance with an average accuracy of 0.910. To evaluate the prediction performance objectively, ICTCPred is compared with previous studies on the same independent testing dataset. Encouragingly, our method performs better than previous studies at identifying the types of ion channel-targeted conotoxins, with the highest sensitivities of 0.919 for Na⁺-targeted, 1 for K⁺-targeted, and 1 for Ca²⁺-targeted conotoxins. It is anticipated that ICTCPred can be a potential candidate for ion channel-targeted conotoxin prediction.
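SMOTE balances a dataset by creating synthetic minority-class samples on the line segments between a minority sample and one of its nearest minority neighbours. The pure-Python sketch below illustrates that idea under stated assumptions: feature vectors are plain lists of numbers, distance is Euclidean, and k, the seed and the function name are illustrative choices, not the settings or implementation used in the paper.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies between a randomly chosen minority sample
    and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding base itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(len(smote(minority, n_synthetic=6, k=2)))  # 6
```

Interpolating rather than duplicating minority samples is what distinguishes SMOTE from plain random oversampling: the classifier sees new points inside the minority region instead of exact repeats.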


International Journal of Molecular Sciences | 2016

A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data.

Runtao Yang; Chengjin Zhang; Rui Gao; Lina Zhang

The Golgi Apparatus (GA) is a major collection and dispatch station for numerous proteins destined for secretion, plasma membranes and lysosomes. The dysfunction of GA proteins can result in neurodegenerative diseases. Therefore, accurate identification of protein sub-Golgi localizations may assist in drug development and in understanding how the GA is involved in various cellular processes. In this paper, a new computational method is proposed for distinguishing cis-Golgi proteins from trans-Golgi proteins. Based on the concept of Common Spatial Patterns (CSP), a novel feature extraction technique is developed to extract evolutionary information from protein sequences. To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted. A feature selection method called Random Forest-Recursive Feature Elimination (RF-RFE) is employed to search for the optimal features among the CSP-based features and the g-gap dipeptide composition. Using the optimal features, a Random Forest (RF) classifier distinguishes cis-Golgi proteins from trans-Golgi proteins. Under jackknife cross-validation, the proposed method achieves a sensitivity of 0.889, a specificity of 0.880, an accuracy of 0.885, and a Matthews Correlation Coefficient (MCC) of 0.765, remarkably outperforming previous methods. Moreover, when tested on a common independent dataset, our method also achieves significantly improved performance. These results highlight the promising performance of the proposed method for identifying Golgi-resident protein types. Furthermore, the CSP-based feature extraction method may provide guidelines for protein function prediction.
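The g-gap dipeptide composition mentioned above counts residue pairs separated by a fixed gap, giving a 400-dimensional vector over the 20 standard amino acids. Conventions for g differ between papers; this sketch assumes g is the number of intervening residues (so g = 0 is ordinary dipeptide composition) and simply skips pairs containing non-standard residues. Names are illustrative.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide_composition(sequence, g):
    """400-dimensional g-gap dipeptide composition.

    Counts pairs (s[i], s[i + g + 1]), i.e. residues separated by g
    intervening residues, normalised by the number of counted pairs.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


vec = g_gap_dipeptide_composition("ACAC", g=1)
print(vec[DIPEPTIDES.index("AA")], vec[DIPEPTIDES.index("CC")])  # 0.5 0.5
```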


PLOS ONE | 2016

Sequence Based Prediction of Antioxidant Proteins Using a Classifier Selection Strategy

Lina Zhang; Chengjin Zhang; Rui Gao; Runtao Yang; Qing Song

Antioxidant proteins perform significant functions in maintaining the oxidation/antioxidation balance and have therapeutic potential for some diseases. Accurate identification of antioxidant proteins could contribute to revealing the physiological processes of oxidation/antioxidation balance and to developing novel antioxidation-based drugs. In this study, an ensemble method is presented to predict antioxidant proteins with hybrid features incorporating SSI (Secondary Structure Information), PSSM (Position Specific Scoring Matrix), RSA (Relative Solvent Accessibility), and CTD (Composition, Transition, Distribution). The prediction of the ensemble predictor is the average of the predictions of multiple base classifiers. Using a classifier selection strategy, we obtain an optimal ensemble classifier composed of RF (Random Forest), SMO (Sequential Minimal Optimization), NNA (Nearest Neighbor Algorithm), and J48, with an accuracy of 0.925. Relief combined with IFS (Incremental Feature Selection) is adopted to obtain optimal features from the hybrid features. With the optimal features, the ensemble method achieves improved performance, with a sensitivity of 0.95, a specificity of 0.93, an accuracy of 0.94, and an MCC (Matthews Correlation Coefficient) of 0.880, far better than the existing method. To evaluate the prediction performance objectively, the proposed method is compared with existing methods on the same independent testing dataset. Encouragingly, our method performs better than previous studies. In addition, our method achieves more balanced performance, with a sensitivity of 0.878 and a specificity of 0.860. These results suggest that the proposed ensemble method can be a potential candidate for antioxidant protein prediction. For public access, we have developed a user-friendly web server for antioxidant protein identification that is freely accessible at http://antioxidant.weka.cc.
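The averaging step described above can be sketched as soft voting. This assumes each base classifier outputs a probability for the positive class and that the averaged score is thresholded at 0.5; both are assumptions, since the abstract only says the results are averaged. The classifier outputs below are made-up numbers, and the classifier selection strategy itself is not shown.

```python
from statistics import mean


def ensemble_predict(base_probabilities, threshold=0.5):
    """Average the positive-class probabilities of several base classifiers.

    base_probabilities: one list per base classifier, each holding the
    predicted positive-class probability for every sample.
    Returns (averaged scores, 0/1 labels).
    """
    averaged = [mean(scores) for scores in zip(*base_probabilities)]
    labels = [1 if s >= threshold else 0 for s in averaged]
    return averaged, labels


# Hypothetical outputs of four base classifiers (e.g. RF, SMO, NNA, J48) on 3 samples
probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.4, 0.3],
    [0.7, 0.1, 0.7],
    [1.0, 0.3, 0.6],
]
scores, labels = ensemble_predict(probs)
print(labels)  # [1, 0, 1]
```

Averaging lets a confident base classifier outweigh an uncertain one, which plain majority voting on hard labels cannot do.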


International Journal of Molecular Sciences | 2015

An Effective Antifreeze Protein Predictor with Ensemble Classifiers and Comprehensive Sequence Descriptors

Runtao Yang; Chengjin Zhang; Rui Gao; Lina Zhang

Antifreeze proteins (AFPs) play a pivotal role in the antifreeze effect of overwintering organisms. They have a wide range of applications in numerous fields, such as improving crop production and the quality of frozen foods. Accurate identification of AFPs may provide important clues for deciphering the underlying mechanisms of AFPs in ice binding and facilitate the selection of the most appropriate AFPs for various applications. Based on an ensemble learning technique, this study proposes an AFP identification system called AFP-Ensemble. In this system, random forest classifiers are trained on different training subsets and then aggregated into a consensus classifier by majority voting. On an independent dataset, the resulting predictor yields a sensitivity of 0.892, a specificity of 0.940, an accuracy of 0.938 and a balanced accuracy of 0.916, far better than the results obtained by previous methods. These results show that AFP-Ensemble is an effective and promising predictor for large-scale determination of AFPs. The detailed feature analysis in this study may give useful insights into the molecular mechanisms of AFP-ice interactions and provide guidance for related experimental validation. A web server has been developed to implement the proposed method.
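The subset-then-vote design above can be sketched in two helpers. The abstract does not say how the training subsets are built; this sketch assumes a common choice for imbalanced data, pairing all positive (AFP) samples with a disjoint chunk of the negatives, so each subset is closer to balanced. Training the random forest on each subset is omitted, and the helper names are hypothetical.

```python
from collections import Counter


def balanced_subsets(positives, negatives, n_subsets):
    """Pair all positives with each of n_subsets disjoint chunks of negatives."""
    chunk = -(-len(negatives) // n_subsets)  # ceiling division
    return [
        positives + negatives[i * chunk:(i + 1) * chunk]
        for i in range(n_subsets)
    ]


def majority_vote(predictions):
    """Combine per-classifier label lists by majority vote.

    predictions: one list of labels per classifier. Ties go to the label
    seen first, because Counter.most_common keeps first-encountered order.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


subsets = balanced_subsets(["p1", "p2"], ["n1", "n2", "n3", "n4", "n5", "n6"], 3)
print(subsets[0])  # ['p1', 'p2', 'n1', 'n2']
print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))  # [1, 0, 1]
```

Each forest sees a roughly balanced problem, yet together the forests cover every negative sample, so no training data is discarded outright.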


Canadian Conference on Electrical and Computer Engineering | 2015

Incorporating g-gap dipeptide composition and position specific scoring matrix for identifying antioxidant proteins

Lina Zhang; Chengjin Zhang; Rui Gao; Runtao Yang

Oxidative stress can damage major cell components, including proteins, DNA, lipids and cell membranes, which may cause cells to lose function and induce a wide variety of diseases. As a widespread class of antioxidants in humans and animals, antioxidant proteins are essential for eliminating the cell damage and aging problems caused by oxidative stress. Accurate identification of antioxidant proteins is a significant step toward revealing the causes and physiological processes of certain types of diseases and of aging. Furthermore, newly identified antioxidant proteins may provide candidate targets for curing or alleviating diseases and slowing down the aging process. In this study, a random forest-based approach incorporating PSSM (Position Specific Scoring Matrix) and g-gap dipeptide composition is put forward to distinguish antioxidant proteins from non-antioxidant proteins. To further improve the prediction performance, information gain combined with incremental feature selection is adopted to obtain optimal features. Compared with prior studies on the testing dataset, the proposed method shows excellent predictive performance, with an accuracy of 0.807, an MCC of 0.543 and an AUC of 0.939. These results indicate that this method may serve as an alternative predictor for annotating antioxidant proteins.
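Incremental feature selection (IFS), as used above after ranking by information gain, evaluates the top-1, top-2, ... top-n features in turn and keeps the prefix that scores best. The sketch below assumes the ranking is already computed and takes the evaluation step as a caller-supplied function (for example, cross-validated accuracy of a random forest); that callback and all names are hypothetical.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Incremental feature selection (IFS) over a pre-ranked feature list.

    ranked_features: feature names sorted from most to least informative.
    evaluate: callable taking a list of feature names and returning a
        score where higher is better.
    Returns (best feature subset, its score, score for every prefix size).
    """
    best_subset, best_score, curve = [], float("-inf"), []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        curve.append(score)
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score, curve


# Toy evaluation: the score depends only on how many top features are used
toy_scores = {1: 0.6, 2: 0.8, 3: 0.8, 4: 0.7}
best, score, curve = incremental_feature_selection(
    ["f1", "f2", "f3", "f4"], lambda s: toy_scores[len(s)]
)
print(best, score)  # ['f1', 'f2'] 0.8
```

The returned curve is what IFS plots are usually drawn from: performance against the number of top-ranked features.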


BioMed Research International | 2015

JPPRED: Prediction of Types of J-Proteins from Imbalanced Data Using an Ensemble Learning Method

Lina Zhang; Chengjin Zhang; Rui Gao; Runtao Yang

Different types of J-proteins perform distinct functions in chaperone processes and disease development. Accurate identification of the types of J-proteins will provide significant clues for revealing the mechanisms of J-proteins and contribute to developing drugs for diseases. In this study, an ensemble predictor called JPPRED is proposed for J-protein prediction with hybrid features, including split amino acid composition (SAAC), pseudo amino acid composition (PseAAC), and position specific scoring matrix (PSSM). To deal with the imbalanced benchmark dataset, the synthetic minority oversampling technique (SMOTE) and an undersampling technique are applied. The average sensitivity of JPPRED based on the individual feature spaces above lies in the range of 0.744–0.851, indicating the discriminative power of these features. JPPRED yields its highest average sensitivity, 0.875, using the hybrid feature space of SAAC, PseAAC, and PSSM. Compared with the individual base classifiers, JPPRED obtains more balanced and better performance for each type of J-protein. To evaluate the prediction performance objectively, JPPRED is compared with a previous study. Encouragingly, JPPRED obtains balanced performance for each type of J-protein, significantly superior to that of the existing method. It is anticipated that JPPRED can be a potential candidate for J-protein prediction.
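For a multi-class problem such as J-protein typing, an "average sensitivity" is commonly the unweighted mean of per-class sensitivities, so a large class cannot hide poor recall on a small one. The abstract does not define the term, so this sketch assumes that reading; the class labels in the example are hypothetical.

```python
def per_class_sensitivity(y_true, y_pred):
    """Sensitivity (recall) of each class: correct predictions / true members."""
    result = {}
    for cls in sorted(set(y_true)):
        members = [p for t, p in zip(y_true, y_pred) if t == cls]
        result[cls] = sum(p == cls for p in members) / len(members)
    return result


def average_sensitivity(y_true, y_pred):
    """Unweighted mean of the per-class sensitivities."""
    per_class = per_class_sensitivity(y_true, y_pred)
    return sum(per_class.values()) / len(per_class)


# Hypothetical labels for three J-protein types
y_true = ["I", "I", "I", "I", "II", "II", "III", "III"]
y_pred = ["I", "I", "I", "II", "II", "I", "III", "III"]
print(per_class_sensitivity(y_true, y_pred))  # {'I': 0.75, 'II': 0.5, 'III': 1.0}
print(average_sensitivity(y_true, y_pred))  # 0.75
```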


Scientific Reports | 2018

Using a Classifier Fusion Strategy to Identify Anti-angiogenic Peptides

Lina Zhang; Runtao Yang; Chengjin Zhang

Anti-angiogenic peptides have distinct physiological functions and therapeutic potential for angiogenesis-related diseases. Accurate identification of anti-angiogenic peptides may provide significant clues for understanding the essential angiogenic homeostasis within tissues and for developing antineoplastic therapies. In this study, an ensemble predictor is proposed for anti-angiogenic peptide prediction that fuses the individual classifier with the best sensitivity and the individual classifier with the best specificity. We investigate the predictive capabilities of various feature spaces with respect to the corresponding optimal individual classifiers and ensemble classifiers. The ensemble classifier trained on Bi-profile Bayes (BpB) features achieves an accuracy of 0.822 and a Matthews Correlation Coefficient (MCC) of 0.649, the highest among the investigated prediction models. Discriminative features are obtained from BpB using the Relief algorithm followed by the Incremental Feature Selection (IFS) method. Trained on these discriminative features, the ensemble classifier reaches a sensitivity of 0.776, a specificity of 0.888, an accuracy of 0.832, and an MCC of 0.668. Experimental results indicate that the proposed method is far superior to the previous study for anti-angiogenic peptide prediction.
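The fusion strategy above first picks two individual classifiers, one by sensitivity and one by specificity, then combines them. The sketch shows the selection step and one possible combination rule; averaging the two positive-class probabilities is an assumption, as the abstract does not state how the pair is fused. Classifier names and scores are hypothetical.

```python
def select_fusion_pair(results):
    """Pick the classifier with the best sensitivity and the one with the best specificity.

    results: dict mapping classifier name -> (sensitivity, specificity).
    If one classifier is best on both, it is returned twice.
    """
    best_sn = max(results, key=lambda name: results[name][0])
    best_sp = max(results, key=lambda name: results[name][1])
    return best_sn, best_sp


def fuse(prob_a, prob_b, threshold=0.5):
    """Fuse two classifiers by averaging their positive-class probabilities."""
    return [1 if (a + b) / 2 >= threshold else 0 for a, b in zip(prob_a, prob_b)]


# Hypothetical cross-validation results for three individual classifiers
results = {"clf_A": (0.85, 0.70), "clf_B": (0.72, 0.90), "clf_C": (0.78, 0.80)}
print(select_fusion_pair(results))  # ('clf_A', 'clf_B')
print(fuse([0.9, 0.2, 0.6], [0.3, 0.4, 0.3]))  # [1, 0, 0]
```

Pairing a sensitivity-oriented model with a specificity-oriented one is meant to pull the fused predictor toward the middle, trading a little of each model's strength for better balance.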
