Bing Niu
Shanghai University
Publication
Featured research published by Bing Niu.
Molecular Diversity | 2008
Bing Niu; Yuhuan Jin; Kai-Yan Feng; Wencong Lu; Yu-Dong Cai; Guo-Zheng Li
In this paper, the AdaBoost algorithm, a popular and effective prediction method, is applied to predict the subcellular locations of prokaryotic and eukaryotic proteins, using a dataset derived from SWISSPROT 33.0. Its prediction ability was evaluated by the re-substitution test, leave-one-out cross-validation (LOOCV), and the jackknife test. By comparing its results with those of popular predictors such as discriminant function, neural networks, and SVM, we demonstrated that the AdaBoost predictor outperformed these predictors. We therefore conclude that the AdaBoost algorithm can be employed as a robust method to predict subcellular location. An online web server for predicting the subcellular location of prokaryotic and eukaryotic proteins is available at http://chemdata.shu.edu.cn/subcell/.
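The boosting idea behind the abstract above can be illustrated with a minimal sketch. It assumes binary labels in {+1, -1} and one-feature threshold rules (decision stumps) as weak learners; the abstract does not state which base learner or features the authors used, and all function names here are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best threshold rule."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] <= thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign

def adaboost_fit(X, y, rounds=10):
    """Fit a list of (alpha, stump) pairs by reweighting misclassified samples."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:              # weak learner is no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1
```

On a toy one-dimensional problem where the negative class sits between two positive regions, no single stump separates the classes, but a weighted vote of three stumps does.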
Biochemical and Biophysical Research Communications | 2009
Liang Liu; Yu-Dong Cai; Wencong Lu; Kai-Yan Feng; Chunrong Peng; Bing Niu
Based on pseudo amino acid (PseAA) composition and a novel hybrid feature selection framework, this paper presents a computational system to predict protein-protein interactions (PPIs) using 8796 protein pairs. These pairs are coded by PseAA composition, resulting in 114 features. A hybrid feature selection system, mRMR-KNNs-wrapper, is applied to obtain an optimized feature set by excluding poorly performing and/or redundant features, leaving 103 features. Using the optimized 103-feature subset, a prediction model is trained and tested in the k-nearest neighbors (KNNs) learning system. This prediction model achieves an overall prediction accuracy of 76.18%, evaluated by a 10-fold cross-validation test, which is 1.46% higher than with the initial 114 features and 6.51% higher than with the 20 features coded by amino acid composition. The PPI predictor developed for this research is available for public use at http://chemdata.shu.edu.cn/ppi.
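The evaluation protocol described above (a k-nearest-neighbors classifier scored by 10-fold cross-validation) can be sketched as follows. This is an illustration only: the function names are hypothetical, folds are assigned by interleaving indices rather than by shuffling or stratifying, and the PseAA encoding and mRMR-KNNs-wrapper selection are not reproduced.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, row, k=3):
    """Majority label among the k training points closest to row (Euclidean)."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], row))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def k_fold_accuracy(X, y, folds=10, k=3):
    """Fraction of samples predicted correctly when each fold is held out once."""
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))  # every folds-th sample
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [t for i, t in enumerate(y) if i not in test_idx]
        for i in test_idx:
            correct += knn_predict(train_X, train_y, X[i], k) == y[i]
    return correct / len(X)
```

With real data the samples would normally be shuffled (and often stratified by class) before the folds are formed.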
Protein and Peptide Letters | 2008
Wencong Lu; Yuhuan Jin; Bing Niu; Kai-Yan Feng; Yu-Dong Cai; Guo-Zheng Li
Protein subcellular localization, which tells where a protein resides in a cell, is an important characteristic of a protein and relates closely to its function. The prediction of subcellular localization plays an important role in the prediction of protein function, genome annotation, and drug design. Predicting subcellular localization with a bioinformatics approach is therefore an important and challenging task. In this paper, a robust predictor, the AdaBoost learner, is introduced to predict protein subcellular localization based on amino acid composition. The jackknife cross-validation test and an independent dataset test were used to demonstrate that AdaBoost is a robust and efficient model for predicting protein subcellular localization. The correct prediction rates were 74.98% and 80.12% for the jackknife test and the independent dataset test, respectively, which are higher than those of other existing predictors. An online server for predicting the subcellular localization of proteins based on the AdaBoost classifier is available at http://chemdata.shu.edu.cn/sl12.
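Amino acid composition, the feature encoding named in the abstract above, maps a protein sequence to the 20 relative frequencies of the standard amino acids. The sketch below assumes one-letter codes and simply ignores non-standard residues; that handling and the function name are assumptions, not details taken from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

def amino_acid_composition(sequence):
    """Return a 20-element list of residue frequencies that sums to 1."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]
```

The resulting vector has a fixed length regardless of sequence length, which is what allows proteins of different sizes to be fed to the same classifier.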
Oncotarget | 2017
Qiang Su; Wencong Lu; Dongshu Du; Fuxue Chen; Bing Niu; Kuo-Chen Chou
Toxicity evaluation is an extremely important process during drug development. It is usually initiated by experiments on animals, which are time-consuming and costly. To speed up this process, a quantitative structure-activity relationship (QSAR) study was performed to develop a computational model correlating the structures of 581 aromatic compounds with their aquatic toxicity to Tetrahymena pyriformis. A set of 68 molecular descriptors derived solely from the structures of the aromatic compounds was calculated with Gaussian 03, HyperChem 7.5, and TSAR V3.3. A comprehensive feature selection method, combining minimum Redundancy Maximum Relevance (mRMR), a genetic algorithm (GA), and support vector regression (SVR), was applied to select the best descriptor subset for the QSAR analysis. The SVR method was employed to model toxicity potency from a training set of 500 compounds. Five-fold cross-validation was used to optimize the parameters of the SVR model. The new SVR model was tested on an independent dataset of 81 compounds. Both high internal consistency and high external predictive rates were obtained, indicating that the SVR model is a promising tool for rapid toxicity detection.
Protein and Peptide Letters | 2008
Bing Niu; Yu-Huan Jin; Kai-Yan Feng; Liang Liu; Wencong Lu; Yu-Dong Cai; Guo-Zheng Li
The membrane protein type is an important feature in characterizing the overall topological folding type of a protein or its domains. Many investigators have devoted effort to the prediction of membrane protein type. Here, we propose a new approach, the bootstrap aggregating method or bragging learner, to address this problem based on the protein amino acid composition. As a demonstration, the benchmark dataset constructed by K.C. Chou and D.W. Elrod was used to test the new method. The overall success rate obtained by jackknife cross-validation was over 84%, indicating that the bragging learner presented in this paper holds considerable potential for predicting the attributes of proteins, or at least can play a complementary role to many existing algorithms in this area. It is anticipated that the prediction quality can be further enhanced if the pseudo amino acid composition is effectively incorporated into the current predictor. An online membrane protein type prediction web server developed in our lab is available at http://chemdata.shu.edu.cn/protein/protein.jsp.
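Bootstrap aggregating, as named in the abstract above, trains each base model on a resample drawn with replacement and combines the models by majority vote. The sketch below pairs it with a jackknife (leave-one-out) evaluation. The nearest-centroid base learner, the number of estimators, and all function names are assumptions chosen for brevity; the abstract does not specify the base learner.

```python
import math
import random
from collections import Counter

def fit_centroids(X, y):
    """Base learner: the mean feature vector of each class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(centroids, row):
    return min(centroids, key=lambda label: math.dist(centroids[label], row))

def bagging_fit(X, y, n_estimators=11, seed=0):
    """Train one base model per bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, row):
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]

def jackknife_accuracy(X, y, **kwargs):
    """Hold out each sample in turn, train on the rest, and score the held-out one."""
    hits = 0
    for i in range(len(X)):
        models = bagging_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **kwargs)
        hits += bagging_predict(models, X[i]) == y[i]
    return hits / len(X)
```

An odd number of estimators avoids tied votes in the two-class case.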
Journal of Computational Chemistry | 2009
Bing Niu; Lin Lu; Liang Liu; Tian Hong Gu; Kai-Yan Feng; Wencong Lu; Yu-Dong Cai
Knowledge of the polyprotein cleavage sites by HIV protease will refine our understanding of its specificity, and the information thus acquired is useful for designing specific and efficient HIV protease inhibitors. Recently, several works have approached the HIV‐1 protease specificity problem by applying a number of classifier creation and combination methods. The search for proper inhibitors of HIV protease will be greatly expedited if one can find an accurate, robust, and rapid method for predicting the cleavage sites in proteins by HIV protease. In this article, we selected HIV‐1 protease as the subject of the study. 299 oligopeptides were chosen for the training set, while the other 63 oligopeptides were taken as a test set. The peptides are represented by features constructed from AAIndex (Kawashima et al., Nucleic Acids Res 1999, 27, 368; Kawashima and Kanehisa, Nucleic Acids Res 2000, 28, 374). The mRMR method (Maximum Relevance, Minimum Redundancy; Ding and Peng, Proc Second IEEE Comput Syst Bioinformatics Conf 2003, 523; Peng et al., IEEE Trans Pattern Anal Mach Intell 2005, 27, 1226), combined with incremental feature selection (IFS) and feature forward search (FFS), is applied to find the two important cleavage sites and to select 364 important biochemistry features by jackknife test. Using KNN (K‐nearest neighbors) to combine the selected features, the prediction model achieves a high accuracy of 91.3% in the jackknife cross‐validation test and 87.3% in the independent‐set test. It is expected that our feature selection scheme can serve as a useful auxiliary technique for scientists seeking effective inhibitors of HIV protease.
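Incremental feature selection, mentioned in the abstract above, can be sketched as follows: features are added one at a time in ranked order, each prefix of the ranking is scored, and the best-scoring prefix is kept. The sketch assumes the ranking is already available (for example from mRMR, which is not implemented here) and scores each prefix with a leave-one-out 1-nearest-neighbor test; the function names and the choice of k = 1 are assumptions.

```python
def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-NN restricted to the given feature indices."""
    hits = 0
    for i, row in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((X[j][f] - row[f]) ** 2 for f in features),
        )
        hits += y[nearest] == y[i]
    return hits / len(X)

def incremental_feature_selection(X, y, ranked):
    """Evaluate ranked[:1], ranked[:2], ... and return the best prefix and its score."""
    best_acc, best_k = -1.0, 0
    for k in range(1, len(ranked) + 1):
        acc = loo_1nn_accuracy(X, y, ranked[:k])
        if acc > best_acc:  # on ties, the shorter prefix is kept
            best_acc, best_k = acc, k
    return ranked[:best_k], best_acc
```

In the toy case of one informative feature followed by a large-scale noise feature, the procedure stops at the first feature because adding the second lowers the leave-one-out accuracy.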
BioMed Research International | 2011
Chunrong Peng; Liu Han-xia Liu; Bing Niu; Ying-fang Lv; Minjie Li; Youlang Yuan; Yongheng Zhu; Wencong Lu; Yu-Dong Cai
It is important to identify which proteins can interact with RNA for the purpose of protein annotation, since interactions between RNA and proteins influence the structure of the ribosome and play important roles in gene expression. This paper tries to identify proteins that can interact with RNA using voting systems. First, 34 learning algorithms available in Weka are chosen for investigation. Then a simple majority voting system (SMVS) is used for the prediction of RNA-binding proteins, achieving an average ACC (overall prediction accuracy) of 79.72% and an MCC (Matthews correlation coefficient) of 59.77% on the independent testing dataset. Next, the mRMR (minimum redundancy maximum relevance) strategy is adapted to algorithm selection. In addition, the MCC value of each classifier is assigned as the weight of that classifier's vote. As a result, the best average MCC value, 64.70% on the independent testing dataset, is attained when 22 algorithms are selected and integrated through weighted votes; the corresponding ACC value is 82.04%.
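The weighting scheme described above, in which each classifier's vote counts in proportion to its MCC, can be sketched as follows. The sketch assumes binary labels coded as 0 and 1 and breaks a tied score in favor of class 0; both choices and the function names are assumptions, and how the paper handled ties or negative MCC values is not stated in the abstract.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0 and 1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def weighted_vote(predictions, weights):
    """Combine one sample's per-classifier 0/1 predictions using per-classifier weights."""
    score = sum(w if p == 1 else -w for p, w in zip(predictions, weights))
    return 1 if score > 0 else 0
```

A single strong classifier can outvote two weak ones under this scheme, which is the difference from simple majority voting: `weighted_vote([1, 0, 0], [0.9, 0.3, 0.3])` yields 1, whereas an unweighted majority would yield 0.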
Molecular Diversity | 2008
Yu-Dong Cai; Ziliang Qian; Lin Lu; Kai-Yan Feng; Xin Meng; Bing Niu; Guo-Dong Zhao; Wencong Lu
Efficient in silico screening approaches may provide valuable hints on the biological functions of candidate compounds, which could help to screen functional compounds either in basic research on metabolic pathways or in drug discovery. Here, we introduce a machine learning method (Nearest Neighbor Algorithm) based on the functional group composition of compounds for the analysis of metabolic pathways. This method can quickly map small chemical molecules to the metabolic pathway that they likely belong to. A set of 2,764 compounds from 11 major classes of metabolic pathways was selected for study. The overall prediction rate reached 73.3%, indicating that the functional group composition of compounds is indeed related to their biological metabolic functions.
Molecular Diversity | 2009
Bing Niu; Yuhuan Jin; Lin Lu; Kaiyan Feng; Lei Gu; Zhisong He; Wencong Lu; Yixue Li; Yu-Dong Cai
Knowing whether an enzyme can interact with a small molecule is essential for understanding the molecular and cellular functions of organisms. In this paper, we introduce a classifier to predict small molecule–enzyme interaction, i.e., whether the two can interact with each other. Small molecules are represented by their chemical functional groups, and enzymes are represented by their biochemical and physicochemical properties, resulting in a total of 160 features. These features are input into the AdaBoost classifier, which is known to have good generalization ability, to predict interaction. The overall prediction accuracy, tested by tenfold cross-validation and independent sets, is 81.76% and 83.35%, respectively, suggesting that this strategy is effective. In this research, we chose interactions between small molecules and enzymes involved in metabolism, with the aim of improving the understanding of metabolic pathways. An online predictor developed in this research is available at http://chemdata.shu.edu.cn/small_m.
Medicinal Chemistry | 2012
Bing Niu; Qiang Su; Xiaochen Yuan; Wencong Lu; Juan Ding
A QSAR study on a data set of 5-lipoxygenase inhibitors (1-phenyl [2H]-tetrahydro-triazine-3-one analogues) was carried out using Support Vector Regression (SVR) and physicochemical parameters. Wrapper methods were used to select descriptors, while Leave-One-Out Cross Validation (LOOCV) and an independent set test were used to judge the predictive power of different models. We found that the generalization ability of the SVR model exceeded that of the multiple linear regression (MLR) and Partial Least Squares (PLS) models in this work. An online web server for activity prediction is available at http://chemdata.shu.edu.cn/qsar5lip.