Publication


Featured research published by Xiaowei Zhao.


International Journal of Molecular Sciences | 2014

PSNO: Predicting Cysteine S-Nitrosylation Sites by Incorporating Various Sequence-Derived Features into the General Form of Chou’s PseAAC

Jian Zhang; Xiaowei Zhao; Pingping Sun; Zhiqiang Ma

S-nitrosylation (SNO) is one of the most universal reversible post-translational modifications involved in many biological processes. Malfunction or dysregulation of SNO leads to developmental abnormalities and a range of severe diseases. Therefore, the identification of SNO sites (SNOs) provides insights into disease progression and drug development. In this paper, a new bioinformatics tool, named PSNO, is proposed to identify SNOs from protein sequences. Firstly, we explore various promising sequence-derived discriminative features, including the evolutionary profile, the predicted secondary structure and the physicochemical properties. Secondly, rather than simply combining the features, which may bring about information redundancy and unwanted noise, we use the relative entropy selection and incremental feature selection approach to select the optimal feature subsets. Thirdly, we train our model with the k-nearest neighbor algorithm. Using both informative features and an elaborate feature selection scheme, our method, PSNO, achieves good prediction performance with a mean Matthews correlation coefficient (MCC) value of about 0.5119 on the training dataset using 10-fold cross-validation. These results indicate that PSNO can be used as a competitive predictor among the state-of-the-art SNO site prediction tools. A web server, named PSNO, which implements the proposed method, is freely available at http://59.73.198.144:8088/PSNO/.
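The Matthews correlation coefficient reported in this abstract is computed from the four confusion-matrix counts. The snippet below is a minimal sketch of that calculation; the function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math

def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from confusion-matrix counts; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # A row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts, for illustration only.
print(round(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10), 4))
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction), which makes it more informative than accuracy when the two classes are of very different sizes.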


Computational and Mathematical Methods in Medicine | 2013

Bioinformatics resources and tools for conformational B-cell epitope prediction.

Pingping Sun; Haixu Ju; Zhenbang Liu; Qiao Ning; Jian Zhang; Xiaowei Zhao; Yanxin Huang; Zhiqiang Ma; Yuxin Li

Identification of epitopes which invoke strong humoral responses is an essential issue in the field of immunology. Localizing epitopes by experimental methods is expensive in terms of time, cost, and effort; therefore, computational methods, which feature low cost and high speed, have been employed to predict B-cell epitopes. In this paper, we review recent advances in bioinformatics resources and tools for conformational B-cell epitope prediction, including databases, algorithms, web servers, and their applications in solving problems in related areas. To stimulate the development of better tools, some promising directions are also extensively discussed.


Journal of Theoretical Biology | 2015

Accurate in silico identification of protein succinylation sites using an iterative semi-supervised learning technique

Xiaowei Zhao; Qiao Ning; Haiting Chai; Zhiqiang Ma

As a widespread type of protein post-translational modification (PTM), succinylation plays an important role in regulating protein conformation, function and physicochemical properties. Compared with the labor-intensive and time-consuming experimental approaches, computational prediction of succinylation sites is highly desirable because of its convenience and speed. Currently, numerous computational models have been developed to identify PTM sites through various types of two-class machine learning algorithms. These methods require both positive and negative samples for training. However, designating the negative samples for PTMs is difficult and, if not done properly, can affect the performance of computational models dramatically. Therefore, in this work, we implemented the first application of the positive samples only learning (PSoL) algorithm to the succinylation site prediction problem. PSoL is a special class of semi-supervised machine learning that uses positive samples and unlabeled samples to train the model. Meanwhile, we proposed a novel computational predictor of succinylation sites called SucPred (succinylation site predictor) by using multiple feature encoding schemes. Promising results were obtained by the SucPred predictor, with an accuracy of 88.65% using 5-fold cross-validation on the training dataset and an accuracy of 84.40% on the independent testing dataset, which demonstrated that the positive samples only learning algorithm presented here was particularly useful for identification of protein succinylation sites. Besides, the positive samples only learning algorithm can be applied to build predictors for other types of PTM sites with ease. A web server for predicting succinylation sites was developed and was freely accessible at http://59.73.198.144:8088/SucPred/.
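The idea of learning from positive and unlabeled samples only can be illustrated with a small sketch: start from the unlabeled samples that look least like the positives, then repeatedly move further unlabeled samples into the negative set. The code below is an assumption-laden simplification, not the published PSoL procedure: a nearest-centroid rule stands in for the classifier, and the function names and parameters (`expand_negatives`, `n_init`, `max_rounds`) are hypothetical.

```python
from math import dist

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def expand_negatives(positives, unlabeled, n_init=2, max_rounds=10):
    """Grow a negative set from unlabeled samples.

    A nearest-centroid rule stands in for the classifier; this is an
    illustrative simplification, not the published PSoL procedure.
    """
    pos_c = centroid(positives)
    # Initial negatives: unlabeled samples farthest from the positive centroid.
    ranked = sorted(unlabeled, key=lambda x: dist(x, pos_c), reverse=True)
    negatives, remaining = ranked[:n_init], ranked[n_init:]
    for _ in range(max_rounds):
        neg_c = centroid(negatives)
        # Samples closer to the negative centroid join the negative set.
        newly = [x for x in remaining if dist(x, neg_c) < dist(x, pos_c)]
        if not newly:
            break
        negatives += newly
        remaining = [x for x in remaining if x not in newly]
    return negatives, remaining

# Hypothetical 2-D feature vectors, for illustration only.
positives = [(0, 0), (0, 1), (1, 0)]
unlabeled = [(5, 5), (6, 5), (5, 6), (4, 4), (0.5, 0.5), (1, 1)]
negatives, still_unlabeled = expand_negatives(positives, unlabeled)
```

Samples left in `still_unlabeled` are those the rule cannot confidently call negative; in a site-prediction setting these would be the candidate modification sites.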


Journal of Theoretical Biology | 2014

PECM: Prediction of extracellular matrix proteins using the concept of Chou’s pseudo amino acid composition

Jian Zhang; Pingping Sun; Xiaowei Zhao; Zhiqiang Ma

The extracellular matrix proteins (ECMs) are widely found in the tissues of multicellular organisms. They consist of various secreted proteins, mainly polysaccharides and glycoproteins. The ECMs are involved in the exchange of materials and information between resident cells and the external environment. Accurate identification of ECMs is a significant step in understanding the evolution of cancer and promises a wide range of potential applications in therapeutic targets or diagnostic markers. In this paper, an accurate computational method named PECM is proposed for identifying ECMs. Here, we explore various sequence-derived discriminative features including evolutionary information, predicted secondary structure, and physicochemical properties. Rather than simply combining the features, which may bring information redundancy and unwanted noise, we use the Fisher-Markov selector and incremental feature selection approach to search for the optimal feature subsets. Then, we train our model with a support vector machine (SVM). PECM achieves good prediction performance with ACC scores of about 86% and 90% on the testing and independent datasets, which are competitive with the state-of-the-art ECM prediction tools. A web server named PECM which implements the proposed approach is freely available at http://59.73.198.144:8088/PECM/.
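Incremental feature selection, as named in this abstract, can be sketched as follows: given features already ranked by a selector, evaluate each nested top-k subset and keep the best-scoring one. This is a minimal sketch under stated assumptions: the `evaluate` callable stands in for whatever cross-validated classifier score is used, and the feature names and gains in the toy example are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

def incremental_feature_selection(
    ranked: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Return the top-k prefix of `ranked` with the highest score."""
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = list(ranked[:k])
        score = evaluate(subset)
        if score > best_score:  # strict comparison: ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score

# Toy evaluator with hypothetical per-feature gains (not real results).
toy_gain = {"pssm": 0.30, "ss": 0.15, "hydro": 0.05, "noise1": -0.02, "noise2": -0.03}
ranked_features = ["pssm", "ss", "hydro", "noise1", "noise2"]
subset, score = incremental_feature_selection(
    ranked_features, lambda feats: 0.5 + sum(toy_gain[f] for f in feats)
)
```

In the toy example the score stops improving once the noisy features are appended, so the selected subset is the first three ranked features.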


BioMed Research International | 2013

Position-Specific Analysis and Prediction of Protein Pupylation Sites Based on Multiple Features

Xiaowei Zhao; Jiangyan Dai; Qiao Ning; Zhiqiang Ma; Minghao Yin; Pingping Sun

Pupylation is one of the most important posttranslational modifications of proteins; accurate identification of pupylation sites will facilitate the understanding of the molecular mechanism of pupylation. Besides the conventional experimental approaches, computational prediction of pupylation sites is highly desirable for its convenience and speed. In this study, we developed a novel predictor to predict the pupylation sites. First, the maximum relevance minimum redundancy (mRMR) and incremental feature selection methods were applied to five kinds of features to select the optimal feature set. Then the prediction model was built based on the optimal feature set with the assistance of the support vector machine algorithm. As a result, the overall jackknife success rate of the new predictor on a newly constructed benchmark dataset was 0.764, and the Matthews correlation coefficient was 0.522, indicating a good prediction. Feature analysis showed that all feature types contributed to the prediction of protein pupylation sites. Further site-specific feature analysis revealed that the features of sites surrounding the central lysine contributed more to the determination of pupylation sites than the other sites.
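The jackknife success rate mentioned in this abstract is a leave-one-out accuracy: each sample is predicted by a model built from all the others. The sketch below shows the evaluation loop only. As a loudly labeled assumption, a 1-nearest-neighbour rule replaces the support vector machine used in the study, and all names and data are hypothetical.

```python
from math import dist

def jackknife_success_rate(samples, labels, predict):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

def nearest_neighbour(train_x, train_y, query):
    """1-NN stand-in for the SVM used in the study (an assumption)."""
    best = min(range(len(train_x)), key=lambda j: dist(train_x[j], query))
    return train_y[best]

# Hypothetical feature vectors and labels, for illustration only.
samples = [(0, 0), (0, 1), (5, 5), (5, 6)]
labels = [0, 0, 1, 1]
rate = jackknife_success_rate(samples, labels, nearest_neighbour)
```

Because every sample is held out exactly once, the jackknife gives a single deterministic estimate for a given dataset, unlike k-fold cross-validation with random splits.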


BioMed Research International | 2014

Conformational B-cell epitopes prediction from sequences using cost-sensitive ensemble classifiers and spatial clustering.

Jian Zhang; Xiaowei Zhao; Pingping Sun; Bo Gao; Zhiqiang Ma

B-cell epitopes are regions of the antigen surface which can be recognized by certain antibodies and elicit the immune response. Identification of epitopes for a given antigen chain finds vital applications in vaccine and drug research. Experimental identification of B-cell epitopes is time-consuming and resource intensive, so it may benefit from computational approaches. In this paper, a novel cost-sensitive ensemble algorithm is proposed for predicting the antigenic determinant residues and then a spatial clustering algorithm is adopted to identify the potential epitopes. Firstly, we explore various discriminative features from primary sequences. Secondly, a cost-sensitive ensemble scheme is introduced to deal with the imbalanced learning problem. Thirdly, we adopt a spatial clustering algorithm to tell which residues may potentially form the epitopes. Based on the strategies mentioned above, a new predictor, called CBEP (conformational B-cell epitopes prediction), is proposed in this study. CBEP achieves good prediction performance with mean AUC scores (AUCs) of 0.721 and 0.703 on two benchmark datasets (bound and unbound) using leave-one-out cross-validation (LOOCV). When compared with previous prediction tools, CBEP produces higher sensitivity and comparable specificity values. A web server named CBEP which implements the proposed method is available for academic use.
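The AUC scores quoted in this abstract measure how well predicted scores rank positive residues above negative ones. A minimal sketch of the pairwise (Mann-Whitney) formulation follows; the function name and the example labels and scores are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels and scores, for illustration only.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

This quadratic-time form is easy to verify by hand; for large datasets a rank-based implementation gives the same value more efficiently.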


Biodata Mining | 2015

Prediction of protein solvent accessibility using PSO-SVR with multiple sequence-derived features and weighted sliding window scheme

Jian Zhang; Wenhan Chen; Pingping Sun; Xiaowei Zhao; Zhiqiang Ma

Background: The prediction of solvent accessibility could provide valuable clues for analyzing protein structure and functions, such as protein 3-dimensional structure and B-cell epitope prediction. To fully decipher the protein-protein interaction process, an initial but crucial step is to calculate the protein solvent accessibility, especially when the tertiary structure of the protein is unknown. Although some efforts have been put into protein solvent accessibility prediction, the performance of existing methods is far from satisfactory.

Methods: In order to develop a high-accuracy model, we focus on several aspects that affect prediction performance, including several sequence-derived features, a weighted sliding window scheme and the parameter optimization of the machine learning approach. To address these issues, we adopt the following strategies. Firstly, we explore various features which have been observed to be associated with the residue solvent accessibility. These discriminative features include protein evolutionary information, predicted protein secondary structure, native disorder, physicochemical propensities and several sequence-based structural descriptors of residues. Secondly, because adjacent residues in the sliding window contribute differently, a weighted sliding window scheme is proposed to differentiate their contributions to the central residue. Thirdly, particle swarm optimization (PSO) is employed to search for the global best parameters for the proposed predictor.

Results: Evaluated by 3-fold cross-validation, our method achieves a mean absolute error (MAE) of 14.1% and a Pearson correlation coefficient (PCC) of 0.75 on our newly compiled dataset. When compared with the state-of-the-art prediction models on the two benchmark datasets, our method demonstrates better performance. Experimental results demonstrate that our predictor, PSAP, achieves high performance and outperforms many existing predictors. A web server called PSAP is built and freely available at http://59.73.198.144:8088/SolventAccessibility/.
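A weighted sliding window encodes each residue by the features of its neighbours, scaled so that residues nearer the centre count more. The sketch below shows the general idea only. The linear-decay weights, the zero padding at the termini, and the function name are assumptions made for illustration; the abstract does not specify the weighting used.

```python
def weighted_window(features, half_width, weight=None):
    """Encode each position as its weighted neighbourhood.

    features: one numeric value per residue (e.g. a hydrophobicity score).
    weight:   function of the offset from the centre; the default linear
              decay is an assumption, not the scheme from the paper.
    """
    if weight is None:
        weight = lambda d: 1.0 - abs(d) / (half_width + 1)
    n = len(features)
    encoded = []
    for i in range(n):
        row = []
        for d in range(-half_width, half_width + 1):
            j = i + d
            value = features[j] if 0 <= j < n else 0.0  # zero-pad the termini
            row.append(weight(d) * value)
        encoded.append(row)
    return encoded

# Hypothetical per-residue values, for illustration only.
rows = weighted_window([1.0, 2.0, 3.0], half_width=1)
```

Passing a different `weight` function (for example a Gaussian of the offset) changes how quickly the influence of distant residues falls off without touching the rest of the encoding.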


Journal of Theoretical Biology | 2015

PGlcS: Prediction of protein O-GlcNAcylation sites with multiple features and analysis

Xiaowei Zhao; Qiao Ning; Haiting Chai; Meiyue Ai; Zhiqiang Ma

As a widespread type of protein post-translational modification, O-GlcNAcylation plays crucial regulatory roles in almost all cellular processes and is related to some diseases. To deeply understand O-GlcNAcylation mechanisms, identification of substrates and specific O-GlcNAcylated sites is crucial. Experimental identification is expensive and time-consuming, so computational prediction of O-GlcNAcylated sites has considerable value. In this work, we developed a novel predictor of O-GlcNAcylated sites called PGlcS (Prediction of O-GlcNAcylated Sites) by using k-means clustering to obtain informative and reliable negative samples, and a support vector machine classifier combined with a two-step feature selection. The performance of PGlcS was evaluated using an independent testing dataset, resulting in a sensitivity of 64.62%, a specificity of 68.4%, an accuracy of 68.37%, and a Matthews correlation coefficient of 0.0697, which demonstrated PGlcS was very promising for predicting O-GlcNAcylated sites. The datasets and source code were available in Supplementary information.


Molecules | 2017

EPuL: An Enhanced Positive-Unlabeled Learning Algorithm for the Prediction of Pupylation Sites

Xuanguo Nan; Lingling Bao; Xiaosa Zhao; Xiaowei Zhao; Arun Kumar Sangaiah; Gai-Ge Wang; Zhiqiang Ma

Protein pupylation is a type of post-translational modification, which plays a crucial role in the cellular function of bacterial organisms. To gain better insight into the mechanisms underlying pupylation, an initial but important step is to identify pupylation sites. To date, several computational methods have been established for the prediction of pupylation sites; these usually design the negative samples artificially from the verified pupylation proteins to train the classifiers. However, if this process is not done properly, it can affect the performance of the final predictor dramatically. In this work, unlike previous computational methods, we applied an enhanced positive-unlabeled learning algorithm (EPuL) to the pupylation site prediction problem, which uses only positive and unlabeled samples. Firstly, we separate the training dataset into the positive dataset and the unlabeled dataset, which contains the remaining non-annotated lysine residues. Then, the EPuL algorithm is utilized to select an initial reliable negative dataset and then iteratively pick out the non-pupylation sites. The performance of the proposed method was measured with an accuracy of 90.24%, an Area Under Curve (AUC) of 0.93 and an MCC of 0.81 by 10-fold cross-validation. A user-friendly web server for predicting pupylation sites was developed and was freely available at http://59.73.198.144:8080/EPuL.
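The first step described in this abstract, separating annotated lysines from the remaining non-annotated ones, can be sketched as a window extraction over the protein sequence. This is an illustrative sketch only: the window size, the `X` padding character, the function name, and the example sequence are all assumptions, not details from the paper.

```python
def lysine_windows(sequence, annotated_positions, half_width=3, pad="X"):
    """Split lysine-centred windows into positive and unlabeled sets.

    annotated_positions: 0-based indices of experimentally verified sites.
    Windows near the termini are padded with `pad` to a fixed length.
    """
    padded = pad * half_width + sequence + pad * half_width
    positive, unlabeled = [], []
    for i, residue in enumerate(sequence):
        if residue != "K":
            continue
        # In `padded`, residue i sits at index i + half_width.
        window = padded[i : i + 2 * half_width + 1]
        (positive if i in annotated_positions else unlabeled).append(window)
    return positive, unlabeled

# Hypothetical sequence and annotation, for illustration only.
pos_windows, unl_windows = lysine_windows("MKAKLK", {1}, half_width=2)
```

Every window has the lysine at its centre, so downstream feature encodings can treat window positions as fixed offsets from the candidate site.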


Molecules | 2017

Glypre: In Silico Prediction of Protein Glycation Sites by Fusing Multiple Features and Support Vector Machine

Xiaowei Zhao; Xiaosa Zhao; Lingling Bao; Yonggang Zhang; Jiangyan Dai; Minghao Yin

Glycation is a non-enzymatic process occurring inside or outside the host body by attaching a sugar molecule to a protein or lipid molecule. It is an important form of post-translational modification (PTM), which impairs the function and changes the characteristics of the proteins. The identification of glycation sites may therefore provide useful guidelines for understanding various biological functions of proteins. In this study, we proposed an accurate prediction tool, named Glypre, for lysine glycation. Firstly, we used multiple informative features to encode the peptides. These features included the position scoring function, secondary structure, AAindex, and the composition of k-spaced amino acid pairs. Secondly, the distribution of distinctive features of the residues surrounding the glycation and non-glycation sites was statistically analysed. Thirdly, based on the distribution of these features, we developed a new predictor by using different optimal window sizes for different properties and a two-step feature selection method, which utilized the maximum relevance minimum redundancy method followed by a greedy feature selection procedure. The performance of Glypre was measured with a sensitivity of 57.47%, a specificity of 90.78%, an accuracy of 79.68%, an area under the receiver-operating characteristic (ROC) curve (AUC) of 0.86, and a Matthews correlation coefficient (MCC) of 0.52 by 10-fold cross-validation. The detailed analysis results showed that our predictor may play a complementary role to other existing methods for identifying protein lysine glycation. The source code and datasets of Glypre are available in the Supplementary File.
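One of the encodings named in this abstract, the composition of k-spaced amino acid pairs, counts how often each ordered residue pair occurs with exactly k residues between them. The following is a minimal sketch of the usual formulation; the function name, the handling of non-standard residues, and the example peptide are assumptions rather than details from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def cksaap(peptide, k):
    """Frequencies of the 400 ordered residue pairs separated by k residues."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = len(peptide) - k - 1  # number of k-spaced pairs in the peptide
    if total <= 0:
        raise ValueError("peptide too short for this k")
    for i in range(total):
        pair = peptide[i] + peptide[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    return [counts[p] / total for p in pairs]

# Hypothetical peptide, for illustration only.
vector = cksaap("AKAKA", k=1)
```

Concatenating the vectors for several values of k (for example k = 0 to 4) yields a fixed-length descriptor regardless of where a pair occurs in the window.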

Collaboration

Top Co-Authors

Zhiqiang Ma, Northeast Normal University
Qiao Ning, Northeast Normal University
Jian Zhang, Northeast Normal University
Lingling Bao, Northeast Normal University
Xiaosa Zhao, Northeast Normal University
Haiting Chai, Northeast Normal University
Pingping Sun, Northeast Normal University
Meiyue Ai, Northeast Normal University
Bo Gao, Northeast Normal University
Gai-Ge Wang, Jiangsu Normal University