Qing-Song Xu | Researchain

Publication

Featured research published by Qing-Song Xu.

Bioinformatics | 2013

propy: a tool to generate various modes of Chou’s PseAAC

Dong-Sheng Cao; Qing-Song Xu; Yi-Zeng Liang

SUMMARY: Sequence-derived structural and physicochemical features have been frequently used for analysing and predicting structural, functional, expression and interaction profiles of proteins and peptides. To facilitate extensive studies of proteins and peptides, we developed a freely available, open-source Python package called protein in python (propy) for calculating the widely used structural and physicochemical features of proteins and peptides from amino acid sequences. It computes five feature groups composed of 13 features, including amino acid composition, dipeptide composition, tripeptide composition, normalized Moreau-Broto autocorrelation, Moran autocorrelation, Geary autocorrelation, sequence-order-coupling number, quasi-sequence-order descriptors, composition, transition and distribution of various structural and physicochemical properties, and two types of pseudo amino acid composition (PseAAC) descriptors. These features can generally be regarded as different modes of Chou's PseAAC. In addition, it can easily compute the above descriptors based on user-defined properties, which are automatically available from the AAindex database. AVAILABILITY: The Python package, propy, is freely available via http://code.google.com/p/protpy/downloads/list, and it runs on Linux and MS-Windows. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
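As an illustration of the two simplest feature groups named in the abstract, the sketch below computes amino acid composition and dipeptide composition as fractions of a sequence. It is not propy's API: the function names are hypothetical, and returning fractions (rather than percentages) is an assumption made for this example.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    counts = Counter(sequence)
    n = len(sequence)
    return {aa: counts[aa] / n for aa in AMINO_ACIDS}


def dipeptide_composition(sequence):
    """Fraction of each of the 400 possible dipeptides among adjacent residue pairs."""
    pairs = Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))
    total = len(sequence) - 1
    return {a + b: pairs[a + b] / total
            for a, b in product(AMINO_ACIDS, repeat=2)}


aac = amino_acid_composition("ACAC")
dpc = dipeptide_composition("ACAC")
print(aac["A"], dpc["AC"], len(dpc))
```

Both functions return fixed-length dictionaries (20 and 400 entries), so sequences of different lengths map to feature vectors of the same size.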

Journal of Computational Chemistry | 2009

A new strategy of outlier detection for QSAR/QSPR.

Dong-Sheng Cao; Yi-Zeng Liang; Qing-Song Xu; Hong-Dong Li; Xian Chen

A crucial step in building a high-performance QSAR/QSPR model is the detection of outliers. Detecting outliers in a multivariate point cloud is not trivial, especially when several outliers coexist. Classical identification methods do not always find them, because they rely on the sample mean and covariance matrix, which are themselves influenced by the outliers. Moreover, existing methods focus on some types of outliers rather than all of them. To avoid these problems and detect all kinds of outliers simultaneously, we provide a new strategy based on Monte‐Carlo cross‐validation, termed the MC method. The MC method inherently provides a feasible way to detect different kinds of outliers by establishing many cross‐predictive models. With the help of the distribution of predictive residuals thus obtained, it appears able to reduce the risk caused by the masking effect. In addition, a new display is proposed, in which the absolute values of the means of the predictive residuals are plotted against the standard deviations of the predictive residuals. The plot divides the data into normal samples, y direction outliers and X direction outliers. Several examples are used to demonstrate the detection ability of the MC method through comparison with different diagnostic methods.
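The idea of collecting predictive residuals over many random splits can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a univariate least-squares line as the model, and the function names, the number of splits and the 70% training fraction are all hypothetical choices.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mccv_residual_stats(xs, ys, n_splits=500, train_fraction=0.7, seed=0):
    """For each sample, return (|mean|, std) of its predictive residuals
    collected over the random splits in which it was left out."""
    rng = random.Random(seed)
    n = len(xs)
    n_train = int(train_fraction * n)
    residuals = [[] for _ in range(n)]
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in test:
            residuals[i].append(ys[i] - (slope * xs[i] + intercept))
    stats = []
    for r in residuals:
        if r:  # sample was left out at least once
            stats.append((abs(statistics.fmean(r)), statistics.pstdev(r)))
        else:
            stats.append((float("nan"), float("nan")))
    return stats


# Toy data: an exact line, except sample 10, whose response is shifted by 15.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
ys[10] += 15.0
stats = mccv_residual_stats(xs, ys)
print(max(range(len(xs)), key=lambda i: stats[i][0]))
```

Plotting the first element of each pair against the second gives the kind of display described in the abstract; in this toy example the shifted sample stands out by its large absolute mean residual.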

Analytica Chimica Acta | 2012

Random frog: an efficient reversible jump Markov Chain Monte Carlo-like approach for variable selection with applications to gene selection and disease classification.

Hong-Dong Li; Qing-Song Xu; Yi-Zeng Liang

The identification of disease-relevant genes represents a challenge in microarray-based disease diagnosis, where the sample size is often limited. Among established methods, reversible jump Markov Chain Monte Carlo (RJMCMC) methods have proven to be quite promising for variable selection. However, the design and application of an RJMCMC algorithm requires, for example, special criteria for prior distributions. Also, simulation from the joint posterior distributions of models is computationally intensive, and may even be mathematically intractable. These disadvantages may limit the applications of RJMCMC algorithms. Therefore, algorithms are needed that possess the advantages of RJMCMC methods while also being efficient and easy to follow for selecting disease-associated genes. Here we report an RJMCMC-like method, called random frog, that possesses the advantages of RJMCMC methods and is much easier to implement. Using the colon and the estrogen gene expression datasets, we show that random frog is effective in identifying discriminating genes. The top two ranked genes are Z50753 and U00968 for colon, and Y10871_at and Z22536_at for estrogen. (The source code, under GNU General Public License Version 2.0, is freely available to non-commercial users at http://code.google.com/p/randomfrog/.)

Bioinformatics | 2013

ChemoPy: freely available python package for computational biology and chemoinformatics

Dong-Sheng Cao; Qing-Song Xu; Qian-Nan Hu; Yi-Zeng Liang

MOTIVATION: Molecular representation for small molecules has been routinely used in QSAR/SAR, virtual screening, database search, ranking, drug ADME/T prediction and other drug discovery processes. To facilitate extensive studies of drug molecules, we developed a freely available, open-source Python package called chemoinformatics in python (ChemoPy) for calculating the commonly used structural and physicochemical features. It computes 16 drug feature groups composed of 19 descriptors that include 1135 descriptor values. In addition, it provides seven types of molecular fingerprint systems for drug molecules, including topological fingerprints, electro-topological state (E-state) fingerprints, MACCS keys, FP4 keys, atom pairs fingerprints, topological torsion fingerprints and Morgan/circular fingerprints. By applying the semi-empirical quantum chemistry program MOPAC, ChemoPy can also conveniently compute a large number of 3D molecular descriptors. AVAILABILITY: The Python package, ChemoPy, is freely available via http://code.google.com/p/pychem/downloads/list, and it runs on Linux and MS-Windows. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Metabolomics | 2010

Recipe for revealing informative metabolites based on model population analysis

Hong-Dong Li; Mao-Mao Zeng; Bin-Bin Tan; Yi-Zeng Liang; Qing-Song Xu; Dong-Sheng Cao

An important application of metabolic profiles is to discover informative metabolites/biomarkers that are predictive of a clinical outcome under investigation. Therefore, there is a need to develop statistically efficient methods for screening such metabolites from the candidates. The most commonly used criterion for assessing variable (metabolite) importance is probably the P value obtained by performing a t test on each metabolite alone, without considering the influence of other variables. In this work, a new strategy, called subwindow permutation analysis (SPA) coupled with partial least squares linear discriminant analysis (PLSLDA), is developed for statistical assessment of variable importance. The main contribution of SPA is that, unlike the t test, it can output a conditional P value by implicitly taking into account the synergetic effect of all the other variables. In this sense, the conditional P value could to some extent help locate a good combination of informative variables. When applied to two metabolic datasets (type 2 diabetes mellitus data and childhood overweight data), it is shown that the performance of both the unsupervised principal component analysis (PCA) and the supervised PLSLDA is greatly improved when using the informative metabolites revealed by SPA. The source code for implementing SPA in both MATLAB and R (R package for both Linux and Windows) is freely available at: http://code.google.com/p/spa2010/downloads/list.

Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy | 2013

An efficient method of wavelength interval selection based on random frog for multivariate spectral calibration.

Yong-Huan Yun; Hong-Dong Li; Leslie R. E. Wood; Wei Fan; Jia-Jun Wang; Dong-Sheng Cao; Qing-Song Xu; Yi-Zeng Liang

Wavelength selection is a critical step for improving prediction performance on spectral data. Considering that vibrational and rotational spectra have continuous spectral bands, we propose a novel method of wavelength interval selection based on random frog, called interval random frog (iRF). To obtain all the possible continuous intervals, the spectra are first divided into intervals by moving a window of fixed width over the whole spectrum. These overlapping intervals are ranked by applying random frog coupled with PLS, and the optimal ones are chosen. The method has been applied to two near-infrared spectral datasets, showing higher efficiency in wavelength interval selection than other methods. The source code of iRF can be freely downloaded for academic research at the website: http://code.google.com/p/multivariate-calibration/downloads/list.
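The interval-generation step described above can be sketched in a few lines. This is only an illustration of a fixed-width moving window, not the iRF code: the function name is hypothetical, and a step of one wavelength between consecutive windows is an assumption.

```python
def moving_window_intervals(n_wavelengths, width):
    """Return (start, stop) index pairs, stop exclusive, for every window of
    `width` consecutive wavelengths, moved one wavelength at a time."""
    if not 1 <= width <= n_wavelengths:
        raise ValueError("width must be between 1 and n_wavelengths")
    return [(start, start + width)
            for start in range(n_wavelengths - width + 1)]


print(moving_window_intervals(5, 3))
```

A spectrum with n wavelengths and a window of width w yields n - w + 1 overlapping candidate intervals, each of which would then be ranked by the selection procedure.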

Analytica Chimica Acta | 2014

A strategy that iteratively retains informative variables for selecting optimal variable subset in multivariate calibration

Yong-Huan Yun; Wei-Ting Wang; Min-Li Tan; Yi-Zeng Liang; Hong-Dong Li; Dong-Sheng Cao; Hongmei Lu; Qing-Song Xu

With the high dimensionality of modern datasets, creating effective methods that can select an optimal variable subset is a great challenge. In this study, a strategy that considers the possible interaction effects among variables through random combinations is proposed, called iteratively retaining informative variables (IRIV). The variables are classified into four categories: strongly informative, weakly informative, uninformative and interfering. On this basis, IRIV retains both the strongly and weakly informative variables in every iterative round until no uninformative or interfering variables remain. Three datasets were employed to investigate the performance of IRIV coupled with partial least squares (PLS). The results show that IRIV is a good alternative variable selection strategy when compared with three outstanding and frequently used variable selection methods: genetic algorithm-PLS, Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS) and competitive adaptive reweighted sampling (CARS). The MATLAB source code of IRIV can be freely downloaded for academic research at the website: http://code.google.com/p/multivariate-calibration/downloads/list.

Bioinformatics | 2015

protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences

Nan Xiao; Dong-Sheng Cao; Min-Feng Zhu; Qing-Song Xu

Amino acid sequence-derived structural and physicochemical descriptors are extensively utilized for the research of structural, functional, expression and interaction profiles of proteins and peptides. We developed protr, a comprehensive R package for generating various numerical representation schemes of proteins and peptides from amino acid sequences. The package calculates eight descriptor groups composed of 22 types of commonly used descriptors that include about 22,700 descriptor values. It allows users to select amino acid properties from the AAindex database, and to use self-defined properties to construct customized descriptors. For proteochemometric modeling, it calculates six types of scales-based descriptors derived by various dimensionality reduction methods. The protr package also integrates similarity score computation derived from protein sequence alignment and Gene Ontology semantic similarity measures within a list of proteins, and calculates profile-based protein features based on the position-specific scoring matrix. We also developed ProtrWeb, a user-friendly web server for calculating the descriptors presented in the protr package. AVAILABILITY AND IMPLEMENTATION: The protr package is freely available from CRAN: http://cran.r-project.org/package=protr. ProtrWeb is freely available at http://protrweb.scbdd.com/.

Analytica Chimica Acta | 2012

Large-scale prediction of drug-target interactions using protein sequences and drug topological structures

Dong-Sheng Cao; Shao Liu; Qing-Song Xu; Hongmei Lu; Jian-Hua Huang; Qian-Nan Hu; Yi-Zeng Liang

The identification of interactions between drugs and target proteins plays a key role in the process of genomic drug discovery. It is both time-consuming and costly to determine drug-target interactions by experiments alone. Therefore, there is an urgent need to develop new in silico prediction approaches capable of identifying these potential drug-target interactions in a timely manner. In this article, we aim at extending current structure-activity relationship (SAR) methodology to fulfill such requirements. In some sense, a drug-target interaction can be regarded as an event or property triggered by many influencing factors from drugs and target proteins. Thus, each interaction pair can be represented theoretically by these factors, which are based on the structural and physicochemical properties of both the drug and the protein. To realize this, drug molecules are encoded with MACCS substructure fingerprints representing the existence of certain functional groups or fragments, and proteins are encoded with some biochemical and physicochemical properties. Four classes of drug-target interaction networks in humans, involving enzymes, ion channels, G-protein-coupled receptors (GPCRs) and nuclear receptors, are independently used for establishing predictive models with support vector machines (SVMs). The SVM models gave prediction accuracies of 90.31%, 88.91%, 84.68% and 83.74% for the four datasets, respectively. In conclusion, the results demonstrate the ability of our proposed method to predict drug-target interactions, and show a general compatibility between the new scheme and current SAR methodology. They open the way to a host of new investigations on the diversity analysis and prediction of drug-target interactions.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2011

Recipe for Uncovering Predictive Genes Using Support Vector Machines Based on Model Population Analysis

Hong-Dong Li; Yi-Zeng Liang; Qing-Song Xu; Dong-Sheng Cao; Bin-Bin Tan; Baichuan Deng; Chen-Chen Lin

Selecting a small number of informative genes for microarray-based tumor classification is central to cancer prediction and treatment. Based on model population analysis, here we present a new approach, called Margin Influence Analysis (MIA), designed to work with support vector machines (SVM) for selecting informative genes. The rationale for performing margin influence analysis lies in the fact that the margin of support vector machines is an important factor underlying the generalization performance of SVM models. Briefly, MIA can reveal genes that have a statistically significant influence on the margin by using the Mann-Whitney U test. The reason for using the Mann-Whitney U test rather than the two-sample t test is that the Mann-Whitney U test is a nonparametric method without any distribution-related assumptions and is also robust. Using two publicly available cancer microarray data sets, it is demonstrated that MIA typically selects a small number of margin-influencing genes and achieves classification accuracy comparable to that reported in the literature. These distinguishing features and the outstanding performance may make MIA a good alternative for gene selection in high-dimensional microarray data. (The source code in MATLAB, under GNU General Public License Version 2.0, is freely available at http://code.google.com/p/mia2009/.)
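To make the rank-based comparison concrete, the sketch below computes the Mann-Whitney U statistic for two samples of margins. It is only an illustration: the function name and the toy margin values are hypothetical, and the framing of the two groups as margins from sub-models that include or exclude a given gene is an assumption, not something stated in the abstract.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic of sample_a relative to sample_b: the number of pairs
    (a, b) with a > b, counting each tie as one half."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical margins of SVM sub-models with and without one gene.
margins_with_gene = [0.9, 1.1, 1.3]
margins_without_gene = [0.4, 0.5, 0.7]
print(mann_whitney_u(margins_with_gene, margins_without_gene))
```

Because U depends only on the ordering of the values, it makes no assumption about the distribution of the margins, which is the property the abstract cites as the reason for preferring it over the two-sample t test.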