Publication


Featured research published by Xiaolong Wang.


Nucleic Acids Research | 2015

Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences

Bin Liu; Fule Liu; Xiaolong Wang; Junjie Chen; Longyun Fang; Kuo-Chen Chou

With the avalanche of biological sequences generated in the post-genomic age, one of the most challenging problems in computational biology is how to formulate the sequence of a biological sample (such as DNA, RNA or protein) as a discrete model or vector that effectively reflects its sequence pattern information or captures its key features. Although several web servers and stand-alone tools have been developed to address this problem, each of them can handle only one type of sample. Furthermore, the number of their built-in properties is limited, so it is often difficult for users to formulate biological sequences according to their desired features or properties. In this article, we propose a much more flexible web server with a much larger number of built-in properties, called Pse-in-One (http://bioinformatics.hitsz.edu.cn/Pse-in-One/), which can, through its 28 different modes, generate nearly all the possible feature vectors for DNA, RNA and protein sequences. In particular, it can also generate feature vectors from properties defined by the users themselves. These feature vectors can be easily combined with machine-learning algorithms to develop computational predictors and analysis methods for various tasks in bioinformatics and systems biology. It is anticipated that the Pse-in-One web server will become a very useful tool in computational proteomics and genomics, as well as in biological sequence analysis. Moreover, to maximize users' convenience, its stand-alone version can also be downloaded from http://bioinformatics.hitsz.edu.cn/Pse-in-One/download/ and run directly on Windows, Linux, Unix and Mac OS.


Bioinformatics | 2014

Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection

Bin Liu; Deyuan Zhang; Ruifeng Xu; Jinghao Xu; Xiaolong Wang; Qingcai Chen; Qiwen Dong; Kuo-Chen Chou

Motivation: Owing to its importance in both basic research (such as molecular evolution and protein attribute prediction) and practical application (such as the timely modeling of the 3D structures of proteins targeted for drug development), protein remote homology detection has attracted a great deal of interest. It is intriguing to note that the profile-based approach is promising and holds high potential in this regard. To further improve protein remote homology detection, a key step is to find an optimal means of extracting the evolutionary information into the profiles. Results: Here, we propose a novel approach, the so-called profile-based protein representation, to extract the evolutionary information via the frequency profiles. The latter can be calculated from the multiple sequence alignments generated by PSI-BLAST. Three top-performing sequence-based kernels (SVM-Ngram, SVM-pairwise and SVM-LA) were combined with the profile-based protein representation. Various tests were conducted on a SCOP benchmark dataset that contains 54 families and 23 superfamilies. The results showed that the new approach is promising and clearly improves the performance of the three kernels. Furthermore, our approach can also provide useful insights for studying the features of proteins in various families. It has not escaped our notice that the current approach can be easily combined with the existing sequence-based methods so as to improve their performance as well. Availability and implementation: For users' convenience, the source code for generating the profile-based proteins and for the multiple kernel learning is provided at http://bioinformatics.hitsz.edu.cn/main/~binliu/remote/ Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


PLOS ONE | 2014

iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition.

Bin Liu; Jinghao Xu; Xun Lan; Ruifeng Xu; Jiyun Zhou; Xiaolong Wang; Kuo-Chen Chou

Playing crucial roles in various cellular processes, such as recognition of specific nucleotide sequences, regulation of transcription, and regulation of gene expression, DNA-binding proteins are essential ingredients for both eukaryotic and prokaryotic proteomes. With the avalanche of protein sequences generated in the postgenomic age, it is a critical challenge to develop automated methods for accurately and rapidly identifying DNA-binding proteins based on their sequence information alone. Here, a novel predictor, called “iDNA-Prot|dis”, was established by incorporating the amino acid distance-pair coupling information and the amino acid reduced alphabet profile into the general pseudo amino acid composition (PseAAC) vector. The former can capture the characteristics of DNA-binding proteins so as to enhance the prediction quality, while the latter can reduce the dimension of the PseAAC vector so as to speed up the prediction process. Rigorous jackknife and independent dataset tests showed that the new predictor outperformed the existing predictors for the same purpose. As a user-friendly web server, iDNA-Prot|dis is accessible to the public at http://bioinformatics.hitsz.edu.cn/iDNA-Prot_dis/. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step protocol guide is provided on how to use the web server to get their desired results without the need to follow the complicated mathematical equations, which are presented in this paper only for the integrity of the development process. It is anticipated that the iDNA-Prot|dis predictor may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins, or at the very least, play a complementary role to the existing predictors in this regard.


Bioinformatics | 2015

repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects

Bin Liu; Fule Liu; Longyun Fang; Xiaolong Wang; Kuo-Chen Chou

In order to develop powerful computational predictors for identifying the biological features or attributes of DNAs, one of the most challenging problems is to find a suitable approach to effectively represent the DNA sequences. To facilitate the studies of DNAs and nucleotides, we developed a Python package called representations of DNAs (repDNA) for generating the widely used features reflecting the physicochemical properties and sequence-order effects of DNAs and nucleotides. There are three feature groups composed of 15 features. The first group calculates three nucleic acid composition features describing the local sequence information by means of kmers; the second group calculates six autocorrelation features describing the level of correlation between two oligonucleotides along a DNA sequence in terms of their specific physicochemical properties; the third group calculates six pseudo nucleotide composition features, which can be used to represent a DNA sequence with a discrete model or vector yet still keep considerable sequence-order information via the physicochemical properties of its constituent oligonucleotides. In addition, these features can be easily calculated from both the built-in and user-defined properties using repDNA. Availability and implementation: The repDNA Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repDNA/. Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
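The kmer-based composition features mentioned in this abstract can be illustrated with a minimal sketch. It does not use the repDNA package or its API; the function name `kmer_composition` and its parameters are hypothetical, chosen only for illustration. It counts overlapping k-mers in a DNA sequence and normalizes the counts into a fixed-length vector.

```python
from itertools import product


def kmer_composition(seq, k=2, alphabet="ACGT"):
    """Return normalized k-mer frequencies, ordered lexicographically by k-mer.

    Hypothetical helper for illustration only; not part of repDNA.
    """
    seq = seq.upper()
    # Enumerate all possible k-mers so the vector length is always len(alphabet) ** k.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


vector = kmer_composition("ACGTAC", k=2)
print(len(vector))  # one entry per possible dinucleotide
```

A vector of this kind has the same length for every input sequence, which is what allows it to be passed to a standard machine-learning method.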


PLOS ONE | 2015

Identification of Real MicroRNA Precursors with a Pseudo Structure Status Composition Approach

Bin Liu; Longyun Fang; Fule Liu; Xiaolong Wang; Junjie Chen; Kuo-Chen Chou

Containing about 22 nucleotides, a microRNA (abbreviated miRNA) is a small non-coding RNA molecule that functions in transcriptional and post-transcriptional regulation of gene expression. The human genome may encode over 1000 miRNAs. Albeit poorly characterized, miRNAs are widely deemed important regulators of biological processes. Aberrant expression of miRNAs has been observed in many cancers and other disease states, indicating that they are deeply implicated in these diseases, particularly in carcinogenesis. Therefore, it is important for both basic research and miRNA-based therapy to discriminate the real pre-miRNAs from the false ones (such as hairpin sequences with similar stem-loops). Particularly, with the avalanche of RNA sequences generated in the postgenomic age, it is highly desirable to develop computational sequence-based methods in this regard. Here two new predictors, called “iMcRNA-PseSSC” and “iMcRNA-ExPseSSC”, were proposed for identifying human pre-microRNAs by incorporating the global or long-range structure-order information in a way quite similar to the pseudo amino acid composition approach. Rigorous cross-validations on a much larger and more stringent newly constructed benchmark dataset showed that the two new predictors (accessible at http://bioinformatics.hitsz.edu.cn/iMcRNA/) outperformed or were highly comparable with the best existing predictors in this area.


Journal of Theoretical Biology | 2015

Identification of microRNA precursor with the degenerate K-tuple or Kmer strategy.

Bin Liu; Longyun Fang; Shanyi Wang; Xiaolong Wang; Hongtao Li; Kuo-Chen Chou

The microRNA (miRNA), a small non-coding RNA molecule, plays an important role in transcriptional and post-transcriptional regulation of gene expression. Its abnormal expression, however, has been observed in many cancers and other disease states, implying that miRNA molecules are also deeply involved in these diseases, particularly in carcinogenesis. Therefore, it is important for both basic research and miRNA-based therapy to discriminate the real pre-miRNAs from the false ones (such as hairpin sequences with similar stem-loops). Most existing methods in this regard are based on a strategy in which RNA samples are formulated as a vector formed by their Kmer components. But the length of the Kmers must be very short; otherwise, the vector's dimension would be extremely large, leading to the high-dimension disaster or overfitting problem. Inspired by the concept of degenerate energy levels in quantum mechanics, we introduced the degenerate Kmer (deKmer) to represent RNA samples. By doing so, we can not only accommodate long-range coupling effects but also avoid the high-dimension problem. Rigorous jackknife tests and cross-species experiments indicated that our approach is very promising. It has not escaped our notice that the deKmer approach can also be applied to many other areas of computational biology. A user-friendly web server for the new predictor has been established at http://bioinformatics.hitsz.edu.cn/miRNA-deKmer/, by which users can easily get their desired results.
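The dimension problem described in this abstract follows from simple arithmetic: over a four-letter nucleotide alphabet, a plain Kmer vector has 4^k components. The sketch below only illustrates that growth; it is not an implementation of the deKmer method, whose details are not given here, and the function name is a hypothetical one.

```python
def kmer_dimension(k, alphabet_size=4):
    """Number of distinct k-mers, i.e. the length of a plain Kmer feature vector."""
    return alphabet_size ** k


# Print how quickly the feature dimension grows with k.
for k in range(1, 7):
    print(k, kmer_dimension(k))
```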


Journal of Biomolecular Structure & Dynamics | 2016

iMiRNA-PseDPC: microRNA precursor identification with a pseudo distance-pair composition approach

Bin Liu; Longyun Fang; Fule Liu; Xiaolong Wang; Kuo-Chen Chou

A microRNA (miRNA) is a small non-coding RNA molecule that functions in transcriptional and post-transcriptional regulation of gene expression. The human genome may encode over 1000 miRNAs. Albeit poorly characterized, miRNAs are widely deemed important regulators of biological processes. Aberrant expression of miRNAs has been observed in many cancers and other disease states, indicating that they are deeply implicated in these diseases, particularly in carcinogenesis. Therefore, it is important for both basic research and miRNA-based therapy to discriminate the real pre-miRNAs from the false ones (such as hairpin sequences with similar stem-loops). Particularly, with the avalanche of RNA sequences generated in the post-genomic age, it is highly desirable to develop computational sequence-based methods for effectively identifying human pre-miRNAs. Here, we propose a predictor called “iMiRNA-PseDPC”, in which the RNA sequences are formulated by a novel feature vector called “pseudo distance-pair composition” (PseDPC) with 10 types of structure statuses. Rigorous cross-validations on a much larger and more stringent newly constructed benchmark dataset showed that our approach remarkably outperformed the existing ones in either prediction accuracy or efficiency, indicating that the new predictor is quite promising or at least may become a complementary tool to the existing predictors in this area. For the convenience of most experimental scientists, a user-friendly web server for the new predictor has been established at http://bioinformatics.hitsz.edu.cn/iMiRNA-PseDPC/, by which users can easily get their desired results without the need to go through the mathematical details. It is anticipated that the new predictor may become a useful high-throughput tool for genome analysis, particularly in dealing with large-scale data.


Molecular Genetics and Genomics | 2016

repRNA: a web server for generating various feature vectors of RNA sequences

Bin Liu; Fule Liu; Longyun Fang; Xiaolong Wang; Kuo-Chen Chou

With the rapid growth of RNA sequences generated in the postgenomic age, it is highly desirable to develop a flexible method that can generate various kinds of vectors to represent these sequences by focusing on their different features. This is because nearly all the existing machine-learning methods, such as SVM (support vector machine) and KNN (k-nearest neighbor), can only handle vectors but not sequences. To meet the increasing demands and speed up genome analyses, we have developed a new web server, called “representations of RNA sequences” (repRNA). Compared with the existing methods, repRNA is much more comprehensive, flexible and powerful, as reflected by the following facts: (1) it can generate 11 different modes of feature vectors for users to choose from according to their investigation purposes; (2) it allows users to select the features from 22 built-in physicochemical properties and even from properties defined by the users themselves; (3) the resultant feature vectors and the secondary structures of the corresponding RNA sequences can be visualized. The repRNA web server is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repRNA/.


Molecular Informatics | 2015

PseDNA-Pro: DNA-Binding Protein Identification by Combining Chou's PseAAC and Physicochemical Distance Transformation.

Bin Liu; Jinghao Xu; Shixi Fan; Ruifeng Xu; Jiyun Zhou; Xiaolong Wang

Identification of DNA-binding proteins is an important problem in biomedical research, as DNA-binding proteins are crucial for various cellular processes. Currently, machine learning methods achieve state-of-the-art performance with different features. A key step in improving the performance of these methods is to find a suitable representation of proteins. In this study, we proposed a feature vector composed of three kinds of sequence-based features: overall amino acid composition, pseudo amino acid composition (PseAAC) proposed by Chou, and physicochemical distance transformation. These features not only consider the sequence composition of proteins, but also incorporate the sequence-order information of amino acids in proteins. The feature vectors were fed into a Support Vector Machine (SVM) for DNA-binding protein identification. The proposed method is called PseDNA-Pro. Experiments on stringent benchmark datasets and independent test datasets using the jackknife test showed that PseDNA-Pro can achieve an accuracy higher than 80%, outperforming several state-of-the-art methods, including DNAbinder, DNA-Prot, and iDNA-Prot. These results indicate that combining various features is a suitable approach for DNA-binding protein prediction, and that the sequence-order information among residues in proteins is relevant for discrimination. For practical applications, a web server for PseDNA-Pro was established, which is available from http://bioinformatics.hitsz.edu.cn/PseDNA-Pro/.
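Of the three feature types named in this abstract, the overall amino acid composition is the simplest to illustrate. The following is a minimal sketch, assuming the standard 20-letter amino acid alphabet; the function name is hypothetical and the code is not taken from PseDNA-Pro. It does not cover PseAAC or the physicochemical distance transformation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes


def amino_acid_composition(protein):
    """Return the fraction of each standard amino acid in the sequence (20 values).

    Hypothetical helper for illustration only.
    """
    protein = protein.upper()
    counts = {aa: 0 for aa in AMINO_ACIDS}
    valid = 0
    for residue in protein:
        if residue in counts:  # ignore non-standard symbols such as X
            counts[residue] += 1
            valid += 1
    if valid == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / valid for aa in AMINO_ACIDS]


composition = amino_acid_composition("MKKLAAK")
print(len(composition))
```

Because composition alone discards residue order, the abstract combines it with features that retain sequence-order information.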


Oncotarget | 2017

Pse-Analysis: a Python package for DNA/RNA and protein/peptide sequence analysis based on pseudo components and kernel methods

Bin Liu; Hao Wu; Deyuan Zhang; Xiaolong Wang; Kuo-Chen Chou

To expedite genome/proteome analysis, we have developed a Python package called Pse-Analysis. The package can automatically complete the following five procedures: (1) sample feature extraction, (2) optimal parameter selection, (3) model training, (4) cross validation, and (5) evaluation of prediction quality. All a user needs to do is input a benchmark dataset along with the query biological sequences concerned. Based on the benchmark dataset, Pse-Analysis will automatically construct an ideal predictor and then yield the predicted results for the submitted query samples. All the aforementioned tedious jobs are done automatically by the computer. Moreover, the multiprocessing technique was adopted to increase computational speed about sixfold. The Pse-Analysis Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/Pse-Analysis/, and can be run directly on Windows, Linux, and Unix.
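The cross-validation step (4) can be sketched in its strictest form, the jackknife (leave-one-out) test that several of the abstracts on this page mention. The example below is an illustration under stated assumptions: it is not the Pse-Analysis API, and a 1-nearest-neighbour classifier stands in for the kernel methods the package uses. All function names and the toy data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(train_vectors, train_labels, query):
    """Return the label of the training vector closest to the query."""
    best = min(range(len(train_vectors)),
               key=lambda i: squared_distance(train_vectors[i], query))
    return train_labels[best]


def jackknife_accuracy(vectors, labels):
    """Leave each sample out once, predict it from the rest, and report accuracy."""
    correct = 0
    for i in range(len(vectors)):
        # Train on every sample except the i-th one.
        train_vectors = vectors[:i] + vectors[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if predict_1nn(train_vectors, train_labels, vectors[i]) == labels[i]:
            correct += 1
    return correct / len(vectors)


# Toy feature vectors (made up for this sketch) with two well-separated classes.
toy_vectors = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
toy_labels = [0, 0, 1, 1]
print(jackknife_accuracy(toy_vectors, toy_labels))
```

Each sample is tested exactly once by a model that never saw it, so the result does not depend on a random split of the benchmark dataset.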

Collaboration


Top Co-Authors

Qingcai Chen
Harbin Institute of Technology Shenzhen Graduate School

Bin Liu
Harbin Institute of Technology Shenzhen Graduate School

Buzhou Tang
Harbin Institute of Technology Shenzhen Graduate School

Xuan Wang
Harbin Institute of Technology Shenzhen Graduate School

Yaoyun Zhang
Harbin Institute of Technology Shenzhen Graduate School

Yang Xiang
Harbin Institute of Technology Shenzhen Graduate School

Junjie Chen
Harbin Institute of Technology Shenzhen Graduate School

Kuo-Chen Chou
University of Electronic Science and Technology of China

Chengjie Sun
Harbin Institute of Technology

Fule Liu
Harbin Institute of Technology Shenzhen Graduate School