Publication


Featured research published by Liangjiang Wang.


Nucleic Acids Research | 2006

BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences

Liangjiang Wang; Susan J. Brown

BindN takes an amino acid sequence as input and predicts potential DNA or RNA-binding residues with support vector machines (SVMs). Protein datasets with known DNA or RNA-binding residues were selected from the Protein Data Bank (PDB), and SVM models were constructed using data instances encoded with three sequence features, including the side chain pKa value, hydrophobicity index and molecular mass of an amino acid. The results suggest that DNA-binding residues can be predicted at 69.40% sensitivity and 70.47% specificity, while prediction of RNA-binding residues achieves 66.28% sensitivity and 69.84% specificity. When compared with previous studies, the SVM models appear to be more accurate and more efficient for online predictions. BindN provides a useful tool for understanding the function of DNA and RNA-binding proteins based on primary sequence data.
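
The sensitivity and specificity figures quoted above are per-residue rates. As a minimal sketch of how such rates are computed from binary labels (1 = binding residue, 0 = non-binding), assuming a hypothetical helper name and invented toy labels that are not data from the paper:

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for binary labels.

    Label 1 marks a binding residue and 0 a non-binding residue.
    This is a generic illustration, not the BindN evaluation code.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    sensitivity = tp / (tp + fn)  # fraction of binding residues recovered
    specificity = tn / (tn + fp)  # fraction of non-binding residues rejected
    return sensitivity, specificity


# Toy labels invented for illustration only.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(truth, pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Reporting both rates matters here because binding residues are typically a minority class, so overall accuracy alone can look high for a classifier that rarely predicts binding.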


BMC Systems Biology | 2010

BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features

Liangjiang Wang; Caiyan Huang; Mary Qu Yang; Jack Y. Yang

Background: Understanding how biomolecules interact is a major task of systems biology. To model protein-nucleic acid interactions, it is important to identify the DNA or RNA-binding residues in proteins. Protein sequence features, including the biochemical property of amino acids and evolutionary information in terms of position-specific scoring matrix (PSSM), have been used for DNA or RNA-binding site prediction. However, PSSM is rather designed for PSI-BLAST searches, and it may not contain all the evolutionary information for modelling DNA or RNA-binding sites in protein sequences. Results: In the present study, several new descriptors of evolutionary information have been developed and evaluated for sequence-based prediction of DNA and RNA-binding residues using support vector machines (SVMs). The new descriptors were shown to improve classifier performance. Interestingly, the best classifiers were obtained by combining the new descriptors and PSSM, suggesting that they captured different aspects of evolutionary information for DNA and RNA-binding site prediction. The SVM classifiers achieved 77.3% sensitivity and 79.3% specificity for prediction of DNA-binding residues, and 71.6% sensitivity and 78.7% specificity for RNA-binding site prediction. Conclusions: Predictions at this level of accuracy may provide useful information for modelling protein-nucleic acid interactions in systems biology studies. We have thus developed a web-based tool called BindN+ (http://bioinfo.ggc.org/bindn+/) to make the SVM classifiers accessible to the research community.


BMC Genomics | 2009

Prediction of DNA-binding residues from protein sequence information using random forests

Liangjiang Wang; Mary Qu Yang; Jack Y. Yang

Background: Protein-DNA interactions are involved in many biological processes essential for cellular function. To understand the molecular mechanism of protein-DNA recognition, it is necessary to identify the DNA-binding residues in DNA-binding proteins. However, structural data are available for only a few hundred protein-DNA complexes. With the rapid accumulation of sequence data, it becomes an important but challenging task to accurately predict DNA-binding residues directly from amino acid sequence data. Results: A new machine learning approach has been developed in this study for predicting DNA-binding residues from amino acid sequence data. The approach used both the labelled data instances collected from the available structures of protein-DNA complexes and the abundant unlabeled data found in protein sequence databases. The evolutionary information contained in the unlabeled sequence data was represented as position-specific scoring matrices (PSSMs) and several new descriptors. The sequence-derived features were then used to train random forests (RFs), which could handle a large number of input variables and avoid model overfitting. The use of evolutionary information was found to significantly improve classifier performance. The RF classifier was further evaluated using a separate test dataset, and the predicted DNA-binding residues were examined in the context of three-dimensional structures. Conclusion: The results suggest that the RF-based approach gives rise to more accurate prediction of DNA-binding residues than previous studies. A new web server called BindN-RF (http://bioinfo.ggc.org/bindn-rf/) has thus been developed to make the RF classifier accessible to the biological research community.


BMC Genomics | 2010

Sequence feature-based prediction of protein stability changes upon amino acid substitutions

Shaolei Teng; Anand K. Srivastava; Liangjiang Wang

Background: Protein destabilization is a common mechanism by which amino acid substitutions cause human diseases. Although several machine learning methods have been reported for predicting protein stability changes upon amino acid substitutions, the previous studies did not utilize relevant sequence features representing biological knowledge for classifier construction. Results: In this study, a new machine learning method has been developed for sequence feature-based prediction of protein stability changes upon amino acid substitutions. Support vector machines were trained with data from experimental studies on the free energy change of protein stability upon mutations. To construct accurate classifiers, twenty sequence features were examined for input vector encoding. It was shown that classifier performance varied significantly by using different sequence features. The most accurate classifier in this study was constructed using a combination of six sequence features. This classifier achieved an overall accuracy of 84.59% with 70.29% sensitivity and 90.98% specificity. Conclusions: Relevant sequence features can be used to accurately predict protein stability changes upon amino acid substitutions. Predictive results at this level of accuracy may provide useful information to distinguish between deleterious and tolerant alterations in disease candidate genes. To make the classifier accessible to the genetics research community, we have developed a new web server, called MuStab (http://bioinfo.ggc.org/mustab/).
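
Overall accuracy is a weighted average of sensitivity and specificity, with the weight set by the fraction of positive instances. The sketch below inverts that identity to show the class balance implied by the three figures quoted above. It assumes all three were measured on the same dataset; the function name is hypothetical and the result is an arithmetic consequence of the reported numbers, not a value stated in the abstract.

```python
def implied_positive_fraction(accuracy, sensitivity, specificity):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p.

    p is the fraction of instances belonging to the positive class.
    """
    if sensitivity == specificity:
        raise ValueError("p is undetermined when sensitivity equals specificity")
    return (specificity - accuracy) / (specificity - sensitivity)


# Figures quoted in the abstract, expressed as fractions.
p = implied_positive_fraction(accuracy=0.8459, sensitivity=0.7029, specificity=0.9098)
print(f"implied positive-class fraction: {p:.3f}")
```

Under that assumption, roughly three in ten instances would belong to the positive class, which is consistent with accuracy sitting closer to specificity than to sensitivity.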


Human Mutation | 2010

Computational analysis of missense mutations causing Snyder-Robinson Syndrome

Zhe Zhang; Shaolei Teng; Liangjiang Wang; Emil Alexov

The Snyder‐Robinson syndrome is caused by missense mutations in the spermine synthase gene that encodes a protein (SMS) of 529 amino acids. Here we investigate, in silico, the molecular effect of three missense mutations, c.267G>A (p.G56S), c.496T>G (p.V132G), and c.550T>C (p.I150T) in SMS that were clinically identified to cause the disease. Single‐point energy calculations, molecular dynamics simulations, and pKa calculations revealed the effects of these mutations on SMS's stability, flexibility, and interactions. It was predicted that the catalytic residue, Asp276, should be protonated prior to binding the substrates. The pKa calculations indicated the p.I150T mutation causes pKa changes with respect to the wild‐type SMS, which involve titratable residues interacting with the S‐methyl‐5′‐thioadenosine (MTA) substrate. The p.I150T missense mutation was also found to decrease the stability of the C‐terminal domain and to induce structural changes in the vicinity of the MTA binding site. The other two missense mutations, p.G56S and p.V132G, are away from the active site and do not perturb its wild‐type properties, but affect the stability of both the monomers and the dimer. Specifically, the p.G56S mutation is predicted to greatly reduce the affinity of monomers to form a dimer, and therefore should have a dramatic effect on SMS function because dimerization is essential for SMS activity. Hum Mutat 31:1043–1049, 2010.


Nucleic Acids Research | 2007

BeetleBase: the model organism database for Tribolium castaneum

Liangjiang Wang; Suzhi Wang; Yonghua Li; Martin S. R. Paradesi; Susan J. Brown

BeetleBase (http://www.bioinformatics.ksu.edu/BeetleBase/) is an integrated resource for the Tribolium research community. The red flour beetle (Tribolium castaneum) is an important model organism for genetics, developmental biology, toxicology and comparative genomics, the genome of which has recently been sequenced. BeetleBase is constructed to integrate the genomic sequence data with information about genes, mutants, genetic markers, expressed sequence tags and publications. BeetleBase uses the Chado data model and software components developed by the Generic Model Organism Database (GMOD) project. This strategy not only reduces the time required to develop the database query tools but also makes the data structure of BeetleBase compatible with that of other model organism databases. BeetleBase will be useful to the Tribolium research community for genome annotation as well as comparative genomics.


Plant Molecular Biology | 2006

Comparative analysis of expressed sequences reveals a conserved pattern of optimal codon usage in plants

Liangjiang Wang; Marilyn J. Roossinck

Codon usage bias is a ubiquitous phenomenon, which may be caused by mutational bias, selection, or both. The patterns of codon usage in plants are not well understood. Datasets of expressed sequence tags (ESTs) available for many plant species provide the resources for large-scale comparative analysis of codon usage patterns. We developed a computational approach to translate EST or assembled contig sequences, and then used the coding information for comparative analysis of codon usage in 12 plant species, including 6 eudicots, 5 monocots and the green alga Chlamydomonas reinhardtii. While codon nucleotide composition is highly conserved within eudicots or monocots, there is a significant difference between these two major taxonomic groups of higher plants. The third nucleotide position of codons is AU-rich in the eudicot genomes (35–42% of G+C content), but GC-rich in the monocot genomes (59–61% of G+C content). To identify optimal codons in these species, we used EST counts to estimate gene transcript levels. It was demonstrated that codon usage bias is correlated positively with gene transcript levels. Interestingly, the use of optimal codons appears to be well conserved between eudicots and monocots, and to a lesser degree between the higher plants and C. reinhardtii. Most of the optimal codons end with a C or G base, regardless of the different nucleotide composition in these genomes. The results suggest that plant codon usage is affected by translational selection, and the selective pressure appears to be conserved in the plant kingdom.
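
The G+C content at the third codon position (often called GC3) is the statistic behind the eudicot and monocot comparison above. The following is a minimal sketch of that calculation for a single in-frame coding sequence; the function name and the example sequence are invented for illustration and are not taken from the study's pipeline.

```python
def gc3_content(cds):
    """Return the fraction of codons whose third base is G or C.

    The sequence is assumed to start in frame; RNA input (U) is accepted.
    Trailing bases that do not fill a complete codon are ignored.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    third_bases = [seq[i + 2] for i in range(0, usable, 3)]
    if not third_bases:
        raise ValueError("sequence must contain at least one complete codon")
    gc_count = sum(1 for base in third_bases if base in "GC")
    return gc_count / len(third_bases)


# Invented six-codon example: ATG GCC AAG CTG TTT GAA
example = "ATGGCCAAGCTGTTTGAA"
gc3 = gc3_content(example)
print(f"GC3 = {gc3:.3f}")
```

In the example, four of the six codons end in G or C, so the function returns about 0.667.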


Insect Biochemistry and Molecular Biology | 2008

Analysis of transcriptome data in the red flour beetle, Tribolium castaneum

Yoonseong Park; Jamie Aikins; Liangjiang Wang; Richard W. Beeman; Brenda Oppert; Jeffrey C. Lord; Susan J. Brown; Marcé D. Lorenzen; Stephen Richards; George M. Weinstock; Richard A. Gibbs

The whole genome sequence of Tribolium castaneum, a worldwide coleopteran pest of stored products, has recently been determined. In order to facilitate accurate annotation and detailed functional analysis of this genome, we have compiled and analyzed all available expressed sequence tag (EST) data. The raw data consist of 61,228 ESTs, including 10,704 obtained from NCBI and an additional 50,524 derived from 32,544 clones generated in our laboratories. These sequences were amassed from cDNA libraries representing six different tissues or stages, namely: whole embryos, whole larvae, larval hindguts and Malpighian tubules, larval fat bodies and carcasses, adult ovaries, and adult heads. Assembly of the 61,228 sequences collapsed into 12,269 clusters (groups of overlapping ESTs representing single genes), of which 10,134 mapped onto 6,463 (39%) of the 16,422 GLEAN gene models (i.e. official Tribolium gene list). Approximately 1,600 clusters (13% of the total) lack corresponding GLEAN models, despite high matches to the genome, suggesting that a considerable number of transcribed sequences were missed by the gene prediction programs or were removed by GLEAN. We conservatively estimate that the current EST set represents more than 7,500 transcription units.


Amino Acids | 2012

Predicting protein sumoylation sites from sequence features

Shaolei Teng; Hong Luo; Liangjiang Wang

Protein sumoylation is a post-translational modification that plays an important role in a wide range of cellular processes. Small ubiquitin-related modifier (SUMO) can be covalently and reversibly conjugated to the sumoylation sites of target proteins, many of which are implicated in various human genetic disorders. The accurate prediction of protein sumoylation sites may help biomedical researchers to design their experiments and understand the molecular mechanism of protein sumoylation. In this study, a new machine learning approach has been developed for predicting sumoylation sites from protein sequence information. Random forests (RFs) and support vector machines (SVMs) were trained with the data collected from the literature. Domain-specific knowledge in terms of relevant biological features was used for input vector encoding. It was shown that RF classifier performance was affected by the sequence context of sumoylation sites, and 20 residues with the core motif ΨKXE in the middle appeared to provide enough context information for sumoylation site prediction. The RF classifiers were also found to outperform SVM models for predicting protein sumoylation sites from sequence features. The results suggest that the machine learning approach gives rise to more accurate prediction of protein sumoylation sites than the other existing methods. The accurate classifiers have been used to develop a new web server, called seeSUMO (http://bioinfo.ggc.org/seesumo/), for sequence-based prediction of protein sumoylation sites.
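
The abstract describes a 20-residue context with the core motif ΨKXE in the middle. One way to extract such candidate windows is sketched below. This is an illustration under stated assumptions, not the seeSUMO implementation: the residue set used for Ψ, the layout of 8 flanking residues on each side of the 4-residue motif, and the function name are all assumptions.

```python
# Assumption: residues treated as the hydrophobic position "Ψ" of the motif.
HYDROPHOBIC = set("AILMFVP")


def candidate_windows(sequence, flank=8):
    """Find ΨKXE motifs and return (lysine_index, window) pairs.

    lysine_index is the 0-based position of the candidate lysine.
    Each window holds `flank` residues on both sides of the 4-residue
    motif, giving 20 residues with the default flank of 8.
    Motifs too close to either end of the sequence are skipped.
    """
    seq = sequence.upper()
    windows = []
    for i in range(len(seq) - 3):
        psi, lys, _any_residue, glu = seq[i:i + 4]
        if psi in HYDROPHOBIC and lys == "K" and glu == "E":
            start = i - flank
            end = i + 4 + flank
            if start >= 0 and end <= len(seq):
                windows.append((i + 1, seq[start:end]))
    return windows


# Invented toy sequence: the motif LKSE surrounded by glycines.
toy = "G" * 8 + "LKSE" + "G" * 8
for lysine_index, window in candidate_windows(toy):
    print(lysine_index, window, len(window))
```

Each window could then be encoded as a feature vector for a classifier; how motifs near the sequence ends are handled (skipped here) is a design choice that a real pipeline might resolve with padding instead.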


Lipids | 2011

LipidomeDB Data Calculation Environment: Online Processing of Direct-Infusion Mass Spectral Data for Lipid Profiles

Zhenguo Zhou; Shantan R. Marepally; Daya Sagar Nune; Prashanth Pallakollu; Gail Ragan; Mary R. Roth; Liangjiang Wang; Gerald H. Lushington; Mahesh Visvanathan; Ruth Welti

LipidomeDB Data Calculation Environment (DCE) is a web application to quantify complex lipids by processing data acquired after direct infusion of a lipid-containing biological extract, to which a cocktail of internal standards has been added, into an electrospray source of a triple quadrupole mass spectrometer. LipidomeDB DCE is located on the public Internet at http://lipidome.bcf.ku.edu:9000/Lipidomics. LipidomeDB DCE supports targeted analyses; analyte information can be entered, or pre-formulated lists of typical plant or animal polar lipid analytes can be selected. LipidomeDB DCE performs isotopic deconvolution and quantification in comparison to internal standard spectral peaks. Multiple precursor or neutral loss spectra from up to 35 samples may be processed simultaneously with data input as Excel files and output as tables viewable on the web and exportable in Excel. The pre-formulated compound lists and web access, used with direct-infusion mass spectrometry, provide a simple approach to lipidomic analysis, particularly for new users.

Collaboration


Dive into Liangjiang Wang's collaboration.

Top Co-Authors

Jack Y. Yang

University of Texas at San Antonio

Wen-Qiao Tang

Shanghai Ocean University

Guoli Zhu

Shanghai Ocean University
