Yingfeng Wang | Researchain

Publication

Featured researches published by Yingfeng Wang.

Analytical Chemistry | 2014

MIDAS: A Database-Searching Algorithm for Metabolite Identification in Metabolomics

Yingfeng Wang; Guruprasad Kora; Benjamin P. Bowen; Chongle Pan

A database searching approach can be used for metabolite identification in metabolomics by matching measured tandem mass spectra (MS/MS) against the predicted fragments of metabolites in a database. Here, we present the open-source MIDAS algorithm (Metabolite Identification via Database Searching). To evaluate a metabolite-spectrum match (MSM), MIDAS first enumerates possible fragments from a metabolite by systematic bond dissociation, then calculates the plausibility of the fragments based on their fragmentation pathways, and finally scores the MSM to assess how well the experimental MS/MS spectrum from collision-induced dissociation (CID) is explained by the metabolite's predicted CID MS/MS spectrum. MIDAS was designed to search high-resolution tandem mass spectra acquired on time-of-flight or Orbitrap mass spectrometers against a metabolite database in an automated and high-throughput manner. The accuracy of metabolite identification by MIDAS was benchmarked using four sets of standard tandem mass spectra from MassBank. On average, for 77% of original spectra and 84% of composite spectra, MIDAS correctly ranked the true compounds as the first MSMs out of all MetaCyc metabolites as decoys. MIDAS correctly identified 46% more original spectra and 59% more composite spectra at the first MSMs than an existing database-searching algorithm, MetFrag. MIDAS was showcased by searching a published real-world measurement of a metabolome from Synechococcus sp. PCC 7002 against the MetaCyc metabolite database. MIDAS identified many metabolites missed in the previous study. MIDAS identifications should be considered only as candidate metabolites, which need to be confirmed using standard compounds. To facilitate manual validation, MIDAS provides annotated spectra for MSMs and labels observed mass spectral peaks with predicted fragments. The database searching and manual validation can be performed online at http://midas.omicsbio.org.
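The idea of checking how well predicted fragments explain a measured spectrum can be illustrated with a minimal sketch. This is not the MIDAS scoring function, which also weighs fragment plausibility; it is a simplified stand-in that reports the fraction of spectrum intensity lying within a mass tolerance of any predicted fragment. The function names and the 10 ppm default tolerance are assumptions made for illustration.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of an observed m/z relative to a theoretical m/z, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def fraction_intensity_explained(peaks, fragment_mzs, tol_ppm=10.0):
    """Fraction of total peak intensity matched by at least one predicted fragment.

    peaks        -- list of (m/z, intensity) tuples from a measured MS/MS spectrum
    fragment_mzs -- list of predicted fragment m/z values (hypothetical input)
    tol_ppm      -- matching tolerance in ppm (assumed default, not from the paper)
    """
    total = sum(intensity for _, intensity in peaks)
    if total == 0:
        return 0.0
    explained = sum(
        intensity
        for mz, intensity in peaks
        if any(abs(ppm_error(mz, frag)) <= tol_ppm for frag in fragment_mzs)
    )
    return explained / total


# Toy spectrum: only the first peak lies within 10 ppm of a predicted fragment.
toy_peaks = [(100.0, 50.0), (200.0, 30.0), (300.0, 20.0)]
toy_fragments = [100.0005, 250.0]
toy_score = fraction_intensity_explained(toy_peaks, toy_fragments)
```

Ranking candidate metabolites by such a value would give a crude ordering of metabolite-spectrum matches; a real scorer would need the plausibility weighting the abstract describes.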

Nature Communications | 2014

Diverse and divergent protein post-translational modifications in two growth stages of a natural microbial community

Zhou Li; Yingfeng Wang; Qiuming Yao; Nicholas B. Justice; Tae-Hyuk Ahn; Dong Xu; Robert L. Hettich; Jillian F. Banfield; Chongle Pan

Detailed characterization of post-translational modifications (PTMs) of proteins in microbial communities remains a significant challenge. Here we directly identify and quantify a broad range of PTMs (hydroxylation, methylation, citrullination, acetylation, phosphorylation, methylthiolation, S-nitrosylation and nitration) in a natural microbial community from an acid mine drainage site. Approximately 29% of the identified proteins of the dominant Leptospirillum group II bacteria are modified, and 43% of modified proteins carry multiple PTM types. Most PTM events, except S-nitrosylations, have low fractional occupancy. Notably, PTM events are detected on Cas proteins involved in antiviral defense, an aspect of Cas biochemistry not considered previously. Further, Cas PTM profiles from Leptospirillum group II differ in early versus mature biofilms. PTM patterns are divergent on orthologues of two closely related, but ecologically differentiated, Leptospirillum group II bacteria. Our results highlight the prevalence and dynamics of PTMs of proteins, with potential significance for ecological adaptation and microbial evolution.

Bioinformatics | 2013

Sipros/ProRata: a versatile informatics system for quantitative community proteomics

Yingfeng Wang; Tae-Hyuk Ahn; Zhou Li; Chongle Pan

Summary: Sipros/ProRata is an open-source software package for end-to-end data analysis in a wide variety of community proteomics measurements. A database-searching program, Sipros 3.0, was developed for accurate general-purpose protein identification and broad-range post-translational modification searches. Hybrid Message Passing Interface/OpenMP parallelism of the new Sipros architecture allowed its computation to be scalable from desktops to supercomputers. The upgraded ProRata 3.0 performs label-free quantification and isobaric chemical labeling quantification in addition to metabolic labeling quantification. Sipros/ProRata is a versatile informatics system that enables identification and quantification of proteins and their variants in many types of community proteomics studies. Availability: Both programs are freely available under the GNU GPL license at Sipros.omicsbio.org and ProRata.omicsbio.org.

Environmental Microbiology | 2014

(15)N- and (2)H proteomic stable isotope probing links nitrogen flow to archaeal heterotrophic activity.

Nicholas B. Justice; Zhou Li; Yingfeng Wang; Susan E. Spaudling; Annika C. Mosier; Robert L. Hettich; Chongle Pan; Jillian F. Banfield

Understanding how individual species contribute to nutrient transformations in a microbial community is critical to prediction of overall ecosystem function. We conducted microcosm experiments in which floating acid mine drainage (AMD) microbial biofilms were submerged - recapitulating the final stage in a natural biofilm life cycle. Biofilms were amended with either (15)NH4(+) or deuterium oxide ((2)H2O) and proteomic stable isotope probing (SIP) was used to track the extent to which different members of the community used these molecules in protein synthesis across anaerobic iron-reducing, aerobic iron-reducing and aerobic iron-oxidizing environments. Sulfobacillus spp. synthesized (15)N-enriched protein almost exclusively under iron-reducing conditions whereas the Leptospirillum spp. synthesized (15)N-enriched protein in all conditions. There were relatively few (15)N-enriched archaeal proteins, and all showed low atom% enrichment, consistent with Archaea synthesizing protein using the predominantly (14)N biomass derived from recycled biomolecules. In parallel experiments using (2)H2O, extensive archaeal protein synthesis was detected in all conditions. In contrast, the bacterial species showed little protein synthesis using (2)H2O. The nearly exclusive ability of Archaea to synthesize proteins using (2)H2O may be due to archaeal heterotrophy, whereby Archaea offset deleterious effects of (2)H by accessing (1)H generated by respiration of organic compounds.

BMC Bioinformatics | 2012

Stable stem enabled Shannon entropies distinguish non-coding RNAs from random backgrounds

Yingfeng Wang; Amir Manzour; Pooya Shareghi; Timothy I. Shaw; Ying-Wai Li; Russell L. Malmberg; Liming Cai

Background: The computational identification of RNAs in genomic sequences requires the identification of signals of RNA sequences. Shannon base pairing entropy is an indicator for RNA secondary structure fold certainty in detection of structural, non-coding RNAs (ncRNAs). Under the Boltzmann ensemble of secondary structures, the probability of a base pair is estimated from its frequency across all the alternative equilibrium structures. However, such an entropy has yet to deliver the desired performance for distinguishing ncRNAs from random sequences. Developing novel methods to improve the entropy measure performance may result in more effective ncRNA gene finding based on structure detection. Results: This paper shows that the measuring performance of base pairing entropy can be significantly improved with a constrained secondary structure ensemble in which only canonical base pairs are assumed to occur in energetically stable stems in a fold. This constraint actually reduces the space of the secondary structure and may lower the probabilities of base pairs unfavorable to the native fold. Indeed, base pairing entropies computed with this constrained model demonstrate substantially narrowed gaps of Z-scores between ncRNAs, as well as drastic increases in the Z-score for all 13 tested ncRNA sets, compared to shuffled sequences. Conclusions: These results suggest the viability of developing effective structure-based ncRNA gene finding methods by investigating secondary structure ensembles of ncRNAs.
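A base pairing entropy of this kind can be computed directly from base pair probabilities. The sketch below assumes one common form, the sum of -p log2 p over all base pairs divided by the sequence length; the exact definition and normalization used in the paper may differ, and the function name and input layout are hypothetical.

```python
import math


def base_pairing_entropy(pair_probs, length):
    """Length-normalized Shannon entropy over base pair probabilities.

    pair_probs -- dict mapping (i, j) position pairs to the probability that
                  positions i and j are paired in the structure ensemble
    length     -- sequence length, used for normalization (an assumption)
    """
    if length <= 0:
        raise ValueError("length must be positive")
    total = 0.0
    for p in pair_probs.values():
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        if p > 0.0:  # the limit of p*log(p) as p -> 0 is 0
            total -= p * math.log2(p)
    return total / length


# A base pair that is certain contributes no entropy; an even split
# between two alternative partners contributes one bit in total.
certain = base_pairing_entropy({(0, 5): 1.0}, 6)
ambiguous = base_pairing_entropy({(0, 5): 0.5, (0, 4): 0.5}, 6)
```

Lower values indicate a more certain fold, which is why the measure is compared between ncRNAs and shuffled sequences.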

Amino Acids | 2008

Discrimination of outer membrane proteins using a K-nearest neighbor method

Changhui Yan; Jing Hu; Yingfeng Wang

Summary. Identification of outer membrane proteins (OMPs) from genomes is an important task. This paper presents a k-nearest neighbor (K-NN) method for discriminating OMPs. The method makes predictions based on a weighted Euclidean distance that is computed from residue composition. The method achieves 89.1% accuracy with 0.668 MCC (Matthews correlation coefficient) in discriminating OMPs and non-OMPs. The performance of the method is improved by including homologous information into the calculation of residue composition. The final method achieves an accuracy of 96.1%, with 0.873 MCC, 87.5% sensitivity, and 98.2% specificity. Comparisons with multiple recently published methods show that the method proposed in this study outperforms the others.
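A minimal sketch of k-nearest-neighbor classification on residue composition with a weighted Euclidean distance follows. The abstract does not say how the weights are chosen, so uniform weights are assumed here; the toy sequences, labels, and function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_composition(sequence):
    """Fraction of each of the 20 standard amino acids in a protein sequence."""
    counts = Counter(sequence)
    n = sum(counts[a] for a in AMINO_ACIDS)
    if n == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / n for a in AMINO_ACIDS]


def weighted_euclidean(x, y, weights):
    """Euclidean distance with a per-feature weight on each squared difference."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def knn_predict(query_seq, training, weights, k=3):
    """Majority label among the k training sequences nearest to the query.

    training -- list of (sequence, label) pairs
    """
    q = residue_composition(query_seq)
    ranked = sorted(
        training,
        key=lambda item: weighted_euclidean(q, residue_composition(item[0]), weights),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data only; real training sets would contain full-length proteins.
toy_training = [
    ("AAAA", "OMP"),
    ("AAAC", "OMP"),
    ("WWWW", "non-OMP"),
    ("WWWY", "non-OMP"),
]
uniform_weights = [1.0] * len(AMINO_ACIDS)  # assumed; not taken from the paper
toy_label = knn_predict("AACA", toy_training, uniform_weights, k=3)
```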

BMC Bioinformatics | 2008

Discrimination of outer membrane proteins with improved performance

Changhui Yan; Jing Hu; Yingfeng Wang

Background: Outer membrane proteins (OMPs) perform diverse functional roles in Gram-negative bacteria. Identification of outer membrane proteins is an important task. Results: This paper presents a method for distinguishing OMPs from non-OMPs (that is, globular proteins and inner membrane proteins (IMPs)). First, we calculated the average residue compositions of OMPs, globular proteins and IMPs separately using a training set. Then, for each protein from the test set, its distances to the three groups were calculated based on residue composition using a weighted Euclidean distance (WED) approach. Proteins from the test set were classified into OMP versus non-OMP classes based on the least distance. The proposed method can distinguish between OMPs and non-OMPs with 91.0% accuracy and 0.639 Matthews correlation coefficient (MCC). We then improved the method by including homologous sequences into the calculation of residue composition and using a feature-selection method to select the single residues and dipeptides that were useful for OMP prediction. The final method achieves an accuracy of 96.8% with 0.859 MCC. In direct comparisons, the proposed method outperforms previously published methods. Conclusion: The proposed method can identify OMPs with improved performance. It will be very helpful to the discovery of OMPs at genome scale.
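The least-distance rule and the MCC figure quoted above can be sketched as follows. The group names mirror the abstract, but the two-dimensional feature vectors, the uniform weights, and the function names are hypothetical; real inputs would be residue-composition vectors.

```python
import math


def mean_vector(vectors):
    """Component-wise average of equally sized feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def weighted_euclidean(x, y, weights):
    """Euclidean distance with a per-feature weight on each squared difference."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def classify_by_least_distance(x, centroids, weights):
    """Return 'OMP' if the nearest group average is the OMP one, else 'non-OMP'."""
    nearest = min(centroids, key=lambda g: weighted_euclidean(x, centroids[g], weights))
    return "OMP" if nearest == "OMP" else "non-OMP"


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy 2-D features standing in for residue compositions.
toy_centroids = {
    "OMP": mean_vector([[0.9, 0.1], [0.7, 0.3]]),
    "globular": mean_vector([[0.1, 0.9], [0.3, 0.7]]),
    "IMP": mean_vector([[0.5, 0.5], [0.5, 0.5]]),
}
toy_weights = [1.0, 1.0]  # assumed uniform weights
toy_call = classify_by_least_distance([0.85, 0.15], toy_centroids, toy_weights)
```

MCC is reported alongside accuracy because it stays informative when the OMP and non-OMP classes are unbalanced.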

Neural Computing and Applications | 2009

A sensitivity-based approach for pruning architecture of Madalines

Xiaoqin Zeng; Jing Shao; Yingfeng Wang; Shuiming Zhong

Architecture design is a very important issue in neural network research. One popular way to find a proper network size is to prune an oversized trained network to a smaller one while preserving its established performance. This paper presents a sensitivity-based approach for pruning hidden Adalines from a Madaline that causes as little performance loss as possible, so that the loss is easy to compensate for. The approach is novel in setting up a relevance measure, based on an Adaline's sensitivity measure, to locate the least relevant Adaline in a Madaline. The sensitivity measure is the probability of an Adaline's output inversions due to input variation with respect to overall input patterns, and the relevance measure is defined as the product of the Adaline's sensitivity value and the sum of the absolute values of the Adaline's outgoing weights. Based on the relevance measure, a pruning algorithm can be simply programmed: it iteratively prunes the Adaline with the least relevance value from the hidden layer of a given Madaline and then conducts some compensations, until no more Adalines can be removed under a given performance requirement. The effectiveness of the pruning approach is verified by experimental results.
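The relevance measure described above is simple to compute once sensitivity values are available. The sketch below takes the sensitivities as given inputs (how they are estimated is not covered here) and picks the hidden Adaline that would be pruned first; the function names and toy numbers are hypothetical.

```python
def relevance(sensitivity, outgoing_weights):
    """Relevance = sensitivity x sum of absolute outgoing weights."""
    return sensitivity * sum(abs(w) for w in outgoing_weights)


def least_relevant_index(sensitivities, outgoing_weight_rows):
    """Index of the hidden Adaline with the smallest relevance value.

    sensitivities        -- one sensitivity value per hidden Adaline
    outgoing_weight_rows -- one list of outgoing weights per hidden Adaline
    """
    scores = [
        relevance(s, row) for s, row in zip(sensitivities, outgoing_weight_rows)
    ]
    return min(range(len(scores)), key=scores.__getitem__)


# Toy hidden layer with three Adalines.
toy_sensitivities = [0.2, 0.5, 0.1]
toy_outgoing = [[1.0, -1.0], [0.1, 0.1], [3.0, -2.0]]
prune_candidate = least_relevant_index(toy_sensitivities, toy_outgoing)
```

In the toy layer the second Adaline has the highest sensitivity but very small outgoing weights, so it has the least relevance; a full pruning loop would remove it, compensate, and re-check the performance requirement before continuing.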

BMC Systems Biology | 2014

A graph kernel method for DNA-binding site prediction

Changhui Yan; Yingfeng Wang

Background: Protein-DNA interactions play important roles in many biological processes. Computational methods that can accurately predict DNA-binding sites on proteins will greatly expedite research on problems involving protein-DNA interactions. Results: This paper presents a method for predicting DNA-binding sites on protein structures. The method represents protein surface patches using labeled graphs and uses a graph kernel method to calculate the similarities between graphs. A new surface patch is predicted to be an interface or non-interface patch based on its similarities to known DNA-binding patches and non-DNA-binding patches. The proposed method achieved high accuracy when tested on a representative set of 146 protein-DNA complexes using leave-one-out cross-validation. Then, the method was applied to identify DNA-binding sites on 13 unbound structures of DNA-binding proteins. In each of the unbound structures, the top 1 patch predicted by the proposed method precisely indicated the location of the DNA-binding site. Comparisons with other methods showed that the proposed method was competitive in predicting DNA-binding sites on unbound proteins. Conclusions: The proposed method uses graphs to encode the distribution of features in 3-dimensional (3D) space. Thus, compared with other vector-based methods, it has the advantage of taking into account the spatial distribution of features on the proteins. By using an efficient kernel method to compare graphs, the proposed method also avoids the demanding computations required for 3D object comparison. It provides a competitive method for predicting DNA-binding sites without requiring structure alignment.
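To show what comparing labeled graphs with a kernel can look like, the sketch below uses a very simple edge-label histogram kernel. The abstract does not specify which graph kernel the authors used, so this is a swapped-in illustration rather than their method; the graph encoding, node labels, and function names are all assumptions.

```python
import math
from collections import Counter


def edge_label_histogram(node_labels, edges):
    """Count edges by the unordered pair of labels at their endpoints."""
    hist = Counter()
    for u, v in edges:
        hist[tuple(sorted((node_labels[u], node_labels[v])))] += 1
    return hist


def edge_histogram_kernel(g1, g2):
    """Dot product of the two graphs' edge-label histograms."""
    h1 = edge_label_histogram(*g1)
    h2 = edge_label_histogram(*g2)
    return sum(count * h2[key] for key, count in h1.items())


def normalized_kernel(g1, g2):
    """Kernel value scaled to [0, 1] by the self-similarities of both graphs."""
    k11 = edge_histogram_kernel(g1, g1)
    k22 = edge_histogram_kernel(g2, g2)
    if k11 == 0 or k22 == 0:
        return 0.0
    return edge_histogram_kernel(g1, g2) / math.sqrt(k11 * k22)


# Toy patches: each graph is (node_labels, edges); labels are placeholders.
patch_a = ({0: "A", 1: "B", 2: "A"}, [(0, 1), (1, 2)])
patch_b = ({0: "A", 1: "B"}, [(0, 1)])
patch_c = ({0: "C", 1: "C"}, [(0, 1)])
similarity_ab = normalized_kernel(patch_a, patch_b)
similarity_ac = normalized_kernel(patch_a, patch_c)
```

A new patch could then be labeled by comparing its similarities to known binding and non-binding patches, for example with a nearest-neighbor rule or a kernel classifier.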

Journal of Theoretical Biology | 2013

Information-theoretic uncertainty of SCFG-modeled folding space of the non-coding RNA

Amirhossein Manzourolajdad; Yingfeng Wang; Timothy I. Shaw; Russell L. Malmberg

RNA secondary structure ensembles define probability distributions for alternative equilibrium secondary structures of an RNA sequence. Shannon's entropy is a measure for the amount of diversity present in any ensemble. In this work, Shannon's entropy of the SCFG ensemble on an RNA sequence is derived and implemented in polynomial time for both structurally ambiguous and unambiguous grammars. Micro RNA sequences generally have low folding entropy, as previously discovered. Surprisingly, signs of significantly high folding entropy were observed in certain ncRNA families. More effective models coupled with targeted randomization tests can lead to a better insight into folding features of these families. Availability: http://www.plantbio.uga.edu/~russell/index.php?s=1&n=5&r=0.
