Publication


Featured research published by Wen-Feng Zeng.


Journal of Proteome Research | 2013

pNovo+: De Novo Peptide Sequencing Using Complementary HCD and ETD Tandem Mass Spectra

Hao Chi; Hai-Feng Chen; Kun He; Long Wu; Bing Yang; Rui-Xiang Sun; Jianyun Liu; Wen-Feng Zeng; Chun-Qing Song; Simin He; Meng-Qiu Dong

De novo peptide sequencing is the only tool for extracting peptide sequences directly from tandem mass spectrometry (MS) data without any protein database. However, neither the accuracy nor the efficiency of de novo sequencing has been satisfactory, mainly due to incomplete fragmentation information in experimental spectra. Recent advances in MS technology have enabled the acquisition of higher energy collisional dissociation (HCD) and electron transfer dissociation (ETD) spectra of the same precursor. These spectra contain complementary fragmentation information and can be collected with high resolution and high mass accuracy. Taking advantage of these properties, we have developed a new algorithm called pNovo+, which greatly improves the accuracy and speed of de novo sequencing. On tryptic peptides, 86% of the topmost candidate sequences deduced by pNovo+ from HCD + ETD spectral pairs matched the database search results, and the success rate reached 95% if the top three candidates were included, which was much higher than using only HCD (87%) or only ETD spectra (57%). On Asp-N, Glu-C, or Elastase digested peptides, 69-87% of the HCD + ETD spectral pairs were correctly identified by pNovo+ among the topmost candidates, or 84-95% among the top three. On average, pNovo+ takes only 0.018 s to extract the sequence from a spectrum or spectral pair on a common personal computer. This is more than three times as fast as other de novo sequencing programs. The increase in speed is mainly due to pDAG, a component algorithm of pNovo+. pDAG finds the k longest paths in a directed acyclic graph without the antisymmetry restriction. We have verified that the antisymmetry restriction is unnecessary for high resolution, high mass accuracy data. The extensive use of HCD and ETD spectral information and the pDAG algorithm make pNovo+ an excellent de novo sequencing tool.
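The abstract describes pDAG only as finding the k longest paths in a directed acyclic graph. The following is a minimal sketch of that general technique, not the pDAG implementation itself. It assumes nodes are numbered in topological order (for example, spectrum-graph nodes sorted by mass) and that each edge carries a score; the function name `k_longest_paths` and the input layout are hypothetical.

```python
import heapq
from collections import defaultdict


def k_longest_paths(num_nodes, edges, source, sink, k):
    """Return up to k highest-scoring source-to-sink paths in a DAG.

    Assumption: nodes 0..num_nodes-1 are already in topological order,
    so every edge (u, v, weight) satisfies u < v.
    The result is a list of (score, path) tuples, best first.
    """
    incoming = defaultdict(list)
    for u, v, weight in edges:
        if not u < v:
            raise ValueError("edges must go from a lower to a higher node index")
        incoming[v].append((u, weight))

    # best[v] holds up to k (score, path) tuples for paths from source to v.
    best = [[] for _ in range(num_nodes)]
    best[source] = [(0.0, (source,))]

    for v in range(source + 1, num_nodes):
        # Extend every retained path of every predecessor by the edge into v.
        candidates = [
            (score + weight, path + (v,))
            for u, weight in incoming[v]
            for score, path in best[u]
        ]
        # Keeping only the k best per node is enough to recover the k best overall.
        best[v] = heapq.nlargest(k, candidates)

    return best[sink]
```

Because each node keeps at most k partial paths, the work grows with k times the number of edges rather than with the total number of paths in the graph.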


Scientific Reports | 2016

pGlyco: a pipeline for the identification of intact N-glycopeptides by using HCD- and CID-MS/MS and MS3

Wen-Feng Zeng; Mingqi Liu; Yang Zhang; Jian-Qiang Wu; Pan Fang; Chao Peng; Aiying Nie; Guoquan Yan; Weiqian Cao; Chao Liu; Hao Chi; Rui-Xiang Sun; Catherine C. L. Wong; Simin He; Pengyuan Yang

Confident characterization of the microheterogeneity of protein glycosylation through identification of intact glycopeptides remains one of the toughest analytical challenges for glycoproteomics. Recently proposed mass spectrometry (MS)-based methods still have defects, such as the lack of false discovery rate (FDR) analysis for glycan identification and the lack of sufficient fragmentation information for peptide identification. Here we propose pGlyco, a novel pipeline for the identification of intact glycopeptides by using complementary MS techniques: 1) HCD-MS/MS followed by product-dependent CID-MS/MS was used to provide complementary fragments to identify the glycans, and a novel target-decoy method was developed to estimate the false discovery rate of the glycan identification; 2) data-dependent acquisition of MS3 for several of the most intense peaks of HCD-MS/MS was used to provide fragments to identify the peptide backbones. By integrating HCD-MS/MS, CID-MS/MS and MS3, intact glycopeptides could be confidently identified. With pGlyco, a standard glycoprotein mixture was analyzed on the Orbitrap Fusion, and 309 non-redundant intact glycopeptides were identified with detailed spectral information of both glycans and peptides.


Analytical Chemistry | 2016

pTop 1.0: A High-Accuracy and High-Efficiency Search Engine for Intact Protein Identification

Rui-Xiang Sun; Lan Luo; Long Wu; Rui-Min Wang; Wen-Feng Zeng; Hao Chi; Chao Liu; Simin He

There has been tremendous progress in top-down proteomics (TDP) in the past 5 years, particularly in intact protein separation and high-resolution mass spectrometry. However, bioinformatics for large-scale mass spectra has lagged behind, in both algorithmic research and software development. In this study, we developed pTop 1.0, a novel software tool to significantly improve the accuracy and efficiency of mass spectral data analysis in TDP. The precursor mass offers crucial clues for inferring the potential post-translational modifications co-occurring on the protein, and the reliability of this inference relies heavily on mass accuracy. To detect the precursors more accurately, a machine-learning model incorporating a variety of spectral features was trained online in pTop via a support vector machine (SVM). pTop employs the sequence tags extracted from the MS/MS spectra and a dynamic programming algorithm to accelerate the search, especially for spectra with multiple post-translational modifications. We tested pTop on three publicly available data sets and compared it with ProSight and MS-Align+ in terms of recall, precision, running time, and so on. The results showed that pTop can, in general, outperform ProSight and MS-Align+. pTop recalled 22% more correct precursors, although it exported 30% fewer precursors than Xtract (in ProSight) from a human histone data set. pTop ran about 1 to 2 orders of magnitude faster than MS-Align+. This algorithmic advancement in pTop, in both accuracy and speed, will inspire the development of other similar software to analyze the mass spectra of intact proteins.


Bioinformatics | 2015

A note on the false discovery rate of novel peptides in proteogenomics.

Kun Zhang; Yan Fu; Wen-Feng Zeng; Kun He; Hao Chi; Chao Liu; Yanchang Li; Yuan Gao; Ping Xu; Simin He

Motivation: Proteogenomics has been well accepted as a tool to discover novel genes. In most conventional proteogenomic studies, a global false discovery rate is used to filter out false positives for identifying credible novel peptides. However, it has been found that the actual level of false positives in novel peptides is often out of control and behaves differently for different genomes.

Results: To quantitatively model this problem, we theoretically analyze the subgroup false discovery rates of annotated and novel peptides. Our analysis shows that the annotation completeness ratio of a genome is the dominant factor influencing the subgroup FDR of novel peptides. Experimental results on two real datasets of Escherichia coli and Mycobacterium tuberculosis support our conjecture.

Supplementary information: Supplementary data are available at Bioinformatics online.
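To illustrate the difference between a global FDR and subgroup FDRs, here is a minimal sketch using the common target-decoy estimate (number of decoy hits divided by number of target hits). It is not the authors' theoretical model. The record layout (`"group"`, `"decoy"` keys) and the idea that each decoy hit can be assigned to a subgroup are assumptions made for illustration.

```python
def target_decoy_fdr(psms):
    """Estimate FDR as decoy hits / target hits for a list of accepted matches.

    Each match is a dict with a boolean "decoy" field (hypothetical layout).
    """
    targets = sum(1 for p in psms if not p["decoy"])
    decoys = sum(1 for p in psms if p["decoy"])
    return decoys / targets if targets else 0.0


def subgroup_fdrs(psms):
    """Estimate the FDR separately for each value of the "group" field."""
    groups = {}
    for p in psms:
        groups.setdefault(p["group"], []).append(p)
    return {name: target_decoy_fdr(members) for name, members in groups.items()}
```

With made-up counts of 98 annotated targets, 1 annotated decoy, 2 novel targets and 1 novel decoy, the global estimate is 2/100 = 2%, while the novel subgroup alone is 1/2 = 50%. A list that passes a global threshold can therefore still contain a far higher share of false novel peptides.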


Nature Communications | 2017

pGlyco 2.0 enables precision N-glycoproteomics with comprehensive quality control and one-step mass spectrometry for intact glycopeptide identification

Mingqi Liu; Wen-Feng Zeng; Pan Fang; Weiqian Cao; Chao Liu; Guoquan Yan; Yang Zhang; Chao Peng; Jian-Qiang Wu; Xiao-Jin Zhang; Hui-Jun Tu; Hao Chi; Rui-Xiang Sun; Yong Cao; Meng-Qiu Dong; Biyun Jiang; Jiangming Huang; Huali Shen; Catherine C. L. Wong; Simin He; Pengyuan Yang

The precise and large-scale identification of intact glycopeptides is a critical step in glycoproteomics. Owing to the complexity of glycosylation, the current overall throughput, data quality and accessibility of intact glycopeptide identification lag behind those in routine proteomic analyses. Here, we propose a workflow for the precise high-throughput identification of intact N-glycopeptides at the proteome scale using stepped-energy fragmentation and a dedicated search engine. pGlyco 2.0 conducts comprehensive quality control including false discovery rate evaluation at all three levels of matches to glycans, peptides and glycopeptides, improving the current level of accuracy of intact glycopeptide identification. The N-glycoproteome of samples metabolically labeled with 15N/13C was analyzed quantitatively and utilized to validate the glycopeptide identification, which could be used as a novel benchmark pipeline to compare different search engines. Finally, we report a large-scale glycoproteome dataset consisting of 10,009 distinct site-specific N-glycans on 1,988 glycosylation sites from 955 glycoproteins in five mouse tissues.

Protein glycosylation is a heterogeneous post-translational modification that generates greater proteomic diversity that is difficult to analyze. Here the authors describe pGlyco 2.0, a workflow for the precise one-step identification of intact N-glycopeptides at the proteome scale.


Journal of Proteome Research | 2017

Open-pNovo: De Novo Peptide Sequencing with Thousands of Protein Modifications

Hao Yang; Hao Chi; Wen-Jing Zhou; Wen-Feng Zeng; Kun He; Chao Liu; Rui-Xiang Sun; Simin He

De novo peptide sequencing has improved remarkably, but sequencing full-length peptides with unexpected modifications is still a challenging problem. Here we present an open de novo sequencing tool, Open-pNovo, for de novo sequencing of peptides with arbitrary types of modifications. Although the search space increases by ∼300 times, Open-pNovo is close to or even ∼10 times faster than the other three proposed algorithms. Furthermore, considering top-1 candidates on three MS/MS data sets, Open-pNovo can recall over 90% of the results obtained by any one traditional algorithm and report 5-87% more peptides, including 14-250% more modified peptides. On a high-quality simulated data set, ∼85% of peptides with arbitrary modifications can be recalled by Open-pNovo, while hardly any results can be recalled by the others. In summary, Open-pNovo is an excellent tool for open de novo sequencing and has great potential for discovering unexpected modifications in real biological applications.


Oncotarget | 2016

In-depth mapping of the mouse brain N-glycoproteome reveals widespread N-glycosylation of diverse brain proteins

Pan Fang; Xin-jian Wang; Yu Xue; Mingqi Liu; Wen-Feng Zeng; Yang Zhang; Lei Zhang; Xing Gao; Guoquan Yan; Jun Yao; Huali Shen; Pengyuan Yang

N-glycosylation is one of the most prominent and abundant posttranslational modifications of proteins. It is estimated that over 50% of mammalian proteins undergo glycosylation. However, the analysis of N-glycoproteins has been limited by the available analytical technology. In this study, we comprehensively mapped the N-glycosylation sites in the mouse brain proteome by combining complementary methods, which included seven protease treatments, four enrichment techniques and two fractionation strategies. Altogether, 13492 N-glycopeptides containing 8386 N-glycosylation sites on 3982 proteins were identified. After evaluating the performance of the above methods, we proposed a simple and efficient workflow for large-scale N-glycosylation site mapping. The optimized workflow yielded 80% of the initially identified N-glycosylation sites with considerably less effort. Analysis of the identified N-glycoproteins revealed that many of the mouse brain proteins are N-glycosylated, including those proteins in critical pathways for nervous system development and neurological disease. Additionally, several important biomarkers of various diseases were found to be N-glycosylated. These data confirm that N-glycosylation is important in both physiological and pathological processes in the brain, and provide useful details about numerous N-glycosylation sites in brain proteins.


Analytical Chemistry | 2017

pDeep: Predicting MS/MS Spectra of Peptides with Deep Learning

Xie-Xuan Zhou; Wen-Feng Zeng; Hao Chi; Chunjie Luo; Chao Liu; Jianfeng Zhan; Simin He; Zhifei Zhang

In tandem mass spectrometry (MS/MS)-based proteomics, search engines rely on comparison between an experimental MS/MS spectrum and the theoretical spectra of the candidate peptides. Hence, accurate prediction of the theoretical spectra of peptides is particularly important. Here, we present pDeep, a deep neural network-based model for the spectrum prediction of peptides. Using bidirectional long short-term memory (BiLSTM), pDeep can predict higher-energy collisional dissociation, electron-transfer dissociation, and electron-transfer and higher-energy collisional dissociation MS/MS spectra of peptides with >0.9 median Pearson correlation coefficients. Further, we showed that the intermediate layer of the neural network could reveal physicochemical properties of amino acids, for example the similarities of fragmentation behaviors between amino acids. We also showed the potential of pDeep to distinguish extremely similar peptides (peptides that contain isobaric amino acids, for example, GG = N, AG = Q, or even I = L), which were very difficult to distinguish using traditional search engines.
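The isobaric cases named in the abstract (GG = N, AG = Q, I = L) can be checked from standard monoisotopic residue masses. The sketch below is only an arithmetic illustration and is unrelated to how pDeep itself works; the function names and the 0.001 Da tolerance are assumptions.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "N": 114.04293,
    "Q": 128.05858,
    "I": 113.08406,
    "L": 113.08406,
}


def residue_mass(sequence):
    """Sum the residue masses of a short amino acid sequence."""
    return sum(RESIDUE_MASS[aa] for aa in sequence)


def is_isobaric(seq_a, seq_b, tolerance=1e-3):
    """Return True if the two sequences differ in mass by at most `tolerance` Da."""
    return abs(residue_mass(seq_a) - residue_mass(seq_b)) <= tolerance
```

Two glycines sum to the same mass as one asparagine, and alanine plus glycine to the same mass as glutamine, because each pair has the same elemental composition. Mass alone cannot separate them, so any distinction has to come from fragment ion intensities.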


bioRxiv | 2018

Open-pFind enables precise, comprehensive and rapid peptide identification in shotgun proteomics

Hao Chi; Chao Liu; Hao Yang; Wen-Feng Zeng; Long Wu; Wen-Jing Zhou; Xiu-Nan Niu; Yue-He Ding; Yao Zhang; Rui-Min Wang; Zhao-Wei Wang; Zhen-Lin Chen; Rui-Xiang Sun; Tao Liu; Guang-Ming Tan; Meng-Qiu Dong; Ping Xu; Pei-Heng Zhang; Simin He

Shotgun proteomics has grown rapidly in recent decades, but a large fraction of tandem mass spectrometry (MS/MS) data in shotgun proteomics are not successfully identified. We have developed a novel database search algorithm, Open-pFind, to efficiently identify peptides even in an ultra-large search space that takes into account unexpected modifications, amino acid mutations, semi- or non-specific digestion and co-eluting peptides. Tested on two metabolically labeled MS/MS datasets, Open-pFind reported 50.5-117.0% more peptide-spectrum matches (PSMs) than seven other advanced algorithms. More importantly, the Open-pFind results were more credible as judged by verification experiments using stable isotopic labeling. Tested on four additional large-scale datasets, 70-85% of the spectra were confidently identified, and high-quality spectra were nearly completely interpreted by Open-pFind. Further, Open-pFind was over 40 times faster than the other three open search algorithms and 2-3 times faster than three restricted search algorithms. Re-analysis of an entire human proteome dataset consisting of ∼25 million spectra using Open-pFind identified a total of 14,064 proteins encoded by 12,723 genes when at least two uniquely identified peptides were required. In these search results, Open-pFind also excelled in an independent test for false positives based on the presence or absence of olfactory receptors. Thus, Open-pFind makes the open search strategy practical for the truly global-scale proteomics experiments of today and the future.


Journal of Proteome Research | 2018

pSite: Amino Acid Confidence Evaluation for Quality Control of De Novo Peptide Sequencing and Modification Site Localization

Hao Yang; Hao Chi; Wen-Jing Zhou; Wen-Feng Zeng; Chao Liu; Rui-Min Wang; Zhao-Wei Wang; Xiu-Nan Niu; Zhen-Lin Chen; Simin He

MS-based de novo peptide sequencing has improved remarkably with significant development of mass spectrometry and computational approaches, but it still lacks quality-control methods. Here we propose a novel algorithm, pSite, to evaluate the confidence of each amino acid rather than the full-length peptides obtained by de novo peptide sequencing. A semi-supervised learning approach was used to discriminate correct amino acids from random ones; then, an expectation-maximization algorithm was used to adaptively control the false amino-acid rate (FAR). On three test data sets, pSite recalled 86% more amino acids on average than PEAKS at the FAR of 5%. pSite also performed better on the modification site localization problem, which is essentially a special case of amino acid confidence evaluation. On three phosphopeptide data sets, at the false localization rate of 1%, the average recall of pSite was 91% while those of Ascore and phosphoRS were 64 and 63%, respectively. pSite covered 98% of Ascore and phosphoRS results and contributed 21% more phosphorylation sites. Further analyses show that the use of distinct fragmentation features in high-resolution MS/MS spectra, such as neutral loss ions, played an important role in improving the precision of pSite. In summary, the effective and universal model together with the extensive use of spectral information makes pSite an excellent quality control tool for both de novo peptide sequencing and modification site localization.

Collaboration


Wen-Feng Zeng's most frequent collaborators.

Top Co-Authors

Hao Chi

Chinese Academy of Sciences

Chao Liu

Chinese Academy of Sciences

Simin He

Chinese Academy of Sciences

Rui-Xiang Sun

Chinese Academy of Sciences

Meng-Qiu Dong

Scripps Research Institute

Hao Yang

Chinese Academy of Sciences

Kun He

Chinese Academy of Sciences

Kun Zhang

Chinese Academy of Sciences

Long Wu

Chinese Academy of Sciences

Rui-Min Wang

Chinese Academy of Sciences
