Publication


Featured research published by Le-Heng Wang.


Nature Methods | 2012

Identification of cross-linked peptides from complex samples

Bing Yang; Yanjie Wu; Ming Zhu; Sheng-Bo Fan; Jinzhong Lin; Kun Zhang; Shuang Li; Hao Chi; Yu-Xin Li; Hai-Feng Chen; Shukun Luo; Yue-He Ding; Le-Heng Wang; Zhiqi Hao; Li-Yun Xiu; She Chen; Keqiong Ye; Simin He; Meng-Qiu Dong

We have developed pLink, software for data analysis of cross-linked proteins coupled with mass-spectrometry analysis. pLink reliably estimates false discovery rate in cross-link identification and is compatible with multiple homo- or hetero-bifunctional cross-linkers. We validated the program with proteins of known structures, and we further tested it on protein complexes, crude immunoprecipitates and whole-cell lysates. We show that it is a robust tool for protein-structure and protein-protein–interaction studies.
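The abstract does not say how pLink estimates the false discovery rate. A common general approach is target-decoy counting, sketched below purely as an illustration under that assumption. It is not pLink's own estimator, which has to account for cross-linked pairs. Scores are assumed to be "higher is better", and `fdr_threshold` is a hypothetical helper name.

```python
def fdr_threshold(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR stays <= max_fdr.

    psms: list of (score, is_decoy) pairs; higher scores are better.
    Generic target-decoy estimate (an assumption, not pLink's formula):
    FDR at a cutoff = (#decoys scoring >= cutoff) / (#targets scoring >= cutoff).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Record the deepest position where the running estimate is acceptable.
        if targets and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff
```

Tied scores are walked one at a time here. A production tool would usually convert the running estimate to monotone q-values instead.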


Journal of Proteome Research | 2010

pNovo: De novo Peptide Sequencing and Identification Using HCD Spectra

Hao Chi; Rui-Xiang Sun; Bing Yang; Chun-Qing Song; Le-Heng Wang; Chao Liu; Yan Fu; Zuo-Fei Yuan; Haipeng Wang; Simin He; Meng-Qiu Dong

De novo peptide sequencing has improved remarkably in the past decade as a result of better instruments and computational algorithms. However, de novo sequencing can correctly interpret only approximately 30% of high- and medium-quality spectra generated by collision-induced dissociation (CID), which is much less than database search. This is mainly due to incomplete fragmentation and overlap of different ion series in CID spectra. In this study, we show that higher-energy collisional dissociation (HCD) is of great help to de novo sequencing because it produces high mass accuracy tandem mass spectrometry (MS/MS) spectra without the low-mass cutoff associated with CID in ion trap instruments. In addition, abundant internal and immonium ions in the HCD spectra can help differentiate similar peptide sequences. Taking advantage of these characteristics, we developed an algorithm called pNovo for efficient de novo sequencing of peptides from HCD spectra. pNovo correctly identified 80% or more of the HCD spectra identified by database search. The number of correct full-length peptides sequenced by pNovo is comparable with that obtained by database search. A distinct advantage of de novo sequencing is that deamidated peptides and peptides with amino acid mutations can be identified efficiently without extra cost in computation. In summary, exploiting these HCD characteristics makes pNovo an excellent tool for de novo peptide sequencing from HCD spectra.
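Why high mass accuracy helps de novo sequencing can be seen with a toy "ladder reader". It reads residues off the mass gaps between consecutive singly charged b ions. This is only an illustration of the general idea and not pNovo's algorithm, which the abstract does not detail. It ignores y ions, internal and immonium ions, and multiply charged fragments, and `read_ladder` is a hypothetical name. With a 0.01 Da tolerance, lysine (128.09496 Da) and glutamine (128.05858 Da) are told apart; leucine and isoleucine remain indistinguishable by mass.

```python
# Monoisotopic residue masses (Da); I and L are isobaric, so "L" stands for both.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728


def read_ladder(b_ions, tol=0.01):
    """Infer a residue sequence from singly charged b-ion m/z values.

    Each gap between consecutive peaks (starting from the bare proton) is
    matched to the closest residue mass; '?' marks a gap with no residue
    within tol, e.g. where a fragment peak is missing.
    """
    seq = []
    prev = PROTON
    for mz in sorted(b_ions):
        gap = mz - prev
        aa, mass = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - gap))
        seq.append(aa if abs(mass - gap) <= tol else "?")
        prev = mz
    return "".join(seq)
```

For example, the b ions 98.06004, 227.10263, 324.15539 and 452.25035 read as "PEPK". If the second peak is missing, the combined E+P gap matches no single residue and the result is "P?K".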


Molecular & Cellular Proteomics | 2009

A Strategy for Precise and Large Scale Identification of Core Fucosylated Glycoproteins

Wei Jia; Zhuang Lu; Yan Fu; Haipeng Wang; Le-Heng Wang; Hao Chi; Zuo-Fei Yuan; Zhaobin Zheng; Lina Song; Huanhuan Han; YiMin Liang; Jinglan Wang; Yun Cai; Yukui Zhang; Yulin Deng; Wantao Ying; Simin He; Xiaohong Qian

Core fucosylation (CF) patterns of some glycoproteins are more sensitive and specific than their total protein levels for diagnosing many diseases, such as cancers. Global profiling and quantitative characterization of CF glycoproteins may reveal potent biomarkers for clinical applications. However, current techniques are unable to reveal CF glycoproteins precisely on a large scale. Here we developed a robust strategy that integrates molecular weight cutoff, neutral loss-dependent MS3, database-independent candidate spectrum filtering, and optimization to effectively identify CF glycoproteins. The spectrum filtering is based on a computation of the mass distribution in spectra of CF glycopeptides. The efficacy of this strategy was demonstrated on plasma from healthy subjects and subjects with hepatocellular carcinoma. Over 100 CF glycoproteins and CF sites were identified, and over 10,000 mass spectra of CF glycopeptides were found. The scale of these results indicates substantial progress toward biomarker discovery, and the candidate spectra will be a useful resource for improving database search methods for glycopeptides.


Journal of Proteome Research | 2010

Improved Peptide Identification for Proteomic Analysis Based on Comprehensive Characterization of Electron Transfer Dissociation Spectra

Rui-Xiang Sun; Meng-Qiu Dong; Chun-Qing Song; Hao Chi; Bing Yang; Li-Yun Xiu; Li Tao; Zhi-Yi Jing; Chao Liu; Le-Heng Wang; Yan Fu; Simin He

In recent years, electron transfer dissociation (ETD) has enjoyed widespread applications from sequencing of peptides with or without post-translational modifications to top-down analysis of intact proteins. However, peptide identification rates from ETD spectra compare poorly with those from collision induced dissociation (CID) spectra, especially for doubly charged precursors. This is in part due to an insufficient understanding of the characteristics of ETD and consequently a failure of database search engines to make use of the rich information contained in the ETD spectra. In this study, we statistically characterized ETD fragmentation patterns from a collection of 461 440 spectra and subsequently implemented our findings into pFind, a database search engine developed earlier for CID data. From ETD spectra of doubly charged precursors, pFind 2.1 identified 63-122% more unique peptides than Mascot 2.2 under the same 1% false discovery rate. For higher charged peptides as well as phosphopeptides, pFind 2.1 also consistently obtained more identifications. Of the features built into pFind 2.1, the following two greatly enhanced its performance: (1) refined automatic detection and removal of high-intensity peaks belonging to the precursor, charge-reduced precursor, or related neutral loss species, whose presence often set spectral matching askew; (2) a thorough consideration of hydrogen-rearranged fragment ions such as z + H and c - H for peptide precursors of different charge states. Our study has revealed that different charge states of precursors result in different hydrogen rearrangement patterns. For a fragment ion, its propensity of gaining or losing a hydrogen depends on (1) the ion type (c or z) and (2) the size of the fragment relative to the precursor, and both dependencies are affected by (3) the charge state of the precursor. 
In addition, we discovered ETD characteristics that are unique for certain types of amino acids (AAs), such as a prominent neutral loss of SCH₂CONH₂ (90.0014 Da) from z ions with a carbamidomethylated cysteine at the N-terminus and a neutral loss of the histidine side chain C₄N₂H₅ (81.0453 Da) from precursor ions containing histidine. The comprehensive list of ETD characteristics summarized in this paper should be valuable for automated database search, de novo peptide sequencing, and manual spectral validation.
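To make the "c - H" and "z + H" ions mentioned above concrete, the sketch below computes singly charged c and z• (z-dot) ion m/z values for a peptide, together with their hydrogen-shifted variants. The ion-naming convention is an assumption: here "z" means the z• radical, y − NH₃ + H. Published conventions vary, so check them against pFind's own definitions. The residue table is a small subset for the example. The sketch does not model the charge-state and fragment-size dependence the abstract reports.

```python
# Monoisotopic masses (Da); residue table is a subset, for illustration only.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "K": 128.09496,
                "E": 129.04259, "R": 156.10111}
PROTON = 1.00728
H_ATOM = 1.00783
NH3 = 17.02655
H2O = 18.01056


def c_and_z_ions(peptide):
    """Singly charged c and z* ion m/z values plus c - H and z* + H variants.

    Assumed conventions:
      c_i  = (sum of first i residues) + NH3 + proton
      z*_i = y_i - NH3 + H atom, with y_i = (sum of last i residues) + H2O + proton
    """
    n = len(peptide)
    ions = []
    for i in range(1, n):
        prefix = sum(RESIDUE_MASS[a] for a in peptide[:i])
        suffix = sum(RESIDUE_MASS[a] for a in peptide[n - i:])
        c = prefix + NH3 + PROTON
        z_dot = suffix + H2O + PROTON - NH3 + H_ATOM
        ions.append({"i": i, "c": c, "c-H": c - H_ATOM,
                     "z": z_dot, "z+H": z_dot + H_ATOM})
    return ions
```

A search engine that scores only the nominal c and z• positions would miss the shifted peaks. Listing both variants per fragment is the simplest way to let a scoring function consider them.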


Rapid Communications in Mass Spectrometry | 2010

Speeding up tandem mass spectrometry based database searching by peptide and spectrum indexing

You Li; Hao Chi; Le-Heng Wang; Haipeng Wang; Yan Fu; Zuo-Fei Yuan; Su-Jun Li; Yan-Sheng Liu; Rui-Xiang Sun; Rong Zeng; Simin He

Database searching is the technique of choice for shotgun proteomics, and to date much research effort has been spent on improving its effectiveness. However, database searching faces a serious efficiency challenge, given the large numbers of mass spectra and the ever-faster growth of peptide databases resulting from genome translations, enzymatic digestions, and post-translational modifications. In this study, we conducted systematic research on speeding up database search engines for protein identification and illustrate the key points using the design of the pFind 2.1 search engine as a running example. First, by constructing peptide indexes, pFind runs two to three times faster than without them. Second, by constructing indexes for observed precursor and fragment ions, pFind achieves a further twofold speedup. As a result, pFind compares very favorably with predominant search engines such as Mascot, SEQUEST and X!Tandem.
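One simple way to index observed precursor ions is shown below. pFind's actual data structures are not described in the abstract, so this is an assumption. The precursor masses are sorted once, and each candidate peptide then retrieves only the spectra inside its mass tolerance by binary search, instead of scanning every spectrum. The helper names and the 0.02 Da default tolerance are hypothetical. Fragment-ion indexing is not shown.

```python
import bisect


def build_precursor_index(spectra):
    """Sort spectra by precursor mass so candidates can be found by binary search.

    spectra: list of (spectrum_id, precursor_mass) pairs.
    """
    ordered = sorted(spectra, key=lambda s: s[1])
    masses = [m for _, m in ordered]
    return masses, ordered


def spectra_for_peptide(index, peptide_mass, tol_da=0.02):
    """Return ids of spectra whose precursor mass lies within +/- tol_da."""
    masses, ordered = index
    lo = bisect.bisect_left(masses, peptide_mass - tol_da)
    hi = bisect.bisect_right(masses, peptide_mass + tol_da)
    return [sid for sid, _ in ordered[lo:hi]]
```

Each lookup costs O(log n) plus the number of hits. When most candidate peptides match no spectrum at all, this is where the saving comes from.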


Proteomics | 2012

pParse: A method for accurate determination of monoisotopic peaks in high-resolution mass spectra

Zuo-Fei Yuan; Chao Liu; Haipeng Wang; Rui-Xiang Sun; Yan Fu; Jingfen Zhang; Le-Heng Wang; Hao Chi; You Li; Li-Yun Xiu; Wenping Wang; Simin He

Determining the monoisotopic peak of a precursor is a first step in interpreting mass spectra, and it is basic but non‐trivial. Within the isolation window of a precursor, other peaks interfere with the determination of the monoisotopic peak, leading to a wrong mass‐to‐charge ratio or charge state. Here we propose a method, named pParse, that exports the most probable monoisotopic peaks for precursors, including co‐eluted precursors. We use the relationship between the position of the highest peak and the mass of the first peak to detect candidate clusters. Then, we extract three features to rank the candidate clusters: (i) the sum of the intensity, (ii) the similarity of the experimental and the theoretical isotopic distribution, and (iii) the similarity of elution profiles. We showed that the recall of pParse, MaxQuant, and BioWorks was 98–98.8%, 0.5–17%, and 1.8–36.5% at the same precision, respectively. About 50% of tandem mass spectra are triggered by multiple precursors, which are difficult to identify, so we designed a new scoring function to identify the co‐eluted precursors. About 26% of all identified peptides came exclusively from co‐eluted precursors. Therefore, accurately determining monoisotopic peaks, including those of co‐eluted precursors, can greatly increase the peptide identification rate.
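Feature (ii) above, the similarity between experimental and theoretical isotopic distributions, can be sketched as follows. We step back from the instrument-reported m/z by one or two isotope spacings, and keep the candidate monoisotopic peak whose observed envelope best matches a theoretical one by cosine similarity. This covers only that single feature, not pParse as a whole. The theoretical envelope is a stated simplification: a Poisson model over ¹³C only, with the carbon count taken from the averagine composition. The function names are hypothetical.

```python
import math

PROTON = 1.00728
C13_SPACING = 1.00336  # 13C - 12C mass difference (Da)


def theoretical_envelope(mass, n_peaks=4):
    """Relative isotope intensities from a Poisson model (assumption: 13C only,
    carbon count from averagine, 4.9384 C per 111.1254 Da)."""
    lam = mass * 4.9384 / 111.1254 * 0.0107  # ~1.07% natural 13C abundance
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n_peaks)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_monoisotopic(peaks, observed_mz, charge, tol=0.01, max_shift=2, n_peaks=4):
    """Pick the monoisotopic m/z among observed_mz - k * spacing, k = 0..max_shift.

    peaks: list of (mz, intensity). Returns (mono_mz, cosine_score) or None.
    """
    def intensity_at(mz):
        return sum(i for p, i in peaks if abs(p - mz) <= tol)

    spacing = C13_SPACING / charge
    best = None
    for k in range(max_shift + 1):
        mono = observed_mz - k * spacing
        observed = [intensity_at(mono + j * spacing) for j in range(n_peaks)]
        if observed[0] == 0:
            continue  # a monoisotopic candidate must itself be observed
        neutral_mass = (mono - PROTON) * charge
        score = cosine(observed, theoretical_envelope(neutral_mass, n_peaks))
        if best is None or score > best[1]:
            best = (mono, score)
    return best
```

In practice the three features would be combined, because a cosine score alone can be fooled by an overlapping co-eluted cluster.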


BMC Bioinformatics | 2009

Efficient discovery of abundant post-translational modifications and spectral pairs using peptide mass and retention time differences

Yan Fu; Wei Jia; Zhuang Lu; Haipeng Wang; Zuo-Fei Yuan; Hao Chi; You Leo Li; Li-Yun Xiu; Wenping Wang; Chao Liu; Le-Heng Wang; Rui-Xiang Sun; Wen Gao; Xiaohong Qian; Simin He

Background: Peptide identification via tandem mass spectrometry is the basic task of current proteomics research. Due to the complexity of mass spectra, the majority of mass spectra cannot be interpreted at present. The existence of unexpected or unknown protein post-translational modifications is a major reason.

Results: This paper describes an efficient and sequence database-independent approach to detecting abundant post-translational modifications in high-accuracy peptide mass spectra. The approach is based on the observation that the spectra of a modified peptide and its unmodified counterpart are correlated with each other in their peptide masses and retention time. Frequently occurring peptide mass differences in a data set imply possible modifications, while small and consistent retention time differences provide orthogonal supporting evidence. We propose to use a bivariate Gaussian mixture model to discriminate modification-related spectral pairs from random ones. Due to the use of two-dimensional information, accurate modification masses and confident spectral pairs can be determined as well as the quantitative influences of modifications on peptide retention time.

Conclusion: Experiments on two glycoprotein data sets demonstrate that our method can effectively detect abundant modifications and spectral pairs. By including the discovered modifications into database search or by propagating peptide assignments between paired spectra, an average of 10% more spectra are interpreted.
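The first observation above, that frequently occurring mass differences between spectra hint at modifications, can be illustrated by simple counting. This sketch deliberately replaces the paper's bivariate Gaussian mixture with a hard retention-time window, so it is a simplified stand-in, not the published method. The parameter names and defaults (0.01 Da bins, 5 min window) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_mass_shifts(spectra, bin_width=0.01, max_rt_diff=5.0, min_delta=1.0, top=5):
    """Count pairwise precursor mass differences between spectra with similar RT.

    spectra: list of (precursor_mass_da, retention_time_min).
    Returns up to `top` (mass_difference, count) pairs, most frequent first.
    The RT window is a crude stand-in for the paper's probabilistic model.
    """
    counts = Counter()
    for (m1, rt1), (m2, rt2) in combinations(spectra, 2):
        delta = abs(m1 - m2)
        if delta < min_delta or abs(rt1 - rt2) > max_rt_diff:
            continue  # skip near-identical masses and pairs far apart in RT
        counts[round(delta / bin_width)] += 1
    return [(b * bin_width, n) for b, n in counts.most_common(top)]
```

Given three peptides that each co-occur with a +15.9949 Da partner eluting a minute later, the top bin is about 15.99 Da with a count of 3. That mass difference is consistent with oxidation.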


Rapid Communications in Mass Spectrometry | 2010

An efficient parallelization of phosphorylated peptide and protein identification

Le-Heng Wang; Wenping Wang; Hao Chi; Yanjie Wu; You Li; Yan Fu; Chen Zhou; Rui-Xiang Sun; Haipeng Wang; Chao Liu; Zuo-Fei Yuan; Li-Yun Xiu; Simin He

Protein sequence database search based on tandem mass spectrometry is an essential method for protein identification. As the computational demand increases, parallel computing has become an important technique for accelerating proteomics data analysis. In this paper, we discuss several factors which could affect the runtime of the pFind search engine and build an estimation model. Based on this model, effective on-line and off-line scheduling methods were developed. An experiment on the public dataset from PhosphoPep consisting of 100 RAW files of phosphopeptides shows that the speedup on 100 processors is 83.7. The parallel version can complete the identification task within 9 min, while a stand-alone process on a single PC takes more than 10 h. On another larger dataset consisting of 1,366,471 spectra, the speedup on 320 processors is 258.9 and the efficiency is 80.9%. Our approach can be applied to other similar search engines.
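Parallel efficiency is speedup divided by the number of processors, which is how the figures above relate: 258.9 / 320 ≈ 0.809, or 80.9%, and 83.7 / 100 = 83.7%. The paper builds its own runtime-estimation model, which the abstract does not give. As a stand-in, the sketch below uses the standard longest-processing-time-first (LPT) heuristic for off-line scheduling, assuming per-task runtimes (for example, per RAW file) are already estimated. The function names are hypothetical.

```python
import heapq


def lpt_schedule(task_times, n_procs):
    """Greedy longest-processing-time-first assignment of tasks to processors.

    task_times: estimated runtime per task. Returns the makespan, i.e. the
    finishing time of the most heavily loaded processor.
    """
    loads = [0.0] * n_procs  # min-heap of current processor loads
    heapq.heapify(loads)
    for t in sorted(task_times, reverse=True):
        lightest = heapq.heappop(loads)  # give the next-longest task to the idlest processor
        heapq.heappush(loads, lightest + t)
    return max(loads)


def speedup_and_efficiency(task_times, n_procs):
    """Speedup = serial time / parallel makespan; efficiency = speedup / n_procs."""
    serial = sum(task_times)
    speedup = serial / lpt_schedule(task_times, n_procs)
    return speedup, speedup / n_procs
```

LPT is not always optimal. Tasks of 5, 4, 3, 3 and 3 units on two processors give a makespan of 10 rather than the optimal 9 (speedup 1.8, efficiency 90%). This is one reason a better runtime model and scheduler can matter.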


BMC Bioinformatics | 2010

Speeding up tandem mass spectrometry-based database searching by longest common prefix

Chen Zhou; Hao Chi; Le-Heng Wang; You Li; Yanjie Wu; Yan Fu; Rui-Xiang Sun; Simin He

Background: Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of the key challenges in database searching is the remarkable increase in computational demand, brought about by the expansion of protein databases, semi- or non-specific enzymatic digestion, post-translational modifications and other factors. Some software tools use peptide indexing to accelerate processing. However, peptide indexing requires a large amount of time and space for construction, especially for non-specific digestion, and it is also inflexible to use.

Results: We developed an algorithm based on the longest common prefix (ABLCP) to efficiently organize a protein sequence database. The longest common prefix (LCP) array is a data structure normally paired with the suffix array. ABLCP eliminates redundant candidate peptides in databases and reduces the corresponding number of peptide-spectrum matches, thereby decreasing the identification time. Although enzymatic digestion poses a challenge to the LCP property the algorithm relies on, the algorithm can be adjusted to ensure that no candidate peptides are omitted. Compared with peptide indexing, ABLCP requires much less time and space for construction and is subject to fewer restrictions.

Conclusions: The ABLCP algorithm can help to improve data analysis efficiency. A software tool implementing this algorithm is available at http://pfind.ict.ac.cn/pfind2dot5/index.htm
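The key property is that, in suffix-array order, any substring already seen as a prefix of an earlier suffix is also a prefix of the immediately preceding suffix. Its length therefore cannot exceed the LCP value at that rank. The sketch below uses this property to enumerate each non-specific candidate peptide exactly once. It covers only the non-specific case; the abstract does not give ABLCP's adjustments for enzymatic digestion. The `$` separator (assumed absent from the sequences), the naive suffix-array construction and the function names are all illustrative choices. Kasai's algorithm is used for the LCP array.

```python
def suffix_array(text):
    # Naive construction (sorting suffixes); fine for a sketch, too slow for real databases.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """Kasai's algorithm: lcp[r] = common prefix length of suffixes sa[r-1] and sa[r]."""
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def distinct_peptides(proteins, min_len, max_len, sep="$"):
    """Enumerate every distinct substring of length min_len..max_len exactly once."""
    text = sep.join(proteins) + sep
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    peptides = []
    for r, start in enumerate(sa):
        end = text.index(sep, start)  # peptides may not cross a protein boundary
        longest = min(max_len, end - start)
        # Prefixes no longer than lcp[r] were already emitted from an earlier suffix.
        for length in range(max(min_len, lcp[r] + 1), longest + 1):
            peptides.append(text[start:start + length])
    return peptides
```

Each distinct candidate is produced once, so it would be scored against the spectra once, however many proteins contain it.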


Rapid Communications in Mass Spectrometry | 2007

pFind 2.0: a software package for peptide and protein identification via tandem mass spectrometry

Le-Heng Wang; Dequan Li; Yan Fu; Haipeng Wang; Jingfen Zhang; Zuo-Fei Yuan; Rui-Xiang Sun; Rong Zeng; Simin He; Wen Gao

Collaboration


Top Co-Authors

Hao Chi, Chinese Academy of Sciences
Simin He, Chinese Academy of Sciences
Rui-Xiang Sun, Chinese Academy of Sciences
Yan Fu, Chinese Academy of Sciences
Zuo-Fei Yuan, University of Pennsylvania
Haipeng Wang, Chinese Academy of Sciences
Chao Liu, Chinese Academy of Sciences
Li-Yun Xiu, Chinese Academy of Sciences
You Li, Chinese Academy of Sciences
Meng-Qiu Dong, Scripps Research Institute