Publication


Featured research published by Rui-Xiang Sun.


Journal of Proteome Research | 2010

pNovo: De novo Peptide Sequencing and Identification Using HCD Spectra

Hao Chi; Rui-Xiang Sun; Bing Yang; Chun-Qing Song; Le-Heng Wang; Chao Liu; Yan Fu; Zuo-Fei Yuan; Haipeng Wang; Simin He; Meng-Qiu Dong

De novo peptide sequencing has improved remarkably in the past decade as a result of better instruments and computational algorithms. However, de novo sequencing can correctly interpret only approximately 30% of high- and medium-quality spectra generated by collision-induced dissociation (CID), far fewer than database search can identify. This is mainly due to incomplete fragmentation and overlap of different ion series in CID spectra. In this study, we show that higher-energy collisional dissociation (HCD) is of great help to de novo sequencing because it produces high-mass-accuracy tandem mass spectrometry (MS/MS) spectra without the low-mass cutoff associated with CID in ion trap instruments. In addition, the abundant internal and immonium ions in HCD spectra can help differentiate similar peptide sequences. Taking advantage of these characteristics, we developed an algorithm called pNovo for efficient de novo sequencing of peptides from HCD spectra. pNovo correctly identified 80% or more of the HCD spectra that were identified by database search. The number of correct full-length peptides sequenced by pNovo is comparable with that obtained by database search. A distinct advantage of de novo sequencing is that deamidated peptides and peptides with amino acid mutations can be identified efficiently at no extra computational cost. In summary, exploiting the characteristics of HCD spectra makes pNovo an excellent tool for de novo peptide sequencing from HCD spectra.
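Why high mass accuracy helps de novo sequencing can be illustrated with a toy ladder reader. This is a minimal sketch, not pNovo's algorithm: it assumes a single, complete, singly charged b-ion series with no gaps or noise peaks, uses standard monoisotopic residue masses (cysteine unmodified), and the helper names `b_ion_ladder` and `read_ladder` are hypothetical. Residues are read from the mass differences between consecutive peaks. With a tight tolerance, Gln (128.05858 Da) and Lys (128.09496 Da) can be told apart; with a loose one they cannot. Leu and Ile have identical masses and stay ambiguous either way.

```python
# Monoisotopic residue masses in Da (standard values; cysteine unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # proton mass in Da


def b_ion_ladder(sequence):
    """Singly charged b-ion m/z values b1..bn for a peptide sequence."""
    ladder, total = [], PROTON
    for aa in sequence:
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder


def read_ladder(ladder, tol):
    """Hypothetical helper: infer residues from gaps between consecutive peaks.

    Each gap is labelled with every residue whose mass lies within `tol` Da,
    e.g. "K/Q" when ambiguous, or "?" when nothing fits. Only gaps are read,
    so the residue that forms b1 itself is not reported.
    """
    residues = []
    for lo, hi in zip(ladder, ladder[1:]):
        gap = hi - lo
        hits = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol)
        residues.append("/".join(hits) if hits else "?")
    return residues


ladder = b_ion_ladder("PEKL")
print(read_ladder(ladder, tol=0.01))  # → ['E', 'K', 'I/L']    (high mass accuracy)
print(read_ladder(ladder, tol=0.5))   # → ['E', 'K/Q', 'I/L']  (low mass accuracy)
```

Real de novo tools must also handle missing peaks, mixed ion series and noise, which this sketch ignores.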


Bioinformatics | 2004

Exploiting the kernel trick to correlate fragment ions for peptide identification via tandem mass spectrometry

Yan Fu; Qiang Yang; Rui-Xiang Sun; Dequan Li; Rong Zeng; Charles X. Ling; Wen Gao

MOTIVATION The correlation among fragment ions in a tandem mass spectrum is crucial in reducing stochastic mismatches for peptide identification by database searching. Until now, an efficient scoring algorithm that considers the correlative information in a tunable and comprehensive manner has been lacking. RESULTS This paper provides a promising approach to utilizing the correlative information for improving the peptide identification accuracy. The kernel trick, rooted in the statistical learning theory, is exploited to address this issue with low computational effort. The common scoring method, the tandem mass spectral dot product (SDP), is extended to the kernel SDP (KSDP). Experiments on a dataset reported previously demonstrate the effectiveness of the KSDP. The implementation on consecutive fragments shows a decrease of 10% in the error rate compared with the SDP. Our software tool, pFind, using a simple scoring function based on the KSDP, outperforms two SDP-based software tools, SEQUEST and Sonar MS/MS, in terms of identification accuracy. SUPPLEMENTARY INFORMATION http://www.jdl.ac.cn/user/yfu/pfind/index.html
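The contrast between the plain SDP and a kernel SDP can be sketched as follows. This is an illustrative sketch under stated assumptions, not pFind's scoring function: spectra are encoded as vectors indexed by consecutive fragment positions (e.g. b1, b2, ...), and the kernel is a locally improved polynomial kernel chosen for illustration. The exact kernel, window and exponent used in the paper may differ, and the function names are hypothetical.

```python
def spectral_dot_product(theoretical, observed):
    """Plain SDP: sum of element-wise products of two spectrum vectors."""
    return sum(t * o for t, o in zip(theoretical, observed))


def kernel_sdp(theoretical, observed, window=1, degree=2):
    """Illustrative kernel SDP (locally improved polynomial kernel).

    Products at neighbouring fragment positions are summed within a window
    of +/- `window` positions and then raised to `degree`, so runs of
    consecutive matched fragments score higher than the same number of
    isolated matches.
    """
    products = [t * o for t, o in zip(theoretical, observed)]
    n = len(products)
    score = 0.0
    for i in range(n):
        local = sum(products[max(0, i - window):min(n, i + window + 1)])
        score += local ** degree
    return score


theoretical = [1, 1, 1, 1, 1, 1]      # six expected fragment positions
consecutive = [1, 1, 1, 0, 0, 0]      # three adjacent fragments observed
scattered = [1, 0, 1, 0, 1, 0]        # three isolated fragments observed
print(spectral_dot_product(theoretical, consecutive),
      spectral_dot_product(theoretical, scattered))  # → 3 3
print(kernel_sdp(theoretical, consecutive),
      kernel_sdp(theoretical, scattered))            # → 18.0 12.0
```

The SDP cannot distinguish the two observations, while the kernelized score rewards the correlated (consecutive) matches, which is the kind of correlative information the abstract describes.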


Bioinformatics | 2005

pFind: a novel database-searching software system for automated peptide and protein identification via tandem mass spectrometry

Dequan Li; Yan Fu; Rui-Xiang Sun; Charles X. Ling; Yonggang Wei; Hu Zhou; Rong Zeng; Qiang Yang; Simin He; Wen Gao

SUMMARY Research in proteomics requires powerful database-searching software to automatically identify protein sequences in a complex protein mixture via tandem mass spectrometry. In this paper, we describe a novel database-searching software system called pFind (peptide/protein Finder), which employs an effective peptide-scoring algorithm that we reported earlier. The pFind server is implemented with the C++ STL, .Net and XML technologies. As a result, high speed and good usability of the software are achieved.


Bioinformatics | 2010

Open MS/MS spectral library search to identify unanticipated post-translational modifications and increase spectral identification rate

Ding Ye; Yan Fu; Rui-Xiang Sun; Haipeng Wang; Zuo-Fei Yuan; Hao Chi; Simin He

Motivation: Identification of post-translationally modified proteins has become one of the central issues of current proteomics. Spectral library search is a new and promising computational approach to mass spectrometry-based protein identification. However, its potential for identifying unanticipated post-translational modifications has rarely been explored. Existing spectral library search tools are designed to match the query spectrum to reference library spectra with the same peptide mass. Thus, spectra of peptides with unanticipated modifications cannot be identified. Results: In this article, we present an open spectral library search tool named pMatch. It extends existing library search algorithms in at least three ways to support the identification of unanticipated modifications. First, the library spectra are optimized using the full peptide sequence information to better tolerate variations in peptide fragmentation patterns caused by modifications. Second, a new scoring system is devised, which uses charge-dependent mass shifts for peak matching and combines a probability-based model with the general spectral dot product for scoring. Third, a target-decoy strategy is used for false discovery rate control. To demonstrate the effectiveness of pMatch, a library search experiment was conducted on a public dataset of over 40 000 spectra, in comparison with SpectraST, the most popular library search engine. Additional validations were performed on four published datasets comprising over 150 000 spectra. The results showed that pMatch can effectively identify unanticipated modifications and significantly increase the spectral identification rate. Availability: http://pfind.ict.ac.cn/pmatch/ Supplementary information: Supplementary data are available at Bioinformatics online.
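The target-decoy step for false discovery rate control can be sketched with the common estimator: at a given score threshold, FDR is estimated as the number of decoy matches divided by the number of target matches scoring at or above it. This is a minimal sketch assuming that estimator; the abstract does not say which estimator pMatch uses, and the function name is hypothetical.

```python
def score_threshold_at_fdr(matches, max_fdr=0.01):
    """Return the lowest score threshold whose estimated FDR is <= max_fdr.

    `matches` is a list of (score, is_decoy) pairs from searching a combined
    target + decoy library. The FDR at a threshold is estimated as
    (#decoys scoring >= threshold) / (#targets scoring >= threshold).
    Tied scores are not grouped in this sketch. Returns None if no
    threshold satisfies the bound.
    """
    targets = decoys = 0
    threshold = None
    for score, is_decoy in sorted(matches, key=lambda m: m[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold


matches = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
print(score_threshold_at_fdr(matches, max_fdr=0.5))   # → 7.0
print(score_threshold_at_fdr(matches, max_fdr=0.01))  # → 9.0
```

Scanning down the ranked list and keeping the deepest threshold that still meets the bound means a local rise in the decoy ratio does not stop the scan early.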


Journal of Proteome Research | 2013

pNovo+: De Novo Peptide Sequencing Using Complementary HCD and ETD Tandem Mass Spectra

Hao Chi; Hai-Feng Chen; Kun He; Long Wu; Bing Yang; Rui-Xiang Sun; Jianyun Liu; Wen-Feng Zeng; Chun-Qing Song; Simin He; Meng-Qiu Dong

De novo peptide sequencing is the only tool for extracting peptide sequences directly from tandem mass spectrometry (MS) data without any protein database. However, neither the accuracy nor the efficiency of de novo sequencing has been satisfactory, mainly because of incomplete fragmentation information in experimental spectra. Recent advances in MS technology have enabled acquisition of higher energy collisional dissociation (HCD) and electron transfer dissociation (ETD) spectra of the same precursor. These spectra contain complementary fragmentation information and can be collected with high resolution and high mass accuracy. Taking advantage of these features, we developed a new algorithm called pNovo+, which greatly improves the accuracy and speed of de novo sequencing. On tryptic peptides, 86% of the topmost candidate sequences deduced by pNovo+ from HCD + ETD spectral pairs matched the database search results, and the success rate reached 95% if the top three candidates were included, which was much higher than using only HCD (87%) or only ETD spectra (57%). On peptides digested with Asp-N, Glu-C, or elastase, 69-87% of the HCD + ETD spectral pairs were correctly identified by pNovo+ among the topmost candidates, or 84-95% among the top three. On average, pNovo+ takes only 0.018 s to extract the sequence from a spectrum or spectral pair on a common personal computer, more than three times as fast as other de novo sequencing programs. The speedup is mainly due to pDAG, a component algorithm of pNovo+. pDAG finds the k longest paths in a directed acyclic graph without the antisymmetry restriction. We have verified that the antisymmetry restriction is unnecessary for high-resolution, high-mass-accuracy data. The extensive use of HCD and ETD spectral information and the pDAG algorithm make pNovo+ an excellent de novo sequencing tool.
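The abstract does not describe pDAG's internals, so the following is a generic sketch rather than pDAG itself: it finds the k highest-scoring source-to-sink paths in a DAG by dynamic programming over a topological order, keeping the top k partial paths at each node. It assumes nodes are numbered in topological order (as they would be in a spectrum graph whose nodes are sorted by prefix mass), and the edge weights stand in for whatever scores a real spectrum graph would carry. Like pDAG as described, it imposes no antisymmetry restriction (no check that two nodes derive from the same peak); the function name is hypothetical.

```python
import heapq


def k_longest_paths(n_nodes, edges, source, sink, k):
    """Return up to k (score, path) tuples for the best source->sink paths.

    Nodes are 0..n_nodes-1 and must be numbered in topological order, so
    every edge (u, v, weight) has u < v. best[v] keeps the k highest-scoring
    partial paths ending at v; no antisymmetry restriction is applied.
    """
    incoming = [[] for _ in range(n_nodes)]
    for u, v, w in edges:
        if u >= v:
            raise ValueError("nodes must be numbered in topological order")
        incoming[v].append((u, w))

    best = [[] for _ in range(n_nodes)]
    best[source] = [(0.0, (source,))]
    for v in range(n_nodes):
        if v == source:
            continue
        candidates = [(score + w, path + (v,))
                      for u, w in incoming[v]
                      for score, path in best[u]]
        best[v] = heapq.nlargest(k, candidates)
    return best[sink]


edges = [(0, 1, 1.0), (0, 2, 2.0), (1, 3, 2.0), (2, 3, 2.0), (1, 2, 0.5), (0, 3, 1.0)]
print(k_longest_paths(4, edges, source=0, sink=3, k=2))
# → [(4.0, (0, 2, 3)), (3.5, (0, 1, 2, 3))]
```

Because each node keeps at most k partial paths, the work grows roughly with k times the number of edges rather than with the number of all possible paths.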


Nature Methods | 2015

Mapping native disulfide bonds at a proteome scale

Shan Lu; Sheng-Bo Fan; Bing Yang; Yu-Xin Li; Jia‐Ming Meng; Long Wu; Pin Li; Kun Zhang; Mei-Jun Zhang; Yan Fu; Jincai Luo; Rui-Xiang Sun; Simin He; Meng-Qiu Dong

We developed a high-throughput mass spectrometry method, pLink-SS (http://pfind.ict.ac.cn/software/pLink/2014/pLink-SS.html), for precise identification of disulfide-linked peptides. Using pLink-SS, we mapped all native disulfide bonds of a monoclonal antibody and ten standard proteins. We performed disulfide proteome analyses and identified 199 disulfide bonds in Escherichia coli and 568 in proteins secreted by human endothelial cells. We discovered many regulatory disulfide bonds involving catalytic or metal-binding cysteine residues.


Journal of Proteome Research | 2010

Improved Peptide Identification for Proteomic Analysis Based on Comprehensive Characterization of Electron Transfer Dissociation Spectra

Rui-Xiang Sun; Meng-Qiu Dong; Chun-Qing Song; Hao Chi; Bing Yang; Li-Yun Xiu; Li Tao; Zhi-Yi Jing; Chao Liu; Le-Heng Wang; Yan Fu; Simin He

In recent years, electron transfer dissociation (ETD) has been widely applied, from sequencing peptides with or without post-translational modifications to top-down analysis of intact proteins. However, peptide identification rates from ETD spectra compare poorly with those from collision-induced dissociation (CID) spectra, especially for doubly charged precursors. This is partly due to an insufficient understanding of the characteristics of ETD and, consequently, a failure of database search engines to make use of the rich information contained in ETD spectra. In this study, we statistically characterized ETD fragmentation patterns from a collection of 461 440 spectra and subsequently incorporated our findings into pFind, a database search engine developed earlier for CID data. From ETD spectra of doubly charged precursors, pFind 2.1 identified 63-122% more unique peptides than Mascot 2.2 at the same 1% false discovery rate. For higher charged peptides as well as phosphopeptides, pFind 2.1 also consistently obtained more identifications. Of the features built into pFind 2.1, the following two greatly enhanced its performance: (1) refined automatic detection and removal of high-intensity peaks belonging to the precursor, charge-reduced precursor, or related neutral loss species, whose presence often skews spectral matching; (2) thorough consideration of hydrogen-rearranged fragment ions such as z + H and c - H for peptide precursors of different charge states. Our study has revealed that different precursor charge states result in different hydrogen rearrangement patterns. A fragment ion's propensity to gain or lose a hydrogen depends on (1) the ion type (c or z) and (2) the size of the fragment relative to the precursor, and both dependencies are affected by (3) the charge state of the precursor.
In addition, we discovered ETD characteristics that are unique to certain types of amino acids (AAs), such as a prominent neutral loss of SCH2CONH2 (90.0014 Da) from z ions with a carbamidomethylated cysteine at the N-terminus and a neutral loss of the histidine side chain C4N2H5 (81.0453 Da) from precursor ions containing histidine. The comprehensive list of ETD characteristics summarized in this paper should be valuable for automated database search, de novo peptide sequencing, and manual spectral validation.


Rapid Communications in Mass Spectrometry | 2010

Speeding up tandem mass spectrometry based database searching by peptide and spectrum indexing

You Li; Hao Chi; Le-Heng Wang; Haipeng Wang; Yan Fu; Zuo-Fei Yuan; Su-Jun Li; Yan-Sheng Liu; Rui-Xiang Sun; Rong Zeng; Simin He

Database searching is the technique of choice for shotgun proteomics, and much research effort has been spent on improving its effectiveness. However, database searching faces a serious efficiency challenge, given the large numbers of mass spectra and the ever-faster growth of peptide databases resulting from genome translations, enzymatic digestions, and post-translational modifications. In this study, we systematically investigated how to speed up database search engines for protein identification and illustrate the key points using the design of the pFind 2.1 search engine as a running example. First, by constructing peptide indexes, pFind achieves a two- to threefold speedup over searching without peptide indexes. Second, by constructing indexes for observed precursor and fragment ions, pFind achieves a further twofold speedup. As a result, pFind compares very favorably with predominant search engines such as Mascot, SEQUEST and X!Tandem.
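The basic idea behind a peptide index can be sketched as a list of peptides sorted once by mass and queried by binary search, so each spectrum scores only the peptides within its precursor tolerance instead of scanning the whole database. This is a minimal sketch under assumptions: the class name is hypothetical, the peptide identifiers and masses are toy values (not real monoisotopic masses), and the tolerance is an absolute window in Da. The abstract does not describe pFind 2.1's actual index structures.

```python
import bisect


class PeptideMassIndex:
    """Hypothetical precursor-mass index: peptides sorted once by mass."""

    def __init__(self, peptides):
        # peptides: iterable of (peptide_id, neutral_mass_da)
        self._entries = sorted(peptides, key=lambda p: p[1])
        self._masses = [mass for _, mass in self._entries]

    def candidates(self, precursor_mass, tol):
        """Peptide ids whose mass lies within +/- tol Da of precursor_mass."""
        lo = bisect.bisect_left(self._masses, precursor_mass - tol)
        hi = bisect.bisect_right(self._masses, precursor_mass + tol)
        return [pid for pid, _ in self._entries[lo:hi]]


index = PeptideMassIndex([("pep_c", 1200.00), ("pep_a", 1000.50), ("pep_b", 1000.52)])
print(index.candidates(1000.51, tol=0.02))  # → ['pep_a', 'pep_b']
```

Each lookup costs O(log n) plus the number of candidates returned, which is where the savings over a linear scan come from when many spectra are searched against a large database.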


Molecular & Cellular Proteomics | 2011

DeltAMT: a statistical algorithm for fast detection of protein modifications from LC-MS/MS data

Yan Fu; Li-Yun Xiu; Wei Jia; Ding Ye; Rui-Xiang Sun; Xiaohong Qian; Simin He

Identification of proteins and their modifications via liquid chromatography-tandem mass spectrometry is an important task for the field of proteomics. However, because of the complexity of tandem mass spectra, the majority of the spectra cannot be identified. The presence of unanticipated protein modifications is among the major reasons for the low spectral identification rate. The conventional database search approach to protein identification has inherent difficulties in comprehensive detection of protein modifications. In recent years, increasing efforts have been devoted to developing unrestrictive approaches to modification identification, but they often suffer from their lack of speed. This paper presents a statistical algorithm named DeltAMT (Delta Accurate Mass and Time) for fast detection of abundant protein modifications from tandem mass spectra with high-accuracy precursor masses. The algorithm is based on the fact that the modified and unmodified versions of a peptide are usually present simultaneously in a sample and their spectra are correlated with each other in precursor masses and retention times. By representing each pair of spectra as a delta mass and time vector, bivariate Gaussian mixture models are used to detect modification-related spectral pairs. Unlike previous approaches to unrestrictive modification identification that mainly rely upon the fragment information and the mass dimension in liquid chromatography-tandem mass spectrometry, the proposed algorithm makes the most of precursor information. Thus, it is highly efficient while being accurate and sensitive. On two published data sets, the algorithm effectively detected various modifications and other interesting events, yielding deep insights into the data. Based on these discoveries, the spectral identification rates were significantly increased and many modified peptides were identified.
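The first step of the approach, representing pairs of spectra as delta mass and time vectors, can be sketched as below. This covers only the pair construction: the bivariate Gaussian mixture fitting is not shown (the resulting vectors could, for example, be passed to scikit-learn's `GaussianMixture`). The function name and the retention-time window `max_delta_time` are hypothetical, and the all-pairs loop is O(n²) for clarity.

```python
from itertools import combinations


def delta_vectors(spectra, max_delta_time):
    """Build (delta mass, delta time) vectors for pairs of spectra.

    `spectra` is a list of (precursor_mass_da, retention_time) tuples, with
    retention time in the same units as `max_delta_time`. Each pair is
    oriented heavier-minus-lighter, so delta mass >= 0 while delta time keeps
    its sign. Pairs whose retention times differ by more than
    `max_delta_time` are skipped.
    """
    vectors = []
    for a, b in combinations(spectra, 2):
        light, heavy = (a, b) if a[0] <= b[0] else (b, a)
        d_mass = heavy[0] - light[0]
        d_time = heavy[1] - light[1]
        if abs(d_time) <= max_delta_time:
            vectors.append((d_mass, d_time))
    return vectors


# Toy data: the second spectrum is 15.9949 Da (one oxygen atom, as in
# methionine oxidation) heavier than the first and elutes 2.5 units later.
spectra = [(1000.0, 30.0), (1015.9949, 32.5), (1200.0, 50.0)]
print(delta_vectors(spectra, max_delta_time=5.0))
```

If a modification is abundant, many pairs share a similar delta mass and delta time, so they form a dense cluster in this two-dimensional space; detecting such clusters is the role the mixture model plays in the abstract.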


Scientific Reports | 2016

pGlyco: a pipeline for the identification of intact N-glycopeptides by using HCD- and CID-MS/MS and MS3

Wen-Feng Zeng; Mingqi Liu; Yang Zhang; Jian-Qiang Wu; Pan Fang; Chao Peng; Aiying Nie; Guoquan Yan; Weiqian Cao; Chao Liu; Hao Chi; Rui-Xiang Sun; Catherine C. L. Wong; Simin He; Pengyuan Yang

Confident characterization of the microheterogeneity of protein glycosylation through identification of intact glycopeptides remains one of the toughest analytical challenges for glycoproteomics. Recently proposed mass spectrometry (MS)-based methods still have shortcomings, such as the lack of false discovery rate (FDR) analysis for glycan identification and insufficient fragmentation information for peptide identification. Here we propose pGlyco, a novel pipeline for the identification of intact glycopeptides using complementary MS techniques: 1) HCD-MS/MS followed by product-dependent CID-MS/MS was used to provide complementary fragments to identify the glycans, and a novel target-decoy method was developed to estimate the FDR of glycan identification; 2) data-dependent acquisition of MS3 for the most intense peaks of HCD-MS/MS was used to provide fragments to identify the peptide backbones. By integrating HCD-MS/MS, CID-MS/MS and MS3, intact glycopeptides could be confidently identified. With pGlyco, a standard glycoprotein mixture was analyzed on an Orbitrap Fusion, and 309 non-redundant intact glycopeptides were identified with detailed spectral information on both glycans and peptides.

Collaboration


Rui-Xiang Sun's collaborators.

Top Co-Authors

Simin He, Chinese Academy of Sciences
Yan Fu, Chinese Academy of Sciences
Hao Chi, Chinese Academy of Sciences
Chao Liu, Chinese Academy of Sciences
Le-Heng Wang, Chinese Academy of Sciences
Haipeng Wang, Chinese Academy of Sciences
Zuo-Fei Yuan, University of Pennsylvania
Meng-Qiu Dong, Scripps Research Institute
Wen-Feng Zeng, Chinese Academy of Sciences