Fengchao Yu
Hong Kong University of Science and Technology
Publication
Featured research published by Fengchao Yu.
BMC Bioinformatics | 2016
Fengchao Yu; Ning Li; Weichuan Yu
Background: Chemical cross-linking combined with mass spectrometry (CX-MS) is a high-throughput approach to studying protein-protein interactions. The number of peptide-peptide combinations grows quadratically with respect to the number of proteins, resulting in a high computational complexity. Widely used methods including xQuest (Rinner et al., Nat Methods 5(4):315–8, 2008; Walzthoeni et al., Nat Methods 9(9):901–3, 2012), pLink (Yang et al., Nat Methods 9(9):904–6, 2012), ProteinProspector (Chu et al., Mol Cell Proteomics 9:25–31, 2010; Trnka et al., 13(2):420–34, 2014) and Kojak (Hoopmann et al., J Proteome Res 14(5):2190–198, 2015) avoid searching all peptide-peptide combinations by pre-selecting peptides with heuristic approaches. However, pre-selection procedures may cause missing findings. The most intuitive approach is searching all possible candidates. A tool that can exhaustively search a whole database without any heuristic pre-selection procedure is therefore desirable. Results: We have developed a cross-linked peptides identification tool named ECL. It can exhaustively search a whole database in a reasonable period of time without any heuristic pre-selection procedure. Tests showed that searching a database containing 5200 proteins took 7 h. ECL identified more non-redundant cross-linked peptides than xQuest, pLink, and ProteinProspector. Experiments showed that about 30 % of these additional identified peptides were not pre-selected by Kojak. We used protein crystal structures from the protein data bank to check the intra-protein cross-linked peptides. Most of the distances between cross-linking sites were smaller than 30 Å. Conclusions: To the best of our knowledge, ECL is the first tool that can exhaustively search all candidates in cross-linked peptides identification. The experiments showed that ECL could identify more peptides than xQuest, pLink, and ProteinProspector. A further analysis indicated that some of the additional identified results were thanks to the exhaustive search.
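To make the quadratic growth concrete, the following is a minimal sketch of an exhaustive pair search, not ECL's actual implementation. It assumes a simplified mass model in which two peptides match a spectrum when their masses plus the linker mass equal the precursor mass within a tolerance; the function names and all numbers are hypothetical.

```python
from itertools import combinations_with_replacement


def count_peptide_pairs(num_peptides):
    """Number of unordered peptide pairs, allowing a peptide to pair with itself."""
    return num_peptides * (num_peptides + 1) // 2


def exhaustive_pairs(peptide_masses, precursor_mass, linker_mass, tol):
    """Check every peptide pair against the precursor mass (quadratic time).

    Simplified, assumed mass model: mass(a) + mass(b) + linker_mass must be
    within `tol` of the precursor mass.
    """
    target = precursor_mass - linker_mass
    hits = []
    indexed = enumerate(peptide_masses)
    for (i, m1), (j, m2) in combinations_with_replacement(indexed, 2):
        if abs(m1 + m2 - target) <= tol:
            hits.append((i, j))
    return hits
```

Doubling the number of peptides roughly quadruples the value returned by `count_peptide_pairs`, which is why heuristic pre-selection is attractive and why an exhaustive search needs care to stay tractable.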
Proteomics | 2016
Xinliang Zhu; Fengchao Yu; Zhu Yang; Shichang Liu; Chen Dai; Xiaoyun Lu; Chenyu Liu; Weichuan Yu; Ning Li
Site‐specific chemical cross‐linking in combination with mass spectrometry analysis has emerged as a powerful proteomic approach for studying the three‐dimensional structure of protein complexes and for mapping protein–protein interactions (PPIs). Building on the success of MS analysis of in vitro cross‐linked proteins, which has been widely used to investigate specific interactions of bait proteins and their targets in various organisms, we report a workflow for in vivo chemical cross‐linking and MS analysis in a multicellular eukaryote. This approach optimizes the in vivo protein cross‐linking conditions in Arabidopsis thaliana, establishes a MudPIT procedure for the enrichment of cross‐linked peptides, and develops an integrated software program, exhaustive cross‐linked peptides identification tool (ECL), to identify the MS spectra of in planta chemical cross‐linked peptides. In total, two pairs of in vivo cross‐linked peptides of high confidence have been identified from two independent biological replicates. This work marks the beginning of an alternative proteomic approach in the study of in vivo protein tertiary structure and PPIs in multicellular eukaryotes.
Journal of Proteome Research | 2017
Fengchao Yu; Ning Li; Weichuan Yu
Chemical cross-linking coupled to mass spectrometry is a powerful tool to study protein-protein interactions and protein conformations. Two linked peptides are ionized and fragmented to produce a tandem mass spectrum. In such an experiment, a tandem mass spectrum contains ions from two peptides. The peptide identification problem becomes a peptide-peptide pair identification problem. Currently, most tools do not search all possible pairs due to the quadratic time complexity. Consequently, missed findings are unavoidable. In our previous work, we developed a tool named ECL to search all pairs of peptides exhaustively. Unfortunately, it is very slow due to the quadratic computational complexity, especially when the database is large. Furthermore, ECL uses a score function without statistical calibration, while researchers [1-3] have proposed that it is inappropriate to directly compare uncalibrated scores because different spectra have different random score distributions. Here we propose an advanced version of ECL, named ECL2. It achieves a linear time and space complexity by taking advantage of the additive property of a score function. It can search a data set containing tens of thousands of spectra against a database containing thousands of proteins in a few hours. Comparison with five other state-of-the-art tools shows that ECL2 is much faster than pLink, StavroX, ProteinProspector, and ECL. Kojak is the only one that is faster than ECL2, but Kojak does not exhaustively search all possible peptide pairs. The comparison shows that ECL2 has the highest sensitivity among the state-of-the-art tools. The experiment using a large-scale in vivo cross-linking data set demonstrates that ECL2 is the only tool that can find the peptide-spectrum matches (PSMs) passing the false discovery rate/q-value threshold. The result illustrates that the exhaustive search and a well-calibrated score function are useful to find PSMs from a huge search space.
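One way an additive score can avoid scoring every pair is sketched below. This is an illustrative assumption, not the published ECL2 algorithm: it supposes the pair score is the sum of two per-peptide scores, groups peptides into mass bins, keeps only the best-scoring peptide per bin, and then looks up the complementary bin for each peptide. The function name, the binning scheme, and the bin width are hypothetical, and the mass match is only approximate (partners are taken from the complementary bin and its two neighbours).

```python
def best_pair_additive(masses, scores, target_mass, bin_width):
    """Find a high-scoring peptide pair in time linear in the number of peptides.

    Assumes score(pair) = scores[a] + scores[b] and that the two masses
    should sum to roughly `target_mass` (precursor mass minus linker mass).
    """
    # Pass 1: remember the best single-peptide score in each mass bin.
    best_in_bin = {}
    for idx, (mass, score) in enumerate(zip(masses, scores)):
        b = int(mass // bin_width)
        if b not in best_in_bin or score > best_in_bin[b][0]:
            best_in_bin[b] = (score, idx)

    # Pass 2: for each peptide, look only at the complementary bins.
    top = None
    for idx, (mass, score) in enumerate(zip(masses, scores)):
        comp_bin = int((target_mass - mass) // bin_width)
        for b in (comp_bin - 1, comp_bin, comp_bin + 1):
            if b in best_in_bin:
                partner_score, partner_idx = best_in_bin[b]
                total = score + partner_score
                if top is None or total > top[0]:
                    top = (total, idx, partner_idx)
    return top  # (pair score, peptide index, partner index) or None
```

Because the best partner for a given mass bin does not depend on which peptide is asking, each peptide is scored once per spectrum rather than once per pair.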
International Conference on Image Processing | 2011
Fengchao Yu; Huafeng Liu; Pengcheng Shi
In this paper, we explore the use of graphics processing unit (GPU)-accelerated particle filter strategies for estimating the activity map in tomographic PET imaging. The proposed framework formulates the physiological model of the imaged tissues through state space evolution equations and the photon counting statistics through observation equations, and reconstruction is then performed using particle filter estimation. For fast reconstruction, the calculations are implemented on a highly parallel GPU. Experiments show that the particle filter improves image quality. Furthermore, thanks to the computing power of graphics hardware, reconstruction times are practical for clinical applications.
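The state-space formulation can be illustrated with a minimal bootstrap particle filter. This is a toy sketch under stated assumptions, not the paper's reconstruction method: the state is a single scalar activity value following an assumed linear evolution equation with Gaussian noise, the observation is a Poisson-distributed photon count, and everything runs on the CPU. The function names and default parameters are hypothetical.

```python
import math
import random


def poisson_log_likelihood(count, rate):
    """Log of the Poisson probability of observing `count` given mean `rate`."""
    rate = max(rate, 1e-12)  # guard against log(0)
    return count * math.log(rate) - rate - math.lgamma(count + 1)


def particle_filter(counts, num_particles=500, decay=1.0, process_std=1.0,
                    init_range=(0.0, 100.0), seed=0):
    """Estimate a scalar activity value from a sequence of photon counts.

    Assumed model: x_t = decay * x_{t-1} + Gaussian noise (state evolution),
    y_t ~ Poisson(x_t) (observation).
    """
    rng = random.Random(seed)
    particles = [rng.uniform(*init_range) for _ in range(num_particles)]
    estimates = []
    for y in counts:
        # Predict: propagate each particle through the evolution equation.
        particles = [max(decay * p + rng.gauss(0.0, process_std), 0.0)
                     for p in particles]
        # Update: weight particles by the Poisson observation likelihood.
        log_w = [poisson_log_likelihood(y, p) for p in particles]
        max_log_w = max(log_w)
        weights = [math.exp(lw - max_log_w) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: draw a new particle set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=num_particles)
    return estimates
```

The per-particle predict and weight steps are independent of one another, which is the property that makes this kind of filter a natural fit for GPU parallelism.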
IEEE EMBS International Conference on Biomedical and Health Informatics | 2012
Fengchao Yu; Huafeng Liu; Pengcheng Shi
Measured PET data naturally follow a Poisson distribution, which has made iterative statistical methods the primary approach to image reconstruction. In contrast, physiological models provide a predictive tool for the imaged biological processes. To date, most existing efforts do not attempt to tackle the reconstruction problem by combining measured data statistics and physiological modeling constraints in a joint fashion. In this paper, we propose a novel approach that combines the statistical model and physiological parameters during static reconstruction with the aid of a particle filter. Experiments on Monte Carlo simulations and real physical phantom data demonstrate the power of the framework.
bioRxiv | 2018
Jiaan Dai; Fengchao Yu; Ning Li; Weichuan Yu
Motivation: Analyzing tandem mass spectrometry data to recognize peptides in a sample is the fundamental task in computational proteomics. Traditional peptide identification algorithms perform well when identifying unmodified peptides. However, when peptides have post-translational modifications (PTMs), these methods cannot provide satisfactory results. Recently, Chick et al. (2015) and Yu et al. (2016) proposed the spectrum-based and tag-based open search methods, respectively, to identify peptides with PTMs. While the performance of these two methods is promising, the identification results vary greatly with respect to the quality of tandem mass spectra and the number of PTMs in peptides. This motivates us to systematically study the relationship between the performance of open search methods and quality parameters of tandem mass spectrum data, as well as the number of PTMs in peptides. Results: Through large-scale simulations, we obtain the performance trend when simulated tandem mass spectra are of different quality. We propose an analytical model to describe the relationship between the probability of obtaining correct identifications and the spectrum quality as well as the number of PTMs. Based on the analytical model, we can quantitatively describe the necessary condition to effectively apply open search methods. Availability: Source codes of the simulation are available at http://bioinformatics.ust.hk/PST.html. Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
Journal of Proteome Research | 2018
Shichang Liu; Fengchao Yu; Qin Hu; Tingliang Wang; Lujia Yu; Shengwang Du; Weichuan Yu; Ning Li
An in planta chemical cross-linking-based quantitative interactomics (IPQCX-MS) workflow has been developed to investigate in vivo protein-protein interactions and alterations in protein structures in a model organism, Arabidopsis thaliana. A chemical cross-linker, azide-tag-modified disuccinimidyl pimelate (AMDSP), was directly applied onto Arabidopsis tissues. Peptides produced from protein fractions of CsCl density gradient centrifugation were dimethyl-labeled, from which the AMDSP cross-linked peptides were fractionated by chromatography, enriched, and analyzed by mass spectrometry. ECL2 and SQUA-D software were used to identify and quantitate these cross-linked peptides, respectively. These computer programs integrate peptide identification with quantitation and statistical evaluation. This workflow eventually identified 354 unique cross-linked peptides, including 61 and 293 inter- and intraprotein cross-linked peptides, respectively, demonstrating that it can identify hundreds of cross-linked peptides in vivo at an organismal level by overcoming the difficulties caused by multiple cellular structures and complex secondary metabolites of plants. Coimmunoprecipitation and super-resolution microscopy studies have confirmed the PHB3-PHB6 protein interaction found by IPQCX-MS. The quantitative interactomics also found hormone-induced structural changes of SBPase and other proteins. This mass-spectrometry-based interactomics will be useful in the study of in vivo protein-protein interaction networks in agricultural crops and plant-microbe interactions.
Bioinformatics | 2018
Jiaan Dai; Wei Jiang; Fengchao Yu; Weichuan Yu
Motivation: Cross‐linking technique coupled with mass spectrometry (MS) is widely used in the analysis of protein structures and protein‐protein interactions. In order to identify cross‐linked peptides from MS data, we need to consider all pairwise combinations of peptides, which is computationally prohibitive when the sequence database is large. To alleviate this problem, some heuristic screening strategies are used to reduce the number of peptide pairs during the identification. However, heuristic screening strategies may miss some true cross‐linked peptides. Results: We directly tackle the combination challenge without using any screening strategies. With the data structure of double‐ended queue, the proposed algorithm reduces the quadratic time complexity of exhaustive searching down to the linear time complexity. We implement the algorithm in a tool named Xolik. The running time of Xolik is validated using databases with different numbers of proteins. Experiments using synthetic and empirical datasets show that Xolik outperforms existing tools in terms of running time and statistical power. Availability and implementation: Source code and binaries of Xolik are freely available at http://bioinformatics.ust.hk/Xolik.html. Supplementary information: Supplementary data are available at Bioinformatics online.
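How a double-ended queue can replace the pairwise scan is sketched below. This is an illustration in the spirit of the abstract, not Xolik's actual code: it assumes an additive pair score and uses the standard sliding-window-maximum technique. Peptides are visited in ascending mass order, so the window of admissible partner masses slides monotonically downward, and the deque always holds the best-scoring partner at its front. The function name and mass model are hypothetical, and the sort shown here adds an O(n log n) step that would only need to be done once per database.

```python
from collections import deque


def best_pair_deque(masses, scores, target_mass, tol):
    """Best-scoring pair whose masses sum to target_mass within tol.

    Assumes score(pair) = scores[a] + scores[b]. After sorting, the scan
    itself is linear: each peptide enters and leaves the deque at most once.
    """
    order = sorted(range(len(masses)), key=lambda k: masses[k])
    window = deque()       # candidate partners, scores decreasing front to back
    nxt = len(order) - 1   # next partner to admit, from heaviest to lightest
    best = None
    for i in order:        # ascending mass, so the partner window moves down
        lo = target_mass - masses[i] - tol
        hi = target_mass - masses[i] + tol
        # Admit partners that are now heavy enough to be in the window.
        while nxt >= 0 and masses[order[nxt]] >= lo:
            j = order[nxt]
            # Older candidates with a lower score can never win again.
            while window and scores[window[-1]] <= scores[j]:
                window.pop()
            window.append(j)
            nxt -= 1
        # Evict partners that have become too heavy.
        while window and masses[window[0]] > hi:
            window.popleft()
        if window:
            j = window[0]
            total = scores[i] + scores[j]
            if best is None or total > best[0]:
                best = (total, i, j)
    return best  # (pair score, peptide index, partner index) or None
```

In this sketch a peptide may be paired with itself, which corresponds to two copies of the same peptide being linked; excluding that case would need an extra check.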
bioRxiv | 2017
Fengchao Yu; Ning Li; Weichuan Yu
Chemical cross-linking coupled with mass spectrometry is a powerful tool to study protein-protein interactions and protein conformations. Two linked peptides are ionized and fragmented to produce a tandem mass spectrum. In such an experiment, a tandem mass spectrum contains ions from two peptides. The peptide identification problem becomes a peptide-peptide pair identification problem. Currently, most existing tools do not search all possible pairs due to the quadratic time complexity. Consequently, a significant percentage of linked peptides are missed. In our earlier work, we developed a tool named ECL to search all pairs of peptides exhaustively. While ECL does not miss any linked peptides, it is very slow due to the quadratic computational complexity, especially when the database is large. Furthermore, ECL uses a score function without statistical calibration, while researchers [1,2] have demonstrated that using a statistically calibrated score function can achieve a higher sensitivity than using an uncalibrated one. Here, we propose an advanced version of ECL, named ECL 2.0. It achieves a linear time and space complexity by taking advantage of the additive property of a score function. It can analyze a typical data set containing tens of thousands of spectra using a large-scale database containing thousands of proteins in a few hours. Comparison with five other state-of-the-art tools shows that ECL 2.0 is much faster than pLink, StavroX, ProteinProspector, and ECL. Kojak is the only tool that is faster than ECL 2.0, but Kojak does not exhaustively search all possible peptide pairs. We also adopt an e-value estimation method to calibrate the original score. Comparison shows that ECL 2.0 has the highest sensitivity among the state-of-the-art tools. The experiment using a large-scale in vivo cross-linking data set demonstrates that ECL 2.0 is the only tool that can find PSMs passing the false discovery rate threshold. The result illustrates that an exhaustive search and a well-calibrated score function are useful to find PSMs from a huge search space.
Journal of the Optical Society of America A: Optics, Image Science, and Vision | 2012
Fengchao Yu; Huafeng Liu; Zhenghui Hu; Pengcheng Shi
As a consequence of the random nature of photon emissions and detections, the data collected by a positron emission tomography (PET) imaging system can be shown to be Poisson distributed. Meanwhile, there have been considerable efforts within the tracer kinetic modeling communities aimed at establishing the relationship between the PET data and physiological parameters that affect the uptake and metabolism of the tracer. Both statistical and physiological models are important to PET reconstruction. The majority of previous efforts are based on simplified, nonphysical mathematical expressions, such as Poisson modeling of the measured data, which is, on the whole, completed without consideration of the underlying physiology. In this paper, we propose a graphics processing unit (GPU)-accelerated reconstruction strategy that can take both the statistical model and the physiological model into consideration with the aid of state-space evolution equations. The proposed strategy formulates the organ activity distribution through tracer kinetics models and the photon-counting measurements through observation equations, thus making it possible to unify these two constraints into a general framework. In order to accelerate reconstruction, GPU-based parallel computing is introduced. Experiments on Zubal-thorax-phantom data, Monte Carlo simulated phantom data, and real phantom data show the power of the method. Furthermore, thanks to the computing power of the GPU, the reconstruction time is practical for clinical application.