Publication


Featured research published by Yong Fuga Li.


Evolution | 2008

REVERSE ECOLOGY AND THE POWER OF POPULATION GENOMICS

Yong Fuga Li; James C. Costello; Alisha K. Holloway; Matthew W. Hahn

Rapid and inexpensive sequencing technologies are making it possible to collect whole genome sequence data on multiple individuals from a population. This type of data can be used to quickly identify genes that control important ecological and evolutionary phenotypes by finding the targets of adaptive natural selection, and we therefore refer to such approaches as “reverse ecology.” To quantify the power gained in detecting positive selection using population genomic data, we compare three statistical methods for identifying targets of selection: the McDonald–Kreitman test, the mkprf method, and a likelihood implementation for detecting dN/dS > 1. Because the first two methods use polymorphism data we expect them to have more power to detect selection. However, when applied to population genomic datasets from human, fly, and yeast, the tests using polymorphism data were actually weaker in two of the three datasets. We explore reasons why the simpler comparative method has identified more genes under selection, and suggest that the different methods may really be detecting different signals from the same sequence data. Finally, we find several statistical anomalies associated with the mkprf method, including an almost linear dependence between the number of positively selected genes identified and the prior distributions used. We conclude that interpreting the results produced by this method should be done with some caution.


Journal of Computational Biology | 2009

A Bayesian approach to protein inference problem in shotgun proteomics

Yong Fuga Li; Randy J. Arnold; Yixue Li; Predrag Radivojac; Quanhu Sheng; Haixu Tang

The protein inference problem represents a major challenge in shotgun proteomics. In this article, we describe a novel Bayesian approach to address this challenge by incorporating the predicted peptide detectabilities as the prior probabilities of peptide identification. We propose a rigorous probabilistic model for protein inference and provide practical algorithmic solutions to this problem. We used a complex synthetic protein mixture to test our method and obtained promising results.


European Symposium on Research in Computer Security | 2011

To release or not to release: evaluating information leaks in aggregate human-genome data

Xiaoyong Zhou; Bo Peng; Yong Fuga Li; Yangyi Chen; Haixu Tang; XiaoFeng Wang

The rapid progress of human genome studies leads to a strong demand for aggregate human DNA data (e.g., allele frequencies, test statistics, etc.), whose public dissemination, however, has been impeded by privacy concerns. Prior research shows that it is possible to identify the presence of some participants in a study from such data, and in some cases, even fully recover their DNA sequences. A critical issue, therefore, becomes how to evaluate such a risk on individual data sets and determine when they are safe to release. In this paper, we report our research that makes the first attempt to address this issue. We first identified the space of the aggregate-data-release problem, through examining common types of aggregate data and the typical threats they are facing. Then, we performed an in-depth study on different scenarios of attacks on different types of data, which sheds light on several fundamental questions in this problem domain. Particularly, we found that attacks on aggregate data are difficult in general, as the adversary often does not have enough information and needs to solve NP-complete or NP-hard problems. On the other hand, we acknowledge that the attacks can succeed under some circumstances, particularly, when the solution space of the problem is small. Based upon such an understanding, we propose a risk-scale system and a methodology to determine when to release an aggregate data set and when not to. We also used real human-genome data to verify our findings.


BMC Bioinformatics | 2012

Computational approaches to protein inference in shotgun proteomics

Yong Fuga Li; Predrag Radivojac

Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high-throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area.


Journal of Proteome Research | 2010

The Importance of Peptide Detectability for Protein Identification, Quantification, and Experiment Design in MS/MS Proteomics

Yong Fuga Li; Randy J. Arnold; Haixu Tang; Predrag Radivojac

Peptide detectability is defined as the probability that a peptide is identified in an LC-MS/MS experiment and has been useful in providing solutions to protein inference and label-free quantification. Previously, predictors for peptide detectability trained on standard or complex samples were proposed. Although the models trained on complex samples may benefit from the large training data sets, it is unclear to what extent they are affected by the unequal abundances of identified proteins. To address this challenge and improve detectability prediction, we present a new algorithm for the iterative learning of peptide detectability from complex mixtures. We provide evidence that the new method approximates detectability with useful accuracy and, based on its design, can be used to interpret the outcome of other learning strategies. We studied the properties of peptides from the bacterium Deinococcus radiodurans and found that at standard quantities, its tryptic peptides can be roughly classified as either detectable or undetectable, with a relatively small fraction having medium detectability. We extend the concept of detectability from peptides to proteins and apply the model to predict the behavior of a replicate LC-MS/MS experiment from a single analysis. Finally, our study summarizes a theoretical framework for peptide/protein identification and label-free quantification.


Research in Computational Molecular Biology | 2008

A Bayesian approach to protein inference problem in shotgun proteomics

Yong Fuga Li; Randy J. Arnold; Yixue Li; Predrag Radivojac; Quanhu Sheng; Haixu Tang

The protein inference problem represents a major challenge in shotgun proteomics. Here we describe a novel Bayesian approach to address this challenge that incorporates the predicted peptide detectabilities as the prior probabilities of peptide identification. Our model removes some unrealistic assumptions used in previous approaches and provides a rigorous probabilistic solution to this problem. We used a complex synthetic protein mixture to test our method, and obtained promising results.


Analytical Chemistry | 2010

Combinatorial Libraries of Synthetic Peptides as a Model for Shotgun Proteomics

Brian C. Bohrer; Yong Fuga Li; James P. Reilly; David E. Clemmer; Richard D. DiMarchi; Predrag Radivojac; Haixu Tang; Randy J. Arnold

A synthetic approach to model the analytical complexity of biological proteolytic digests has been developed. Combinatorial peptide libraries ranging in length between 9 and 12 amino acids that represent typical tryptic digests were designed, synthesized, and analyzed. Individual libraries and mixtures thereof were studied by replicate liquid chromatography-ion trap mass spectrometry and compared to a tryptic digest of Deinococcus radiodurans. Similar to complex proteome analysis, replicate study of individual libraries identified additional unique peptides. Fewer novel sequences were revealed with each additional analysis in a manner similar to that observed for biological data. Our results demonstrate a bimodal distribution of peptides sorting to either very low or very high levels of detection. Upon mixing of libraries at equal abundance, a length-dependent bias in favor of longer sequence identification was observed. Peptide identification as a function of site-specific amino acid content was characterized with certain amino acids proving to be of considerable importance. This report demonstrates that peptide libraries of defined character can serve as a reference for instrument characterization. Furthermore, they are uniquely suited to delineate the physical properties that influence identification of peptides, which provides a foundation for optimizing the study of samples with less defined heterogeneity.


BMC Bioinformatics | 2010

Structure-based kernels for the prediction of catalytic residues and their involvement in human inherited disease

Fuxiao Xin; Steven Myers; Yong Fuga Li; David Neil Cooper; Sean D. Mooney; Predrag Radivojac

Background: Enzyme catalysis is involved in numerous biological processes and the disruption of enzymatic activity has been implicated in human disease. Despite the functional importance, various aspects of catalytic reactions are not completely understood, such as the mechanics of reaction chemistry and the geometry of catalytic residues within active sites. As a result, the computational prediction of catalytic residues has the potential to identify novel catalytic pockets, aid in the design of more efficient enzymes and also predict the molecular basis of disease.


Computer and Communications Security | 2009

Learning your identity and disease from research papers: information leaks in genome wide association study

Rui Wang; Yong Fuga Li; XiaoFeng Wang; Haixu Tang; Xiaoyong Zhou


Statistics and Its Interface | 2012

Protein identification problem from a Bayesian point of view.

Yong Fuga Li; Randy J. Arnold; Predrag Radivojac; Haixu Tang

Collaboration


Dive into Yong Fuga Li's collaborations.

Top Co-Authors

Haixu Tang
Indiana University Bloomington

Predrag Radivojac
Indiana University Bloomington

Randy J. Arnold
Indiana University Bloomington

Brian C. Bohrer
Indiana University Bloomington

Quanhu Sheng
Indiana University Bloomington

XiaoFeng Wang
Indiana University Bloomington

Xiaoyong Zhou
Indiana University Bloomington

Yixue Li
Chinese Academy of Sciences