Publications

Featured research published by Li Charlie Xia.


The ISME Journal | 2016

Correlation detection strategies in microbial data sets vary widely in sensitivity and precision

Sophie Weiss; Will Van Treuren; Catherine A. Lozupone; Karoline Faust; Jonathan Friedman; Ye Deng; Li Charlie Xia; Zhenjiang Zech Xu; Luke K. Ursell; Eric J. Alm; Amanda Birmingham; Jacob A. Cram; Jed A. Fuhrman; Jeroen Raes; Fengzhu Sun; Jizhong Zhou; Rob Knight

Disruption of healthy microbial communities has been linked to numerous diseases, yet microbial interactions are little understood. This is due in part to the large number of bacteria, and the much larger number of interactions (easily in the millions), making experimental investigation very difficult at best and necessitating the nascent field of computational exploration through microbial correlation networks. We benchmark the performance of eight correlation techniques on simulated and real data in response to challenges specific to microbiome studies: fractional sampling of ribosomal RNA sequences, uneven sampling depths, rare microbes and a high proportion of zero counts. Also tested is the ability to distinguish signals from noise, and detect a range of ecological and time-series relationships. Finally, we provide specific recommendations for correlation technique usage. Although some methods perform better than others, there is still considerable need for improvement in current techniques.


BMC Systems Biology | 2011

Extended local similarity analysis (eLSA) of microbial community and other time series data with replicates

Li Charlie Xia; Joshua A. Steele; Jacob A. Cram; Zoe G. Cardon; Sheri L. Simmons; Joseph J. Vallino; Jed A. Fuhrman; Fengzhu Sun

Background: The increasing availability of time series microbial community data from metagenomics and other molecular biological studies has enabled the analysis of large-scale microbial co-occurrence and association networks. Among the many analytical techniques available, the Local Similarity Analysis (LSA) method is unique in that it captures local and potentially time-delayed co-occurrence and association patterns in time series data that cannot otherwise be identified by ordinary correlation analysis. However, LSA as originally developed does not consider time series data with replicates, which hinders the full exploitation of available information. With replicates, it is possible to understand the variability of the local similarity (LS) score and to obtain its confidence interval.
Results: We extended our LSA technique to time series data with replicates and termed it extended LSA, or eLSA. Simulations showed the capability of eLSA to capture subinterval and time-delayed associations. We implemented the eLSA technique in an easy-to-use analytic software package. The software pipeline integrates data normalization, statistical correlation calculation, statistical significance evaluation, and association network construction steps. We applied the eLSA technique to microbial community and gene expression datasets, where unique time-dependent associations were identified.
Conclusions: The extended LSA technique was demonstrated to reveal statistically significant local and potentially time-delayed association patterns in replicated time series data beyond those found by ordinary correlation analysis. These statistically significant associations can provide insights into the real dynamics of biological systems. The newly designed eLSA software efficiently streamlines the analysis and is freely available from the eLSA homepage at http://meta.usc.edu/softs/lsa.
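To make "local and potentially time-delayed" association concrete, here is a simplified sketch of a local similarity score in the spirit of LSA's dynamic program. The sketch keeps two running scores along each alignment diagonal, one for positive and one for negative association. A run resets to zero whenever it would go negative, and only alignments whose delay is at most `max_delay` are considered. Several parts are assumptions rather than eLSA's exact procedure. It uses plain z-score normalization, handles a single non-replicated pair of equal-length series, and omits significance testing.

```python
import numpy as np


def zscore(series):
    """Center and scale a series (an assumption; eLSA offers its own normalizations)."""
    x = np.asarray(series, dtype=float)
    return (x - x.mean()) / x.std()


def local_similarity(x, y, max_delay=3):
    """Simplified LS score for two equal-length series.

    Returns (signed best score divided by series length, delay), where a
    positive delay means y lags behind x.
    """
    x, y = zscore(x), zscore(y)
    n = len(x)
    pos = np.zeros((n + 1, n + 1))  # best positively associated run ending at (i, j)
    neg = np.zeros((n + 1, n + 1))  # best negatively associated run ending at (i, j)
    best_score, best_sign, best_delay = 0.0, 1, 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if abs(i - j) > max_delay:
                continue  # outside the allowed time-delay band
            prod = x[i - 1] * y[j - 1]
            pos[i, j] = max(0.0, pos[i - 1, j - 1] + prod)
            neg[i, j] = max(0.0, neg[i - 1, j - 1] - prod)
            if pos[i, j] > best_score:
                best_score, best_sign, best_delay = pos[i, j], 1, j - i
            if neg[i, j] > best_score:
                best_score, best_sign, best_delay = neg[i, j], -1, j - i
    return best_sign * best_score / n, best_delay
```

For example, if `y` is `x` shifted two time points later (`np.roll(x, 2)`), the best run lies on the delay-2 diagonal. The function then reports a delay of 2 with a score close to 1.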


Bioinformatics | 2013

Efficient statistical significance approximation for local similarity analysis of high-throughput time series data

Li Charlie Xia; Dongmei Ai; Jacob A. Cram; Jed A. Fuhrman; Fengzhu Sun

MOTIVATION Local similarity analysis of biological time series data helps elucidate the varying dynamics of biological systems. However, its applications to large-scale high-throughput data are limited by slow permutation procedures for statistical significance evaluation. RESULTS We developed a theoretical approach to approximate the statistical significance of local similarity analysis based on the approximate tail distribution of the maximum partial sum of independent identically distributed (i.i.d.) random variables. Simulations show that the derived formula approximates the tail distribution reasonably well (starting at time points > 10 with no delay and > 20 with delay) and provides P-values comparable with those from permutations. The new approach enables efficient calculation of statistical significance for pairwise local similarity analysis, making possible all-to-all local association studies that would otherwise be prohibitive. As a demonstration, local similarity analysis of human microbiome time series shows that core operational taxonomic units (OTUs) are highly synergetic and some of the associations are body-site specific across samples. AVAILABILITY The new approach is implemented in our eLSA package, which now provides pipelines for faster local similarity analysis of time series data. The tool is freely available from eLSA's website: http://meta.usc.edu/softs/lsa. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
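The quantity being approximated is the tail of the maximum partial sum of i.i.d. variables, which a permutation procedure can only estimate by brute force. For intuition, that tail can be simulated directly. This sketch assumes i.i.d. standard normal steps as a stand-in for the products of two independent normalized series. It does not reproduce the paper's analytical formula, and the function names are hypothetical.

```python
import numpy as np


def max_partial_sum(steps):
    """Largest sum over any contiguous run of steps (an empty run counts as 0)."""
    best = current = 0.0
    for step in steps:
        current = max(0.0, current + step)  # restart the run once it goes negative
        best = max(best, current)
    return best


def empirical_tail(n, threshold, trials=2000, seed=0):
    """Monte Carlo estimate of P(max partial sum / sqrt(n) >= threshold).

    Assumes n i.i.d. standard normal steps per trial.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        steps = rng.standard_normal(n)
        if max_partial_sum(steps) / np.sqrt(n) >= threshold:
            hits += 1
    return hits / trials
```

A fixed seed redraws the same trials, so estimates at different thresholds are directly comparable. The cost grows with `trials * n` for every threshold and series length, which is the kind of expense an analytical approximation avoids.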


PLOS ONE | 2011

Accurate genome relative abundance estimation based on shotgun metagenomic reads

Li Charlie Xia; Jacob A. Cram; Ting Chen; Jed A. Fuhrman; Fengzhu Sun

Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or their variants, and they often produce biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. Maximum likelihood is employed to compute the Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give accurate and robust estimates on both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (at least 0.5% abundant in at least 50% of the datasets) in the human gut samples. Our results show substantial improvements over previous studies, such as correcting the over-estimated abundance of Bacteroides species in human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based), even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation as read sets grow and reference genome databases expand.
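The mixture-model idea can be illustrated with a toy expectation-maximization (EM) loop. Each read has a likelihood of originating from each reference genome. EM estimates the mixing proportions (the share of reads from each genome), and dividing by genome length turns those read shares into genome relative abundances. This is a minimal sketch, not GRAMMy itself. The read-by-genome likelihood matrix, the genome lengths and the function names are hypothetical inputs, and every read is assumed to have nonzero likelihood for at least one genome.

```python
import numpy as np


def em_mixture(likelihood, iters=200, tol=1e-10):
    """EM estimate of mixing proportions from a reads x genomes likelihood matrix."""
    lik = np.asarray(likelihood, dtype=float)
    n_genomes = lik.shape[1]
    pi = np.full(n_genomes, 1.0 / n_genomes)  # start from equal proportions
    for _ in range(iters):
        weighted = lik * pi
        # E-step: posterior probability that each read came from each genome
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: new proportions are the average responsibilities
        new_pi = resp.mean(axis=0)
        converged = np.abs(new_pi - pi).max() < tol
        pi = new_pi
        if converged:
            break
    return pi


def genome_relative_abundance(pi, genome_lengths):
    """Convert read proportions to genome proportions by correcting for genome size."""
    a = np.asarray(pi, dtype=float) / np.asarray(genome_lengths, dtype=float)
    return a / a.sum()
```

For instance, with three reads unambiguously from genome A and one from genome B, the read proportions are 0.75 and 0.25. If A's genome is three times longer than B's, the two genomes are equally abundant. This length correction addresses the genome size bias mentioned above.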


The ISME Journal | 2015

Cross-depth analysis of marine bacterial networks suggests downward propagation of temporal changes

Jacob A. Cram; Li Charlie Xia; David M. Needham; Rohan Sachdeva; Fengzhu Sun; Jed A. Fuhrman

Interactions among microbes and stratification across depths are both believed to be important drivers of microbial communities, though little is known about how microbial associations differ between and across depths. We have monitored the free-living microbial community at the San Pedro Ocean Time-series station, monthly, for a decade, at five different depths: 5 m, the deep chlorophyll maximum layer, 150 m, 500 m and 890 m (just above the sea floor). Here, we introduce microbial association networks that combine data from multiple ocean depths to investigate both within- and between-depth relationships, sometimes time-lagged, among microbes and environmental parameters. The euphotic zone, deep chlorophyll maximum and 890 m depth each contain two negatively correlated ‘modules’ (groups of many inter-correlated bacteria and environmental conditions) suggesting regular transitions between two contrasting environmental states. Two-thirds of pairwise correlations of bacterial taxa between depths lagged such that changes in the abundance of deeper organisms followed changes in shallower organisms. Taken in conjunction with previous observations of seasonality at 890 m, these trends suggest that planktonic microbial communities throughout the water column are linked to environmental conditions and/or microbial communities in overlying waters. Poorly understood groups including Marine Group A, Nitrospina and AEGEAN-169 clades contained taxa that showed diverse association patterns, suggesting these groups contain multiple ecological species, each shaped by different factors, which we have started to delineate. These observations build upon previous work at this location, lending further credence to the hypothesis that sinking particles and vertically migrating animals transport materials that significantly shape the time-varying patterns of microbial community composition.
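The finding that "changes in the abundance of deeper organisms followed changes in shallower organisms" is a statement about time-lagged association. A simple way to express it is to correlate the shallow series at month t with the deep series at month t + lag and pick the strongest lag. This sketch uses plain Pearson correlation as a stand-in and is not necessarily the association method used in the study. The function names and toy data are hypothetical.

```python
import numpy as np


def lagged_correlation(shallow, deep, lag):
    """Pearson correlation between shallow[t] and deep[t + lag]."""
    a = np.asarray(shallow, dtype=float)
    b = np.asarray(deep, dtype=float)
    if lag > 0:
        a, b = a[:-lag], b[lag:]
    return float(np.corrcoef(a, b)[0, 1])


def best_lag(shallow, deep, max_lag=3):
    """Lag (in time steps) with the strongest absolute correlation, and that correlation."""
    scores = {lag: lagged_correlation(shallow, deep, lag) for lag in range(max_lag + 1)}
    lag = max(scores, key=lambda k: abs(scores[k]))
    return lag, scores[lag]
```

If the deep series repeats the shallow one a month later, `best_lag` returns a lag of 1 with a correlation of 1.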


BMC Bioinformatics | 2010

PPLook: an automated data mining tool for protein-protein interaction

Shao-Wu Zhang; Yao-Jun Li; Li Charlie Xia; Quan Pan

Background: Extracting and visualizing protein-protein interactions (PPIs) from the literature is a meaningful topic in protein science, as it assists the identification of interactions among proteins. There is a lack of tools to extract PPIs and to visualize and classify the results.
Results: We developed a PPI search system, termed PPLook, which automatically extracts and visualizes PPIs from text. Given a query protein name, PPLook can search a dataset for other proteins interacting with it by using a keyword-dictionary pattern-matching algorithm, and display topological parameters such as the number of nodes, edges, and connected components. The visualization component of PPLook displays the interactions among the proteins in three-dimensional space using the OpenGL graphics interface. PPLook can also select proteins by semantic class, count the proteins of each semantic class that interact with the query protein, and count the number of articles reporting interactions involving the query protein. Moreover, PPLook provides heterogeneous search and a user-friendly graphical interface.
Conclusions: PPLook is an effective tool for biologists and biosystem developers who need to access PPI information from the literature. PPLook is freely available for non-commercial users at http://meta.usc.edu/softs/PPLook.
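Keyword-dictionary pattern matching can be pictured with a toy rule. A sentence counts as evidence of an interaction when it mentions the query protein, another protein from a name dictionary, and an interaction keyword. This is a minimal sketch under stated assumptions, not PPLook's actual algorithm. The dictionaries, the sentence-level rule and the function name are all hypothetical.

```python
import re

# Hypothetical dictionaries for illustration only.
PROTEINS = {"p53", "MDM2", "BRCA1", "RAD51"}
INTERACTION_WORDS = {"binds", "interacts", "phosphorylates", "associates"}


def find_partners(sentences, query):
    """Count sentences linking the query protein to each dictionary partner."""
    partners = {}
    for sentence in sentences:
        tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
        # Require the query protein and at least one interaction keyword.
        if query not in tokens or not tokens & INTERACTION_WORDS:
            continue
        for protein in (tokens & PROTEINS) - {query}:
            partners[protein] = partners.get(protein, 0) + 1
    return partners
```

The resulting partner counts could then serve as weighted edges of an interaction network around the query protein.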


BMC Genomics | 2017

Integrated metagenomic data analysis demonstrates that a loss of diversity in oral microbiota is associated with periodontitis

Dongmei Ai; Ruocheng Huang; Jin Wen; Chao Li; Jiangping Zhu; Li Charlie Xia

Background: Periodontitis is an inflammatory disease affecting the tissues supporting the teeth (periodontium). Integrative analysis of metagenomic samples from multiple periodontitis studies is a powerful way to examine microbiota diversity and interactions within the host oral cavity.
Methods: A total of 43 subjects were recruited to participate in two previous studies profiling the microbial community of human subgingival plaque samples using shotgun metagenomic sequencing. We integrated metagenomic sequence data from those two studies, including six healthy controls, 14 sites representative of stable periodontitis, 16 sites representative of progressing periodontitis, and seven periodontal sites of unknown status. We applied phylogenetic diversity, differential abundance, and network analyses, as well as clustering, to the integrated dataset to compare microbiological community profiles among the different disease states.
Results: We found alpha-diversity, i.e., mean species diversity in sites or habitats at a local scale, to be the single strongest predictor of subjects’ periodontitis status (P < 0.011). More specifically, healthy subjects had the highest alpha-diversity, while subjects with stable sites had the lowest alpha-diversity. From these results, we developed an alpha-diversity logistic model-based naive classifier able to perfectly predict the disease status of the seven subjects with unknown periodontal status (not used in training). Phylogenetic profiling resulted in the discovery of nine marker microbes, and these species are able to differentiate between stable and progressing periodontitis, achieving an accuracy of 94.4%. Finally, we found that the reduction of negatively correlated species is a notable signature of disease progression.
Conclusions: Our results consistently show a strong association between the loss of oral microbiota diversity and the progression of periodontitis, suggesting that metagenomic sequencing and phylogenetic profiling are predictive of early periodontitis, potentially enabling therapeutic intervention. Our results also support a keystone pathogen-mediated polymicrobial synergy and dysbiosis (PSD) model to explain the etiology of periodontitis. Apart from P. gingivalis, we identified three additional keystone species potentially mediating the progression of periodontitis, based on pathogenic characteristics similar to those of known keystone pathogens.
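The abstract defines alpha-diversity only as mean within-site species diversity and does not name the index used. The Shannon index is one common alpha-diversity measure and can be computed directly from species counts in a sample. Whether the study used this particular index is an assumption, and the function name is hypothetical.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over species with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one observation")
    h = 0.0
    for c in counts:
        if c > 0:  # absent species contribute nothing
            p = c / total
            h -= p * math.log(p)
    return h
```

An even community such as `[1, 1, 1, 1]` scores higher than a dominated one such as `[7, 1, 1, 1]`. That contrast is the kind of diversity loss the study associates with periodontitis.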


Nature Communications | 2017

CRISPR–Cas9-targeted fragmentation and selective sequencing enable massively parallel microsatellite analysis

GiWon Shin; Susan M. Grimes; Ho-Joon Lee; Billy Lau; Li Charlie Xia; Hanlee P. Ji

Microsatellites are multi-allelic and composed of short tandem repeats (STRs) with individual motifs composed of mononucleotides, dinucleotides or higher, including hexamers. Next-generation sequencing approaches and other STR assays rely on a limited number of PCR amplicons, typically in the tens. Here, we demonstrate STR-Seq, a next-generation sequencing technology that analyses over 2,000 STRs in parallel and provides accurate genotyping of microsatellites. STR-Seq employs in vitro CRISPR–Cas9-targeted fragmentation to produce specific DNA molecules covering the complete microsatellite sequence. Amplification-free library preparation provides single molecule sequences without unique molecular barcodes. STR-selective primers enable massively parallel, targeted sequencing of large STR sets. Overall, STR-Seq has higher throughput and improved accuracy, and it provides a greater number of informative haplotypes than other microsatellite analysis approaches. With these new features, STR-Seq can identify a 0.1% minor genome fraction in a DNA mixture composed of different, unrelated samples.
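An STR genotype is ultimately a count of how many copies of the repeat motif each allele carries. Because STR-Seq's molecules cover the complete microsatellite, a read spanning the locus can in principle be summarized by its longest run of back-to-back motif copies. This is a minimal sketch assuming an error-free read and a known motif. The function name is hypothetical, and this is not STR-Seq's genotyping pipeline.

```python
def longest_tandem_run(read, motif):
    """Largest number of back-to-back copies of motif anywhere in read."""
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    for start in range(len(read)):
        copies, pos = 0, start
        while read.startswith(motif, pos):  # extend the run one motif at a time
            copies += 1
            pos += len(motif)
        best = max(best, copies)
    return best
```

For the read `GGCACACACATT`, the dinucleotide motif `CA` occurs four times in tandem.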


Journal of Systems Science & Complexity | 2007

Phase Transition in Sequence Unique Reconstruction

Li Charlie Xia; Chan Zhou

In this paper, sequence unique reconstruction refers to the property that a sequence is uniquely reconstructable from all its K-tuples. We propose and study the phase transition behavior of the probability P(K) of unique reconstruction with respect to the tuple size K in random sequences (i.i.d. model). In Monte Carlo experiments, artificial proteins generated from the i.i.d. model exhibit a phase transition in which P(K) abruptly jumps from a low-value phase (e.g. < 0.1) to a high-value phase (e.g. > 0.9). Generalizing to any alphabet, we prove that for a random sequence of length L, when L is large enough, P(K) undergoes a sharp phase transition when p ≤ 0.1015, where p = P(two random letters match). In addition, we derive formulas to estimate the transition points, which may be of practical use in sequencing DNA by hybridization. Our study suggests that most proteins do not deviate greatly from random sequences in terms of unique reconstruction, while some “stubborn” proteins become uniquely reconstructable only at a very large K, which probably has biological implications.
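Whether a short sequence is uniquely reconstructable can be checked by brute force. The check searches for other strings that have the same K-tuples, chaining tuples whose first K-1 letters overlap the end of the string built so far. This sketch assumes K-tuples are counted with multiplicity. It then estimates P(K) for i.i.d. random sequences by Monte Carlo. The search is exponential in the worst case, so it suits only short sequences. It does not implement the paper's transition-point formulas, and all function names are hypothetical.

```python
import random
from collections import Counter


def ktuple_counts(seq, k):
    """Multiset of the overlapping length-k substrings (K-tuples) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def reconstructions(seq, k, limit=2):
    """Find up to `limit` distinct strings sharing seq's K-tuple multiset."""
    if not 1 <= k <= len(seq):
        raise ValueError("need 1 <= k <= len(seq)")
    remaining = ktuple_counts(seq, k)
    total = sum(remaining.values())
    found = set()

    def extend(current, used):
        if len(found) >= limit:
            return  # enough alternatives found; stop searching
        if used == total:
            found.add(current)
            return
        overlap = current[len(current) - (k - 1):]  # last k-1 letters ("" when k == 1)
        for tup in list(remaining):
            if remaining[tup] > 0 and tup[:k - 1] == overlap:
                remaining[tup] -= 1
                extend(current + tup[-1], used + 1)
                remaining[tup] += 1  # backtrack

    for first in list(remaining):  # try every tuple as the starting one
        remaining[first] -= 1
        extend(first, 1)
        remaining[first] += 1
    return found


def is_uniquely_reconstructable(seq, k):
    return len(reconstructions(seq, k)) == 1


def estimate_unique_probability(length, k, alphabet="ACGT", trials=200, seed=0):
    """Monte Carlo estimate of P(K) for i.i.d. uniform random sequences."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seq = "".join(rng.choice(alphabet) for _ in range(length))
        hits += is_uniquely_reconstructable(seq, k)
    return hits / trials
```

The toy string `AXAYA` illustrates the effect of K. Its 2-tuples can also be chained into `AYAXA` or `XAYAX`, so it is not unique at K = 2. At K = 3 only the original chaining survives.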


Nucleic Acids Research | 2016

A genome-wide approach for detecting novel insertion-deletion variants of mid-range size

Li Charlie Xia; Sukolsak Sakshuwong; Erik S. Hopmans; John M. Bell; Susan M. Grimes; David Siegmund; Hanlee P. Ji; Nancy R. Zhang

We present SWAN, a statistical framework for robust detection of genomic structural variants in next-generation sequencing data, and an analysis of mid-range size insertions and deletions (<10 kb) for whole-genome analysis and DNA mixtures. To identify these mid-range size events, SWAN collectively uses information from read pairs, read depth and one-end-mapped reads through statistical likelihoods based on Poisson field models. SWAN also uses soft-clip/split-read remapping to supplement the likelihood analysis and determine variant boundaries. The accuracy of SWAN is demonstrated by in silico spike-ins and by identification of known variants in the NA12878 genome. We used SWAN to identify a set of novel mid-range insertions/deletions that were confirmed by targeted deep re-sequencing. An R package implementation of SWAN is open source and freely available.
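SWAN's likelihoods combine several read signals under Poisson field models. The read-depth part can be illustrated in its simplest form. If reads in a window follow a Poisson distribution with expected count λ, the log-likelihood ratio for depth scaled by a fraction f (for example, f = 0.5 for a heterozygous deletion) is c·ln f + λ(1 − f) for an observed count c. This is a simplified single-signal sketch, not SWAN's statistics or its R interface. The window width, per-bin rate, fraction and function names are assumptions.

```python
import math

import numpy as np


def deletion_llr(observed, expected, fraction=0.5):
    """Poisson log-likelihood ratio: depth scaled by `fraction` vs. expected depth."""
    return observed * math.log(fraction) + expected * (1.0 - fraction)


def scan_depth(bin_counts, rate_per_bin, width, fraction=0.5):
    """Slide a window of `width` bins; return (best start bin, best LLR)."""
    counts = np.asarray(bin_counts, dtype=float)
    sums = np.convolve(counts, np.ones(width), mode="valid")  # read count per window
    expected = rate_per_bin * width
    scores = [deletion_llr(c, expected, fraction) for c in sums]
    start = int(np.argmax(scores))
    return start, scores[start]
```

A positive score favors reduced depth. With f = 0.5, this happens when the observed count falls below about 0.72 λ.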

Collaboration

Top Co-Authors

Fengzhu Sun
University of Southern California

Jacob A. Cram
University of Southern California

Jed A. Fuhrman
University of Southern California

Nancy R. Zhang
University of Pennsylvania

Dongmei Ai
University of Science and Technology Beijing