Yunpeng Cai
Chinese Academy of Sciences
Publication
Featured research published by Yunpeng Cai.
Nucleic Acids Research | 2009
Yijun Sun; Yunpeng Cai; Li Liu; Fahong Yu; Michael L. Farrell; William McKendree; William G. Farmerie
Recent metagenomics studies of environmental samples suggested that microbial communities are much more diverse than previously reported, and deep sequencing will significantly increase the estimate of total species diversity. Massively parallel pyrosequencing technology enables ultra-deep sequencing of complex microbial populations rapidly and inexpensively. However, computational methods for analyzing large collections of 16S ribosomal sequences are limited. We proposed a new algorithm, referred to as ESPRIT, which addresses several computational issues with prior methods. We developed two versions of ESPRIT, one for personal computers (PCs) and one for computer clusters (CCs). The PC version is used for small- and medium-scale data sets and can process several tens of thousands of sequences within a few minutes, while the CC version is for large-scale problems and is able to analyze several hundreds of thousands of reads within one day. Large-scale experiments are presented that clearly demonstrate the effectiveness of the newly proposed algorithm. The source code and user guide are freely available at http://www.biotech.ufl.edu/people/sun/esprit.html.
Briefings in Bioinformatics | 2012
Yijun Sun; Yunpeng Cai; Susan M. Huse; Rob Knight; William G. Farmerie; Xiaoyu Wang; Volker Mai
Recent advances in massively parallel sequencing technology have created new opportunities to probe the hidden world of microbes. Taxonomy-independent clustering of the 16S rRNA gene is usually the first step in analyzing microbial communities. Dozens of algorithms have been developed in the last decade, but a comprehensive benchmark study is lacking. Here, we survey algorithms currently used by microbiologists, and compare seven representative methods in a large-scale benchmark study that addresses several issues of concern. A new experimental protocol was developed that allows different algorithms to be compared using the same platform, and several criteria were introduced to facilitate a quantitative evaluation of the clustering performance of each algorithm. We found that existing methods vary widely in their outputs, and that inappropriate use of distance levels for taxonomic assignments likely resulted in substantial overestimates of biodiversity in many studies. The benchmark study identified our recently developed ESPRIT-Tree, a fast implementation of the average linkage-based hierarchical clustering algorithm, as one of the best algorithms available in terms of computational efficiency and clustering accuracy.
Nucleic Acids Research | 2011
Yunpeng Cai; Yijun Sun
Taxonomy-independent analysis plays an essential role in microbial community analysis. Hierarchical clustering is one of the most widely employed approaches to finding operational taxonomic units, the basis for many downstream analyses. Most existing algorithms have quadratic space and computational complexities, and thus can be used only for small or medium-scale problems. We propose a new online learning-based algorithm that simultaneously addresses the space and computational issues of prior work. The basic idea is to partition a sequence space into a set of subspaces using a partition tree constructed using a pseudometric, then recursively refine a clustering structure in these subspaces. The technique relies on new methods for fast closest-pair searching and efficient dynamic insertion and deletion of tree nodes. To avoid exhaustive computation of pairwise distances between clusters, we represent each cluster of sequences as a probabilistic sequence, and define a set of operations to align these probabilistic sequences and compute genetic distances between them. We present analyses of space and computational complexity, and demonstrate the effectiveness of our new algorithm using a human gut microbiota data set with over one million sequences. The new algorithm exhibits a quasilinear time and space complexity comparable to greedy heuristic clustering algorithms, while achieving a similar accuracy to the standard hierarchical clustering algorithm.
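The idea of summarizing a cluster as a probabilistic sequence can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the sequences are already aligned to equal length with no gaps, and the names `profile`, `expected_distance`, and `merge` are hypothetical. The alignment operations between probabilistic sequences described in the abstract are not reproduced here.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumption: gap-free nucleotide sequences


def profile(seqs):
    """Per-column base frequencies for equal-length, pre-aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        columns.append({b: counts.get(b, 0) / len(seqs) for b in ALPHABET})
    return columns


def expected_distance(p, q):
    """Mean probability that bases drawn from p and q differ at a column."""
    mismatch = [
        1.0 - sum(cp[b] * cq[b] for b in ALPHABET)
        for cp, cq in zip(p, q)
    ]
    return sum(mismatch) / len(mismatch)


def merge(p, n_p, q, n_q):
    """Combine two profiles built from n_p and n_q sequences respectively."""
    total = n_p + n_q
    return [
        {b: (cp[b] * n_p + cq[b] * n_q) / total for b in ALPHABET}
        for cp, cq in zip(p, q)
    ]
```

Under these assumptions, `expected_distance` equals the average pairwise mismatch fraction between the members of two clusters, so one profile comparison stands in for all sequence-to-sequence comparisons, and `merge` updates a cluster summary without revisiting its members.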
Molecular Ecology | 2013
Drion G. Boucias; Yunpeng Cai; Yijun Sun; Verena-Ulrike Lietze; Ruchira Sen; Rhitoban Raychoudhury; Michael E. Scharf
Reticulitermes flavipes (Isoptera: Rhinotermitidae) is a highly eusocial insect that thrives on recalcitrant lignocellulosic diets through nutritional symbioses with gut‐dwelling prokaryotes and eukaryotes. In the R. flavipes hindgut, there are up to 12 eukaryotic protozoan symbionts; the number of prokaryotic symbionts has been estimated in the hundreds. Despite its biological relevance, this diverse community, to date, has been investigated only by culture‐ and cloning‐dependent methods. Moreover, it is unclear how termite gut microbiomes respond to diet changes and what roles they play in lignocellulose digestion. This study utilized high‐throughput 454 pyrosequencing of 16S V5‐V6 amplicons to sample the hindgut lumen prokaryotic microbiota of R. flavipes and to examine compositional changes in response to lignin‐rich and lignin‐poor cellulose diets after a 7‐day feeding period. Of the ~475 000 high‐quality reads that were obtained, 99.9% were annotated as bacteria and 0.11% as archaea. Major bacterial phyla included Spirochaetes (24.9%), Elusimicrobia (19.8%), Firmicutes (17.8%), Bacteroidetes (14.1%), Proteobacteria (11.4%), Fibrobacteres (5.8%), Verrucomicrobia (2.0%), Actinobacteria (1.4%) and Tenericutes (1.3%). The R. flavipes hindgut lumen prokaryotic microbiota was found to contain over 4761 species‐level phylotypes. However, diet‐dependent shifts were not statistically significant or uniform across colonies, suggesting significant environmental and/or host genetic impacts on colony‐level microbiome composition. These results provide insights into termite gut microbiome diversity and suggest that (i) the prokaryotic gut microbiota is much more complex than previously estimated, and (ii) environment, founding reproductive pair effects and/or host genetics influence microbiome composition.
Cancer Epidemiology, Biomarkers & Prevention | 2012
Virginia Urquidi; Steve Goodison; Yunpeng Cai; Yijun Sun; Charles J. Rosser
Background: Bladder cancer is among the five most common malignancies worldwide, and due to high rates of recurrence, one of the most prevalent. Improvements in noninvasive urine-based assays to detect bladder cancer would benefit both patients and health care systems. In this study, the goal was to identify urothelial cell transcriptomic signatures associated with bladder cancer. Methods: Gene expression profiling (Affymetrix U133 Plus 2.0 arrays) was applied to exfoliated urothelia obtained from a cohort of 92 subjects with known bladder disease status. Computational analyses identified candidate biomarkers of bladder cancer and an optimal predictive model was derived. Selected targets from the profiling analyses were monitored in an independent cohort of 81 subjects using quantitative real-time PCR (RT-PCR). Results: Transcriptome profiling data analysis identified 52 genes associated with bladder cancer (P ≤ 0.001) and gene models that optimally predicted class label were derived. RT-PCR analysis of 48 selected targets in an independent cohort identified a 14-gene diagnostic signature that predicted the presence of bladder cancer with high accuracy. Conclusions: Exfoliated urothelia sampling provides a robust analyte for the evaluation of patients with suspected bladder cancer. The refinement and validation of the multigene urothelial cell signatures identified in this preliminary study may lead to accurate, noninvasive assays for the detection of bladder cancer. Impact: The development of an accurate, noninvasive bladder cancer detection assay would benefit both the patient and health care systems through better detection, monitoring, and control of disease. Cancer Epidemiol Biomarkers Prev; 21(12); 2149–58. ©2012 AACR.
Insect Molecular Biology | 2013
Rhitoban Raychoudhury; Ruchira Sen; Yunpeng Cai; Yijun Sun; Verena-Ulrike Lietze; Drion G. Boucias; Michael E. Scharf
Termites are highly eusocial insects that thrive on recalcitrant materials like wood and soil and thus play important roles in global carbon recycling and also in damaging wooden structures. Termites, such as Reticulitermes flavipes (Rhinotermitidae), owe their success to their ability to extract nutrients from lignocellulose (a major component of wood) with the help of gut‐dwelling symbionts. With the aim to gain new insights into this enzymatic process we provided R. flavipes with a complex lignocellulose (wood) or pure cellulose (paper) diet and followed the resulting differential gene expression on a custom oligonucleotide‐microarray platform. We identified a set of expressed sequence tags (ESTs) with differential abundance between the two diet treatments and demonstrated the source (host/symbiont) of these genes, providing novel information on termite nutritional symbiosis. Our results reveal: (1) the majority of responsive wood‐ and paper‐abundant ESTs are from host and symbionts, respectively; (2) distinct pathways are associated with lignocellulose and cellulose feeding in both host and symbionts; and (3) sets of diet‐responsive ESTs encode putative digestive and wood‐related detoxification enzymes. Thus, this study illuminates the dynamics of termite nutritional symbiosis and reveals a pool of genes as potential targets for termite control and functional studies of termite‐symbiont interactions.
Nucleic Acids Research | 2010
Yijun Sun; Yunpeng Cai; Volker Mai; William G. Farmerie; Fahong Yu; Jian-Yong Li; Steve Goodison
With the aid of next-generation sequencing technology, researchers can now obtain millions of microbial signature sequences for diverse applications ranging from human epidemiological studies to global ocean surveys. The development of advanced computational strategies to maximally extract pertinent information from massive nucleotide data has become a major focus of the bioinformatics community. Here, we describe a novel analytical strategy including discriminant and topology analyses that enables researchers to deeply investigate the hidden world of microbial communities, far beyond basic microbial diversity estimation. We demonstrate the utility of our approach through a computational study performed on a previously published massive human gut 16S rRNA data set. The application of discriminant and topology analyses enabled us to derive quantitative disease-associated microbial signatures and describe microbial community structure in far more detail than previously achievable. Our approach provides rigorous statistical tools for sequence-based studies aimed at elucidating associations between known or unknown organisms and a variety of physiological or environmental conditions.
Genetic and Evolutionary Computation Conference | 2007
Yunpeng Cai; Xiaomin Sun; Hua Xu; Peifa Jia
This paper deals with the adaptive variance scaling issue in continuous Estimation of Distribution Algorithms (EDAs). The current adaptive variance scaling method in EDAs is found to suffer from imprecise structure learning. A new type of adaptation method is proposed to overcome this defect. The method measures the difference between the obtained population and the prediction of the probabilistic model, then calculates the scaling factor by minimizing the cross entropy between these two distributions. This approach calculates the scaling factor immediately rather than adapting it incrementally. Experiments show that this approach extends the class of problems that can be solved and improves the search efficiency in some cases. Moreover, the proposed approach allows each decomposed subspace to be assigned an individual scaling factor, which helps to solve problems with special dimensional properties.
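One way to see how a scaling factor can be obtained in a single step from a cross-entropy criterion is the following Python sketch. It is an illustration under stated assumptions, not the paper's method: the model is taken to be a Gaussian with diagonal covariance, the scaled model is N(mean, c * diag(variances)), and the function name `scaling_factor` is hypothetical. For this model the cross entropy between the sample set and the scaled Gaussian is, up to a constant, (d/2) log c + m/(2c), where d is the dimension and m is the mean squared Mahalanobis distance of the samples; setting the derivative to zero gives c = m/d.

```python
def scaling_factor(samples, mean, variances):
    """Closed-form c that minimizes the cross entropy between the samples
    and the scaled diagonal Gaussian N(mean, c * diag(variances))."""
    d = len(mean)
    total = 0.0
    for x in samples:
        # squared Mahalanobis distance under the unscaled diagonal model
        total += sum(
            (xi - mi) ** 2 / vi
            for xi, mi, vi in zip(x, mean, variances)
        )
    return total / (len(samples) * d)
```

A value above 1 indicates that the samples are spread more widely than the model predicts, and a value below 1 indicates the opposite. Applying the same function to the coordinates of each subspace separately would give one factor per subspace.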
Robot Soccer World Cup | 2002
Jinyi Yao; Jiang Chen; Yunpeng Cai; Shi Li
The RoboCup Simulation Server provides a wonderful challenge for all participants. This paper explains the key technology implemented by the Tsinghuaeolus RoboCup team, which played in the RoboCup environment, including basic adversarial skills, developed using Dynamic Programming in combination with a heuristic search algorithm, and a reactive strategy architecture. Tsinghuaeolus was the winner of RoboCup2001.
BMC Genomics | 2013
Ruchira Sen; Rhitoban Raychoudhury; Yunpeng Cai; Yijun Sun; Verena-Ulrike Lietze; Drion G. Boucias; Michael E. Scharf
Background: Termites are highly eusocial insects and show a division of labor whereby morphologically distinct individuals specialize in distinct tasks. In the lower termite Reticulitermes flavipes (Rhinotermitidae), non-reproducing individuals form the worker and soldier castes, which specialize in helping (e.g., brood care, cleaning, foraging) and defense behaviors, respectively. Workers are totipotent juveniles that can either undergo status quo molts or develop into soldiers or neotenic reproductives. This caste differentiation can be regulated by juvenile hormone (JH) and primer pheromones contained in soldier head extracts (SHE). Here we offered worker termites a cellulose diet treated with JH or SHE for 24-hr, or held them with live soldiers (LS) or live neotenic reproductives (LR). We then determined gene expression profiles of the host termite gut and protozoan symbionts concurrently using custom cDNA oligo-microarrays containing 10,990 individual ESTs. Results: JH was the most influential treatment (501 total ESTs affected), followed by LS (24 ESTs), LR (12 ESTs) and SHE treatments (6 ESTs). The majority of JH up- and downregulated ESTs were of host and symbiont origin, respectively; in contrast, SHE, LR and LS treatments had more uniform impacts on host and symbiont gene expression. Repeat “follow-up” bioassays investigating combined JH + SHE impacts in relation to individual JH and SHE treatments on a subset of array-positive genes revealed (i) JH and SHE treatments had opposite impacts on gene expression and (ii) JH + SHE impacts on gene expression were generally intermediate between JH and SHE. Conclusions: Our results show that JH impacts hundreds of termite and symbiont genes within 24-hr, strongly suggesting a role for the termite gut in JH-dependent caste determination. Additionally, differential impacts of SHE and LS treatments were observed that are in strong agreement with previous studies that specifically investigated soldier caste regulation. However, it is likely that gene expression outside the gut may be of equal or greater importance than gut gene expression.