Publications

Featured research published by Chaochun Wei.


PLOS Biology | 2003

The genome sequence of Caenorhabditis briggsae: A platform for comparative genomics

Lincoln Stein; Zhirong Bao; Darin Blasiar; Thomas Blumenthal; Michael R. Brent; Nansheng Chen; Asif T. Chinwalla; Laura Clarke; Chris Clee; Avril Coghlan; Alan Coulson; Peter D'Eustachio; David H. A. Fitch; Lucinda A. Fulton; Robert Fulton; Sam Griffiths-Jones; Todd W. Harris; LaDeana W. Hillier; Ravi S. Kamath; Patricia E. Kuwabara; Elaine R. Mardis; Marco A. Marra; Tracie L. Miner; Patrick Minx; James C. Mullikin; Robert W. Plumb; Jane Rogers; Jacqueline E. Schein; Marc Sohrmann; John Spieth

The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all known noncoding RNAs (ncRNAs) are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C. briggsae, we found strong evidence for 1,300 new C. elegans genes. In addition, comparisons of the two genomes will help us understand the evolutionary forces that mold nematode genomes.


The ISME Journal | 2010

Interactions between gut microbiota, host genetics and diet relevant to development of metabolic syndromes in mice

Chenhong Zhang; Menghui Zhang; Wang S; Ruijun Han; Youfang Cao; Weiying Hua; Yuejian Mao; Xiaojun Zhang; Xiaoyan Pang; Chaochun Wei; Guoping Zhao; Yan Chen (陈雁); Liping Zhao

Both genetic variations and diet-disrupted gut microbiota can predispose animals to metabolic syndromes (MS). This study assessed the relative contributions of host genetics and diet in shaping the gut microbiota and modulating MS-relevant phenotypes in mice. Together with its wild-type (Wt) counterpart, the Apoa-I knockout mouse, which has impaired glucose tolerance (IGT) and increased body fat, was fed a high-fat diet (HFD) or normal chow (NC) diet for 25 weeks. DNA fingerprinting and bar-coded pyrosequencing of 16S rRNA genes were used to profile gut microbiota structures, and Partial Least Squares Discriminant Analysis was used to identify the key population changes relevant to MS development. Diet changes explained 57% of the total structural variation in gut microbiota, whereas genetic mutation accounted for no more than 12%. All three groups with IGT had significantly different gut microbiota relative to healthy Wt/NC-fed animals. In all, 65 species-level phylotypes were identified as key members with differential responses to changes in diet, genotype and MS phenotype. Most notably, gut barrier-protecting Bifidobacterium spp. were nearly absent in all animals on HFD, regardless of genotype. Sulphate-reducing, endotoxin-producing bacteria of the family Desulfovibrionaceae were enriched in all animals with IGT, most significantly in the Wt/HFD group, which had the highest calorie intake and the most serious MS phenotypes. Thus, diet has a dominant role in shaping gut microbiota, and changes in some key populations may transform the gut microbiota of Wt animals into a pathogen-like entity relevant to the development of MS, despite a complete host genome.


Bioinformatics | 2008

ITFP: an integrated platform of mammalian transcription factors

Guangyong Zheng; Kang Tu; Qing Yang; Yun Xiong; Chaochun Wei; Lu Xie; Yangyong Zhu; Yixue Li

Investigation of transcription factors (TFs) and their downstream regulated genes (targets) is a significant issue in the post-genome era and can provide new insight into vital biological processes. However, information on mammalian TFs and their targets is far from sufficient. Here, we developed an integrated TF platform (ITFP), which includes a large number of mammalian TFs and their targets. The current release of ITFP includes 4,105 putative TFs and 69,496 potential TF-target pairs for human, 3,134 putative TFs and 37,040 potential TF-target pairs for mouse, and 1,114 putative TFs and 18,055 potential TF-target pairs for rat. ITFP will serve as an important resource for the transcription research community and provide strong support for regulatory network studies.


Genome Research | 2014

Whole-genome sequencing of six dog breeds from continuous altitudes reveals adaptation to high-altitude hypoxia

Xiao Gou; Zhen Wang; Ning Li; Feng Qiu; Ze Xu; Dawei Yan; Shuli Yang; Jia Jia; Xiaoyan Kong; Zehui Wei; Shaoxiong Lu; Linsheng Lian; Changxin Wu; Xueyan Wang; Guozhi Li; Teng Ma; Qiang Jiang; Xue Zhao; Jiaqiang Yang; Baohong Liu; Dongkai Wei; Hong Li; Jianfa Yang; Yulin Yan; Guiying Zhao; Xingxing Dong; Mingli Li; Weidong Deng; Jing Leng; Chaochun Wei

The hypoxic environment imposes severe selective pressure on species living at high altitude. To understand the genetic basis of adaptation to high altitude in dogs, we performed whole-genome sequencing of 60 dogs, including five breeds living at continuous altitudes along the Tibetan Plateau from 800 to 5,100 m as well as one European breed. Sequencing coverage of more than 150× for each breed provides a comprehensive assessment of the genetic polymorphisms of these dogs, including Tibetan Mastiffs. Comparison of the breeds from different altitudes reveals strong signals of population differentiation at the loci of hypoxia-related genes, including endothelial Per-Arnt-Sim (PAS) domain protein 1 (EPAS1) and the beta hemoglobin cluster. Notably, four novel nonsynonymous mutations specific to high-altitude dogs are identified in EPAS1, one of which occurs at a highly conserved site in the PAS domain. Association testing between EPAS1 genotypes and blood-related phenotypes in additional high-altitude dogs reveals that the homozygous mutation is associated with decreased blood flow resistance, which may help to improve hemorheologic fitness. Interestingly, EPAS1 has also been identified as a selective target in Tibetan highlanders, although no amino acid changes were found in humans. Thus, our results not only indicate parallel evolution of humans and dogs in adaptation to high-altitude hypoxia but also provide a new opportunity to study the role of EPAS1 in the adaptive process.


PLOS ONE | 2009

More than 9,000,000 Unique Genes in Human Gut Bacterial Community: Estimating Gene Numbers Inside a Human Body

Xing Yang; Lu Xie; Yixue Li; Chaochun Wei

Background: Estimating the number of genes in the human genome has long been an important problem in computational biology. With the newer concept of the human as a super-organism, it is also interesting to estimate the number of genes in this human super-organism.

Principal Findings: We present an estimate of the number of genes in the human gut bacterial community, the largest microbial community inside the human super-organism. We obtained 552,700 unique genes from 202 complete human gut bacterial genomes. A novel gene counting model was then built to estimate the total number of genes by combining culture-independent sequence data with those complete genomes. 16S rRNAs were used to construct a three-level tree, and different counting methods were introduced for the three levels: strain-to-species, species-to-genus, and genus-and-up. After merging genes with 97% or higher identity, the model estimates the total number of genes to be about 9,000,000.

Conclusion: By combining currently available complete genomes with culture-independent sequencing data, we built a model to estimate the number of genes in the human gut bacterial community. The total number of genes is estimated to be about 9 million. Although this number is huge, we believe it is an underestimate. This is an initial step toward tackling the gene counting problem for the human super-organism, which is likely to remain open in the near future. The list of genomes used in this paper can be found in the supplementary table.
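The merging step in this abstract (genes with 97% or higher identity counted once) can be illustrated with a minimal greedy clustering sketch. It is not the authors' counting model: it assumes the gene sequences are already aligned to equal length, measures identity simply as the fraction of matching positions, and the helper names `identity` and `count_unique_genes` are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def count_unique_genes(seqs: list[str], threshold: float = 0.97) -> int:
    """Greedy, order-dependent clustering (hypothetical helper).

    Each sequence is compared with the representatives kept so far. If it matches
    any of them at >= threshold identity it is treated as merged; otherwise it
    becomes a new representative. The number of representatives is the gene count.
    """
    representatives: list[str] = []
    for s in seqs:
        # Only equal-length sequences are comparable under this simplified identity.
        merged = any(
            len(r) == len(s) and identity(r, s) >= threshold for r in representatives
        )
        if not merged:
            representatives.append(s)
    return len(representatives)


if __name__ == "__main__":
    base = "A" * 100
    near = "C" * 2 + "A" * 98  # 98% identical to base, so it is merged
    far = "G" * 5 + "A" * 95   # 95% identical to base, so it is kept separate
    print(count_unique_genes([base, near, far]))  # 2
```

Real pipelines compute identity from pairwise alignments of unequal-length sequences; the equal-length restriction here only keeps the sketch short.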


PLOS ONE | 2013

NeSSM: A Next-Generation Sequencing Simulator for Metagenomics

Ben Jia; Liming Xuan; Kaiye Cai; Zhiqiang Hu; Liangxiao Ma; Chaochun Wei

Background: Metagenomics can reveal the vast majority of microbes that have been missed by traditional cultivation-based methods. Because of its extremely wide range of applications, fast, high-fidelity metagenome sequencing simulation systems are in great demand to facilitate the development and comparison of metagenomics analysis tools.

Results: We present a customizable metagenome simulation system, NeSSM (Next-generation Sequencing Simulator for Metagenomics). Combining currently available complete genomes, a community composition table and sequencing parameters, it can simulate metagenome sequencing better than existing systems. Sequencing error models based on the explicit distribution of errors at each base, together with sequencing coverage bias, are incorporated into the simulation. To improve the fidelity of simulation, NeSSM provides tools to estimate the sequencing error models, sequencing coverage bias and community composition directly from existing metagenome sequencing data. Currently, NeSSM supports single-end and paired-end sequencing for both 454 and Illumina platforms. In addition, a GPU (graphics processing unit) version of NeSSM has been developed to accelerate the simulation. By comparing simulated sequencing data from NeSSM with experimental metagenome sequencing data, we demonstrate that NeSSM performs better in many respects than popular existing metagenome simulators such as MetaSim, GemSIM and Grinder. The GPU version of NeSSM is more than an order of magnitude faster than MetaSim.

Conclusions: NeSSM is a fast simulation system for high-throughput metagenome sequencing. It can help in developing tools and evaluating strategies for metagenomics analysis, and it is freely available to academic users at http://cbb.sjtu.edu.cn/~ccwei/pub/software/NeSSM.php.
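To make the idea of a per-position error model concrete, here is a minimal Python sketch that samples reads from a community composition table and introduces substitutions with a position-specific probability. It is not NeSSM's implementation: it models substitutions only (no indels and no coverage bias), assumes genomes contain only A, C, G and T, and the function `simulate_reads` and its parameters are hypothetical.

```python
import random

BASES = "ACGT"


def simulate_reads(genomes, abundances, n_reads, read_len, error_rates, seed=0):
    """Sample single-end reads from a mock community (hypothetical helper).

    genomes:     dict mapping genome name -> sequence (A/C/G/T only)
    abundances:  dict mapping genome name -> relative abundance
    error_rates: list with one substitution probability per read position
    Returns a list of (genome_name, start, read) tuples.
    """
    if len(error_rates) != read_len:
        raise ValueError("need one error rate per read position")
    rng = random.Random(seed)
    names = list(abundances)
    # Weight by abundance x genome length so longer genomes yield proportionally more reads.
    weights = [abundances[n] * len(genomes[n]) for n in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        genome = genomes[name]
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for i, p in enumerate(error_rates):
            if rng.random() < p:
                # Substitute one of the three other bases at this position.
                read[i] = rng.choice([b for b in BASES if b != read[i]])
        reads.append((name, start, "".join(read)))
    return reads


if __name__ == "__main__":
    community = {"g1": "ACGTTGCAACGTAGCTAGCT", "g2": "TTTTGGGGCCCCAAAATTTT"}
    # Illustrative error profile that rises toward the end of the read.
    rates = [0.001 * (i + 1) for i in range(8)]
    for read in simulate_reads(community, {"g1": 0.7, "g2": 0.3}, 5, 8, rates):
        print(read)
```

In a fuller simulator the per-position rates (and a base-specific substitution matrix) would be estimated from real sequencing data, which is the role the abstract assigns to NeSSM's estimation tools.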


PLOS ONE | 2014

LAceP: Lysine Acetylation Site Prediction Using Logistic Regression Classifiers

Ting Hou; Guangyong Zheng; Pingyu Zhang; Jia Jia; Jing Li; Lu Xie; Chaochun Wei; Yixue Li

Background: Lysine acetylation is a crucial type of protein post-translational modification that is involved in many important cellular processes and serious diseases. However, identifying protein acetylation sites with traditional experimental methods is time-consuming and laborious, and these methods are not suited to identifying large numbers of acetylation sites quickly. Computational methods are therefore still very valuable for accelerating the discovery of lysine acetylation sites.

Results: In this study, many biological characteristics of acetylation sites were investigated, such as the amino acid sequence around the sites, the physicochemical properties of the amino acids and the transition probabilities of adjacent amino acids. A logistic regression method was then used to integrate this information into a novel lysine acetylation prediction system named LAceP. Compared with existing methods, LAceP outperforms most state-of-the-art methods. In particular, LAceP has a more balanced prediction capability for positive and negative datasets.

Conclusion: LAceP can integrate different biological features to predict lysine acetylation with high accuracy. An online web server is freely available at http://www.scbit.org/iPTM/.
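As a rough illustration of the logistic-regression step, the sketch below one-hot encodes the residues flanking a lysine and fits a plain logistic regression by batch gradient descent. It uses only sequence-window features; LAceP's physicochemical and transition-probability features are not reproduced, and the window size, learning rate, regularisation strength and function names are all assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_features(seq, pos, flank=3):
    """One-hot encode residues within `flank` of `pos` (the central lysine is skipped).

    Positions falling outside the sequence are encoded as all zeros.
    """
    feats = []
    for i in range(pos - flank, pos + flank + 1):
        if i == pos:
            continue
        aa = seq[i] if 0 <= i < len(seq) else None
        feats.extend(1.0 if aa == a else 0.0 for a in AMINO_ACIDS)
    return feats


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=300, l2=0.01):
    """Fit weights and bias by batch gradient descent on L2-regularised log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Toy data: lysine at index 3; "acetylated" windows are G-rich, the others D-rich.
    pos_seqs = ["GGGKGGG", "AGGKGGA"]
    neg_seqs = ["DDDKDDD", "ADDKDDA"]
    X = [window_features(s, 3) for s in pos_seqs + neg_seqs]
    y = [1] * len(pos_seqs) + [0] * len(neg_seqs)
    w, b = train_logreg(X, y)
    print(round(predict_proba(w, b, window_features("GGAKAGG", 3)), 2))
```

Additional feature groups would simply be appended to the vector returned by `window_features`, which is one way a single logistic model can integrate heterogeneous features.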


BMC Bioinformatics | 2008

The combination approach of SVM and ECOC for powerful identification and classification of transcription factor

Guangyong Zheng; Ziliang Qian; Qing Yang; Chaochun Wei; Lu Xie; Yangyong Zhu; Yixue Li

Background: Transcription factors (TFs) are core functional proteins that play important roles in controlling gene expression, and they are key factors in constructing gene regulatory networks. Traditionally, TFs have been identified and classified through experimental approaches. To save time and reduce costs, many computational methods have been developed to identify TFs among new proteins and to classify the resulting TFs. Although these methods have facilitated the screening of TFs to some extent, low accuracy remains a common problem. With the fast-growing number of new proteins, more precise algorithms for identifying TFs among new proteins and classifying them are in high demand.

Results: The support vector machine (SVM) algorithm was used to construct an automatic detector for TF identification, with protein domains and functional sites employed as feature vectors. The error-correcting output coding (ECOC) algorithm, which originated in information and communication engineering, was combined with SVM methodology for TF classification. The overall success rates of identification and classification reached 88.22% and 97.83%, respectively. Finally, a website was constructed to give users access to our tools (see the Availability and requirements section for the URL).

Conclusion: The SVM method was a valid and stable means of identifying TFs with protein domains and functional sites as feature vectors. The ECOC algorithm is a powerful method for multi-class classification problems; when combined with SVM, it can remarkably increase the accuracy of TF classification using protein domains and functional sites as feature vectors. In addition, our work suggests that the ECOC algorithm may succeed in a broad range of applications in biological data mining.
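The ECOC idea can be shown with a small decoding sketch: each class is assigned a binary codeword, one binary classifier (an SVM in the paper) predicts each bit, and the class whose codeword is nearest in Hamming distance wins. The 4-class, 7-bit exhaustive code below is chosen for illustration, the class labels are placeholders, and the binary classifiers are replaced by a given bit vector; none of these details come from the paper.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 4-class exhaustive code: every pair of codewords differs in 4 bits,
# so any single wrong bit from one binary classifier is still decoded correctly.
CODE_MATRIX = {
    "class_A": (1, 1, 1, 1, 1, 1, 1),
    "class_B": (0, 0, 0, 0, 1, 1, 1),
    "class_C": (0, 0, 1, 1, 0, 0, 1),
    "class_D": (0, 1, 0, 1, 0, 1, 0),
}


def ecoc_decode(bits, code_matrix=CODE_MATRIX):
    """Return the class whose codeword is closest (Hamming distance) to the predicted bits."""
    return min(code_matrix, key=lambda cls: hamming(code_matrix[cls], bits))


if __name__ == "__main__":
    # Outputs of 7 binary classifiers; bit 0 is wrong relative to class_C's codeword.
    predicted = (1, 0, 1, 1, 0, 0, 1)
    print(ecoc_decode(predicted))  # class_C
```

The error-correcting property comes from the minimum distance between codewords: with a minimum distance of 4, one misclassified bit still leaves the true codeword strictly closest.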


Nature | 2018

Genomic variation in 3,010 diverse accessions of Asian cultivated rice

W.Y. Wang; Ramil Mauleon; Zhiqiang Hu; Dmytro Chebotarov; Shuaishuai Tai; Zhichao Wu; Min Li; Tianqing Zheng; Roven Rommel Fuentes; Fan Zhang; Locedie Mansueto; Dario Copetti; Millicent Sanciangco; Kevin Palis; Jianlong Xu; Chen Sun; Binying Fu; Hongliang Zhang; Yongming Gao; Xiuqin Zhao; Fei Shen; Xiao Cui; Hong Yu; Zichao Li; Miaolin Chen; Jeffrey Detras; Yongli Zhou; Xinyuan Zhang; Yue Zhao; Dave Kudrna

Here we analyse genetic variation, population structure and diversity among 3,010 diverse Asian cultivated rice (Oryza sativa L.) genomes from the 3,000 Rice Genomes Project. Our results are consistent with the five major groups previously recognized, but also suggest several unreported subpopulations that correlate with geographic location. We identified 29 million single nucleotide polymorphisms, 2.4 million small indels and over 90,000 structural variations that contribute to within- and between-population variation. Using pan-genome analyses, we identified more than 10,000 novel full-length protein-coding genes and a high number of presence–absence variations. The complex patterns of introgression observed in domestication genes are consistent with multiple independent rice domestication events. The public availability of data from the 3,000 Rice Genomes Project provides a resource for rice genomics research and breeding.

Analyses of genetic variation and population structure based on over 3,000 cultivated rice (Oryza sativa) genomes reveal subpopulations that correlate with geographic location and patterns of introgression consistent with multiple rice domestication events.


BMC Plant Biology | 2012

PMRD: a curated database for genes and mutants involved in plant male reproduction

Xiao Cui; Qiudao Wang; Wenzhe Yin; Huayong Xu; Zoe A. Wilson; Chaochun Wei; Shenyuan Pan; Dabing Zhang

Background: Male reproduction is an essential biological event in the plant life cycle, separating the diploid sporophyte and haploid gametophyte generations, and involves the expression of approximately 20,000 genes. The control of male reproduction is also of economic importance for plant breeding and hybrid seed production. With the advent of forward and reverse genetics and genomic technologies, a large number of male reproduction-related genes have been identified. It is therefore extremely challenging for individual researchers to systematically collect, and continually update, all the available information on genes and mutants related to plant male reproduction. The aim of this study is to manually curate such gene and mutant information and provide a web-accessible resource to facilitate the effective study of plant male reproduction.

Description: The Plant Male Reproduction Database (PMRD) is a comprehensive resource for browsing and retrieving knowledge on genes and mutants related to plant male reproduction. It is based on literature and biological databases and includes 506 male sterile genes and 484 mutants with defects in male reproduction from a variety of plant species. Based on Gene Ontology (GO) annotations and the literature, information on a further 3,697 male reproduction-related genes was systematically collected and included, and gene expression and phenotypic information was captured from the literature by in-text curation. PMRD provides a web interface that allows users to easily access the curated annotations and genomic information, including full names, symbols, locations, sequences, expression patterns, gene functions, mutant phenotypes, male sterile categories and corresponding publications. PMRD also provides mini tools to search and browse expression patterns of genes in microarray datasets, run BLAST searches, convert gene IDs and generate gene networks. In addition, a MediaWiki engine and a forum have been integrated within the database, allowing users to share their knowledge, make comments and discuss topics.

Conclusion: PMRD provides an integrated link between genetic studies and the rapidly growing body of genomic information. As such, this database provides a global view of plant male reproduction and thus aids advances in this important area.

Collaboration


Top Co-Authors

Yixue Li
Chinese Academy of Sciences

Zhiqiang Hu
Shanghai Jiao Tong University

Guangyong Zheng
Chinese Academy of Sciences

Chen Sun
Shanghai Jiao Tong University

Lu Xie
Chinese Academy of Sciences

Kang Tu
Chinese Academy of Sciences

Michael R. Brent
Washington University in St. Louis

Huayong Xu
Shanghai Jiao Tong University

Jianxin Shi
Shanghai Jiao Tong University

Jinyuan Lu
Shanghai Jiao Tong University