Chuming Chen
University of Delaware
Publication
Featured research published by Chuming Chen.
Source Code for Biology and Medicine | 2014
Chuming Chen; Sari Khaleel; Hongzhan Huang; Cathy H. Wu
Background: When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provides flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets.
Methods: We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157:H7.
Results: Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacterial genomic short read sequences, with the focus on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance.
In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference-based assembly, as measured by assembly contiguity and correctness.
Conclusions: Trimming short read sequences can improve the quality of de novo and reference-based assembly as well as assembler performance. The parallel processing capability of ngsShoRT reduces trimming time and improves memory efficiency when dealing with large datasets. We recommend combining sequencing artifact removal with quality-score-based read filtering and base trimming as the most consistent method for improving sequence quality and downstream assemblies. The ngsShoRT source code, user guide and tutorial are available at http://research.bioinformatics.udel.edu/genomics/ngsShoRT/. ngsShoRT can be incorporated as a pre-processing step in genome and transcriptome assembly projects.
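The recommended combination of artifact removal, quality-score-based base trimming and read filtering can be illustrated on a single FASTQ read with the Python sketch below. It is not ngsShoRT's implementation (ngsShoRT is written in Perl), and the adapter prefix, the Phred+33 offset and all thresholds are illustrative assumptions rather than ngsShoRT defaults.

```python
from statistics import mean
from typing import Optional

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ (Phred+33) quality encoding
ADAPTER = "AGATCGGAAGAGC"  # assumption: common Illumina adapter prefix, for illustration


def phred_scores(quality: str) -> list[int]:
    """Convert an ASCII quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def clip_adapter(seq: str, qual: str, adapter: str) -> tuple[str, str]:
    """Artifact removal: cut the read at the first exact match of the adapter."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def trim_3prime(seq: str, qual: str, min_base_q: int) -> tuple[str, str]:
    """Base trimming: drop trailing bases whose Phred score is below min_base_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_base_q:
        end -= 1
    return seq[:end], qual[:end]


def preprocess(seq: str, qual: str, adapter: str = ADAPTER, min_base_q: int = 20,
               min_mean_q: float = 25, min_len: int = 30) -> Optional[tuple[str, str]]:
    """Return the cleaned (seq, qual) pair, or None if the read is filtered out."""
    seq, qual = clip_adapter(seq, qual, adapter)
    seq, qual = trim_3prime(seq, qual, min_base_q)
    # Read filtering: discard reads that are too short or low quality on average.
    if not seq or len(seq) < min_len:
        return None
    if mean(phred_scores(qual)) < min_mean_q:
        return None
    return seq, qual
```

For example, `preprocess("A" * 50, "I" * 30 + "#" * 20)` removes the 20 trailing bases (`#` encodes Phred 2) and returns the first 30 bases, while a read whose bases all score Phred 20 survives trimming but is discarded by the mean-quality filter. Trimming before filtering means each read is judged only on the bases that would actually reach the assembler.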
PLOS ONE | 2011
Chuming Chen; Darren A. Natale; Robert D. Finn; Hongzhan Huang; Jian Zhang; Cathy H. Wu; Raja Mazumder
The accelerating growth in the number of protein sequences taxes both the computational and manual resources needed to analyze them. One approach to dealing with this problem is to minimize the number of proteins subjected to such analysis in a way that minimizes loss of information. To this end we have developed a set of Representative Proteomes (RPs), each selected from a Representative Proteome Group (RPG) containing similar proteomes, grouped based on co-membership in UniRef50 clusters. A Representative Proteome is the proteome that can best represent all the proteomes in its group in terms of the majority of the sequence space and information. RPs at 75%, 55%, 35% and 15% co-membership thresholds (CMTs) are provided to allow users to decrease or increase the granularity of the sequence space based on their requirements. We find that a CMT of 55% (RP55) most closely follows standard taxonomic classifications. Further analysis of this set reveals that sequence space is reduced by more than 80% relative to UniProtKB, while retaining both sequence diversity (over 95% of InterPro domains) and annotation information (93% of experimentally characterized proteins). All sets can be browsed and are available for sequence similarity searches and download at http://www.proteininformationresource.org/rps, while the set of 637 RPs determined using a 55% CMT is also available for text searches. Potential applications include sequence similarity searches, protein classification and targeted protein annotation and characterization.
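To make the co-membership idea concrete, here is a small Python sketch. It assumes one plausible definition, in which the co-membership of proteome X with proteome Y is the fraction of UniRef50 clusters containing an X protein that also contain a Y protein, and it uses a hypothetical greedy grouping seeded by the proteome with the most clusters. The function names are illustrative, and the criteria actually used to choose each Representative Proteome are not reproduced here.

```python
def co_membership(x: set[str], y: set[str]) -> float:
    """Assumed definition: share of X's UniRef50 clusters that also contain a Y protein."""
    if not x:
        return 0.0
    return len(x & y) / len(x)


def greedy_groups(proteomes: dict[str, set[str]], cmt: float) -> list[list[str]]:
    """Hypothetical grouping: the largest ungrouped proteome seeds each group and
    absorbs every remaining proteome whose co-membership with it reaches cmt."""
    remaining = sorted(proteomes, key=lambda p: len(proteomes[p]), reverse=True)
    groups = []
    while remaining:
        seed = remaining.pop(0)
        members = [seed]
        for other in list(remaining):  # iterate over a copy while removing
            if co_membership(proteomes[other], proteomes[seed]) >= cmt:
                members.append(other)
                remaining.remove(other)
        groups.append(members)
    return groups
```

With toy inputs `{"A": {"c1", "c2", "c3", "c4", "c5"}, "B": {"c1", "c2", "c3", "c6"}, "C": {"c7", "c8", "c9"}}`, proteome B shares 3 of its 4 clusters with A (co-membership 0.75), so a threshold of 0.55 yields `[["A", "B"], ["C"]]`, whereas a threshold of 0.8 leaves three single-proteome groups. Raising the threshold splits the groups more finely, which mirrors the granularity trade-off between the RP75, RP55, RP35 and RP15 sets.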
Human Molecular Genetics | 2014
Daniel R. Crooks; Thanemozhi G. Natarajan; Suh Young Jeong; Chuming Chen; Sun Young Park; Hongzhan Huang; Manik C. Ghosh; Wing Hang Tong; Ronald G. Haller; Cathy H. Wu; Tracey A. Rouault
Iron-sulfur (Fe-S) clusters are ancient enzyme cofactors found in virtually all life forms. We evaluated the physiological effects of chronic Fe-S cluster deficiency in human skeletal muscle, a tissue that relies heavily on Fe-S cluster-mediated aerobic energy metabolism. Despite greatly decreased oxidative capacity, muscle tissue from patients deficient in the Fe-S cluster scaffold protein ISCU showed a predominance of type I oxidative muscle fibers and higher capillary density, enhanced expression of transcriptional co-activator PGC-1α and increased mitochondrial fatty acid oxidation genes. These Fe-S cluster-deficient muscles showed a dramatic up-regulation of the ketogenic enzyme HMGCS2 and the secreted protein FGF21 (fibroblast growth factor 21). Enhanced muscle FGF21 expression was reflected by elevated circulating FGF21 levels in the patients, and robust FGF21 secretion could be recapitulated by respiratory chain inhibition in cultured myotubes. Our findings reveal that mitochondrial energy starvation elicits a coordinated response in Fe-S-deficient skeletal muscle that is reflected systemically by increased plasma FGF21 levels.
BMC Genomics | 2005
David McKillen; Yian A Chen; Chuming Chen; Matthew J. Jenny; Harold F. Trent; Javier Robalino; David C. McLean; Paul S. Gross; Robert W. Chapman; Gregory W. Warr; Jonas S. Almeida
Background: The Marine Genomics project is a functional genomics initiative developed to provide a pipeline for the curation of Expressed Sequence Tags (ESTs) and gene expression microarray data for marine organisms. It provides a unique clearing-house for marine-specific EST and microarray data and is currently available at http://www.marinegenomics.org.
Description: The Marine Genomics pipeline automates the processing, maintenance, storage and analysis of EST and microarray data for an increasing number of marine species. It currently contains 19 species databases (over 46,000 EST sequences) that are maintained by registered users from local and remote locations in Europe and South America in addition to the USA. A collection of analysis tools is implemented. These include a pipeline upload tool for EST FASTA files, sequence trace files and microarray data, an annotative text search, automated sequence trimming, sequence quality control (QA/QC) editing, sequence BLAST capabilities and a tool for interactive submission to GenBank. Another feature of this resource is its integration with a scientific computing analysis environment implemented in MATLAB.
Conclusion: The conglomeration of multiple marine organisms with integrated analysis tools enables users to focus on comprehensive descriptions of transcriptomic responses to typical marine stresses. This cross-species data comparison and integration enables users to contain their research within a marine-oriented data management and analysis environment.
Database | 2012
Qinghua Wang; Cecilia N. Arighi; Benjamin L. King; Shawn W. Polson; James Vincent; Chuming Chen; Hongzhan Huang; Brewster F. Kingham; Shallee T. Page; Marc Farnum Rendino; William Kelley Thomas; Daniel W. Udwary; Cathy H. Wu
Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome.
International Geology Review | 1994
Yangshen Shi; Huafu Lu; Dong Jia; Dongsheng Cai; Shimin Wu; Chuming Chen; David G. Howell; Zenon C. Valin
The plate-tectonic evolution of the Tarim basin and nearby western Tianshan region during Paleozoic time is reconstructed in an effort to further constrain the tectonic evolution of Central Asia, providing insights into the formation and distribution of oil and gas resources. The Tarim plate developed from continental rifting that progressed during early Paleozoic time into a passive continental margin. The Yili terrane (central Tianshan) broke away from the present eastern part of Tarim and became a microcontinent located somewhere between the Junggar ocean and the southern Tianshan ocean. The southern Tianshan ocean, between the Tarim craton and the Yili terrane, was subducting beneath the Yili terrane from Silurian to Devonian time. During the Late Devonian-Early Carboniferous, the Tarim plate collided with the Yili terrane by sinistral accretional docking that resulted in a late Paleozoic deformational episode. Intracontinental shortening (A-type subduction) continued through the Permian with the crea...
F1000Research | 2014
Jennifer T. Wyffels; Benjamin L. King; James J. Vincent; Chuming Chen; Cathy H. Wu; Shawn W. Polson
Chondrichthyan fishes are a diverse class of gnathostomes that provide a valuable perspective on fundamental characteristics shared by all jawed and limbed vertebrates. Studies of phylogeny, species diversity, population structure, conservation, and physiology are accelerated by genomic, transcriptomic and protein sequence data. These data are widely available for many sarcopterygii (coelacanth, lungfish and tetrapods) and actinopterygii (ray-finned fish including teleosts) taxa, but limited for chondrichthyan fishes. In this study, we summarize available data for chondrichthyes and describe resources for one of the largest projects to characterize one of these fish, Leucoraja erinacea, the little skate. SkateBase (http://skatebase.org) serves as the skate genome project portal linking data, research tools, and teaching resources.
Bioinformatics | 2013
Chuming Chen; Zhiwen Li; Hongzhan Huang; Baris E. Suzek; Cathy H. Wu
Summary: We have developed a new web application for peptide matching using an Apache Lucene-based search engine. The Peptide Match service is designed to quickly retrieve all occurrences of a given query peptide from the UniProt Knowledgebase (UniProtKB), including isoforms. The matched proteins are shown in summary tables with rich annotations, including the matched sequence region(s) and links to corresponding proteins in a number of proteomic/peptide spectral databases. The results are grouped by taxonomy and can be browsed by organism, taxonomic group or taxonomy tree. The service supports queries in which isobaric leucine and isoleucine are treated as equivalent, an option for searching UniRef100 representative sequences, and dynamic queries to major proteomic databases. In addition to the web interface, we also provide RESTful web services. The underlying data are updated every 4 weeks in accordance with UniProt releases.
Availability: http://proteininformationresource.org/peptide.shtml.
Supplementary information: Supplementary data are available at Bioinformatics online.
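The leucine/isoleucine option can be illustrated with a naive Python scan over a dictionary of protein sequences. This is only a sketch of the matching semantics: the service itself uses an Apache Lucene-based search engine rather than a linear scan, and the function and parameter names here are hypothetical. Because every I is replaced one-for-one by L, positions found in the normalized sequence map directly back to the original sequence.

```python
def normalize(seq: str, il_equivalent: bool) -> str:
    """Upper-case a sequence and, optionally, map isoleucine (I) to leucine (L)."""
    seq = seq.upper()
    return seq.replace("I", "L") if il_equivalent else seq


def peptide_match(query: str, proteins: dict[str, str],
                  il_equivalent: bool = True) -> list[tuple[str, int, int]]:
    """Return (accession, start, end) for every occurrence of query.

    Positions are 1-based and inclusive; overlapping occurrences are reported.
    """
    q = normalize(query, il_equivalent)
    if not q:
        return []
    hits = []
    for accession, sequence in proteins.items():
        s = normalize(sequence, il_equivalent)
        start = s.find(q)
        while start != -1:
            hits.append((accession, start + 1, start + len(q)))
            # Resume one residue later so overlapping matches are not skipped.
            start = s.find(q, start + 1)
    return hits
```

For instance, `peptide_match("LAKQ", {"P1": "MKTAYIAKQR", "P2": "MLAKQLAKQ"})` reports the isoleucine-containing stretch of P1 at positions 6 to 9 as well as both occurrences in P2; with `il_equivalent=False`, only the two P2 hits remain.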
Methods in Molecular Biology | 2011
Chuming Chen; Hongzhan Huang; Cathy H. Wu
In past decades, a variety of publicly available data repositories and resources have been developed to support protein-related information management, data-driven hypothesis generation and biological knowledge discovery. However, there is also increasing confusion among researchers who are trying to quickly find the appropriate resources to help them solve their problems. In this chapter, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases and resources that are relevant to comparative proteomics research. We conclude the chapter by discussing the challenges and opportunities for developing new protein bioinformatics databases.
PLOS ONE | 2015
Christopher W. Resnyk; Chuming Chen; Hongzhan Huang; Cathy H. Wu; Jean Simon; Elisabeth Le Bihan-Duval; M. J. Duclos; Larry A. Cogburn
Genetic selection for enhanced growth rate in meat-type chickens (Gallus domesticus) is usually accompanied by excessive adiposity, which has negative impacts on both feed efficiency and carcass quality. Enhanced visceral fatness and several unique features of avian metabolism (i.e., fasting hyperglycemia and insulin insensitivity) mimic overt symptoms of obesity and related metabolic disorders in humans. Elucidation of the genetic and endocrine factors that contribute to excessive visceral fatness in chickens could also advance our understanding of human metabolic diseases. Here, RNA sequencing was used to examine differential gene expression in abdominal fat of genetically fat and lean chickens, which exhibit a 2.8-fold divergence in visceral fatness at 7 wk. Ingenuity Pathway Analysis revealed that many of the 1687 differentially expressed genes are associated with hemostasis, endocrine function and metabolic syndrome in mammals. Among the highest expressed genes in abdominal fat, across both genotypes, were 25 differentially expressed genes associated with de novo synthesis and metabolism of lipids. Over-expression of numerous adipogenic and lipogenic genes in the fat-line (FL) chickens suggests that in situ lipogenesis in chickens could make a more substantial contribution to expansion of visceral fat mass than previously recognized. Distinguishing features of the abdominal fat transcriptome in lean chickens were high abundance of multiple hemostatic and vasoactive factors and transporters, and ectopic expression of several hormones/receptors, which could control local vasomotor tone and proteolytic processing of adipokines, hemostatic factors and novel endocrine factors. Over-expression of several thrombogenic genes in abdominal fat of lean chickens stands in contrast to the pro-thrombotic state found in obese humans.
Clearly, divergent genetic selection for an extreme (2.5–2.8-fold) difference in visceral fatness provokes a number of novel regulatory responses that govern growth and metabolism of visceral fat in this unique avian model of juvenile-onset obesity and glucose-insulin imbalance.