Fangfang Xia | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Fangfang Xia is active.

Explore More

Publication

Featured researches published by Fangfang Xia.

Nucleic Acids Research | 2014

The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST)

Ross Overbeek; Robert Olson; Gordon D. Pusch; Gary J. Olsen; James J. Davis; Terry Disz; Robert Edwards; Svetlana Gerdes; Bruce Parrello; Maulik Shukla; Veronika Vonstein; Alice R. Wattam; Fangfang Xia; Rick Stevens

In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.

Genome Research | 2011

Assemblathon 1: A competitive assessment of de novo short read assembly methods

Dent Earl; Keith Bradnam; John St. John; Aaron E. Darling; Dawei Lin; Joseph Fass; Hung On Ken Yu; Vince Buffalo; Daniel R. Zerbino; Mark Diekhans; Ngan Nguyen; Pramila Ariyaratne; Wing-Kin Sung; Zemin Ning; Matthias Haimel; Jared T. Simpson; Nuno A. Fonseca; Inanc Birol; T. Roderick Docking; Isaac Ho; Daniel S. Rokhsar; Rayan Chikhi; Dominique Lavenier; Guillaume Chapuis; Delphine Naquin; Nicolas Maillet; Michael C. Schatz; David R. Kelley; Adam M. Phillippy; Sergey Koren

Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.

Scientific Reports | 2015

RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes

Thomas Brettin; James J. Davis; Terry Disz; Robert Edwards; Svetlana Gerdes; Gary J. Olsen; Robert Olson; Ross Overbeek; Bruce Parrello; Gordon D. Pusch; Maulik Shukla; James Thomason; Rick Stevens; Veronika Vonstein; Alice R. Wattam; Fangfang Xia

The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

Nucleic Acids Research | 2017

Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center

Alice R. Wattam; James J. Davis; Rida Assaf; Sébastien Boisvert; Thomas Brettin; Christopher Bun; Neal Conrad; Emily M. Dietrich; Terry Disz; Joseph L. Gabbard; Svetlana Gerdes; Christopher S. Henry; Ronald Kenyon; Dustin Machi; Chunhong Mao; Eric K. Nordberg; Gary J. Olsen; Daniel Murphy-Olson; Robert Olson; Ross Overbeek; Bruce Parrello; Gordon D. Pusch; Maulik Shukla; Veronika Vonstein; Andrew S. Warren; Fangfang Xia; Hyun Seung Yoo; Rick Stevens

The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by ‘virtual integration’ to any of PATRICs public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.

Nucleic Acids Research | 2007

The National Microbial Pathogen Database Resource (NMPDR): a genomics platform based on subsystem annotation

Leslie K. McNeil; Claudia I. Reich; Ramy K. Aziz; Daniela Bartels; Matthew Cohoon; Terry Disz; Robert Edwards; Svetlana Gerdes; Kaitlyn Hwang; Michael Kubal; Gohar Rem Margaryan; Folker Meyer; William Mihalo; Gary J. Olsen; Robert Olson; Andrei L. Osterman; Daniel Paarmann; Tobias Paczian; Bruce Parrello; Gordon D. Pusch; Dmitry A. Rodionov; Xinghua Shi; Olga Vassieva; Veronika Vonstein; Olga Zagnitko; Fangfang Xia; Jenifer Zinner; Ross Overbeek; Rick Stevens

The National Microbial Pathogen Data Resource (NMPDR) () is a National Institute of Allergy and Infections Disease (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of ∼50 strains of pathogenic bacteria that are the focus of our curators, as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic Domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context. Investigators can browse subsystems and reactions to develop accurate reconstructions of the metabolic networks of any sequenced organism. NMPDR provides a comprehensive bioinformatics platform, with tools and viewers for genome analysis. Results of precomputed gene clustering analyses can be retrieved in tabular or graphic format with one-click tools. NMPDR tools include Signature Genes, which finds the set of genes in common or that differentiates two groups of organisms. Essentiality data collated from genome-wide studies have been curated. Drug target identification and high-throughput, in silico, compound screening are in development.

PLOS ONE | 2012

SEED Servers: High-Performance Access to the SEED Genomes, Annotations, and Metabolic Models

Ramy K. Aziz; Scott Devoid; Terrence Disz; Robert Edwards; Christopher S. Henry; Gary J. Olsen; Robert Olson; Ross Overbeek; Bruce Parrello; Gordon D. Pusch; Rick Stevens; Veronika Vonstein; Fangfang Xia

The remarkable advance in sequencing technology and the rising interest in medical and environmental microbiology, biotechnology, and synthetic biology resulted in a deluge of published microbial genomes. Yet, genome annotation, comparison, and modeling remain a major bottleneck to the translation of sequence information into biological knowledge, hence computational analysis tools are continuously being developed for rapid genome annotation and interpretation. Among the earliest, most comprehensive resources for prokaryotic genome analysis, the SEED project, initiated in 2003 as an integration of genomic data and analysis tools, now contains >5,000 complete genomes, a constantly updated set of curated annotations embodied in a large and growing collection of encoded subsystems, a derived set of protein families, and hundreds of genome-scale metabolic models. Until recently, however, maintaining current copies of the SEED code and data at remote locations has been a pressing issue. To allow high-performance remote access to the SEED database, we developed the SEED Servers (http://www.theseed.org/servers): four network-based servers intended to expose the data in the underlying relational database, support basic annotation services, offer programmatic access to the capabilities of the RAST annotation server, and provide access to a growing collection of metabolic models that support flux balance analysis. The SEED servers offer open access to regularly updated data, the ability to annotate prokaryotic genomes, the ability to create metabolic reconstructions and detailed models of metabolism, and access to hundreds of existing metabolic models. This work offers and supports a framework upon which other groups can build independent research efforts. Large integrations of genomic data represent one of the major intellectual resources driving research in biology, and programmatic access to the SEED data will provide significant utility to a broad collection of potential users.

Nucleic Acids Research | 2013

Building the repertoire of dispensable chromosome regions in Bacillus subtilis entails major refinement of cognate large-scale metabolic model

Kosei Tanaka; Christopher S. Henry; Jenifer Zinner; Edmond Jolivet; Matthew Cohoon; Fangfang Xia; Vladimir Bidnenko; S. Dusko Ehrlich; Rick Stevens; Philippe Noirot

The nonessential regions in bacterial chromosomes are ill-defined due to incomplete functional information. Here, we establish a comprehensive repertoire of the genome regions that are dispensable for growth of Bacillus subtilis in a variety of media conditions. In complex medium, we attempted deletion of 157 individual regions ranging in size from 2 to 159 kb. A total of 146 deletions were successful in complex medium, whereas the remaining regions were subdivided to identify new essential genes (4) and coessential gene sets (7). Overall, our repertoire covers ∼76% of the genome. We screened for viability of mutant strains in rich defined medium and glucose minimal media. Experimental observations were compared with predictions by the iBsu1103 model, revealing discrepancies that led to numerous model changes, including the large-scale application of model reconciliation techniques. We ultimately produced the iBsu1103V2 model and generated predictions of metabolites that could restore the growth of unviable strains. These predictions were experimentally tested and demonstrated to be correct for 27 strains, validating the refinements made to the model. The iBsu1103V2 model has improved considerably at predicting loss of viability, and many insights gained from the model revisions have been integrated into the Model SEED to improve reconstruction of other microbial models.

Biochimica et Biophysica Acta | 2011

Connecting genotype to phenotype in the era of high-throughput sequencing.

Christopher S. Henry; Ross Overbeek; Fangfang Xia; Aaron A. Best; Elizabeth M. Glass; Jack A. Gilbert; Peter E. Larsen; Robert Edwards; Terry Disz; Folker Meyer; Veronika Vonstein; Matthew DeJongh; Daniela Bartels; Narayan Desai; Mark D'Souza; Scott Devoid; Kevin P. Keegan; Robert Olson; Andreas Wilke; Jared Wilkening; Rick Stevens

BACKGROUND The development of next generation sequencing technology is rapidly changing the face of the genome annotation and analysis field. One of the primary uses for genome sequence data is to improve our understanding and prediction of phenotypes for microbes and microbial communities, but the technologies for predicting phenotypes must keep pace with the new sequences emerging. SCOPE OF REVIEW This review presents an integrated view of the methods and technologies used in the inference of phenotypes for microbes and microbial communities based on genomic and metagenomic data. Given the breadth of this topic, we place special focus on the resources available within the SEED Project. We discuss the two steps involved in connecting genotype to phenotype: sequence annotation, and phenotype inference, and we highlight the challenges in each of these steps when dealing with both single genome and metagenome data. MAJOR CONCLUSIONS This integrated view of the genotype-to-phenotype problem highlights the importance of a controlled ontology in the annotation of genomic data, as this benefits subsequent phenotype inference and metagenome annotation. We also note the importance of expanding the set of reference genomes to improve the annotation of all sequence data, and we highlight metagenome assembly as a potential new source for complete genomes. Finally, we find that phenotype inference, particularly from metabolic models, generates predictions that can be validated and reconciled to improve annotations. GENERAL SIGNIFICANCE This review presents the first look at the challenges and opportunities associated with the inference of phenotype from genotype during the next generation sequencing revolution. This article is part of a Special Issue entitled: Systems Biology of Microorganisms.

Scientific Reports | 2016

Antimicrobial Resistance Prediction in PATRIC and RAST

James J. Davis; Sébastien Boisvert; Thomas Brettin; Ronald W. Kenyon; Chunhong Mao; Robert J. Olson; Ross Overbeek; John Santerre; Maulik Shukla; Alice R. Wattam; Rebecca Will; Fangfang Xia; Rick Stevens

The emergence and spread of antimicrobial resistance (AMR) mechanisms in bacterial pathogens, coupled with the dwindling number of effective antibiotics, has created a global health crisis. Being able to identify the genetic mechanisms of AMR and predict the resistance phenotypes of bacterial pathogens prior to culturing could inform clinical decision-making and improve reaction time. At PATRIC (http://patricbrc.org/), we have been collecting bacterial genomes with AMR metadata for several years. In order to advance phenotype prediction and the identification of genomic regions relating to AMR, we have updated the PATRIC FTP server to enable access to genomes that are binned by their AMR phenotypes, as well as metadata including minimum inhibitory concentrations. Using this infrastructure, we custom built AdaBoost (adaptive boosting) machine learning classifiers for identifying carbapenem resistance in Acinetobacter baumannii, methicillin resistance in Staphylococcus aureus, and beta-lactam and co-trimoxazole resistance in Streptococcus pneumoniae with accuracies ranging from 88–99%. We also did this for isoniazid, kanamycin, ofloxacin, rifampicin, and streptomycin resistance in Mycobacterium tuberculosis, achieving accuracies ranging from 71–88%. This set of classifiers has been used to provide an initial framework for species-specific AMR phenotype and genomic feature prediction in the RAST and PATRIC annotation services.

bioRxiv | 2016

The DOE Systems Biology Knowledgebase (KBase)

Adam P. Arkin; Rick Stevens; Robert W. Cottingham; Sergei Maslov; Christopher S. Henry; Paramvir Dehal; Doreen Ware; Fernando Perez; Nomi L. Harris; Shane Canon; Michael W Sneddon; Matthew L Henderson; William J Riehl; Dan Gunter; Dan Murphy-Olson; Stephen Chan; Roy T Kamimura; Thomas S Brettin; Folker Meyer; Dylan Chivian; David J. Weston; Elizabeth M. Glass; Brian H. Davison; Sunita Kumari; Benjamin H Allen; Jason K. Baumohl; Aaron A. Best; Ben Bowen; Steven E. Brenner; Christopher C Bun

The U.S. Department of Energy Systems Biology Knowledgebase (KBase) is an open-source software and data platform designed to meet the grand challenge of systems biology — predicting and designing biological function from the biomolecular (small scale) to the ecological (large scale). KBase is available for anyone to use, and enables researchers to collaboratively generate, test, compare, and share hypotheses about biological functions; perform large-scale analyses on scalable computing infrastructure; and combine experimental evidence and conclusions that lead to accurate models of plant and microbial physiology and community dynamics. The KBase platform has (1) extensible analytical capabilities that currently include genome assembly, annotation, ontology assignment, comparative genomics, transcriptomics, and metabolic modeling; (2) a web-browser-based user interface that supports building, sharing, and publishing reproducible and well-annotated analyses with integrated data; (3) access to extensive computational resources; and (4) a software development kit allowing the community to add functionality to the system.

Explore More