William Klimke | Researchain

Publication

Featured research published by William Klimke.

Nucleic Acids Research | 2009

NCBI Reference Sequences: current status, policy and new initiatives.

Kim D. Pruitt; Tatiana Tatusova; William Klimke; Donna Maglott

NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 × 10⁶ proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.

OMICS: A Journal of Integrative Biology | 2008

Toward an Online Repository of Standard Operating Procedures (SOPs) for (Meta)genomic Annotation

Samuel V. Angiuoli; Aaron Gussman; William Klimke; Guy Cochrane; Dawn Field; George M Garrity; Chinnappa D. Kodira; Nikos C. Kyrpides; Ramana Madupu; Victor Markowitz; Tatiana Tatusova; Nicholas R. Thomson; Owen White

The methodologies used to generate genome and metagenome annotations are diverse and vary between groups and laboratories. Descriptions of the annotation process are helpful in interpreting genome annotation data. Some groups have produced Standard Operating Procedures (SOPs) that describe the annotation process, but standards are lacking for structure and content of these descriptions. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse an online repository of SOPs.

Journal of Bacteriology | 2007

Comparative Genomic Analyses of Seventeen Streptococcus pneumoniae Strains: Insights into the Pneumococcal Supragenome

N. Luisa Hiller; Benjamin Janto; Justin S. Hogg; Robert Boissy; Susan Yu; Evan Powell; Randy Keefe; Nathan Ehrlich; Kai Shen; Jay Hayes; Karen A. Barbadora; William Klimke; Dmitry Dernovoy; Tatiana Tatusova; Julian Parkhill; Stephen D. Bentley; J. Christopher Post; Garth D. Ehrlich; Fen Z. Hu

The distributed-genome hypothesis (DGH) states that pathogenic bacteria possess a supragenome that is much larger than the genome of any single bacterium and that these pathogens utilize genetic recombination and a large, noncore set of genes as a means of diversity generation. We sequenced the genomes of eight nasopharyngeal strains of Streptococcus pneumoniae isolated from pediatric patients with upper respiratory symptoms and performed quantitative genomic analyses among these and nine publicly available pneumococcal strains. Coding sequences from all strains were grouped into 3,170 orthologous gene clusters, of which 1,454 (46%) were conserved among all 17 strains. The majority of the gene clusters, 1,716 (54%), were not found in all strains. Genic differences per strain pair ranged from 35 to 629 orthologous clusters, with each strain's genome containing between 21 and 32% noncore genes. The distribution of the orthologous clusters per genome for the 17 strains was entered into the finite-supragenome model, which predicted that (i) the S. pneumoniae supragenome contains more than 5,000 orthologous clusters and (ii) 99% of the orthologous clusters (approximately 3,000) that are represented in the S. pneumoniae population at frequencies of ≥0.1 can be identified if 33 representative genomes are sequenced. These extensive genic diversity data support the DGH and provide a basis for understanding the great differences in clinical phenotype associated with various pneumococcal strains. When these findings are taken together with previous studies that demonstrated the presence of a supragenome for Streptococcus agalactiae and Haemophilus influenzae, it appears that the possession of a distributed genome is a common host interaction strategy.

Nucleic Acids Research | 2009

The National Center for Biotechnology Information's Protein Clusters Database

William Klimke; Richa Agarwala; Azat Badretdin; Slava Chetvernin; Stacy Ciufo; Boris Fedorov; Boris Kiryutin; Kathleen O’Neill; Wolfgang Resch; Sergei Resenchuk; Susan C. Schafer; Igor Tolstoy; Tatiana Tatusova

Rapid increases in DNA sequencing capabilities have led to a vast increase in the data generated from prokaryotic genomic studies, which has been a boon to scientists studying micro-organism evolution and to those who wish to understand the biological underpinnings of microbial systems. The NCBI Protein Clusters Database (ProtClustDB) has been created to efficiently maintain and keep the deluge of data up to date. ProtClustDB contains both curated and uncurated clusters of proteins grouped by sequence similarity. The May 2008 release contains a total of 285 386 clusters derived from over 1.7 million proteins encoded by 3806 nt sequences from the RefSeq collection of complete chromosomes and plasmids from four major groups: prokaryotes, bacteriophages and the mitochondrial and chloroplast organelles. There are 7180 clusters containing 376 513 proteins with curated gene and protein functional annotation. PubMed identifiers and external cross references are collected for all clusters and provide additional information resources. A suite of web tools is available to explore more detailed information, such as multiple alignments, phylogenetic trees and genomic neighborhoods. ProtClustDB provides an efficient method to aggregate gene and protein annotation for researchers and is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters.

PLOS Biology | 2013

The COMBREX Project: Design, Methodology, and Initial Results

Brian P. Anton; Yi-Chien Chang; Peter Brown; Han-Pil Choi; Lina L. Faller; Jyotsna Guleria; Zhenjun Hu; Niels Klitgord; Ami Levy-Moonshine; Almaz Maksad; Varun Mazumdar; Mark McGettrick; Lais Osmani; Revonda Pokrzywa; John Rachlin; Rajeswari Swaminathan; Benjamin Allen; Genevieve Housman; Caitlin Monahan; Krista Rochussen; Kevin Tao; Ashok S. Bhagwat; Steven E. Brenner; Linda Columbus; Valérie de Crécy-Lagard; Donald J. Ferguson; Alexey Fomenkov; Giovanni Gadda; Richard D. Morgan; Andrei L. Osterman

Experimental data exists for only a vanishingly small fraction of sequenced microbial genes. This community page discusses the progress made by the COMBREX project to address this important issue using both computational and experimental resources.

Clinical Infectious Diseases | 2016

Implementation of Nationwide Real-time Whole-genome Sequencing to Enhance Listeriosis Outbreak Detection and Investigation

Brendan R. Jackson; Cheryl L. Tarr; Errol Strain; Kelly A. Jackson; Amanda Conrad; Heather Carleton; Lee S. Katz; Steven Stroika; L. Hannah Gould; Rajal K. Mody; Benjamin J. Silk; Jennifer Beal; Yi Chen; Ruth Timme; Matthew Doyle; Angela Fields; Matthew E. Wise; Glenn Tillman; Stephanie Defibaugh-Chavez; Zuzana Kucerova; Ashley Sabol; Katie Roache; Eija Trees; Mustafa Simmons; Jamie Wasilenko; Kristy Kubota; Hannes Pouseele; William Klimke; John M. Besser; Eric W. Brown

Listeria monocytogenes (Lm) causes severe foodborne illness (listeriosis). Previous molecular subtyping methods, such as pulsed-field gel electrophoresis (PFGE), were critical in detecting outbreaks that led to food safety improvements and declining incidence, but PFGE provides limited genetic resolution. A multiagency collaboration began performing real-time, whole-genome sequencing (WGS) on all US Lm isolates from patients, food, and the environment in September 2013, posting sequencing data into a public repository. Compared with the year before the project began, WGS, combined with epidemiologic and product trace-back data, detected more listeriosis clusters and solved more outbreaks (2 outbreaks in pre-WGS year, 5 in WGS year 1, and 9 in year 2). Whole-genome multilocus sequence typing and single nucleotide polymorphism analyses provided equivalent phylogenetic relationships relevant to investigations; results were most useful when interpreted in context of epidemiological data. WGS has transformed listeriosis outbreak surveillance and is being implemented for other foodborne pathogens.

Standards in Genomic Sciences | 2011

Solving the Problem: Genome Annotation Standards before the Data Deluge

William Klimke; Claire O’Donovan; Owen White; J. Rodney Brister; Karen Clark; Boris Fedorov; Ilene Mizrachi; Kim D. Pruitt; Tatiana Tatusova

The promise of genome sequencing was that the vast undiscovered country would be mapped out by comparison of the multitude of sequences available and would aid researchers in deciphering the role of each gene in every organism. Researchers recognize that there is a need for high quality data. However, different annotation procedures, numerous databases, and a diminishing percentage of experimentally determined gene functions have resulted in a spectrum of annotation quality. NCBI, in collaboration with sequencing centers, archival databases, and researchers, has developed the first international annotation standards, a fundamental step in ensuring that high quality complete prokaryotic genomes are available as gold standard references. Highlights include the development of annotation assessment tools, community acceptance of protein naming standards, comparison of annotation resources to provide consistent annotation, and improved tracking of the evidence used to generate a particular annotation. The development of a set of minimal standards, including the requirement for annotated complete prokaryotic genomes to contain a full set of ribosomal RNAs, transfer RNAs, and proteins encoding core conserved functions, is a historic milestone. The use of these standards in existing genomes and future submissions will increase the quality of databases, enabling researchers to make accurate biological discoveries.

Standards in Genomic Sciences | 2016

Meeting report: GenBank microbial genomic taxonomy workshop (12–13 May, 2015)

Scott Federhen; Ramon Rosselló-Móra; Hans-Peter Klenk; Brian J. Tindall; Konstantinos T. Konstantinidis; William B. Whitman; Daniel R. Brown; David P. Labeda; David W. Ussery; George M Garrity; Rita R. Colwell; Nur A. Hasan; Joerg Graf; Aidan Parte; Pablo Yarza; Brittany Goldberg; Heike Sichtig; Ilene Karsch-Mizrachi; Karen Clark; Richard McVeigh; Kim D. Pruitt; Tatiana Tatusova; Robert Falk; Sean Turner; Thomas L. Madden; Paul Kitts; Avi Kimchi; William Klimke; Richa Agarwala; Michael DiCuccio

Many genomes are incorrectly identified at GenBank. We developed a plan to find and correct misidentified genomes using genomic comparison statistics together with a scaffold of reliably identified genomes from type. A workshop was organized with broad representation from the bacterial taxonomic community to review the proposal, the GenBank Microbial Genomic Taxonomy Workshop, Bethesda MD, May 12–13, 2015.

Viruses | 2010

Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop

James Rodney Brister; Yiming Bao; Carla Kuiken; Elliot J. Lefkowitz; Philippe Le Mercier; Raphael Leplae; Ramana Madupu; Richard H. Scheuermann; Seth Schobel; Donald Seto; Susmita Shrivastava; Peter Sterk; Qiandong Zeng; William Klimke; Tatiana Tatusova

Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world’s biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.

Database | 2012

Recent advances in biocuration: Meeting report from the Fifth International Biocuration Conference

Pascale Gaudet; Cecilia N. Arighi; Frederic B. Bastian; Alex Bateman; Judith A. Blake; Michael J. Cherry; Peter D’Eustachio; Robert D. Finn; Michelle G. Giglio; Lynette Hirschman; Renate Kania; William Klimke; María Martín; Ilene Karsch-Mizrachi; Monica Munoz-Torres; Darren A. Natale; Claire O’Donovan; Francis Ouellette; Kim D. Pruitt; Marc Robinson-Rechavi; Susanna-Assunta Sansone; Paul N. Schofield; Granger Sutton; Kimberly Van Auken; Sona Vasudevan; Cathy H. Wu; Jasmine Young; Raja Mazumder

The 5th International Biocuration Conference brought together over 300 scientists to exchange information on their work, as well as discuss issues relevant to the International Society for Biocuration's (ISB) mission. Recurring themes this year included the creation and promotion of gold standards, the need for more ontologies, and more formal interactions with journals. The conference is an essential part of the ISB's goal to support exchanges among members of the biocuration community. Next year's conference will be held in Cambridge, UK, from 7 to 10 April 2013. In the meantime, the ISB website provides information about the society's activities (http://biocurator.org), as well as related events of interest.
