William Klimke
National Institutes of Health
Publication
Featured research published by William Klimke.
Nucleic Acids Research | 2009
Kim D. Pruitt; Tatiana Tatusova; William Klimke; Donna Maglott
NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 × 10^6 proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.
Omics A Journal of Integrative Biology | 2008
Samuel V. Angiuoli; Aaron Gussman; William Klimke; Guy Cochrane; Dawn Field; George M Garrity; Chinnappa D. Kodira; Nikos C. Kyrpides; Ramana Madupu; Victor Markowitz; Tatiana Tatusova; Nicholas R. Thomson; Owen White
The methodologies used to generate genome and metagenome annotations are diverse and vary between groups and laboratories. Descriptions of the annotation process are helpful in interpreting genome annotation data. Some groups have produced Standard Operating Procedures (SOPs) that describe the annotation process, but standards are lacking for structure and content of these descriptions. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse an online repository of SOPs.
Journal of Bacteriology | 2007
N. Luisa Hiller; Benjamin Janto; Justin S. Hogg; Robert Boissy; Susan Yu; Evan Powell; Randy Keefe; Nathan Ehrlich; Kai Shen; Jay Hayes; Karen A. Barbadora; William Klimke; Dmitry Dernovoy; Tatiana Tatusova; Julian Parkhill; Stephen D. Bentley; J. Christopher Post; Garth D. Ehrlich; Fen Z. Hu
The distributed-genome hypothesis (DGH) states that pathogenic bacteria possess a supragenome that is much larger than the genome of any single bacterium and that these pathogens utilize genetic recombination and a large, noncore set of genes as a means of diversity generation. We sequenced the genomes of eight nasopharyngeal strains of Streptococcus pneumoniae isolated from pediatric patients with upper respiratory symptoms and performed quantitative genomic analyses among these and nine publicly available pneumococcal strains. Coding sequences from all strains were grouped into 3,170 orthologous gene clusters, of which 1,454 (46%) were conserved among all 17 strains. The majority of the gene clusters, 1,716 (54%), were not found in all strains. Genic differences per strain pair ranged from 35 to 629 orthologous clusters, with each strain's genome containing between 21 and 32% noncore genes. The distribution of the orthologous clusters per genome for the 17 strains was entered into the finite-supragenome model, which predicted that (i) the S. pneumoniae supragenome contains more than 5,000 orthologous clusters and (ii) 99% of the orthologous clusters (approximately 3,000) that are represented in the S. pneumoniae population at frequencies of ≥0.1 can be identified if 33 representative genomes are sequenced. These extensive genic diversity data support the DGH and provide a basis for understanding the great differences in clinical phenotype associated with various pneumococcal strains. When these findings are taken together with previous studies that demonstrated the presence of a supragenome for Streptococcus agalactiae and Haemophilus influenzae, it appears that the possession of a distributed genome is a common host interaction strategy.
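The core/noncore split reported in the abstract above can be illustrated with a minimal sketch. It assumes each strain is represented as a set of orthologous-cluster IDs; the function names, the toy strain data, and the cluster labels are hypothetical and are not taken from the paper. The last lines only re-check the arithmetic of the published counts (1,454 core and 1,716 noncore out of 3,170 clusters).

```python
from typing import Dict, Set, Tuple


def split_core_noncore(
    strain_clusters: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str]]:
    """Split cluster IDs into core (present in every strain) and noncore.

    `strain_clusters` maps a strain name to the set of orthologous-cluster
    IDs found in that strain (a hypothetical input layout for illustration).
    """
    if not strain_clusters:
        return set(), set()
    all_clusters = set().union(*strain_clusters.values())
    core = set.intersection(*strain_clusters.values())
    return core, all_clusters - core


def percent(part: int, total: int) -> int:
    """Percentage of `part` in `total`, rounded to the nearest integer."""
    return round(100 * part / total)


# Toy data (hypothetical): three strains, four clusters.
toy = {
    "strain_A": {"c1", "c2", "c3"},
    "strain_B": {"c1", "c2", "c4"},
    "strain_C": {"c1", "c3", "c4"},
}
core, noncore = split_core_noncore(toy)

# Arithmetic check on the counts quoted in the abstract.
core_pct = percent(1454, 3170)
noncore_pct = percent(1716, 3170)
```

On the toy data only `c1` is shared by all three strains, so it is the single core cluster; the quoted counts sum to 3,170 and round to 46% and 54%, matching the abstract.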
Nucleic Acids Research | 2009
William Klimke; Richa Agarwala; Azat Badretdin; Slava Chetvernin; Stacy Ciufo; Boris Fedorov; Boris Kiryutin; Kathleen O’Neill; Wolfgang Resch; Sergei Resenchuk; Susan C. Schafer; Igor Tolstoy; Tatiana Tatusova
Rapid increases in DNA sequencing capabilities have led to a vast increase in the data generated from prokaryotic genomic studies, which has been a boon to scientists studying micro-organism evolution and to those who wish to understand the biological underpinnings of microbial systems. The NCBI Protein Clusters Database (ProtClustDB) has been created to efficiently maintain and keep the deluge of data up to date. ProtClustDB contains both curated and uncurated clusters of proteins grouped by sequence similarity. The May 2008 release contains a total of 285 386 clusters derived from over 1.7 million proteins encoded by 3806 nucleotide sequences from the RefSeq collection of complete chromosomes and plasmids from four major groups: prokaryotes, bacteriophages and the mitochondrial and chloroplast organelles. There are 7180 clusters containing 376 513 proteins with curated gene and protein functional annotation. PubMed identifiers and external cross references are collected for all clusters and provide additional information resources. A suite of web tools is available to explore more detailed information, such as multiple alignments, phylogenetic trees and genomic neighborhoods. ProtClustDB provides an efficient method to aggregate gene and protein annotation for researchers and is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters.
PLOS Biology | 2013
Brian P. Anton; Yi-Chien Chang; Peter Brown; Han-Pil Choi; Lina L. Faller; Jyotsna Guleria; Zhenjun Hu; Niels Klitgord; Ami Levy-Moonshine; Almaz Maksad; Varun Mazumdar; Mark McGettrick; Lais Osmani; Revonda Pokrzywa; John Rachlin; Rajeswari Swaminathan; Benjamin Allen; Genevieve Housman; Caitlin Monahan; Krista Rochussen; Kevin Tao; Ashok S. Bhagwat; Steven E. Brenner; Linda Columbus; Valérie de Crécy-Lagard; Donald J. Ferguson; Alexey Fomenkov; Giovanni Gadda; Richard D. Morgan; Andrei L. Osterman
Experimental data exists for only a vanishingly small fraction of sequenced microbial genes. This community page discusses the progress made by the COMBREX project to address this important issue using both computational and experimental resources.
Clinical Infectious Diseases | 2016
Brendan R. Jackson; Cheryl L. Tarr; Errol Strain; Kelly A. Jackson; Amanda Conrad; Heather Carleton; Lee S. Katz; Steven Stroika; L. Hannah Gould; Rajal K. Mody; Benjamin J. Silk; Jennifer Beal; Yi Chen; Ruth Timme; Matthew Doyle; Angela Fields; Matthew E. Wise; Glenn Tillman; Stephanie Defibaugh-Chavez; Zuzana Kucerova; Ashley Sabol; Katie Roache; Eija Trees; Mustafa Simmons; Jamie Wasilenko; Kristy Kubota; Hannes Pouseele; William Klimke; John M. Besser; Eric W. Brown
Listeria monocytogenes (Lm) causes severe foodborne illness (listeriosis). Previous molecular subtyping methods, such as pulsed-field gel electrophoresis (PFGE), were critical in detecting outbreaks that led to food safety improvements and declining incidence, but PFGE provides limited genetic resolution. A multiagency collaboration began performing real-time, whole-genome sequencing (WGS) on all US Lm isolates from patients, food, and the environment in September 2013, posting sequencing data into a public repository. Compared with the year before the project began, WGS, combined with epidemiologic and product trace-back data, detected more listeriosis clusters and solved more outbreaks (2 outbreaks in pre-WGS year, 5 in WGS year 1, and 9 in year 2). Whole-genome multilocus sequence typing and single nucleotide polymorphism analyses provided equivalent phylogenetic relationships relevant to investigations; results were most useful when interpreted in context of epidemiological data. WGS has transformed listeriosis outbreak surveillance and is being implemented for other foodborne pathogens.
Standards in Genomic Sciences | 2011
William Klimke; Claire O’Donovan; Owen White; J. Rodney Brister; Karen Clark; Boris Fedorov; Ilene Mizrachi; Kim D. Pruitt; Tatiana Tatusova
The promise of genome sequencing was that the vast undiscovered country would be mapped out by comparison of the multitude of sequences available and would aid researchers in deciphering the role of each gene in every organism. Researchers recognize that there is a need for high quality data. However, different annotation procedures, numerous databases, and a diminishing percentage of experimentally determined gene functions have resulted in a spectrum of annotation quality. NCBI in collaboration with sequencing centers, archival databases, and researchers, has developed the first international annotation standards, a fundamental step in ensuring that high quality complete prokaryotic genomes are available as gold standard references. Highlights include the development of annotation assessment tools, community acceptance of protein naming standards, comparison of annotation resources to provide consistent annotation, and improved tracking of the evidence used to generate a particular annotation. The development of a set of minimal standards, including the requirement for annotated complete prokaryotic genomes to contain a full set of ribosomal RNAs, transfer RNAs, and proteins encoding core conserved functions, is an historic milestone. The use of these standards in existing genomes and future submissions will increase the quality of databases, enabling researchers to make accurate biological discoveries.
Standards in Genomic Sciences | 2016
Scott Federhen; Ramon Rosselló-Móra; Hans-Peter Klenk; Brian J. Tindall; Konstantinos T. Konstantinidis; William B. Whitman; Daniel R. Brown; David P. Labeda; David W. Ussery; George M Garrity; Rita R. Colwell; Nur A. Hasan; Joerg Graf; Aidan Parte; Pablo Yarza; Brittany Goldberg; Heike Sichtig; Ilene Karsch-Mizrachi; Karen Clark; Richard McVeigh; Kim D. Pruitt; Tatiana Tatusova; Robert Falk; Sean Turner; Thomas L. Madden; Paul Kitts; Avi Kimchi; William Klimke; Richa Agarwala; Michael DiCuccio
Many genomes are incorrectly identified at GenBank. We developed a plan to find and correct misidentified genomes using genomic comparison statistics together with a scaffold of reliably identified genomes from type. A workshop was organized with broad representation from the bacterial taxonomic community to review the proposal, the GenBank Microbial Genomic Taxonomy Workshop, Bethesda MD, May 12–13, 2015.
Viruses | 2010
James Rodney Brister; Yiming Bao; Carla Kuiken; Elliot J. Lefkowitz; Philippe Le Mercier; Raphael Leplae; Ramana Madupu; Richard H. Scheuermann; Seth Schobel; Donald Seto; Susmita Shrivastava; Peter Sterk; Qiandong Zeng; William Klimke; Tatiana Tatusova
Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world’s biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.
Database | 2012
Pascale Gaudet; Cecilia N. Arighi; Frederic B. Bastian; Alex Bateman; Judith A. Blake; Michael J. Cherry; Peter D’Eustachio; Robert D. Finn; Michelle G. Giglio; Lynette Hirschman; Renate Kania; William Klimke; María Martín; Ilene Karsch-Mizrachi; Monica Munoz-Torres; Darren A. Natale; Claire O’Donovan; Francis Ouellette; Kim D. Pruitt; Marc Robinson-Rechavi; Susanna-Assunta Sansone; Paul N. Schofield; Granger Sutton; Kimberly Van Auken; Sona Vasudevan; Cathy H. Wu; Jasmine Young; Raja Mazumder
The 5th International Biocuration Conference brought together over 300 scientists to exchange ideas on their work, as well as discuss issues relevant to the International Society for Biocuration’s (ISB) mission. Recurring themes this year included the creation and promotion of gold standards, the need for more ontologies, and more formal interactions with journals. The conference is an essential part of the ISB’s goal to support exchanges among members of the biocuration community. Next year’s conference will be held in Cambridge, UK, from 7 to 10 April 2013. In the meantime, the ISB website provides information about the society’s activities (http://biocurator.org), as well as related events of interest.