Publication


Featured research published by William S. Hayes.


Nature | 1997

The complete genome sequence of the gastric pathogen Helicobacter pylori

Jean-F. Tomb; Owen White; Anthony R. Kerlavage; Rebecca A. Clayton; Granger Sutton; Robert D. Fleischmann; Karen A. Ketchum; Hans-Peter Klenk; Steven R. Gill; Brian A. Dougherty; Karen E. Nelson; John Quackenbush; Lixin Zhou; Ewen F. Kirkness; Scott N. Peterson; Brendan J. Loftus; Delwood Richardson; Robert J. Dodson; Hanif G. Khalak; Anna Glodek; Keith McKenney; Lisa M. Fitzegerald; Norman H. Lee; Mark D. Adams; Erin Hickey; Douglas E. Berg; Jeanine D. Gocayne; Teresa Utterback; Jeremy Peterson; Jenny M. Kelley

Helicobacter pylori, strain 26695, has a circular genome of 1,667,867 base pairs and 1,590 predicted coding sequences. Sequence analysis indicates that H. pylori has well-developed systems for motility, for scavenging iron, and for DNA restriction and modification. Many putative adhesins, lipoproteins and other outer membrane proteins were identified, underscoring the potential complexity of host–pathogen interaction. Based on the large number of sequence-related genes encoding outer membrane proteins and the presence of homopolymeric tracts and dinucleotide repeats in coding sequences, H. pylori, like several other mucosal pathogens, probably uses recombination and slipped-strand mispairing within repeats as mechanisms for antigenic variation and adaptive evolution. Consistent with its restricted niche, H. pylori has a few regulatory networks, and a limited metabolic repertoire and biosynthetic capacity. Its survival in acid conditions depends, in part, on its ability to establish a positive inside-membrane potential in low pH.


Current Biology | 1996

Metabolism and evolution of Haemophilus influenzae deduced from a whole-genome comparison with Escherichia coli

Roman L. Tatusov; Arcady Mushegian; Peer Bork; Nigel P. Brown; William S. Hayes; Mark Borodovsky; Kenneth E. Rudd; Eugene V. Koonin

BACKGROUND: The 1.83 Megabase (Mb) sequence of the Haemophilus influenzae chromosome, the first completed genome sequence of a cellular life form, has been recently reported. Approximately 75% of the 4.7 Mb genome sequence of Escherichia coli is also available. The lifestyles of the two bacteria are very different: H. influenzae is an obligate parasite that lives in human upper respiratory mucosa and can be cultivated only on rich media, whereas E. coli is a saprophyte that can grow on minimal media. A detailed comparison of the protein products encoded by these two genomes is expected to provide valuable insights into bacterial cell physiology and genome evolution.
RESULTS: We describe the results of computer analysis of the amino-acid sequences of 1703 putative proteins encoded by the complete genome of H. influenzae. We detected sequence similarity to proteins in current databases for 92% of the H. influenzae protein sequences, and at least a general functional prediction was possible for 83%. A comparison of the H. influenzae protein sequences with those of 3010 proteins encoded by the sequenced 75% of the E. coli genome revealed 1128 pairs of apparent orthologs, with an average of 59% identity. In contrast to the high similarity between orthologs, the genome organization and the functional repertoire of genes in the two bacteria were remarkably different. The smaller genome size of H. influenzae is explained, to a large extent, by a reduction in the number of paralogous genes. There was no long-range colinearity between the E. coli and H. influenzae gene orders, but over 70% of the orthologous genes were found in short conserved strings, only about half of which were operons in E. coli. Superposition of the H. influenzae enzyme repertoire upon the known E. coli metabolic pathways allowed us to reconstruct similar and alternative pathways in H. influenzae and provides an explanation for the known nutritional requirements.
CONCLUSIONS: By comparing proteins encoded by the two bacterial genomes, we have shown that extensive gene shuffling and variation in the extent of gene paralogy are major trends in bacterial evolution; this comparison has also allowed us to deduce crucial aspects of the largely uncharacterized metabolism of H. influenzae.


Pacific Symposium on Biocomputing | 2007

Information needs and the role of text mining in drug development.

Phoebe M. Roberts; William S. Hayes

Drug development generates information needs from groups throughout a company. Knowing where to look for high-quality information is essential for minimizing costs and remaining competitive. Using 1131 research requests that came to our library between 2001 and 2007, we show that drugs, diseases, and genes/proteins are the most frequently searched subjects, and journal articles, patents, and competitive intelligence literature are the most frequently consulted textual resources.


DNA Sequence | 1997

Gene Identification and Classification in the Synechocystis Genomic Sequence by Recursive GeneMark Analysis

Makoto Hirosawa; Katsumi Isono; William S. Hayes; Mark Borodovsky

The GeneMark method has proven to be an efficient gene-finding tool for the analysis of prokaryotic genomic sequence data. We have developed a procedure for deriving and utilizing several GeneMark models in order to improve gene-detection performance. Upon applying this procedure to the 1.0 Mb contiguous DNA sequence of Synechocystis sp. strain PCC6803, we were able to cluster predicted genes into distinct classes and to produce class-specific GeneMark models reflecting the statistical characteristics of each gene class. One gene class apparently includes genes of exogenous origin. Using class-specific models reduces the gene underprediction error rate to 1.7%, compared with the 8.1% reported in the previous study, in which only one GeneMark model was used.


Database | 2016

Training and evaluation corpora for the extraction of causal relationships encoded in biological expression language (BEL)

Juliane Fluck; Sumit Madan; Sam Ansari; Alpha Tom Kodamullil; Reagon Karki; Majid Rastegar-Mojarad; Natalie L. Catlett; William S. Hayes; Justyna Szostak; Julia Hoeng; Manuel C. Peitsch

Success in extracting biological relationships is mainly dependent on the complexity of the task as well as the availability of high-quality training data. Here, we describe the new corpora in the systems biology modeling language BEL for training and testing biological relationship extraction systems that we prepared for the BioCreative V BEL track. BEL was designed to capture relationships not only between proteins or chemicals, but also complex events such as biological processes or disease states. A BEL nanopub is the smallest unit of information and represents a biological relationship with its provenance. In BEL relationships (called BEL statements), the entities are normalized to defined namespaces mainly derived from public repositories, such as sequence databases, MeSH or publicly available ontologies. In the BEL nanopubs, the BEL statements are associated with citation information and supportive evidence such as a text excerpt. To enable the training of extraction tools, we prepared BEL resources and made them available to the community. We selected a subset of these resources focusing on a reduced set of namespaces, namely, human and mouse genes, ChEBI chemicals, MeSH diseases and GO biological processes, as well as relationship types ‘increases’ and ‘decreases’. The published training corpus contains 11 000 BEL statements from over 6000 supportive text excerpts. For method evaluation, we selected and re-annotated two smaller subcorpora containing 100 text excerpts. For this re-annotation, the inter-annotator agreement was measured by the BEL track evaluation environment and resulted in a maximal F-score of 91.18% for full statement agreement. In addition, for a set of 100 BEL statements, we provide not only the gold standard expert annotations, but also text excerpts pre-selected by two automated systems. Those text excerpts were evaluated and manually annotated as true or false supportive in the course of the BioCreative V BEL track task.
Database URL: http://wiki.openbel.org/display/BIOC/Datasets


Pacific Symposium on Biocomputing | 2014

Reputation-based collaborative network biology.

Jean Binder; Stéphanie Boué; Anselmo Di Fabio; R. Brett Fields; William S. Hayes; Julia Hoeng; Jennifer Park; Manuel C. Peitsch

A pilot reputation-based collaborative network biology platform, Bionet, was developed for use in the sbv IMPROVER Network Verification Challenge to verify and enhance previously developed networks describing key aspects of lung biology. Bionet was successful in capturing a more comprehensive view of the biology associated with each network using the collective intelligence and knowledge of the crowd. One key learning point from the pilot was that using a standardized biological knowledge representation language such as BEL is critical to the success of a collaborative network biology platform. Overall, Bionet demonstrated that this approach to collaborative network biology is highly viable. Improving this platform for de novo creation of biological networks and network curation with the suggested enhancements for scalability will serve both academic and industry systems biology communities.


Expert Opinion on Drug Discovery | 2017

Novel approaches to develop community-built biological network models for potential drug discovery

Marja Talikka; Natalia Bukharov; William S. Hayes; Martin Hofmann-Apitius; Leonidas G. Alexopoulos; Manuel C. Peitsch; Julia Hoeng

Introduction: Hundreds of thousands of data points are now routinely generated in clinical trials by molecular profiling and NGS technologies. A true translation of this data into knowledge is not possible without analysis and interpretation in a well-defined biology context. Currently, there are many public and commercial pathway tools and network models that can facilitate such analysis. At the same time, the insights and knowledge that can be gained are highly dependent on the underlying biological content of these resources. Crowdsourcing can be employed to guarantee the accuracy and transparency of the biological content underlying the tools used to interpret rich molecular data. Areas covered: In this review, the authors describe crowdsourcing in drug discovery. The focal point is the efforts that have successfully used the crowdsourcing approach to verify and augment pathway tools and biological network models. Technologies that enable the building of biological networks with the community are also described. Expert opinion: A crowd of experts can be leveraged for the entire development process of biological network models, from ontologies to the evaluation of their mechanistic completeness. The ultimate goal is to facilitate biomarker discovery and personalized medicine by mechanistically explaining patients’ differences with respect to disease prevention, diagnosis, and therapy outcome.


Genome Research | 1998

How to Interpret an Anonymous Bacterial Genome: Machine Learning Approach to Gene Identification

William S. Hayes; Mark Borodovsky


Genome Research | 2001

GeneLynx: A Gene-Centric Portal to the Human Genome

Boris Lenhard; William S. Hayes; Wyeth W. Wasserman


Pacific Symposium on Biocomputing | 1998

Deriving ribosomal binding site (RBS) statistical models from unannotated DNA sequences and the use of the RBS model for N-terminal prediction.

William S. Hayes; Mark Borodovsky

Collaboration


Top Co-Authors

Mark Borodovsky

Georgia Institute of Technology

James D. McIninch

Georgia Institute of Technology

Alexander V. Lukashin

Georgia Institute of Technology

Anna Glodek

J. Craig Venter Institute

Arcady Mushegian

National Science Foundation