Adam Godzik
Sanford-Burnham Institute for Medical Research
Publication
Featured research published by Adam Godzik.
Bioinformatics | 2006
Weizhong Li; Adam Godzik
Motivation: In 2001 and 2002, we published two papers (Bioinformatics, 17, 282-283; Bioinformatics, 18, 77-82) describing an ultrafast protein sequence clustering program called cd-hit. This program can efficiently cluster a huge protein database with millions of sequences. However, the underlying algorithm is not limited to clustering protein sequences. Here we present several new programs built on the same algorithm: cd-hit-2d, cd-hit-est and cd-hit-est-2d. Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA sequence database; and cd-hit-est-2d compares two nucleotide datasets. All these programs can handle huge datasets with millions of sequences and can be hundreds of times faster than methods based on popular sequence comparison and database search tools such as BLAST.
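The abstract does not spell out the shared algorithm. cd-hit is known for greedy incremental clustering: sequences are sorted from longest to shortest, the longest becomes the first cluster representative, and each remaining sequence either joins a representative it matches above an identity threshold or starts a new cluster, with a short-word (k-mer) filter skipping most full comparisons. The Python sketch below illustrates only that idea under simplifying assumptions. The function names (`word_counts`, `ungapped_identity`, `greedy_cluster`), the ungapped identity measure, and the word-count bound are hypothetical stand-ins, not cd-hit's actual alignment, filter parameters, or code.

```python
from collections import Counter
from typing import Dict, List, Tuple


def word_counts(seq: str, k: int) -> Counter:
    """Count every length-k word (k-mer) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ungapped_identity(short: str, long: str) -> float:
    """Best fraction of identical residues over all ungapped placements of
    `short` inside `long`. A hypothetical stand-in for a real alignment."""
    best = 0
    for offset in range(len(long) - len(short) + 1):
        matches = sum(a == b for a, b in zip(short, long[offset:]))
        best = max(best, matches)
    return best / len(short)


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.9,
                   k: int = 2) -> List[Tuple[str, List[str]]]:
    """Greedy incremental clustering; returns (representative, members) pairs."""
    # Longest first, so every representative is at least as long as the
    # sequences later compared against it.
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters: List[Tuple[str, List[str]]] = []
    rep_words: Dict[str, Counter] = {}
    for name in order:
        s = seqs[name]
        words = word_counts(s, k)
        # Each mismatch can break at most k shared words, so an ungapped hit
        # at the threshold needs at least `needed` shared k-mers. The small
        # epsilon guards against floating-point round-off in the bound.
        max_mismatches = int((1 - threshold) * len(s) + 1e-9)
        needed = (len(s) - k + 1) - k * max_mismatches
        for rep, members in clusters:
            shared = sum((words & rep_words[rep]).values())
            if shared < needed:
                continue  # filter: cannot reach the threshold, skip comparison
            if ungapped_identity(s, seqs[rep]) >= threshold:
                members.append(name)  # join the first qualifying representative
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters.append((name, [name]))
            rep_words[name] = words
    return clusters


print(greedy_cluster({"a": "MKTAYIAKQR", "b": "MKTAYIAKQG", "c": "GGGGGWWWWW"}))
```

For this ungapped toy model the word-count bound never rejects a pair that would pass the identity check, so the filter only saves work. That is the property that makes short-word filtering attractive when the database holds millions of sequences.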
Cell | 2004
Andres Alonso; Joanna Sasin; Nunzio Bottini; Ilan Friedberg; Iddo Friedberg; Andrei L. Osterman; Adam Godzik; Tony Hunter; Jack E. Dixon; Tomas Mustelin
Tyrosine phosphorylation is catalyzed by protein tyrosine kinases, which are represented by 90 genes in the human genome. Here, we present the set of 107 genes in the human genome that encode members of the four protein tyrosine phosphatase (PTP) families. The four families of PTPases, their substrates, structure, function, regulation, and the role of these enzymes in human disease will be discussed.
PLOS Biology | 2007
Shibu Yooseph; Granger Sutton; Douglas B. Rusch; Aaron L. Halpern; Shannon J. Williamson; Karin A. Remington; Jonathan A. Eisen; Karla B. Heidelberg; Gerard Manning; Weizhong Li; Lukasz Jaroszewski; Piotr Cieplak; Christopher S. Miller; Huiying Li; Susan T. Mashiyama; Marcin P Joachimiak; Christopher van Belle; John-Marc Chandonia; David A W Soergel; Yufeng Zhai; Kannan Natarajan; Shaun W. Lee; Benjamin J. Raphael; Vineet Bafna; Robert Friedman; Steven E. Brenner; Adam Godzik; David Eisenberg; Jack E. Dixon; Susan S. Taylor
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.
Science | 2009
Dong-Hyung Cho; Tomohiro Nakamura; Jianguo Fang; Piotr Cieplak; Adam Godzik; Zezong Gu; Stuart A. Lipton
Mitochondria continuously undergo two opposing processes, fission and fusion. The disruption of this dynamic equilibrium may herald cell injury or death and may contribute to developmental and neurodegenerative disorders. Nitric oxide functions as a signaling molecule, but in excess it mediates neuronal injury, in part via mitochondrial fission or fragmentation. However, the underlying mechanism for nitric oxide–induced pathological fission remains unclear. We found that nitric oxide produced in response to β-amyloid protein, thought to be a key mediator of Alzheimer's disease, triggered mitochondrial fission, synaptic loss, and neuronal damage, in part via S-nitrosylation of dynamin-related protein 1 (forming SNO-Drp1). Preventing nitrosylation of Drp1 by cysteine mutation abrogated these neurotoxic events. SNO-Drp1 is increased in brains of human Alzheimer's disease patients and may thus contribute to the pathogenesis of neurodegeneration.
Nucleic Acids Research | 2005
Lukasz Jaroszewski; Leszek Rychlewski; Zhanwen Li; Weizhong Li; Adam Godzik
The FFAS03 server provides a web interface to the third generation of the profile–profile alignment and fold-recognition algorithm of fold and function assignment system (FFAS) [L. Rychlewski, L. Jaroszewski, W. Li and A. Godzik (2000), Protein Sci., 9, 232–241]. Profile–profile algorithms use information present in sequences of homologous proteins to amplify the patterns defining the family. As a result, they enable detection of remote homologies beyond the reach of other methods. FFAS, initially developed in 2000, is consistently one of the best-ranked fold prediction methods in the CAFASP and LiveBench competitions. It is also used by several fold-recognition consensus methods and meta-servers. The FFAS03 server accepts a user-supplied protein sequence and automatically generates a profile, which is then compared with several sets of sequence profiles of proteins from PDB, COG, PFAM and SCOP. The profile databases used by the server are automatically updated with the latest structural and sequence information. The server provides access to the alignment analysis, multiple alignment, and comparative modeling tools. Access to the server is open for both academic and commercial researchers. The FFAS03 server is available at .
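To make "profile–profile alignment" concrete, the Python sketch below aligns two profiles, each given as one residue-frequency vector per column, with a Smith-Waterman local alignment whose column score is a dot product shifted by an offset. It is a generic illustration under stated assumptions: building profiles from homologous sequences is not shown, and the scoring function, the `offset` and `gap` values, and the helper names are hypothetical choices, not FFAS's actual scoring function, gap model, or significance statistics.

```python
from typing import List, Sequence, Tuple

Profile = List[Sequence[float]]  # one frequency vector per alignment column


def column_score(p: Sequence[float], q: Sequence[float], offset: float) -> float:
    """Dot product of two column vectors, shifted by an offset so that
    dissimilar columns score below zero (hypothetical scoring choice)."""
    return sum(a * b for a, b in zip(p, q)) - offset


def profile_profile_align(query: Profile, target: Profile, gap: float = 0.5,
                          offset: float = 0.1) -> Tuple[float, List[Tuple[int, int]]]:
    """Local (Smith-Waterman) alignment of two profiles with a linear gap
    penalty; returns the best score and the aligned (query, target) columns."""
    n, m = len(query), len(target)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + column_score(query[i - 1], target[j - 1], offset),
                          H[i - 1][j] - gap,   # gap in the target
                          H[i][j - 1] - gap)   # gap in the query
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the local alignment score hits zero.
    pairs: List[Tuple[int, int]] = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + column_score(query[i - 1], target[j - 1], offset)
        if H[i][j] == diag:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return best, pairs[::-1]


# Toy two-letter "profiles" (one-hot columns) for illustration only.
score, columns = profile_profile_align([[1, 0], [0, 1], [1, 0]],
                                       [[0, 1], [1, 0], [0, 1], [1, 0]])
print(round(score, 6), columns)
```

With real profiles the column vectors are spread over 20 amino acids, so partially conserved columns still contribute positive scores. This is the sense in which information from homologous sequences can reveal remote similarities that a single-sequence comparison might miss.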
Immunity | 2008
Jenny P.-Y. Ting; Ruth C. Lovering; Emad S. Alnemri; John Bertin; Jeremy M. Boss; Beckley K. Davis; Richard A. Flavell; Stephen E. Girardin; Adam Godzik; Jonathan A. Harton; Hal M. Hoffman; Jean-Pierre Hugot; Naohiro Inohara; Alex MacKenzie; Lois J. Maltais; Gabriel Núñez; Yasunori Ogura; Luc A. Otten; Dana J. Philpott; John C. Reed; Walter Reith; Stefan Schreiber; Viktor Steimle; Peter A. Ward
Immune regulatory proteins such as CIITA, NAIP, IPAF, NOD1, NOD2, NALP1 and cryopyrin/NALP3 are members of a family characterized by the presence of a nucleotide-binding domain (NBD) and leucine-rich repeats (LRR). Members of this gene family encode a protein structure similar to the NB-LRR subgroup of disease-resistance genes in plants and are involved in the sensing of pathogenic products and the regulation of cell signaling and apoptosis. Several members of this family have been associated with immunologic disorders. NOD2, for instance, is associated with both Crohn's disease and Blau syndrome. A variety of different names are currently used to describe this gene family, its subfamilies and individual genes, including CATERPILLER (CLR), NOD-LRR, NACHT-LRR, CARD, NALP, NOD, PAN and PYPAF, and this lack of consistency has led to a pressing need to unify the nomenclature. Consequently, we collectively propose the family designation NLR (nucleotide-binding domain and leucine-rich repeat containing) and provide unique and standardized gene designations for all family members.
PLOS Computational Biology | 2010
John Wooley; Adam Godzik; Iddo Friedberg
Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.
Genome Research | 2008
Linda Z. Holland; Ricard Albalat; Kaoru Azumi; Èlia Benito-Gutiérrez; Matthew J. Blow; Marianne Bronner-Fraser; Frédéric Brunet; Thomas Butts; Simona Candiani; Larry J. Dishaw; David E. K. Ferrier; Jordi Garcia-Fernàndez; Jeremy J. Gibson-Brown; Carmela Gissi; Adam Godzik; Finn Hallböök; Dan Hirose; Kazuyoshi Hosomichi; Tetsuro Ikuta; Hidetoshi Inoko; Masanori Kasahara; Jun Kasamatsu; Takeshi Kawashima; Ayuko Kimura; Masaaki Kobayashi; Zbynek Kozmik; Kaoru Kubokawa; Vincent Laudet; Gary W. Litman; Alice C. McHardy
Cephalochordates, urochordates, and vertebrates evolved from a common ancestor over 520 million years ago. To improve our understanding of chordate evolution and the origin of vertebrates, we intensively searched for particular genes, gene families, and conserved noncoding elements in the sequenced genome of the cephalochordate Branchiostoma floridae, commonly called amphioxus or lancelets. Special attention was given to homeobox genes, opsin genes, genes involved in neural crest development, nuclear receptor genes, genes encoding components of the endocrine and immune systems, and conserved cis-regulatory enhancers. The amphioxus genome contains a basic set of chordate genes involved in development and cell signaling, including a fifteenth Hox gene. This set includes many genes that were co-opted in vertebrates for new roles in neural crest development and adaptive immunity. However, where amphioxus has a single gene, vertebrates often have two, three, or four paralogs derived from two whole-genome duplication events. In addition, several transcriptional enhancers are conserved between amphioxus and vertebrates, across a very wide phylogenetic distance. In contrast, urochordate genomes have lost many genes, including a diversity of homeobox families and genes involved in steroid hormone function. The amphioxus genome also exhibits derived features, including duplications of opsins and genes proposed to function in innate immunity and endocrine systems. Our results indicate that the amphioxus genome is elemental to an understanding of the biology and evolution of nonchordate deuterostomes, invertebrate chordates, and vertebrates.
Proceedings of the National Academy of Sciences of the United States of America | 2002
Scott A. Lesley; Peter Kuhn; Adam Godzik; Ashley M. Deacon; Irimpan I. Mathews; Andreas Kreusch; Glen Spraggon; Heath E. Klock; Daniel McMullan; Tanya Shin; Juli Vincent; Alyssa Robb; Linda S. Brinen; Mitchell D. Miller; Timothy M. McPhillips; Mark A. Miller; Daniel Scheibe; Jaume M. Canaves; Chittibabu Guda; Lukasz Jaroszewski; Thomas L. Selby; Marc André Elsliger; John Wooley; Susan S. Taylor; Keith O. Hodgson; Ian A. Wilson; Peter G. Schultz; Raymond C. Stevens
Structural genomics is emerging as a principal approach to define protein structure–function relationships. To apply this approach on a genomic scale, novel methods and technologies must be developed to determine large numbers of structures. We describe the design and implementation of a high-throughput structural genomics pipeline and its application to the proteome of the thermophilic bacterium Thermotoga maritima. By using this pipeline, we successfully cloned and attempted expression of 1,376 of the predicted 1,877 genes (73%) and have identified crystallization conditions for 432 proteins, comprising 23% of the T. maritima proteome. Representative structures from TM0423 glycerol dehydrogenase and TM0449 thymidylate synthase-complementing protein are presented as examples of final outputs from the pipeline.