Publication


Featured research published by Paul A. Thiessen.


Nucleic Acids Research | 2004

CDD: a Conserved Domain Database for protein classification

John B. Anderson; Praveen F. Cherukuri; Carol DeWeese-Scott; Lewis Y. Geer; Marc Gwadz; Siqian He; David I. Hurwitz; John D. Jackson; Zhaoxi Ke; Christopher J. Lanczycki; Cynthia A. Liebert; Chunlei Liu; Fu Lu; Gabriele H. Marchler; Mikhail Mullokandov; Benjamin A. Shoemaker; Vahan Simonyan; James S. Song; Paul A. Thiessen; Roxanne A. Yamashita; Jodie J. Yin; Dachuan Zhang; Stephen H. Bryant

The Conserved Domain Database (CDD) is the protein classification component of NCBI's Entrez query and retrieval system. CDD is linked to other Entrez databases such as Proteins, Taxonomy and PubMed®, and can be accessed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. CD-Search, which is available at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi, is a fast, interactive tool to identify conserved domains in new protein sequences. CD-Search results for protein sequences in Entrez are pre-computed to provide links between proteins and domain models, and computational annotation visible upon request. Protein–protein queries submitted to NCBI's BLAST search service at http://www.ncbi.nlm.nih.gov/BLAST are scanned for the presence of conserved domains by default. While CDD started out as essentially a mirror of publicly available domain alignment collections, such as SMART, Pfam and COG, we have continued an effort to update, and in some cases replace, these models with domain hierarchies curated at the NCBI. Here, we report on the progress of the curation effort and associated improvements in the functionality of the CDD information retrieval system.


Annual Reports in Computational Chemistry | 2008

PubChem: Integrated Platform of Small Molecules and Biological Activities

Evan Bolton; Yanli Wang; Paul A. Thiessen; Stephen H. Bryant

PubChem is an open repository for experimental data identifying the biological activities of small molecules. PubChem contents include more than 1000 bioassays, 28 million bioassay test outcomes, 40 million substance contributed descriptions, and 19 million unique compound structures contributed by over 70 depositing organizations. PubChem provides a significant, publicly accessible platform for mining the biological information of small molecules.


Nucleic Acids Research | 2016

PubChem Substance and Compound databases

Sunghwan Kim; Paul A. Thiessen; Evan Bolton; Jie Chen; Gang Fu; Asta Gindulyte; Lianyi Han; Jane He; Siqian He; Benjamin A. Shoemaker; Jiyao Wang; Bo Yu; Jian-Jian Zhang; Stephen H. Bryant

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public repository for information on chemical substances and their biological activities, launched in 2004 as a component of the Molecular Libraries Roadmap Initiatives of the US National Institutes of Health (NIH). Over the past 11 years, PubChem has grown into a sizable system, serving as a chemical information resource for the scientific research community. PubChem consists of three inter-linked databases, Substance, Compound and BioAssay. The Substance database contains chemical information deposited by individual data contributors to PubChem, and the Compound database stores unique chemical structures extracted from the Substance database. Biological activity data of chemical substances tested in assay experiments are contained in the BioAssay database. This paper provides an overview of the PubChem Substance and Compound databases, including data sources and contents, data organization, data submission using PubChem Upload, chemical structure standardization, web-based interfaces for textual and non-textual searches, and programmatic access. It also gives a brief description of PubChem3D, a resource derived from theoretical three-dimensional structures of compounds in PubChem, as well as PubChemRDF, Resource Description Framework (RDF)-formatted PubChem data for data sharing, analysis and integration with information contained in other databases.


Nucleic Acids Research | 1999

MMDB: Entrez's 3D-structure database

Yanli Wang; John B. Anderson; Jie Chen; Lewis Y. Geer; Siqian He; David I. Hurwitz; Cynthia A. Liebert; Thomas Madej; Gabriele H. Marchler; Anna R. Panchenko; Benjamin A. Shoemaker; James S. Song; Paul A. Thiessen; Roxanne A. Yamashita; Stephen H. Bryant

Three-dimensional structures are now known within many protein families and it is quite likely, in searching a sequence database, that one will encounter a homolog with known structure. The goal of Entrez's 3D-structure database is to make this information, and the functional annotation it can provide, easily accessible to molecular biologists. To this end Entrez's search engine provides three powerful features. (i) Sequence and structure neighbors; one may select all sequences similar to one of interest, for example, and link to any known 3D structures. (ii) Links between databases; one may search by term matching in MEDLINE, for example, and link to 3D structures reported in these articles. (iii) Sequence and structure visualization; identifying a homolog with known structure, one may view molecular-graphic and alignment displays, to infer approximate 3D structure. In this article we focus on two features of Entrez's Molecular Modeling Database (MMDB) not described previously: links from individual biopolymer chains within 3D structures to a systematic taxonomy of organisms represented in molecular databases, and links from individual chains (and compact 3D domains within them) to structure neighbors, other chains (and 3D domains) with similar 3D structure. MMDB may be accessed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure.


Nucleic Acids Research | 2007

MMDB: annotating protein sequences with Entrez's 3D-structure database

Yanli Wang; Kenneth J. Addess; Jie Chen; Lewis Y. Geer; Jane He; Siqian He; Shennan Lu; Thomas Madej; Paul A. Thiessen; Naigong Zhang; Stephen H. Bryant

Three-dimensional (3D) structure is now known for a large fraction of all protein families. Thus, it has become rather likely that one will find a homolog with known 3D structure when searching a sequence database with an arbitrary query sequence. Depending on the extent of similarity, such neighbor relationships may allow one to infer biological function and to identify functional sites such as binding motifs or catalytic centers. Entrez's 3D-structure database, the Molecular Modeling Database (MMDB), provides easy access to the richness of 3D structure data and its large potential for functional annotation. Entrez's search engine offers several tools to assist biologist users: (i) links between databases, such as between protein sequences and structures, (ii) pre-computed sequence and structure neighbors, (iii) visualization of structure and sequence/structure alignment. Here, we describe an annotation service that combines some of these tools automatically, Entrez's ‘Related Structure’ links. For all proteins in Entrez, similar sequences with known 3D structure are detected by BLAST and alignments are recorded. The ‘Related Structure’ service summarizes this information and presents 3D views mapping sequence residues onto all 3D structures available in MMDB ().


Nucleic Acids Research | 2014

MMDB and VAST+: tracking structural similarities between macromolecular complexes

Thomas Madej; Christopher J. Lanczycki; Dachuan Zhang; Paul A. Thiessen; Renata C. Geer; Stephen H. Bryant

The computational detection of similarities between protein 3D structures has become an indispensable tool for the detection of homologous relationships, the classification of protein families and functional inference. Consequently, numerous algorithms have been developed that facilitate structure comparison, including rapid searches against a steadily growing collection of protein structures. To this end, NCBI’s Molecular Modeling Database (MMDB), which is based on the Protein Data Bank (PDB), maintains a comprehensive and up-to-date archive of protein structure similarities computed with the Vector Alignment Search Tool (VAST). These similarities have been recorded on the level of single proteins and protein domains, comprising in excess of 1.5 billion pairwise alignments. Here we present VAST+, an extension to the existing VAST service, which summarizes and presents structural similarity on the level of biological assemblies or macromolecular complexes. VAST+ simplifies structure neighboring results and shows, for macromolecular complexes tracked in MMDB, lists of similar complexes ranked by the extent of similarity. VAST+ replaces the previous VAST service as the default presentation of structure neighboring data in NCBI’s Entrez query and retrieval system. MMDB and VAST+ can be accessed via http://www.ncbi.nlm.nih.gov/Structure.


Nucleic Acids Research | 2012

MMDB: 3D structures and macromolecular interactions

Thomas Madej; Kenneth J. Addess; Jessica H. Fong; Lewis Y. Geer; Renata C. Geer; Christopher J. Lanczycki; Chunlei Liu; Shennan Lu; Anna R. Panchenko; Jie Chen; Paul A. Thiessen; Yanli Wang; Dachuan Zhang; Stephen H. Bryant

Close to 60% of protein sequences tracked in comprehensive databases can be mapped to a known three-dimensional (3D) structure by standard sequence similarity searches. Potentially, a great deal can be learned about proteins or protein families of interest from considering 3D structure, and to this day 3D structure data may remain an underutilized resource. Here we present enhancements in the Molecular Modeling Database (MMDB) and its data presentation, specifically pertaining to biologically relevant complexes and molecular interactions. MMDB is tightly integrated with NCBI's Entrez search and retrieval system, and mirrors the contents of the Protein Data Bank. It links protein 3D structure data with sequence data, sequence classification resources and PubChem, a repository of small-molecule chemical structures and their biological activities, facilitating access to 3D structure data not only for structural biologists, but also for molecular biologists and chemists. MMDB provides a complete set of detailed and pre-computed structural alignments obtained with the VAST algorithm, and provides visualization tools for 3D structure and structure/sequence alignment via the molecular graphics viewer Cn3D. MMDB can be accessed at http://www.ncbi.nlm.nih.gov/structure.


Nucleic Acids Research | 2015

PUG-SOAP and PUG-REST: web services for programmatic access to chemical information in PubChem

Sunghwan Kim; Paul A. Thiessen; Evan Bolton; Stephen H. Bryant

PubChem (http://pubchem.ncbi.nlm.nih.gov) is a public repository for information on chemical substances and their biological activities, developed and maintained by the US National Institutes of Health (NIH). PubChem contains more than 180 million depositor-provided chemical substance descriptions, 60 million unique chemical structures and 225 million bioactivity assay results, covering more than 9000 unique protein target sequences. As an information resource for the chemical biology research community, it routinely receives more than 1 million requests per day from an estimated more than 1 million unique users per month. Programmatic access to this vast amount of data is provided by several different systems, including the US National Center for Biotechnology Information (NCBI)'s Entrez Utilities (E-Utilities or E-Utils) and the PubChem Power User Gateway (PUG), a common gateway interface (CGI) that exchanges data through eXtended Markup Language (XML). Further simplifying programmatic access, PubChem provides two additional general purpose web services: PUG-SOAP, which uses the simple object access protocol (SOAP) and PUG-REST, which is a Representational State Transfer (REST)-style interface. These interfaces can be harnessed in combination to access the data contained in PubChem, which is integrated with the more than thirty databases available within the NCBI Entrez system.
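To make the REST-style access described above concrete, here is a minimal sketch of how a PUG-REST request URL is put together. It assumes the path layout given in PubChem's public PUG-REST documentation (base, then input domain/namespace/identifier, then operation, then output format); the helper name `pug_rest_url` is hypothetical and is not part of any PubChem library. The sketch only builds URLs and makes no network call.

```python
from urllib.parse import quote

# Base ("prolog") of PUG-REST URLs, as given in PubChem's public documentation.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_url(domain: str, namespace: str, identifier: str,
                 operation: str, output: str = "JSON") -> str:
    """Assemble a PUG-REST request URL from its path segments.

    Hypothetical helper. The assumed layout is
    <base>/<domain>/<namespace>/<identifier>/<operation>/<output>.
    The identifier is percent-encoded (with no characters treated as safe)
    so that names containing spaces or slashes stay a single path segment.
    """
    segments = [domain, namespace, quote(identifier, safe=""), operation, output]
    return "/".join([PUG_REST_BASE, *segments])


if __name__ == "__main__":
    # Request two computed properties for a compound looked up by name.
    print(pug_rest_url("compound", "name", "aspirin",
                       "property/MolecularFormula,MolecularWeight"))
```

With network access, the resulting URL could be passed to `urllib.request.urlopen` to retrieve the response. Keeping URL construction separate from the request makes it easy to check the path before sending anything to the service.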


Bioinformatics | 2005

A structure-based method for protein sequence alignment

Maricel G. Kann; Paul A. Thiessen; Anna R. Panchenko; Alejandro A. Schäffer; Stephen F. Altschul; Stephen H. Bryant

MOTIVATION With the continuing rapid growth of protein sequence data, protein sequence comparison methods have become the most widely used tools of bioinformatics. Among these methods are those that use position-specific scoring matrices (PSSMs) to describe protein families. PSSMs can capture information about conserved patterns within families, which can be used to increase the sensitivity of searches for related sequences. Certain types of structural information, however, are not generally captured by PSSM search methods. Here we introduce a program, Structure-based ALignment TOol (SALTO), that aligns protein query sequences to PSSMs using rules for placing and scoring gaps that are consistent with the conserved regions of domain alignments from NCBI's Conserved Domain Database. RESULTS In most cases, the alignment scores obtained using the local alignment version follow an extreme value distribution. SALTO's performance in finding related sequences and producing accurate alignments is similar to or better than that of IMPALA; one advantage of SALTO is that it imposes an explicit gapping model on each protein family. AVAILABILITY A stand-alone version of the program that can generate global or local alignments is available by ftp distribution (ftp://ftp.ncbi.nih.gov/pub/SALTO/), and has been incorporated into the Cn3D structure/alignment viewer.


BMC Bioinformatics | 2010

State of the art: refinement of multiple sequence alignments

Saikat Chakrabarti; Christopher J. Lanczycki; Anna R. Panchenko; Teresa M. Przytycka; Paul A. Thiessen; Stephen H. Bryant

Correction to Chakrabarti S, Lanczycki CJ, Panchenko AR, Przytycka TM, Thiessen PA and Bryant SH: State of the art: refinement of multiple sequence alignments. BMC Bioinformatics 2006, 7:499.

Collaboration


Top Co-Authors

Stephen H. Bryant
National Institutes of Health

Anna R. Panchenko
National Institutes of Health

Evan Bolton
National Institutes of Health

Benjamin A. Shoemaker
National Institutes of Health

Lewis Y. Geer
National Institutes of Health

Siqian He
National Institutes of Health

Sunghwan Kim
National Institutes of Health

Yanli Wang
National Institutes of Health

Jie Chen
National Institutes of Health