Jose M. Duarte | Researchain


Publication

Featured research published by Jose M. Duarte.

BMC Bioinformatics | 2012

Protein interface classification by evolutionary analysis

Jose M. Duarte; Adam Srebniak; Martin A. Schärer; Guido Capitani

Background: Distinguishing biologically relevant interfaces from lattice contacts in protein crystals is a fundamental problem in structural biology. Despite efforts towards the computational prediction of interface character, many issues remain unresolved.
Results: We present a protein-protein interface classifier that relies on evolutionary data to detect the biological character of interfaces. The classifier uses a simple geometric measure, the number of core residues, and two evolutionary indicators based on the sequence entropy of homologous sequences. Both indicators aim to detect differential selection pressure between the interface core and either the rim or the rest of the surface. The core residues, defined as fully buried residues (>95% burial), appear to be fundamental determinants of biological interfaces: their number is in itself a powerful discriminator of interface character, and together with the evolutionary measures it clearly distinguishes evolved biological contacts from crystal contacts. We demonstrate that this definition of core residues leads to distinctly better results than earlier definitions from the literature. Stringent selection and quality filtering of structural and sequence data were key to the success of the method. Most importantly, we demonstrate that a more conservative selection of homologous sequences, with relatively high sequence identity to the query, produces a clearer signal than previous attempts.
Conclusions: An evolutionary approach like the one presented here is key to the advancement of the field, which so far has lacked an effective method exploiting the evolutionary character of protein interfaces. Its coverage and performance will only improve over time thanks to the continuous growth of sequence databases. Currently our method reaches an accuracy of 89% in classifying interfaces of the Ponstingl 2003 datasets, and it lends itself to a variety of useful applications in structural biology and bioinformatics.
We made the corresponding software implementation available to the community as an easy-to-use graphical web interface at http://www.eppic-web.org.
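
The two ingredients named in this abstract, a core-residue count and an entropy-based comparison of core versus rim, can be illustrated with a minimal Python sketch. It is not the EPPIC implementation: we assume burial values are the fraction of each residue's accessible surface area buried upon interface formation, that the homolog alignment is a list of equal-length gapped strings whose columns map one-to-one to query residues, and that rim positions are supplied by the caller. The function names are hypothetical.

```python
import math
from collections import Counter

BURIAL_CUTOFF = 0.95  # "fully buried" core residue: more than 95% burial


def core_residues(burials):
    """Indices of residues whose burial fraction exceeds the core cutoff."""
    return [i for i, b in enumerate(burials) if b > BURIAL_CUTOFF]


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gap characters."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())


def core_rim_ratio(alignment, core, rim):
    """Mean entropy over core positions divided by mean entropy over rim positions.

    Values well below 1 mean the core is more conserved than the rim.
    """
    columns = list(zip(*alignment))  # transpose: one tuple per alignment column

    def mean_entropy(positions):
        return sum(column_entropy(columns[i]) for i in positions) / len(positions)

    rim_entropy = mean_entropy(rim)
    if rim_entropy == 0.0:
        return math.nan  # ratio undefined when the rim is fully conserved
    return mean_entropy(core) / rim_entropy


# Toy example: 4 homologs aligned over 4 query positions
alignment = ["ACDE", "ACDF", "ACEG", "ACDH"]
burials = [1.0, 0.97, 0.5, 0.2]
core = core_residues(burials)  # [0, 1]
print(core_rim_ratio(alignment, core, [2, 3]))  # 0.0: core columns are invariant
```

A real classifier would also need homolog search, identity filtering and a decision rule on top of these numbers; the sketch only shows how the raw signals are computed.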

Bioinformatics | 2011

CMView: Interactive contact map visualization and analysis

Corinna Vehlow; Henning Stehr; Matthias Winkelmann; Jose M. Duarte; Lars Petzold; Juliane Dinse; Michael Lappe

Summary: Contact maps are a valuable visualization tool in structural biology. They are a convenient way to display proteins in two dimensions and to quickly identify structural features such as domain architecture, secondary structure and contact clusters. We developed a tool called CMView, which integrates rich contact map analysis with 3D visualization using PyMOL. Our tool provides functions for contact map calculation from structure, basic editing, visualization in contact-map and 3D space, and structural comparison with different built-in alignment methods. A unique feature is the interactive refinement of structural alignments based on user-selected substructures.
Availability: CMView is freely available for Linux, Windows and MacOS. The software and a comprehensive manual can be downloaded from http://www.bioinformatics.org/cmview/. The source code is licensed under the GNU General Public License.

BMC Bioinformatics | 2010

Optimal contact definition for reconstruction of Contact Maps

Jose M. Duarte; Rajagopal Sathyapriya; Henning Stehr; Ioannis Filippis; Michael Lappe

Background: Contact maps have been extensively used as a simplified representation of protein structures. They capture most of the important features of a protein's fold and are preferred by a number of researchers for the description and study of protein structures. Inspired by the model's simplicity, many groups have dedicated considerable effort to contact prediction as a proxy for protein structure prediction. However, the biological usefulness of a contact map depends on the availability of reliable methods for reconstructing the three-dimensional structure from it.
Results: We use an implementation of the well-known distance geometry protocol to build realistic three-dimensional protein models from contact maps, performing an extensive exploration of many of the parameters involved in the reconstruction process. We address the questions: a) how accurately does a contact map represent its corresponding 3D structure, b) what is the best contact map representation with regard to reconstructability, and c) what is the effect of partial or inaccurate contact information on 3D structure recovery. Our results suggest that contact maps derived from a distance cutoff of 9 to 11 Å around the Cβ atoms constitute the most accurate representation of the 3D structure. The reconstruction process does not provide a single solution to the problem but rather an ensemble of conformations that are within 2 Å RMSD of the crystal structure and that have lower values of the pairwise average ensemble RMSD. Interestingly, it is still possible to recover a structure from partial contact information, although wrong contacts can lead to a dramatic loss in reconstruction fidelity.
Conclusions: Thus contact maps represent a valid approximation to the structures, with an accuracy comparable to that of experimental methods.
The optimal contact definitions constitute key guidelines for methods based on contact maps such as structure prediction through contacts and structural alignments based on maximum contact map overlap.
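
The contact definition this study identifies as most accurate, a Cβ distance cutoff in the 9 to 11 Å range, can be illustrated with a short Python sketch. It assumes one Cβ coordinate per residue is already available (how glycine, which lacks a Cβ, is handled is left to the caller), counts a pair as a contact when the distance is at most the cutoff, and exposes a hypothetical `min_seq_sep` parameter for excluding sequence neighbors. None of these choices is taken from the paper, and the distance-geometry reconstruction itself is not shown.

```python
import math
from itertools import combinations


def contact_map(cb_coords, cutoff=10.0, min_seq_sep=1):
    """Set of residue index pairs (i, j), i < j, whose C-beta atoms lie within `cutoff` Å.

    cb_coords: one (x, y, z) tuple per residue, in sequence order.
    min_seq_sep: smallest sequence separation j - i considered (1 keeps every pair).
    """
    contacts = set()
    for (i, a), (j, b) in combinations(enumerate(cb_coords), 2):
        if j - i >= min_seq_sep and math.dist(a, b) <= cutoff:
            contacts.add((i, j))
    return contacts


# Toy chain of four pseudo-residues laid out along the x axis
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(sorted(contact_map(coords, cutoff=9.0)))  # [(0, 1), (0, 2), (1, 2)]
```

Returning a set of index pairs keeps the map sparse, which suits the later step of feeding contacts to a reconstruction method as distance constraints.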

Molecular Cancer | 2011

The structural impact of cancer-associated missense mutations in oncogenes and tumor suppressors

Henning Stehr; Seon-Hi J Jang; Jose M. Duarte; Christoph Wierling; Hans Lehrach; Michael Lappe; Bodo Lange

Background: Current large-scale cancer sequencing projects have identified large numbers of somatic mutations covering an increasing number of different cancer tissues and patients. However, the characterization of these mutations at the structural and functional level remains a challenge.
Results: We present results from an analysis of the structural impact of frequent missense cancer mutations using an automated method. We find that inactivation of tumor suppressors in cancer frequently correlates with destabilizing mutations, preferentially in the core of the protein, while enhanced activity of oncogenes is often linked to specific mutations at functional sites. Furthermore, our results show that this alteration of oncogenic activity is often associated with mutations at ATP or GTP binding sites.
Conclusions: With our findings we can confirm and statistically validate the hypotheses for the gain-of-function and loss-of-function mechanisms of oncogenes and tumor suppressors, respectively. We show that the distinct mutational patterns can potentially be used to pre-classify newly identified cancer-associated genes of as yet unknown function.

Database | 2010

PDBWiki: added value through community annotation of the Protein Data Bank

Henning Stehr; Jose M. Duarte; Michael Lappe; Jong Bhak; Dan M. Bolser

The success of community projects such as Wikipedia has recently prompted a discussion about the applicability of such tools in the life sciences. Currently, there are several such ‘science-wikis’ that aim to collect specialist knowledge from the community into centralized resources. However, there is no consensus about how to achieve this goal. For example, it is not clear how best to integrate data from established, centralized databases with that provided by ‘community annotation’. We created PDBWiki, a scientific wiki for the community annotation of protein structures. The wiki consists of one structured page for each entry in the Protein Data Bank (PDB) and allows users to attach categorized comments to the entries. Additionally, each page includes a user-editable list of cross-references to external resources. As in a database, it is possible to produce tabular reports and ‘structure galleries’ based on user-defined queries or lists of entries. PDBWiki runs in parallel to the PDB, separating original database content from user annotations. PDBWiki demonstrates how collaboration features can be integrated with primary data from a biological database. It can be used as a system for better understanding how to capture community knowledge in the biological sciences. For users of the PDB, PDBWiki provides a bug-tracker, discussion forum and community annotation system. To date, user participation has been modest, but is increasing. The user-editable cross-references section has proven popular, with the number of linked resources more than doubling from 17 originally to 39 today. Database URL: http://www.pdbwiki.org

PLOS Computational Biology | 2009

Defining an Essence of Structure Determining Residue Contacts in Proteins

Rajagopal Sathyapriya; Jose M. Duarte; Henning Stehr; Ioannis Filippis; Michael Lappe

The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little is known about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this “structural essence” has remained elusive so far: no algorithmic strategy has been devised to date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Cα RMSD). Identifying the number and nature of essential native contacts is not only of theoretical interest (e.g., for the design of advanced statistical potentials); such a subset of spatial constraints is also very useful in a number of novel experimental methods (such as EPR) that rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) were generated for each protein, and the reconstruction accuracy was computed for each subset. We have developed a rational strategy, termed “cone-peeling”, that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Cα RMSD with as little as 8% of the native contacts (Cα-Cα and Cβ-Cβ). By comparison, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy.
This “structural essence” opens new avenues in the fields of structure prediction, empirical potentials and docking.
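
The random reference sets described in this abstract can be sketched in Python as follows. This is an illustration under assumptions, not the authors' pipeline: contacts are (i, j) index tuples, subset sizes are rounded from the fraction of native contacts, and the number of samples per fraction and the seed are hypothetical parameters. Each subset would then be passed to a distance-geometry reconstruction (not shown) and scored by Cα RMSD against the native structure.

```python
import random


def random_reference_subsets(contacts, fractions, n_samples=10, seed=0):
    """Map each fraction to n_samples random subsets of the native contacts."""
    rng = random.Random(seed)  # private generator so results are reproducible
    pool = sorted(contacts)    # fixed order: the same seed gives the same subsets
    subsets = {}
    for fraction in fractions:
        k = max(1, round(fraction * len(pool)))
        subsets[fraction] = [frozenset(rng.sample(pool, k)) for _ in range(n_samples)]
    return subsets


native = [(i, i + 3) for i in range(20)]    # 20 toy native contacts
fractions = [f / 10 for f in range(1, 10)]  # 10%, 20%, ..., 90%
refs = random_reference_subsets(native, fractions, n_samples=5)
print([len(refs[f][0]) for f in fractions])  # [2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Drawing several subsets per fraction matters because a single random draw can be unusually good or bad; averaging reconstruction accuracy over the samples gives the baseline that a rational selection must beat.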

BMC Structural Biology | 2013

An analysis of oligomerization interfaces in transmembrane proteins

Jose M. Duarte; Nikhil Biyani; Kumaran Baskaran; Guido Capitani

Background: The number of transmembrane (TM) protein structures solved to date is now large enough to attempt large-scale analyses. In particular, extensive studies of oligomeric interfaces in the transmembrane region are now possible.
Results: We have compiled the first fully comprehensive set of validated transmembrane protein interfaces in order to study their features and assess what differentiates them from their soluble counterparts.
Conclusions: The general features of TM interfaces do not differ much from those of soluble proteins: they are large, tightly packed and possess many interface core residues. In our set, membrane lipids were not found to significantly mediate protein-protein interfaces. Although no G protein-coupled receptor (GPCR) was included in the validated set, we analyzed the crystallographic dimerization interfaces proposed in the literature. We found that the putative dimer interfaces proposed for class A GPCRs show the usual patterns of stable biological interfaces neither in terms of evolution nor in terms of packing; thus, they likely correspond to crystal interfaces. We cannot, however, rule out the possibility that they constitute transient or weak interfaces. In contrast, we do observe a clear signature of a biological interface for the proposed dimer of the class F human Smoothened receptor.

Current Opinion in Biotechnology | 2009

Designing evolvable libraries using multi-body potentials

Michael Lappe; Ganesh Bagler; Ioannis Filippis; Henning Stehr; Jose M. Duarte; Rajagopal Sathyapriya

Novel high-throughput technologies for directed evolution enable experimental coverage of an impressive number of sequences. Nevertheless, the success of such experiments hinges on the initial sequence libraries. Here we consider the computational design of smart focused libraries and review insights from experimental strategies and theoretical advances in modelling their energy landscapes. In library design, as in structure prediction, the applied energy function is key. Current knowledge-based potentials have proven more successful than purely physics-based ones. Here we summarize novel approaches that extend the classical pairwise treatment of residue contacts towards adaptive knowledge-based multi-body potentials. We suggest that minimal sets of probabilistic constraints will lead to much more efficient sampling of permissible conformations and sequence space.
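
For contrast with the multi-body approaches this review discusses, the classical pairwise treatment of residue contacts can be sketched as an inverse-Boltzmann (quasi-chemical) potential in Python. This is a generic textbook construction, not a method from the review: we assume contacts arrive as observed residue-type pairs, use a composition-based reference state (expected pair frequency from the product of single-residue frequencies), and express energies in units of kT.

```python
import math
from collections import Counter


def pairwise_potential(contact_pairs, kT=1.0):
    """Inverse-Boltzmann energies E(a, b) = -kT * ln(f_obs(a, b) / f_exp(a, b)).

    contact_pairs: observed contacts as (residue_type, residue_type) tuples.
    The reference state f_exp uses single-residue frequencies from the same contacts.
    """
    pair_counts = Counter(tuple(sorted(p)) for p in contact_pairs)  # unordered pairs
    type_counts = Counter(aa for p in contact_pairs for aa in p)
    n_pairs = sum(pair_counts.values())
    n_residues = sum(type_counts.values())
    energies = {}
    for (a, b), count in pair_counts.items():
        f_a = type_counts[a] / n_residues
        f_b = type_counts[b] / n_residues
        f_exp = f_a * f_b if a == b else 2 * f_a * f_b  # two orderings when a != b
        energies[(a, b)] = -kT * math.log((count / n_pairs) / f_exp)
    return energies


observed = [("L", "L"), ("L", "L"), ("K", "E"), ("E", "K")]
for pair, e in sorted(pairwise_potential(observed).items()):
    print(pair, round(e, 3))
# ('E', 'K') -1.386
# ('L', 'L') -0.693
```

Negative energies mark pairs seen more often than composition alone predicts. Only observed pairs receive an energy here; a practical potential would add pseudocounts for unseen pairs. Each term depends on exactly two residues, which is the limitation that multi-body potentials aim to lift.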

Acta Crystallographica Section A | 2017

RCSB PDB: structural biology views for basic and applied research

Christine Zardecki; Stephen K. Burley; Cole Christie; Jose M. Duarte; Zukang Feng; Andreas Prlić; Alexander S. Rose; John D. Westbrook; Jasmine Young

Users can perform simple searches from the top search bar (e.g., ID, name, sequence, ligand) or build complex combinations of search parameters using Advanced Search. Information from DrugBank is integrated with PDB data to facilitate searches for drugs and drug targets. Other classification systems are used to organize PDB structures in hierarchical trees for browsing and searching (e.g., mpstruc, Gene Ontology, Enzyme Classification).

Acta Crystallographica Section A | 2017

Automated evaluation of quaternary structures from protein crystal structures

Jose M. Duarte; Spencer Bliven; Aleix Lafita; Guido Capitani; Stephen K. Burley

Crystallography is the most powerful technique for generating atomic-level structures of proteins and other biological macromolecules. However, it does not always yield definitive insights into the quaternary structures of biological macromolecules. In order to provide better tools for determining the most likely quaternary structure of proteins, we have developed the new EPPIC 3 method. It uses evolutionary considerations as the ultimate arbiters of the biological relevance of interfaces and assemblies, thereby offering an approach complementary to other available methods that rely on thermodynamic considerations. EPPIC 3 extends our previous Evolutionary Protein-Protein Interface Classifier (EPPIC) by going beyond the classification of pairwise interfaces. It identifies all possible topologically valid assemblies present in a protein crystal and provides predictions of likely quaternary structures. Pairwise interface classifications are based on two evolutionary scores and a single geometric score that are in turn combined into a final score. This approach was trained against a large dataset of known biologically relevant and crystal interfaces. Assembly enumeration is achieved by representing the crystal lattice as a periodic graph. Finding valid assemblies then reduces to the problem of finding subgraphs that comply with a set of rules guaranteeing closed assemblies (point-group symmetries) and isomorphism in assembly composition and connectivity throughout the crystal. The software is accessible through an easy-to-use graphical web interface at http://www.eppic-web.org. The graphical interface is designed to help crystallographers interpret putative quaternary structures using 2D and 3D graphical tools that operate within the browser.
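
The step of combining two evolutionary scores and one geometric score into a trained final score can be illustrated with a logistic combination in Python. This is a hypothetical sketch, not EPPIC's published model. The score names, the logistic form, the weights, the bias and the 0.5 threshold are all placeholders chosen for illustration. The real parameters would come from training on the labelled interface dataset mentioned in the abstract.

```python
import math

# Placeholder parameters for illustration only; NOT EPPIC's trained values.
# Positive weight: more core residues pushes toward "bio".
# Negative weights: lower (more conserved) evolutionary scores push toward "bio".
WEIGHTS = {"geometry_core_size": 0.5, "evo_core_rim": -2.0, "evo_core_surface": -1.0}
BIAS = -1.0


def bio_probability(scores, weights=WEIGHTS, bias=BIAS):
    """Logistic combination of per-interface scores into P(biological)."""
    z = bias + sum(w * scores[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify(scores, threshold=0.5):
    """Label an interface 'bio' or 'xtal' by thresholding the combined probability."""
    return "bio" if bio_probability(scores) >= threshold else "xtal"


large_conserved = {"geometry_core_size": 10, "evo_core_rim": 0.3, "evo_core_surface": -2.0}
small_variable = {"geometry_core_size": 0, "evo_core_rim": 1.2, "evo_core_surface": 0.5}
print(classify(large_conserved), classify(small_variable))  # bio xtal
```

A probabilistic output, rather than a bare vote, makes it straightforward to score whole assemblies later by combining the probabilities of the interfaces they engage, though how EPPIC 3 does this is not shown here.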
