Thomas Hamelryck
University of Copenhagen
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Thomas Hamelryck.
Bioinformatics | 2009
Peter J. A. Cock; Tiago Antao; Jeffrey T. Chang; Brad Chapman; Cymon J. Cox; Andrew Dalke; Iddo Friedberg; Thomas Hamelryck; Frank Kauff; Bartosz Wilczyński; Michiel J. L. de Hoon
Summary: The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macro molecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Availability: Biopython is freely available, with documentation and source code at www.biopython.org under the Biopython license. Contact: All queries should be directed to the Biopython mailing lists, see www.biopython.org/wiki/[email protected].
Bioinformatics | 2003
Thomas Hamelryck; Bernard Manderick
UNLABELLED The biopython project provides a set of bioinformatics tools implemented in Python. Recently, biopython was extended with a set of modules that deal with macromolecular structure. Biopython now contains a parser for PDB files that makes the atomic information available in an easy-to-use but powerful data structure. The parser and data structure deal with features that are often left out or handled inadequately by other packages, e.g. atom and residue disorder (if point mutants are present in the crystal), anisotropic B factors, multiple models and insertion codes. In addition, the parser performs some sanity checking to detect obvious errors. AVAILABILITY The Biopython distribution (including source code and documentation) is freely available (under the Biopython license) from http://www.biopython.org
Proceedings of the National Academy of Sciences of the United States of America | 2008
Wouter Boomsma; Kanti V. Mardia; Charles C. Taylor; Jesper Ferkinghoff-Borg; Anders Krogh; Thomas Hamelryck
Despite significant progress in recent years, protein structure prediction maintains its status as one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative conformational stabilities. Here, we present a fully probabilistic, continuous model of local protein structure in atomic detail. The generative model makes efficient conformational sampling possible and provides a framework for the rigorous analysis of local sequence–structure correlations in the native state. Our method represents a significant theoretical and practical improvement over the widely used fragment assembly technique by avoiding the drawbacks associated with a discrete and nonprobabilistic approach.
Proteins | 2005
Thomas Hamelryck
The concept of amino acid solvent exposure is crucial for understanding and predicting various aspects of protein structure and function. The traditional measures of solvent exposure however suffer from various shortcomings, like for example the inability to distinguish exposed, partly exposed, buried, and deeply buried residues. This article introduces a new measure of solvent exposure called Half‐Sphere Exposure that addresses many of the shortcomings of other methods. The new measure outperforms other measures with respect to correlation with protein stability, conservation among fold homologs, amino acid‐type dependency and interpretation. The measure consists of the number of Cα atoms in two half spheres around a residues Cα atom. Conceptually, one of the half spheres corresponds to the side chains neighborhood, the other half sphere being in the opposite direction. We show here that the two half spheres correspond to two regions around an amino acid that are surprisingly distinct in terms of geometry and energy. This aspect of protein structure introduced here forms the basis of the Half‐Sphere Exposure measure. The results strongly suggest that in many respects, a 2D measure is inherently much better suited to describe solvent exposure than the traditional 1D measures. Importantly, Half‐Sphere Exposure can be calculated from the Cα atom coordinates only, which abolishes the need for a full‐atom model to calculate solvent exposure. Hence, the measure can be used in protein structure prediction methods that are based on various simplified models. Half‐Sphere Exposure has great potential for use in protein structure prediction and analysis. Proteins 2005.
PLOS Computational Biology | 2005
Thomas Hamelryck; John T. Kent; Anders Krogh
The prediction of protein structure from sequence remains a major unsolved problem in biology. The most successful protein structure prediction methods make use of a divide-and-conquer strategy to attack the problem: a conformational sampling method generates plausible candidate structures, which are subsequently accepted or rejected using an energy function. Conceptually, this often corresponds to separating local structural bias from the long-range interactions that stabilize the compact, native state. However, sampling protein conformations that are compatible with the local structural bias encoded in a given protein sequence is a long-standing open problem, especially in continuous space. We describe an elegant and mathematically rigorous method to do this, and show that it readily generates native-like protein conformations simply by enforcing compactness. Our results have far-reaching implications for protein structure prediction, determination, simulation, and design.
PLOS Computational Biology | 2009
Jes Frellsen; Ida Moltke; Martin Thiim; Kanti V. Mardia; Jesper Ferkinghoff-Borg; Thomas Hamelryck
The increasing importance of non-coding RNA in biology and medicine has led to a growing interest in the problem of RNA 3-D structure prediction. As is the case for proteins, RNA 3-D structure prediction methods require two key ingredients: an accurate energy function and a conformational sampling procedure. Both are only partly solved problems. Here, we focus on the problem of conformational sampling. The current state of the art solution is based on fragment assembly methods, which construct plausible conformations by stringing together short fragments obtained from experimental structures. However, the discrete nature of the fragments necessitates the use of carefully tuned, unphysical energy functions, and their non-probabilistic nature impairs unbiased sampling. We offer a solution to the sampling problem that removes these important limitations: a probabilistic model of RNA structure that allows efficient sampling of RNA conformations in continuous space, and with associated probabilities. We show that the model captures several key features of RNA structure, such as its rotameric nature and the distribution of the helix lengths. Furthermore, the model readily generates native-like 3-D conformations for 9 out of 10 test structures, solely using coarse-grained base-pairing information. In conclusion, the method provides a theoretical and practical solution for a major bottleneck on the way to routine prediction and simulation of RNA structure and dynamics in atomic detail.
PLOS ONE | 2010
Thomas Hamelryck; Mikael Borg; Martin Paluszewski; Jonas Paulsen; Jes Frellsen; Christian Andreetta; Wouter Boomsma; Sandro Bottaro; Jesper Ferkinghoff-Borg
Understanding protein structure is of crucial importance in science, medicine and biotechnology. For about two decades, knowledge-based potentials based on pairwise distances – so-called “potentials of mean force” (PMFs) – have been center stage in the prediction and design of protein structure and the simulation of protein folding. However, the validity, scope and limitations of these potentials are still vigorously debated and disputed, and the optimal choice of the reference state – a necessary component of these potentials – is an unsolved problem. PMFs are loosely justified by analogy to the reversible work theorem in statistical physics, or by a statistical argument based on a likelihood function. Both justifications are insightful but leave many questions unanswered. Here, we show for the first time that PMFs can be seen as approximations to quantities that do have a rigorous probabilistic justification: they naturally arise when probability distributions over different features of proteins need to be combined. We call these quantities “reference ratio distributions” deriving from the application of the “reference ratio method.” This new view is not only of theoretical relevance but leads to many insights that are of direct practical use: the reference state is uniquely defined and does not require external physical insights; the approach can be generalized beyond pairwise distances to arbitrary features of protein structure; and it becomes clear for which purposes the use of these quantities is justified. We illustrate these insights with two applications, involving the radius of gyration and hydrogen bonding. In the latter case, we also show how the reference ratio method can be iteratively applied to sculpt an energy funnel. Our results considerably increase the understanding and scope of energy functions derived from known biomolecular structures.
BMC Bioinformatics | 2007
Kyoung-Jae Won; Thomas Hamelryck; Adam Prügel-Bennett; Anders Krogh
BackgroundThe prediction of the secondary structure of proteins is one of the most studied problems in bioinformatics. Despite their success in many problems of biological sequence analysis, Hidden Markov Models (HMMs) have not been used much for this problem, as the complexity of the task makes manual design of HMMs difficult. Therefore, we have developed a method for evolving the structure of HMMs automatically, using Genetic Algorithms (GAs).ResultsIn the GA procedure, populations of HMMs are assembled from biologically meaningful building blocks. Mutation and crossover operators were designed to explore the space of such Block-HMMs. After each step of the GA, the standard HMM estimation algorithm (the Baum-Welch algorithm) was used to update model parameters. The final HMM captures several features of protein sequence and structure, with its own HMM grammar. In contrast to neural network based predictors, the evolved HMM also calculates the probabilities associated with the predictions. We carefully examined the performance of the HMM based predictor, both under the multiple- and single-sequence condition.ConclusionWe have shown that the proposed evolutionary method can automatically design the topology of HMMs. The method reads the grammar of protein sequences and converts it into the grammar of an HMM. It improved previously suggested evolutionary methods and increased the prediction quality. Especially, it shows good performance under the single-sequence condition and provides probabilistic information on the prediction result. The protein secondary structure predictor using HMMs (P.S.HMM) is on-line available http://www.binf.ku.dk/~won/pshmm.htm. It runs under the single-sequence condition.
BMC Bioinformatics | 2010
Tim Harder; Wouter Boomsma; Martin Paluszewski; Jes Frellsen; Kristoffer E. Johansson; Thomas Hamelryck
BackgroundAccurately covering the conformational space of amino acid side chains is essential for important applications such as protein design, docking and high resolution structure prediction. Today, the most common way to capture this conformational space is through rotamer libraries - discrete collections of side chain conformations derived from experimentally determined protein structures. The discretization can be exploited to efficiently search the conformational space. However, discretizing this naturally continuous space comes at the cost of losing detailed information that is crucial for certain applications. For example, rigorously combining rotamers with physical force fields is associated with numerous problems.ResultsIn this work we present BASILISK: a generative, probabilistic model of the conformational space of side chains that makes it possible to sample in continuous space. In addition, sampling can be conditional upon the proteins detailed backbone conformation, again in continuous space - without involving discretization.ConclusionsA careful analysis of the model and a comparison with various rotamer libraries indicates that the model forms an excellent, fully continuous model of side chain conformational space. We also illustrate how the model can be used for rigorous, unbiased sampling with a physical force field, and how it improves side chain prediction when used as a pseudo-energy term. In conclusion, BASILISK is an important step forward on the way to a rigorous probabilistic description of protein structure in continuous space and in atomic detail.
Journal of Chemical Theory and Computation | 2014
Simon Olsson; Beat Vögeli; Andrea Cavalli; Wouter Boomsma; Jesper Ferkinghoff-Borg; Kresten Lindorff-Larsen; Thomas Hamelryck
The motions of biological macromolecules are tightly coupled to their functions. However, while the study of fast motions has become increasingly feasible in recent years, the study of slower, biologically important motions remains difficult. Here, we present a method to construct native state ensembles of proteins by the combination of physical force fields and experimental data through modern statistical methodology. As an example, we use NMR residual dipolar couplings to determine a native state ensemble of the extensively studied third immunoglobulin binding domain of protein G (GB3). The ensemble accurately describes both local and nonlocal backbone fluctuations as judged by its reproduction of complementary experimental data. While it is difficult to assess precise time-scales of the observed motions, our results suggest that it is possible to construct realistic conformational ensembles of biomolecules very efficiently. The approach may allow for a dramatic reduction in the computational as well as experimental resources needed to obtain accurate conformational ensembles of biological macromolecules in a statistically sound manner.