Joannis Apostolakis
Ludwig Maximilian University of Munich
Publication
Featured research published by Joannis Apostolakis.
BMC Bioinformatics | 2005
Katrin Fundel; Daniel Güttler; Ralf Zimmer; Joannis Apostolakis
Background: Significant parts of biological knowledge are available only as unstructured text in articles of biomedical journals. By automatically identifying gene and gene product (protein) names and mapping these to unique database identifiers, it becomes possible to extract and integrate information from articles and various data sources. We present a simple and efficient approach that identifies gene and protein names in texts and returns database identifiers for matches. It has been evaluated in the recent BioCreAtIvE entity extraction and mention normalization task by an independent jury. Methods: Our approach is based on the use of synonym lists that map the unique database identifiers for each gene/protein to the different synonym names. For yeast and mouse, synonym lists were used as provided by the organizers, who generated them from public model organism databases. The synonym list for fly was generated directly from the corresponding organism database. The lists were then extensively curated in a largely automated procedure and matched against MEDLINE abstracts by exact text matching. Rule-based and support vector machine-based post filters were designed and applied to improve precision. Results: Our procedure showed high recall and precision, with F-measures of 0.897 for yeast and 0.764/0.773 for mouse in the BioCreAtIvE assessment (Task 1B) and 0.768 for fly in a post-evaluation. Conclusion: The results were close to the best over all submissions. Depending on the synonym properties, it can be crucial to consider context and to filter out erroneous matches. This is especially important for fly, which has a very challenging nomenclature for the protein name identification task. Here, the support vector machine-based post filter proved to be very effective.
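The synonym-list matching and the F-measure described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the identifiers and names in `SYNONYMS` are made up, the function names are hypothetical, the case folding is an assumption (the abstract only says "exact text matching"), and the curation step and the rule-based and SVM post filters are not shown.

```python
import re

# Hypothetical synonym list: database identifier -> known names.
# Identifiers and names are invented for illustration only.
SYNONYMS = {
    "ID:0001": ["abc1", "alpha binding component 1"],
    "ID:0002": ["xyz2", "x-y zeta 2"],
}


def build_index(synonyms):
    """Invert the synonym list into a lowercase name -> identifier map."""
    index = {}
    for identifier, names in synonyms.items():
        for name in names:
            index[name.lower()] = identifier
    return index


def match_identifiers(text, index):
    """Return identifiers whose synonyms occur in the text as whole tokens."""
    found = set()
    lowered = text.lower()  # assumption: matching ignores case
    for name, identifier in index.items():
        # Require that the match is not embedded in a longer alphanumeric token.
        pattern = r"(?<![A-Za-z0-9])" + re.escape(name) + r"(?![A-Za-z0-9])"
        if re.search(pattern, lowered):
            found.add(identifier)
    return found


def f_measure(predicted, gold):
    """Balanced F-measure: harmonic mean of precision and recall."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    index = build_index(SYNONYMS)
    text = "Deletion of ABC1 impaired growth, whereas xyz22 was unaffected."
    hits = match_identifiers(text, index)
    print(hits)
    print(f_measure(hits, {"ID:0001", "ID:0002"}))
```

In this toy text only `abc1` is matched, because `xyz22` is a longer token than the synonym `xyz2`; erroneous matches that survive such a token check are the kind of case the post filters mentioned in the abstract are meant to remove.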
Journal of Chemical Information and Modeling | 2008
Robert Körner; Joannis Apostolakis
Chemical reactions transform the reactant molecules by deleting existing and forming new bonds. The identification of these so-called reacting bonds is important for studying the reaction mechanism and for applications in metabolomics, e.g., for interpreting substrate labeling experiments. Here, we introduce an approach that suggests the simplest possible reaction center at the heavy atom level, with high accuracy. In contrast to current methods, the approach is motivated by a simple theoretical model based on a crude approximation of the reaction energetics, and takes the complete reacting system into account. Finally, it recovers all optimal solutions to the problem while removing all symmetry-related, redundant solutions. We apply the method to the complete KEGG database of biochemical reactions, and compare our approach with previous methods. The resulting reaction centers are represented as imaginary transition states, which are molecule-like representations of reaction mechanisms. We provide the statistics of the calculations on the KEGG database and discuss some examples for the different types of alternative solutions found.
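The idea of a "simplest possible reaction center" at the heavy atom level can be sketched as a search for the atom mapping that changes the fewest bonds. The following is a brute-force toy, not the method of the paper: it ignores bond orders and hydrogens, enumerates all element-preserving permutations (feasible only for very small systems), and merges mappings that yield the same broken and formed bonds rather than performing a real symmetry analysis. The function name and the example reaction (heavy atoms of methyl acetate plus water going to acetic acid plus methanol) are illustrative assumptions.

```python
from itertools import permutations


def reaction_centers(r_elems, r_bonds, p_elems, p_bonds):
    """Brute-force search over element-preserving atom mappings.

    Returns (min_cost, solutions). Each solution is a pair (broken, formed)
    of bond sets expressed in reactant atom numbering; cost is the number
    of broken plus formed bonds (bond orders are ignored).
    """
    n = len(r_elems)
    r_set = {frozenset(b) for b in r_bonds}
    best, solutions = None, set()
    for perm in permutations(range(n)):
        # perm[i] is the product atom assigned to reactant atom i.
        if any(r_elems[i] != p_elems[perm[i]] for i in range(n)):
            continue
        inverse = {p: r for r, p in enumerate(perm)}
        # Express product bonds in reactant numbering.
        p_set = {frozenset((inverse[a], inverse[b])) for a, b in p_bonds}
        broken = frozenset(r_set - p_set)
        formed = frozenset(p_set - r_set)
        cost = len(broken) + len(formed)
        if best is None or cost < best:
            best, solutions = cost, {(broken, formed)}
        elif cost == best:
            solutions.add((broken, formed))
    return best, solutions


# Heavy atoms only. Reactants: C0-C1(O2)-O3-C4 (ester) and O5 (water).
R_ELEMS = ["C", "C", "O", "O", "C", "O"]
R_BONDS = [(0, 1), (1, 2), (1, 3), (3, 4)]
# Products: C0-C1(O2)-O3 (acid) and C4-O5 (alcohol).
P_ELEMS = ["C", "C", "O", "O", "C", "O"]
P_BONDS = [(0, 1), (1, 2), (1, 3), (4, 5)]

if __name__ == "__main__":
    cost, sols = reaction_centers(R_ELEMS, R_BONDS, P_ELEMS, P_BONDS)
    print(cost)
    for broken, formed in sols:
        print(sorted(map(sorted, broken)), sorted(map(sorted, formed)))
```

For this toy system the minimal cost is two bond changes, and more than one reaction center reaches it: for example, breaking bond 3-4 and forming 4-5, or breaking 1-3 and forming 1-5. This mirrors, in a very reduced form, the alternative solutions the abstract mentions.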
Journal of Chemical Information and Modeling | 2008
Joannis Apostolakis; Oliver Sacher; Robert Körner; Johann Gasteiger
The correct identification of the reacting bonds and atoms is a prerequisite for the analysis of the reaction mechanism. We have recently developed a method based on the Imaginary Transition State Energy Minimization approach for automatically determining the reaction center information and the atom-atom mapping numbers. We test here the accuracy of this ITSE approach by comparing the predictions of the method against more than 1500 manually annotated reactions from BioPath, a comprehensive database of biochemical reactions. The results show high agreement between manually annotated mappings and computational predictions (98.4%), with significant discrepancies in only 24 cases out of 1542 (1.6%). This result validates both the computational prediction and the database, at the same time, as the results of the former agree with expert knowledge and the latter appears largely self-consistent, and consistent with a simple principle. In 10 of the discrepant cases, simple chemical arguments or independent literature studies support the predicted reaction center. In five reaction instances the differences in the automatically and manually annotated mappings are described in detail. Finally, in approximately 200 cases the algorithm finds alternate reaction centers, which need to be studied on a case by case basis, as the exact choice of the alternative may depend on the enzyme catalyzing the reaction.
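The percentages quoted in the abstract above follow directly from the reported counts. A short check, with a hypothetical helper name:

```python
def agreement_percent(total, discrepant):
    """Percentage of reactions without a significant discrepancy."""
    return 100 * (total - discrepant) / total


if __name__ == "__main__":
    agree = agreement_percent(1542, 24)
    # Rounded to one decimal place, as in the abstract.
    print(round(agree, 1), round(100 - agree, 1))
```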
Journal of Chemical Information and Modeling | 2007
Jörn Marialke; Robert Körner; Simon Tietze; Joannis Apostolakis
We describe a combined 2D/3D approach for the superposition of flexible chemical structures, which is based on recent progress in the efficient identification of common subgraphs and a gradient-based torsion space optimization algorithm. The simplicity of the approach is reflected in its generality and computational efficiency: the suggested approach neither requires precalculated statistics on the conformations of the molecules nor does it make simplifying assumptions on the topology of the molecules being compared. Furthermore, graph-based molecular alignment produces alignments that are consistent with the chemistry of the molecules as well as their general structure, as it depends on both the local connectivities between atoms and the overall topology of the molecules. We validate this approach on benchmark sets taken from the literature and show that it leads to good results compared to computationally and algorithmically more involved methods. The results suggest that, for most practical purposes, graph-based molecular alignment is a viable alternative to molecular field alignment with respect to structural superposition and leads to structures of comparable quality in a fraction of the time.
Journal of Chemical Information and Modeling | 2006
Andreas Kämper; Joannis Apostolakis; Matthias Rarey; Christel M. Marian; Thomas Lengauer
The prediction of the structure of host-guest complexes is one of the most challenging problems in supramolecular chemistry. Usual procedures for docking of ligands into receptors do not take full conformational freedom of the host molecule into account. We describe and apply a new docking approach which performs a conformational sampling of the host and then sequentially docks the ligand into all receptor conformers using the incremental construction technique of the FlexX software platform. The applicability of this approach is validated on a set of host-guest complexes with known crystal structure. Moreover, we demonstrate that due to the interchangeability of the roles of host and guest, the docking process can be inverted. In this inverse docking mode, the receptor molecule is docked around its ligand. For all investigated test cases, the predicted structures are in good agreement with the experiment for both normal (forward) and inverse docking. Since the ligand is often smaller than the receptor and, thus, its conformational space is more restricted, the inverse docking approach leads in most cases to considerable speed-up. By having the choice between two alternative docking directions, the application range of the method is significantly extended. Finally, an important result of this study is the suitability of the simple energy function used here for structure prediction of complexes in organic media.
Acta Crystallographica Section A | 2001
Joannis Apostolakis; D.W.M. Hofmann; Thomas Lengauer
The ever increasing number of experimentally resolved crystal structures supports the possibility of fully empirical crystal structure prediction for small organic molecules. Empirical methods promise to be significantly more efficient than methods that attempt to solve the same problem from first principles. However, the transformation from data to empirical knowledge and further to functional algorithms is not trivial and the usefulness of the result depends strongly on the quantity and the quality of the data. In this work, a simple scoring function is parameterized to discriminate between the correct structure and a set of decoys for a large number of different molecular systems. The method is fully automatic and has the advantage that the complete scoring function is parametrized at once, leading to a self-consistent set of parameters. The obtained scoring function is tested on an independent set of crystal structures taken from the P1 and P-1 space groups. With the trained scoring function and FlexCryst, a program for small-molecule crystal structure prediction, it is shown that approximately 73% of the 239 tested molecules in space group P1 are predicted correctly. For the more complex space group P-1, the success rate is 26%. Comparison with force-field potentials indicates the physical content of the obtained scoring function, a result of direct importance for protein threading where such database-based potentials are being applied.
Chemistry Central Journal | 2007
Andreas Steffen; Joannis Apostolakis
Background: In this study we investigated the predictability of three thermodynamic quantities related to complex formation. As a model system we chose the host-guest complexes of β-cyclodextrin (β-CD) with different guest molecules. A training dataset comprising 176 β-CD guest molecules with experimentally determined thermodynamic quantities was taken from the literature. We compared the performance of three different statistical regression methods – principal component regression (PCR), partial least squares regression (PLSR), and support vector machine regression combined with forward feature selection (SVMR/FFS) – with respect to their ability to generate predictive quantitative structure property relationship (QSPR) models for ΔG°, ΔH° and ΔS° on the basis of computed molecular descriptors. Results: We found that SVMR/FFS marginally outperforms PLSR and PCR in the prediction of ΔG°, with PLSR performing slightly better than PCR. PLSR and PCR proved to be more stable in a nested cross-validation protocol. Whereas ΔG° can be predicted in good agreement with experimental values, none of the methods led to comparably good predictive models for ΔH°. Using the methods outlined in this study, we found that ΔS° appears almost unpredictable. In order to understand the differences in the ease of predicting the quantities, we performed a detailed analysis. As a result, we can show that free energies are less sensitive (than enthalpy or entropy) to the small structural variations of guest molecules. This property and the lower sensitivity of ΔG° to experimental conditions are possible explanations for its greater predictability. Conclusion: This study shows that the ease of predicting ΔG° cannot be explained by the predictability of either ΔH° or ΔS°. Our analysis suggests that the poor predictability of TΔS° and, to a lesser extent, ΔH° has to do with a stronger dependence of these quantities on the structural details of the complex and only to a lesser extent on experimental error.
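For orientation, the three quantities discussed in the abstract above are linked by the standard thermodynamic relation below; it is textbook background and not a result of the study. Here T denotes the absolute temperature.

```latex
\Delta G^{\circ} = \Delta H^{\circ} - T\,\Delta S^{\circ}
```

Because ΔG° is the difference of the two other terms, a model that predicts ΔG° well does not by itself imply that ΔH° or TΔS° can be predicted equally well, which is consistent with the conclusion stated in the abstract.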
Archive | 2009
Joannis Apostolakis
Data mining aims at the automated discovery of knowledge from typically large repositories of data. In science this knowledge is most often integrated into a model describing a particular process or natural phenomenon. Requirements with respect to the predictivity and the generality of the resulting models are usually significantly higher than in other application domains. Therefore, in the use of data mining in the sciences, and crystallography in particular, methods from machine learning and statistics play a significantly greater role than in other application areas. In the context of crystallography, data collection, cleaning, and warehousing are aspects of standard data mining that play an important role, whereas techniques from machine learning and statistical analysis are mostly used for the analysis of the data. The purpose of this chapter is to introduce the reader to the concepts from that latter part of the knowledge discovery process and to provide a general intuition for the methods and possibilities of the different tools for learning from databases.
Journal of Chemical Information and Modeling | 2009
Christos A. Nicolaou; Joannis Apostolakis; Constantinos S. Pattichis
Chemistry & Biology | 2007
Stefan Zahler; Simon Tietze; Frank Totzke; Michael H.G. Kubbutat; Laurent Meijer; Angelika M. Vollmar; Joannis Apostolakis