David Hoksza | Researchain

Publication

Featured research published by David Hoksza.

Journal of Cheminformatics | 2014

Molpher: a software framework for systematic chemical space exploration

David Hoksza; Petr Škoda; Milan Voršilák; Daniel Svozil

Background: Chemical space is virtual space occupied by all chemically meaningful organic compounds. It is an important concept in contemporary chemoinformatics research, and its systematic exploration is vital to the discovery of either novel drugs or new tools for chemical biology. Results: In this paper, we describe Molpher, an open-source framework for the systematic exploration of chemical space. Through a process we term ‘molecular morphing’, Molpher produces a path of structurally-related compounds. This path is generated by the iterative application of so-called ‘morphing operators’ that represent simple structural changes, such as the addition or removal of an atom or a bond. Molpher incorporates an optimized parallel exploration algorithm, compound logging and a two-dimensional visualization of the exploration process. Its feature set can be easily extended by implementing additional morphing operators, chemical fingerprints, similarity measures and visualization methods. Molpher not only offers an intuitive graphical user interface, but also can be run in batch mode. This enables users to easily incorporate molecular morphing into their existing drug discovery pipelines. Conclusions: Molpher is an open-source software framework for the design of virtual chemical libraries focused on a particular mechanistic class of compounds. These libraries, represented by a morphing path and its surroundings, provide valuable starting data for future in silico and in vitro experiments. Molpher is highly extensible and can be easily incorporated into any existing computational drug design pipeline.

Bioinformatics | 2012

Efficient RNA pairwise structure comparison by SETTER method

David Hoksza; Daniel Svozil

Motivation: Understanding the architecture and function of RNA molecules requires methods for comparing and analyzing their 3D structures. Although a structural alignment of short RNAs is achievable in a reasonable amount of time, large structures represent a much bigger challenge. However, the growth of the number of large RNAs deposited in the PDB database calls for the development of fast and accurate methods for analyzing their structures, as well as for rapid similarity searches in databases. Results: In this article, SETTER (SEcondary sTructure-based TERtiary Structure Similarity Algorithm), a novel algorithm for RNA structural comparison, is introduced. SETTER uses a pairwise comparison method based on the 3D similarity of the so-called generalized secondary structure units. For each pair of structures, SETTER produces a distance score and an indication of its statistical significance. SETTER can be used both for the structural alignment of structures that are already known to be homologous and for 3D structure similarity searches and functional annotation. The algorithm presented is both accurate and fast and does not impose limits on the size of aligned RNA structures. Availability: The SETTER program, as well as all datasets, is freely available from http://siret.cz/hoksza/projects/setter/.

Nucleic Acids Research | 2012

SETTER: web server for RNA structure comparison

Petr Čech; Daniel Svozil; David Hoksza

The recent discoveries of regulatory non-coding RNAs changed our view of RNA as a simple information transfer molecule. Understanding the architecture and function of active RNA molecules requires methods for comparing and analyzing their 3D structures. While structural alignment of short RNAs is achievable in a reasonable amount of time, large structures represent a much bigger challenge. Here, we present the SETTER web server for pairwise RNA structure comparison utilizing the SETTER (SEcondary sTructure-based TERtiary Structure Similarity Algorithm) algorithm. The SETTER method divides an RNA structure into a set of non-overlapping structural elements called generalized secondary structure units (GSSUs). The SETTER algorithm scales as O(n^2) with the size of a GSSU and as O(n) with the number of GSSUs in the structure. This scaling gives SETTER its high speed, as the average size of a GSSU remains constant irrespective of the size of the structure. However, the favorable speed of the algorithm does not compromise its accuracy. The SETTER web server, together with the stand-alone implementation of the SETTER algorithm, is freely accessible at http://siret.cz/setter.
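The scaling argument in this abstract can be illustrated with a toy cost model. The sketch below is not the SETTER implementation; the function name `gssu_cost`, the GSSU size of 20 nucleotides, and the unit cost per operation are all assumptions chosen only to show why a cost that is quadratic per GSSU but linear in the number of GSSUs grows slowly with structure size.

```python
def gssu_cost(gssu_sizes):
    """Toy cost model (assumption): quadratic in each GSSU's size,
    summed over all GSSUs, so linear in the number of GSSUs."""
    return sum(n * n for n in gssu_sizes)


# Hypothetical structures: the GSSU size stays at 20 nucleotides,
# only the number of GSSUs grows with the structure.
small = [20] * 5    # 100 nucleotides in total
large = [20] * 50   # 1000 nucleotides in total

# Cost grows 10x for a 10x larger structure under the toy model.
print(gssu_cost(large) // gssu_cost(small))   # 10

# A method quadratic in the whole structure size would grow 100x.
print(sum(large) ** 2 // sum(small) ** 2)     # 100
```

Under these assumptions, a tenfold larger structure costs ten times more rather than a hundred times more, which is the effect the abstract attributes to the constant average GSSU size.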

Journal of Cheminformatics | 2015

Improving protein-ligand binding site prediction accuracy by classification of inner pocket points using local features

Radoslav Krivák; David Hoksza

Background: Protein-ligand binding site prediction from a 3D protein structure plays a pivotal role in rational drug design and can be helpful in drug side-effects prediction or elucidation of protein function. Embedded within the binding site detection problem is the problem of pocket ranking: how to score and sort candidate pockets so that the best scored predictions correspond to true ligand binding sites. Although there exist multiple pocket detection algorithms, they mostly employ a fairly simple ranking function leading to sub-optimal prediction results. Results: We have developed a new pocket scoring approach (named PRANK) that prioritizes putative pockets according to their probability of binding a ligand. The method first carefully selects pocket points and labels them by physico-chemical characteristics of their local neighborhood. A Random Forests classifier is subsequently applied to assign a ligandability score to each of the selected pocket points. The ligandability scores are finally merged into the resulting pocket score to be used for prioritization of the putative pockets. Using multiple datasets, the experimental results demonstrate that the application of our method as a post-processing step greatly increases the quality of the prediction of Fpocket and ConCavity, two state-of-the-art protein-ligand binding site prediction algorithms. Conclusions: The positive experimental results show that our method can be used to improve the success rate, validity and applicability of existing protein-ligand binding site prediction tools. The method was implemented as a stand-alone program that currently contains support for Fpocket and ConCavity out of the box, but is easily extendible to support other tools. PRANK is made freely available at http://siret.ms.mff.cuni.cz/prank.
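The final re-ranking step described above, merging per-point ligandability scores into one pocket score and sorting pockets by it, can be sketched as follows. This is only an illustration: the abstract does not state the merging function, so the sum of squared point scores used here is an assumption, and the names `pocket_score`, `rerank`, and the example pockets are hypothetical. The per-point scores would come from the trained classifier and are simply hard-coded here.

```python
def pocket_score(point_scores):
    """Merge per-point ligandability scores into one pocket score.
    Assumption: sum of squares, which rewards confident points."""
    return sum(s * s for s in point_scores)


def rerank(pockets):
    """Return pocket names sorted from highest to lowest merged score.
    `pockets` maps a pocket name to its list of point scores."""
    return sorted(pockets, key=lambda name: pocket_score(pockets[name]),
                  reverse=True)


# Hypothetical pockets in the order a detection tool reported them.
pockets = {
    "pocket1": [0.2, 0.3, 0.1],  # many points, all weakly ligandable
    "pocket2": [0.9, 0.8],       # few points, strongly ligandable
    "pocket3": [0.5],
}

print(rerank(pockets))  # ['pocket2', 'pocket3', 'pocket1']
```

With these made-up scores, the pocket originally listed first drops to last place, which is the kind of reordering a post-processing step of this sort is meant to produce.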

Proteome Science | 2011

SProt: sphere-based protein structure similarity algorithm

Jakub Galgonek; David Hoksza; Tomáš Skopal

Background: Similarity search in protein databases is one of the most essential issues in computational proteomics. With the growing number of experimentally resolved protein structures, the focus shifted from sequences to structures. The area of structure similarity poses a major challenge, since not even a standard definition of optimal structure similarity exists in the field. Results: We propose a protein structure similarity measure called SProt. SProt concentrates on high-quality modeling of local similarity in the process of feature extraction. SProt’s features are based on the spherical spatial neighborhood of amino acids, where similarity can be well-defined. On top of the partial local similarities, a global measure assessing the similarity of a pair of protein structures is built. Finally, indexing is applied, making the search process an order of magnitude faster. Conclusions: The proposed method outperforms other methods in classification accuracy on SCOP superfamily and fold level, while it is at least comparable to the best existing solutions in terms of precision-recall or quality of alignment.

Database and Expert Systems Applications | 2015

Using Neo4j for Mining Protein Graphs: A Case Study

David Hoksza; Jan Jelínek

Graph databases are becoming increasingly popular in domains where data can be modeled as a set of connected objects. Graph databases make it possible to query such data using graph-based queries in a relatively simple manner in comparison to classical relational databases. In this paper, we show how one of the most popular graph databases, Neo4j, can be applied to the bioinformatics problem of protein-protein interface (PPI) identification. The goal of the PPI identification task is, given a protein structure, to identify amino acids which are responsible for binding of the structure to other proteins. Each protein structure consists of a set of amino acid molecules which can be conceived as a graph, and a multitude of methods for the analysis of such protein graphs have been established. We introduce here a knowledge-based approach which can enhance the quality of these methods by utilizing existing protein structure knowledge stored in the Protein Data Bank (PDB). We show how to transform information about protein complexes from PDB into Neo4j, where they can be stored as a set of independent protein graphs. The resulting graph database contains about 14 million labeled nodes and 38 million edges. In the PPI identification phase, this database is queried using exact subgraph matching and the results are aggregated to improve an existing PPI identification method. We show the pros and cons of using Neo4j for such an endeavor with respect to the size of the database and complexity of the queries in comparison to using a relational database (Microsoft SQL Server). We conclude that using Neo4j is a viable option for specific, rather small, subgraph query types. However, we have encountered performance limitations, especially for larger query graphs in terms of number of edges.
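To make the notion of exact subgraph matching on a labeled protein graph concrete, here is a minimal backtracking sketch in plain Python. It does not use Neo4j and is not the query formulation from the paper; the function name `find_matches`, the dictionary-based graph representation, and the tiny example graph are all assumptions made for illustration.

```python
def find_matches(query_nodes, query_edges, graph_nodes, graph_edges):
    """Enumerate mappings of query nodes onto distinct graph nodes such
    that labels agree and every query edge is present in the graph.
    Nodes: dict id -> label. Edges: set of frozenset({id_a, id_b})."""
    q_ids = list(query_nodes)
    matches = []

    def extend(mapping):
        if len(mapping) == len(q_ids):
            matches.append(dict(mapping))
            return
        q = q_ids[len(mapping)]  # next query node to place
        for g, label in graph_nodes.items():
            if label != query_nodes[q] or g in mapping.values():
                continue
            # Every query edge to an already placed node must exist.
            if all(frozenset((g, mapping[p])) in graph_edges
                   for p in mapping
                   if frozenset((q, p)) in query_edges):
                mapping[q] = g
                extend(mapping)
                del mapping[q]

    extend({})
    return matches


# Hypothetical protein graph: residues as labeled nodes, contacts as edges.
graph_nodes = {1: "ALA", 2: "GLY", 3: "ALA", 4: "SER"}
graph_edges = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4)]}

# Query: an ALA residue in contact with a GLY residue.
query_nodes = {"a": "ALA", "b": "GLY"}
query_edges = {frozenset(("a", "b"))}

print(find_matches(query_nodes, query_edges, graph_nodes, graph_edges))
# [{'a': 1, 'b': 2}, {'a': 3, 'b': 2}]
```

The search space of such backtracking grows quickly with the number of query nodes and edges, which is in line with the performance limitations the abstract reports for larger query graphs.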

BMC Bioinformatics | 2015

MultiSETTER: web server for multiple RNA structure comparison

Petr Čech; David Hoksza; Daniel Svozil

Background: Understanding the architecture and function of RNA molecules requires methods for comparing and analyzing their tertiary and quaternary structures. While structural superposition of short RNAs is achievable in a reasonable time, large structures represent a much bigger challenge. Therefore, we have developed a fast and accurate algorithm for RNA pairwise structure superposition called SETTER and implemented it in the SETTER web server. However, though biological relationships can be inferred by a pairwise structure alignment, key features preserved by evolution can be identified only from a multiple structure alignment. Thus, we extended the SETTER algorithm to the alignment of multiple RNA structures and developed the MultiSETTER algorithm. Results: In this paper, we present the updated version of the SETTER web server that implements a user-friendly interface to the MultiSETTER algorithm. The server accepts RNA structures either as a list of PDB IDs or as user-defined PDB files. After the superposition is computed, structures are visualized in 3D and several reports and statistics are generated. Conclusion: To the best of our knowledge, the MultiSETTER web server is the first publicly available tool for multiple RNA structure alignment. The MultiSETTER server offers the visual inspection of an alignment in 3D space, which may reveal structural and functional relationships not captured by other multiple alignment methods based either on a sequence or on secondary structure motifs.

Computational Intelligence in Bioinformatics and Computational Biology | 2009

DDPIn - Distance and density based protein indexing

David Hoksza

Protein structure similarity and classification methods have many applications in protein function prediction and associated fields (e.g. drug discovery). In this paper, we propose a new protein structure representation method enabling fast and accurate classification. In our approach, each protein structure is represented by a number of vectors (based on histograms of distances) equal to the number of its Cα residues. Each Cα residue represents a viewpoint from which the distances to each of the other residues are computed. Subsequently, we use several methods to convert these distances into an n-dimensional feature vector, which is indexed using a metric indexing structure (M-tree is the structure of our choice). While searching, we use a single- or multi-step approach, which provides us with classification accuracy and speed comparable to the best contemporary classification methods.
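The viewpoint idea in this abstract, one distance histogram per Cα position, can be sketched in a few lines. This is a simplified illustration rather than the DDPIn feature extraction itself: the function name `distance_histograms`, the bin width of 4 Å, the number of bins, and the toy coordinates are assumptions, and the metric indexing step (M-tree) is not shown.

```python
import math


def distance_histograms(coords, bin_width=4.0, n_bins=8):
    """Build one distance histogram per residue (viewpoint).
    `coords` is a list of (x, y, z) Calpha positions; distances beyond
    the last bin boundary are counted in the last bin (assumption)."""
    histograms = []
    for i, origin in enumerate(coords):
        hist = [0] * n_bins
        for j, other in enumerate(coords):
            if i == j:
                continue  # a viewpoint does not measure itself
            d = math.dist(origin, other)
            hist[min(int(d // bin_width), n_bins - 1)] += 1
        histograms.append(hist)
    return histograms


# Three hypothetical Calpha atoms placed on a line, 3.8 apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
for hist in distance_histograms(coords):
    print(hist)
```

Each structure thus yields as many fixed-length vectors as it has residues, and those vectors are what a metric index could store and compare.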

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2015

Multiple 3D RNA structure superposition using neighbor joining

David Hoksza; Daniel Svozil

Recent advances in RNA research and the steady growth of available RNA structures call for bioinformatics methods for handling and analyzing RNA structural data. Recently, we introduced SETTER, a fast and accurate method for RNA pairwise structure alignment. In this paper, we describe MultiSETTER, an extension of SETTER for multiple RNA structure alignment. MultiSETTER combines SETTER's decomposition of RNA structures into non-overlapping structural subunits with the multiple sequence alignment algorithm ClustalW adapted for the structure alignment. The accuracy of MultiSETTER was assessed by the automatic classification of RNA structures and its comparison to SCOR annotations. In addition, MultiSETTER classification was also compared to multiple sequence alignment-based and secondary structure alignment-based classifications provided by LocARNA and RNADistance tools, respectively. MultiSETTER precompiled Windows libraries, as well as the C++ source code, are freely available from http://siret.cz/multisetter.

Computational Intelligence in Bioinformatics and Computational Biology | 2009

An application of the metric access methods to the mass spectrometry data

Jiri Novak; David Hoksza

Mass spectrometry is currently a very popular method for protein and peptide identification. The amount of data generated in this way grows exponentially every year. Although algorithms for interpreting mass spectra exist, demand for faster and more accurate approaches remains.