Publications


Featured research published by Witold Dyrka.


BMC Bioinformatics | 2009

A stochastic context free grammar based framework for analysis of protein sequences

Witold Dyrka; Jean-Christophe Nebel

Background: In the last decade, there have been many applications of formal language theory in bioinformatics, such as RNA structure prediction and detection of patterns in DNA. However, in the field of proteomics, the size of the protein alphabet and the complexity of relationships between amino acids have mainly limited the application of formal language theory to the production of grammars whose expressive power is not higher than that of stochastic regular grammars. These grammars, like other state-of-the-art methods, cannot cover higher-order dependencies such as the nested and crossing relationships that are common in proteins. In order to overcome some of these limitations, we propose a Stochastic Context Free Grammar based framework for the analysis of protein sequences where grammars are induced using a genetic algorithm.

Results: This framework was implemented in a system aiming at the production of binding site descriptors. These descriptors not only allow detection of protein regions that are involved in these sites, but also provide insight into their structure. Grammars were induced using quantitative properties of amino acids to deal with the size of the protein alphabet. Moreover, we imposed some structural constraints on grammars to reduce the extent of the rule search space. Finally, grammars based on different properties were combined to convey as much information as possible. Evaluation was performed on sites of various sizes and complexity described either by PROSITE patterns, domain profiles or a set of patterns. Results show the produced binding site descriptors are human-readable and, hence, highlight biologically meaningful features. Moreover, they achieve good accuracy in both annotation and detection. In addition, findings suggest that, unlike current state-of-the-art methods, our system may be particularly suited to deal with patterns shared by non-homologous proteins.

Conclusion: A new Stochastic Context Free Grammar based framework has been introduced allowing the production of binding site descriptors for analysis of protein sequences. Experiments have shown that not only is this new approach valid, but it also produces human-readable descriptors for binding sites which have been beyond the capability of current machine learning techniques.
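
As an illustration of how a stochastic context-free grammar scores a sequence over a reduced, property-based alphabet, the following is a minimal sketch of the inside algorithm for a grammar in Chomsky normal form. The grammar, its probabilities, the two-class hydrophobic/polar reduction and all function names are hypothetical and are not taken from the paper; the genetic algorithm used for grammar induction is not shown.

```python
from collections import defaultdict

# Hypothetical property-based alphabet reduction (an assumption, not the paper's):
# 'h' = hydrophobic residue, 'p' = any other residue.
HYDROPHOBIC = set("AVLIMFWC")

# Hypothetical toy grammar in Chomsky normal form. It generates nested
# (palindrome-like) patterns h^n p p h^n, which a regular grammar cannot describe.
BINARY_RULES = {            # (A, B, C): P(A -> B C)
    ("S", "H", "T"): 0.6,
    ("S", "P", "P"): 0.4,
    ("T", "S", "H"): 1.0,
}
LEXICAL_RULES = {           # (A, a): P(A -> a)
    ("H", "h"): 1.0,
    ("P", "p"): 1.0,
}


def reduce_alphabet(sequence):
    """Map each amino acid to its (hypothetical) property class."""
    return "".join("h" if aa in HYDROPHOBIC else "p" for aa in sequence)


def inside_probability(symbols, start="S"):
    """Probability that `start` derives `symbols` (inside algorithm, CYK-style)."""
    n = len(symbols)
    if n == 0:
        return 0.0
    # chart[i][j][A] = probability that non-terminal A derives symbols[i..j]
    chart = [[defaultdict(float) for _ in range(n)] for _ in range(n)]
    for i, sym in enumerate(symbols):
        for (lhs, terminal), prob in LEXICAL_RULES.items():
            if terminal == sym:
                chart[i][i][lhs] += prob
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point: left = i..k, right = k+1..j
                for (lhs, left, right), prob in BINARY_RULES.items():
                    p_left = chart[i][k].get(left, 0.0)
                    p_right = chart[k + 1][j].get(right, 0.0)
                    if p_left and p_right:
                        chart[i][j][lhs] += prob * p_left * p_right
    return chart[0][n - 1].get(start, 0.0)


if __name__ == "__main__":
    reduced = reduce_alphabet("LSTV")  # -> "hpph"
    print(reduced, inside_probability(reduced))
```

A sequence whose reduced form is not nested in this way, such as "hpp", receives probability zero under this toy grammar, which is the kind of dependency the abstract says regular grammars cannot capture.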


Journal of Computational Chemistry | 2008

Ion flux through membrane channels—An enhanced algorithm for the Poisson-Nernst-Planck model

Witold Dyrka; Andy T. Augousti; Malgorzata Kotulska

A novel algorithmic scheme for numerical solution of the 3D Poisson‐Nernst‐Planck model is proposed. The algorithmic improvements are universal and independent of the detailed physical model. They include three major steps: an adjustable gradient‐based step value, an adjustable relaxation coefficient, and an optimized segmentation of the modeled space. The enhanced algorithm significantly speeds up computation and reduces the computational demands. The theoretical model was tested on a regular artificial channel and validated on a real protein channel, α‐hemolysin, proving its efficiency.


Proteins | 2013

Optimization of 3D Poisson–Nernst–Planck model for fast evaluation of diverse protein channels

Witold Dyrka; Maciej M. Bartuzel; Malgorzata Kotulska

We show the accuracy and applicability of our fast algorithmic implementation of a three‐dimensional Poisson–Nernst–Planck (3D‐PNP) flow model for characterizing different protein channels. Due to its high computational efficiency, our model can predict the full current‐voltage characteristics of a channel within minutes, based on the experimental 3D structure of the channel or its computational model structure. Compared with other methods, such as Brownian dynamics, which currently needs a few weeks of computational time, or the even more demanding molecular dynamics modeling, 3D‐PNP is the only available method for a function‐based evaluation of very numerous tentative structural channel models. Flow model tests of our algorithm and its optimal parametrization are provided for five native channels whose experimental structures are available in the Protein Data Bank (PDB) in an open conductive state, and whose experimental current‐voltage characteristics have been published. The channels represent very different geometric and structural properties, which makes this the widest test to date of the accuracy of 3D‐PNP on real channels. We test whether the channel conductance, rectification, and charge selectivity obtained from the flow model could be sufficiently sensitive to single‐point mutations associated with insignificant changes in the channel structure. Our results show that the classical 3D‐PNP model, under proper parametrization, is able to achieve a qualitative agreement with experimental data for a majority of the tested characteristics and channels, including channels with narrow and irregular conductivity pores. We propose that although the standard PNP model cannot provide insight into complex physical phenomena due to its intrinsic limitations, semiquantitative agreement is achievable for rectification and selectivity at a level sufficient for the bioinformatical purpose of selecting the best structural models, with the great advantage of a very short computational time. Proteins 2013; 81:1802–1822.


ICCCI (SCI Volume) | 2009

Accuracy in Predicting Secondary Structure of Ionic Channels

Bogumil M. Konopka; Witold Dyrka; Jean-Christophe Nebel; Malgorzata Kotulska

Ionic channels are among the most difficult proteins for experimental structure determination, and very few of them have been resolved. Bioinformatical tools have not been tested for this specific protein group. In the paper, the prediction quality of ionic channel secondary structure is evaluated. The tests were carried out with general protein predictors and with predictors dedicated to transmembrane segments. Predictor performance was measured by the per-residue accuracy Q and the per-segment measure SOV. The evaluation comparing ionic channels and other transmembrane proteins shows that ionic channels are only slightly more difficult objects for modeling than transmembrane proteins; the modeling quality is comparable with a general set of all proteins. Prediction quality showed dependence on the ratio of secondary structures in the ionic channel. Surprisingly, the general-purpose PSIPRED predictor outperformed not only the other general predictors but also the dedicated transmembrane predictors under evaluation.
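
The per-residue accuracy Q mentioned above is simply the fraction of residues whose predicted secondary-structure state matches the observed one. A minimal sketch follows, assuming three-state strings (H = helix, E = strand, C = coil); the function name and example strings are hypothetical, and the segment-based SOV measure is more involved and not shown.

```python
def q_accuracy(predicted, observed):
    """Fraction of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    observed = "HHHHCCEEEE"   # made-up reference assignment
    predicted = "HHHCCCEEEC"  # made-up prediction with two wrong residues
    print(q_accuracy(predicted, observed))
```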


BMC Systems Biology | 2007

A probabilistic context-free grammar for the detection of binding sites from a protein sequence

Witold Dyrka; Jean-Christophe Nebel

Introduction: The analysis of a protein, through the evaluation of interactions between the amino acids composing its sequence, is a very challenging problem where pattern recognition techniques based on Hidden Markov Models (HMMs) have proved to be the most efficient [1]. Although the HMM is a powerful technique, it has limitations. According to formal language theory, its expressive power is similar to that of probabilistic regular grammars. A more powerful grammar, Context-Free Grammar (CFG), has been applied successfully for the recognition and prediction of RNA structure [1,2]. However, its utilisation in the field of protein pattern recognition is a more challenging task due to the larger set of terminals and less straightforward relations between residues. In this piece of work, we propose a Probabilistic Context-Free Grammar (PCFG) to represent features of protein structures. In order to deal with the size of the protein alphabet, we use quantitative properties of amino acids to reduce the number of symbols. Based on that grammar, we designed a tool allowing the detection of protein regions which are involved in binding sites. The PCFG is evolved using a genetic algorithm (GA) to describe a pattern shared by a set of proteins.


Proteins | 2016

Fast assessment of structural models of ion channels based on their predicted current-voltage characteristics

Witold Dyrka; Monika Kurczynska; Bogumil M. Konopka; Malgorzata Kotulska

Computational prediction of protein structures is a difficult task, which involves fast and accurate evaluation of candidate model structures. We propose to enhance single‐model quality assessment with a functionality evaluation phase for proteins whose quantitative functional characteristics are known. In particular, this idea can be applied to evaluation of structural models of ion channels, whose main function, conducting ions, can be quantitatively measured with the patch‐clamp technique providing the current–voltage characteristics. The study was performed on a set of KcsA channel models obtained from complete and incomplete contact maps. A fast continuous electrodiffusion model was used for calculating the current–voltage characteristics of structural models. We found that the computed charge selectivity and total current were sensitive to the structural and electrostatic quality of models. In practical terms, we show that evaluating predicted conductance values is an appropriate method to eliminate models with an occluded pore or with multiple erroneously created pores. Moreover, filtering models on the basis of their predicted charge selectivity results in a substantial enrichment of the candidate set in highly accurate models. Tests on three other ion channels indicate that, in addition to being a proof of concept, our function‐oriented single‐model quality assessment method can be directly applied to evaluation of structural models of some classes of protein channels. Finally, our work raises the important question of whether a computational validation of functionality should be included in the evaluation process of structural models, whenever possible. Proteins 2016; 84:217–231.


Algorithms for Molecular Biology | 2013

Probabilistic grammatical model for helix‐helix contact site classification

Witold Dyrka; Jean-Christophe Nebel; Malgorzata Kotulska

Background: Hidden Markov Models power many state‐of‐the‐art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not directly convey information on medium‐ and long‐range residue‐residue interactions. This requires an expressive power of at least context‐free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited.

Results: In this work, we present a probabilistic grammatical framework for problem‐specific protein languages and apply it to classification of transmembrane helix‐helix pair configurations. The core of the model consists of a probabilistic context‐free grammar, automatically inferred by a genetic algorithm from only a generic set of expert‐based rules and positive training samples. The model was applied to produce sequence‐based descriptors of four classes of transmembrane helix‐helix contact site configurations. The highest performance of the classifiers reached an AUCROC of 0.70. The analysis of grammar parse trees revealed the ability to represent structural features of helix‐helix contact sites.

Conclusions: We demonstrated that our probabilistic context‐free framework for analysis of protein sequences outperforms the state of the art in the task of helix‐helix contact site classification. However, this is achieved without necessarily requiring modeling of long‐range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human‐readable. Thus they could provide biologically meaningful information for molecular biologists.


Pattern Recognition in Bioinformatics | 2010

Towards 3D modeling of interacting TM helix pairs based on classification of helix pair sequence

Witold Dyrka; Jean-Christophe Nebel; Malgorzata Kotulska

Spatial structures of transmembrane proteins are difficult to obtain either experimentally or by computational methods. Recognition of helix-helix contact conformations, which provide the structural skeleton of many transmembrane proteins, is essential in the modeling. The majority of helix-helix interactions in transmembrane proteins can be accurately clustered into a few classes on the basis of their 3D shape. We propose a Stochastic Context-Free Grammar framework, combined with an evolutionary algorithm, to represent sequence-level features of these classes. The descriptors were tested using independent test sets and typically achieved areas under the ROC curve of 0.60-0.70; some reached 0.77.
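
For readers unfamiliar with the area under the ROC curve reported above, it equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the scores are made up for illustration and are not results from the paper.

```python
def auc_roc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney statistic)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0   # positive correctly ranked above negative
            elif pos == neg:
                wins += 0.5   # ties count as half
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Hypothetical descriptor scores for sequences inside and outside a class.
    in_class = [0.9, 0.6, 0.4]
    out_of_class = [0.7, 0.3, 0.2]
    print(round(auc_roc(in_class, out_of_class), 2))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 0.60-0.77 range in context.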


BMC Bioinformatics | 2017

Quantiprot - a Python package for quantitative analysis of protein sequences

Bogumil M. Konopka; Marta Marciniak; Witold Dyrka

Background: The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted.

Results: Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams and computes the Zipf’s law coefficient.

Conclusions: We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
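
To make two of the measures named above concrete, here is a minimal plain-Python sketch of an n-gram distribution and of a Zipf's law coefficient estimated as the least-squares slope of log frequency against log rank. It deliberately does not use Quantiprot: the function names are hypothetical, and the package's own interface and exact definition of the coefficient may differ.

```python
import math
from collections import Counter


def ngram_counts(sequence, n=2):
    """Count overlapping n-grams in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def zipf_slope(counts):
    """Least-squares slope of log(frequency) versus log(rank).

    A slope near -1 corresponds to the classic Zipf distribution.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


if __name__ == "__main__":
    sequence = "MKVLAAGIVGLLLAAGLVKM"  # made-up sequence for illustration
    bigrams = ngram_counts(sequence, n=2)
    print(bigrams.most_common(3))
    print(zipf_slope(bigrams))
```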


Archive | 2013

Probabilistic grammatical model of protein language and its application to helix-helix contact site classification

Witold Dyrka; Jean-Christophe Nebel; Malgorzata Kotulska

Collaboration


Dive into Witold Dyrka's collaborations.

Top Co-Authors

Malgorzata Kotulska

University of Science and Technology


Bogumil M. Konopka

Wrocław University of Technology


Monika Kurczynska

Wrocław University of Technology


Maciej M. Bartuzel

University of Science and Technology