Ondřej Kuželka
Czech Technical University in Prague
Publication
Featured research published by Ondřej Kuželka.
Machine Learning | 2011
Ondřej Kuželka; Filip Železný
We describe an algorithm for constructing a set of tree-like conjunctive relational features by combining smaller conjunctive blocks. Unlike traditional level-wise approaches, which preserve the monotonicity of frequency, our block-wise approach preserves monotonicity of feature reducibility and redundancy, which are important in propositionalization employed in the context of classification learning. With pruning based on these properties, our block-wise approach efficiently scales to features including tens of first-order atoms, far beyond the reach of state-of-the-art propositionalization or inductive logic programming systems.
BMC Bioinformatics | 2012
Andrea Szabóová; Ondřej Kuželka; Filip Železný; Jakub Tolar
We contribute a novel, ball-histogram approach to DNA-binding propensity prediction of proteins. Unlike state-of-the-art methods based on constructing an ad-hoc set of features describing physicochemical properties of the proteins, the ball-histogram technique enables a systematic, Monte-Carlo exploration of the spatial distribution of amino acids complying with automatically selected properties. This exploration yields a model for the prediction of DNA binding propensity. We validate our method in prediction experiments, improving on state-of-the-art accuracies. Moreover, our method also provides interpretable features involving spatial distributions of selected amino acids.
International Conference on Machine Learning | 2009
Ondřej Kuželka; Filip Železný
We describe an algorithm for constructing a set of acyclic conjunctive relational features by combining smaller conjunctive blocks. Unlike traditional level-wise approaches, which preserve the monotonicity of frequency, our block-wise approach preserves a form of monotonicity of the irreducibility and relevancy feature properties, which are important in propositionalization employed in the context of classification learning. With pruning based on these properties, our block-wise approach efficiently scales to features including tens of first-order literals, far beyond the reach of state-of-the-art propositionalization or inductive logic programming systems.
European Conference on Machine Learning | 2011
Ondřej Kuželka; Andrea Szabóová; Matěj Holec; Filip Železný
We describe a statistical relational learning framework called Gaussian Logic, capable of working efficiently with combinations of relational and numerical data. The framework assumes that, for a fixed relational structure, the numerical data can be modelled by a multivariate normal distribution. We demonstrate how the Gaussian Logic framework can be applied to predictive classification problems. In experiments, we first show an application of the framework to the prediction of DNA-binding propensity of proteins. Next, we show how the Gaussian Logic framework can be used to find motifs describing highly correlated gene groups in gene-expression data, which are then used in a set-level-based classification method.
Proteome Science | 2012
Andrea Szabóová; Ondřej Kuželka; Filip Železný; Jakub Tolar
Background: The process of protein-DNA binding has an essential role in the biological processing of genetic information. We use relational machine learning to predict DNA-binding propensity of proteins from their structures. Automatically discovered structural features are able to capture some characteristic spatial configurations of amino acids in proteins. Results: Prediction based only on structural relational features already achieves results competitive with existing methods based on physicochemical properties on several protein datasets. Predictive performance is further improved when structural features are combined with physicochemical features. Moreover, the structural features provide some insights not revealed by physicochemical features. Our method is able to detect common spatial substructures. We demonstrate this in experiments with zinc finger proteins. Conclusions: We introduced a novel approach for DNA-binding propensity prediction using relational machine learning, which could potentially also be used for protein function prediction in general.
Inductive Logic Programming | 2012
Ondřej Kuželka; Andrea Szabóová; Filip Železný
We study a generalization of Plotkin’s least general generalization. We introduce a novel concept called bounded least general generalization w.r.t. a set of clauses and show an instance of it for which polynomial-time reduction procedures exist. We demonstrate the practical utility of our approach in experiments on several relational learning datasets.
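For orientation, the classical notion that this abstract generalizes can be sketched in a few lines. The code below computes Plotkin's least general generalization of two clauses by anti-unification. It is a minimal illustration, not the bounded variant introduced in the paper, and it rests on assumptions made only for this sketch: literals and compound terms are tuples `(symbol, arg1, ...)`, constants are strings, literal signs are ignored, and the names `lgg_term` and `lgg_clause` are hypothetical.

```python
from itertools import product


def lgg_term(s, t, table):
    """Anti-unify two terms (or two literals with the same symbol and arity)."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # Same symbol and arity: generalize argument by argument.
        return (s[0],) + tuple(lgg_term(a, b, table)
                               for a, b in zip(s[1:], t[1:]))
    # Differing terms: the same pair is always mapped to the same variable.
    if (s, t) not in table:
        table[(s, t)] = f"V{len(table)}"
    return table[(s, t)]


def lgg_clause(c, d):
    """Unreduced lgg of two clauses given as lists of literals."""
    table = {}
    result = []
    for l1, l2 in product(c, d):
        # Only literals with the same predicate and arity are compatible.
        if l1[0] == l2[0] and len(l1) == len(l2):
            g = lgg_term(l1, l2, table)
            if g not in result:
                result.append(g)
    return result


c = [("parent", "ann", "bob"), ("female", "ann")]
d = [("parent", "eve", "tom"), ("female", "eve")]
print(lgg_clause(c, d))
```

The unreduced result can contain up to |c|·|d| literals, which is why reduction procedures matter in practice.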
Inductive Logic Programming | 2010
Ondřej Kuželka; Filip Železný
We study reducibility of examples in several typical inductive logic programming benchmarks. The notion of reducibility that we use is related to theta-reduction, commonly used to reduce hypotheses in ILP. Whereas examples are usually not reducible on their own, they often become implicitly reducible when the language for constructing hypotheses is fixed. We show that the number of ground facts in a dataset can be almost halved for some real-world molecular datasets. Furthermore, we study the impact this has on the popular ILP system Aleph.
International Conference on Machine Learning | 2008
Ondřej Kuželka; Filip Železný
In inductive logic programming, θ-subsumption is a widely used coverage test. Unfortunately, testing θ-subsumption is NP-complete, which represents a crucial efficiency bottleneck for many relational learners. In this paper, we present a probabilistic estimator of clause coverage, based on a randomized restarted search strategy. Under a distribution assumption, our algorithm can estimate clause coverage without having to decide subsumption for all examples. We implement this algorithm in a program called ReCovEr. On generated graph data and real-world datasets, we show that ReCovEr provides reasonably accurate estimates while achieving dramatic runtime improvements compared to a state-of-the-art algorithm.
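To make the coverage test concrete: a clause C θ-subsumes a clause D if some substitution θ maps every literal of C onto a literal of D. The sketch below decides this by plain backtracking. It is not the ReCovEr estimator and has no randomization or restarts. Its conventions are assumptions made for illustration only: literals are tuples `(predicate, arg1, ...)`, terms in C that start with an uppercase letter are variables, all terms of D are treated as constants, and literal signs are ignored.

```python
def is_variable(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def match_literal(lit, target, theta):
    """Extend theta so that lit maps onto target, or return None."""
    if lit[0] != target[0] or len(lit) != len(target):
        return None
    new_theta = dict(theta)
    for s, t in zip(lit[1:], target[1:]):
        if is_variable(s):
            # Bind the variable, or check consistency with an earlier binding.
            if new_theta.setdefault(s, t) != t:
                return None
        elif s != t:
            return None
    return new_theta


def theta_subsumes(c, d, theta=None):
    """True if some substitution maps every literal of c onto a literal of d."""
    theta = {} if theta is None else theta
    if not c:
        return True
    first, rest = c[0], c[1:]
    for target in d:
        extended = match_literal(first, target, theta)
        if extended is not None and theta_subsumes(rest, d, extended):
            return True
    return False


path = [("edge", "X", "Y"), ("edge", "Y", "Z")]
triangle = path + [("edge", "Z", "X")]
example = [("edge", "a", "b"), ("edge", "b", "c")]
print(theta_subsumes(path, example), theta_subsumes(triangle, example))
```

In the worst case the search tries exponentially many assignments, which illustrates the bottleneck the abstract refers to.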
Journal of Intelligent Information Systems | 2014
Ondřej Kuželka; Andrea Szabóová; Filip Železný
Feature selection methods often improve the performance of attribute-value learning. We explore whether, in relational learning as well, examples in the form of clauses can be reduced in size to speed up learning without affecting the learned hypothesis. To this end, we introduce the notion of safe reduction: a safely reduced example cannot be distinguished from the original example under the given hypothesis language bias. Next, we consider the particular, rather permissive bias of bounded treewidth clauses. We show that under this hypothesis bias, examples of arbitrary treewidth can be reduced efficiently. We evaluate our approach on four data sets with the popular system Aleph and the state-of-the-art relational learner nFOIL. On all four data sets we make learning faster in the case of nFOIL, achieving an order-of-magnitude speedup on one of the data sets, and more accurate in the case of Aleph.
NFMCP'12: Proceedings of the First International Conference on New Frontiers in Mining Complex Patterns | 2012
Ondřej Kuželka; Andrea Szabóová; Filip Železný
Feature selection methods often improve the performance of attribute-value learning. We explore whether, in relational learning as well, examples in the form of clauses can be reduced in size to speed up learning without affecting the learned hypothesis. To this end, we introduce the notion of safe reduction: a safely reduced example cannot be distinguished from the original example under the given hypothesis language bias. Next, we consider the particular, rather permissive bias of bounded treewidth clauses. We show that under this hypothesis bias, examples of arbitrary treewidth can be reduced efficiently. The bounded treewidth bias can be replaced by other assumptions such as acyclicity with similar benefits. We evaluate our approach on four data sets with the popular system Aleph and the state-of-the-art relational learner nFOIL. On all four data sets we make learning faster for nFOIL, achieving an order-of-magnitude speedup on one of the data sets, and more accurate for Aleph.