Tias Guns
Katholieke Universiteit Leuven
Publications
Featured research published by Tias Guns.
knowledge discovery and data mining | 2008
Luc De Raedt; Tias Guns; Siegfried Nijssen
The relationship between constraint-based mining and constraint programming is explored by showing how the typical constraints used in pattern mining can be formulated for use in constraint programming environments. The resulting framework is surprisingly flexible and allows us to combine a wide range of mining constraints in different ways. We implement this approach in off-the-shelf constraint programming systems and evaluate it empirically. The results show that the approach is not only very expressive, but also works well on complex benchmark problems.
Artificial Intelligence | 2011
Tias Guns; Siegfried Nijssen; Luc De Raedt
The field of data mining has become accustomed to specifying constraints on patterns of interest. A large number of systems and techniques has been developed for solving such constraint-based mining problems, especially for mining itemsets. The approach taken in the field of data mining contrasts with the constraint programming principles developed within the artificial intelligence community. While most data mining research focuses on algorithmic issues and aims at developing highly optimized and scalable implementations that are tailored towards specific tasks, constraint programming employs a more declarative approach. The emphasis lies on developing high-level modeling languages and general solvers that specify what the problem is, rather than outlining how a solution should be computed, yet are powerful enough to be used across a wide variety of applications and application domains. This paper contributes a declarative constraint programming approach to data mining. More specifically, we show that it is possible to employ off-the-shelf constraint programming techniques for modeling and solving a wide variety of constraint-based itemset mining tasks, such as frequent, closed, discriminative, and cost-based itemset mining. In particular, we develop a basic constraint programming model for specifying frequent itemsets and show that this model can easily be extended to realize the other settings. This contrasts with typical procedural data mining systems where the underlying procedures need to be modified in order to accommodate new types of constraint, or novel combinations thereof. 
Even though state-of-the-art data mining systems outperform the constraint programming approach on some standard tasks, we also show that there exist problems where the constraint programming approach leads to significant performance improvements over state-of-the-art methods in data mining, as well as to new insights into the underlying data mining problems. Many such insights can be obtained by relating the underlying search algorithms of data mining and constraint programming systems to one another. We discuss a number of interesting new research questions and challenges raised by the declarative constraint programming approach to data mining.
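As an illustration of the declarative view described in this abstract, the following minimal Python sketch states frequent itemset mining as a conjunction of a coverage constraint and a frequency constraint, and obtains closed itemset mining by adding one further constraint rather than changing the search procedure. It is a brute-force enumeration, not a constraint programming solver and not the authors' model; the names `frequent_itemsets` and `is_closed` are hypothetical, and `min_support` is assumed to be at least 1.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, extra_constraints=()):
    """Enumerate itemsets satisfying the coverage, frequency and extra constraints."""
    items = sorted(set().union(*transactions))
    found = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            # Coverage constraint: a transaction is covered iff it contains every item.
            cover = [t for t in transactions if itemset <= t]
            # Frequency constraint: the cover must contain at least min_support transactions.
            if len(cover) < min_support:
                continue
            # Any further constraint is simply added to the conjunction.
            if all(constraint(itemset, cover) for constraint in extra_constraints):
                found.append((itemset, len(cover)))
    return found

def is_closed(itemset, cover):
    # Closedness: the itemset equals the intersection of the transactions it covers.
    return itemset == frozenset.intersection(*cover)

# Toy database (illustrative data only).
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
frequent = frequent_itemsets(transactions, min_support=2)
closed = frequent_itemsets(transactions, min_support=2, extra_constraints=[is_closed])
```

The point of the sketch is the design choice: moving from frequent to closed itemsets changes only the list of constraints, which mirrors the contrast the abstract draws with procedural mining systems.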
IEEE Transactions on Knowledge and Data Engineering | 2013
Tias Guns; Siegfried Nijssen; L. De Raedt
We introduce the problem of k-pattern set mining, concerned with finding a set of k related patterns under constraints. This contrasts with regular pattern mining, where one searches for many individual patterns. The k-pattern set mining problem is a very general problem that can be instantiated to a wide variety of well-known mining tasks including concept-learning, rule-learning, redescription mining, conceptual clustering and tiling. To this end, we formulate a large number of constraints for use in k-pattern set mining, both at the local level, that is, on individual patterns, and at the global level, that is, on the overall pattern set. Building general solvers for the pattern set mining problem remains a challenge. Here, we investigate to what extent constraint programming (CP) can be used as a general solution strategy. We present a mapping of pattern set constraints to constraints currently available in CP. This allows us to investigate a large number of settings within a unified framework and to gain insight into the possibilities and limitations of these solvers. This is important as it allows us to create guidelines on how to model new problems successfully and how to model existing problems more efficiently. It also opens up the way for other solver technologies.
Nucleic Acids Research | 2012
Hong Sun; Tias Guns; Ana Carolina Fierro; Lieven Thorrez; Siegfried Nijssen; Kathleen Marchal
Computationally retrieving biologically relevant cis-regulatory modules (CRMs) is not straightforward. Because of the large number of candidates and the imperfection of the screening methods, many spurious CRMs are detected that are as high scoring as the biologically true ones. Using ChIP information not only reduces the regions in which the binding sites of the assayed transcription factor (TF) should be located, but also restricts the valid CRMs to those that contain the assayed TF (here referred to as applying CRM detection in a query-based mode). In this study, we show that exploiting ChIP information in a query-based way makes in silico CRM detection a much more feasible endeavor. To be able to handle the large datasets, the query-based setting and other specificities proper to CRM detection on ChIP-Seq based data, we developed a novel, powerful CRM detection method, ‘CPModule’. By applying it to a well-studied ChIP-Seq data set involved in self-renewal of mouse embryonic stem cells, we demonstrate how our tool can recover combinatorial regulation of five known TFs that are key in the self-renewal of mouse embryonic stem cells. Additionally, we make a number of new predictions on combinatorial regulation of these five key TFs with other TFs documented in TRANSFAC.
integration of ai and or techniques in constraint programming | 2015
Benjamin Negrevergne; Tias Guns
The goal of constraint-based sequence mining is to find sequences of symbols that are included in a large number of input sequences and that satisfy some constraints specified by the user. Many constraints have been proposed in the literature, but a general framework is still missing. We investigate the use of constraint programming as a general framework for this task.
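The task stated in this abstract can be sketched in a few lines of Python: a pattern is "included" in an input sequence if it is a subsequence of it, and a pattern is kept if enough input sequences include it and a user constraint holds. This is an illustrative enumeration, not a constraint programming formulation and not the authors' method; the function names and the choice of a maximum-length constraint as the example user constraint are assumptions.

```python
def is_subsequence(pattern, sequence):
    """True if the symbols of pattern occur in sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the matched symbol.
    return all(symbol in remaining for symbol in pattern)

def support(pattern, database):
    """Number of input sequences that include the pattern."""
    return sum(is_subsequence(pattern, sequence) for sequence in database)

def frequent_sequences(database, min_support, max_length):
    """Patterns with support >= min_support and length <= max_length (the user constraint)."""
    alphabet = sorted({symbol for sequence in database for symbol in sequence})
    results = {}
    frontier = [()]
    while frontier:
        prefix = frontier.pop()
        for symbol in alphabet:
            candidate = prefix + (symbol,)
            candidate_support = support(candidate, database)
            # Extending an infrequent pattern cannot make it frequent, so it is not expanded.
            if candidate_support >= min_support:
                results[candidate] = candidate_support
                if len(candidate) < max_length:
                    frontier.append(candidate)
    return results

# Toy database (illustrative data only).
database = ["abc", "acb", "ab"]
patterns = frequent_sequences(database, min_support=2, max_length=2)
```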
international conference on data mining | 2013
Benjamin Negrevergne; Anton Dries; Tias Guns; Siegfried Nijssen
Finding small sets of interesting patterns is an important challenge in pattern mining. In this paper, we argue that several well-known approaches that address this challenge are based on performing pairwise comparisons between patterns. Examples include finding closed patterns, free patterns, relevant subgroups and skyline patterns. Although progress has been made on each of these individual problems, a generic approach for solving these problems (and more) is still lacking. This paper tackles this challenge. It proposes a novel, generic approach for handling pattern mining problems that involve pairwise comparisons between patterns. Our key contributions are the following. First, we propose a novel algebra for programming pattern mining problems. This algebra extends relational algebras in a novel way towards pattern mining. It allows for the generic combination of constraints on individual patterns with dominance relations between patterns. Second, we introduce a modified generic constraint satisfaction system to evaluate these algebraic expressions. Experiments show that this generic approach can indeed effectively identify patterns expressed in the algebra.
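The idea of selecting patterns through pairwise comparisons can be illustrated with a small Python sketch: a generic filter keeps every pattern that no other pattern dominates, and different mining tasks are obtained by swapping the dominance relation. This is a naive post-processing filter written for illustration, not the algebra or the constraint satisfaction system proposed in the paper; the function names and the `(itemset, support)` pattern representation are assumptions.

```python
def non_dominated(patterns, dominates):
    """Keep the patterns that no other pattern in the collection dominates."""
    return [p for p in patterns
            if not any(dominates(q, p) for q in patterns if q is not p)]

def closed_dominates(q, p):
    # q dominates p if q is a strict superset of p with the same support.
    (q_items, q_support), (p_items, p_support) = q, p
    return p_items < q_items and q_support == p_support

def maximal_dominates(q, p):
    # q dominates p if q is a strict superset of p, regardless of support.
    return p[0] < q[0]

# Frequent itemsets with their supports (illustrative data only).
patterns = [
    (frozenset("a"), 3),
    (frozenset("b"), 3),
    (frozenset("c"), 2),
    (frozenset("ab"), 2),
    (frozenset("ac"), 2),
]
closed_patterns = non_dominated(patterns, closed_dominates)
maximal_patterns = non_dominated(patterns, maximal_dominates)
```

Here the itemset {c} is dropped from the closed patterns because {a, c} has the same support, while only {a, b} and {a, c} survive as maximal patterns.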
integration of ai and or techniques in constraint programming | 2014
Behrouz Babaki; Tias Guns; Siegfried Nijssen
In recent years, it has been realized that many problems in data mining can be seen as pure optimisation problems. In this work, we investigate the problem of constraint-based clustering from an optimisation point of view. The use of constraints in clustering is a recent development and makes it possible to encode prior beliefs about desirable clusters. This paper proposes a new solution for minimum-sum-of-squares clustering under constraints, where the constraints considered are must-link constraints, cannot-link constraints and anti-monotone constraints on individual clusters. Contrary to most earlier approaches, it is exact and provides a fundamental approach for including these constraints. The proposed approach uses column generation in an integer linear programming setting. The key insight is that these constraints can be pushed into a branch-and-bound algorithm used for generating new columns. Experimental results show the feasibility of the approach and the promise of the branch-and-bound algorithm that solves the subproblem directly.
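To make the optimisation problem in this abstract concrete, the following Python sketch defines the sum-of-squares objective and the must-link and cannot-link constraints, and finds an exact optimum on a tiny instance by exhaustive enumeration. The enumeration is a stand-in chosen for brevity: it is exponential in the number of points and is not the column generation approach of the paper. All function names and the toy data are assumptions, and a clustering here may use fewer than k clusters.

```python
from itertools import product

def sum_of_squares(points, labels):
    """Sum of squared distances from each point to the centroid of its cluster."""
    total = 0.0
    for label in set(labels):
        members = [p for p, l in zip(points, labels) if l == label]
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        total += sum((x - c) ** 2 for p in members for x, c in zip(p, centroid))
    return total

def satisfies(labels, must_link, cannot_link):
    """Must-link pairs share a cluster; cannot-link pairs do not."""
    return (all(labels[i] == labels[j] for i, j in must_link)
            and all(labels[i] != labels[j] for i, j in cannot_link))

def best_clustering(points, k, must_link=(), cannot_link=()):
    """Exact minimum by trying every label assignment (tiny instances only)."""
    best = None
    for labels in product(range(k), repeat=len(points)):
        if not satisfies(labels, must_link, cannot_link):
            continue
        cost = sum_of_squares(points, labels)
        if best is None or cost < best[0]:
            best = (cost, labels)
    return best

# Two well-separated pairs of points (illustrative data only).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
unconstrained = best_clustering(points, k=2)
constrained = best_clustering(points, k=2, cannot_link=[(0, 1)])
```

Adding the cannot-link constraint between the first two points forces them apart, so the constrained optimum has a higher cost than the unconstrained one.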
european conference on machine learning | 2010
Siegfried Nijssen; Tias Guns
Over the years many pattern mining tasks and algorithms have been proposed. Traditionally, the focus of these studies was on the efficiency of the computation and the scalability towards very large databases. Little research has however been done on a general framework that encompasses several of these problems. In earlier work we showed how constraint programming (CP) can offer such a general framework; unfortunately, however, we also found that out-of-the-box CP solvers lack the efficiency and scalability achieved by specialized itemset mining systems, which could discourage their use. Here we study the question of whether a framework can be built that inherits the generality of CP systems and the efficiency of specialized algorithms. We propose a CP-based framework for pattern mining that avoids the redundant representations and propagations found in existing CP systems. We show experimentally that an implementation of this framework performs comparably to specialized itemset mining systems; furthermore, under certain conditions it lists itemsets with polynomial delay, which demonstrates that it also is a promising approach for analyzing pattern mining tasks from more theoretical perspectives. This is illustrated on a graph mining problem.
principles and practice of constraint programming | 2015
Andrea Rendl; Tias Guns; Peter J. Stuckey; Guido Tack
Much of the power of CP comes from the ability to create complex hybrid search algorithms specific to an application. Unfortunately there is no widely accepted standard for specifying search, and each solver typically requires detailed knowledge in order to build complex searches. This makes the barrier to entry for exploring different search methods quite high. Furthermore, search is a core part of the solver and usually highly optimised. Any imposition on the solver writer to change this part of their system is significant. In this paper we investigate how powerful we can make a uniform language for meta-search without placing any burden on the solver writer. The key to this is to only interact with the solver when a solution is found. We present MINISEARCH, a meta-search language that can directly use any FLATZINC solver. Optionally, it can interact with solvers through an efficient C++ API. We illustrate the expressiveness of the language and performance using different solvers on a number of examples.
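The central idea of this abstract, a search strategy that interacts with the solver only when a solution is found, can be sketched in Python as branch-and-bound built from repeated calls to a black-box solver. This is not MINISEARCH syntax and does not use any FLATZINC solver: `toy_solve` is a hypothetical stand-in for a solver, and `minimize`, the constraint representation, and the toy problem are all assumptions made for illustration.

```python
from itertools import product

def toy_solve(domains, constraints):
    """Hypothetical black-box solver: first assignment satisfying all constraints, or None."""
    names = list(domains)
    for values in product(*(domains[name] for name in names)):
        assignment = dict(zip(names, values))
        if all(constraint(assignment) for constraint in constraints):
            return assignment
    return None

def minimize(domains, constraints, objective, solve=toy_solve):
    """Branch-and-bound as meta-search: the solver is only touched between solutions."""
    best = None
    posted = list(constraints)
    while True:
        solution = solve(domains, posted)
        if solution is None:
            # No strictly better solution exists, so the last one found is optimal.
            return best
        best = solution
        bound = objective(solution)
        # Post a constraint demanding a strictly better objective value, then re-solve.
        posted.append(lambda a, bound=bound: objective(a) < bound)

# Toy problem (illustrative only): minimise y + 2*|x - 3| subject to x + y >= 4.
domains = {"x": range(5), "y": range(5)}
constraints = [lambda a: a["x"] + a["y"] >= 4]
objective = lambda a: a["y"] + 2 * abs(a["x"] - 3)
optimum = minimize(domains, constraints, objective)
```

Because `minimize` never inspects the solver's internals, any function with the same call signature could be passed as `solve`, which is the kind of solver independence the abstract argues for.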
international conference on data mining | 2011
Siegfried Nijssen; Aída Jiménez; Tias Guns
We propose a new framework for constraint-based pattern mining in multi-relational databases. Distinguishing features of the framework are that (1) it allows finding patterns not only under anti-monotonic constraints, but also under monotonic constraints and closedness constraints, among others, expressed over complex aggregates over multiple relations, (2) it builds on a declarative graphical representation of constraints that links closely to data models of multi-relational databases and constraint networks in constraint programming, and (3) it maps multi-relational pattern mining tasks into constraint programs. Our framework builds on a unifying perspective of multi-relational pattern mining, relational database technology and constraint networks in constraint programming. We demonstrate our framework on IMDB and Finance multi-relational databases.