Siegfried Nijssen | Researchain

Publication

Featured research published by Siegfried Nijssen.

Electronic Notes in Theoretical Computer Science | 2005

The Gaston Tool for Frequent Subgraph Mining

Siegfried Nijssen; Joost N. Kok

Given a database of graphs, structure mining algorithms search for all substructures that satisfy constraints such as minimum frequency, minimum confidence, minimum interest and maximum frequency. In order to make frequent subgraph mining more efficient, we propose to search with steps of increasing complexity. We present the GrAph/Sequence/Tree extractiON (Gaston) tool that implements this idea by searching first for frequent paths, then frequent free trees and finally cyclic graphs. We give results on large molecular databases.
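The path / free tree / cyclic graph progression described above can be illustrated with a minimal sketch. This is not the Gaston implementation; the function name `classify_pattern` and the input format (a node count plus an edge list) are assumptions made for illustration, and the graph is assumed to be connected, simple and undirected.

```python
from collections import defaultdict

def classify_pattern(num_nodes, edges):
    """Classify a connected, simple, undirected graph as 'path', 'tree' or 'cyclic'.

    Hypothetical helper for illustration only; connectivity is assumed, not checked.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # A connected graph on n nodes is acyclic exactly when it has n - 1 edges.
    if len(edges) != num_nodes - 1:
        return "cyclic"
    # A free tree in which every node has degree at most 2 is a path.
    if all(d <= 2 for d in degree.values()):
        return "path"
    return "tree"
```

Each class contains the previous one (every path is a free tree, every free tree is a graph), which is what makes a search in steps of increasing complexity possible.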

Knowledge Discovery and Data Mining | 2008

Constraint programming for itemset mining

Luc De Raedt; Tias Guns; Siegfried Nijssen

The relationship between constraint-based mining and constraint programming is explored by showing how the typical constraints used in pattern mining can be formulated for use in constraint programming environments. The resulting framework is surprisingly flexible and allows us to combine a wide range of mining constraints in different ways. We implement this approach in off-the-shelf constraint programming systems and evaluate it empirically. The results show that the approach is not only very expressive, but also works well on complex benchmark problems.

Artificial Intelligence | 2011

Itemset mining: A constraint programming perspective

Tias Guns; Siegfried Nijssen; Luc De Raedt

The field of data mining has become accustomed to specifying constraints on patterns of interest. A large number of systems and techniques has been developed for solving such constraint-based mining problems, especially for mining itemsets. The approach taken in the field of data mining contrasts with the constraint programming principles developed within the artificial intelligence community. While most data mining research focuses on algorithmic issues and aims at developing highly optimized and scalable implementations that are tailored towards specific tasks, constraint programming employs a more declarative approach. The emphasis lies on developing high-level modeling languages and general solvers that specify what the problem is, rather than outlining how a solution should be computed, yet are powerful enough to be used across a wide variety of applications and application domains. This paper contributes a declarative constraint programming approach to data mining. More specifically, we show that it is possible to employ off-the-shelf constraint programming techniques for modeling and solving a wide variety of constraint-based itemset mining tasks, such as frequent, closed, discriminative, and cost-based itemset mining. In particular, we develop a basic constraint programming model for specifying frequent itemsets and show that this model can easily be extended to realize the other settings. This contrasts with typical procedural data mining systems where the underlying procedures need to be modified in order to accommodate new types of constraint, or novel combinations thereof. 
Even though state-of-the-art data mining systems outperform the constraint programming approach on some standard tasks, we also show that there exist problems where the constraint programming approach leads to significant performance improvements over state-of-the-art methods in data mining, as well as to new insights into the underlying data mining problems. Many such insights can be obtained by relating the underlying search algorithms of data mining and constraint programming systems to one another. We discuss a number of interesting new research questions and challenges raised by the declarative constraint programming approach to data mining.
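One way to read the declarative idea is as two constraints over a candidate itemset: a coverage constraint (a transaction is covered when it contains every selected item) and a frequency constraint (enough transactions are covered). The sketch below checks these constraints by brute-force enumeration instead of calling a constraint programming solver, so it only illustrates what is specified, not how a solver would search. The function name, the 0/1 matrix encoding and the toy database are assumptions.

```python
from itertools import product

def frequent_itemsets(db, num_items, min_freq):
    """Enumerate itemsets satisfying the coverage and frequency constraints.

    db is a list of 0/1 rows, one row per transaction (assumed encoding).
    Brute force over all 2**num_items itemsets; suitable for toy data only.
    """
    solutions = []
    for items in product([0, 1], repeat=num_items):
        # Coverage: a transaction is covered iff it contains every selected item.
        covered = [
            all(row[i] == 1 for i in range(num_items) if items[i])
            for row in db
        ]
        # Frequency: the number of covered transactions must reach the threshold.
        support = sum(covered)
        if support >= min_freq:
            itemset = {i for i in range(num_items) if items[i]}
            solutions.append((itemset, support))
    return solutions

# Toy database (hypothetical): 4 transactions over 3 items.
toy_db = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 1],
]
result = frequent_itemsets(toy_db, num_items=3, min_freq=2)
```

Adding a further constraint, for example a minimum itemset size, would amount to one extra condition in the `if` test, which mirrors the claim that the basic model can be extended without changing the underlying procedure.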

Journal of Chemical Information and Modeling | 2006

Substructure mining using elaborate chemical representation.

Jeroen Kazius; Siegfried Nijssen; Joost N. Kok; Thomas Bäck; Adriaan P. IJzerman

Substructure mining algorithms are important drug discovery tools since they can find substructures that affect physicochemical and biological properties. Current methods, however, only consider a part of all chemical information that is present within a data set of compounds. Therefore, the overall aim of our study was to enable more exhaustive data mining by designing methods that detect all substructures of any size, shape, and level of chemical detail. A means of chemical representation was developed that uses atomic hierarchies, thus enabling substructure mining to consider general and/or highly specific features. As a proof-of-concept, the efficient, multipurpose graph mining system Gaston learned substructures of any size and shape from a mutagenicity data set that was represented in this manner. From these substructures, we extracted a set of only six nonredundant, discriminative substructures that represent relevant biochemical knowledge. Our results demonstrate the individual and synergistic importance of elaborate chemical representation and mining for nonlinear substructures. We conclude that the combination of elaborate chemical representation and Gaston provides an excellent method for 2D substructure mining as this recipe systematically explores all substructures in different levels of chemical detail.

IEEE Transactions on Knowledge and Data Engineering | 2013

k-Pattern Set Mining under Constraints

Tias Guns; Siegfried Nijssen; L. De Raedt

We introduce the problem of k-pattern set mining, concerned with finding a set of k related patterns under constraints. This contrasts to regular pattern mining, where one searches for many individual patterns. The k-pattern set mining problem is a very general problem that can be instantiated to a wide variety of well-known mining tasks including concept-learning, rule-learning, redescription mining, conceptual clustering and tiling. To this end, we formulate a large number of constraints for use in k-pattern set mining, both at the local level, that is, on individual patterns, and on the global level, that is, on the overall pattern set. Building general solvers for the pattern set mining problem remains a challenge. Here, we investigate to what extent constraint programming (CP) can be used as a general solution strategy. We present a mapping of pattern set constraints to constraints currently available in CP. This allows us to investigate a large number of settings within a unified framework and to gain insight in the possibilities and limitations of these solvers. This is important as it allows us to create guidelines in how to model new problems successfully and how to model existing problems more efficiently. It also opens up the way for other solver technologies.

European Conference on Principles of Data Mining and Knowledge Discovery | 2006

Don't be afraid of simpler patterns

Björn Bringmann; Albrecht Zimmermann; Luc De Raedt; Siegfried Nijssen

This paper investigates the trade-off between the expressiveness of the pattern language and the performance of the pattern miner in structured data mining. This trade-off is investigated in the context of correlated pattern mining, which is concerned with finding the k-best patterns according to a convex criterion, for the pattern languages of itemsets, multi-itemsets, sequences, trees and graphs. The criteria used in our investigation are the typical ones in data mining: computational cost and predictive accuracy and the domain is that of mining molecular graph databases. More specifically, we provide empirical answers to the following questions: how does the expressive power of the language affect the computational cost? and what is the trade-off between expressiveness of the pattern language and the predictive accuracy of the learned model? While answering the first question, we also introduce a novel stepwise approach to correlated pattern mining in which the results of mining a simpler pattern language are employed as a starting point for mining in a more complex one. This stepwise approach typically leads to significant speed-ups (up to a factor 1000) for mining graphs.

Nucleic Acids Research | 2012

Unveiling combinatorial regulation through the combination of ChIP information and in silico cis-regulatory module detection

Hong Sun; Tias Guns; Ana Carolina Fierro; Lieven Thorrez; Siegfried Nijssen; Kathleen Marchal

Computationally retrieving biologically relevant cis-regulatory modules (CRMs) is not straightforward. Because of the large number of candidates and the imperfection of the screening methods, many spurious CRMs are detected that are as high scoring as the biologically true ones. Using ChIP-information allows not only to reduce the regions in which the binding sites of the assayed transcription factor (TF) should be located, but also allows restricting the valid CRMs to those that contain the assayed TF (here referred to as applying CRM detection in a query-based mode). In this study, we show that exploiting ChIP-information in a query-based way makes in silico CRM detection a much more feasible endeavor. To be able to handle the large datasets, the query-based setting and other specificities proper to CRM detection on ChIP-Seq based data, we developed a novel powerful CRM detection method ‘CPModule’. By applying it on a well-studied ChIP-Seq data set involved in self-renewal of mouse embryonic stem cells, we demonstrate how our tool can recover combinatorial regulation of five known TFs that are key in the self-renewal of mouse embryonic stem cells. Additionally, we make a number of new predictions on combinatorial regulation of these five key TFs with other TFs documented in TRANSFAC.

European Conference on Principles of Data Mining and Knowledge Discovery | 2003

Efficient frequent query discovery in FARMER

Siegfried Nijssen; Joost N. Kok

The upgrade of frequent item set mining to a setup with multiple relations – frequent query mining – poses many efficiency problems. Taking Object Identity as starting point, we present several optimization techniques for frequent query mining algorithms. The resulting algorithm has a better performance than a previous ILP algorithm and competes with more specialized graph mining algorithms in performance.

IEEE Transactions on Evolutionary Computation | 2003

An analysis of the behavior of simplified evolutionary algorithms on trap functions

Siegfried Nijssen; Thomas Bäck

Methods are developed to numerically analyze an evolutionary algorithm (EA) that applies mutation and selection on a bit-string representation to find the optimum for a bimodal unitation function called a trap function. This research bridges part of the gap between the existing convergence velocity analysis of strictly unimodal functions and global convergence results assuming the limit of infinite time. As a main result of this analysis, a new so-called (1 : λ)-EA is proposed, which generates offspring using individual mutation rates p_i. While a more traditional EA using only one mutation rate is not able to find the global optimum of the trap function within an acceptable (nonexponential) time, our numerical investigations provide evidence that the new algorithm overcomes these limitations. The analysis tools used for the analysis, based on absorbing Markov chains and the calculation of transition probabilities, are demonstrated to provide an intuitive and useful method for investigating the capabilities of EAs to bridge the gap between a local and a global optimum in bimodal search spaces.
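To make the setting concrete, the sketch below defines a trap function in a common parameterization (local optimum of height a at zero ones, global optimum of height b at all ones, minimum at z ones) and a simple EA that creates one offspring per mutation rate. The parameterization, the elitist selection rule and all names are assumptions for illustration; the (1 : λ)-EA analyzed in the paper may differ in its details.

```python
import random

def trap(bits, z, a, b):
    """Bimodal unitation function: the value depends only on the number of ones u."""
    n, u = len(bits), sum(bits)
    if u <= z:
        return a * (z - u) / z       # slope towards the local optimum at u = 0
    return b * (u - z) / (n - z)     # slope towards the global optimum at u = n

def mutate(bits, p, rng):
    """Flip each bit independently with probability p."""
    return [1 - x if rng.random() < p else x for x in bits]

def one_lambda_ea(n, rates, z, a, b, generations, seed=0):
    """One parent; one offspring per individual mutation rate in `rates`.

    Assumed selection rule: the best offspring replaces the parent
    only if it is at least as good (elitist).
    """
    rng = random.Random(seed)
    parent = [0] * n                 # start in the local optimum
    for _ in range(generations):
        offspring = [mutate(parent, p, rng) for p in rates]
        best = max(offspring, key=lambda c: trap(c, z, a, b))
        if trap(best, z, a, b) >= trap(parent, z, a, b):
            parent = best
    return parent

# Hypothetical settings: small rates for local steps, large rates for long jumps.
final = one_lambda_ea(n=10, rates=[0.05, 0.1, 0.5, 0.9], z=7, a=1, b=2, generations=200)
```

The point of using several rates at once is that a high rate can flip many bits in a single step and so jump across the fitness valley, while a low rate supports the final hill climb.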

Conference on Information and Knowledge Management | 2009

A query language for analyzing networks

Anton Dries; Siegfried Nijssen; Luc De Raedt

With more and more large networks becoming available, mining and querying such networks are increasingly important tasks which are not being supported by database models and querying languages. This paper wants to alleviate this situation by proposing a data model and a query language for facilitating the analysis of networks. Key features include support for executing external tools on the networks, flexible contexts on the network each resulting in a different graph, primitives for querying subgraphs (including paths) and transforming graphs. The data model provides for a closure property, in which the output of every query can be stored in the database and used for further querying.
