Sicco Verwer | Researchain

Publication

Featured research published by Sicco Verwer.

Data Mining and Knowledge Discovery | 2010

Three naive Bayes approaches for discrimination-free classification

Toon Calders; Sicco Verwer

In this paper, we investigate how to modify the naive Bayes classifier in order to perform classification that is restricted to be independent with respect to a given sensitive attribute. Such independence restrictions occur naturally when the decision process leading to the labels in the dataset was biased, e.g., due to gender or racial discrimination. This setting is motivated by many cases in which there exist laws that disallow a decision that is partly based on discrimination. Naive application of machine learning techniques would result in huge fines for companies. We present three approaches for making the naive Bayes classifier discrimination-free: (i) modifying the probability of the decision being positive, (ii) training one model for every sensitive attribute value and balancing them, and (iii) adding a latent variable to the Bayesian model that represents the unbiased label and optimizing the model parameters for likelihood using expectation maximization. We present experiments for the three approaches on both artificial and real-life data.
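To make the notion of dependence on a sensitive attribute concrete, the sketch below computes one simple measure: the difference in positive-decision rates between two groups. The abstract does not state which measure the paper uses, so the function name `discrimination_score`, the group names, and the toy data are all assumptions for illustration.

```python
from collections import Counter

def discrimination_score(records):
    """Difference in positive-decision rates between two groups.

    records: iterable of (group, label) pairs, where group is
    "favored" or "deprived" (hypothetical names) and label is 0 or 1.
    Both groups must appear at least once.
    """
    totals = Counter()
    positives = Counter()
    for group, label in records:
        totals[group] += 1
        positives[group] += label

    def rate(group):
        return positives[group] / totals[group]

    return rate("favored") - rate("deprived")

# Toy data: 3 of 4 favored records are positive, 1 of 4 deprived records.
data = [("favored", 1), ("favored", 1), ("favored", 0), ("favored", 1),
        ("deprived", 1), ("deprived", 0), ("deprived", 0), ("deprived", 0)]
score = discrimination_score(data)
```

A score of zero would mean that positive decisions are equally frequent in both groups; a classifier trained naively on biased labels would be expected to reproduce a nonzero score.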

International Colloquium on Grammatical Inference | 2010

A likelihood-ratio test for identifying probabilistic deterministic real-time automata from positive data

Sicco Verwer; Mathijs de Weerdt; Cees Witteveen

We adapt an algorithm (RTI) for identifying (learning) a deterministic real-time automaton (DRTA) to the setting of positive timed strings (or time-stamped event sequences). A DRTA can be seen as a deterministic finite state automaton (DFA) with time constraints. Because DRTAs model time using numbers, they can be exponentially more compact than equivalent DFA models that model time using states. We use a new likelihood-ratio statistical test for checking consistency in the RTI algorithm. The result is the RTI+ algorithm, which stands for real-time identification from positive data. RTI+ is an efficient algorithm for identifying DRTAs from positive data. We show using artificial data that RTI+ is capable of identifying sufficiently large DRTAs in order to identify real-world real-time systems.
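As a rough illustration of how a likelihood-ratio test can decide whether two states behave alike, the sketch below compares the symbol counts observed in two states against the counts of their merge. It is a generic test, not the exact procedure of RTI+, which the abstract does not detail. It assumes exactly two symbols, so the statistic has one degree of freedom and the p-value can be computed with `math.erfc`; the function names are hypothetical.

```python
import math

def log_likelihood(counts):
    """Maximum log-likelihood of counts under a multinomial model."""
    total = sum(counts)
    return sum(c * math.log(c / total) for c in counts if c > 0)

def likelihood_ratio_test(counts_a, counts_b):
    """Test whether two states share one symbol distribution.

    Assumption: exactly two symbols, hence one degree of freedom.
    Returns (statistic, p_value); a small p-value suggests the states
    differ and should not be merged.
    """
    assert len(counts_a) == 2 and len(counts_b) == 2
    merged = [a + b for a, b in zip(counts_a, counts_b)]
    ll_split = log_likelihood(counts_a) + log_likelihood(counts_b)
    ll_merged = log_likelihood(merged)
    # Clamp at zero to absorb floating-point rounding.
    statistic = max(2.0 * (ll_split - ll_merged), 0.0)
    # Survival function of the chi-squared distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

same_stat, same_p = likelihood_ratio_test([10, 10], [20, 20])
diff_stat, diff_p = likelihood_ratio_test([19, 1], [1, 19])
```

In this toy run the first pair has identical proportions, so the merge would be accepted, while the second pair differs strongly and would be kept apart at any usual significance level.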

Theory and Practice of Logic Programming | 2015

Predicate logic as a modeling language: Modeling and solving some machine learning and data mining problems with IDP3

Maurice Bruynooghe; Hendrik Blockeel; Bart Bogaerts; Broes De Cat; Stef De Pooter; Joachim Jansen; Anthony Labarre; Jan Ramon; Marc Denecker; Sicco Verwer

This paper provides a gentle introduction to problem solving with the IDP3 system. The core of IDP3 is a finite model generator that supports first order logic enriched with types, inductive definitions, aggregates and partial functions. It offers its users a modeling language that is a slight extension of predicate logic and allows them to solve a wide range of search problems. Apart from a small introductory example, applications are selected from problems that arose within machine learning and data mining research. These research areas have recently shown a strong interest in declarative modeling and constraint solving as opposed to algorithmic approaches. The paper illustrates that the IDP3 system can be a valuable tool for researchers with such an interest. The first problem is in the domain of stemmatology, a domain of philology concerned with the relationship between surviving variant versions of text. The second problem is about a somewhat related problem within biology where phylogenetic trees are used to represent the evolution of species. The third and final problem concerns the classical problem of learning a minimal automaton consistent with a given set of strings. For this last problem, we show that the performance of our solution comes very close to that of a state-of-the-art solution. For each of these applications, we analyze the problem, illustrate the development of a logic-based model and explore how alternatives can affect the performance.

Empirical Software Engineering | 2013

Software model synthesis using satisfiability solvers

Marijn J. H. Heule; Sicco Verwer

We introduce a novel approach for synthesis of software models based on identifying deterministic finite state automata. Our approach consists of three important contributions. First, we argue that in order to model software, one should focus mainly on observed executions (positive data), and use the randomly generated failures (negative data) only for testing consistency. We present a new greedy heuristic for this purpose, and show how to integrate it in the state-of-the-art evidence-driven state-merging (EDSM) algorithm. Second, we apply the enhanced EDSM algorithm to iteratively reduce the size of the problem. Yet during each iteration, the evidence is divided over states and hence the effectiveness of this algorithm is decreased. We propose, when EDSM becomes too weak, to tackle the reduced identification problem using satisfiability solvers. Third, in case the amount of positive data is small, we solve the identification problem several times by randomizing the greedy heuristic and combine the solutions using a voting scheme. The interaction between these contributions appeared crucial to solve hard software model synthesis benchmarks. Our implementation, called DFASAT, won the StaMinA competition.
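State-merging methods such as EDSM usually start from a prefix tree built from the labeled strings and then merge its states. The sketch below builds such a tree; it is a minimal illustration of that starting point only, not the DFASAT implementation, and the name `build_prefix_tree` and the data layout are assumptions.

```python
def build_prefix_tree(positive, negative):
    """Build a prefix tree acceptor from labeled strings.

    Returns (transitions, labels, num_states), where transitions maps
    (state, symbol) to a state and labels maps a state to True
    (accepting) or False (rejecting). State 0 is the root.
    """
    transitions = {}
    labels = {}
    num_states = 1
    for strings, label in ((positive, True), (negative, False)):
        for string in strings:
            state = 0
            for symbol in string:
                key = (state, symbol)
                if key not in transitions:
                    transitions[key] = num_states
                    num_states += 1
                state = transitions[key]
            # A string cannot be both positive and negative.
            if labels.get(state, label) != label:
                raise ValueError(f"inconsistent label for {string!r}")
            labels[state] = label
    return transitions, labels, num_states

transitions, labels, num_states = build_prefix_tree(
    positive=["ab", "abb"], negative=["a"])
```

Merging two states of this tree is allowed only if it never forces an accepting and a rejecting state together, which is the consistency check that the negative data supports.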

Machine Learning | 2014

PAutomaC: a probabilistic automata and hidden Markov models learning competition

Sicco Verwer; Rémi Eyraud; Colin de la Higuera

Approximating distributions over strings is a hard learning problem. Typical techniques involve using finite state machines as models and attempting to learn these; these machines can either be hand built and then have their weights estimated, or built by grammatical inference techniques: the structure and the weights are then learned simultaneously. The Probabilistic Automata learning Competition (PAutomaC), run in 2012, was the first grammatical inference challenge that allowed the comparison between these methods and algorithms. Its main goal was to provide an overview of the state-of-the-art techniques for this hard learning problem. Both artificial data and real data were presented and contestants were to try to estimate the probabilities of strings. The purpose of this paper is to describe some of the technical and intrinsic challenges such a competition has to face, to give a broad state of the art concerning both the problems dealing with learning grammars and finite state machines and the relevant literature. This paper also provides the results of the competition and a brief description and analysis of the different approaches the main participants used.
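The task of estimating string probabilities can be sketched with a small deterministic probabilistic automaton and a perplexity-style score that compares candidate probabilities with target probabilities on a test set. The automaton encoding, the function names, and the exact form of the score are assumptions for illustration; the abstract does not spell out the competition's scoring formula.

```python
import math

def string_probability(pdfa, string):
    """Probability that the automaton generates exactly this string."""
    state = pdfa["start"]
    prob = 1.0
    for symbol in string:
        if (state, symbol) not in pdfa["trans"]:
            return 0.0
        step_prob, state = pdfa["trans"][(state, symbol)]
        prob *= step_prob
    return prob * pdfa["stop"][state]

def perplexity(target_probs, candidate_probs):
    """Perplexity of candidate probabilities with respect to the target.

    Both lists are normalized over the test set first. Every candidate
    probability must be nonzero, otherwise the logarithm is undefined.
    """
    t_total = sum(target_probs)
    c_total = sum(candidate_probs)
    exponent = -sum((t / t_total) * math.log2(c / c_total)
                    for t, c in zip(target_probs, candidate_probs))
    return 2.0 ** exponent

# One state: emit "a" with probability 0.5 or stop with probability 0.5.
pdfa = {"start": 0, "trans": {(0, "a"): (0.5, 0)}, "stop": {0: 0.5}}
test_set = ["", "a", "aa"]
target = [string_probability(pdfa, s) for s in test_set]
uniform = [1.0] * len(test_set)
best = perplexity(target, target)
worse = perplexity(target, uniform)
```

Lower is better under this score: a candidate that matches the target distribution reaches the minimum, and any other candidate scores at least as high.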

Studies in Applied Philosophy, Epistemology and Rational Ethics; 3 | 2013

Combining and Analyzing Judicial Databases

Susan W. van den Braak; Sunil Choenni; Sicco Verwer

To monitor crime and law enforcement, databases of several organizations, covering different parts of the criminal justice system, have to be integrated. Combined data from different organizations may then be analyzed, for instance, to investigate how specific groups of suspects move through the system. Such insight is useful for several reasons, for example, to define an effective and coherent safety policy. To integrate or relate judicial data two approaches are currently employed: a data warehouse and a dataspace approach. The former is useful for applications that require combined data on an individual level. The latter is suitable for data with a higher level of aggregation. However, developing applications that exploit combined judicial data is not without risk. One important issue while handling such data is the protection of the privacy of individuals. Therefore, several precautions have to be taken in the data integration process: use aggregate data, follow the Dutch Personal Data Protection Act, and filter out privacy-sensitive results. Another issue is that judicial data is essentially different from data in exact or technical sciences. Therefore, data mining should be used with caution, in particular to avoid incorrect conclusions and to prevent discrimination and stigmatization of certain groups of individuals.

Machine Learning | 2014

Improving active Mealy machine learning for protocol conformance testing

Fides Aarts; Harco Kuppens; Jan Tretmans; Frits W. Vaandrager; Sicco Verwer

Using a well-known industrial case study from the verification literature, the bounded retransmission protocol, we show how active learning can be used to establish the correctness of protocol implementation I relative to a given reference implementation R. Using active learning, we learn a model MR of reference implementation R, which serves as input for a model-based testing tool that checks conformance of implementation I to MR. In addition, we also explore an alternative approach in which we learn a model MI of implementation I, which is compared to model MR using an equivalence checker. Our work uses a unique combination of software tools for model construction (Uppaal), active learning (LearnLib, Tomte), model-based testing (JTorX, TorXakis) and verification (CADP, MRMC). We show how these tools can be used for learning models of and revealing errors in implementations, present the new notion of a conformance oracle, and demonstrate how conformance oracles can be used to speed up conformance checking.
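Comparing a learned model MI with a reference model MR can be pictured as a search through the product of two Mealy machines for an input sequence on which their outputs differ. The sketch below is a generic breadth-first check under the assumption that both machines are complete and deterministic; it does not use or reproduce the interface of any of the tools named above, and all names in it are hypothetical.

```python
from collections import deque

def find_counterexample(m_ref, m_impl, inputs, start_ref=0, start_impl=0):
    """Return a shortest input sequence with differing outputs, or None.

    Each machine maps (state, input) to (output, next_state) and must
    define a transition for every state and input.
    """
    seen = {(start_ref, start_impl)}
    queue = deque([((start_ref, start_impl), [])])
    while queue:
        (s_ref, s_impl), word = queue.popleft()
        for symbol in inputs:
            out_ref, next_ref = m_ref[(s_ref, symbol)]
            out_impl, next_impl = m_impl[(s_impl, symbol)]
            if out_ref != out_impl:
                return word + [symbol]
            pair = (next_ref, next_impl)
            if pair not in seen:
                seen.add(pair)
                queue.append((pair, word + [symbol]))
    return None

# Toy machines: the implementation answers "nak" on the second "send".
reference = {(0, "send"): ("ack", 1), (1, "send"): ("ack", 0)}
implementation = {(0, "send"): ("ack", 1), (1, "send"): ("nak", 0)}
trace = find_counterexample(reference, implementation, ["send"])
```

A returned trace is a concrete test that exposes the disagreement; `None` means the two machines produce the same outputs on every input sequence.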

International Joint Conference on Artificial Intelligence | 2011

Learning driving behavior by timed syntactic pattern recognition

Sicco Verwer; Mathijs de Weerdt; Cees Witteveen

We advocate the use of an explicit time representation in syntactic pattern recognition because it can result in more succinct models and easier learning problems. We apply this approach to the real-world problem of learning models for the driving behavior of truck drivers. We discretize the values of onboard sensors into simple events. Instead of the common syntactic pattern recognition approach of sampling the signal values at a fixed rate, we model the time constraints using timed models. We learn these models using the RTI+ algorithm from grammatical inference, and show how to use computational mechanics and a form of semi-supervised classification to construct a real-time automaton classifier for driving behavior. Promising results are shown using this new approach.
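The step of turning sensor readings into simple timed events might look like the sketch below, which emits an event whenever a signal crosses a threshold and records the time elapsed since the previous event. The single threshold, the event names "high" and "low", and the function name are assumptions; the abstract does not describe the discretization actually used.

```python
def to_timed_events(samples, threshold):
    """Convert sensor samples into (event, delay) pairs.

    samples: non-empty list of (timestamp, value), sorted by timestamp.
    An event is emitted only when the value crosses the threshold, and
    its delay is the time since the previous event (or the first sample).
    """
    events = []
    last_time = samples[0][0]
    previous = samples[0][1] >= threshold
    for timestamp, value in samples[1:]:
        current = value >= threshold
        if current != previous:
            events.append(("high" if current else "low",
                           timestamp - last_time))
            last_time = timestamp
            previous = current
    return events

# Hypothetical speed readings with a threshold of 50.
speed = [(0, 10), (1, 12), (2, 55), (3, 60), (7, 20)]
events = to_timed_events(speed, threshold=50)
```

Five samples collapse into two timed events here, which illustrates why an explicit time representation can give shorter sequences than sampling the signal at a fixed rate.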

Information & Computation | 2011

The efficiency of identifying timed automata and the power of clocks

Sicco Verwer; Mathijs de Weerdt; Cees Witteveen

We develop theory on the efficiency of identifying (learning) timed automata. In particular, we show that (i) deterministic timed automata cannot be identified efficiently in the limit from labeled data and (ii) one-clock deterministic timed automata can be identified efficiently in the limit from labeled data. We prove these results based on the distinguishability of these classes of timed automata. More specifically, we prove that the languages of deterministic timed automata cannot, and that the languages of one-clock deterministic timed automata can, be distinguished from each other using strings of length bounded by a polynomial. In addition, we provide an algorithm that identifies one-clock deterministic timed automata efficiently from labeled data. Our results have consequences for the power of clocks that are of interest beyond the identification problem.

Machine Learning | 2012

Efficiently identifying deterministic real-time automata from labeled data

Sicco Verwer; Mathijs de Weerdt; Cees Witteveen

We develop a novel learning algorithm RTI for identifying a deterministic real-time automaton (DRTA) from labeled time-stamped event sequences. The RTI algorithm is based on the current state of the art in deterministic finite-state automaton (DFA) identification, called evidence-driven state-merging (EDSM). In addition to having a DFA structure, a DRTA contains time constraints between occurrences of consecutive events. Although this seems a small difference, we show that the problem of identifying a DRTA is much more difficult than the problem of identifying a DFA: identifying only the time constraints of a DRTA given its DFA structure is already NP-complete. In spite of this additional complexity, we show that RTI is a correct and complete algorithm that converges efficiently (from polynomial time and data) to the correct DRTA in the limit. To the best of our knowledge, this is the first algorithm that can identify a timed automaton model from time-stamped event sequences. A straightforward alternative to identifying DRTAs is to identify a DFA that models time implicitly, i.e., a DFA that uses different states for different points in time. Such a DFA can be identified by first sampling the timed sequences using a fixed frequency, and subsequently applying EDSM to the resulting non-timed event sequences. We evaluate the performance of both RTI and this sampling approach experimentally on artificially generated data. In these experiments RTI outperforms the sampling approach significantly. Thus, we show that if we obtain data from a real-time system, it is easier to identify a DRTA from this data than to identify an equivalent DFA.
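The sampling alternative can be illustrated by converting a timed string into an untimed one, with one filler symbol per elapsed sampling period. This is only one possible encoding, chosen here for illustration; the function name, the `tick` symbol, and the use of integer delays are assumptions.

```python
def sample_timed_string(timed_string, period, tick="tick"):
    """Turn (symbol, delay) pairs into an untimed symbol sequence.

    Each full sampling period that elapses before an event becomes one
    tick symbol, so time is represented by sequence length. Delays and
    the period are assumed to be non-negative integers.
    """
    untimed = []
    for symbol, delay in timed_string:
        untimed.extend([tick] * (delay // period))
        untimed.append(symbol)
    return untimed

timed = [("a", 3), ("b", 10)]
sampled = sample_timed_string(timed, period=1)
```

The two timed events expand to fifteen untimed symbols, and a DFA reading them needs a separate state for each tick it has to count, whereas a DRTA can express the same delays with numeric time constraints on two transitions.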