Peter van der Putten
Leiden University
Publications
Featured research published by Peter van der Putten.
Machine Learning | 2004
Peter van der Putten; Maarten van Someren
The CoIL Challenge 2000 data mining competition attracted a wide variety of solutions, both in terms of approaches and performance. The goal of the competition was to predict who would be interested in buying a specific insurance product and to explain why people would buy. Unlike in most other competitions, the majority of participants provided a report describing the path to their solution. In this article we use the framework of bias-variance decomposition of error to analyze what caused the wide range of prediction performance. We characterize the challenge problem to make it comparable to other problems and evaluate why certain methods work or not. We also include an evaluation of the submitted explanations by a marketing expert. We find that variance is the key component of error for this problem. Participants use various strategies in data preparation and model development that reduce variance error, such as feature selection and the use of simple, robust and low variance learners like Naive Bayes. Adding constructed features, modeling with complex, weak bias learners and extensive fine tuning by the participants often increase the variance error.
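As a rough illustration of the kind of bias-variance analysis described above, the sketch below estimates the two error components empirically by retraining a low-variance learner (Naive Bayes) and a high-variance learner (an unpruned decision tree) on bootstrap samples; the data set and learners are illustrative assumptions, not the CoIL Challenge setup.

# Hedged sketch: empirical bias-variance estimate for 0/1 loss.
# Data and learners are illustrative, not the CoIL Challenge data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

def bias_variance(make_model, n_rounds=50):
    rng = np.random.default_rng(0)
    preds = []
    for _ in range(n_rounds):
        idx = rng.integers(0, len(X_pool), len(X_pool))      # bootstrap training sample
        preds.append(make_model().fit(X_pool[idx], y_pool[idx]).predict(X_test))
    preds = np.array(preds)                                  # shape: (rounds, test points)
    main = (preds.mean(axis=0) >= 0.5).astype(int)           # majority vote per test point
    bias = np.mean(main != y_test)                           # main prediction is wrong
    variance = np.mean(preds != main)                        # disagreement with main prediction
    return bias, variance

for name, make_model in [("naive bayes", GaussianNB),
                         ("unpruned tree", DecisionTreeClassifier)]:
    b, v = bias_variance(make_model)
    print(f"{name}: bias ~ {b:.3f}, variance ~ {v:.3f}")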
Social Science Research Network | 2002
Peter van der Putten; Joost N. Kok; Amar Gupta
In data mining applications, the availability of data is often a serious problem. For instance, elementary customer information resides in customer databases, but market survey data are only available for a subset of the customers or even for a different sample of customers. Data fusion provides a way out by combining information from different sources into a single data set for further data mining. While a significant amount of work has been done on data fusion in the past, most of the research has been performed outside of the data mining community. In this paper, we provide an overview of data fusion, introduce basic terminology and the statistical matching approach, distinguish between internal and external evaluation, and we conclude with a larger case study.
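To illustrate the statistical matching approach mentioned above, the sketch below imputes a survey variable from a donor sample onto recipient customer records via their nearest neighbour on shared variables; all column names and values are hypothetical.

# Hedged sketch of statistical matching for data fusion: impute survey variables
# from a donor sample onto recipient records via common variables.
# In practice the common variables would typically be standardised first.
import pandas as pd
from sklearn.neighbors import NearestNeighbors

recipients = pd.DataFrame({            # customer database: common variables only
    "age": [23, 45, 36, 52],
    "income": [21, 58, 40, 75],
})
donors = pd.DataFrame({                # market survey: common + unique variables
    "age": [25, 50, 38],
    "income": [20, 60, 42],
    "brand_preference": ["A", "B", "A"],   # survey variable to fuse
})

common = ["age", "income"]
nn = NearestNeighbors(n_neighbors=1).fit(donors[common])
_, idx = nn.kneighbors(recipients[common])            # nearest donor per recipient
fused = recipients.assign(
    brand_preference=donors["brand_preference"].iloc[idx.ravel()].values
)
print(fused)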
advanced data mining and applications | 2005
Lingjun Meng; Peter van der Putten; Haiyang Wang
Artificial Immune Systems are a new class of algorithms inspired by how the immune system recognizes, attacks and remembers intruders. This is a fascinating idea, but to be accepted for mainstream data mining applications, extensive benchmarking is needed to demonstrate the reliability and accuracy of these algorithms. In our research we focus on the AIRS classification algorithm. It has been claimed previously that AIRS consistently outperforms other algorithms. However, in these papers AIRS was compared to benchmark results from literature. To ensure consistent conditions we carried out benchmark tests on all algorithms using exactly the same set up. Our findings show that AIRS is a stable and robust classifier that produces around average results. This contrasts with earlier claims but shows AIRS is mature enough to be used for mainstream data mining.
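The methodological point, benchmarking every algorithm under exactly the same conditions, can be illustrated roughly as follows; AIRS itself is not included (it is not part of scikit-learn), so standard classifiers stand in, and the data set is an arbitrary choice.

# Hedged sketch: compare classifiers using identical cross-validation folds,
# rather than comparing against benchmark results taken from the literature.
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)  # same folds for all

for name, clf in [("naive bayes", GaussianNB()),
                  ("k-nn", KNeighborsClassifier()),
                  ("decision tree", DecisionTreeClassifier(random_state=42))]:
    scores = cross_val_score(clf, X, y, cv=cv)
    print(f"{name}: mean accuracy {scores.mean():.3f} +/- {scores.std():.3f}")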
asia-pacific software engineering conference | 2014
Mohd Hafeez Osman; Michel R. V. Chaudron; Peter van der Putten
A large fraction of the time spent on software development and maintenance goes into understanding the software, which makes comprehension a critical activity. Software documentation, including software architecture design documentation, is an important aid in software comprehension. However, keeping documentation up to date with evolving source code is often challenging, and the absence of up-to-date or comprehensive design-level documentation is not uncommon. As a solution, the software architecture design may be recovered using reverse engineering techniques. However, existing reverse engineering methods produce complete design diagrams that include all the details present in the source code. This lack of abstraction from implementation details limits the usefulness of existing reverse engineering techniques for understanding software. This paper addresses the problem by providing a method and tool that allow developers to interactively explore a reverse engineered class diagram at scalable levels of abstraction. To this end, we propose a Software Architecture Abstraction (SAAbs) framework and an automated tool that implements it. The SAAbs framework applies a machine learning scoring algorithm to produce a class importance ranking for class diagrams; this ranking is the basis for software architecture abstraction and visualization. We validate the framework by evaluating the SAAbs tool in a semi-structured survey. On average, the 30 survey respondents rated the tool 5.40 out of 6 points, which indicates that it is a useful aid for software developers in understanding a system.
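As a toy illustration of the scoring and ranking step described above, the sketch below learns an importance score for classes from design metrics and ranks reverse engineered classes by it; the metric names, labels and model choice are invented for the example and are not the actual SAAbs features.

# Hedged sketch: rank classes by a learned "importance" score from design metrics.
# Features, labels and the model are illustrative placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: design metrics per class, labelled by whether the
# class appeared in the original (forward) design diagram.
train = pd.DataFrame({
    "fan_in":  [12, 1, 7, 0, 9, 2],
    "fan_out": [ 3, 0, 5, 1, 6, 1],
    "loc":     [400, 30, 250, 15, 320, 60],
    "in_design": [1, 0, 1, 0, 1, 0],
})
model = LogisticRegression(max_iter=1000).fit(
    train[["fan_in", "fan_out", "loc"]], train["in_design"])

# Score reverse engineered classes and rank them for inclusion in the diagram.
classes = pd.DataFrame({
    "name":    ["OrderService", "StringUtils", "PaymentGateway"],
    "fan_in":  [10, 2, 8],
    "fan_out": [4, 0, 5],
    "loc":     [350, 40, 280],
})
classes["importance"] = model.predict_proba(
    classes[["fan_in", "fan_out", "loc"]])[:, 1]
print(classes.sort_values("importance", ascending=False))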
intelligent data analysis | 2016
Deepak Soekhoe; Peter van der Putten; Aske Plaat
In this paper we study the effect of target set size on transfer learning in deep convolutional neural networks. This is an important problem, as labelling is a costly task, and for new or specific classes the number of labelled instances available may simply be too small. We present results for a series of experiments where we either train on the target set of classes from scratch, retrain all layers, or progressively lock more layers in the network, for the Tiny-ImageNet and MiniPlaces2 data sets. Our findings indicate that for smaller target data sets, freezing the weights of the initial layers of the network gives better results on the target classes. We present a simple, easy-to-implement training heuristic based on these findings.
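The training heuristic, freezing the weights of the initial layers when the target set is small, can be sketched roughly as follows; the backbone, the point at which layers are unfrozen, and the optimizer settings are illustrative assumptions rather than the exact setup from the paper.

# Hedged sketch: transfer learning where early layers are frozen and only later
# layers plus a new classification head are fine-tuned on a small target set.
import torch
import torch.nn as nn
from torchvision import models

num_target_classes = 200            # e.g. a Tiny-ImageNet-sized label set (assumption)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze everything first, then unfreeze the last block; add a fresh head.
for param in model.parameters():
    param.requires_grad = False
for param in model.layer4.parameters():       # later layers stay trainable
    param.requires_grad = True
model.fc = nn.Linear(model.fc.in_features, num_target_classes)  # new head (trainable)

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    # Standard fine-tuning step on a batch from the (small) target data set.
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()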
world congress on information and communication technologies | 2014
Mohd Hafeez Osman; Michel R. V. Chaudron; Peter van der Putten; Truong Ho-Quang
In this paper, we report on a machine learning approach to condensing class diagrams. The goal of the algorithm is to learn which classes are most relevant to include in the diagram, as opposed to fully reverse engineering all classes. This paper focuses on building a classifier based on the names of classes in addition to design metrics, and we compare it to earlier work based on design metrics only. We assess our condensation method by comparing our condensed class diagrams to class diagrams that were made during the original forward design. Our results show that combining text metrics with design metrics leads to modest improvements over using design metrics alone: on average, the improvement is 5.3%, and 7 out of 10 evaluated case studies show improvements ranging from 1% to 22%.
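As a rough illustration of combining name-based text features with numeric design metrics in one classifier, consider the sketch below; the columns, labels and model choice are hypothetical and not the feature set used in the paper.

# Hedged sketch: one classifier over text features from class names plus numeric
# design metrics. All column names and labels are invented for the example.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

data = pd.DataFrame({
    "class_name": ["OrderController", "OrderDTO", "PaymentService",
                   "StringHelper", "InvoiceManager", "DateUtils"],
    "fan_in": [9, 1, 8, 2, 7, 1],
    "loc":    [300, 40, 280, 60, 240, 35],
    "in_diagram": [1, 0, 1, 0, 1, 0],           # included in the forward design?
})

features = ColumnTransformer([
    # character n-grams pick up name fragments like "Controller", "Service", "Utils"
    ("name", TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)), "class_name"),
    ("metrics", "passthrough", ["fan_in", "loc"]),
])
clf = Pipeline([("features", features),
                ("model", LogisticRegression(max_iter=1000))])
clf.fit(data[["class_name", "fan_in", "loc"]], data["in_diagram"])
print(clf.predict_proba(data[["class_name", "fan_in", "loc"]])[:, 1])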
intelligent data analysis | 2016
Livia Teernstra; Peter van der Putten; Liesbeth Noordegraaf-Eelens; Fons J. Verbeek
This paper introduces The Morality Machine, a system that tracks ethical sentiment in Twitter discussions. Empirical approaches to ethics are rare, and to our knowledge this system is the first to take a machine learning approach. It is based on Moral Foundations Theory, a framework of moral values that are assumed to be universal. Carefully handcrafted keyword dictionaries for Moral Foundations Theory exist, but our experiments demonstrate that models that do not leverage them achieve similar or superior performance, showing the value of a purely machine learning based approach.
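A minimal sketch of the dictionary-free alternative, a plain bag-of-words classifier trained on labelled tweets, is given below; the example texts and labels are invented, and only the Moral Foundations categories themselves come from the theory.

# Hedged sketch: classify short texts into moral foundations with a bag-of-words
# learner instead of a hand-crafted keyword dictionary. Example data is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "we must protect the vulnerable from harm",
    "everyone deserves an equal chance, this is unjust",
    "stand by your own people, loyalty matters",
    "respect the law and those in charge",
]
foundations = ["care", "fairness", "loyalty", "authority"]   # Moral Foundations labels

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(tweets, foundations)
print(clf.predict(["this ruling is deeply unfair to workers"]))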
intelligent data analysis | 2001
Michiel van Wezel; Walter A. Kosters; Peter van der Putten; Joost N. Kok
In this paper we present a neural network for nonmetric multidimensional scaling. In our approach, the monotone transformation that is part of every nonmetric scaling algorithm is performed by a special feedforward neural network with a modified backpropagation algorithm. In contrast to traditional methods, we thus explicitly model the monotone transformation with a special-purpose neural network. The architecture of the new network and the derivation of the learning rule are given, along with experimental results, which are positive.
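The central idea, a feedforward network constrained to represent a monotone transformation, can be sketched roughly as below; enforcing monotonicity through non-negative weights and increasing activations is an assumption for illustration and not necessarily the construction used in the paper.

# Hedged sketch: a small feedforward network that is monotone in its input by
# construction (non-negative weights, increasing activations), fitted so that
# transformed dissimilarities match target distances. Toy data throughout.
import torch
import torch.nn as nn

class MonotoneNet(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(1, hidden))
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.w2 = nn.Parameter(torch.randn(hidden, 1))
        self.b2 = nn.Parameter(torch.zeros(1))

    def forward(self, d):                                  # d: (n, 1) dissimilarities
        h = torch.sigmoid(d @ self.w1.abs() + self.b1)     # |w| keeps the map increasing
        return h @ self.w2.abs() + self.b2

net = MonotoneNet()
opt = torch.optim.Adam(net.parameters(), lr=0.01)
d = torch.rand(100, 1)                 # observed dissimilarities (toy data)
target = 2.0 * d + 0.1                 # distances in some configuration (toy data)
for _ in range(200):
    opt.zero_grad()
    loss = ((net(d) - target) ** 2).mean()
    loss.backward()
    opt.step()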
intelligent data analysis | 2016
Martijn J. Post; Peter van der Putten; Jan N. van Rijn
It is often claimed that data pre-processing is an important factor contributing to the performance of classification algorithms. In this paper we investigate feature selection, a common data pre-processing technique. We conduct a large-scale experiment and present results on which algorithms and data sets benefit from this technique. Using meta-learning, we can determine for which combinations this is the case. To complement a large set of meta-features, we introduce Feature Selection Landmarkers, which prove useful for this task. All our experimental results are made publicly available on OpenML.
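A minimal sketch of the basic comparison underlying such an experiment, a classifier's cross-validated score with and without a feature selection step, is given below; the selector, classifier and data set are illustrative choices, not the full experimental grid.

# Hedged sketch: does a feature selection pre-processing step help this
# classifier on this data set? Selector and classifier are illustrative choices.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

plain = DecisionTreeClassifier(random_state=0)
with_fs = make_pipeline(SelectKBest(f_classif, k=10),
                        DecisionTreeClassifier(random_state=0))

print("no feature selection:  ", cross_val_score(plain, X, y, cv=10).mean())
print("with feature selection:", cross_val_score(with_fs, X, y, cv=10).mean())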
Archive | 2014
Maarten H. Lamers; Peter van der Putten; Fons J. Verbeek
In recent years, there has been an increasing push in the arts, technology and science to use technology and design in a more personal and autonomous context, integrated with the physical world. Creative platforms are being developed that open up personal digital/physical technology to larger groups of novice tinkerers, allowing people to take control of technology and prototype solutions to personal problems and goals. Likewise, education benefits from providing students with tools and platforms to learn by doing and making. However, these advances lead to new challenges for scientific research and education. In this chapter, we explore some of these opportunities and challenges and summarize them into key observations. Particular attention is given to tinkering in research-based education, and to the opportunities for digital tinkering in emerging worlds.