Publication


Featured research published by Ad Feelders.


Information & Management | 2000

Methodological and practical aspects of data mining

Ad Feelders; Hennie Daniels; Marcel Holsheimer

We describe the different stages in the data mining process and discuss some pitfalls and guidelines to circumvent them. Despite the predominant attention on analysis, data selection and pre-processing are the most time-consuming activities, and have a substantial influence on ultimate success. Successful data mining projects require the involvement of expertise in data mining, company data, and the subject area concerned. Despite the attractive suggestion of ‘fully automatic’ data analysis, knowledge of the processes behind the data remains indispensable in avoiding the many pitfalls of data mining.


Sigkdd Explorations | 2002

Classification trees for problems with monotonicity constraints

Rob Potharst; Ad Feelders

For classification problems with ordinal attributes, the class attribute should often increase with each or some of the explanatory attributes. These are called classification problems with monotonicity constraints. Classical decision tree algorithms such as CART or C4.5 generally do not produce monotone trees, even if the dataset is completely monotone. This paper surveys the methods that have so far been proposed for generating decision trees that satisfy monotonicity constraints. A distinction is made between methods that work only for monotone datasets and methods that work for monotone and non-monotone datasets alike.
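The notion of a completely monotone dataset used in this abstract can be checked directly: a dataset is monotone when no example whose attribute vector is dominated by another's carries a strictly higher class label. A minimal pure-Python sketch (O(n²); function names are illustrative, not taken from the paper):

```python
def dominates(a, b):
    """True if every attribute of a is <= the corresponding attribute of b."""
    return all(ai <= bi for ai, bi in zip(a, b))

def is_monotone(X, y):
    """Check that no dominated example has a strictly higher class label."""
    n = len(X)
    return all(
        y[i] <= y[j]
        for i in range(n) for j in range(n)
        if dominates(X[i], X[j])
    )

# A monotone toy dataset: the label never decreases as the attributes grow.
X = [(1, 1), (1, 2), (2, 2)]
assert is_monotone(X, [0, 1, 1])
# Swapping two labels breaks monotonicity.
assert not is_monotone(X, [1, 0, 1])
```

Even on a dataset that passes this check, CART or C4.5 may still produce a non-monotone tree, which is exactly the problem the surveyed methods address.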


european conference on machine learning | 2008

Exceptional model mining

Dennis Leman; Ad Feelders; Arno J. Knobbe

In most databases, it is possible to identify small partitions of the data where the observed distribution is notably different from that of the database as a whole. In classical subgroup discovery, one considers the distribution of a single nominal attribute, and exceptional subgroups show a surprising increase in the occurrence of one of its values. In this paper, we introduce Exceptional Model Mining (EMM), a framework that allows for more complicated target concepts. Rather than finding subgroups based on the distribution of a single target attribute, EMM finds subgroups where a model fitted to that subgroup is somehow exceptional. We discuss regression as well as classification models, and define quality measures that determine how exceptional a given model on a subgroup is. Our framework is general enough to be applied to many types of models, even from other paradigms such as association analysis and graphical modeling.
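To illustrate what a quality measure of this kind might look like, one common regression example compares the slope fitted inside a candidate subgroup against the slope fitted on the whole dataset. The sketch below uses that idea; the function names and the exact measure are illustrative, not the paper's definitions:

```python
def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def slope_difference(xs, ys, member):
    """EMM-style quality: how far the regression slope inside the
    subgroup (member[i] == True) deviates from the global slope."""
    sub_x = [x for x, m in zip(xs, member) if m]
    sub_y = [y for y, m in zip(ys, member) if m]
    return abs(slope(sub_x, sub_y) - slope(xs, ys))
```

A subgroup where y rises twice as fast as in the data overall would score high on this measure and be reported as exceptional.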


Archive | 2005

Advances in Intelligent Data Analysis VI

A. Fazel Famili; Joost N. Kok; José M. Peña; Arno Siebes; Ad Feelders

Contents:
- Probabilistic Latent Clustering of Device Usage
- Condensed Nearest Neighbor Data Domain Description
- Balancing Strategies and Class Overlapping
- Modeling Conditional Distributions of Continuous Variables in Bayesian Networks
- Kernel K-Means for Categorical Data
- Using Genetic Algorithms to Improve Accuracy of Economical Indexes Prediction
- A Distance-Based Method for Preference Information Retrieval in Paired Comparisons
- Knowledge Discovery in the Identification of Differentially Expressed Genes in Tumoricidal Macrophage
- Searching for Meaningful Feature Interactions with Backward-Chaining Rule Induction
- Exploring Hierarchical Rule Systems in Parallel Coordinates
- Bayesian Networks Learning for Gene Expression Datasets
- Pulse: Mining Customer Opinions from Free Text
- Keystroke Analysis of Different Languages: A Case Study
- Combining Bayesian Networks with Higher-Order Data Representations
- Removing Statistical Biases in Unsupervised Sequence Learning
- Learning from Ambiguously Labeled Examples
- Learning Label Preferences: Ranking Error Versus Position Error
- FCLib: A Library for Building Data Analysis and Data Discovery Tools
- A Knowledge-Based Model for Analyzing GSM Network Performance
- Sentiment Classification Using Information Extraction Technique
- Extending the SOM Algorithm to Visualize Word Relationships
- Towards Automatic and Optimal Filtering Levels for Feature Selection in Text Categorization
- Block Clustering of Contingency Table and Mixture Model
- Adaptive Classifier Combination for Visual Information Processing Using Data Context-Awareness
- Self-poised Ensemble Learning
- Discriminative Remote Homology Detection Using Maximal Unique Sequence Matches
- From Local Pattern Mining to Relevant Bi-cluster Characterization
- Machine-Learning with Cellular Automata
- MDS polar: A New Approach for Dimension Reduction to Visualize High Dimensional Data
- Miner Ants Colony: A New Approach to Solve a Mine Planning Problem
- Extending the GA-EDA Hybrid Algorithm to Study Diversification and Intensification in GAs and EDAs
- Spatial Approach to Pose Variations in Face Verification
- Analysis of Feature Rankings for Classification
- A Mixture Model-Based On-line CEM Algorithm
- Reliable Hierarchical Clustering with the Self-organizing Map
- Statistical Recognition of Noun Phrases in Unrestricted Text
- Successive Restrictions Algorithm in Bayesian Networks
- Modelling the Relationship Between Streamflow and Electrical Conductivity in Hollin Creek, Southeastern Australia
- Biological Cluster Validity Indices Based on the Gene Ontology
- An Evaluation of Filter and Wrapper Methods for Feature Selection in Categorical Clustering
- Dealing with Data Corruption in Remote Sensing
- Regularized Least-Squares for Parse Ranking
- Bayesian Network Classifiers for Time-Series Microarray Data
- Feature Discovery in Classification Problems
- A New Hybrid NM Method and Particle Swarm Algorithm for Multimodal Function Optimization
- Detecting Groups of Anomalously Similar Objects in Large Data Sets


International Journal of Approximate Reasoning | 2006

Learning Bayesian network parameters under order constraints

Ad Feelders; Linda C. van der Gaag

We consider the problem of learning the parameters of a Bayesian network from data, while taking into account prior knowledge about the signs of influences between variables. Such prior knowledge can be readily obtained from domain experts. We show that this problem of parameter learning is a special case of isotonic regression and provide a simple algorithm for computing isotonic estimates. Our experimental results for a small Bayesian network in the medical domain show that taking prior knowledge about the signs of influences into account leads to an improved fit of the true distribution, especially when only a small sample of data is available. More importantly, however, the isotonic estimator provides parameter estimates that are consistent with the specified prior knowledge, thereby resulting in a network that is more likely to be accepted by experts in its domain of application.
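Isotonic estimates of the kind described here can be computed with the pool-adjacent-violators algorithm (PAVA), which repeatedly averages adjacent estimates that violate the ordering constraint. A minimal sketch for a totally ordered sequence (the paper handles the more general Bayesian-network setting; this implementation is illustrative):

```python
def pava(y, w=None):
    """Pool-Adjacent-Violators: weighted least-squares fit of a
    non-decreasing sequence to the observations y."""
    w = w or [1.0] * len(y)
    # Each block holds (weighted mean, total weight, number of points pooled).
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge adjacent blocks while the ordering constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    # Expand the pooled blocks back to one value per observation.
    out = []
    for m, _, n in blocks:
        out.extend([m] * n)
    return out
```

For example, raw estimates [0.2, 0.5, 0.3] that should be non-decreasing are pooled into [0.2, 0.4, 0.4]: the two violating values are replaced by their average, which is how an isotonic estimator keeps the parameters consistent with a specified positive influence.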


european conference on machine learning | 2008

Nearest Neighbour Classification with Monotonicity Constraints

Wouter Duivesteijn; Ad Feelders

In many application areas of machine learning, prior knowledge concerning the monotonicity of relations between the response variable and predictor variables is readily available. Monotonicity may also be an important model requirement with a view toward explaining and justifying decisions, such as acceptance/rejection decisions. We propose a modified nearest neighbour algorithm for the construction of monotone classifiers from data. We start by making the training data monotone with as few label changes as possible. The relabeled data set can be viewed as a monotone classifier that has the lowest possible error-rate on the training data. The relabeled data is subsequently used as the training sample by a modified nearest neighbour algorithm. This modified nearest neighbour rule produces predictions that are guaranteed to satisfy the monotonicity constraints. Hence, it is much more likely to be accepted by the intended users. Our experiments show that monotone kNN often outperforms standard kNN in problems where the monotonicity constraints are applicable.
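The idea of guaranteeing monotone predictions can be sketched as follows: training points dominated by the query bound the prediction from below, dominating points bound it from above, and the plain kNN prediction is clamped into that interval. This is an illustrative simplification, not the authors' exact rule, and it assumes the training data has already been made monotone:

```python
def monotone_knn_predict(X, y, x, k=3):
    """Nearest-neighbour prediction clamped to the label interval that
    monotonicity allows (training data assumed monotone)."""
    # Labels of points dominated by x give a lower bound,
    # labels of points dominating x give an upper bound.
    lo = max([yi for xi, yi in zip(X, y) if all(a <= b for a, b in zip(xi, x))],
             default=min(y))
    hi = min([yi for xi, yi in zip(X, y) if all(a >= b for a, b in zip(xi, x))],
             default=max(y))
    # Plain kNN prediction: median label of the k nearest neighbours.
    order = sorted(range(len(X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
    neigh = sorted(y[i] for i in order[:k])
    pred = neigh[len(neigh) // 2]
    # Clamp into the monotone interval [lo, hi].
    return min(max(pred, lo), hi)
```

Because every prediction lies inside the interval implied by the monotone training sample, no pair of predictions can violate the constraint, which is the property that makes such a classifier easier to justify to its users.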


intelligent data analysis | 2003

Pruning for Monotone Classification Trees

Ad Feelders; Martijn Pardoel

For classification problems with ordinal attributes, the class attribute should often increase with each or some of the explanatory attributes. These are called classification problems with monotonicity constraints. Standard classification tree algorithms such as CART or C4.5 are not guaranteed to produce monotone trees, even if the data set is completely monotone. We look at pruning-based methods to build monotone classification trees from monotone as well as non-monotone data sets. We develop a number of fixing methods that make a non-monotone tree monotone by additional pruning steps. These fixing methods can be combined with existing pruning techniques to obtain a sequence of monotone trees. The performance of the new algorithms is evaluated through experimental studies on artificial as well as real-life data sets. We conclude that the monotone trees have slightly better predictive performance and are considerably smaller than trees constructed by the standard algorithm.


international conference on data mining | 2010

Subgroup Discovery Meets Bayesian Networks -- An Exceptional Model Mining Approach

Wouter Duivesteijn; Arno J. Knobbe; Ad Feelders; Matthijs van Leeuwen

Whenever a dataset has multiple discrete target variables, we want our algorithms to consider not only the variables themselves, but also the interdependencies between them. We propose to use these interdependencies to quantify the quality of subgroups, by integrating Bayesian networks with the Exceptional Model Mining framework. Within this framework, candidate subgroups are generated. For each candidate, we fit a Bayesian network on the target variables. Then we compare the network’s structure to the structure of the Bayesian network fitted on the whole dataset. To perform this comparison, we define an edit distance-based distance metric that is appropriate for Bayesian networks. We show interesting subgroups that we experimentally found with our method on datasets from music theory, semantic scene classification, biology and zoogeography.
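A simple instance of an edit-distance-based comparison of two network structures counts the directed edges that must be inserted or deleted to turn one edge set into the other, i.e. the size of the symmetric difference. The paper defines its metric more carefully; this sketch is an illustrative simplification:

```python
def edge_edit_distance(edges_a, edges_b):
    """Number of directed-edge insertions/deletions needed to turn one
    network structure into the other (symmetric difference of edge sets)."""
    a, b = set(edges_a), set(edges_b)
    return len(a ^ b)
```

Under this count, reversing an edge costs 2 (one deletion plus one insertion): the subgroup network A → B ← C differs from the global network A → B → C by removing B → C and adding C → B.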


international conference on data mining | 2010

Monotone Relabeling in Ordinal Classification

Ad Feelders

In many applications of data mining we know beforehand that the response variable should be increasing (or decreasing) in the attributes. Such relations between response and attributes are called monotone. In this paper we present a new algorithm to compute an optimal monotone classification of a data set for convex loss functions. Moreover, we show how the algorithm can be extended to compute all optimal monotone classifications with little additional effort. Monotone relabeling is useful for at least two reasons. Firstly, models trained on relabeled data sets often have better predictive performance than models trained on the original data. Secondly, relabeling is an important building block for the construction of monotone classifiers. We apply the new algorithm to investigate the effect on the prediction error of relabeling the training sample for
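The relabeling task itself can be stated compactly: among all monotone label assignments, find one minimising a convex loss against the observed labels. A brute-force sketch for tiny datasets (the paper's contribution is an efficient algorithm; the names and the default L1 loss here are illustrative):

```python
from itertools import product

def dominates(a, b):
    """True if every attribute of a is <= the corresponding attribute of b."""
    return all(x <= y for x, y in zip(a, b))

def optimal_monotone_relabeling(X, y, labels, loss=lambda a, b: abs(a - b)):
    """Exhaustive search over label assignments: keep only monotone ones
    and return the assignment with minimum total loss (tiny data only)."""
    best, best_cost = None, float("inf")
    for cand in product(labels, repeat=len(X)):
        if any(dominates(X[i], X[j]) and cand[i] > cand[j]
               for i in range(len(X)) for j in range(len(X))):
            continue  # violates monotonicity
        cost = sum(loss(c, t) for c, t in zip(cand, y))
        if cost < best_cost:
            best, best_cost = list(cand), cost
    return best, best_cost
```

For the non-monotone labels [1, 0, 1] on three totally ordered points, a single label change suffices: the search returns a monotone labeling at cost 1. Enumerating all ties is what the extension mentioned in the abstract provides, without the exponential cost of this sketch.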


international conference on knowledge capture | 2005

Bringing order into bayesian-network construction

Eveline M. Helsper; L.C. van der Gaag; Ad Feelders; W.L.A. Loeffen; Petra L. Geenen; A.R.W. Elbers


Collaboration

Top co-authors of Ad Feelders:
Hennie Daniels, Erasmus University Rotterdam

Marina Velikova, Radboud University Nijmegen