Johan Huysmans | Researchain

Publication

Featured research published by Johan Huysmans.

Rule Extraction from Support Vector Machines | 2008

Rule Extraction from Support Vector Machines: An Overview of Issues and Application in Credit Scoring

David Martens; Johan Huysmans; Rudy Setiono; Jan Vanthienen; Bart Baesens

Innovative storage technology and the rising popularity of the Internet have generated an ever-growing amount of data. In this vast amount of data much valuable knowledge is available, yet it is hidden. The Support Vector Machine (SVM) is a state-of-the-art classification technique that generally provides accurate models, as it is able to capture non-linearities in the data. However, this strength is also its main weakness, as the generated non-linear models are typically regarded as incomprehensible black-box models. By extracting rules that mimic the black box as closely as possible, we can provide some insight into the logic of the SVM model. This explanation capability is of crucial importance in any domain where the model needs to be validated before being implemented, such as in credit scoring (loan default prediction) and medical diagnosis. If the SVM is regarded as the current state of the art, SVM rule extraction can be the state of the art of the (near) future. This chapter provides an overview of recently proposed SVM rule extraction techniques, complemented with the pedagogical Artificial Neural Network (ANN) rule extraction techniques which are also suitable for SVMs. Issues related to this topic are the different rule outputs and corresponding rule expressiveness; the focus on high-dimensional data, as SVM models typically perform well on such data; and the requirement that the extracted rules are in line with existing domain knowledge. These issues are explained and further illustrated with a credit scoring case, where we extract a Trepan tree and a RIPPER rule set from the generated SVM model. The benefit of decision tables in a rule extraction context is also demonstrated. Finally, some interesting alternatives for SVM rule extraction are listed.
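The pedagogical idea mentioned in this abstract (treat the model as a black box and learn rules from its predictions) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `black_box` is a hypothetical stand-in for a trained SVM, and the single-threshold rule learner `fit_stump` is far simpler than Trepan or RIPPER, which the chapter actually uses. Fidelity is computed as the fraction of samples on which the extracted rule agrees with the black box.

```python
import random

def black_box(x):
    """Hypothetical opaque model; stands in for a trained non-linear SVM."""
    return 1 if x[0] * x[0] + 0.5 * x[1] > 1.0 else 0

def fit_stump(samples, labels):
    """Find the rule 'x[f] > t => class 1' that agrees most often with labels."""
    best = (-1.0, 0, 0.0)  # (agreement fraction, feature index, threshold)
    for f in range(len(samples[0])):
        for t in sorted({s[f] for s in samples}):
            agree = sum((1 if s[f] > t else 0) == y
                        for s, y in zip(samples, labels))
            frac = agree / len(samples)
            if frac > best[0]:
                best = (frac, f, t)
    return best

random.seed(0)
samples = [(random.uniform(0, 2), random.uniform(0, 2)) for _ in range(200)]

# Pedagogical step: label the data with the black box's predictions,
# not with the true classes, so the rule mimics the model itself.
bb_labels = [black_box(s) for s in samples]

fidelity, feature, threshold = fit_stump(samples, bb_labels)
print(f"IF x[{feature}] > {threshold:.2f} THEN class 1 "
      f"(fidelity {fidelity:.2f})")
```

A single threshold cannot reproduce the curved decision boundary exactly, which is why fidelity stays below 1; richer rule formats trade comprehensibility for closer mimicry.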

international conference on data mining | 2006

Evaluation of web robot discovery techniques: a benchmarking study

Nick Geens; Johan Huysmans; Jan Vanthienen

This paper describes part of a web usage mining study executed on log files obtained from a Belgian e-commerce company. From these log files, it can be observed that numerous web robots are active on the site. Most of these robots show a crawling behavior that is radically different from the browsing behavior of human visitors. Because the owners of the e-shop desire information about the paths that human visitors follow through the site, it is of crucial importance to remove these robotic visits from the log files. Several existing methods for web robot discovery are evaluated and compared, none of them leading to satisfying results. Therefore, a new technique is developed that results in a successful and reliable identification of web robots.

data warehousing and knowledge discovery | 2006

ITER: an algorithm for predictive regression rule extraction

Johan Huysmans; Bart Baesens; Jan Vanthienen

Various benchmarking studies have shown that artificial neural networks and support vector machines achieve superior performance when compared to more traditional machine learning techniques. The main resistance against these newer techniques is based on their lack of interpretability: it is difficult for the human analyst to understand the motivation behind these models’ decisions. Various rule extraction techniques have been proposed to overcome this opacity restriction. However, most of these extraction techniques are devised for classification and only a few algorithms can deal with regression problems. In this paper, we present ITER, a new algorithm for pedagogical regression rule extraction. Based on a trained ‘black box’ model, ITER is able to extract human-understandable regression rules. Experiments show that the extracted model performs well in comparison with CART regression trees and various other techniques.

data and knowledge engineering | 2007

A new approach for measuring rule set consistency

Johan Huysmans; Bart Baesens; Jan Vanthienen

Various algorithms are capable of learning a set of classification rules from a number of observations with their corresponding class labels. Whereas the obtained rule set is usually evaluated by measuring its accuracy on a number of unseen examples, there are several other evaluation criteria, such as comprehensibility and consistency, that are often overlooked. In this paper we focus on the aspect of consistency: if a rule learner is applied several times on the same data set, will it provide rule sets that are similar over the different runs? A new measure is proposed and various examples show how this new measure can be used to decide between different algorithms and rule sets or to find out whether the rules in a knowledge base need to be updated.
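The abstract does not define the proposed measure, so the following is only a rough sketch of the underlying question, not the paper's method. It assumes a simple proxy: the mean fraction of instances on which rule sets from different runs predict the same class. The three rule sets and the `income`/`age` attributes are hypothetical.

```python
from itertools import combinations

def agreement(rules_a, rules_b, instances):
    """Fraction of instances on which two rule sets predict the same class."""
    same = sum(rules_a(x) == rules_b(x) for x in instances)
    return same / len(instances)

def mean_pairwise_agreement(rule_sets, instances):
    """Average agreement over every pair of rule sets."""
    pairs = list(combinations(rule_sets, 2))
    return sum(agreement(a, b, instances) for a, b in pairs) / len(pairs)

# Hypothetical rule sets, as if learned in three runs on the same data.
def run1(x):
    return "good" if x["income"] > 30 else "bad"

def run2(x):
    return "good" if x["income"] > 35 else "bad"

def run3(x):
    return "good" if x["income"] > 30 and x["age"] > 21 else "bad"

instances = [
    {"income": 20, "age": 30},
    {"income": 32, "age": 40},
    {"income": 40, "age": 19},
    {"income": 50, "age": 45},
]

score = mean_pairwise_agreement([run1, run2, run3], instances)
print(f"mean pairwise agreement: {score:.3f}")
```

A proxy like this only compares predictions; two rule sets can agree on every instance while containing quite different rules, which is one reason a dedicated consistency measure is of interest.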

international conference on enterprise information systems | 2006

Comprehensible credit-scoring knowledge visualization using decision tables and diagrams

Christophe Mues; Johan Huysmans; Jan Vanthienen; Bart Baesens

One of the key decision activities in financial institutions is to assess the credit-worthiness of an applicant for a loan, and thereupon decide whether or not to grant the loan. Many classification methods have been suggested in the credit-scoring literature to distinguish good payers from bad payers. Neural networks in particular have received a lot of attention. However, a major drawback is their lack of transparency. While they can achieve a high predictive accuracy rate, the reasoning behind how they reach their decisions is not readily available, which hinders their acceptance by practitioners. Therefore, we have, in earlier work, proposed a two-step process to open the neural network black box which involves: (1) extracting rules from the network; (2) visualizing this rule set using an intuitive graphical representation. In this paper, we will focus on the second step and further investigate the use of two types of representations: decision tables and diagrams. The former are a well-known representation originally used as a programming technique. The latter are a generalization of decision trees taking on the form of a rooted, acyclic digraph instead of a tree, and have mainly been studied and applied by the hardware design community. We will compare both representations in terms of their ability to compactly represent the decision knowledge extracted from two real-life credit-scoring data sets.

WIT Transactions on Information and Communication Technologies | 2004

The influence of caching on web usage mining

Johan Huysmans; Bart Baesens; Jan Vanthienen

Most web servers collect lots of data during their daily operation. Information, such as which pages are requested and who is responsible for these requests, is stored in log files. The analysis of these log files may yield worthwhile information on how to adapt the site to improve the user experience. However, the data in the log files is usually not stored in a format suited to analysis. Many operations are needed to transform the logs into a format that is convenient for the chosen type of analysis. After an overview of these operations, we will discuss how caching of pages can skew the results of studies. We will show how caching can be detected and how one can deal with it. Afterwards, the techniques are applied to the data of a European online wine shop.

intelligence and security informatics | 2008

A Data Miner’s Approach to Country Corruption Analysis

Johan Huysmans; Bart Baesens; Jan Vanthienen

Corruption is usually defined as the misuse of public office for private gain. Whereas the practice of corruption is probably as old as government itself, the recent emergence of more detailed measures has resulted in a considerable growth of empirical research on corruption. Furthermore, possible links between government corruption and terrorism have attracted additional interest in this research field. Most of the existing literature discusses the topic from a socio-economic perspective and only a few studies tackle research on corruption from a data mining point of view. In this chapter, we apply various data mining techniques to a cross-country database linking macro-economic variables to perceived levels of corruption. In the first part, self-organizing maps are applied to study the interconnections between these variables. Afterwards, various predictive models are trained on part of the data and used to forecast corruption for other countries. Large deviations for specific countries between these models’ predictions and the actual values can prove useful for further research. Finally, projection of the forecasts onto a self-organizing map allows a detailed comparison between the different models’ behavior.

WIT Transactions on Information and Communication Technologies | 2005

The use of knowledge discovery techniques for behavioural scoring

N Meeus; Johan Huysmans; Bart Baesens; Jan Vanthienen; Martina Vandebroek

This paper discusses the use of knowledge discovery techniques for a recent development in the field of scoring: behavioural scoring. The goal of behavioural scoring is to develop a model that predicts the creditworthiness of existing customers on the basis of their past behaviour. This paper briefly explains the Knowledge Discovery in Data process and applies the technique of logistic regression to real-life datasets of a Belgian financial institution. It describes the development of scoring models for a cheque account, a credit account and the customer level, and compares the model results for different pre-processing values and selection methods by means of the ROC curve, p-values and misclassification rates.
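Two of the comparison criteria named in this abstract, the ROC curve and the misclassification rate, can be sketched in a few lines. The scores and labels below are made-up illustrative values, not data from the paper; label 1 is assumed to mean a good payer, and the 0.5 cutoff is an arbitrary choice. The area under the ROC curve is computed through its rank interpretation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
def misclassification_rate(scores, labels, cutoff):
    """Share of cases where thresholding the score contradicts the label."""
    wrong = sum((s >= cutoff) != bool(y) for s, y in zip(scores, labels))
    return wrong / len(labels)

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative values only: predicted probability of being a good payer.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]

print(f"AUC: {auc(scores, labels):.3f}")
print(f"misclassification rate at 0.5: "
      f"{misclassification_rate(scores, labels, 0.5):.3f}")
```

The AUC does not depend on a cutoff, whereas the misclassification rate does, which is one reason the two are usually reported together.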

decision support systems | 2011