Thomas Verbraken
Katholieke Universiteit Leuven
Publications
Featured research published by Thomas Verbraken.
IEEE Transactions on Software Engineering | 2013
Karel Dejaeger; Thomas Verbraken; Bart Baesens
Software testing is a crucial activity during software development, and fault prediction models, drawing upon the machine learning literature, assist practitioners by identifying faulty software code upfront. While the Naive Bayes classifier in particular is often applied in this regard, citing predictive performance and comprehensibility as its major strengths, a number of alternative Bayesian algorithms that make it possible to construct simpler networks with fewer nodes and arcs remain unexplored. This study contributes to the literature by considering 15 different Bayesian Network (BN) classifiers and comparing them to other popular machine learning techniques. Furthermore, the applicability of the Markov blanket principle for feature selection, which is a natural extension to BN theory, is investigated. The results, in terms of both the AUC and the recently introduced H-measure, are rigorously tested using the statistical framework of Demšar. It is concluded that simple and comprehensible networks with fewer nodes can be constructed using BN classifiers other than the Naive Bayes classifier. Furthermore, it is found that comprehensibility and predictive performance need to be balanced, and that the development context should also be taken into account during model selection.
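Demšar's framework for comparing several classifiers over multiple datasets starts with a Friedman test on average ranks, usually with the Iman-Davenport correction. The sketch below computes both statistics in plain Python. It is an illustration only, not the paper's code: the AUC values in the usage example are invented, and p-values or post-hoc tests (e.g. Nemenyi) would still need the F or studentized-range distribution.

```python
from typing import List, Sequence, Tuple


def rank_row(scores: Sequence[float]) -> List[float]:
    """Rank classifiers on one dataset: rank 1 = best (highest) score; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of classifiers tied with order[i].
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    return ranks


def friedman_test(results: Sequence[Sequence[float]]) -> Tuple[List[float], float, float]:
    """results[d][c] is the score (e.g. AUC) of classifier c on dataset d.

    Returns the average ranks, the Friedman chi-square statistic and the Iman-Davenport
    statistic F_F (F-distributed with k-1 and (k-1)(N-1) degrees of freedom)."""
    n, k = len(results), len(results[0])
    per_dataset = [rank_row(row) for row in results]
    avg_ranks = [sum(r[c] for r in per_dataset) / n for c in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    denom = n * (k - 1) - chi2
    # denom is 0 when every dataset yields the same ranking; F_F is then unbounded.
    f_f = (n - 1) * chi2 / denom if denom > 0 else float("inf")
    return avg_ranks, chi2, f_f
```

For example, with four datasets and three classifiers, `friedman_test([[0.80, 0.75, 0.70], [0.82, 0.79, 0.71], [0.78, 0.80, 0.69], [0.85, 0.77, 0.72]])` returns average ranks `[1.25, 1.75, 3.0]`, chi-square `6.5` and F_F `13.0`.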
IEEE Transactions on Knowledge and Data Engineering | 2013
Thomas Verbraken; Wouter Verbeke; Bart Baesens
Interest in data mining techniques has increased tremendously during the past decades, and numerous classification techniques have been applied in a wide range of business applications. Hence, adequate performance measures have become more important than ever. In this paper, a cost-benefit analysis framework is formalized in order to define performance measures that are aligned with the main objective of the end users, i.e., profit maximization. A new performance measure, the expected maximum profit (EMP) criterion, is defined. This general framework is then applied to the customer churn problem with its particular cost-benefit structure. The advantage of this approach is that it assists companies in selecting the classifier that maximizes profit. Moreover, it aids practical implementation by providing guidance on the fraction of the customer base to include in the retention campaign.
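The idea behind an expected-maximum-profit criterion can be sketched as follows: for a fixed value of an uncertain parameter, compute the maximum profit over all targeted fractions of the ranked customer base, then average that maximum over a distribution of the parameter. The paper's exact churn cost-benefit structure is not reproduced here. The profit model, the parameter names `gamma`, `clv` and `cost`, and the Beta distribution placed on `gamma` are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def max_profit(scores: Sequence[float], labels: Sequence[int], gamma: float,
               clv: float, cost: float) -> Tuple[float, float]:
    """Maximum profit for one fixed gamma (hypothetical profit model).

    Customers are targeted in order of decreasing churn score. A targeted churner
    (label 1) is retained with probability gamma and is then worth clv; every targeted
    customer costs `cost`. Returns (best average profit per customer, targeted fraction);
    targeting nobody yields (0.0, 0.0)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: -scores[i])
    best, best_frac, running = 0.0, 0.0, 0.0
    for pos, i in enumerate(order, start=1):
        running += (gamma * clv if labels[i] == 1 else 0.0) - cost
        if running / n > best:
            best, best_frac = running / n, pos / n
    return best, best_frac


def beta_pdf(x: float, a: float, b: float) -> float:
    return x ** (a - 1) * (1 - x) ** (b - 1) * math.gamma(a + b) / (math.gamma(a) * math.gamma(b))


def expected_max_profit(scores: Sequence[float], labels: Sequence[int], a: float, b: float,
                        clv: float, cost: float, grid: int = 200) -> Tuple[float, float]:
    """E[max profit] with gamma ~ Beta(a, b), approximated by the midpoint rule on `grid`
    points. Also returns the expected optimal targeted fraction."""
    xs = [(j + 0.5) / grid for j in range(grid)]
    ws = [beta_pdf(x, a, b) for x in xs]
    total = sum(ws)  # normalize so the quadrature weights sum to 1
    emp = frac = 0.0
    for x, w in zip(xs, ws):
        p, f = max_profit(scores, labels, x, clv, cost)
        emp += w * p / total
        frac += w * f / total
    return emp, frac
```

With `scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]`, `labels = [1, 1, 0, 1, 0, 0]`, `gamma = 0.5`, `clv = 100` and `cost = 10`, `max_profit` returns about `(18.33, 0.67)`: targeting the top four customers is optimal.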
European Journal of Operational Research | 2014
Thomas Verbraken; Cristián Bravo; Richard Weber; Bart Baesens
This paper presents a new approach for consumer credit scoring by tailoring a profit-based classification performance measure to credit risk modeling. This performance measure takes into account the expected profits and losses of credit granting and thereby better aligns the model developers’ objectives with those of the lending company. It is based on the Expected Maximum Profit (EMP) measure and is used to find a trade-off between the expected losses, driven by the exposure of the loan and the loss given default, and the operational income generated by the loan. Additionally, a major advantage of the proposed measure is that it allows the optimal cutoff value, which is necessary for model implementation, to be calculated. To test the proposed approach, we use a dataset of loans granted by a government institution and benchmark the accuracy and monetary gain of using EMP, accuracy, and the area under the ROC curve as measures for selecting model parameters and for determining the respective cutoff values. The results show that our proposed profit-based classification measure outperforms the alternative approaches in terms of both accuracy and monetary value on the test set, and that it facilitates model deployment.
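The trade-off between operational income and losses driven by exposure and loss given default can be illustrated with a simple cutoff search on labeled data. This is a simplified stand-in, not the EMP-based procedure described in the abstract: the fixed `lgd`, the `income_rate` income model and all parameter names are assumptions for illustration.

```python
from typing import Sequence, Tuple


def profit_at_cutoff(pd_scores: Sequence[float], defaulted: Sequence[bool],
                     exposure: Sequence[float], lgd: float, income_rate: float,
                     cutoff: float) -> float:
    """Realized profit when every applicant with predicted default probability <= cutoff
    is granted a loan. Hypothetical model: a repaid loan earns income_rate * exposure,
    a defaulted loan loses lgd * exposure; rejected applicants contribute nothing."""
    total = 0.0
    for p, d, e in zip(pd_scores, defaulted, exposure):
        if p <= cutoff:
            total += -lgd * e if d else income_rate * e
    return total


def best_cutoff(pd_scores: Sequence[float], defaulted: Sequence[bool],
                exposure: Sequence[float], lgd: float,
                income_rate: float) -> Tuple[float, float]:
    """Try 'reject everyone' plus every observed score as a cutoff and return the most
    profitable (cutoff, profit); on ties the stricter cutoff wins because it comes first."""
    candidates = [float("-inf")] + sorted(set(pd_scores))
    profits = [profit_at_cutoff(pd_scores, defaulted, exposure, lgd, income_rate, c)
               for c in candidates]
    best = max(range(len(candidates)), key=lambda i: profits[i])
    return candidates[best], profits[best]
```

For instance, `best_cutoff([0.05, 0.10, 0.20, 0.40, 0.60], [False, False, True, False, True], [1000.0] * 5, lgd=0.5, income_rate=0.1)` returns `(0.1, 200.0)`: accepting the third applicant would cost 500 in losses against only 100 of income per repaid loan.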
European Journal of Operational Research | 2012
Alex Seret; Thomas Verbraken; Sébastien Versailles; Bart Baesens
The field of direct marketing is constantly searching for new data mining techniques to analyze the growing amount of available data. Self-organizing maps (SOM) have been widely applied and discussed in the literature, since they reduce the complexity of a high-dimensional attribute space while providing a powerful visual exploration facility. Combined with clustering techniques and the extraction of the so-called salient dimensions, they allow a direct marketer to gain high-level insight into a dataset of prospects. In this paper, a SOM-based profile generator is presented: a generic method that yields value-adding, business-oriented profiles for targeting individuals with predefined characteristics. Moreover, the proposed method is applied in detail to a concrete case study from the concert industry. The performance of the method is then illustrated and discussed, and possible future research tracks are outlined.
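The dimensionality reduction a self-organizing map performs can be sketched with a generic Kohonen SOM: a grid of prototype vectors, a best-matching unit (BMU) per sample, and a Gaussian neighbourhood update with decaying learning rate and radius. The map size, decay schedules and parameter names below are assumptions, and the paper's profile generator (clustering the map and extracting salient dimensions) is not reproduced.

```python
import math
import random
from typing import List, Sequence, Tuple

Grid = List[List[List[float]]]  # weights[row][col] -> prototype vector


def bmu(weights: Grid, x: Sequence[float]) -> Tuple[int, int]:
    """Best-matching unit: grid position of the prototype closest to x."""
    best, best_d = (0, 0), math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data: Sequence[Sequence[float]], rows: int, cols: int,
              epochs: int = 50, lr0: float = 0.5, seed: int = 0) -> Grid:
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2
    # Initialise every prototype with a randomly chosen data point.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(list(data), len(data)):  # one shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1 - frac)                    # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)    # shrinking neighbourhood radius
            br, bc = bmu(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood on the map grid around the BMU.
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                    w = weights[r][c]
                    for j in range(len(w)):
                        w[j] += lr * h * (x[j] - w[j])
            step += 1
    return weights
```

After training, each prospect can be mapped to its BMU with `bmu(weights, x)`; units can then be grouped or profiled, which is where a method like the one in the paper would take over.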
Applied Soft Computing | 2015
Sebastián Maldonado; Álvaro Flores; Thomas Verbraken; Bart Baesens; Richard Weber
Highlights: A novel profit-based feature selection method for churn prediction with SVM is presented. A backward elimination algorithm is performed to maximize the profit of a retention campaign. Our experiments on churn prediction datasets underline the potential of the proposed approaches.
Churn prediction is an important application of classification models that identify the customers most likely to attrite based on their characteristics, described by, e.g., socio-demographic and behavioral variables. Since more and more such features are nowadays captured and stored in computational systems, appropriate handling of the resulting information overload becomes highly relevant when building customer retention systems based on churn prediction models. As a consequence, feature selection is an important step of the classifier construction process. Most feature selection techniques, however, are based on statistically inspired validation criteria, which do not necessarily lead to models that optimize the goals specified by the organization. In this paper we propose a profit-driven approach for classifier construction and simultaneous variable selection based on support vector machines. Experimental results show that our models outperform conventional feature selection techniques, achieving superior performance with respect to business-related goals.
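The highlights mention a backward elimination algorithm that maximizes the profit of a retention campaign. A generic greedy version is sketched below. Model training and profit evaluation are hidden behind a hypothetical `evaluate` callable, so this shows only the search strategy, not the paper's SVM-specific formulation.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def backward_elimination(features: Iterable[str],
                         evaluate: Callable[[FrozenSet[str]], float],
                         min_features: int = 1) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination that maximizes a profit score.

    `evaluate` is a hypothetical callable mapping a feature subset to a profit, e.g. by
    training a classifier on that subset and measuring campaign profit on validation data."""
    current = frozenset(features)
    history: List[Tuple[FrozenSet[str], float]] = [(current, evaluate(current))]
    while len(current) > min_features:
        # Drop the feature whose removal leaves the most profitable subset.
        profit, drop = max((evaluate(current - {f}), f) for f in sorted(current))
        current = current - {drop}
        history.append((current, profit))
    # Best subset seen along the path; on equal profit prefer the smaller subset.
    return max(history, key=lambda item: (item[1], -len(item[0])))
```

Keeping the whole elimination path, rather than stopping at the first drop in profit, lets the search step past a small local dip; the trade-off is one `evaluate` call per remaining feature at every step.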
Intelligent Data Analysis | 2014
Thomas Verbraken; Wouter Verbeke; Bart Baesens
Customer churn prediction is becoming an increasingly important business analytics problem for telecom operators. To increase the efficiency of customer retention campaigns, churn prediction models need to be accurate as well as compact and interpretable. Although a myriad of techniques for churn prediction have been examined, little attention has been paid to Bayesian Network classifiers. This paper investigates the predictive power of a number of Bayesian Network algorithms, ranging from the Naive Bayes classifier to General Bayesian Network classifiers. Furthermore, a feature selection method based on the concept of the Markov Blanket, which is inherently related to Bayesian Networks, is tested. The performance of the classifiers is evaluated with both the Area under the Receiver Operating Characteristic Curve and the recently introduced Maximum Profit criterion. The Maximum Profit criterion performs an intelligent optimization by targeting the fraction of the customer base that would maximize the profit generated by a retention campaign. The results of the experiments are rigorously tested and indicate that most of the analyzed techniques have comparable performance. Some methods, however, are preferable because they lead to compact networks, which enhance the interpretability and comprehensibility of the churn prediction models.
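Markov-blanket feature selection rests on a standard property of Bayesian Networks: given its Markov blanket (its parents, its children, and the other parents of its children), a node is independent of every other variable. Once a network structure is available, selecting features for the class node reduces to the set computation sketched below. The structure-learning step is not shown, and the variable names in the example are hypothetical.

```python
from typing import Collection, Dict, Set


def markov_blanket(parents: Dict[str, Collection[str]], target: str) -> Set[str]:
    """Markov blanket of `target` in a DAG given as {node: parents of node}:
    its parents, its children, and the other parents of its children."""
    blanket = set(parents.get(target, ()))
    children = {node for node, ps in parents.items() if target in ps}
    blanket |= children
    for child in children:
        blanket |= set(parents[child])  # spouses: co-parents of a child
    blanket.discard(target)
    return blanket
```

For a toy network `{"churn": ["tenure"], "calls": ["churn", "plan"], "tenure": [], "plan": [], "region": ["plan"]}`, the blanket of `"churn"` is `{"tenure", "calls", "plan"}`, so `"region"` would be dropped as a feature.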
Applied Soft Computing | 2014
Alex Seret; Thomas Verbraken; Bart Baesens
Highlights: This paper proposes a straightforward way to apply constrained clustering. Soft attribute-level constraints are generated based on feature order preferences. Practitioners can formalize and use their a priori knowledge. A methodology implementing this approach is applied in a direct marketing context.
Clustering has always been an exploratory but critical step in the knowledge discovery process. Although often unsupervised, the clustering task has received considerable interest when reinforced by different kinds of user-provided input. This paper presents an approach that makes it possible to incorporate business knowledge in order to guide the clustering algorithm. It formalizes the idea that an intuitive a priori prioritization of the variables may exist, and applies this formalization in a direct marketing context using recent data. By providing the analyst with a new approach offering different clustering perspectives, this paper proposes a straightforward way to apply constrained clustering with soft attribute-level constraints based on feature order preferences.
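The abstract does not spell out how feature order preferences become soft constraints. One simple way to illustrate the idea, which is a stand-in and not the paper's method, is to turn the analyst's ranking of variables into feature weights and run a feature-weighted k-means. The linear rank-to-weight rule and all function names below are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def order_to_weights(order: Sequence[str], features: Sequence[str]) -> List[float]:
    """Turn a preference order (most important first) into normalized feature weights:
    the first of m features gets raw weight m, the last gets 1."""
    m = len(order)
    raw = {f: m - i for i, f in enumerate(order)}
    total = sum(raw.values())
    return [raw[f] / total for f in features]


def weighted_dist(p: Sequence[float], q: Sequence[float], weights: Sequence[float]) -> float:
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q))


def weighted_kmeans(points: Sequence[Sequence[float]], k: int, weights: Sequence[float],
                    iters: int = 100, seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    assign: List[int] = []
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: weighted_dist(p, centers[c], weights))
                      for p in points]
        if new_assign == assign:
            break  # assignments stable: converged
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                # Weights are per-feature constants, so the mean still minimizes weighted SSE.
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers
```

Ranking `"x"` above `"y"` gives weights `[2/3, 1/3]`, so differences in `x` count twice as much as differences in `y` when points are assigned to clusters. Like plain k-means, this converges only to a local optimum that depends on the initial centers.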
Workshop on e-Business | 2011
Thomas Verbraken; Frank Goethals; Wouter Verbeke; Bart Baesens
This paper indicates that knowledge about a person’s social network is valuable for predicting the intent to purchase books and computers online. Data were gathered about a network of 681 persons and their intent to buy products online. The results of a range of networked classification techniques are compared with the predictive power of logistic regression. This comparison indicates that information about a person’s social network is more valuable for predicting that person’s intent to buy online than personal characteristics such as age, gender, intensity of computer use, and enjoyment of working with the computer.
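The abstract does not list which networked classifiers were compared. A common baseline in this family is the weighted-vote relational neighbor (wvRN) classifier: it estimates a person's probability of a label as the edge-weighted average of the neighbours' estimates, with known labels held fixed. The sketch below iterates that update for a fixed number of rounds. It is an illustration under stated assumptions, not necessarily a method used in the paper, and the graph in the example is hypothetical.

```python
from typing import Dict, Optional


def wvrn(adjacency: Dict[str, Dict[str, float]], known: Dict[str, int],
         iters: int = 50, prior: Optional[float] = None) -> Dict[str, float]:
    """Weighted-vote relational neighbor with iterative (synchronous) updates.

    adjacency[u][v] is the weight of edge u-v; every neighbour must also be a key.
    known[u] in {0, 1} holds the observed labels. Returns P(label = 1) per node."""
    if prior is None:
        prior = sum(known.values()) / len(known) if known else 0.5
    probs = {u: float(known[u]) if u in known else prior for u in adjacency}
    for _ in range(iters):
        new = {}
        for u, nbrs in adjacency.items():
            if u in known:
                new[u] = float(known[u])  # observed labels stay clamped
                continue
            total = sum(nbrs.values())
            # Weighted average of neighbours' current estimates; isolated nodes get the prior.
            new[u] = sum(w * probs[v] for v, w in nbrs.items()) / total if total else prior
        probs = new
    return probs
```

On a path `a - b - c` with `a` labelled 1 and `c` labelled 0, node `b` receives probability 0.5; a node connected only to `a` receives 1.0. Unlike logistic regression, no individual characteristics enter the estimate, only the labels of connected persons.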
Decision Support Systems | 2014
Thomas Verbraken; Frank Goethals; Wouter Verbeke; Bart Baesens
IEEE International Conference on Cloud Computing Technology and Science | 2012
Thomas Verbraken; Stefan Lessmann; Bart Baesens