Klaus B. Schebesch
University of Western Ontario
Publications
Featured research published by Klaus B. Schebesch.
GfKl | 2008
Klaus B. Schebesch; Ralf Stecking
Owing to the huge size of the credit markets, even small improvements in classification accuracy might considerably reduce the effective misclassification costs experienced by banks. Support vector machines (SVM) are useful classification methods for credit client scoring. However, the urgent need to further boost classification performance, as well as the stability of results in applications, has led the machine learning community to develop SVM with multiple kernels and many other combined approaches. Using a data set from a German bank, we first examine the effects of combining a large number of base SVM on classification performance and robustness. The base models are trained on different sets of reduced client characteristics and may also use different kernels. Furthermore, using censored outputs of multiple SVM models leads to more reliable predictions in most cases, but there also remains a credit client subset that seems to be unpredictable. We show that in unbalanced data sets, which are most common in credit scoring, some minor adjustments may overcome this weakness. We then compare our results to those obtained earlier with more traditional, single-SVM credit scoring models.
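The combination scheme can be illustrated with a minimal sketch, not the authors' implementation: it assumes scikit-learn and a synthetic stand-in for the confidential bank data, trains base SVM on random feature subsets with different kernels (subset size and ensemble size are arbitrary choices here), and averages the predicted default probabilities of the base models.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the confidential bank data: imbalanced credit classes.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rng = np.random.default_rng(0)
base_models = []
for kernel in ("linear", "rbf", "poly"):
    for _ in range(10):                              # 10 random feature subsets per kernel
        cols = rng.choice(X.shape[1], size=8, replace=False)
        clf = SVC(kernel=kernel, probability=True).fit(X_tr[:, cols], y_tr)
        base_models.append((cols, clf))

# Combine the base models by averaging their predicted default probabilities.
proba = np.mean([clf.predict_proba(X_te[:, cols])[:, 1] for cols, clf in base_models], axis=0)
print("ensemble accuracy:", round(np.mean((proba >= 0.5).astype(int) == y_te), 3))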
GfKl | 2007
Klaus B. Schebesch; Ralf Stecking
We explore simultaneous variable subset selection and kernel selection within SVM classification models. First, we report results from SVM classification models with different kernel functions on a fixed subset of credit client variables provided by a German bank. Free variable subset selection for the bank data is discussed next. Finally, a simple stochastic search procedure for variable subset selection is presented.
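As a rough illustration (assumptions: scikit-learn, synthetic data, a fixed RBF kernel, and arbitrary step counts), a stochastic subset search of this general kind might flip one variable in or out per step and keep the move only if cross-validated accuracy does not drop; kernel choice could be randomized in the same loop.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=25, n_informative=8, random_state=1)
rng = np.random.default_rng(1)

def cv_score(cols):
    return cross_val_score(SVC(kernel="rbf"), X[:, cols], y, cv=5).mean()

best_cols = rng.choice(25, size=10, replace=False)   # random initial variable subset
best = cv_score(best_cols)
for _ in range(100):
    cand = set(best_cols)
    cand ^= {int(rng.integers(25))}                  # flip one variable in or out
    cand = np.array(sorted(cand))
    if cand.size and (s := cv_score(cand)) >= best:  # keep only non-worsening moves
        best, best_cols = s, cand
print("selected variables:", best_cols, "CV accuracy:", round(best, 3))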
A Quarterly Journal of Operations Research | 2014
Klaus B. Schebesch; Ralf Stecking
Predictive classification is a part of data mining and of many related data-intensive research activities. In applications deriving from business intelligence, potentially valuable data from large databases often cannot be used in an unrestricted way. Privacy constraints may not allow the data modeler to use all of the existing feature variables in building the classification models. In certain situations, pre-processing the original data can lead to intermediate datasets, which hide private or commercially sensitive information but still contain information useful enough for building competitive classification models. To this end, we propose to cooperatively use both unsupervised clustering and supervised support vector machines. For an instance of real-life credit client scoring, we then evaluate our approach against the case of unrestricted use of all data features.
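A minimal sketch of such a two-stage combination, under assumed details the abstract does not specify (scikit-learn, k-means with 80 clusters, majority-class labelling of cluster centres, and synthetic data in place of the real credit records):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Stand-in credit data; suppose only the first 15 of 20 features may be used directly.
X, y = make_classification(n_samples=2000, n_features=20, random_state=2)
X_public = X[:, :15]

# Unsupervised stage: compress individual records into cluster centres,
# each labelled with the majority class of its members.
km = KMeans(n_clusters=80, n_init=10, random_state=2).fit(X_public)
centre_y = np.array([np.bincount(y[km.labels_ == c]).argmax() for c in range(80)])

# Supervised stage: the SVM sees only the 80 aggregated representatives.
clf = SVC(kernel="rbf").fit(km.cluster_centers_, centre_y)
print("SVM trained on", km.cluster_centers_.shape[0], "cluster centres instead of", X.shape[0], "records")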
GfKl | 2012
Ralf Stecking; Klaus B. Schebesch
Credit client scoring on medium-sized data sets can be accomplished by means of Support Vector Machines (SVM), a powerful and robust machine learning method. However, real-life credit client data sets are usually huge, containing up to hundreds of thousands of records, with good credit clients vastly outnumbering the defaulting ones. Such data pose severe computational barriers for SVM and other kernel methods, especially if all pairwise data point similarities are requested. Hence, methods which avoid extensive training on the complete data are in high demand. A possible solution is clustering as preprocessing and classification on the more informative resulting data, such as cluster centers. Clustering variants which avoid the computation of all pairwise similarities robustly filter useful information from the large imbalanced credit client data set, especially when used in conjunction with a symbolic cluster representation. Subsequently, we construct credit client clusters representing both client classes, which are then used for training a non-standard SVM adaptable to our imbalanced class set sizes. We also show that SVM trained on symbolic cluster centers result in classification models which outperform traditional statistical models as well as SVM trained on all of our original data.
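A minimal sketch of this pipeline, not the paper's actual procedure: it assumes scikit-learn, uses MiniBatchKMeans as a stand-in for a clustering variant that avoids all pairwise similarities, and approximates the non-standard SVM for imbalanced class sizes by weighting each cluster centre with the number of clients it represents; cluster counts and data sizes are arbitrary.

import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Large imbalanced stand-in data set: 100,000 clients with 3% defaulters.
X, y = make_classification(n_samples=100_000, n_features=20, weights=[0.97, 0.03], random_state=3)

reps, rep_y, rep_w = [], [], []
for cls in (0, 1):
    # MiniBatchKMeans never forms the full pairwise similarity matrix.
    km = MiniBatchKMeans(n_clusters=200, random_state=3).fit(X[y == cls])
    reps.append(km.cluster_centers_)
    rep_y.append(np.full(200, cls))
    rep_w.append(np.maximum(np.bincount(km.labels_, minlength=200), 1))  # cluster sizes

X_rep, y_rep = np.vstack(reps), np.concatenate(rep_y)
w_rep = np.concatenate(rep_w).astype(float)

# Each centre counts with the number of clients it represents, so the minority
# class is not drowned out when training on the 400 representatives.
clf = SVC(kernel="rbf").fit(X_rep, y_rep, sample_weight=w_rep)
print("trained on", X_rep.shape[0], "cluster centres instead of", X.shape[0], "records")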
A Quarterly Journal of Operations Research | 2017
Klaus B. Schebesch; Ralf Stecking
Computational Topological Data Analysis (TDA) is a collection of procedures which permits extracting certain robust features of high dimensional data, even when the number of data points is relatively small. Classical statistical data analysis is not very successful at or even cannot handle such situations altogether. Hidden features or structure in high dimensional data expresses some direct and indirect links between data points. Such may be the case when there are no explicit links between persons like clients in a database but there may still be important implicit links which characterize client populations and which also make different such populations more comparable. We explore the potential usefulness of applying TDA to different versions of credit scoring data, where clients are credit takers with a known defaulting behavior.
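As an indication of what such an analysis involves (assuming the ripser package for persistent homology and synthetic stand-in data; the paper's actual TDA pipeline is not specified here), one can compare persistence diagrams of the defaulting and non-defaulting sub-populations:

import numpy as np
from ripser import ripser                      # assumption: the ripser TDA package is available
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for credit client features with a known defaulting label.
X, y = make_classification(n_samples=400, n_features=25, random_state=4)
X = StandardScaler().fit_transform(X)

# Persistent homology (dimensions 0 and 1) per client class: connected components
# and loops summarise structure that point-wise statistics would not capture.
for cls in (0, 1):
    dgms = ripser(X[y == cls], maxdim=1)["dgms"]
    n_loops = len(dgms[1])
    max_pers = (dgms[1][:, 1] - dgms[1][:, 0]).max() if n_loops else 0.0
    print(f"class {cls}: {n_loops} one-dimensional features, max persistence {max_pers:.3f}")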
Intelligent Data Analysis | 2015
Ralf Stecking; Klaus B. Schebesch
Modern data collections create vast opportunities for detecting useful hidden relationships. Increasingly, they also fuel data privacy concerns. A trade-off between privacy protection and data usefulness is by now widely acknowledged. Real-world data classification tasks, such as credit scoring applications, have to deal with such data security limitations by finding a way to effectively incorporate privacy-preserving procedures. To this end we propose, as a first stage, to use a microaggregation procedure in order to anonymize personal credit client feature information. In a second stage we examine the performance of support vector machines (SVM) on such anonymized data. SVM are powerful and robust machine learning methods with superior credit scoring classification performance when applied to original, non-anonymized data. We first partition the original credit scoring data set and construct anonymized data representatives, which are then used for credit client behavior forecasting models constructed by SVM and other comparable learning methods. The validation procedure for such models is adapted to the two-stage modeling approach. In order to assess the loss owing to data anonymization, the different classification models are evaluated against models that are trained on the original data.
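A simplified sketch of the two stages, with assumed details (scikit-learn, a sort-and-group microaggregation with group size k = 5 standing in for MDAV-style algorithms, and synthetic data in place of the real credit records):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=3000, n_features=12, weights=[0.9, 0.1], random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=5)

def microaggregate(X, k=5):
    # Sort records along the first principal direction, form groups of k and
    # replace every record by its group centroid (a crude MDAV stand-in).
    direction = np.linalg.svd(X - X.mean(0), full_matrices=False)[2][0]
    order = np.argsort(X @ direction)
    X_anon = X.copy()
    for start in range(0, len(order), k):
        grp = order[start:start + k]
        X_anon[grp] = X[grp].mean(0)
    return X_anon

svm_anon = SVC(kernel="rbf").fit(microaggregate(X_tr, k=5), y_tr)
svm_orig = SVC(kernel="rbf").fit(X_tr, y_tr)
print("accuracy, anonymized:", round(svm_anon.score(X_te, y_te), 3),
      "original:", round(svm_orig.score(X_te, y_te), 3))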
A Quarterly Journal of Operations Research | 2008
Ralf Stecking; Klaus B. Schebesch
Many empirical data sets describing features of persons or objects with associated class labels (e.g. credit client features and the recorded defaulting behaviors in our application [5], [6]) are clearly not linearly separable. However, owing to an interplay of relatively sparse data (relative to the high-dimensional input feature spaces) and a validation procedure like leave-one-out, nonlinear classification can, in many cases, improve this situation only in a minor way. Attributing all the remaining errors to noise seems rather implausible, as data recording is offline and not prone to errors of the type occurring, e.g., when measuring process data with (online) sensors. Experiments with classification models on input subsets even suggest that our credit client data contain some hidden redundancy. This was not eliminated by statistical data preprocessing and leads to rather competitive validated models on input subsets, and even to slightly superior results for combinations of such input subset base models [3]. These base models all reflect different views of the same data. However, class regions with highly nonlinear boundaries can also occur if important features (i.e. other explaining factors) are for some reason not available (unknown, neglected, etc.). In order to see this, simply project linearly separable data onto a feature subset of smaller dimension.
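The final remark can be made concrete with a small illustration (hypothetical data, scikit-learn assumed): three features whose class label is a threshold on the third, so a plane separates the classes; dropping the third feature leaves a circular, highly nonlinear class boundary in the remaining two.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(6)
x12 = rng.normal(size=(1000, 2))
x3 = (x12 ** 2).sum(axis=1)                    # a "hidden" explaining factor
X = np.column_stack([x12, x3])
y = (x3 > 1.0).astype(int)                     # linearly separable by the plane x3 = 1
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=6)

lin3 = SVC(kernel="linear").fit(X_tr, y_tr).score(X_te, y_te)                 # ~1.00
lin2 = SVC(kernel="linear").fit(X_tr[:, :2], y_tr).score(X_te[:, :2], y_te)   # fails
rbf2 = SVC(kernel="rbf").fit(X_tr[:, :2], y_tr).score(X_te[:, :2], y_te)      # circular boundary recovered
print(f"linear, 3 features: {lin3:.2f}  linear, 2 features: {lin2:.2f}  RBF, 2 features: {rbf2:.2f}")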
Studia Universitatis „Vasile Goldiş” Arad, Seria Ştiinţe Economice | 2018
Dan Stelian Deac; Klaus B. Schebesch
Using efficient marketing strategies for understanding and improving the relation between vendors and clients rests upon analyzing and forecasting a wealth of data which appear at different time resolutions and at different levels of aggregation. More often than not, market success does not have consistent explanations in terms of a few independent influence factors. Indeed, it may be difficult to explain why certain products or services tend to sell well while others do not. The rather limited success of finding general explanations from which to draw specific conclusions good enough to generate forecasting models leads to our proposal to use data-driven models with no strong prior hypothesis concerning the nature of dependencies between potentially relevant variables. If the relations between the data are not purely random, then a sufficiently general or flexible data-driven model will eventually identify them. However, this may come at a high cost in computational resources and with the risk of overtraining, and it may also preclude any useful online or real-time applications of such models. In order to remedy this, we propose a modeling cycle which provides information about the adequacy of a model complexity class and which also highlights some nonstandard measures of expected model performance.
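One step of such a modeling cycle, checking the adequacy of a model complexity class by out-of-sample error, might look as follows (a generic illustration with scikit-learn and hypothetical sales data, not the authors' procedure or their nonstandard performance measures):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(7)
# Hypothetical weekly sales driven by two marketing variables plus noise.
X = rng.normal(size=(200, 2))
sales = 3.0 * X[:, 0] - 1.5 * X[:, 0] * X[:, 1] + rng.normal(scale=0.5, size=200)

# Increasingly flexible model classes; cross-validated error indicates where
# added complexity stops paying off and overtraining begins.
for degree in (1, 2, 3, 5, 8):
    model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=1.0))
    mse = -cross_val_score(model, X, sales, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"polynomial degree {degree}: CV mean squared error {mse:.3f}")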
Studia Universitatis „Vasile Goldis” Arad – Economics Series | 2016
Eugen Remes; Klaus B. Schebesch; Cosmina Remes; Dan Stelian Deac
The paper aims at highlighting a current phenomenon on the labour market in Romania: the existence of important categories of persons who are unemployed and looking for a job but are not registered in the statistics of the state institutions dealing with unemployment. The analysis is conducted in western Romania (the Arad, Bihor and Timis counties) and targets categories of unemployed people with different skill levels who, for various reasons, are not accounted for statistically.
International Journal of Intelligent Enterprise | 2014
Klaus B. Schebesch; Eduardo Tomé
This paper aims at studying the importance of intellectual capital (IC) as a catalyser of the development of border regions. The topic is relevant because border regions are traditionally less cared for by central governments due to distance, and therefore have to rely on themselves to develop; IC and border connections can, theoretically, be two ways of helping to solve the development problem of those regions. In a century marked by increased globalisation, the topic becomes ever more present and pertinent. Specifically, we study two pairs of cross-border regions: Portugal and Spain, and Hungary and Romania. We use a method to define stable cross-border coalitions of cities in both sets of countries. We use data about social and economic indicators, intellectual capital and travel distances. We find evidence of two stable cross-border coalitions in Iberia and three in Eastern Europe.