Piotr Juszczak
Delft University of Technology
Publications
Featured research published by Piotr Juszczak.
Neurocomputing | 2009
Piotr Juszczak; David M. J. Tax; Elzbieta Pekalska; Robert P. W. Duin
In the problem of one-class classification one of the classes, called the target class, has to be distinguished from all other possible objects. These are considered non-targets. The need for solving such a task arises in many practical applications, e.g. in machine fault detection, face recognition, authorship verification, fraud recognition or person identification based on biometric data. This paper proposes a new one-class classifier, the minimum spanning tree class descriptor (MST_CD). This classifier builds on the structure of the minimum spanning tree constructed on the target training set only. The classification of test objects relies on their distances to the closest edge of that tree, hence the proposed method is an example of a distance-based one-class classifier. Our experiments show that the MST_CD performs especially well in the case of small sample sizes and in high-dimensional spaces.
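The distance computation the abstract describes can be sketched as follows. This is a minimal illustration of the idea (distance of a test point to the closest MST edge), not the authors' implementation; the function names and example data are ours.

```python
# Sketch of the MST_CD idea: build a minimum spanning tree on the target
# training set, then score a test point by its distance to the closest edge.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist

def fit_mst_edges(targets):
    """Build the MST on the target set; return its edges as index pairs."""
    dist = cdist(targets, targets)
    mst = minimum_spanning_tree(dist).tocoo()
    return list(zip(mst.row, mst.col))

def point_to_segment(x, a, b):
    """Euclidean distance from point x to the line segment [a, b]."""
    ab = b - a
    t = np.clip(np.dot(x - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return np.linalg.norm(x - (a + t * ab))

def mst_cd_score(x, targets, edges):
    """Outlier score: distance to the closest MST edge (smaller = more target-like)."""
    return min(point_to_segment(x, targets[i], targets[j]) for i, j in edges)

targets = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
edges = fit_mst_edges(targets)
score = mst_cd_score(np.array([1.5, 1.0]), targets, edges)
```

Because test objects are compared to edges rather than only to training points, the descriptor accepts points lying between sparse training samples, which is why it copes well with small sample sizes.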
Data Mining and Knowledge Discovery | 2009
Christopher Whitrow; David J. Hand; Piotr Juszczak; David John Weston; Niall M. Adams
The problem of preprocessing transaction data for supervised fraud classification is considered. It is impractical to present an entire series of transactions to a fraud detection system, partly because of the very high dimensionality of such data but also because of the heterogeneity of the transactions. Hence, a framework for transaction aggregation is considered and its effectiveness is evaluated against transaction-level detection, using a variety of classification methods and a realistic cost-based performance measure. These methods are applied in two case studies using real data. Transaction aggregation is found to be advantageous in many but not all circumstances. Also, the length of the aggregation period has a large impact upon performance. Aggregation seems particularly effective when a random forest is used for classification. Moreover, random forests were found to perform better than other classification methods, including SVMs, logistic regression and KNN. Aggregation also has the advantage of not requiring precisely labeled data and may be more robust to the effects of population drift.
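The aggregation idea can be sketched as follows: collapse an account's recent transaction series into a fixed-length summary vector. The field names and window logic here are illustrative assumptions, not the paper's actual feature set.

```python
# Sketch of transaction aggregation: summarise an account's transactions over
# a fixed look-back window into fixed-length features, avoiding the very high
# dimensionality of the raw transaction series. Field names ("timestamp",
# "amount") are illustrative.
def aggregate(transactions, now, window):
    """Aggregate transactions within [now - window, now] into summary features."""
    recent = [t for t in transactions if now - window <= t["timestamp"] <= now]
    amounts = [t["amount"] for t in recent]
    return {
        "n_tx": len(recent),
        "total": sum(amounts),
        "max_amount": max(amounts, default=0.0),
    }

history = [
    {"timestamp": 1, "amount": 20.0},
    {"timestamp": 5, "amount": 300.0},
    {"timestamp": 9, "amount": 45.0},
]
features = aggregate(history, now=10, window=6)
```

The choice of `window` corresponds to the aggregation period whose length the paper finds has a large impact on performance.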
Journal of the Operational Research Society | 2008
David J. Hand; Christopher Whitrow; Niall M. Adams; Piotr Juszczak; David John Weston
In predictive data mining, algorithms will be both optimized and compared using a measure of predictive performance. Different measures will yield different results, and it follows that it is crucial to match the measure to the true objectives. In this paper, we explore the desirable characteristics of measures for constructing and evaluating tools for mining plastic card data to detect fraud. We define two measures, one based on minimizing the overall cost to the card company, and the other based on minimizing the amount of fraud given the maximum number of investigations the card company can afford to make. We also describe a plot, analogous to the standard ROC, for displaying the performance trace of an algorithm as the relative costs of the two different kinds of misclassification—classing a fraudulent transaction as legitimate or vice versa—are varied.
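The two measures can be sketched as follows. All cost values, scores and counts here are illustrative, not taken from the paper.

```python
# Minimal sketches of the two performance measures described above.
def total_cost(n_false_neg, n_false_pos, c_fn, c_fp):
    """Measure 1: overall cost to the card company of the two error types."""
    return n_false_neg * c_fn + n_false_pos * c_fp

def frauds_caught(scores, labels, budget):
    """Measure 2: frauds found when only the `budget` top-scored
    transactions can be investigated (labels: 1 = fraud, 0 = legitimate)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    return sum(label for _, label in ranked[:budget])

# Sweeping the relative misclassification cost traces out the algorithm's
# performance curve, analogous to a standard ROC plot:
ratios = [0.5, 1.0, 2.0, 5.0]
trace = [total_cost(n_false_neg=3, n_false_pos=10, c_fn=r, c_fp=1.0) for r in ratios]
```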
Advanced Data Analysis and Classification | 2008
David John Weston; David J. Hand; Niall M. Adams; Christopher Whitrow; Piotr Juszczak
Peer group analysis is an unsupervised method for monitoring behaviour over time. In the context of plastic card fraud detection, this technique can be used to find anomalous transactions. These are transactions that deviate strongly from their peer group and are flagged as potentially fraudulent. Time alignment, the quality of the peer groups and the timeliness of assigning fraud flags to transactions are described. We demonstrate the ability to detect fraud using peer groups with real credit card transaction data and define a novel method for evaluating performance.
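The core mechanism can be sketched as follows: each account's peer group is the set of accounts with the most similar past behaviour, and a transaction is flagged when it deviates strongly from that group's current behaviour. The similarity measure and deviation score below are simple stand-ins, not the paper's definitions.

```python
# Sketch of peer group analysis for anomaly detection.
import numpy as np

def peer_group(history, account, k):
    """Indices of the k accounts whose past behaviour is closest to `account`'s."""
    d = np.linalg.norm(history - history[account], axis=1)
    d[account] = np.inf                      # exclude the account itself
    return np.argsort(d)[:k]

def anomaly_score(current, account, peers):
    """Deviation of the account's current value from its peer group's mean,
    in units of the peer group's standard deviation."""
    peer_vals = current[peers]
    return abs(current[account] - peer_vals.mean()) / (peer_vals.std() + 1e-9)

history = np.array([[1.0, 1.0], [1.1, 1.0], [5.0, 5.0], [1.05, 1.0]])
peers = peer_group(history, account=0, k=2)
current = np.array([10.0, 1.0, 5.0, 1.1])
score = anomaly_score(current, account=0, peers=peers)   # large deviation flags fraud
```

Because the method is unsupervised, it needs no fraud labels to raise a flag: the account simply stops behaving like its peers.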
International Journal of Pattern Recognition and Artificial Intelligence | 2003
David M. J. Tax; Piotr Juszczak
In one-class classification one tries to describe a class of target data and to distinguish it from all other possible outlier objects. Obvious applications are areas where outliers are very diverse or very difficult or expensive to measure, such as in machine diagnostics or in medical applications. In order to have a good distinction between the target objects and the outliers, good representation of the data is essential. The performance of many one-class classifiers critically depends on the scaling of the data and is often harmed by data distributions in (nonlinear) subspaces. This paper presents a simple preprocessing method which actively tries to map the data to a spherically symmetric cluster and is almost insensitive to data distributed in subspaces. It uses techniques from Kernel PCA to rescale the data in a kernel feature space to unit variance. This transformed data can now be described very well by the Support Vector Data Description, which basically fits a hypersphere around the data. The paper presents the methods and some preliminary experimental results.
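The preprocessing step can be sketched as follows: project the data onto kernel principal components and rescale each retained component to unit variance, so that the cluster becomes spherically symmetric. The RBF kernel and its width are illustrative choices, and this is a sketch of the idea rather than the paper's exact procedure.

```python
# Sketch of the kernel-PCA rescaling: map the data into a kernel feature
# space and give every retained principal component unit variance.
import numpy as np

def rbf_gram(X, sigma):
    """RBF (Gaussian) kernel matrix; the kernel choice is illustrative."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def kpca_whiten(X, sigma, n_components):
    """Project onto kernel principal components, each rescaled to unit variance."""
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n      # centering in feature space
    Kc = J @ rbf_gram(X, sigma) @ J
    vals, vecs = np.linalg.eigh(Kc)
    top = np.argsort(vals)[::-1][:n_components]
    # Unit-norm eigenvectors scaled by sqrt(n) give zero-mean, unit-variance scores.
    return np.sqrt(n) * vecs[:, top]

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
Z = kpca_whiten(X, sigma=1.0, n_components=2)
```

After this rescaling, a hypersphere (as fitted by the Support Vector Data Description) is a natural fit to the transformed data, because no direction dominates the variance.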
Computational Statistics & Data Analysis | 2008
Piotr Juszczak; Niall M. Adams; David J. Hand; Christopher Whitrow; David John Weston
Detecting fraudulent plastic card transactions is an important and challenging problem. The challenges arise from a number of factors including the sheer volume of transactions financial institutions have to process, the asynchronous and heterogeneous nature of transactions, and the adaptive behaviour of fraudsters. In this fraud detection problem the performance of a supervised two-class classification approach is compared with the performance of an unsupervised one-class classification approach. Attention is focussed primarily on one-class classification approaches. Useful representations of transaction records, and ways of combining different one-class classifiers, are described. Assessment of performance for such problems is complicated by the need for timely decision making. Performance assessment measures are discussed, and the performance of a number of one- and two-class classification methods is assessed using two large, real world personal banking data sets.
Lecture Notes in Computer Science | 2002
David M. J. Tax; Piotr Juszczak
In one-class classification one tries to describe a class of target data and to distinguish it from all other possible outlier objects. Obvious applications are areas where outliers are very diverse or very difficult or expensive to measure, such as in machine diagnostics or in medical applications. In order to have a good distinction between the target objects and the outliers, good representation of the data is essential. The performance of many one-class classifiers critically depends on the scaling of the data and is often harmed by data distributions in (non-linear) subspaces. This paper presents a simple preprocessing method which actively tries to map the data to a spherically symmetric cluster and is almost insensitive to data distributed in subspaces. It uses techniques from Kernel PCA to rescale the data in a kernel feature space to unit variance. This transformed data can now be described very well by the Support Vector Data Description, which basically fits a hypersphere around the data. The paper presents the methods and some preliminary experimental results.
Multiple Classifier Systems | 2004
Piotr Juszczak; Robert P. W. Duin
This paper presents a new method for handling missing feature values in classification. The idea is to form an ensemble of one-class classifiers, each trained on a single feature, on a preselected group of features, or on a dissimilarity representation computed from the features. When feature values are missing for a data point to be labelled, the ensemble can still make a reasonable decision based on the remaining classifiers. In contrast to standard algorithms for the missing-features problem, it is possible to build an ensemble that can classify test objects under every possible pattern of missing features without retraining a classifier for each combination. Additionally, the training set does not need to be uncorrupted to train such an ensemble. The performance of the proposed ensemble is compared with standard methods for the missing-feature-values problem on several UCI datasets.
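The ensemble idea can be sketched as follows. Here a per-feature Gaussian score stands in for a per-feature one-class classifier; the combination rule (averaging over observed features) is one simple choice among those the approach admits.

```python
# Sketch of the missing-features ensemble: one simple model per feature,
# combined over only the features that are actually present at test time.
import numpy as np

def fit_per_feature(X):
    """Fit a (mean, std) model on each feature of the target class."""
    return [(X[:, j].mean(), X[:, j].std() + 1e-9) for j in range(X.shape[1])]

def score(models, x):
    """Average per-feature Gaussian log-likelihood over the observed
    (non-NaN) features; missing features are simply skipped, so no
    retraining is needed for any missingness pattern."""
    logs = [-0.5 * ((v - m) / s) ** 2 - np.log(s)
            for (m, s), v in zip(models, x) if not np.isnan(v)]
    return np.mean(logs)

X = np.array([[0.0, 10.0], [0.1, 10.1], [-0.1, 9.9]])
models = fit_per_feature(X)
ok = score(models, np.array([np.nan, 10.0]))   # feature 0 missing, still scorable
bad = score(models, np.array([np.nan, 0.0]))   # far from the target class
```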
Lecture Notes in Computer Science | 2006
David M. J. Tax; Piotr Juszczak; Elzbieta Pekalska; Robert P. W. Duin
Sometimes novel or outlier data has to be detected. The outliers may indicate some interesting rare event, or they should be disregarded because they cannot be reliably processed further. In the ideal case that the objects are represented by very good features, the genuine data forms a compact cluster and a good outlier measure is the distance to the cluster center. This paper proposes three new formulations to find a good cluster center together with an optimized lp-distance measure. Experiments show that for some real world datasets very good classification results are obtained and that, more specifically, the l1-distance is particularly suited for datasets containing discrete feature values.
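The distance-to-centre measure can be sketched as follows for the two special cases with closed-form centres. This is an illustration of the lp idea, not the paper's joint optimisation of centre and distance measure.

```python
# Sketch of the lp-distance outlier measure: pick a cluster centre suited to
# the chosen lp-distance, then score objects by their distance to it.
import numpy as np

def lp_center(X, p):
    """Closed-form centres for two common cases: the coordinate-wise median
    minimises the summed l1-distance; the mean minimises the summed
    squared l2-distance."""
    if p == 1:
        return np.median(X, axis=0)
    return X.mean(axis=0)

def outlier_score(x, center, p):
    """lp-distance of x to the cluster centre (larger = more outlying)."""
    return np.sum(np.abs(x - center) ** p) ** (1.0 / p)

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]])
c1 = lp_center(X, p=1)   # the median is robust to the outlying point
```

The robustness of the median under l1 hints at why the paper finds the l1-distance particularly suited to datasets with discrete feature values.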
International Conference on Pattern Recognition | 2006
Piotr Juszczak; David M. J. Tax; Serguei Verzakov; Robert P. W. Duin
We propose an alternative to probability density classifiers based on normal distributions, LDA and QDA. Instead of estimating covariance matrices with the standard maximum likelihood estimator, we estimate class domains by the minimum volume enclosing ellipsoid (v-MVEE). The v-MVEE is a robust statistic that rejects a specified fraction v of the data. The performance of the domain and density approaches is compared on small sample size problems and in situations where the sampling of the training and test sets is not i.i.d.
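The plain minimum volume enclosing ellipsoid (without the rejection fraction v) can be computed with Khachiyan's well-known iterative algorithm, sketched below. This is a generic MVEE sketch, not the paper's v-MVEE estimator, which additionally rejects a fraction v of the data.

```python
# Sketch of the minimum volume enclosing ellipsoid via Khachiyan's algorithm.
import numpy as np

def mvee(P, tol=1e-4):
    """Return (c, A) such that (x - c)^T A (x - c) <= 1 for all rows of P."""
    n, d = P.shape
    Q = np.vstack([P.T, np.ones(n)])          # lift points to d+1 dimensions
    u = np.ones(n) / n                        # weights over the points
    err = tol + 1.0
    while err > tol:
        V = Q @ np.diag(u) @ Q.T
        # M[i] = q_i^T V^{-1} q_i, a Mahalanobis-type distance per point
        M = np.einsum('ij,ji->i', Q.T @ np.linalg.inv(V), Q)
        j = np.argmax(M)
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step                      # shift weight to the worst-covered point
        err = np.linalg.norm(new_u - u)
        u = new_u
    c = P.T @ u                               # ellipsoid centre
    A = np.linalg.inv(P.T @ np.diag(u) @ P - np.outer(c, c)) / d
    return c, A

P = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
c, A = mvee(P)
```

A domain estimate of this kind describes where the class lives rather than how its density falls off, which is what makes it less sensitive to non-i.i.d. sampling of training and test sets.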