A Comparison Study of Credit Card Fraud Detection: Supervised versus Unsupervised
Xuetong Niu, Li Wang, Xulei Yang*
Abstract
Credit cards have become a popular mode of payment for both online and offline purchases, which has led to an increasing number of daily fraudulent transactions. An efficient fraud detection methodology is therefore essential to maintain the reliability of the payment system. In this study, we perform a comparison study of credit card fraud detection using various supervised and unsupervised approaches. Specifically, 6 supervised classification models, i.e., Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGB), as well as 4 unsupervised anomaly detection models, i.e., One-Class SVM (OCSVM), Auto-Encoder (AE), Restricted Boltzmann Machine (RBM), and Generative Adversarial Networks (GAN), are explored in this study. We train all these models on a public credit card transaction dataset from the Kaggle website, which contains 492 frauds out of 284,807 transactions. The labels of the transactions are used for the supervised learning models only. The performance of each model is evaluated through 5-fold cross validation in terms of the Area Under the Receiver Operating Curve (AUROC). Among the supervised approaches, XGB and RF obtain the best performance with AUROC = 0.989 and AUROC = 0.988, respectively, while among the unsupervised approaches, RBM achieves the best performance with AUROC = 0.961, followed by GAN with AUROC = 0.954. The experimental results show that the supervised models perform slightly better than the unsupervised models in this study. Nevertheless, unsupervised approaches remain promising for credit card fraud detection due to the insufficient annotation and the data imbalance issue in real-world applications.
Introduction
Credit card fraud detection has recently become an active research topic with the explosive growth of big data and AI techniques. It also plays an important role in banks, as it helps to reduce losses caused by fraudulent transactions. Although many proposed methods (Zareapoor and Shamsolmoali 2015; Randhawa et al. 2018) have achieved promising results, it is still very challenging to accurately and promptly detect credit card fraud due to dramatic data imbalance and large variations among fraudulent transactions.
Both supervised and unsupervised learning have been investigated for credit card fraud detection. For example, a combination of multiple learned fraud detectors (Chan et al. 1999) is proposed under a so-called "cost model" to address the skewed distribution of the training data. In contrast, an unsupervised method (Bolton, Hand, and others 2001) is proposed to detect changes in the behavior of usual credit card transactions rather than relying on labels of historical fraudulent transactions. Some surveys have also comprehensively studied machine learning techniques applied to credit card fraud detection. For example, the survey (Zojaji et al. 2016) reviews the techniques, datasets, and evaluation criteria in credit card fraud detection. However, no prior work has evaluated machine learning models and compared their credit card fraud detection performance in a supervised versus unsupervised manner.

In this paper, we evaluate 6 supervised learning models and 4 unsupervised learning models on a Kaggle credit card transaction dataset. The supervised learning models include Support Vector Machines (SVM) (Cortes and Vapnik 1995), K-Nearest Neighbors (KNN) (Altman 1992), Extreme Gradient Boosting (XGB) (Chen et al. 2015), Logistic Regression (LR) (Neter et al. 1996), Decision Tree (DT) (Quinlan 1986), and Random Forest (RF) (Breiman 2001), while the unsupervised learning methods comprise One-Class SVM (OCSVM) (Schölkopf et al. 2000), Auto-Encoder (AE) (Deng et al. 2010), Restricted Boltzmann Machine (RBM) (Sutskever, Hinton, and Taylor 2009), and Generative Adversarial Networks (GAN) (Goodfellow et al. 2014). The supervised learning models leverage transaction labels to train classifiers that distinguish between normal and abnormal transactions. In contrast, the unsupervised learning models use unlabeled data for training to capture the normal data distribution and then determine whether an unknown test sample is normal or abnormal. As labeling data is time-consuming and labor-intensive, labeled data is very expensive, especially when abnormal samples are much fewer than normal ones. In this case, unsupervised learning models can be more useful than supervised ones.

The main contribution of this paper is that we comprehensively study both supervised and unsupervised learning models for credit card fraud detection and evaluate these machine learning algorithms on a Kaggle credit card transaction dataset in a supervised versus unsupervised way. To the best of our knowledge, we are the first to conduct this sort of comparison study between supervised and unsupervised learning for credit card fraud detection.

Related Works
Traditional Machine Learning Methods
It is very time-consuming for people to check credit card transactions one by one, as the volume of transactions is tremendously large. Hence, an automated method is desired for credit card fraud detection. Over the past decades, many machine learning methods have been applied to this problem. Next, we review some of them to give a big picture of this research area. Traditional neural networks (as compared to current deep neural networks) were already used for credit card fraud detection in (Dorronsoro et al. 1997). A Hidden Markov Model (HMM) (Srivastava et al. 2008) is utilized to model the sequence of operations in credit card transaction processing and detect frauds. In (Bhattacharyya et al. 2011), Support Vector Machine (SVM) and Random Forest (RF) are investigated together with Logistic Regression (LR) based on real-life data from international credit card transactions. Also, a cost-sensitive decision tree based method (Sahin, Bulkan, and Duman 2013) is proposed for credit card fraud detection and evaluated on a real-world dataset. In another work (Mahmoudi and Duman 2015), a modified Fisher discriminant function is proposed for credit card fraud detection to be more sensitive to important instances. Besides machine learning methods themselves, a framework for transaction aggregation (Whitrow et al. 2009) is proposed to address the problem of preprocessing credit card transaction data for supervised fraud classification. In addition, a novel learning strategy (Dal Pozzolo et al. 2018) is proposed to tackle three issues in credit card fraud detection: class imbalance, concept drift, and verification latency.
Advanced Deep Learning Methods
Recently, deep learning algorithms have achieved promising results in many areas such as image processing (Wang et al. 2015). Therefore, we review several deep learning based works for credit card fraud detection as follows. Long Short-Term Memory (LSTM) is utilized in (Jurgovsky et al. 2018) to formulate credit card fraud detection as a sequence classification problem in a supervised learning setting. Also, an unsupervised model (Pumsirirat and Yan 2018) based on a deep Auto-Encoder (AE) and a Restricted Boltzmann Machine (RBM) is proposed to reconstruct normal credit card transactions and detect anomalies. Furthermore, a framework for tuning the parameters of deep learning topologies is proposed for credit card fraud detection in (Roy et al. 2018). It is worth mentioning that the Generative Adversarial Network (GAN) is a remarkable model for unsupervised and semi-supervised learning. Not only has it been employed to detect activity fraud and malicious users in online social networks (Zheng et al. 2018), but it has also been used in credit card fraud detection (Fiore et al. 2017) to augment minority-class examples for the classification between fraudulent and non-fraudulent samples. In this paper, the GAN model will also be studied and evaluated as one of the unsupervised learning methods.
Supervised Learning Methods
Some machine learning methods treat fraud detection as a supervised classification problem. In this way, we can train a classifier on training data together with annotations, then classify test transactions into normal and abnormal categories. In this section, we briefly discuss the 6 widely-used supervised machine learning approaches for credit card fraud detection studied in this paper.
Logistic Regression
Logistic regression was developed by statistician David Cox in 1958 and is a regression model where the response variable Y is categorical. Logistic regression allows us to estimate the probability of a categorical response based on one or more predictor variables x. It allows one to say that the presence of a predictor increases (or decreases) the probability of a given outcome by a specific percentage. Mathematically, logistic regression models the log-odds of the response through a multiple linear regression function:

logit(P(Y_i = 1)) = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \dots + \beta_p x_{i,p}    (1)

where x_{i,j} refers to the j-th predictor variable for the i-th observation, and Y_i is the binary output of the i-th observation.
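As an illustrative sketch (not the exact code used in our experiments), the model can be fitted with scikit-learn using the parameter settings reported later; X and y below are random placeholders for the feature matrix and 0/1 labels:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.randn(200, 5); y = np.random.randint(0, 2, 200)  # placeholder data
lr = LogisticRegression(C=0.1, penalty="l1", solver="liblinear")  # L1 penalty needs liblinear/saga
lr.fit(X, y)
fraud_prob = lr.predict_proba(X)[:, 1]  # estimated P(Y = 1 | x) from the model of Eq. (1)

K-Nearest Neighbors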
In the classification setting, the KNN algorithm essentially boils down to forming a majority vote between the K most similar instances to a given unseen observation. Similarity is defined according to a distance metric between two data points x and x'. A popular choice is the Euclidean distance given by

d(x, x') = \sqrt{(x_1 - x'_1)^2 + (x_2 - x'_2)^2 + \dots + (x_n - x'_n)^2}    (2)

but other measures, such as the Manhattan, Chebyshev, and Hamming distances, can be more suitable for a given setting. More formally, given a positive integer K, an unseen observation x, and a similarity metric d, the KNN classifier performs the following two steps. First, it runs through the whole dataset computing d between x and each training observation; let A denote the set of the K points in the training data that are closest to x. It then estimates the conditional probability for each class, that is, the fraction of points in A with that given class label:

P(y = j | X = x) = \frac{1}{K} \sum_{i \in A} I(y_i = j)    (3)

where I(\cdot) is the indicator function, which evaluates to 1 when its argument is true and 0 otherwise. Finally, the input x is assigned to the class with the largest probability.
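A minimal NumPy sketch of Eqs. (2)-(3), written for clarity rather than efficiency (in practice a library implementation such as scikit-learn's KNeighborsClassifier would be used); K = 4 follows the setting reported later:

import numpy as np

def knn_proba(x, X_train, y_train, K=4):
    # Eq. (2): Euclidean distance from the query x to every training point.
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    A = np.argsort(d)[:K]                 # indices of the K nearest neighbors
    # Eq. (3): fraction of neighbors carrying each class label (y in {0, 1}).
    return np.bincount(y_train[A], minlength=2) / K

Support Vector Machine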
SVM was first introduced by Vapnik in 1995 to solve classification and regression problems. The basic idea of SVM is to derive an optimal hyperplane that maximizes the margin between two classes. A nice property of SVMs is that they can find a non-linear decision boundary by projecting the data through a nonlinear function \phi into a higher-dimensional space. This means that data points which cannot be separated by a straight line in their original input space are lifted to a feature space F where a linear hyperplane can separate the data points of one class from the other. When that hyperplane is projected back to the input space I, it takes the form of a non-linear curve. Mathematically, given n training samples \{(x_i, y_i)\}_{i=1}^{n}, x_i \in R^N, y_i \in \{-1, +1\}, SVM is formulated by the following optimization problem:
Minimize \Phi(w) = \frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i    (4)

subject to y_i(\langle w, \phi(x_i) \rangle + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \dots, n    (5)

where the kernel function \phi maps training points x_i from the input space into a higher-dimensional feature space. The regularization parameter C controls the trade-off between achieving a low error on the training data and minimizing the norm of the weights.
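An illustrative scikit-learn sketch of the classifier of Eqs. (4)-(5); C = 0.5 and the linear kernel follow the grid-searched settings reported later, and X, y are placeholders as before:

import numpy as np
from sklearn.svm import SVC

X = np.random.randn(200, 5); y = np.random.randint(0, 2, 200)  # placeholder data
svm = SVC(C=0.5, kernel="linear")
svm.fit(X, y)
scores = svm.decision_function(X)  # signed distance to the separating hyperplane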
Decision Tree

Decision trees are simple but intuitive models that utilize a top-down approach in which the root node creates binary splits until a certain criterion is met. This binary splitting of nodes provides a predicted value based on the interior nodes leading to the terminal (final) nodes. In a classification context, a decision tree outputs a predicted target class for each terminal node produced. Decision trees tend to have high variance when they utilize different training and test sets of the same data, since they tend to overfit on training data; this leads to poor performance on unseen data. Unfortunately, this limits the usage of decision trees in predictive modeling. However, using ensemble methods, we can create models that utilize underlying decision trees as a foundation for producing powerful results.
Random Forest
The random forest algorithm, proposed by L. Breiman in 2001, has been successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad-hoc learning tasks, and returns measures of variable importance. In the classification context, the random forest classifier m is obtained via a majority vote among K classification trees with input x, that is,

m(x; \Theta_1, \dots, \Theta_K) = \begin{cases} 1 & \text{if } \frac{1}{K} \sum_{j=1}^{K} m(x; \Theta_j) > \frac{1}{2} \\ 0 & \text{otherwise} \end{cases}    (6)

where \Theta_j is the parameter set of the j-th tree.
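A sketch of Eq. (6) via scikit-learn; n_estimators = 30 and oob_score = True follow the settings reported later. Note that scikit-learn averages per-tree class probabilities, which coincides with the hard vote fraction when trees are grown to pure leaves:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.randn(200, 5); y = np.random.randint(0, 2, 200)  # placeholder data
rf = RandomForestClassifier(n_estimators=30, oob_score=True)
rf.fit(X, y)
vote_fraction = rf.predict_proba(X)[:, 1]  # averaged tree predictions, cf. Eq. (6)

Extreme Gradient Boosting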
Gradient boosting is a powerful machine learning technique for regression, classification, and ranking problems, which produces a prediction model in the form of an ensemble of weak prediction models such as decision trees. The model is built in a stage-wise manner: each stage introduces a new weak learner to compensate for the shortcomings of the existing weak learners. XGB stands for eXtreme Gradient Boosting, one implementation of the gradient boosting concept. What makes XGB unique is that it uses a more regularized model formalization to control over-fitting, which gives it better performance. Gradient boosting relies on regression trees, where the optimization step works to reduce the mean squared error; for binary classification the standard log loss is used, and for a multi-class classification problem the objective is the cross-entropy loss. Combining the loss function with a regularization term yields the objective function. The regularization term controls the model complexity and reduces the risk of over-fitting. XGB uses gradient descent for optimization, improving the predictive accuracy at each optimization step by following the negative gradient, as we are trying to find the sink in an n-dimensional plane. To learn the set of functions used in the model, XGB minimizes the following regularized objective:

L(\Theta) = \sum_i l(y_i, \hat{y}_i) + \Omega(\Theta)    (7)

where \Theta is the learned parameter set, l is a differentiable convex loss function that measures the difference between the prediction \hat{y}_i and the target y_i, and \Omega is the regularization term.
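An illustrative sketch with the xgboost package, minimizing Eq. (7) with the binary log loss as l; learning_rate = 0.4 and max_depth = 4 follow the settings reported later, and X, y are placeholders:

import numpy as np
from xgboost import XGBClassifier

X = np.random.randn(200, 5); y = np.random.randint(0, 2, 200)  # placeholder data
xgb = XGBClassifier(learning_rate=0.4, max_depth=4, objective="binary:logistic")
xgb.fit(X, y)
fraud_prob = xgb.predict_proba(X)[:, 1]

Unsupervised Learning Methods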
There has been a recent surge of interest in developing unsupervised generative models for anomaly detection. Generative models are trained to model the distribution of the normal transaction data (without annotations). Any transaction that does not follow this distribution is considered anomalous. In such a way, fraudulent transactions can be detected in an unsupervised manner. In this section, we briefly discuss the 4 unsupervised machine learning approaches for credit card fraud detection studied in this paper.
One-Class Support Vector Machine
One-Class SVM (OCSVM) was proposed by Schölkopf to identify novelty/anomaly in an unsupervised manner without labeled training data. The algorithm learns a soft boundary that embraces the normal data instances in the training set; at test time, it identifies as abnormal the instances that fall outside the learned region. Mathematically, OCSVM is formulated by the following optimization problem:
Minimize \Phi(w) = \frac{1}{2} w^T w + \frac{1}{\nu n} \sum_{i=1}^{n} \xi_i - \rho    (8)

subject to \langle w, \phi(x_i) \rangle \geq \rho - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \dots, n    (9)

The parameter \nu sets an upper bound on the fraction of outliers and a lower bound on the fraction of training examples used as support vectors.
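A sketch of the OCSVM detector with scikit-learn, trained only on (unlabeled) normal transactions; nu = 0.1 and gamma = 0.001 follow the settings reported later, and X_normal is a placeholder:

import numpy as np
from sklearn.svm import OneClassSVM

X_normal = np.random.randn(500, 5)   # placeholder: non-fraud training data only
ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.001)
ocsvm.fit(X_normal)
anomaly_score = -ocsvm.decision_function(X_normal)  # larger = more anomalous

Restricted Boltzmann Machine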
An RBM model consists of a visible and a hidden layer, which are connected through symmetric weights. The inputs x correspond to the neurons in the visible layer. The responses of the neurons h in the hidden layer model the probability distribution of the inputs. The probability distribution is derived by learning the symmetric connecting weights between the visible and the hidden layers. Neurons in the same layer are not connected. The conditional probability of a configuration of the hidden neurons h, given a configuration of the visible neurons x, is:

p(h | x) = \prod_i p(h_i | x)    (10)

The objective of the generative training in RBM is to learn the unknown h iteratively using the input x. The generative training phase iterates until the reconstructed samples most closely approximate x. It is performed using the maximum likelihood criterion, implemented by minimizing the negative log probability of the training data:

L_{gen} = -\sum \log P(x \mid w_{ij}, b_i, c_j)    (11)

where b_i and c_j are the biases in the input and hidden layers, respectively, and w_{ij} denotes the weights between the input and hidden layers.
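A rough sketch using scikit-learn's BernoulliRBM, whose per-sample pseudo-likelihood can serve as an anomaly score. Note the caveat: BernoulliRBM assumes inputs rescaled to [0, 1], whereas a Gaussian-visible RBM (not available in scikit-learn) would better match continuous transaction features; learning rate and hidden size follow the settings reported later:

import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.preprocessing import MinMaxScaler

X_normal = np.random.randn(500, 5)             # placeholder normal data
X01 = MinMaxScaler().fit_transform(X_normal)   # visible units expect [0, 1] inputs
rbm = BernoulliRBM(n_components=10, learning_rate=0.0005, n_iter=50)
rbm.fit(X01)
anomaly_score = -rbm.score_samples(X01)        # low pseudo-likelihood = anomalous

Auto-Encoder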
An auto-encoder (AE) learns to map from input to output through a pair of encoding and decoding phases. The encoder maps from the input to the hidden layers, and the decoder maps from the hidden layers to the output layer to reconstruct the inputs. The hidden layers of the auto-encoder form a low-dimensional, nonlinear representation of the input data. The AE is formulated as follows:

\hat{X} = D(E(X))    (12)

where X is the input data, E is an encoding map, D is a decoding map, and \hat{X} is the reconstructed input data. The objective of the auto-encoder is to approximate the distribution of X as accurately as possible. In particular, an auto-encoder can be viewed as a solution to the following optimization problem:

\min_{D,E} \| X - D(E(X)) \|    (13)

where \| \cdot \| is usually the 2-norm. Complex distributions of X can be modeled using a deep auto-encoder with multiple layers, i.e., multiple pairs of encoders and decoders.
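A Keras sketch of Eqs. (12)-(13), with dense layer sizes mirroring the architecture reported in the parameter settings; the reconstruction error serves as the anomaly score, and X_normal is a placeholder for normal training data:

import numpy as np
from tensorflow import keras

X_normal = np.random.randn(500, 28).astype("float32")  # placeholder normal data
n = X_normal.shape[1]
inp = keras.Input(shape=(n,))
h = keras.layers.Dense(16, activation="relu")(inp)     # encoder E
h = keras.layers.Dense(32, activation="relu")(h)
h = keras.layers.Dense(32, activation="relu")(h)       # decoder D
h = keras.layers.Dense(16, activation="relu")(h)
out = keras.layers.Dense(n)(h)                         # reconstruction \hat{X}
ae = keras.Model(inp, out)
ae.compile(optimizer="adam", loss="mse")               # 2-norm objective of Eq. (13)
ae.fit(X_normal, X_normal, epochs=10, batch_size=64, verbose=0)
recon_error = np.mean((X_normal - ae.predict(X_normal)) ** 2, axis=1)

Generative Adversarial Networks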
GAN is a generative model designed by Goodfellow in 2014. In a GAN setup, two differentiable functions, a generator G and a discriminator D, represented by neural networks, compete and are trained simultaneously, which eventually drives the generated samples to be indistinguishable from real data. The GAN model in this study is based on AnoGAN (Schlegl et al. 2017), recently developed for anomaly detection by T. Schlegl et al. We modify the original AnoGAN by simultaneously learning an encoder E that maps input samples x to a latent representation z, along with the generator G and discriminator D during training. This enables us to avoid the computationally expensive SGD step for recovering a latent representation at test time. After we train the model on the normal data to yield G, D, and E for inference, we define a score function A(x) that measures how anomalous an example x is, based on a convex combination of a reconstruction loss L_G and a discriminator-based loss L_D:

A(x) = \alpha L_G(x) + (1 - \alpha) L_D(x)    (14)

where L_G(x) = \| x - G(E(x)) \| and L_D(x) = \sigma(D(x, E(x)), 1), with \alpha a weighting parameter in (0, 1) and \sigma the cross-entropy loss from the discriminator of x being a real example (class 1). L_G(x) indicates how well the trained encoder and generator can reconstruct an input example x, while L_D(x) captures the discriminator's confidence that a sample is drawn from the real data distribution.
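A sketch of the scoring rule in Eq. (14); G, E, and D are assumed to be already-trained Keras models (placeholder interfaces, not our released code), where D takes the pair (x, E(x)) and outputs the probability of x being real:

import numpy as np

def anomaly_score(x, G, E, D, alpha=0.5):
    z = E.predict(x)                                 # latent representation E(x)
    L_G = np.mean((x - G.predict(z)) ** 2, axis=1)   # reconstruction loss ||x - G(E(x))||
    L_D = -np.log(D.predict([x, z]).ravel() + 1e-8)  # cross-entropy against class 1
    return alpha * L_G + (1.0 - alpha) * L_D         # Eq. (14)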
Experimental Results

Data Set and Preprocessing
This public dataset contains credit card transactions made in September 2013 by European cardholders. The transactions, which occurred over two days, include 492 fraud records out of 284,807 transactions. The dataset is thus highly unbalanced (Fig. 1): the fraudulent class accounts for only 0.172% of all transactions.

Figure 1: Number of Different Classes

The dataset contains numerical input variables which result from a PCA transformation applied for confidentiality reasons. The two features "Time" and "Amount", which are not PCA-transformed, are normalized using RobustScaler, which scales the data according to the quantile range. For the supervised learning models specifically, random downsampling is used to tackle the heavy class imbalance and avoid results biased toward the non-fraudulent class. Through random downsampling, non-fraud transactions (Class = 0) are randomly reduced to the same amount as fraud transactions (Class = 1), which yields 492 cases of fraud and 492 cases of non-fraud transactions.
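A minimal pandas/scikit-learn sketch of this preprocessing (the file name is an assumption; the Kaggle dataframe is assumed to carry a binary "Class" column):

import pandas as pd
from sklearn.preprocessing import RobustScaler

df = pd.read_csv("creditcard.csv")  # Kaggle credit card file (path is an assumption)
df[["Time", "Amount"]] = RobustScaler().fit_transform(df[["Time", "Amount"]])
fraud = df[df["Class"] == 1]
normal = df[df["Class"] == 0].sample(n=len(fraud), random_state=0)  # downsample
balanced = pd.concat([fraud, normal]).sample(frac=1, random_state=0)  # 492 + 492, shuffled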
Evaluation Metrics

As mentioned above, the studied dataset is highly imbalanced, with 492 fraud records out of 284,807 transactions. Even if all samples are classified into the non-fraud category, the classification accuracy is still extremely high, which means traditional evaluation metrics such as accuracy are not suitable for this study. Instead, we report the Area Under the Receiver Operating Curve (AUROC) in our experimental study. AUROC combines the false positive rate (FPR) and the true positive rate (TPR) into one single metric. With the assumption that the fraud class is "positive" and the non-fraud class is "negative", the definitions of TPR and FPR are as follows:
TPR = TP / P and FPR = FP / N

where P and N are the numbers of samples from the positive and negative classes, respectively. TP (True Positive) denotes the number of samples predicted to be positive that are actually positive, and FP (False Positive) the number of samples predicted to be positive that are actually negative.

To avoid overfitting issues, the k-fold cross-validation technique is used in this study to estimate fraud detection performance. In one round of k-fold cross-validation, the dataset is first randomly divided into k subsets (or folds) of approximately equal size that are mutually exclusive. A machine learning model is then trained and tested k times, where each time one of the subsets is set aside as the testing data and the remaining k − 1 subsets are used as training data. The final testing results are aggregated from the k trained sub-models. In our experimental studies, 5-fold cross-validation (i.e., k = 5) is used as the validation method.
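This evaluation protocol can be sketched in a few lines with scikit-learn (illustrative only; the estimator and the placeholder X, y stand in for any of the studied models and the balanced data described above):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = np.random.randn(984, 30); y = np.random.randint(0, 2, 984)  # placeholder data
clf = LogisticRegression(C=0.1, penalty="l1", solver="liblinear")
auroc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")  # 5-fold AUROC
print(auroc.mean())  # mean AUROC over the 5 folds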
Parameter Settings

The key parameters of most studied models are determined by grid search through cross validation, and are listed below:
• LR: 'C': 0.1, 'penalty': 'l1'
• KNN: 'algorithm': 'auto', 'n_neighbors': 4
• SVM: 'C': 0.5, 'kernel': 'linear'
• DT: 'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 6
• RF: 'n_estimators': 30, 'oob_score': True
• XGB: 'learning_rate': 0.4, 'max_depth': 4
• OCSVM: 'nu': 0.1, 'gamma': 0.001
• RBM: 'learning_rate': 0.0005, 'num_hidden': 10
The neural network architectures for the Auto-Encoder and the Generative Adversarial Networks are as follows:
• AE: The encoder has two dense layers with 16 and 32 ReLU units, respectively. The decoder has two dense layers with 32 and 16 ReLU units, respectively.
• GAN: The encoder has two dense layers with 32 leaky ReLU and 32 linear units, respectively. The generator has three dense layers with 32 ReLU, 64 ReLU, and 28 linear units, respectively. The discriminator has one dense layer of 32 leaky ReLU units followed by one linear layer with a single unit.
Results
The AUROC values of the 6 supervised models on the studied credit card transaction dataset are shown in Fig. 2. All the models perform well on this dataset: XGB achieves the best performance with AUROC = 0.99, while DT obtains the lowest AUROC value of 0.95. As expected, ensemble methods like XGB and RF perform better than basic methods like DT. Fig. 3 shows the AUROC values obtained by the unsupervised models: RBM, GAN, and AE obtain AUROC values above 0.95, while OCSVM does not perform as well, with AUROC = 0.90. Overall, it can be observed that the supervised models perform slightly better than the unsupervised models, at the expense of additional preprocessing procedures such as outlier removal.

Figure 2: Plot of AUROC by supervised approaches

Figure 3: Plot of AUROC by unsupervised approaches
Discussions
In credit card fraud detection, supervised learning aims to train a binary classification model to distinguish between fraudulent and non-fraudulent instances from labeled data, while unsupervised learning is intended to model the data distribution of one class and determine whether a test sample belongs to this class or not. In this section, we discuss the pros and cons of both supervised and unsupervised learning.

Assuming there are sufficient labeled data, supervised learning models, especially deep neural networks, are able to achieve very promising classification performance. For example, AlexNet (Krizhevsky, Sutskever, and Hinton 2012) significantly reduced error rates for image classification on a large-scale image dataset with more than 1 million labeled images. However, in credit card fraud detection, the training data of the two classes are dramatically imbalanced: fraudulent transactions are much rarer than non-fraudulent ones. As a result, the trained classifier will be biased toward the majority class whereas it should pay more attention to the minority one. Another issue for supervised learning is that transaction data may only be labeled after several days or even a month. This kind of verification latency (Krivko 2010) delays updates to the supervised model. To summarize, the advantage of supervised learning is its capability to achieve very promising results given sufficient training data, while the disadvantage is being dramatically affected by the data imbalance issue and the data labeling process.

Although unsupervised learning is not as attractive as supervised learning in terms of raw performance, it is well suited for credit card fraud detection as it does not require balanced labeled data. For example, the AnoGAN model (Schlegl et al. 2017) is able to learn the normal data distribution and indicate whether an unknown test sample is normal or abnormal using its proposed anomaly scoring scheme. This sort of unsupervised learning model becomes even more attractive when labeled data is insufficient and the data imbalance is severe. Another advantage of unsupervised learning is that fraudulent credit card use can be detected promptly, because the unsupervised model can be updated with low latency using online unlabeled data in banks and financial institutes. For example, one unsupervised learning model, the Self-Organizing Map (SOM) (Zaslavsky and Strizhak 2006), has been used to build a framework for unsupervised credit card fraud detection. The proposed automated system is able to continuously update the model with newly added transactions because the SOM model does not require a priori information, e.g., whether a transaction was made by the cardholder or not. In sum, the advantages of unsupervised learning methods are quite obvious for credit card fraud detection, while the disadvantage may be the difficulty of making some unsupervised models (e.g., GAN) converge.
Conclusions
In this paper, we conduct a comparison study for credit card fraud detection in a supervised versus unsupervised manner by evaluating machine learning models on a Kaggle credit card transaction dataset. Label availability and data imbalance dramatically restrict supervised learning performance, while unsupervised learning does not have these bottlenecks. Moreover, some unsupervised learning methods, e.g., GAN, have recently received more attention from the community and have achieved very promising results. In future work, we will focus on using GAN models to improve the performance of credit card fraud detection.

References

[Altman 1992] Altman, N. S. 1992. An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician.
[Bhattacharyya et al. 2011] Bhattacharyya, S.; Jha, S.; Tharakunnel, K.; and Westland, J. C. 2011. Data mining for credit card fraud: A comparative study. Decision Support Systems.
[Bolton, Hand, and others 2001] Bolton, R. J.; Hand, D. J.; et al. 2001. Unsupervised profiling methods for fraud detection. Credit Scoring and Credit Control VII.
[Breiman 2001] Breiman, L. 2001. Random forests. Machine Learning.
[Chan et al. 1999] Chan, P. K.; Fan, W.; Prodromidis, A. L.; and Stolfo, S. J. 1999. Distributed data mining in credit card fraud detection. IEEE Intelligent Systems and Their Applications.
[Chen et al. 2015] Chen, T.; He, T.; Benesty, M.; et al. 2015. Xgboost: Extreme gradient boosting. R package version 0.4-2.
[Cortes and Vapnik 1995] Cortes, C., and Vapnik, V. 1995. Support-vector networks. Machine Learning.
[Dal Pozzolo et al. 2018] Dal Pozzolo, A.; Boracchi, G.; Caelen, O.; Alippi, C.; and Bontempi, G. 2018. Credit card fraud detection: A realistic modeling and a novel learning strategy. IEEE Transactions on Neural Networks and Learning Systems.
[Deng et al. 2010] Deng, L.; Seltzer, M. L.; Yu, D.; Acero, A.; Mohamed, A.; and Hinton, G. 2010. Binary coding of speech spectrograms using a deep auto-encoder. In Eleventh Annual Conference of the International Speech Communication Association.
[Dorronsoro et al. 1997] Dorronsoro, J. R.; Ginel, F.; Sánchez, C. R.; and Santa Cruz, C. 1997. Neural fraud detection in credit card operations. IEEE Transactions on Neural Networks.
[Fiore et al. 2017] Fiore, U.; De Santis, A.; Perla, F.; Zanetti, P.; and Palmieri, F. 2017. Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Information Sciences.
[Goodfellow et al. 2014] Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2672–2680.
[Jurgovsky et al. 2018] Jurgovsky, J.; Granitzer, M.; Ziegler, K.; Calabretto, S.; Portier, P.; He-Guelton, L.; and Caelen, O. 2018. Sequence classification for credit-card fraud detection. Expert Systems with Applications.
[Krivko 2010] Krivko, M. 2010. A hybrid model for plastic card fraud detection systems. Expert Systems with Applications.
[Krizhevsky, Sutskever, and Hinton 2012] Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 1097–1105.
[Mahmoudi and Duman 2015] Mahmoudi, N., and Duman, E. 2015. Detecting credit card fraud by modified Fisher discriminant analysis. Expert Systems with Applications.
[Neter et al. 1996] Neter, J.; Kutner, M. H.; Nachtsheim, C. J.; and Wasserman, W. 1996. Applied Linear Statistical Models, volume 4. Irwin Chicago.
[Pumsirirat and Yan 2018] Pumsirirat, A., and Yan, L. 2018. Credit card fraud detection using deep learning based on auto-encoder and restricted Boltzmann machine. International Journal of Advanced Computer Science and Applications.
[Quinlan 1986] Quinlan, J. R. 1986. Induction of decision trees. Machine Learning.
[Randhawa et al. 2018] Randhawa, K.; Loo, C. K.; Seera, M.; Lim, C. P.; and Nandi, A. K. 2018. Credit card fraud detection using AdaBoost and majority voting. IEEE Access.
[Roy et al. 2018] Roy, A.; Sun, J.; Mahoney, R.; Alonzi, L.; Adams, S.; and Beling, P. 2018. Deep learning detecting fraud in credit card transactions. In Systems and Information Engineering Design Symposium (SIEDS), 2018, 129–134. IEEE.
[Sahin, Bulkan, and Duman 2013] Sahin, Y.; Bulkan, S.; and Duman, E. 2013. A cost-sensitive decision tree approach for fraud detection. Expert Systems with Applications.
[Schlegl et al. 2017] Schlegl, T.; Seeböck, P.; Waldstein, S. M.; Schmidt-Erfurth, U.; and Langs, G. 2017. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International Conference on Information Processing in Medical Imaging, 146–157. Springer.
[Schölkopf et al. 2000] Schölkopf, B.; Williamson, R. C.; Smola, A. J.; Shawe-Taylor, J.; and Platt, J. C. 2000. Support vector method for novelty detection. In Advances in Neural Information Processing Systems, 582–588.
[Srivastava et al. 2008] Srivastava, A.; Kundu, A.; Sural, S.; and Majumdar, A. 2008. Credit card fraud detection using hidden Markov model. IEEE Transactions on Dependable and Secure Computing.
[Sutskever, Hinton, and Taylor 2009] Sutskever, I.; Hinton, G. E.; and Taylor, G. W. 2009. The recurrent temporal restricted Boltzmann machine. In Advances in Neural Information Processing Systems, 1601–1608.
[Wang et al. 2015] Wang, L.; Liu, T.; Wang, G.; Chan, K. L.; and Yang, Q. 2015. Video tracking using learned hierarchical features. IEEE Transactions on Image Processing.
[Whitrow et al. 2009] Whitrow, C.; Hand, D. J.; Juszczak, P.; Weston, D.; and Adams, N. M. 2009. Transaction aggregation as a strategy for credit card fraud detection. Data Mining and Knowledge Discovery.
[Zareapoor and Shamsolmoali 2015] Zareapoor, M., and Shamsolmoali, P. 2015. Application of credit card fraud detection: Based on bagging ensemble classifier. Procedia Computer Science.
[Zaslavsky and Strizhak 2006] Zaslavsky, V., and Strizhak, A. 2006. Credit card fraud detection using self-organizing maps. Information and Security.
[Zheng et al. 2018] Zheng, P.; Yuan, S.; Wu, X.; Li, J.; and Lu, A. 2018. One-class adversarial nets for fraud detection. CoRR abs/1803.01798.
[Zojaji et al. 2016] Zojaji, Z.; Atani, R. E.; Monadjemi, A. H.; et al. 2016. A survey of credit card fraud detection techniques: Data and technique oriented perspective. arXiv preprint arXiv:1611.06439.