A Comparison Study of Credit Card Fraud Detection: Supervised versus Unsupervised
Xuetong Niu, Li Wang, Xulei Yang*
Abstract
Credit cards have become a popular mode of payment for both online and offline purchases, which has led to an increasing number of daily fraudulent transactions. An efficient fraud detection methodology is therefore essential to maintain the reliability of the payment system. In this study, we perform a comparison study of credit card fraud detection using various supervised and unsupervised approaches. Specifically, 6 supervised classification models, i.e., Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGB), as well as 4 unsupervised anomaly detection models, i.e., One-Class SVM (OCSVM), Auto-Encoder (AE), Restricted Boltzmann Machine (RBM), and Generative Adversarial Networks (GAN), are explored in this study. We train all these models on a public credit card transaction dataset from the Kaggle website, which contains 492 frauds out of 284,807 transactions. The labels of the transactions are used for the supervised learning models only. The performance of each model is evaluated through 5-fold cross validation in terms of the Area Under the Receiver Operating Curve (AUROC). Among the supervised approaches, XGB and RF obtain the best performance with AUROC = 0.989 and AUROC = 0.988, respectively, while among the unsupervised approaches, RBM achieves the best performance with AUROC = 0.961, followed by GAN with AUROC = 0.954. The experimental results show that the supervised models perform slightly better than the unsupervised models in this study. Nevertheless, unsupervised approaches remain promising for credit card fraud detection due to the insufficient annotation and the data imbalance issue in real-world applications.
Introduction
Credit card fraud detection has recently become an active research topic with the explosive growth of big data and AI techniques. It also plays an important role in banks, as it helps to reduce losses caused by fraudulent transactions. Although many proposed methods (Zareapoor and Shamsolmoali 2015; Randhawa et al. 2018) have achieved promising results, it is still very challenging to accurately and promptly detect credit card fraud due to dramatic data imbalance and large variations among fraudulent transactions.
Both supervised and unsupervised learning have been investigated for credit card fraud detection. For example, a combination of multiple learned fraud detectors (Chan et al. 1999) is proposed under a so-called "cost model" to address the skewed distribution of the training data. In contrast, an unsupervised method (Bolton, Hand, and others 2001) is proposed to detect changes in the behavior of usual credit card transactions rather than relying on labels of historical fraudulent transactions. Some surveys have also comprehensively studied machine learning techniques applied to credit card fraud detection. For example, the survey (Zojaji et al. 2016) reviews the techniques, datasets, and evaluation criteria in credit card fraud detection. However, no prior work has evaluated machine learning models and compared their credit card fraud detection performance in a supervised versus unsupervised manner.

In this paper, we evaluate 6 supervised learning models and 4 unsupervised learning models on a Kaggle credit card transaction dataset. The supervised learning models include Support Vector Machines (SVM) (Cortes and Vapnik 1995), K-Nearest Neighbors (KNN) (Altman 1992), Extreme Gradient Boosting (XGB) (Chen et al. 2015), Logistic Regression (LR) (Neter et al. 1996), Decision Tree (DT) (Quinlan 1986), and Random Forest (RF) (Breiman 2001), while the unsupervised learning methods comprise One-Class SVM (OCSVM) (Schölkopf et al. 2000), Auto-Encoder (AE) (Deng et al. 2010), Restricted Boltzmann Machine (RBM) (Sutskever, Hinton, and Taylor 2009), and Generative Adversarial Networks (GAN) (Goodfellow et al. 2014). The supervised learning models leverage transaction labels to train classifiers that distinguish between normal and abnormal transactions. In contrast, the unsupervised learning models use unlabeled data for training to capture the normal data distribution and then determine whether an unknown test sample is normal or abnormal. As labeling data is time-consuming and labor-intensive, labeled data is very expensive, especially when abnormal samples are much fewer than normal ones. In this case, unsupervised learning models can be more useful than supervised ones.

The main contribution of this paper is that we comprehensively study both supervised and unsupervised learning models for credit card fraud detection and evaluate these machine learning algorithms on a Kaggle credit card transaction dataset in a supervised versus unsupervised way. To the best of our knowledge, we are the first to conduct this sort of comparison study between supervised and unsupervised learning for credit card fraud detection.

Related Works
Traditional Machine Learning Methods
It is very time-consuming for people to check credit card transactions one by one, as the volume of transactions is tremendously large. Hence, an automated method is desired for credit card fraud detection. Over the past decades, many machine learning methods have been applied to this problem. Next, we review some of them to give a big picture of this research area. Traditional neural networks (as compared to current deep neural networks) were already used for credit card fraud detection in (Dorronsoro et al. 1997). A Hidden Markov Model (HMM) (Srivastava et al. 2008) is utilized to model the sequence of operations in credit card transaction processing and detect frauds. In (Bhattacharyya et al. 2011), Support Vector Machine (SVM) and Random Forest (RF) are investigated together with Logistic Regression (LR) based on real-life data from international credit card transactions. Also, a cost-sensitive decision tree based method (Sahin, Bulkan, and Duman 2013) is proposed for credit card fraud detection and evaluated on a real-world dataset. In another work (Mahmoudi and Duman 2015), a modified Fisher discriminant function is proposed for credit card fraud detection to be more sensitive to important instances. Besides machine learning methods themselves, a framework for transaction aggregation (Whitrow et al. 2009) is proposed to address the problem of preprocessing credit card transaction data for supervised fraud classification. In addition, a novel learning strategy (Dal Pozzolo et al. 2018) is proposed to tackle three issues in credit card fraud detection: class imbalance, concept drift, and verification latency.
Advanced Deep Learning Methods
Recently, deep learning algorithms have achieved promising results in many areas such as image processing (Wang et al. 2015). Therefore, we review several deep learning based works for credit card fraud detection as follows. Long Short-Term Memory (LSTM) is utilized in (Jurgovsky et al. 2018) to formulate credit card fraud detection as a sequence classification problem in a supervised learning setting. Also, an unsupervised model (Pumsirirat and Yan 2018) based on a deep Auto-Encoder (AE) and a Restricted Boltzmann Machine (RBM) is proposed to reconstruct normal credit card transactions and detect anomalies. Furthermore, a framework for tuning the parameters of deep learning topologies is proposed for credit card fraud detection in (Roy et al. 2018). It is worth mentioning that the Generative Adversarial Network (GAN) is a remarkable model for unsupervised and semi-supervised learning. Not only has it been employed to detect activity fraud and malicious users in online social networks (Zheng et al. 2018), but it has also been used in credit card fraud detection (Fiore et al. 2017) to augment minority-class examples for the classification between fraudulent and non-fraudulent samples. In this paper, the GAN model will also be studied and evaluated as one of the unsupervised learning methods.
Supervised Learning Methods
Some machine learning methods treat fraud detection as a supervised classification problem. In this way, we can train a classifier on training data together with annotations, then classify test transactions into normal and abnormal categories. In this section, we briefly discuss the 6 widely-used supervised machine learning approaches for credit card fraud detection studied in this paper.
Logistic Regression
Logistic regression was developed by statistician David Cox in 1958 and is a regression model where the response variable Y is categorical. Logistic regression allows us to estimate the probability of a categorical response based on one or more predictor variables x. It allows one to say that the presence of a predictor increases (or decreases) the probability of a given outcome by a specific percentage. Mathematically, logistic regression models the log-odds of the response through a multiple linear regression function:

logit(P(Y_i = 1)) = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \dots + \beta_p x_{i,p}    (1)

where x_{i,j} refers to the j-th predictor variable for the i-th observation, and Y_i is the binary output of the i-th observation.
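As an illustrative sketch (not the exact code used in our experiments), the model can be fitted with scikit-learn using the parameter settings reported later; X and y below are random placeholders for the feature matrix and 0/1 labels:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.randn(200, 5); y = np.random.randint(0, 2, 200)  # placeholder data
lr = LogisticRegression(C=0.1, penalty="l1", solver="liblinear")  # L1 penalty needs liblinear/saga
lr.fit(X, y)
fraud_prob = lr.predict_proba(X)[:, 1]  # estimated P(Y = 1 | x) from the model of Eq. (1)

K-Nearest Neighbors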
In the classification setting, the KNN algorithm essentially boils down to forming a majority vote between the K most similar instances to a given unseen observation. Similarity is defined according to a distance metric between two data points x and x'. A popular choice is the Euclidean distance given by

d(x, x') = \sqrt{(x_1 - x'_1)^2 + (x_2 - x'_2)^2 + \dots + (x_n - x'_n)^2}    (2)

but other measures, such as the Manhattan, Chebyshev, and Hamming distances, can be more suitable for a given setting. More formally, given a positive integer K, an unseen observation x, and a similarity metric d, the KNN classifier performs the following two steps. First, it runs through the whole dataset computing d between x and each training observation; let A denote the set of the K points in the training data that are closest to x. It then estimates the conditional probability for each class, that is, the fraction of points in A with that given class label:

P(y = j | X = x) = \frac{1}{K} \sum_{i \in A} I(y_i = j)    (3)

where I(\cdot) is the indicator function, which evaluates to 1 when its argument is true and 0 otherwise. Finally, the input x is assigned to the class with the largest probability.
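A minimal NumPy sketch of Eqs. (2)-(3), written for clarity rather than efficiency (in practice a library implementation such as scikit-learn's KNeighborsClassifier would be used); K = 4 follows the setting reported later:

import numpy as np

def knn_proba(x, X_train, y_train, K=4):
    # Eq. (2): Euclidean distance from the query x to every training point.
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    A = np.argsort(d)[:K]                 # indices of the K nearest neighbors
    # Eq. (3): fraction of neighbors carrying each class label (y in {0, 1}).
    return np.bincount(y_train[A], minlength=2) / K

Support Vector Machine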
SVM was first introduced by Vapnik in 1995 to solve classification and regression problems. The basic idea of SVM is to derive an optimal hyperplane that maximizes the margin between two classes. A nice property of SVMs is that they can find a non-linear decision boundary by projecting the data through a nonlinear function \phi into a higher-dimensional space. This means that data points which cannot be separated by a straight line in their original input space are lifted to a feature space F where a linear hyperplane can separate the data points of one class from the other. When that hyperplane is projected back to the input space I, it takes the form of a non-linear curve. Mathematically, given n training samples \{(x_i, y_i)\}_{i=1}^{n}, x_i \in R^N, y_i \in \{-1, +1\}, SVM is formulated by the following optimization problem:
Minimize \Phi(w) = \frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i    (4)

subject to y_i(\langle w, \phi(x_i) \rangle + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \dots, n    (5)

where the kernel function \phi maps training points x_i from the input space into a higher-dimensional feature space. The regularization parameter C controls the trade-off between achieving a low error on the training data and minimizing the norm of the weights.
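An illustrative scikit-learn sketch of the classifier of Eqs. (4)-(5); C = 0.5 and the linear kernel follow the grid-searched settings reported later, and X, y are placeholders as before:

import numpy as np
from sklearn.svm import SVC

X = np.random.randn(200, 5); y = np.random.randint(0, 2, 200)  # placeholder data
svm = SVC(C=0.5, kernel="linear")
svm.fit(X, y)
scores = svm.decision_function(X)  # signed distance to the separating hyperplane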
Decision Tree

Decision trees are simple but intuitive models that utilize a top-down approach in which the root node creates binary splits until a certain criterion is met. This binary splitting of nodes provides a predicted value based on the interior nodes leading to the terminal (final) nodes. In a classification context, a decision tree outputs a predicted target class for each terminal node produced. Decision trees tend to have high variance when they utilize different training and test sets of the same data, since they tend to overfit on training data; this leads to poor performance on unseen data. Unfortunately, this limits the usage of decision trees in predictive modeling. However, using ensemble methods, we can create models that utilize underlying decision trees as a foundation for producing powerful results.
Random Forest
The random forest algorithm, proposed by L. Breiman in 2001, has been successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad-hoc learning tasks, and returns measures of variable importance. In the classification context, the random forest classifier m is obtained via a majority vote among K classification trees with input x, that is,

m(x; \Theta_1, \dots, \Theta_K) = \begin{cases} 1 & \text{if } \frac{1}{K} \sum_{j=1}^{K} m(x; \Theta_j) > \frac{1}{2} \\ 0 & \text{otherwise} \end{cases}    (6)

where \Theta_j is the parameter set of the j-th tree.
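A sketch of Eq. (6) via scikit-learn; n_estimators = 30 and oob_score = True follow the settings reported later. Note that scikit-learn averages per-tree class probabilities, which coincides with the hard vote fraction when trees are grown to pure leaves:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.randn(200, 5); y = np.random.randint(0, 2, 200)  # placeholder data
rf = RandomForestClassifier(n_estimators=30, oob_score=True)
rf.fit(X, y)
vote_fraction = rf.predict_proba(X)[:, 1]  # averaged tree predictions, cf. Eq. (6)

Extreme Gradient Boosting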
Gradient boosting is a powerful machine learning technique for regression, classification, and ranking problems, which produces a prediction model in the form of an ensemble of weak prediction models such as decision trees. The model is built in a stage-wise manner: each stage introduces a new weak learner to compensate for the shortcomings of the existing weak learners. XGB stands for eXtreme Gradient Boosting, one implementation of the gradient boosting concept. What makes XGB unique is that it uses a more regularized model formalization to control over-fitting, which gives it better performance. Gradient boosting relies on regression trees, where the optimization step works to reduce the mean squared error; for binary classification the standard log loss is used, and for a multi-class classification problem the objective is the cross-entropy loss. Combining the loss function with a regularization term yields the objective function. The regularization term controls the model complexity and reduces the risk of over-fitting. XGB uses gradient descent for optimization, improving the predictive accuracy at each optimization step by following the negative gradient, as we are trying to find the sink in an n-dimensional plane. To learn the set of functions used in the model, XGB minimizes the following regularized objective:

L(\Theta) = \sum_i l(y_i, \hat{y}_i) + \Omega(\Theta)    (7)

where \Theta is the learned parameter set, l is a differentiable convex loss function that measures the difference between the prediction \hat{y}_i and the target y_i, and \Omega is the regularization term.
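An illustrative sketch with the xgboost package, minimizing Eq. (7) with the binary log loss as l; learning_rate = 0.4 and max_depth = 4 follow the settings reported later, and X, y are placeholders:

import numpy as np
from xgboost import XGBClassifier

X = np.random.randn(200, 5); y = np.random.randint(0, 2, 200)  # placeholder data
xgb = XGBClassifier(learning_rate=0.4, max_depth=4, objective="binary:logistic")
xgb.fit(X, y)
fraud_prob = xgb.predict_proba(X)[:, 1]

Unsupervised Learning Methods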
There has been a recent surge of interest in developing unsupervised generative models for anomaly detection. Generative models are trained to model the distribution of the normal transaction data (without annotations). Any transaction that does not follow this distribution is considered anomalous. In such a way, fraudulent transactions can be detected in an unsupervised manner. In this section, we briefly discuss the 4 unsupervised machine learning approaches for credit card fraud detection studied in this paper.
One-Class Support Vector Machine
One-Class SVM (OCSVM) was proposed by Schölkopf to identify novelty/anomaly in an unsupervised manner without labeled training data. The algorithm learns a soft boundary that embraces the normal data instances in the training set; at test time, it identifies as abnormal the instances that fall outside the learned region. Mathematically, OCSVM is formulated by the following optimization problem:
Minimize \Phi(w) = \frac{1}{2} w^T w + \frac{1}{\nu n} \sum_{i=1}^{n} \xi_i - \rho    (8)

subject to \langle w, \phi(x_i) \rangle \geq \rho - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \dots, n    (9)

The parameter \nu sets an upper bound on the fraction of outliers and a lower bound on the fraction of training examples used as support vectors.
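A sketch of the OCSVM detector with scikit-learn, trained only on (unlabeled) normal transactions; nu = 0.1 and gamma = 0.001 follow the settings reported later, and X_normal is a placeholder:

import numpy as np
from sklearn.svm import OneClassSVM

X_normal = np.random.randn(500, 5)   # placeholder: non-fraud training data only
ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.001)
ocsvm.fit(X_normal)
anomaly_score = -ocsvm.decision_function(X_normal)  # larger = more anomalous

Restricted Boltzmann Machine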
An RBM model consists of a visible and a hidden layer, which are connected through symmetric weights. The inputs x correspond to the neurons in the visible layer. The responses of the neurons h in the hidden layer model the probability distribution of the inputs. The probability distribution is derived by learning the symmetric connecting weights between the visible and the hidden layers. Neurons in the same layer are not connected. The conditional probability of a configuration of the hidden neurons h, given a configuration of the visible neurons x, is:

p(h | x) = \prod_i p(h_i | x)    (10)

The objective of the generative training in RBM is to learn the unknown h iteratively using the input x. The generative training phase iterates until the reconstructed samples most closely approximate x. It is performed using the maximum likelihood criterion, implemented by minimizing the negative log probability of the training data:

L_{gen} = -\sum \log P(x \mid w_{ij}, b_i, c_j)    (11)

where b_i and c_j are the biases in the input and hidden layers, respectively, and w_{ij} denotes the weights between the input and hidden layers.
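A rough sketch using scikit-learn's BernoulliRBM, whose per-sample pseudo-likelihood can serve as an anomaly score. Note the caveat: BernoulliRBM assumes inputs rescaled to [0, 1], whereas a Gaussian-visible RBM (not available in scikit-learn) would better match continuous transaction features; learning rate and hidden size follow the settings reported later:

import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.preprocessing import MinMaxScaler

X_normal = np.random.randn(500, 5)             # placeholder normal data
X01 = MinMaxScaler().fit_transform(X_normal)   # visible units expect [0, 1] inputs
rbm = BernoulliRBM(n_components=10, learning_rate=0.0005, n_iter=50)
rbm.fit(X01)
anomaly_score = -rbm.score_samples(X01)        # low pseudo-likelihood = anomalous

Auto-Encoder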
An auto-encoder (AE) learns to map from input to output through a pair of encoding and decoding phases. The encoder maps from the input to the hidden layers, and the decoder maps from the hidden layers to the output layer to reconstruct the inputs. The hidden layers of the auto-encoder form a low-dimensional, nonlinear representation of the input data. The AE is formulated as follows:

\hat{X} = D(E(X))    (12)

where X is the input data, E is an encoding map, D is a decoding map, and \hat{X} is the reconstructed input data. The objective of the auto-encoder is to approximate the distribution of X as accurately as possible. In particular, an auto-encoder can be viewed as a solution to the following optimization problem:

\min_{D,E} \| X - D(E(X)) \|    (13)

where \| \cdot \| is usually the 2-norm. Complex distributions of X can be modeled using a deep auto-encoder with multiple layers, i.e., multiple pairs of encoders and decoders.
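A Keras sketch of Eqs. (12)-(13), with dense layer sizes mirroring the architecture reported in the parameter settings; the reconstruction error serves as the anomaly score, and X_normal is a placeholder for normal training data:

import numpy as np
from tensorflow import keras

X_normal = np.random.randn(500, 28).astype("float32")  # placeholder normal data
n = X_normal.shape[1]
inp = keras.Input(shape=(n,))
h = keras.layers.Dense(16, activation="relu")(inp)     # encoder E
h = keras.layers.Dense(32, activation="relu")(h)
h = keras.layers.Dense(32, activation="relu")(h)       # decoder D
h = keras.layers.Dense(16, activation="relu")(h)
out = keras.layers.Dense(n)(h)                         # reconstruction \hat{X}
ae = keras.Model(inp, out)
ae.compile(optimizer="adam", loss="mse")               # 2-norm objective of Eq. (13)
ae.fit(X_normal, X_normal, epochs=10, batch_size=64, verbose=0)
recon_error = np.mean((X_normal - ae.predict(X_normal)) ** 2, axis=1)

Generative Adversarial Networks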
GAN is a generative model designed by Goodfellow in 2014. In a GAN setup, two differentiable functions, a generator G and a discriminator D, represented by neural networks, compete and are trained simultaneously, which eventually drives the generated samples to be indistinguishable from real data. The GAN model in this study is based on AnoGAN (Schlegl et al. 2017), recently developed for anomaly detection by T. Schlegl et al. We modify the original AnoGAN by simultaneously learning an encoder E that maps input samples x to a latent representation z, along with the generator G and discriminator D during training. This enables us to avoid the computationally expensive SGD step for recovering a latent representation at test time. After we train the model on the normal data to yield G, D, and E for inference, we define a score function A(x) that measures how anomalous an example x is, based on a convex combination of a reconstruction loss L_G and a discriminator-based loss L_D:

A(x) = \alpha L_G(x) + (1 - \alpha) L_D(x)    (14)

where L_G(x) = \| x - G(E(x)) \| and L_D(x) = \sigma(D(x, E(x)), 1), with \alpha a weighting parameter in (0, 1) and \sigma the cross-entropy loss from the discriminator of x being a real example (class 1). L_G(x) indicates how well the trained encoder and generator can reconstruct an input example x, while L_D(x) captures the discriminator's confidence that a sample is drawn from the real data distribution.
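A sketch of the scoring rule in Eq. (14); G, E, and D are assumed to be already-trained Keras models (placeholder interfaces, not our released code), where D takes the pair (x, E(x)) and outputs the probability of x being real:

import numpy as np

def anomaly_score(x, G, E, D, alpha=0.5):
    z = E.predict(x)                                 # latent representation E(x)
    L_G = np.mean((x - G.predict(z)) ** 2, axis=1)   # reconstruction loss ||x - G(E(x))||
    L_D = -np.log(D.predict([x, z]).ravel() + 1e-8)  # cross-entropy against class 1
    return alpha * L_G + (1.0 - alpha) * L_D         # Eq. (14)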
Experimental Results

Data Set and Preprocessing
This public dataset contains credit card transactions made in September 2013 by European cardholders. The transactions, which occurred over two days, include 492 fraud records out of 284,807 transactions. The dataset is thus highly unbalanced (Fig. 1): the fraudulent class accounts for only 0.172% of all transactions.

Figure 1: Number of Different Classes

The dataset contains numerical input variables which result from a PCA transformation applied for confidentiality reasons. The two features "Time" and "Amount", which are not PCA-transformed, are normalized using RobustScaler, which scales the data according to the quantile range. For the supervised learning models specifically, random downsampling is used to tackle the heavy class imbalance and avoid results biased toward the non-fraudulent class. Through random downsampling, non-fraud transactions (Class = 0) are randomly reduced to the same amount as fraud transactions (Class = 1), which yields 492 cases of fraud and 492 cases of non-fraud transactions.
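A minimal pandas/scikit-learn sketch of this preprocessing (the file name is an assumption; the Kaggle dataframe is assumed to carry a binary "Class" column):

import pandas as pd
from sklearn.preprocessing import RobustScaler

df = pd.read_csv("creditcard.csv")  # Kaggle credit card file (path is an assumption)
df[["Time", "Amount"]] = RobustScaler().fit_transform(df[["Time", "Amount"]])
fraud = df[df["Class"] == 1]
normal = df[df["Class"] == 0].sample(n=len(fraud), random_state=0)  # downsample
balanced = pd.concat([fraud, normal]).sample(frac=1, random_state=0)  # 492 + 492, shuffled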
Evaluation Metrics

As mentioned above, the studied dataset is highly imbalanced, with 492 fraud records out of 284,807 transactions. Even if all samples are classified into the non-fraud category, the classification accuracy is still extremely high, which means traditional evaluation metrics such as accuracy are not suitable for this study. Instead, we report the Area Under the Receiver Operating Curve (AUROC) in our experimental study. AUROC combines the false positive rate (FPR) and the true positive rate (TPR) into one single metric. With the assumption that the fraud class is "positive" and the non-fraud class is "negative", the definitions of TPR and FPR are as follows:
TPR = TP / P and FPR = FP / N

where P and N are the numbers of samples from the positive and negative classes, respectively. TP (True Positive) denotes the number of samples predicted to be positive that are actually positive, and FP (False Positive) the number of samples predicted to be positive that are actually negative.

To avoid overfitting issues, the k-fold cross-validation technique is used in this study to estimate fraud detection performance. In one round of k-fold cross-validation, the dataset is first randomly divided into k subsets (or folds) of approximately equal size that are mutually exclusive. A machine learning model is then trained and tested k times, where each time one of the subsets is set aside as the testing data and the remaining k − 1 subsets are used as training data. The final testing results are aggregated from the k trained sub-models. In our experimental studies, 5-fold cross-validation (i.e., k = 5) is used as the validation method.
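This evaluation protocol can be sketched in a few lines with scikit-learn (illustrative only; the estimator and the placeholder X, y stand in for any of the studied models and the balanced data described above):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = np.random.randn(984, 30); y = np.random.randint(0, 2, 984)  # placeholder data
clf = LogisticRegression(C=0.1, penalty="l1", solver="liblinear")
auroc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")  # 5-fold AUROC
print(auroc.mean())  # mean AUROC over the 5 folds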
Parameter Settings

The key parameters of most studied models are determined by grid search through cross validation, and are listed below:
• LR: 'C': 0.1, 'penalty': 'l1'
• KNN: 'algorithm': 'auto', 'n_neighbors': 4
• SVM: 'C': 0.5, 'kernel': 'linear'
• DT: 'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 6
• RF: 'n_estimators': 30, 'oob_score': True
• XGB: 'learning_rate': 0.4, 'max_depth': 4
• OCSVM: 'nu': 0.1, 'gamma': 0.001
• RBM: 'learning_rate': 0.0005, 'num_hidden': 10
The neural network architectures for the Auto-Encoder and the Generative Adversarial Networks are as follows:
• AE: The encoder has two dense layers with 16 and 32 ReLU units, respectively. The decoder has two dense layers with 32 and 16 ReLU units, respectively.
• GAN: The encoder has two dense layers with 32 leaky ReLU and 32 linear units, respectively. The generator has three dense layers with 32 ReLU, 64 ReLU, and 28 linear units, respectively. The discriminator has one dense layer of 32 leaky ReLU units followed by one linear layer with a single unit.
Results
The AUROC values of the 6 supervised models on the studied credit card transaction dataset are shown in Fig. 2. All the models perform well on this dataset: XGB achieves the best performance with AUROC = 0.99, while DT obtains the lowest AUROC value of 0.95. As expected, ensemble methods like XGB and RF perform better than basic methods like DT. Fig. 3 shows the AUROC values obtained by the unsupervised models: RBM, GAN, and AE obtain AUROC values above 0.95, while OCSVM does not perform as well, with AUROC = 0.90. Overall, it can be observed that the supervised models perform slightly better than the unsupervised models, at the expense of additional preprocessing procedures such as outlier removal.

Figure 2: Plot of AUROC by supervised approaches

Figure 3: Plot of AUROC by unsupervised approaches
Discussions
In credit card fraud detection, supervised learning aims to train a binary classification model to distinguish between fraudulent and non-fraudulent instances from labeled data, while unsupervised learning is intended to model the data distribution of one class and determine whether a test sample belongs to this class or not. In this section, we discuss the pros and cons of both supervised and unsupervised learning.

Assuming there are sufficient labeled data, supervised learning models, especially deep neural networks, are able to achieve very promising classification performance. For example, AlexNet (Krizhevsky, Sutskever, and Hinton 2012) significantly reduced error rates for image classification on a large-scale image dataset with more than 1 million labeled images. However, in credit card fraud detection, the training data of the two classes are dramatically imbalanced: fraudulent transactions are much rarer than non-fraudulent ones. As a result, the trained classifier will be biased toward the majority class whereas it should pay more attention to the minority one. Another issue for supervised learning is that transaction data may only be labeled after several days or even a month. This kind of verification latency (Krivko 2010) delays updates to the supervised model. To summarize, the advantage of supervised learning is its capability to achieve very promising results given sufficient training data, while the disadvantage is being dramatically affected by the data imbalance issue and the data labeling process.

Although unsupervised learning is not as attractive as supervised learning in terms of raw performance, it is well suited for credit card fraud detection as it does not require balanced labeled data. For example, the AnoGAN model (Schlegl et al. 2017) is able to learn the normal data distribution and indicate whether an unknown test sample is normal or abnormal using its proposed anomaly scoring scheme. This sort of unsupervised learning model becomes even more attractive when labeled data is insufficient and the data imbalance is severe. Another advantage of unsupervised learning is that fraudulent credit card use can be detected promptly, because the unsupervised model can be updated with low latency using online unlabeled data in banks and financial institutes. For example, one unsupervised learning model, the Self-Organizing Map (SOM) (Zaslavsky and Strizhak 2006), has been used to build a framework for unsupervised credit card fraud detection. The proposed automated system is able to continuously update the model with newly added transactions because the SOM model does not require a priori information, e.g., whether a transaction was made by the cardholder or not. In sum, the advantages of unsupervised learning methods are quite obvious for credit card fraud detection, while the disadvantage may be the difficulty of making some unsupervised models (e.g., GAN) converge.
Conclusions
In this paper, we conduct a comparison study for credit card fraud detection in a supervised versus unsupervised manner by evaluating machine learning models on a Kaggle credit card transaction dataset. Label availability and data imbalance dramatically restrict supervised learning performance, while unsupervised learning does not have these bottlenecks. Moreover, some unsupervised learning methods, e.g., GAN, have recently received more attention from the community and have achieved very promising results. In future work, we will focus on using GAN models to improve the performance of credit card fraud detection.

References

[Altman 1992] Altman, N. S. 1992. An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician.
[Bhattacharyya et al. 2011] Bhattacharyya, S.; Jha, S.; Tharakunnel, K.; and Westland, J. C. 2011. Data mining for credit card fraud: A comparative study. Decision Support Systems.
[Bolton, Hand, and others 2001] Bolton, R. J.; Hand, D. J.; et al. 2001. Unsupervised profiling methods for fraud detection. Credit Scoring and Credit Control VII.
[Breiman 2001] Breiman, L. 2001. Random forests. Machine Learning.
[Chan et al. 1999] Chan, P. K.; Fan, W.; Prodromidis, A. L.; and Stolfo, S. J. 1999. Distributed data mining in credit card fraud detection. IEEE Intelligent Systems and Their Applications.
[Chen et al. 2015] Chen, T.; He, T.; Benesty, M.; et al. 2015. Xgboost: Extreme gradient boosting. R package version 0.4-2.
[Cortes and Vapnik 1995] Cortes, C., and Vapnik, V. 1995. Support-vector networks. Machine Learning.
[Dal Pozzolo et al. 2018] Dal Pozzolo, A.; Boracchi, G.; Caelen, O.; Alippi, C.; and Bontempi, G. 2018. Credit card fraud detection: A realistic modeling and a novel learning strategy. IEEE Transactions on Neural Networks and Learning Systems.
[Deng et al. 2010] Deng, L.; Seltzer, M. L.; Yu, D.; Acero, A.; Mohamed, A.; and Hinton, G. 2010. Binary coding of speech spectrograms using a deep auto-encoder. In Eleventh Annual Conference of the International Speech Communication Association.
[Dorronsoro et al. 1997] Dorronsoro, J. R.; Ginel, F.; Sánchez, C. R.; and Santa Cruz, C. 1997. Neural fraud detection in credit card operations. IEEE Transactions on Neural Networks.
[Fiore et al. 2017] Fiore, U.; De Santis, A.; Perla, F.; Zanetti, P.; and Palmieri, F. 2017. Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Information Sciences.
[Goodfellow et al. 2014] Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2672–2680.
[Jurgovsky et al. 2018] Jurgovsky, J.; Granitzer, M.; Ziegler, K.; Calabretto, S.; Portier, P.; He-Guelton, L.; and Caelen, O. 2018. Sequence classification for credit-card fraud detection. Expert Systems with Applications.
[Krivko 2010] Krivko, M. 2010. A hybrid model for plastic card fraud detection systems. Expert Systems with Applications.
[Krizhevsky, Sutskever, and Hinton 2012] Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 1097–1105.
[Mahmoudi and Duman 2015] Mahmoudi, N., and Duman, E. 2015. Detecting credit card fraud by modified Fisher discriminant analysis. Expert Systems with Applications.
[Neter et al. 1996] Neter, J.; Kutner, M. H.; Nachtsheim, C. J.; and Wasserman, W. 1996. Applied Linear Statistical Models, volume 4. Irwin Chicago.
[Pumsirirat and Yan 2018] Pumsirirat, A., and Yan, L. 2018. Credit card fraud detection using deep learning based on auto-encoder and restricted Boltzmann machine. International Journal of Advanced Computer Science and Applications.
[Quinlan 1986] Quinlan, J. R. 1986. Induction of decision trees. Machine Learning.
[Randhawa et al. 2018] Randhawa, K.; Loo, C. K.; Seera, M.; Lim, C. P.; and Nandi, A. K. 2018. Credit card fraud detection using AdaBoost and majority voting. IEEE Access.
[Roy et al. 2018] Roy, A.; Sun, J.; Mahoney, R.; Alonzi, L.; Adams, S.; and Beling, P. 2018. Deep learning detecting fraud in credit card transactions. In Systems and Information Engineering Design Symposium (SIEDS), 2018, 129–134. IEEE.
[Sahin, Bulkan, and Duman 2013] Sahin, Y.; Bulkan, S.; and Duman, E. 2013. A cost-sensitive decision tree approach for fraud detection. Expert Systems with Applications.
[Schlegl et al. 2017] Schlegl, T.; Seeböck, P.; Waldstein, S. M.; Schmidt-Erfurth, U.; and Langs, G. 2017. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International Conference on Information Processing in Medical Imaging, 146–157. Springer.
[Schölkopf et al. 2000] Schölkopf, B.; Williamson, R. C.; Smola, A. J.; Shawe-Taylor, J.; and Platt, J. C. 2000. Support vector method for novelty detection. In Advances in Neural Information Processing Systems, 582–588.
[Srivastava et al. 2008] Srivastava, A.; Kundu, A.; Sural, S.; and Majumdar, A. 2008. Credit card fraud detection using hidden Markov model. IEEE Transactions on Dependable and Secure Computing.
[Sutskever, Hinton, and Taylor 2009] Sutskever, I.; Hinton, G. E.; and Taylor, G. W. 2009. The recurrent temporal restricted Boltzmann machine. In Advances in Neural Information Processing Systems, 1601–1608.
[Wang et al. 2015] Wang, L.; Liu, T.; Wang, G.; Chan, K. L.; and Yang, Q. 2015. Video tracking using learned hierarchical features. IEEE Transactions on Image Processing.
[Whitrow et al. 2009] Whitrow, C.; Hand, D. J.; Juszczak, P.; Weston, D.; and Adams, N. M. 2009. Transaction aggregation as a strategy for credit card fraud detection. Data Mining and Knowledge Discovery.
[Zareapoor and Shamsolmoali 2015] Zareapoor, M., and Shamsolmoali, P. 2015. Application of credit card fraud detection: Based on bagging ensemble classifier. Procedia Computer Science.
[Zaslavsky and Strizhak 2006] Zaslavsky, V., and Strizhak, A. 2006. Credit card fraud detection using self-organizing maps. Information and Security.
[Zheng et al. 2018] Zheng, P.; Yuan, S.; Wu, X.; Li, J.; and Lu, A. 2018. One-class adversarial nets for fraud detection. CoRR abs/1803.01798.
[Zojaji et al. 2016] Zojaji, Z.; Atani, R. E.; Monadjemi, A. H.; et al. 2016. A survey of credit card fraud detection techniques: Data and technique oriented perspective. arXiv preprint arXiv:1611.06439.