Alejandro Correa Bahnsen

Publication

Featured research published by Alejandro Correa Bahnsen.

Expert Systems With Applications | 2015

Example-dependent cost-sensitive decision trees

Alejandro Correa Bahnsen; Djamila Aouada; Björn E. Ottersten

Example-dependent cost-sensitive tree algorithm. Each example is assumed to have a different financial cost. Application to credit card fraud detection, credit scoring and direct marketing. Focus on maximizing the financial savings instead of accuracy. Code is open source and available at albahnsen.com/CostSensitiveClassification.

Several real-world classification problems are example-dependent cost-sensitive in nature, where the costs due to misclassification vary between examples. However, standard classification methods do not take these costs into account and assume a constant cost of misclassification errors. State-of-the-art example-dependent cost-sensitive techniques only introduce the cost to the algorithm either before or after training, leaving open the question of how algorithms that take the real example-dependent financial costs into account during training would perform. In this paper, we propose an example-dependent cost-sensitive decision tree algorithm that incorporates the different example-dependent costs into a new cost-based impurity measure and a new cost-based pruning criterion. We evaluate the proposed method on three databases from three real-world applications: credit card fraud detection, credit scoring and direct marketing. The results show that the proposed algorithm is the best-performing method on all databases. Furthermore, compared with a standard decision tree, our method builds significantly smaller trees in only a fifth of the time while achieving superior performance as measured by cost savings. The result is a method that not only yields more business-oriented results but also creates simpler models that are easier to analyze.
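The idea of a cost-based impurity measure can be illustrated with a small sketch. It assumes one plausible reading of the abstract: the impurity of a node is the cheaper of the two trivial labelings (all negative or all positive), and a split is scored by how much it lowers that cost. The function names and the per-example cost layout `(C_TP, C_FP, C_FN, C_TN)` are assumptions made for illustration, not the interface of the authors' library.

```python
from typing import List, Sequence, Tuple

# One cost row per example: (C_TP, C_FP, C_FN, C_TN). This layout is assumed.
CostRow = Tuple[float, float, float, float]


def total_cost(y_true: Sequence[int], y_pred: Sequence[int],
               costs: Sequence[CostRow]) -> float:
    """Sum the example-dependent cost of a set of predictions."""
    total = 0.0
    for y, c, (c_tp, c_fp, c_fn, c_tn) in zip(y_true, y_pred, costs):
        if y == 1:
            total += c_tp if c == 1 else c_fn
        else:
            total += c_fp if c == 1 else c_tn
    return total


def cost_impurity(y_true: Sequence[int], costs: Sequence[CostRow]) -> float:
    """Cost of the cheaper trivial labeling of a node (all 0 or all 1)."""
    n = len(y_true)
    return min(total_cost(y_true, [0] * n, costs),
               total_cost(y_true, [1] * n, costs))


def split_gain(y_true: Sequence[int], costs: Sequence[CostRow],
               go_left: Sequence[bool]) -> float:
    """Reduction in cost impurity obtained by splitting a node in two."""
    left: List[int] = [i for i, flag in enumerate(go_left) if flag]
    right: List[int] = [i for i, flag in enumerate(go_left) if not flag]
    children = sum(
        cost_impurity([y_true[i] for i in idx], [costs[i] for i in idx])
        for idx in (left, right)
    )
    return cost_impurity(y_true, costs) - children


# Toy fraud-like data: flagging costs 1 unit, a missed fraud costs its amount.
y = [1, 0, 0, 1]
costs = [(1, 1, 100, 0), (1, 1, 20, 0), (1, 1, 30, 0), (1, 1, 50, 0)]
print(cost_impurity(y, costs))
print(split_gain(y, costs, [True, False, False, True]))
```

A tree builder would evaluate `split_gain` for each candidate split and keep the one with the largest gain, in the same way a standard tree uses Gini or entropy.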

International Conference on Machine Learning and Applications | 2013

Cost Sensitive Credit Card Fraud Detection Using Bayes Minimum Risk

Alejandro Correa Bahnsen; Aleksandar Stojanovic; Djamila Aouada; Björn E. Ottersten

Credit card fraud is a growing problem that affects card holders around the world. Fraud detection has been an interesting topic in machine learning. Nevertheless, current state-of-the-art credit card fraud detection algorithms fail to include the real costs of credit card fraud as a measure to evaluate algorithms. In this paper, a new comparison measure that realistically represents the monetary gains and losses due to fraud detection is proposed. Moreover, using the proposed cost measure, a cost-sensitive method based on Bayes minimum risk is presented. This method is compared with state-of-the-art algorithms and shows improvements of up to 23% measured by cost. The results of this paper are based on real-life transactional data provided by a large European card processing company.
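As a rough illustration of a Bayes minimum risk decision, the sketch below compares the expected cost of flagging a transaction with the expected cost of letting it pass, given an estimated fraud probability. The function name, the cost parameters and the tie-breaking rule are assumptions for illustration; the abstract does not specify them.

```python
def bayes_min_risk_predict(p_fraud: float, c_tp: float, c_fp: float,
                           c_fn: float, c_tn: float) -> int:
    """Return 1 (flag as fraud) when flagging has the lower expected cost."""
    # Expected cost if the transaction is flagged as fraud.
    risk_flag = c_tp * p_fraud + c_fp * (1.0 - p_fraud)
    # Expected cost if the transaction is allowed through.
    risk_pass = c_fn * p_fraud + c_tn * (1.0 - p_fraud)
    # Ties are resolved in favor of not flagging (an arbitrary choice).
    return 1 if risk_flag < risk_pass else 0


# Same 5% fraud probability, different amounts at stake (hypothetical values).
# Flagging costs 2 units either way; a missed fraud costs the full amount.
print(bayes_min_risk_predict(0.05, c_tp=2.0, c_fp=2.0, c_fn=500.0, c_tn=0.0))
print(bayes_min_risk_predict(0.05, c_tp=2.0, c_fp=2.0, c_fn=10.0, c_tn=0.0))
```

With these numbers the large transaction is flagged and the small one is not, even though both have the same estimated probability. This is the practical difference from thresholding the probability at a fixed value.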

International Conference on Machine Learning and Applications | 2014

Example-Dependent Cost-Sensitive Logistic Regression for Credit Scoring

Alejandro Correa Bahnsen; Djamila Aouada; Björn E. Ottersten

Several real-world classification problems are example-dependent cost-sensitive in nature, where the costs due to misclassification vary between examples. Credit scoring is a typical example of cost-sensitive classification. However, it is usually treated using methods that do not take into account the real financial costs associated with the lending business. In this paper, we propose a new example-dependent cost matrix for credit scoring. Furthermore, we propose an algorithm that introduces the example-dependent costs into a logistic regression. Using two publicly available datasets, we compare our proposed method against state-of-the-art example-dependent cost-sensitive algorithms. The results highlight the importance of using real financial costs. Moreover, the proposed cost-sensitive logistic regression achieves significantly higher savings.
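One way to introduce example-dependent costs into a logistic regression is to replace the usual log-loss with the expected financial cost of the predicted probabilities. The sketch below shows such an objective under that assumption; the function names and the cost layout `(C_TP, C_FP, C_FN, C_TN)` are hypothetical, and the optimizer that would minimize this objective is omitted.

```python
import math
from typing import Sequence, Tuple

# One cost row per example: (C_TP, C_FP, C_FN, C_TN). This layout is assumed.
CostRow = Tuple[float, float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def cost_sensitive_loss(theta: Sequence[float],
                        X: Sequence[Sequence[float]],
                        y: Sequence[int],
                        costs: Sequence[CostRow]) -> float:
    """Average expected cost of the model's predicted probabilities."""
    total = 0.0
    for x_i, y_i, (c_tp, c_fp, c_fn, c_tn) in zip(X, y, costs):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        if y_i == 1:
            total += h * c_tp + (1.0 - h) * c_fn
        else:
            total += h * c_fp + (1.0 - h) * c_tn
    return total / len(y)


# Two toy examples with a bias term in the first column (made-up values).
X = [[1.0, 2.0], [1.0, -1.0]]
y = [1, 0]
costs = [(0.0, 0.0, 100.0, 0.0), (0.0, 10.0, 0.0, 0.0)]
print(cost_sensitive_loss([0.0, 0.0], X, y, costs))  # uninformative model
print(cost_sensitive_loss([0.0, 5.0], X, y, costs))  # model that separates them
```

Because each example contributes its own costs, an error on a high-cost example moves the objective much more than an error on a low-cost one.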

Expert Systems With Applications | 2016

Feature engineering strategies for credit card fraud detection

Alejandro Correa Bahnsen; Djamila Aouada; Aleksandar Stojanovic; Björn E. Ottersten

Credit card fraud detection evaluation measure. Each example is assumed to have a different financial cost. Transaction aggregation strategy for predicting fraud. Periodic features using the von Mises distribution. Code is open source and available at albahnsen.com/CostSensitiveClassification.

Every year billions of Euros are lost worldwide due to credit card fraud, forcing financial institutions to continuously improve their fraud detection systems. In recent years, several studies have proposed the use of machine learning and data mining techniques to address this problem. However, most studies used some sort of misclassification measure to evaluate the different solutions and did not take into account the actual financial costs associated with the fraud detection process. Moreover, when constructing a credit card fraud detection model, it is very important to extract the right features from the transactional data. This is usually done by aggregating the transactions in order to observe the spending behavioral patterns of the customers. In this paper we expand the transaction aggregation strategy, and propose to create a new set of features based on analyzing the periodic behavior of the time of a transaction using the von Mises distribution. Then, using a real credit card fraud dataset provided by a large European card processing company, we compare state-of-the-art credit card fraud detection models, and evaluate how the different sets of features have an impact on the results. By including the proposed periodic features into the methods, the results show an average increase in savings of 13%.
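A transaction aggregation feature can be sketched as follows: for each transaction, count and sum the same card's earlier transactions inside a look-back window. The tuple layout, the function name and the 24-hour default are assumptions made for this illustration; the abstract does not state which windows or grouping keys were used.

```python
from typing import List, Sequence, Tuple

# (card_id, time_in_hours, amount). This field layout is assumed.
Transaction = Tuple[str, float, float]


def aggregate_features(transactions: Sequence[Transaction],
                       window_hours: float = 24.0) -> List[Tuple[int, float]]:
    """For each transaction, return (count, total amount) of the same card's
    earlier transactions within the look-back window."""
    features = []
    for card, t, _ in transactions:
        prior = [
            amount
            for other_card, s, amount in transactions
            if other_card == card and t - window_hours <= s < t
        ]
        features.append((len(prior), float(sum(prior))))
    return features


txs = [("A", 1.0, 10.0), ("A", 5.0, 20.0), ("B", 6.0, 7.0), ("A", 20.0, 5.0)]
for tx, feat in zip(txs, aggregate_features(txs)):
    print(tx, feat)
```

The quadratic scan keeps the sketch short. On real transaction volumes one would sort by card and time and use a sliding window instead.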

International Conference on Machine Learning and Applications | 2015

Detecting Credit Card Fraud Using Periodic Features

Alejandro Correa Bahnsen; Djamila Aouada; Aleksandar Stojanovic; Björn E. Ottersten

When constructing a credit card fraud detection model, it is very important to extract the right features from transactional data. This is usually done by aggregating the transactions in order to observe the spending behavioral patterns of the customers. In this paper we propose to create a new set of features based on analyzing the periodic behavior of the time of a transaction using the von Mises distribution. Using a real credit card fraud dataset provided by a large European card processing company, we compare state-of-the-art credit card fraud detection models, and evaluate how the different sets of features have an impact on the results. By including the proposed periodic features into the methods, the results show an average increase in savings of 13%. The aforementioned card processing company is currently incorporating the methodology proposed in this paper into their fraud detection system.
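Time of day is circular, so 23:00 and 01:00 are two hours apart even though their arithmetic mean is noon. The sketch below maps hours to angles and computes the circular mean, which is the maximum-likelihood estimate of the location parameter of a von Mises distribution, together with the mean resultant length, which grows with concentration. The function names and the distance feature are assumptions for illustration, not the paper's exact feature definitions.

```python
import math
from typing import Sequence, Tuple


def hour_to_angle(hour: float) -> float:
    """Map an hour of the day in [0, 24) to an angle in radians."""
    return 2.0 * math.pi * (hour % 24.0) / 24.0


def circular_mean_hour(hours: Sequence[float]) -> Tuple[float, float]:
    """Return (mean hour on the 24 h circle, mean resultant length in [0, 1])."""
    s = sum(math.sin(hour_to_angle(h)) for h in hours)
    c = sum(math.cos(hour_to_angle(h)) for h in hours)
    mean_angle = math.atan2(s, c) % (2.0 * math.pi)
    resultant = math.hypot(s, c) / len(hours)
    return mean_angle * 24.0 / (2.0 * math.pi), resultant


def hour_distance(a: float, b: float) -> float:
    """Shortest distance between two hours on the 24 h circle."""
    d = abs(a - b) % 24.0
    return min(d, 24.0 - d)


# A customer who usually buys around midnight (hypothetical history).
history = [23.0, 1.0]
mean_hour, concentration = circular_mean_hour(history)
# Candidate feature: how far a new 14:00 transaction is from the usual time.
print(hour_distance(14.0, mean_hour))
```

A resultant length close to 1 means the customer's transaction times are tightly clustered, so a large distance from the mean hour is more informative for that customer.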

2016 APWG Symposium on Electronic Crime Research (eCrime) | 2016

Knowing your enemies: leveraging data analysis to expose phishing patterns against a major US financial institution

Javier Vargas; Alejandro Correa Bahnsen; Sergio Villegas; Daniel Ingevaldson

Phishing attacks against financial institutions constitute a major concern and force them to invest thousands of dollars annually in prevention, detection and takedown of these kinds of attacks. This operation is so massive and time critical that there is usually no time to perform analysis to look for patterns and correlations between attacks. In this work we summarize our findings after applying data analysis and clustering analysis to the record of attacks registered for a major financial institution in the US. We use HTML structure and content analysis, as well as domain registration records and DNS RRSets information of the sites, in order to look for patterns and correlations between phishing attacks. It is shown that by understanding and clustering the different types of phishing sites, we are able to identify different strategies used by criminal organizations. Furthermore, the findings of this study provide us with valuable insight into who is targeting the institution and their modus operandi, which gives us a solid foundation for the construction of more and better tools for detection and takedown, and eventually for forensic analysts who will be able to correlate cases and perform focused searches that speed up their investigations.

2017 APWG Symposium on Electronic Crime Research (eCrime) | 2017

Classifying phishing URLs using recurrent neural networks

Alejandro Correa Bahnsen; Eduardo Contreras Bohorquez; Sergio Villegas; Javier Vargas; Fabio A. González

As the technical skills and costs associated with the deployment of phishing attacks decrease, we are witnessing an unprecedented level of scams that push the need for better methods to proactively detect phishing threats. In this work, we explored the use of URLs as input for machine learning models for phishing site prediction. We compared a feature-engineering approach followed by a random forest classifier against a novel method based on recurrent neural networks. We determined that the recurrent neural network approach provides an accuracy rate of 98.7% without the need for manual feature creation, outperforming the random forest method by 5%. This means it is a scalable and fast-acting proactive detection system that does not require full content analysis.
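Feeding a raw URL to a recurrent network requires turning the string into a fixed-length sequence of integers first. The sketch below shows one common way to do that: a character-to-index lookup with left padding. The vocabulary, the reserved ids and the default length are assumptions for illustration; the abstract does not describe the encoding actually used.

```python
import string
from typing import List

# Assumed vocabulary: lowercase letters, digits and ASCII punctuation.
VOCAB = string.ascii_lowercase + string.digits + string.punctuation
PAD_ID = 0      # reserved for padding
UNKNOWN_ID = 1  # reserved for characters outside the vocabulary
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(VOCAB)}


def encode_url(url: str, max_len: int = 100) -> List[int]:
    """Encode a URL as a left-padded list of character ids of length max_len."""
    ids = [CHAR_TO_ID.get(ch, UNKNOWN_ID) for ch in url.lower()[:max_len]]
    return [PAD_ID] * (max_len - len(ids)) + ids


encoded = encode_url("http://example.com/login", max_len=32)
print(len(encoded), encoded)
```

The resulting sequences would typically pass through an embedding layer and then a recurrent layer. Because no hand-crafted features are involved, the same encoder applies unchanged to any URL.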

SIAM International Conference on Data Mining | 2014