Roberto Saia
University of Cagliari
Publications
Featured research published by Roberto Saia.
Intelligent Information Systems | 2016
Roberto Saia; Ludovico Boratto; Salvatore Carta
Recommender systems usually suggest items by exploiting all the previous interactions of the users with a system (e.g., to decide which movies to recommend to a user, all the movies she previously purchased are considered). This canonical approach can sometimes lead to wrong results due to several factors, such as a change in user preferences over time, or the use of her account by third parties. This kind of incoherence in the user profiles defines a lower bound on the error that recommender systems can achieve when they generate suggestions for a user, an aspect known in the literature as the magic barrier. This paper proposes a novel dynamic coherence-based approach to define the user profile used in the recommendation process. The main aim is to identify and remove, from the previously evaluated items, those not semantically coherent with the others, in order to make the user profile as close as possible to the user's real preferences and solve the aforementioned problems. Moreover, reshaping the user profile in this way brings great advantages in terms of computational complexity, since the number of items considered during the recommendation process is strongly reduced. The performed experiments show the effectiveness of our approach at removing the incoherent items from a user profile, increasing the recommendation accuracy.
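A minimal sketch of the pruning idea, under assumptions the abstract does not spell out: items are represented as embedding vectors, semantic coherence is measured as mean pairwise cosine similarity, and a fixed threshold decides which items stay in the profile. The function name and threshold are illustrative, not the paper's exact model.

```python
import numpy as np

def prune_incoherent_items(item_vectors, threshold=0.3):
    """Keep only items whose mean cosine similarity to the rest
    of the profile is at least `threshold` (assumed decision rule)."""
    V = np.asarray(item_vectors, dtype=float)
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    U = V / np.clip(norms, 1e-12, None)         # unit vectors
    sims = U @ U.T                              # pairwise cosine similarities
    np.fill_diagonal(sims, 0.0)                 # ignore self-similarity
    mean_sim = sims.sum(axis=1) / (len(V) - 1)  # mean similarity to the others
    return [i for i, keep in enumerate(mean_sim >= threshold) if keep]

profile = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.0, 1.0]]  # last item is an outlier
print(prune_incoherent_items(profile))  # -> [0, 1, 2]
```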
Knowledge Based Systems | 2016
Ludovico Boratto; Salvatore Carta; Gianni Fenu; Roberto Saia
Modeling user behavior to detect segments of users to target with ads (behavioral targeting) is a problem widely studied in the literature. Various sources of data, such as the queries issued by the users, are mined and modeled in order to detect these segments. In this paper we first show that a user segmentation system needs to employ reliable user preferences, since nearly half of the time users reformulate their queries in order to satisfy their information need. We then propose a method that analyzes the descriptions of the items positively evaluated by the users and extracts a vector representation of the words in these descriptions (word embeddings). Since it is widely known that users tend to choose items of the same categories, our approach is designed to avoid the so-called preference stability, which would associate the users with trivial segments. Moreover, we make sure that the generated segments remain interpretable for the advertisers who will use them. We performed different sets of experiments on a large real-world dataset, which validated our approach and showed its capability to produce effective segments.
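A hedged sketch of this pipeline: each user is represented by the average embedding of the words in the descriptions of her positively rated items, and users are then clustered into segments. The toy embedding table and the choice of KMeans are assumptions standing in for the paper's trained word embeddings and segmentation step.

```python
import numpy as np
from sklearn.cluster import KMeans

embeddings = {  # toy word vectors; in practice, trained word embeddings
    "thriller": [0.9, 0.1], "crime": [0.8, 0.2],
    "romance":  [0.1, 0.9], "comedy": [0.2, 0.8],
}

def user_vector(liked_descriptions):
    """Average embedding of the known words in the liked items' descriptions."""
    words = [w for d in liked_descriptions for w in d.split() if w in embeddings]
    return np.mean([embeddings[w] for w in words], axis=0)

users = {
    "u1": ["dark thriller", "crime thriller"],
    "u2": ["crime story", "thriller"],
    "u3": ["light romance", "romance comedy"],
}
X = np.array([user_vector(d) for d in users.values()])
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(users, segments)))  # u1 and u2 land in the same segment
```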
Neurocomputing | 2017
Ludovico Boratto; Salvatore Carta; Gianni Fenu; Roberto Saia
Recommender systems suggest items by exploiting the interactions of the users with the system (e.g., the choice of the movies to recommend to a user is based on those she previously evaluated). In particular, content-based systems suggest items whose content is similar to that of the items evaluated by a user. An emerging direction in content-based recommender systems is to consider the semantics behind an item description, in order to disambiguate the words in the description and improve the recommendation accuracy. However, different phenomena, such as changes in the preferences of a user over time or the use of her account by third parties, might affect the accuracy by introducing items that do not reflect the actual user preferences. Starting from an analysis of the literature and of an architecture proposed in a recent survey, in this paper we first highlight the current limits in this research area, then we propose design guidelines and an improved architecture to build semantics-aware content-based recommendations.
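A toy illustration of why disambiguation matters for content-based matching: descriptions are compared as sets of word senses instead of raw words, so an ambiguous term only matches when it carries the same meaning. The hand-made sense inventory below is an assumption standing in for a real word sense disambiguation step.

```python
sense_of = {  # assumed (word, context) -> sense id mapping
    ("bank", "river"):   "bank/shore",
    ("bank", "finance"): "bank/institution",
    ("loan", "finance"): "loan/credit",
}

def senses(words, context):
    """Map words to sense ids; unknown words pass through unchanged."""
    return {sense_of.get((w, context), w) for w in words}

item_a = senses(["bank", "loan"], "finance")
item_b = senses(["bank"], "river")
item_c = senses(["bank"], "finance")
print(item_a & item_b)  # set(): same word, different meaning, no match
print(item_a & item_c)  # {'bank/institution'}: a true semantic match
```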
International Congress on Big Data | 2015
Roberto Saia; Ludovico Boratto; Salvatore Carta
Target definition is a process aimed at partitioning the potential audience of an advertiser into several classes, according to specific criteria. Almost all the existing approaches take into account only the explicit preferences of the users, without considering the hidden semantics embedded in their choices, so the target definition suffers from widely known problems. One of the most important is that easily understandable segments are too trivial to be effective for marketing purposes, whereas more complex segmentations are hard to understand. In this paper we propose a novel segmentation strategy able to uncover the implicit preferences of the users, by studying the semantic overlap between the classes of items they positively evaluated and the remaining classes. The main advantages of our proposal are that the desired target can be specified by the advertiser, and that each set of users is easily described by the class of items that characterizes it. This means that the complexity of the semantic analysis is hidden from the advertiser, and that we obtain an interpretable and non-trivial user segmentation, built by using reliable information. Experimental results confirm the effectiveness of our approach in the generation of the target audience.
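An illustrative sketch of uncovering implicit preferences through class overlap, under two assumptions not taken from the paper: each class is modeled as a set of descriptive terms, and the overlap is measured with Jaccard similarity against a fixed threshold.

```python
def jaccard(a, b):
    return len(a & b) / len(a | b)

class_terms = {  # assumed term sets describing each item class
    "action":    {"fight", "chase", "explosion", "hero"},
    "adventure": {"journey", "hero", "quest", "chase"},
    "romance":   {"love", "couple", "wedding"},
}

def implicit_classes(liked_classes, threshold=0.3):
    """Classes never chosen explicitly but semantically close
    to those the user evaluated positively."""
    liked_terms = set().union(*(class_terms[c] for c in liked_classes))
    return {c for c in class_terms
            if c not in liked_classes
            and jaccard(liked_terms, class_terms[c]) >= threshold}

print(implicit_classes({"action"}))  # -> {'adventure'}
```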
Future Generation Computer Systems | 2016
Roberto Saia; Ludovico Boratto; Salvatore Carta; Gianni Fenu
Behavioral targeting is the process of addressing ads to a specific set of users. The set of target users is detected from a segmentation of the user set, based on their interactions with the website (pages visited, items purchased, etc.). Recently, in order to improve the segmentation process, the semantics behind the user behavior has been exploited, by analyzing the queries issued by the users. However, nearly half of the time users need to reformulate their queries in order to satisfy their information need. In this paper, we tackle the problem of semantic behavioral targeting with reliable user preferences, by performing a semantic analysis on the descriptions of the items positively rated by the users. We also consider widely known problems, such as the interpretability of a segment, and the fact that user preferences are usually stable over time, which could lead to a trivial segmentation. In order to overcome these issues, our approach allows an advertiser to automatically extract a user segment by specifying the interests she/he wants to target, by means of a novel boolean algebra; the segments are composed of users whose evaluated items are semantically related to these interests. This leads to interpretable and non-trivial segments, built by using reliable information. Experimental results confirm the effectiveness of our approach at producing user segments. Highlights: we propose a novel segmentation approach for user targeting, based on a semantic analysis of the items evaluated by a user; through the semantic analysis we extend the ground truth, to generate non-trivial segments; with respect to classic segmentation, the advertiser can introduce constraints and atomically model the user segments.
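A minimal sketch of the advertiser-facing boolean algebra: each interest resolves to the set of users semantically related to it, and ordinary set operations combine those sets into a segment. The keyword-containment matching rule below is a placeholder assumption for the paper's semantic analysis.

```python
users_liked_text = {  # toy descriptions of each user's positively rated items
    "u1": "sport shoes running marathon",
    "u2": "running watch heart rate",
    "u3": "cooking recipes kitchen",
}

def segment(interest):
    """Users whose positively rated items relate to `interest`
    (here: naive substring match, standing in for semantic relatedness)."""
    return {u for u, text in users_liked_text.items() if interest in text}

# Advertiser query: (running AND sport) OR cooking, excluding watch owners
target = (segment("running") & segment("sport")) | segment("cooking")
target -= segment("watch")
print(sorted(target))  # -> ['u1', 'u3']
```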
The Internet of Things | 2017
Roberto Saia; Salvatore Carta
Nowadays, the prevention of credit card fraud represents a crucial task, since almost all the operators in the E-commerce environment accept payments made through credit cards, aware that some of them could be fraudulent. The development of approaches able to face this problem effectively represents a hard challenge, due to several issues. The most important among them are the heterogeneity and the imbalanced class distribution of the data, problems that reduce the effectiveness of the most widely used techniques and make it difficult to define effective models for evaluating new transactions. This paper proposes a new strategy to face the aforementioned problems, based on a model that uses the Discrete Fourier Transform in order to exploit frequency patterns, instead of the canonical ones, in the evaluation process. Such an approach presents several advantages: it faces the imbalanced class distribution and the cold-start issues by involving only the past legitimate transactions, and it reduces the data heterogeneity problem thanks to the frequency-domain-based data representation, which is less influenced by data variations. A practical implementation of the proposed approach is given by presenting an algorithm able to classify a new transaction as reliable or unreliable on the basis of the aforementioned strategy.
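A hedged sketch of the frequency-domain evaluation: each transaction's numeric feature vector is moved to the frequency domain with the DFT, and a new transaction is compared against the magnitude spectra of past legitimate transactions only. The centroid comparison, the threshold, and the toy features are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def spectrum(features):
    """Magnitude spectrum of a transaction's feature vector."""
    return np.abs(np.fft.fft(np.asarray(features, dtype=float)))

def is_reliable(new_tx, legitimate_txs, threshold=1.5):
    """Classify `new_tx` by its distance from the centroid of the
    legitimate transactions' frequency-domain representations."""
    legit_spectra = np.array([spectrum(t) for t in legitimate_txs])
    centroid = legit_spectra.mean(axis=0)
    return np.linalg.norm(spectrum(new_tx) - centroid) <= threshold

legit = [[10.0, 1, 3, 0.5], [12.0, 1, 3, 0.4], [11.0, 2, 3, 0.5]]
print(is_reliable([11.5, 1, 3, 0.5], legit))   # True: close to past behaviour
print(is_reliable([900.0, 9, 0, 7.0], legit))  # False: anomalous spectrum
```

Note that only legitimate history is needed, which is how this kind of model sidesteps the class imbalance the abstract describes.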
Science and Information Conference | 2015
Roberto Saia; Ludovico Boratto; Salvatore Carta
Recommender systems suggest items that might interest the users. The recommendation process is usually performed at the level of a single item, i.e., for each item not evaluated by a user, classic approaches look for the rating given by similar users for that item, or for an item with similar content. This leads to the so-called overspecialization/serendipity problem, in which the recommended items are trivial and users do not come across surprising items. In this paper we first show that the preferences of the users are actually distributed over a small set of classes of items, leading the recommended items to be too similar to the ones already evaluated. We also present a novel representation model, named Class Path Information (CPI), able to express the current and future preferences of the users in terms of a ranked set of classes of items. Our approach to user preference modeling is based on a semantic analysis of the items evaluated by the users, in order to extend the ground truth and predict where the future preferences of the users will go. Experimental results show that our approach, by including in the CPI model the same classes predicted by a state-of-the-art recommender system, is able to accurately model the preferences of the users in terms of classes rather than single items, allowing recommender systems to suggest non-trivial items.
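A minimal CPI-style sketch: evaluated items are mapped to classes, the class counts are expanded with semantically related classes (the "extended ground truth"), and the result is a ranked class list. The related-class map and the expansion weight are assumptions made for illustration, not the paper's exact model.

```python
from collections import Counter

related = {  # assumed semantic relatedness between item classes
    "sci-fi": ["fantasy"],
    "crime":  ["thriller"],
}

def cpi_ranking(evaluated_item_classes, expansion_weight=0.5):
    """Rank classes by explicit counts plus weighted related-class expansion."""
    counts = Counter(evaluated_item_classes)
    scores = dict(counts)
    for cls, n in counts.items():              # extend the ground truth
        for rel in related.get(cls, []):
            scores[rel] = scores.get(rel, 0) + expansion_weight * n
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(cpi_ranking(["sci-fi", "sci-fi", "crime"]))
# -> [('sci-fi', 2), ('crime', 1), ('fantasy', 1.0), ('thriller', 0.5)]
```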
International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | 2015
Roberto Saia; Ludovico Boratto; Salvatore Carta
The exponential and rapid growth of E-commerce, based both on the new opportunities offered by the Internet and on the spread of debit and credit cards in online purchases, has strongly increased the number of frauds, causing large economic losses to the involved businesses. The design of effective strategies to face this problem is particularly challenging, due to several factors, such as the heterogeneity and the non-stationary distribution of the data stream, as well as the presence of an imbalanced class distribution. To complicate the problem, there is the scarcity of public datasets due to confidentiality issues, which prevents researchers from verifying new strategies in many data contexts. Differently from the canonical state-of-the-art strategies, instead of defining a unique model based on the past transactions of the users, we follow a Divide and Conquer strategy, defining multiple models (user behavioral patterns), which we exploit to evaluate a new transaction and detect potential attempts of fraud. We can act on some parameters of this process in order to adapt the sensitivity of the models to the operating environment. Considering that our models do not need to be trained on both the past legitimate and fraudulent transactions of a user, since they use only the legitimate ones, we can operate in a proactive manner, detecting fraudulent transactions that have never occurred in the past. Such an approach also overcomes the data imbalance problem that afflicts machine learning approaches. The proposed approach is evaluated by comparing it with one of the best performing approaches in the state of the art, Random Forests, using a real-world credit card dataset.
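An illustrative sketch of the divide-and-conquer idea: several simple models per user (here, one tolerated value range per feature, learned only from legitimate transactions), a sensitivity parameter that widens or narrows those ranges, and an agreement rule over the models. The range models and the violation rule are assumptions made for this sketch.

```python
import numpy as np

class UserPatterns:
    """Per-user behavioral models built from legitimate transactions only."""

    def __init__(self, legit_txs, sensitivity=1.0):
        X = np.asarray(legit_txs, dtype=float)
        mu, sd = X.mean(axis=0), X.std(axis=0)
        self.low = mu - sensitivity * sd   # one range model per feature
        self.high = mu + sensitivity * sd

    def is_fraudulent(self, tx, max_violations=1):
        """Flag a transaction when too many models reject it."""
        tx = np.asarray(tx, dtype=float)
        violations = np.sum((tx < self.low) | (tx > self.high))
        return violations > max_violations

patterns = UserPatterns([[20, 1, 5], [25, 1, 6], [22, 2, 5]], sensitivity=2.0)
print(patterns.is_fraudulent([23, 1, 5]))   # False: fits the user's behaviour
print(patterns.is_fraudulent([500, 9, 0]))  # True: never seen in the past
```

Tuning `sensitivity` and `max_violations` mirrors the abstract's point about adapting the models' sensitivity to the operating environment.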
The Internet of Things | 2018
Roberto Saia
The problem of fraud is becoming increasingly important in this E-commerce age, where an enormous number of financial transactions are carried out by using electronic instruments of payment such as credit cards. In this scenario it is not possible to adopt human-driven solutions, due to the huge number of involved operations. The only viable approach is to adopt automatic solutions able to discern the legitimate transactions from the fraudulent ones. For this reason, the development of techniques capable of carrying out this task efficiently represents a very active research field that involves a large number of researchers around the world. Unfortunately, this is not an easy task, since the definition of effective fraud detection approaches is made difficult by a series of well-known problems, the most important of them being the imbalanced class distribution of the data, which leads to a significant reduction in the performance of machine learning approaches. This limitation is addressed by the approach proposed in this paper, which exploits three different metrics of similarity in order to define a three-dimensional evaluation space. Its main objective is a better characterization of the financial transactions in terms of the two possible target classes (legitimate or fraudulent), facing the information asymmetry that gives rise to the problem previously exposed. A series of experiments conducted on real-world data of different sizes and imbalance levels demonstrates the effectiveness of the proposed approach with respect to state-of-the-art solutions.
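A hedged sketch of the three-metric evaluation space: a new transaction is placed in a 3D space whose axes are three similarities to the centroid of legitimate transactions. The specific metrics chosen here (cosine, a distance mapped to (0, 1], and Pearson correlation) and the sum-threshold decision rule are assumptions for illustration; the paper may use different metrics and a different rule.

```python
import numpy as np

def three_metric_point(tx, legit_centroid):
    """Coordinates of a transaction in the assumed 3D similarity space."""
    tx, c = np.asarray(tx, float), np.asarray(legit_centroid, float)
    cosine = tx @ c / (np.linalg.norm(tx) * np.linalg.norm(c))
    euclid = 1.0 / (1.0 + np.linalg.norm(tx - c))  # map distance to (0, 1]
    pearson = np.corrcoef(tx, c)[0, 1]
    return np.array([cosine, euclid, pearson])

legit = np.array([[10.0, 1, 3], [12.0, 1, 3], [11.0, 2, 3]])
centroid = legit.mean(axis=0)

def is_legitimate(tx, threshold=2.0):
    return three_metric_point(tx, centroid).sum() >= threshold

print(is_legitimate([11.0, 1, 3]))   # True: all three similarities are high
print(is_legitimate([0.5, 9, 90]))   # False: far from legitimate behaviour
```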
International Conference on Digital Signal Processing | 2018
Roberto Saia; Salvatore Carta; Gianni Fenu
Nowadays, the dramatic growth in consumer credit has made methods based on human intervention ineffective for assessing the potential solvency of loan applicants. For this reason, the development of approaches able to automate this operation represents an active and important research area named Credit Scoring. In such a scenario, the design of effective approaches represents a hard challenge, due to a series of well-known problems, such as the data imbalance, the data heterogeneity, and the cold start. The centroid wavelet-based approach proposed in this paper faces these issues by moving the data analysis from its canonical domain to a new time-frequency one, where the evaluation is performed through three different metrics of similarity. Its main objective is to achieve a better characterization of the loan applicants on the basis of the information previously gathered by the Credit Scoring system. The performed experiments demonstrate how this approach outperforms the state-of-the-art solutions.
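A minimal sketch of the time-frequency move: a single-level Haar wavelet step (pairwise averages and differences) applied to an applicant's feature vector before comparison with past applicants. The Haar choice, the even-length toy features, and the single centroid distance are assumptions; the paper combines three similarity metrics rather than one.

```python
import numpy as np

def haar_step(x):
    """One level of the Haar wavelet transform on an even-length vector."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)  # low-frequency content
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)  # high-frequency content
    return np.concatenate([approx, detail])

def credit_score(applicant, past_solvent_applicants):
    """Higher score = closer, in the wavelet domain, to past solvent applicants."""
    centroid = np.mean([haar_step(a) for a in past_solvent_applicants], axis=0)
    return 1.0 / (1.0 + np.linalg.norm(haar_step(applicant) - centroid))

solvent = [[30, 2, 1, 0], [28, 2, 1, 1], [32, 3, 1, 0]]
print(credit_score([29, 2, 1, 0], solvent) > credit_score([5, 9, 0, 7], solvent))  # True
```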