Chedy Raïssi
French Institute for Research in Computer Science and Automation
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Chedy Raïssi.
very large data bases | 2010
Jianneng Cao; Panagiotis Karras; Chedy Raïssi; Kian-Lee Tan
The publication of transaction data, such as market basket data, medical records, and query logs, serves the public benefit. Mining such data allows for the derivation of association rules that connect certain items to others with measurable confidence. Still, this type of data analysis poses a privacy threat; an adversary having partial information on a persons behavior may confidently associate that person to an item deemed to be sensitive. Ideally, an anonymization of such data should lead to an inference-proof version that prevents the association of individuals to sensitive items, while otherwise allowing for truthful associations to be derived. Original approaches to this problem were based on value perturbation, damaging data integrity. Recently, value generalization has been proposed as an alternative; still, approaches based on it have assumed either that all items are equally sensitive, or that some are sensitive and can be known to an adversary only by association, while others are non-sensitive and can be known directly. Yet in reality there is a distinction between sensitive and non-sensitive items, but an adversary may possess information on any of them. Most critically, no antecedent method aims at a clear inference-proof privacy guarantee. In this paper, we propose ρ-uncertainty, the first, to our knowledge, privacy concept that inherently safeguards against sensitive associations without constraining the nature of an adversarys knowledge and without falsifying data. The problem of achieving ρ-uncertainty with low information loss is challenging because it is natural. A trivial solution is to suppress all sensitive items. We develop more sophisticated schemes. In a broad experimental study, we show that the problem is solved non-trivially by a technique that combines generalization and suppression, which also achieves favorable results compared to a baseline perturbation-based scheme.The publication of transaction data, such as market basket data, medical records, and query logs, serves the public benefit. Mining such data allows for the derivation of association rules that connect certain items to others with measurable confidence. Still, this type of data analysis poses a privacy threat; an adversary having partial information on a persons behavior may confidently associate that person to an item deemed to be sensitive. Ideally, an anonymization of such data should lead to an inference-proof version that prevents the association of individuals to sensitive items, while otherwise allowing for truthful associations to be derived. Original approaches to this problem were based on value perturbation, damaging data integrity. Recently, value generalization has been proposed as an alternative; still, approaches based on it have assumed either that all items are equally sensitive, or that some are sensitive and can be known to an adversary only by association, while others are non-sensitive and can be known directly. Yet in reality there is a distinction between sensitive and non-sensitive items, but an adversary may possess information on any of them. Most critically, no antecedent method aims at a clear inference-proof privacy guarantee. In this paper, we propose ρ-uncertainty, the first, to our knowledge, privacy concept that inherently safeguards against sensitive associations without constraining the nature of an adversarys knowledge and without falsifying data. The problem of achieving ρ-uncertainty with low information loss is challenging because it is natural. A trivial solution is to suppress all sensitive items. We develop more sophisticated schemes. In a broad experimental study, we show that the problem is solved non-trivially by a technique that combines generalization and suppression, which also achieves favorable results compared to a baseline perturbation-based scheme.
international conference on data mining | 2011
Arnaud Soulet; Chedy Raïssi; Marc Plantevit; Bruno Crémilleux
Pattern discovery is at the core of numerous data mining tasks. Although many methods focus on efficiency in pattern mining, they still suffer from the problem of choosing a threshold that influences the final extraction result. The goal of our study is to make the results of pattern mining useful from a user-preference point of view. To this end, we integrate into the pattern discovery process the idea of skyline queries in order to mine skyline patterns in a threshold-free manner. Because the skyline patterns satisfy a formal property of dominations, they not only have a global interest but also have semantics that are easily understood by the user. In this work, we first establish theoretical relationships between pattern condensed representations and skyline pattern mining. We also show that it is possible to compute automatically a subset of measures involved in the user query which allows the patterns to be condensed and thus facilitates the computation of the skyline patterns. This forms the basis for a novel approach to mining skyline patterns. We illustrate the efficiency of our approach over several data sets including a use case from chemo informatics and show that small sets of dominant patterns are produced under various measures.
very large data bases | 2010
Chedy Raïssi; Jian Pei; Thomas Kister
In this paper, we tackle the problem of efficient skycube computation. We introduce a novel approach significantly reducing domination tests for a given subspace and the number of subspaces searched. Technically, we identify two types of skyline points that can be directly derived without using any domination tests. Moreover, based on formal concept analysis, we introduce two closure operators that enable a concise representation of skyline cubes. We show that this concise representation is easy to compute and develop an efficient algorithm, which only needs to search a small portion of the huge search space. We show with empirical results the merits of our approach.
knowledge discovery and data mining | 2012
Mingqiang Xue; Panagiotis Karras; Chedy Raïssi; Jaideep Vaidya; Kian-Lee Tan
Today there is a strong interest in publishing set-valued data in a privacy-preserving manner. Such data associate individuals to sets of values (e.g., preferences, shopping items, symptoms, query logs). In addition, an individual can be associated with a sensitive label (e.g., marital status, religious or political conviction). Anonymizing such data implies ensuring that an adversary should not be able to (1) identify an individuals record, and (2) infer a sensitive label, if such exists. Existing research on this problem either perturbs the data, publishes them in disjoint groups disassociated from their sensitive labels, or generalizes their values by assuming the availability of a generalization hierarchy. In this paper, we propose a novel alternative. Our publication method also puts data in a generalized form, but does not require that published records form disjoint groups and does not assume a hierarchy either; instead, it employs generalized bitmaps and recasts data values in a nonreciprocal manner; formally, the bipartite graph from original to anonymized records does not have to be composed of disjoint complete subgraphs. We configure our schemes to provide popular privacy guarantees while resisting attacks proposed in recent research, and demonstrate experimentally that we gain a clear utility advantage over the previous state of the art.
Data Mining and Knowledge Discovery | 2008
Chedy Raïssi; Toon Calders; Pascal Poncelet
In this paper we aim at extending the non-derivable condensed representation in frequent itemset mining to sequential pattern mining. We start by showing a negative example: in the context of frequent sequences, the notion of non-derivability is meaningless. Therefore, we extend our focus to the mining of conjunctions of sequences. Besides of being of practical importance, this class of patterns has some nice theoretical properties. Based on a new unexploited theoretical definition of equivalence classes for sequential patterns, we are able to extend the notion of a non-derivable itemset to the sequence domain. We present a new depth-first approach to mine non-derivable conjunctive sequential patterns and show its use in mining association rules for sequences. This approach is based on a well known combinatorial theorem: the Möbius inversion. A performance study using both synthetic and real datasets illustrates the efficiency of our mining algorithm. These new introduced patterns have a high-potential for real-life applications, especially for network monitoring and biomedical fields with the ability to get sequential association rules with all the classical statistical metrics such as confidence, conviction, lift etc.
intelligent information systems | 2007
Chedy Raïssi; Pascal Poncelet; Maguelonne Teisseire
Mining frequent patterns on streaming data is a new challenging problem for the data mining community since data arrives sequentially in the form of continuous rapid streams. In this paper we propose a new approach for mining itemsets. Our approach has the following advantages: an efficient representation of items and a novel data structure to maintain frequent patterns coupled with a fast pruning strategy. At any time, users can issue requests for frequent itemsets over an arbitrary time interval. Furthermore our approach produces an approximate answer with an assurance that it will not bypass user-defined frequency and temporal thresholds. Finally the proposed method is analyzed by a series of experiments on different datasets.
international conference on data mining | 2013
Cécile Low-Kam; Chedy Raïssi; Mehdi Kaytoue; Jian Pei
Recent developments in the frequent pattern mining framework uses additional measures of interest to reduce the set of discovered patterns. We introduce a rigorous and efficient approach to mine statistically significant, unexpected patterns in sequences of item sets. The proposed methodology is based on a null model for sequences and on a multiple testing procedure to extract patterns of interest. Experiments on sequences of replays of a video game demonstrate the scalability and the efficiency of the method to discover unexpected game strategies.
data warehousing and knowledge discovery | 2008
Chedy Raïssi; Marc Plantevit
Sequential pattern mining is an active field in the domain of knowledge discovery and has been widely studied for over a decade by data mining researchers. More and more, with the constant progress in hardware and software technologies, real-world applications like network monitoring systems or sensor grids generate huge amount of streaming data. This new data model, seen as a potentially infinite and unbounded flow, calls for new real-time sequence mining algorithms that can handle large volume of information with minimal scans. However, current sequence mining approaches fail to take into account the inherent multidimensionality of the streams and all algorithms merely mine correlations between events among only one dimension. Therefore, in this paper, we propose to take multidimensional framework into account in order to detect high-level changes like trends. We show that multidimensional sequential pattern mining over data streams can help detecting interesting high-level variations. We demonstrate with empirical results that our approach is able to extract multidimensional sequential patterns with an approximate support guarantee over data streams.
international conference on data mining | 2007
Chedy Raïssi; Pascal Poncelet
Sequential pattern mining is an active field in the domain of knowledge discovery. Recently, with the constant progress in hardware technologies, real-world databases tend to grow larger and the hypothesis that a database can be loaded into main-memory for sequential pattern mining purpose is no longer valid. Furthermore, the new model of data as a continuous and potentially infinite flow, known as data stream model, call for a pre-processing step to ease the mining operations. Since the database size is the most influential factor for mining algorithms we examine the use of sampling over static databases to get approximate mining results with an upper bound on the error rate. Moreover, we extend these sampling analysis and present an algorithm based on reservoir sampling to cope with sequential pattern mining over data streams. We demonstrate with empirical results that our sampling methods are efficient and that sequence mining remains accurate over static databases and data streams.
latin american web congress | 2014
Gustavo Nascimento; Manoel Horta Ribeiro; Loïc Cerf; Natália Cesário; Mehdi Kaytoue; Chedy Raïssi; Thiago Vasconcelos; Wagner Meira
In parallel to the exponential growth of the gaming industry, video game live-streaming is rising as a major form of online entertainment. Gathering a heterogeneous community, the popularity of this new media led to the creation of web services just for streaming video games, such as Twitch. TV. In this paper, we propose a model to characterize how streamers and spectators behave, based on their possible actions in Twitch and, using it, we perform a case study on the Star craft II streamers and spectators. In the case study we analyze a large amount of data collected in Twitch. TVs chat in order to better understand how streamers behave, and how this new form of online entertainment is different from previous ones. Based on this analysis, we were able to better understand channel switching, channel surfing, and to create a model for predicting the number of chat messages based on the number of spectators. We were also able to describe behavioral patterns, such as the mass evasion of spectators before the end of a streaming section in a channel.