Florent Masseglia
French Institute for Research in Computer Science and Automation
Publications
Featured research published by Florent Masseglia.
Data and Knowledge Engineering | 2003
Florent Masseglia; Pascal Poncelet; Maguelonne Teisseire
In this paper, we consider the problem of incrementally mining sequential patterns when new transactions or new customers are added to an original database. We present a new algorithm for mining frequent sequences that uses information collected during an earlier mining process to cut down the cost of finding new sequential patterns in the updated database. Our tests show that the algorithm performs significantly faster than the naive approach of mining the whole updated database from scratch. The difference is so pronounced that the algorithm could also be useful for ordinary (non-incremental) sequential pattern mining: in many cases it is faster to break the database down into an original database plus an increment and apply our algorithm than to mine sequential patterns with a standard algorithm.
ACM SIGWEB Newsletter | 1999
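The benefit of reusing earlier results can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: it only covers the simplest case, where the increment consists of new customers, so a support count cached from the earlier run can be updated by scanning the increment alone. The names `contains`, `support_count`, and the toy data are assumptions made for illustration.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]
DataSequence = List[Itemset]  # one customer's transactions, in time order


def contains(data_seq: DataSequence, pattern: List[Itemset]) -> bool:
    """Return True if every itemset of `pattern` is included, in order,
    in a distinct transaction of `data_seq` (earliest-match scan)."""
    pos = 0
    for wanted in pattern:
        while pos < len(data_seq) and not wanted <= data_seq[pos]:
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1  # the next pattern itemset must match a later transaction
    return True


def support_count(db: List[DataSequence], pattern: List[Itemset]) -> int:
    """Number of customers whose data sequence contains `pattern`."""
    return sum(contains(seq, pattern) for seq in db)


fs = frozenset
original = [
    [fs({"a"}), fs({"b"})],
    [fs({"a"}), fs({"c"})],
    [fs({"a", "b"}), fs({"b"})],
]
increment = [  # new customers only (assumption of this sketch)
    [fs({"a"}), fs({"b", "c"})],
    [fs({"c"})],
]
pattern = [fs({"a"}), fs({"b"})]

cached = support_count(original, pattern)              # kept from the earlier run
updated = cached + support_count(increment, pattern)   # scans the increment only
full_rescan = support_count(original + increment, pattern)
```

When new transactions are appended to existing customers, counts can no longer be added this way, which is part of what makes the general incremental problem harder.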
Florent Masseglia; Pascal Poncelet; Maguelonne Teisseire
With the growing popularity of the World Wide Web (Web), large volumes of data such as user addresses or requested URLs are gathered automatically by Web servers and collected in access log files. Discovering relationships and global patterns that exist in such files can provide significant and useful information for performance enhancement, restructuring a Web site for increased effectiveness, and customer targeting in electronic commerce. In this paper, we propose an integrated system (WebTool) for applying data mining techniques such as association rules or sequential patterns to access log files. Once interesting patterns are discovered, we illustrate how they can be used to customize the server hypertext organization dynamically.
Data Mining and Knowledge Discovery | 2008
Florent Masseglia; Pascal Poncelet; Maguelonne Teisseire; Alice Marascu
Existing Web usage mining techniques are currently based on an arbitrary division of the data (e.g. “one log per month”) or guided by presumed results (e.g. “what is the customers’ behaviour for the period of Christmas purchases?”). These approaches have two main drawbacks. First, they depend on the above-mentioned arbitrary organization of data. Second, they cannot automatically extract “seasonal peaks” from among the stored data. In this paper, we propose a specific data mining process (in particular, to extract frequent behaviour patterns) in order to reveal the densest periods automatically. From the whole set of possible combinations, our method extracts the frequent sequential patterns related to the extracted periods. A period is considered to be dense if it contains at least one frequent sequential pattern for the set of users connected to the website in that period. Our experiments show that the extracted periods are relevant and our approach is able to extract both frequent sequential patterns and the associated dense periods.
Intelligent Information Systems | 2006
Alice Marascu; Florent Masseglia
In recent years, emerging applications have introduced new constraints for data mining methods. These constraints are typical of a new kind of data: data streams. In data stream processing, memory usage is restricted, new elements are generated continuously and have to be considered in linear time, no blocking operator can be performed, and the data can be examined only once. At this time, only a few methods have been proposed for mining sequential patterns in data streams. We argue that the main reason is the combinatorial phenomenon related to sequential pattern mining. In this paper, we propose an algorithm based on sequence alignment for mining approximate sequential patterns in Web usage data streams. To meet the constraint of one scan, a greedy clustering algorithm associated with an alignment method is proposed. We show that our proposal is able to extract relevant sequences with very low thresholds.
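The one-scan constraint can be illustrated with a rough Python sketch of greedy clustering over a stream of navigation sequences. This is not the method from the paper: the similarity measure (longest common subsequence length divided by the longer sequence length), the use of the first member as cluster representative, and the threshold value are all assumptions chosen to keep the example short.

```python
from typing import List

Sequence_ = List[str]  # one user's ordered page requests


def lcs_length(a: Sequence_, b: Sequence_) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(a: Sequence_, b: Sequence_) -> float:
    """LCS length normalised by the longer sequence; 0.0 for empty input."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def greedy_cluster(stream, threshold: float) -> List[List[Sequence_]]:
    """Assign each incoming sequence, seen exactly once, to the most similar
    existing cluster, or open a new cluster if none reaches `threshold`."""
    clusters: List[List[Sequence_]] = []
    for seq in stream:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = similarity(seq, cluster[0])  # first member acts as representative
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([seq])
        else:
            best.append(seq)
    return clusters


stream = [
    ["home", "news", "sports"],
    ["home", "news", "weather"],
    ["login", "cart", "pay"],
    ["login", "cart", "help", "pay"],
]
clusters = greedy_cluster(stream, threshold=0.6)
```

Each sequence is compared only with one representative per cluster and is never revisited, which keeps the pass over the stream single-scan.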
Expert Systems With Applications | 2009
Florent Masseglia; Pascal Poncelet; Maguelonne Teisseire
In this paper we consider the problem of discovering sequential patterns by handling time constraints as defined in the GSP algorithm. While sequential patterns could be seen as temporal relationships between facts embedded in the database, where the facts considered are merely characteristics of individuals or observations of individual behavior, generalized sequential patterns aim to provide the end user with a more flexible handling of the transactions embedded in the database. We thus propose a new efficient algorithm, called GTC (Graph for Time Constraints), for mining such patterns in very large databases. It is based on the idea that handling time constraints at an early stage of the data mining process can be highly beneficial. One of the most significant new features of our approach is that the handling of time constraints can easily be taken into account in traditional levelwise approaches, since it is carried out prior to and separately from the counting step of a data sequence. Our tests show that the proposed algorithm performs significantly faster than a state-of-the-art sequence mining algorithm.
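To make the time constraints concrete, here is a minimal Python sketch that checks whether one timestamped data sequence supports a pattern under a minimum and a maximum gap between consecutive pattern elements. It is a direct check, not the GTC graph construction, and it omits the sliding window that GSP also defines. The function name `supports` and the sample data are hypothetical, and `min_gap` is assumed to be non-negative.

```python
from typing import FrozenSet, List, Optional, Tuple

Transaction = Tuple[int, FrozenSet[str]]  # (timestamp, items)


def supports(data_seq: List[Transaction],
             pattern: List[FrozenSet[str]],
             min_gap: int,
             max_gap: int) -> bool:
    """True if `pattern` occurs in `data_seq` with every gap between
    consecutive matched transactions satisfying min_gap < gap <= max_gap.
    Backtracking is needed because the earliest match may violate max_gap."""

    def search(p_idx: int, prev_time: Optional[int]) -> bool:
        if p_idx == len(pattern):
            return True
        for t, items in data_seq:
            if prev_time is not None:
                gap = t - prev_time
                if gap <= min_gap or gap > max_gap:
                    continue
            if pattern[p_idx] <= items and search(p_idx + 1, t):
                return True
        return False

    return search(0, None)


fs = frozenset
data_seq = [(1, fs({"a"})), (8, fs({"a"})), (10, fs({"b"}))]
pattern = [fs({"a"}), fs({"b"})]

ok = supports(data_seq, pattern, min_gap=0, max_gap=5)        # uses the "a" at t=8
too_tight = supports(data_seq, pattern, min_gap=0, max_gap=1)
too_close = supports(data_seq, pattern, min_gap=2, max_gap=5)
```

The first call succeeds only because the search backtracks from the "a" at time 1 to the "a" at time 8; an earliest-match scan would wrongly reject the sequence.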
Archive | 2007
Pascal Poncelet; Florent Masseglia; Maguelonne Teisseire
The problem of mining patterns has become a very active research area, and efficient techniques have been widely applied to problems in industry, government, and science. Starting from its initial definition and motivated by real applications, the problem of mining patterns now addresses not only the discovery of itemsets but also increasingly complex patterns. Successes and New Directions in Data Mining addresses existing solutions for data mining, with particular emphasis on potential real-world applications. Capturing defining research on topics such as fuzzy set theory, clustering algorithms, semi-supervised clustering, modeling and managing data mining patterns, and sequence motif mining, this book is an indispensable resource for library collections.
International Symposium on Temporal Representation and Reasoning | 2004
Florent Masseglia; Pascal Poncelet; Maguelonne Teisseire
In this paper we consider the problem of discovering sequential patterns by handling time constraints. While sequential patterns could be seen as temporal relationships between facts embedded in the database, generalized sequential patterns aim to provide the end user with a more flexible handling of the transactions embedded in the database. We propose a new efficient algorithm, called GTC (Graph for Time Constraints), for mining such patterns in very large databases. It is based on the idea that handling time constraints at an early stage of the algorithm can be highly beneficial, since it minimizes computational costs by preprocessing data sequences. Our tests show that the proposed algorithm performs significantly faster than a state-of-the-art sequence mining algorithm.
Knowledge and Information Systems | 2011
Bashar Saleh; Florent Masseglia
One of the most popular problems in usage mining is the discovery of frequent behaviors. It relies on the extraction of frequent itemsets from usage databases. However, those databases are usually considered as a whole, and therefore, itemsets are extracted over the entire set of records. Our claim is that possible subsets, hidden within the structure of the data and containing relevant itemsets, may exist. These subsets, as well as the itemsets they contain, depend on the context. Time is an essential element of the context. The users’ intents will differ from one period to another. Behaviors over Christmas will be different from those extracted during the summer. Unfortunately, these periods might be lost because of arbitrary divisions of the data. The goal of our work is to find itemsets that are frequent over a specific period, but would not be extracted by traditional methods since their support is very low over the whole dataset. We introduce the definition of solid itemsets, which represent coherent and compact behaviors over specific periods, and we propose Sim, an algorithm for their extraction.
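The gap between global and period-specific support can be shown with a small Python sketch. It does not implement Sim or the formal definition of solid itemsets; it only compares the support of one itemset over the whole dataset with its support inside a chosen period. The record layout (day number, item set), the helper names, and the toy data are assumptions.

```python
from typing import FrozenSet, List, Tuple

Record = Tuple[int, FrozenSet[str]]  # (day, items used or bought on that day)


def support(records: List[Record], itemset: FrozenSet[str]) -> float:
    """Fraction of records that contain every item of `itemset`."""
    if not records:
        return 0.0
    return sum(itemset <= items for _, items in records) / len(records)


def in_period(records: List[Record], start: int, end: int) -> List[Record]:
    """Records whose day lies in the closed interval [start, end]."""
    return [(day, items) for day, items in records if start <= day <= end]


fs = frozenset
records = [
    (1, fs({"bread"})), (2, fs({"milk"})), (3, fs({"bread", "milk"})),
    (4, fs({"milk"})), (5, fs({"bread"})), (6, fs({"milk"})),
    (7, fs({"bread"})), (8, fs({"gift", "wrap"})),
    (9, fs({"gift", "wrap", "milk"})), (10, fs({"gift"})),
]
target = fs({"gift", "wrap"})

global_support = support(records, target)                    # 2 of 10 records
period_support = support(in_period(records, 8, 10), target)  # 2 of 3 records
```

With a minimum support of, say, 50%, the itemset would be missed over the whole dataset but found within days 8 to 10; discovering such periods automatically, rather than fixing them by hand as here, is the problem the abstract describes.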
Archive | 2007
Pascal Poncelet; Florent Masseglia; Maguelonne Teisseire
Since the introduction of the Apriori algorithm a decade ago, the problem of mining patterns has become a very active research area, and efficient techniques have been widely applied to problems in both industry and science. Currently, the data mining community is focusing on new problems such as mining new kinds of patterns, mining patterns under constraints, considering new kinds of complex data, and real-world applications of these concepts. Data Mining Patterns: New Methods and Applications provides an overall view of recent solutions for mining, and also explores new kinds of patterns. This book offers theoretical frameworks and presents challenges and their possible solutions concerning pattern extraction, emphasizing both research techniques and real-world applications. It portrays research applications in data models, techniques and methodologies for mining patterns, multi-relational and multidimensional pattern mining, fuzzy data mining, data streaming, incremental mining, and many other topics.
Asia-Pacific Web Conference | 2004
Florent Masseglia; Doru Tanasa; Brigitte Trousse
The goal of this work is to increase the relevance and the interestingness of patterns discovered by a Web Usage Mining process. Indeed, the sequential patterns extracted from Web log files, unless they are found under constraints, often lack interest because of their obvious content. Our goal is to discover coherent behaviors of minority users that we want to be aware of (such as hacking activities on the Web site or a user's activity limited to a specific part of the Web site). By means of a clustering method applied to the extracted sequential patterns, we propose a recursive division of the problem. The clustering method we developed is based on pattern summaries and neural networks. Our experiments show that we obtain the targeted patterns, whereas their extraction by means of a classical process is impossible because of their very weak support (down to 0.006%). The diversity of users' behaviors is so large that the minority ones are both numerous and difficult to locate.