Rosa Meo
University of Turin
Publication
Featured research published by Rosa Meo.
Data Mining and Knowledge Discovery | 1998
Rosa Meo; Giuseppe Psaila; Stefano Ceri
Data mining evolved as a collection of applicative problems and efficient solution algorithms relative to rather peculiar problems, all focused on the discovery of relevant information hidden in databases of huge dimensions. In particular, one of the most investigated topics is the discovery of association rules. This work proposes a unifying model that enables a uniform description of the problem of discovering association rules. The model provides a SQL-like operator, named X⇒Y, which is capable of expressing all the problems presented so far in the literature concerning the mining of association rules. We demonstrate the expressive power of the new operator by means of several examples, some of which are classical, while others are fully original and correspond to novel and unusual applications. We also present the operational semantics of the operator by means of an extended relational algebra.
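As background for readers new to the topic, a rule X⇒Y is conventionally scored by its support and confidence. The sketch below computes both over a toy set of transactions. It is not the operator described in the abstract; the function names and the sample data are illustrative assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, body, head):
    """Support of body and head together, divided by support of the body."""
    body_support = support(transactions, body)
    if body_support == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return support(transactions, set(body) | set(head)) / body_support


# Hypothetical market-basket data, used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

Here the rule bread⇒milk holds in two of the four transactions, and in two of the three transactions that contain bread.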
international world wide web conferences | 1999
Liliana Ardissono; Anna Goy; Rosa Meo; Giovanna Petrone; Luca Console; Leonardo Lesmo; Carla Simone; Pietro Torasso
With the recent expansion of the Internet, the interest towards electronic sales has quickly grown and many tools have been built to help vendors to set up their Web stores. These tools offer all the facilities for building the store databases and managing the order processing and secure payment transactions, but they typically do not focus on issues like the personalization of the interaction with the customers. However, Web surfers are generally heterogeneous and have different needs and preferences; moreover, the trend of marketing strategies is to pay more and more attention to the specific buyers. So, the importance of personalizing the interaction with the user and the product presentation is increasing. In this paper, we describe the architecture of a configurable virtual Web store supporting personalized hypertextual interactions with users. Our system builds a user profile by applying user modeling techniques and stereotypical information about the characteristics of customer groups; this profile is used during the interaction in order to tailor the product descriptions and the selection of items to recommend to the user's needs, varying the layout of the hypertextual pages and the detail of the descriptions accordingly. Tailoring the system's behavior requires the parallel execution of several complex tasks during the interaction (e.g., identifying the user's preferences, selecting the products most suited to her, dynamically generating the hypertextual pages). Therefore, we have defined a multiagent architecture where these tasks are executed by different agents, which cooperate offering specific services to each other. In our system, the domain‐dependent knowledge, concerning information about products and customer features, is declaratively represented and clearly separated from the domain‐independent components, which represent the core of the virtual store. This separation has the advantage that our architecture can be easily instantiated on several sales domains, therefore obtaining different Web stores out of a single shell. Our system is developed in a Java‐based environment and the overall architecture includes the prototype of a virtual store and the configuration tools which can be used to set up a new store on a specific sales domain.
ACM Transactions on Knowledge Discovery From Data | 2012
Dino Ienco; Ruggero G. Pensa; Rosa Meo
Clustering data described by categorical attributes is a challenging task in data mining applications. Unlike numerical attributes, it is difficult to define a distance between pairs of values of a categorical attribute, since the values are not ordered. In this article, we propose a framework to learn a context-based distance for categorical attributes. The key intuition of this work is that the distance between two values of a categorical attribute Ai can be determined by the way in which the values of the other attributes Aj are distributed in the dataset objects: if they are similarly distributed in the groups of objects in correspondence of the distinct values of Ai a low value of distance is obtained. We propose also a solution to the critical point of the choice of the attributes Aj. We validate our approach by embedding our distance learning framework in a hierarchical clustering algorithm. We applied it on various real world and synthetic datasets, both low and high-dimensional. Experimental results show that our method is competitive with respect to the state of the art of categorical data clustering approaches. We also show that our approach is scalable and has a low impact on the overall computational time of a clustering task.
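The key intuition can be illustrated with a minimal sketch. It assumes a single, hand-picked context attribute and uses the Euclidean distance between the conditional distributions of that attribute; both choices, as well as the function names and the toy rows, are assumptions made for illustration and are not taken from the abstract, which does not state the formula or the context-selection procedure.

```python
from collections import Counter
from math import sqrt


def conditional_distribution(rows, target, context, value):
    """Distribution of the `context` attribute over rows where `target` equals `value`."""
    counts = Counter(r[context] for r in rows if r[target] == value)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def context_distance(rows, target, context, x, y):
    """Assumed distance: Euclidean gap between the two conditional distributions."""
    px = conditional_distribution(rows, target, context, x)
    py = conditional_distribution(rows, target, context, y)
    return sqrt(sum((px.get(v, 0.0) - py.get(v, 0.0)) ** 2 for v in set(px) | set(py)))


# Hypothetical dataset: "color" plays the role of Ai, "shape" the role of Aj.
rows = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "round"},
    {"color": "crimson", "shape": "round"},
    {"color": "crimson", "shape": "round"},
    {"color": "blue", "shape": "square"},
    {"color": "blue", "shape": "square"},
]

near = context_distance(rows, "color", "shape", "red", "crimson")
far = context_distance(rows, "color", "shape", "red", "blue")
```

In this toy data, "red" and "crimson" co-occur with the same shapes and so end up at distance zero, while "red" and "blue" never share a shape and end up far apart.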
intelligent data analysis | 2009
Dino Ienco; Ruggero G. Pensa; Rosa Meo
Clustering data described by categorical attributes is a challenging task in data mining applications. Unlike numerical attributes, it is difficult to define a distance between pairs of values of the same categorical attribute, since they are not ordered. In this paper, we propose a method to learn a context-based distance for categorical attributes. The key intuition of this work is that the distance between two values of a categorical attribute Ai can be determined by the way in which the values of the other attributes Aj are distributed in the dataset objects: if they are similarly distributed in the groups of objects in correspondence of the distinct values of Ai a low value of distance is obtained. We propose also a solution to the critical point of the choice of the attributes Aj. We validate our approach on various real world and synthetic datasets, by embedding our distance learning method in both a partitional and a hierarchical clustering algorithm. Experimental results show that our method is competitive w.r.t. categorical data clustering approaches in the state of the art.
ACM Transactions on Database Systems | 2000
Rosa Meo
A new model to evaluate dependencies in data mining problems is presented and discussed. The well-known concept of the association rule is replaced by the new definition of dependence value, which is a single real number uniquely associated with a given itemset. Knowledge of dependence values is sufficient to describe all the dependencies characterizing a given data mining problem. The dependence value of an itemset is the difference between the occurrence probability of the itemset and a corresponding “maximum independence estimate.” This can be determined as a function of joint probabilities of the subsets of the itemset being considered by maximizing a suitable entropy function. So it is possible to separate in an itemset of cardinality k the dependence inherited from its subsets of cardinality (k − 1) and the specific inherent dependence of that itemset. The absolute value of the difference between the probability p(i) of the event i that indicates the presence of the itemset {a,b,... } and its maximum independence estimate is constant for any combination of values of ⟨a,b,...⟩. In addition, the Boolean function specifying the combination of values for which the dependence is positive is a parity function. So the determination of such combinations is immediate. The model appears to be simple and powerful.
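The constant-magnitude and parity properties can be checked by hand in the simplest case, an itemset of two items. With only the single-item probabilities known, the maximum-entropy estimate of a joint probability is the product of the marginals, so the sketch below compares each observed joint frequency against that product. The general case for larger itemsets is not covered here; the function name and the sample transactions are illustrative assumptions.

```python
from itertools import product


def pair_deviations(transactions, a, b):
    """Observed minus independence-estimated probability for each presence/absence pattern of (a, b)."""
    n = len(transactions)
    p_a = sum(1 for t in transactions if a in t) / n
    p_b = sum(1 for t in transactions if b in t) / n
    deviations = {}
    for has_a, has_b in product([True, False], repeat=2):
        observed = sum(
            1 for t in transactions if (a in t) == has_a and (b in t) == has_b
        ) / n
        # Product of marginals: the maximum-entropy estimate for two items.
        estimate = (p_a if has_a else 1 - p_a) * (p_b if has_b else 1 - p_b)
        deviations[(has_a, has_b)] = observed - estimate
    return deviations


# Hypothetical transactions, used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

deviations = pair_deviations(transactions, "bread", "milk")
```

All four deviations share the same absolute value, and the sign depends only on whether an even or odd number of the two items is absent, which is the parity behaviour the abstract describes.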
international conference on data engineering | 1998
Rosa Meo; Giuseppe Psaila; Stefano Ceri
Current approaches to data mining are based on the use of a decoupled architecture, where data are first extracted from a database and then processed by a specialized data mining engine. This paper proposes instead a tightly-coupled architecture, where data mining is integrated within a classical SQL server. The premise of this work is a SQL-like operator, called MINE RULE. We show how the various syntactic features of the operator can be managed by either a SQL engine or a classical data mining engine; our main objective is to identify the border between typical relational processing, executed by the relational server, and data mining processing, executed by a specialized component. The resulting architecture exhibits portability at the SQL level and integration of inputs and outputs of the data mining operator with the database, and provides the guidelines for promoting the integration of other data mining techniques and systems with SQL servers.
extending database technology | 1996
Rosa Meo; Giuseppe Psaila; Stefano Ceri
In this paper, we extend event types supported by Chimera, an active object-oriented database system. Chimera rules currently support disjunctive expressions of set-oriented, elementary event types; our proposal introduces instance-oriented event types, arbitrary boolean expressions (including negation), and precedence operators. Thus, we introduce a new event calculus, whose distinguishing feature is to support a minimal set of orthogonal operators which can be arbitrarily composed. We use event calculus to determine when rules are triggered; this is a change of each rule's internal status which makes it suitable for being considered by the rule selection mechanism.
Data Mining and Knowledge Discovery | 2013
Dino Ienco; Céline Robardet; Ruggero G. Pensa; Rosa Meo
The availability of data represented with multiple features coming from heterogeneous domains is getting more and more common in real world applications. Such data represent objects of a certain type, connected to other types of data, the features, so that the overall data schema forms a star structure of inter-relationships. Co-clustering these data involves the specification of many parameters, such as the number of clusters for the object dimension and for all the feature domains. In this paper we present a novel co-clustering algorithm for heterogeneous star-structured data that is parameter-less. This means that it does not require either the number of row clusters or the number of column clusters for the given feature spaces. Our approach optimizes Goodman–Kruskal’s τ, a measure for cross-association in contingency tables that evaluates the strength of the relationship between two categorical variables. We extend τ to evaluate co-clustering solutions and in particular we apply it in a higher dimensional setting. We propose the algorithm CoStar which optimizes τ by a local search approach. We assess the performance of CoStar on publicly available datasets from the textual and image domains using objective external criteria. The results show that our approach outperforms state-of-the-art methods for the co-clustering of heterogeneous data, while it remains computationally efficient.
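For reference, the classical two-variable form of Goodman–Kruskal's τ can be computed directly from a contingency table, as in the sketch below. This is only the textbook measure for predicting the column variable from the row variable; it is not the extension to co-clustering or the CoStar local search, and the function name and sample tables are illustrative assumptions.

```python
def goodman_kruskal_tau(table):
    """Proportional reduction in column-prediction error once the row is known."""
    total = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Error when guessing the column from its marginal distribution alone.
    error_marginal = 1 - sum((c / total) ** 2 for c in col_totals)
    # Error when guessing the column from the distribution within each row.
    error_given_row = 1 - sum(
        cell ** 2 / (sum(row) * total)
        for row in table
        if sum(row) > 0
        for cell in row
    )
    if error_marginal == 0:
        return 0.0  # only one column is ever observed, nothing to explain
    return (error_marginal - error_given_row) / error_marginal


# Hypothetical 2x2 contingency tables, used only for illustration.
perfect = goodman_kruskal_tau([[5, 0], [0, 5]])
independent = goodman_kruskal_tau([[2, 2], [2, 2]])
```

The first table, where each row determines the column, gives τ = 1; the second, where rows carry no information about columns, gives τ = 0.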
Lecture Notes in Computer Science | 2004
Marco Botta; Jean-François Boulicaut; Cyrille Masson; Rosa Meo
Recently, inductive databases (IDBs) have been proposed to tackle the problem of knowledge discovery from huge databases. With an IDB, the user/analyst performs a set of very different operations on data using a query language, powerful enough to support all the required manipulations, such as data preprocessing, pattern discovery and pattern post-processing. We provide a comparison between three query languages (MSQL, DMQL and MINE RULE) that have been proposed for descriptive rule mining and discuss their common features and differences. These query languages look like extensions of SQL. We present them using a set of examples, taken from the real practice of rule mining. In the paper we discuss also OLE DB for Data Mining and Predictive Model Markup Language, two recent proposals that like the first three query languages respectively provide native support to data mining primitives and provide a description in a standard language of statistical and data mining models.
web mining and web usage analysis | 2004
Rosa Meo; Pier Luca Lanzi; Maristella Matera; Roberto Esposito
We present a case study about the application of the inductive database approach to the analysis of Web logs. We consider rich XML Web logs – called conceptual logs – that are generated by Web applications designed with the WebML conceptual model and developed with the WebRatio CASE tool. Conceptual logs integrate the usual information about user requests with meta-data concerning the structure of the content and the hypertext of a Web application. We apply a data mining language (MINE RULE) to conceptual logs in order to identify different types of patterns, such as: recurrent navigation paths, most frequently visited page contents, and anomalies (e.g., intrusion attempts or harmful usages of resources). We show that the exploitation of the nuggets of information embedded in the logs and of the specialized mining constructs provided by the query languages enables the rapid customization of the mining procedures according to the Web developers’ needs. Given our on-field experience, we also suggest that the use of queries in advanced languages, as opposed to ad-hoc heuristics, eases the specification and the discovery of a large spectrum of patterns.