Riccardo Ortale

Indian Council of Agricultural Research

Publication

Featured research published by Riccardo Ortale.

European Conference on Principles of Data Mining and Knowledge Discovery | 2004

A tree-based approach to clustering XML documents by structure

Gianni Costa; Giuseppe Manco; Riccardo Ortale; Andrea Tagarelli

We propose a novel methodology for clustering XML documents on the basis of their structural similarities. The idea is to equip each cluster with an XML cluster representative, i.e. an XML document subsuming the most typical structural specifics of a set of XML documents. Clustering is essentially accomplished by comparing cluster representatives, and updating the representatives as soon as new clusters are detected. We present an algorithm for the computation of an XML representative based on suitable techniques for identifying significant node matchings and for reliably merging and pruning XML trees. Experimental evaluation performed on both synthetic and real data shows the effectiveness of our approach.
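
As a rough illustration of a cluster representative that subsumes typical structure, the Python sketch below flattens each document into its set of root-to-node tag paths and keeps the paths shared by at least half of the cluster's documents. This path-set view is a swapped-in simplification: the paper instead builds representatives by matching, merging and pruning XML trees. The function names and the 0.5 support threshold are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative sketch only: a path-set representative, not the paper's tree-merging algorithm.

def paths(xml_text):
    """Return the set of root-to-node tag paths of an XML document."""
    out = set()

    def walk(node, prefix):
        p = prefix + (node.tag,)
        out.add(p)
        for child in node:
            walk(child, p)

    walk(ET.fromstring(xml_text), ())
    return out

def representative(docs, min_support=0.5):
    """Keep the paths that occur in at least min_support of the cluster's documents."""
    counts = Counter(p for d in docs for p in paths(d))  # each path counted once per document
    return {p for p, c in counts.items() if c >= min_support * len(docs)}

def similarity(a, b):
    """Jaccard similarity between two path sets (e.g. a document and a representative)."""
    return len(a & b) / len(a | b) if a | b else 1.0
```

For a cluster of three `<book>` documents where two contain `<author>` and only one contains `<year>`, the representative keeps the `book/author` path and drops `book/year`.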

IEEE Transactions on Knowledge and Data Engineering | 2007

Top-Down Parameter-Free Clustering of High-Dimensional Categorical Data

Eugenio Cesario; Giuseppe Manco; Riccardo Ortale

A parameter-free, fully automatic approach to clustering high-dimensional categorical data is proposed. The technique is based on a two-phase iterative procedure that attempts to improve the overall quality of the whole partition. In the first phase, cluster assignments are given, and a new cluster is added to the partition by identifying and splitting a low-quality cluster. In the second phase, the number of clusters is fixed, and the cluster assignments are optimized. Through this procedure, the algorithm finds clusters whose number is established by the inherent features of the underlying data set rather than specified in advance. Furthermore, the approach is parametric to the notion of cluster quality: here, a cluster is defined as a set of tuples exhibiting a sort of homogeneity. We show how a suitable notion of cluster homogeneity can be defined in the context of high-dimensional categorical data, from which an effective instance of the proposed clustering scheme immediately follows. Experiments on both synthetic and real data show that the devised algorithm scales linearly and achieves nearly optimal results in terms of compactness and separation.
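
The second phase, in which the number of clusters is fixed and assignments are refined, can be illustrated with a minimal Python sketch. The abstract does not define the paper's homogeneity measure, so this sketch swaps in a k-modes-style criterion (the number of attribute mismatches against each cluster's per-attribute mode). The function names are hypothetical, and the cluster-splitting phase is not shown.

```python
from collections import Counter

# Illustrative sketch only: k-modes-style reassignment, not the paper's homogeneity measure.

def cluster_modes(tuples, labels, k):
    """For each cluster, the most frequent value of every attribute (None if empty)."""
    modes = []
    for c in range(k):
        members = [t for t, lab in zip(tuples, labels) if lab == c]
        if not members:
            modes.append(None)
            continue
        modes.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
    return modes

def mismatches(t, mode):
    """Number of attributes on which a tuple differs from a cluster mode."""
    return sum(a != b for a, b in zip(t, mode))

def reassign(tuples, labels, k, max_iter=10):
    """With k fixed, repeatedly move each tuple to the cluster with the closest mode."""
    labels = list(labels)
    for _ in range(max_iter):
        modes = cluster_modes(tuples, labels, k)
        new = [min((c for c in range(k) if modes[c] is not None),
                   key=lambda c: mismatches(t, modes[c]))
               for t in tuples]
        if new == labels:  # assignments are stable: stop
            break
        labels = new
    return labels
```

Starting from a poor initial assignment such as `[0, 1, 0, 1, 0, 1]` over tuples dominated by values `a` and `b`, the loop converges to two clusters that separate the `a` tuples from the `b` tuples.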

Data Mining and Knowledge Discovery | 2010

An incremental clustering scheme for data de-duplication

Gianni Costa; Giuseppe Manco; Riccardo Ortale

We propose an incremental technique for discovering duplicates, i.e., syntactically different tuples that refer to the same real-world entity, in large databases of textual sequences. The problem is approached from a clustering perspective: given a set of tuples, the objective is to partition them into groups of duplicate tuples. Each newly arrived tuple is assigned to an appropriate cluster via nearest-neighbor classification. This is achieved by means of a suitable hash-based index that maps any tuple to a set of indexing keys and assigns tuples with high syntactic similarity to the same buckets. Hence, the neighbors of a query tuple can be efficiently identified by retrieving the tuples that appear in the buckets associated with the query tuple itself, without scanning the whole database. Two alternative schemes for computing indexing keys are discussed and compared. An extensive experimental evaluation on both synthetic and real data shows the effectiveness of our approach.
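
A minimal Python sketch of the incremental, index-based assignment step follows. The abstract does not specify the two key-computation schemes, so this sketch swaps in MinHash signatures split into bands (a standard locality-sensitive hashing scheme) over character 3-grams, with Jaccard similarity for the nearest-neighbor test. The class name, the 0.5 threshold and the band/row counts are assumptions.

```python
import hashlib
from collections import defaultdict

# Hypothetical sketch: MinHash banding stands in for the paper's indexing keys.

def qgrams(s, q=3):
    """Character q-grams of a lower-cased, boundary-padded string."""
    s = f"#{s.lower()}#"
    return {s[i:i + q] for i in range(max(1, len(s) - q + 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def minhash_keys(grams, bands=8, rows=2):
    """Indexing keys: a MinHash signature cut into bands; each band is one bucket key."""
    sig = [min(int(hashlib.md5(f"{h}:{g}".encode()).hexdigest(), 16) for g in grams)
           for h in range(bands * rows)]
    return [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)]

class IncrementalDeduplicator:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.buckets = defaultdict(list)  # bucket key -> ids of tuples stored there
        self.grams = []                   # q-gram set of every tuple seen so far
        self.cluster = []                 # cluster id of every tuple seen so far

    def add(self, text):
        g = qgrams(text)
        keys = minhash_keys(g)
        # Candidate neighbors: tuples sharing at least one bucket (no full scan).
        cands = {i for k in keys for i in self.buckets[k]}
        best = max(cands, key=lambda i: jaccard(g, self.grams[i]), default=None)
        if best is not None and jaccard(g, self.grams[best]) >= self.threshold:
            cid = self.cluster[best]                 # join the nearest neighbor's cluster
        else:
            cid = max(self.cluster, default=-1) + 1  # open a new cluster
        tid = len(self.grams)
        self.grams.append(g)
        self.cluster.append(cid)
        for k in keys:
            self.buckets[k].append(tid)
        return cid
```

For example, adding "John Smith", then "Acme Corp", then "JOHN SMITH" places the first and third tuples in the same cluster, because case folding makes their q-gram sets identical.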

Advances in Social Networks Analysis and Mining | 2012

A Bayesian Hierarchical Approach for Exploratory Analysis of Communities and Roles in Social Networks

Gianni Costa; Riccardo Ortale

We present a new probabilistic approach to modeling social interactions that seamlessly integrates community discovery and role assignment for a deeper understanding of connectivity patterns in social networks. The devised approach is an unsupervised learning technique based on a Bayesian hierarchical model of social interactions. This model specifies an intuitive generative process in which pairs of nodes in a social network are associated with communities, as well as with roles in the context of the respective communities, before a directed interaction is possibly established between them. According to the generative semantics of the proposed model, nodes are represented as probability distributions over communities, while communities are represented as probability distributions over roles. These distributions are unknown parameters of the model, which are estimated from social-network data through approximate posterior inference and parameter estimation. A comparative evaluation over real-world social networks reveals that our approach outperforms state-of-the-art competitors in terms of link prediction.
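
The generative process outlined in this abstract can be made concrete with a forward sampler. This is a toy Python sketch under stated assumptions: when both nodes of a pair pick the same community, the link probability depends on their two sampled roles (a hypothetical matrix `eta`); otherwise it falls back to a hypothetical background probability `eps`. The paper's actual parametrization and its posterior inference are not reproduced.

```python
import random

def draw(rng, probs):
    """Sample an index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def sample_network(theta, phi, eta, eps=0.01, seed=0):
    """Forward-sample directed links from a toy community/role generative process.

    theta[p]  : distribution of node p over communities
    phi[c]    : distribution of community c over roles
    eta[a][b] : assumed link probability from role a to role b within one community
    eps       : assumed background link probability across different communities
    """
    rng = random.Random(seed)
    n = len(theta)
    links = set()
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            zp, zq = draw(rng, theta[p]), draw(rng, theta[q])  # communities of the pair
            rp, rq = draw(rng, phi[zp]), draw(rng, phi[zq])    # roles within those communities
            prob = eta[rp][rq] if zp == zq else eps
            if rng.random() < prob:
                links.add((p, q))                              # directed interaction p -> q
    return links
```

Inference runs this story in reverse: given observed links, it estimates `theta` and `phi`, which is the part the paper addresses and this sketch omits.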

Advances in Geographic Information Systems | 2008

The DAEDALUS framework: progressive querying and mining of movement data

Riccardo Ortale; Ettore Ritacco; Nikos Pelekis; Roberto Trasarti; Gianni Costa; Fosca Giannotti; Giuseppe Manco; Chiara Renso; Yannis Theodoridis

In this work we propose DAEDALUS, a formal framework and system specifically focused on the progressive combination of mining and querying operators. The core component of DAEDALUS is the MO-DMQL query language, which extends SQL in two respects, namely a pattern definition operator and the capability to uniformly manipulate both raw data and unveiled patterns. The DAEDALUS system is specifically focused on movement data and has been implemented as a query execution layer on top of the Hermes Moving Object Database. The expressiveness and usefulness of the MO-DMQL language, as well as the computational capabilities of DAEDALUS, are qualitatively evaluated by means of a case study.

Knowledge and Information Systems | 2008

Boosting text segmentation via progressive classification

Eugenio Cesario; Francesco Folino; Antonio Locane; Giuseppe Manco; Riccardo Ortale

A novel approach for reconciling tuples stored as free text into an existing attribute schema is proposed. The basic idea is to subject the available text to progressive classification, i.e., a multi-stage classification scheme where, at each intermediate stage, a classifier is learnt that analyzes the textual fragments not reconciled at the end of the previous steps. Classification is accomplished by an ad hoc exploitation of traditional association mining algorithms, and is supported by a data transformation scheme which takes advantage of domain-specific dictionaries/ontologies. A key feature is the capability of progressively enriching the available ontology with the results of the previous stages of classification, thus significantly improving the overall classification accuracy. An extensive experimental evaluation shows the effectiveness of our approach.

ACM Transactions on Information Systems | 2013

X-Class: Associative Classification of XML Documents by Structure

Gianni Costa; Riccardo Ortale; Ettore Ritacco

The supervised classification of XML documents by structure involves learning predictive models in which certain structural regularities discriminate the individual document classes. Hitherto, research has focused on the adoption of prespecified substructures. This is detrimental to classification effectiveness, since the a priori chosen substructures may not accord with the structural properties of the XML documents. An unexplored question, therefore, is how to choose the type of structural regularity that best adapts to the structures of the available XML documents. We tackle this problem through X-Class, an approach that handles all types of tree-like substructures and allows for choosing the most discriminatory one. Algorithms are designed to learn compact rule-based classifiers in which the chosen substructures discriminate the classes of XML documents. X-Class is studied across various domains and types of substructures. Its classification performance is compared against several rule-based and SVM-based competitors. Empirical evidence reveals that the classifiers induced by X-Class are compact, scalable, and at least as effective as the established competitors. In particular, certain substructures allow the induction of very compact classifiers that generally outperform the rule-based competitors in terms of effectiveness over all chosen corpora of XML data. Furthermore, such classifiers are essentially as effective as the SVM-based competitor, with the additional advantage of a high degree of interpretability.
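
To show how a compact rule-based classifier can use structural features, here is a minimal Python sketch that extracts parent/child tag pairs (one simple type of tree-like substructure) and applies an ordered rule list in which the first matching rule fires. The rules are hand-written and hypothetical. Learning them, and choosing the most discriminatory type of substructure, is what X-Class itself does and is not reproduced here.

```python
import xml.etree.ElementTree as ET

def edges(xml_text):
    """Parent/child tag pairs: one simple kind of tree-like substructure."""
    root = ET.fromstring(xml_text)
    return {(parent.tag, child.tag) for parent in root.iter() for child in parent}

def classify(xml_text, rules, default):
    """Apply an ordered rule list; the first rule whose antecedent is contained fires."""
    feats = edges(xml_text)
    for antecedent, label in rules:
        if antecedent <= feats:  # every substructure in the rule occurs in the document
            return label
    return default

# Hypothetical rules: (set of required substructures, class label).
RULES = [
    ({("article", "abstract")}, "paper"),
    ({("item", "price")}, "catalog"),
]
```

Each rule reads directly as "if the document contains these substructures, assign this class", which is the kind of interpretability the abstract contrasts with SVM-based models.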

International Conference on Tools with Artificial Intelligence | 2011

Effective XML Classification Using Content and Structural Information via Rule Learning

Gianni Costa; Riccardo Ortale; Ettore Ritacco

We propose a new approach to XML classification that uses a particular rule-learning technique to induce interpretable classification models. These models separate the individual classes of XML documents by checking whether the documents contain certain features that provide information on their content and structure. The devised approach induces classifiers that outperform several established competitors in terms of effectiveness.

Conference on Recommender Systems | 2011

Modeling item selection and relevance for accurate recommendations: a bayesian approach

Nicola Barbieri; Gianni Costa; Giuseppe Manco; Riccardo Ortale

We propose a Bayesian probabilistic model for explicit preference data. The model introduces a generative process that takes into account both item selection and rating emission to gather into communities those users who experience the same items and tend to adopt the same rating pattern. Each user is modeled as a random mixture of topics, where each topic is characterized by a distribution modeling the popularity of items within the respective user community and by a distribution over preference values for those items. The proposed model can be associated with a novel item-relevance ranking criterion, which is based on both item popularity and users' preferences. We show that the proposed model, equipped with the new ranking criterion, outperforms state-of-the-art approaches in terms of the accuracy of the recommendation lists provided to users on standard benchmark datasets.
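
One way to read a ranking criterion that combines item popularity with user preferences is sketched below in Python. The scoring formula is an assumption for illustration, not necessarily the criterion defined in the paper: the probability that the user selects an item and rates it with a value in a hypothetical "high" set (4 or 5), marginalized over the user's topics.

```python
def relevance(theta_u, phi, psi, high=(4, 5)):
    """Score every item for one user under a toy selection-plus-rating model.

    theta_u[z]   : the user's mixture weight for topic z
    phi[z][i]    : popularity of item i within topic z (item selection)
    psi[z][i][r] : probability of rating value r for item i in topic z (dict)
    Assumed score: sum_z theta_u[z] * phi[z][i] * P(rating in high | z, i).
    """
    n_items = len(phi[0])
    scores = []
    for i in range(n_items):
        s = sum(theta_u[z] * phi[z][i] * sum(psi[z][i].get(r, 0.0) for r in high)
                for z in range(len(theta_u)))
        scores.append(s)
    return scores

def top_n(theta_u, phi, psi, n):
    """Indices of the n highest-scoring items."""
    scores = relevance(theta_u, phi, psi)
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:n]
```

With a single topic in which item 0 has popularity 0.6 but only a 0.1 chance of a 5-star rating, and item 1 has popularity 0.4 and is always rated 5, item 1 scores 0.4 against 0.06 and is ranked first: popularity alone does not decide the ranking.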

ACM Symposium on Applied Computing | 2003

Similarity-based clustering of Web transactions

Giuseppe Manco; Riccardo Ortale; Domenico Saccà

We introduce a measure of the similarity between two sequences of accesses to Web pages, to be exploited in a clustering approach for grouping sessions of accesses to a Web site. The notion of sequence similarity is parametric to the sequence topology and to the similarity among the Web pages within the sequences. In our formalization, two Web pages are similar if they can be considered synonyms not only from a content point of view but also from a usage point of view, i.e., if users exhibit the same behavior on both pages. The refined notion of page similarity, as well as the related notion of sequence similarity, is envisaged to be effective when a centroid-based clustering technique is applied to the personalization of the Web experience.
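
A sequence similarity that is parametric to a page-similarity function can be sketched in Python with a global (Needleman-Wunsch-style) alignment. This is a swapped-in technique rather than the paper's topology-aware measure. The gap penalty, the normalization by the longer sequence, and the `SYNONYMS` table standing in for usage-based page similarity are all hypothetical.

```python
# Illustrative sketch only: alignment-based session similarity with a pluggable page similarity.

def sequence_similarity(s, t, page_sim, gap=0.0):
    """Global alignment score of two page sequences, normalized to [0, 1].

    page_sim(a, b) must return a value in [0, 1] (content and/or usage similarity).
    """
    n, m = len(s), len(t)
    if max(n, m) == 0:
        return 1.0
    # dp[i][j] = best score for aligning s[:i] with t[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + page_sim(s[i - 1], t[j - 1]),  # align two pages
                           dp[i - 1][j] - gap,                               # skip a page of s
                           dp[i][j - 1] - gap)                               # skip a page of t
    return max(0.0, dp[n][m]) / max(n, m)

# Hypothetical usage-based synonyms: pages on which users behave alike.
SYNONYMS = {frozenset({"/cart", "/basket"})}

def toy_page_sim(a, b):
    if a == b:
        return 1.0
    return 0.8 if frozenset({a, b}) in SYNONYMS else 0.0
```

For example, the sessions `["/home", "/cart", "/pay"]` and `["/home", "/basket", "/pay"]` align page by page with score (1 + 0.8 + 1) / 3, about 0.93, so two sessions that differ only in synonymous pages remain close when clustered around a centroid.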
