Eugenio Cesario

Indian Council of Agricultural Research

Publication

Featured research published by Eugenio Cesario.

IEEE Transactions on Knowledge and Data Engineering | 2007

Top-Down Parameter-Free Clustering of High-Dimensional Categorical Data

Eugenio Cesario; Giuseppe Manco; Riccardo Ortale

A parameter-free, fully automatic approach to clustering high-dimensional categorical data is proposed. The technique is based on a two-phase iterative procedure, which attempts to improve the overall quality of the whole partition. In the first phase, cluster assignments are given, and a new cluster is added to the partition by identifying and splitting a low-quality cluster. In the second phase, the number of clusters is fixed, and an attempt is made to optimize cluster assignments. In this way, the algorithm finds clusters whose number is naturally established on the basis of the inherent features of the underlying data set rather than being previously specified. Furthermore, the approach is parametric to the notion of cluster quality: here, a cluster is defined as a set of tuples exhibiting a sort of homogeneity. We show how a suitable notion of cluster homogeneity can be defined in the context of high-dimensional categorical data, from which an effective instance of the proposed clustering scheme immediately follows. Experiments on both synthetic and real data show that the devised algorithm scales linearly and achieves nearly optimal results in terms of compactness and separation.

Information Sciences | 2008

Random walk biclustering for microarray data

Fabrizio Angiulli; Eugenio Cesario; Clara Pizzuti

A biclustering algorithm, based on a greedy technique and enriched with a local search strategy to escape poor local minima, is proposed. The algorithm starts with an initial random solution and searches for a locally optimal solution by successive transformations that improve a gain function. The gain function combines the mean squared residue, the row variance, and the size of the bicluster. Different strategies to escape local minima are introduced and compared. Experimental results on several microarray data sets show that the method is able to find significant biclusters, also from a biological point of view.
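The abstract names three ingredients of the gain function without giving its exact form. As a minimal sketch, the first two can be computed as below, assuming the standard Cheng and Church definition of mean squared residue and a row variance averaged over all cells of the bicluster. The function names and the toy matrix are illustrative, and how the paper weights these terms against bicluster size is not reproduced here.

```python
def _submatrix(matrix, rows, cols):
    """Extract the bicluster cells selected by the row and column indices."""
    return [[matrix[i][j] for j in cols] for i in rows]


def mean_squared_residue(matrix, rows, cols):
    """Mean of (a_ij - rowmean_i - colmean_j + overall_mean)^2 over the bicluster."""
    sub = _submatrix(matrix, rows, cols)
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = sum((sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
                for i in range(n_rows) for j in range(n_cols))
    return total / (n_rows * n_cols)


def row_variance(matrix, rows, cols):
    """Mean of (a_ij - rowmean_i)^2 over the bicluster."""
    sub = _submatrix(matrix, rows, cols)
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    total = sum((sub[i][j] - row_means[i]) ** 2
                for i in range(n_rows) for j in range(n_cols))
    return total / (n_rows * n_cols)


# Toy expression matrix: rows differ only by an additive shift,
# so the residue is (numerically) zero while the rows still fluctuate.
toy = [[1.0, 2.0, 3.0],
       [2.0, 3.0, 4.0],
       [4.0, 5.0, 6.0]]
msr = mean_squared_residue(toy, [0, 1, 2], [0, 1, 2])
var = row_variance(toy, [0, 1, 2], [0, 1, 2])
```

A low residue together with a high row variance indicates a coherent but non-trivial bicluster, which is presumably why a gain function would reward both.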

Knowledge and Information Systems | 2008

Boosting text segmentation via progressive classification

Eugenio Cesario; Francesco Folino; Antonio Locane; Giuseppe Manco; Riccardo Ortale

A novel approach for reconciling tuples stored as free text into an existing attribute schema is proposed. The basic idea is to subject the available text to progressive classification, i.e., a multi-stage classification scheme where, at each intermediate stage, a classifier is learnt that analyzes the textual fragments not reconciled at the end of the previous steps. Classification is accomplished by an ad hoc exploitation of traditional association mining algorithms, and is supported by a data transformation scheme which takes advantage of domain-specific dictionaries/ontologies. A key feature is the capability of progressively enriching the available ontology with the results of the previous stages of classification, thus significantly improving the overall classification accuracy. An extensive experimental evaluation shows the effectiveness of our approach.

International Conference on Cloud and Green Computing | 2013

Towards a Cloud-Based Framework for Urban Computing, The Trajectory Analysis Case

Eugenio Cesario; Carmela Comito; Domenico Talia

In this paper we present a Cloud-based framework for urban computing that can be tailored to different scenarios of urban planning and management in smart cities. The focus of the paper is on the management of large-scale socio-geographic data obtained from the trajectories followed by mobile devices. Our goal is to mine human activities and routines from this socio-geographic data in order to capture users' behaviour. To this aim, we introduce a methodology for trajectory pattern mining consisting of (a) finding frequent regions, i.e., the regions most densely passed through, and (b) extracting trajectory patterns from those regions. Experimental evaluation shows that, owing to the complexity and the volume of data involved in the application scenario, the trajectory pattern mining process can benefit from the parallel execution environment offered by a Cloud architecture.

International Conference on Data Mining | 2008

Distributed Data Mining Models as Services on the Grid

Eugenio Cesario; Domenico Talia

This paper describes how distributed data mining models, such as collective learning, ensemble learning, and meta-learning models, can be implemented as WSRF mining services by exploiting the Grid infrastructure. Our goal is to design a general distributed architectural model that can be exploited for different distributed mining algorithms deployed as Grid services for the analysis of dispersed data sources. In order to validate our approach, we also present the implementation of two clustering algorithms on such an architecture, and evaluate their performance.

Concurrency and Computation: Practice and Experience | 2013

Programming knowledge discovery workflows in service‐oriented distributed systems

Eugenio Cesario; Marco Lackovic; Domenico Talia; Paolo Trunfio

In several scientific and business domains, very large data repositories are generated. To find interesting and useful information in those repositories, efficient data mining techniques and knowledge discovery processes must be used. In science, data mining helps scientists form hypotheses and supports their scientific practice, whereas in industry it turns existing data sources into real value for companies, which can take advantage of the knowledge extracted from their large data sources. Data mining tasks are often composed of multiple stages that may be linked to each other to form various execution flows. Moreover, data mining tasks are often distributed because they involve data and tools located over geographically distributed environments. Therefore, it is fundamental to exploit effective paradigms, such as services and workflows, to model data mining tasks that are both multi‐staged and distributed. This paper discusses data mining services and workflows for analyzing scientific data in high‐performance distributed environments such as Grids and Clouds. We discuss how it is possible to define basic and complex services for supporting distributed data mining tasks in Grids. We also present a workflow formalism and a service‐oriented programming framework, named DIS3GNO, for designing and running distributed knowledge discovery processes in the Knowledge Grid system. DIS3GNO supports all the phases of a knowledge discovery process, including composition, execution, and results visualization. After introducing DIS3GNO, some relevant use cases implemented with it and a performance evaluation of the system are discussed.

Concurrency and Computation: Practice and Experience | 2012

Distributed data mining patterns and services: an architecture and experiments

Eugenio Cesario; Domenico Talia

Distributed data mining implements techniques for analyzing data on distributed computing systems by exploiting data distribution and parallel algorithms. The grid is a computing infrastructure for implementing distributed high‐performance applications and solving complex problems, offering effective support to the implementation and use of data mining and knowledge discovery systems. The Web Services Resource Framework has become the standard for the implementation of grid services and applications, and it can be exploited for developing high‐level services for distributed data mining applications. This paper describes how distributed data mining patterns, such as collective learning, ensemble learning, and meta‐learning models, can be implemented as Web Services Resource Framework mining services by exploiting the grid infrastructure. The goal of this work was to design a distributed architectural model that can be exploited for different distributed mining patterns deployed as grid services for the analysis of dispersed data sources. In order to validate such an approach, we also presented the implementation of two clustering algorithms on the developed architecture. In particular, distributed k‐means and distributed expectation maximization were exploited as pilot examples to show the suitability of the implemented service‐oriented framework. An extensive evaluation of its performance was provided.
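To make the distributed k-means idea concrete, here is a minimal single-process sketch of one common scheme: each node summarizes its local partition as per-centroid sums and counts, and a coordinator merges those summaries into new centroids. This is an assumption about the general pattern, not the paper's service implementation. The function names are hypothetical, and the plain function calls stand in for what would be remote service invocations on a grid.

```python
def local_stats(points, centroids):
    """Node-side step: assign local points to the nearest centroid and
    return per-centroid coordinate sums and point counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        nearest = min(
            range(len(centroids)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def merge_stats(partials, centroids):
    """Coordinator-side step: combine the partial sums and counts from all
    nodes into updated centroids. Empty clusters keep their old centroid."""
    dim = len(centroids[0])
    updated = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            updated.append(list(old))
        else:
            updated.append(
                [sum(sums[k][d] for sums, _ in partials) / total
                 for d in range(dim)]
            )
    return updated


def distributed_kmeans(partitions, centroids, iterations=10):
    """Run a fixed number of rounds; only summaries, never raw points,
    cross the boundary between nodes and coordinator."""
    for _ in range(iterations):
        partials = [local_stats(part, centroids) for part in partitions]
        centroids = merge_stats(partials, centroids)
    return centroids


# Two illustrative partitions, as if held by two separate nodes.
partitions = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(0.0, 2.0), (10.0, 12.0)],
]
result = distributed_kmeans(partitions, [[0.0, 0.0], [10.0, 10.0]])
```

Because the merged sums and counts equal those of the pooled data, each round yields the same centroids as a centralized k-means step would.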

International Conference on Spatial Data Mining and Geographical Knowledge Services | 2015

Following soccer fans from geotagged tweets at FIFA World Cup 2014

Eugenio Cesario; Chiara Congedo; Fabrizio Marozzo; Gianni Riotta; Alessandra Spada; Domenico Talia; Paolo Trunfio; Carlo Turri

The worldwide reach of social networks, such as Facebook and Twitter, is making it possible to analyse the real-time behaviour of large groups of people, such as those attending popular events. This paper presents work and results on the analysis of geotagged tweets carried out to understand the behaviour of people attending the 2014 FIFA World Cup. We monitored the Twitter users attending the World Cup matches to discover the most frequent movements of fans during the competition. The data source consists of all geotagged tweets collected during the 64 matches of the World Cup from June 12 to July 13, 2014. For each match we considered only the geotagged tweets whose coordinates fell within the area of the stadiums during the matches. Then, we carried out a trajectory pattern mining analysis on the set of tweets considered. Original results were obtained in terms of the number of matches attended by groups of fans, clusters of most attended matches, and most frequented stadiums.

European Conference on Parallel Processing | 2010

Distributed Data Mining using a Public Resource Computing Framework

Eugenio Cesario; Nicola De Caria; Carlo Mastroianni; Domenico Talia

The public resource computing paradigm is often used as a successful and low-cost mechanism for the management of several classes of scientific and commercial applications that require the execution of a large number of independent tasks. Public computing frameworks, also known as “Desktop Grids”, exploit the computational power and storage facilities of private computers, or “workers”. Despite the inherently decentralized nature of the applications to which they are devoted, these systems often adopt a centralized mechanism for the assignment of jobs and distribution of input data, as is the case for BOINC, the most popular framework in this realm. We present a decentralized framework that aims at increasing the flexibility and robustness of public computing applications, thanks to two basic features: (i) the adoption of a P2P protocol for dynamically matching the job specifications with the worker characteristics, without relying on centralized resources; (ii) the use of distributed cache servers for efficient dissemination and reuse of data files. This framework is exploitable for a wide set of applications. In this work, we describe how a Java prototype of the framework was used to tackle the problem of mining frequent itemsets from a transactional dataset, and show some preliminary yet interesting performance results that demonstrate the efficiency improvements that can derive from the presented architecture.
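Frequent itemset mining fits the independent-task model because support counts over disjoint chunks of transactions can simply be added. The sketch below illustrates that decomposition under stated assumptions: it brute-forces all itemsets up to a small size rather than using the paper's algorithm, it is written in Python rather than the Java of the prototype, and the function names and sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size):
    """One independent job: count every itemset of up to max_size items
    that occurs in this chunk of transactions."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def frequent_itemsets(chunks, min_support, max_size=2):
    """Merge the per-chunk counts and keep the itemsets whose total
    support reaches min_support."""
    total = Counter()
    for chunk in chunks:  # each chunk could be processed by a different worker
        total.update(count_itemsets(chunk, max_size))
    return {itemset: n for itemset, n in total.items() if n >= min_support}


# Illustrative data: two chunks of a small transactional dataset.
chunks = [
    [["a", "b"], ["a", "c"]],
    [["a", "b", "c"], ["b"]],
]
frequent = frequent_itemsets(chunks, min_support=2)
```

The merge step is a plain sum, so the order in which workers return their counts does not matter.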

International Conference on High Performance Computing and Simulation | 2016

Analyzing social media data to discover mobility patterns at EXPO 2015: Methodology and results

Eugenio Cesario; Andrea Raffaele Iannazzo; Fabrizio Marozzo; Fabrizio Morello; Gianni Riotta; Alessandra Spada; Domenico Talia; Paolo Trunfio

Social media posts are often tagged with geographical coordinates or other information that allows identifying user positions, this way enabling mobility pattern analysis using trajectory mining techniques. This paper presents a methodology and discusses results of a study aimed at discovering behavior and mobility patterns of Instagram users who visited EXPO 2015, the Universal Exposition hosted in Milan, Italy, from May to October 2015. We collected and analyzed geotagged posts published by about 238,000 Instagram users who visited EXPO 2015, including more than 570,000 posts published during the visits, and 2.63 million posts published by them from one month before to one month after their visit to EXPO. To cope with this large amount of data, the whole process - from data collection to data mining - was implemented on a high-performance cloud platform that provided the necessary storage and compute resources. The analysis allowed us to discover how the number of visitors changed over time, which were the sets of most frequently visited pavilions, which countries the visitors came from, and the main flows of destination of visitors towards Italian cities and regions in the days after their visit to EXPO. A strong correlation (Pearson coefficient 0.7) was measured between official visitor numbers and the visit trends produced by our analysis, which assessed the effectiveness of the proposed methodology and confirmed the reliability of results.
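The validation step compares two time series with the Pearson coefficient. As a minimal sketch, the coefficient can be computed as follows. The function name is illustrative, and the two short series are made-up placeholders, not the study's visitor data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Placeholder series standing in for official daily counts and
# counts estimated from geotagged posts (values are invented).
official = [100.0, 150.0, 120.0, 200.0, 180.0]
estimated = [12.0, 20.0, 11.0, 25.0, 19.0]
r = pearson(official, estimated)
```

The coefficient is insensitive to scale, so the estimated series does not need to match the official counts in magnitude, only in trend.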
