Giuseppe M. Mazzeo | Researchain

Publication

Featured researches published by Giuseppe M. Mazzeo.

Lecture Notes in Computer Science | 2004

A Grid Framework for Approximate Aggregate Query Answering on Summarized Sensor Network Readings

Alfredo Cuzzocrea; Filippo Furfaro; Giuseppe M. Mazzeo; Domenico Saccà

The problem of representing and querying sensor-network data poses new research challenges, as traditional techniques and architectures used for managing relational and object-oriented databases are not suitable in this context. In this paper we present a Grid-based architecture that supports aggregate query answering on sensor network data, and uses a summarization technique to efficiently accomplish this task. In particular, grid nodes are used both to collect, compress and store sensor readings, and to extract information from stored data. Grid nodes can exchange information with each other, so that the same piece of information can be stored (with a different degree of accuracy) in several nodes. Queries are evaluated by locating the grid nodes containing the needed information, and choosing (among these nodes) the most convenient ones, according to a cost model.

pervasive computing and communications | 2005

A distributed system for answering range queries on sensor network data

Alfredo Cuzzocrea; Filippo Furfaro; Sergio Greco; Elio Masciari; Giuseppe M. Mazzeo; Domenico Saccà

A distributed system for approximate query answering on sensor network data is proposed, where a suitable compression technique is exploited to represent data and support query answering. Each node of the system stores either detailed or summarized sensor readings. Query answers are computed by identifying the set of nodes that contain data (compressed or not) involved in the query, and possibly partitioning the query into a set of sub-queries to be evaluated at different nodes. Queries are partitioned according to a cost model that aims to make the evaluation efficient while guaranteeing the desired degree of accuracy of query answers.
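The query-partitioning step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each node holds readings for one half-open time interval and that a range query is split by intersecting it with those intervals. The function name `partition_query` and the node layout are hypothetical, and the cost model mentioned in the abstract is not modelled here.

```python
def partition_query(node_ranges, q_start, q_end):
    """Split the range query [q_start, q_end) into per-node sub-queries.

    node_ranges maps a (hypothetical) node name to the half-open time
    interval [start, end) of the readings that node stores.
    """
    subqueries = []
    for node, (start, end) in node_ranges.items():
        lo, hi = max(start, q_start), min(end, q_end)
        if lo < hi:  # the node holds data involved in the query
            subqueries.append((node, lo, hi))
    # Order the sub-queries by their starting time.
    return sorted(subqueries, key=lambda s: s[1])


nodes = {"n1": (0, 100), "n2": (100, 200), "n3": (200, 300)}
print(partition_query(nodes, 50, 250))
```

In a system like the one described, a cost model would additionally choose among nodes holding the same data at different accuracies; this sketch only covers locating the nodes and splitting the range.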

acm symposium on applied computing | 2005

Hierarchical binary histograms for summarizing multi-dimensional data

Filippo Furfaro; Giuseppe M. Mazzeo; Domenico Saccà; Cristina Sirangelo

The need to compress data into synopses of summarized information arises in many application scenarios, where the aim is to retrieve aggregate data efficiently, possibly trading off computational efficiency against the accuracy of the estimation. A widely used approach for summarizing multi-dimensional data is the histogram-based representation scheme, which consists of partitioning the data domain into a number of blocks (called buckets), and then storing summary information for each block. In this paper, a new histogram-based summarization technique which is very effective for multi-dimensional data is proposed. This technique exploits a multi-resolution organization of summary data, on which an efficient physical representation model is defined. The adoption of this representation model (based on a hierarchical organization of the buckets) enables some storage space to be saved w.r.t. traditional histograms, which can be invested to obtain finer grain blocks, thus approximating data with more detail. Experimental results show that our technique yields higher accuracy in retrieving aggregate information from the histogram w.r.t. traditional approaches (classical multi-dimensional histograms as well as other types of summarization techniques).
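As a rough illustration of how a bucket-based histogram answers an aggregate range query, the sketch below estimates a range sum from two-dimensional buckets. It assumes values are spread uniformly inside each bucket, which is a common simplification and not necessarily the estimator used in the paper. The names `Bucket` and `estimate_range_sum` are hypothetical, and the hierarchical storage model is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    # Half-open rectangle [x0, x1) x [y0, y1) and the sum of the values in it.
    x0: int
    y0: int
    x1: int
    y1: int
    total: float


def overlap(a0, a1, b0, b1):
    """Length of the intersection of the intervals [a0, a1) and [b0, b1)."""
    return max(0, min(a1, b1) - max(a0, b0))


def estimate_range_sum(buckets, qx0, qy0, qx1, qy1):
    """Estimate the sum over the query rectangle, assuming each bucket's
    total is spread uniformly over its area."""
    estimate = 0.0
    for b in buckets:
        area = (b.x1 - b.x0) * (b.y1 - b.y0)
        inter = overlap(b.x0, b.x1, qx0, qx1) * overlap(b.y0, b.y1, qy0, qy1)
        estimate += b.total * inter / area
    return estimate


buckets = [Bucket(0, 0, 4, 4, 16.0), Bucket(4, 0, 8, 4, 8.0)]
print(estimate_range_sum(buckets, 2, 0, 6, 4))  # half of each bucket: 8.0 + 4.0
```

Smaller buckets make the uniformity assumption less harmful, which is why storage saved by a compact representation can be spent on finer-grained blocks.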

extending database technology | 2006

Exploiting cluster analysis for constructing multi-dimensional histograms on both static and evolving data

Filippo Furfaro; Giuseppe M. Mazzeo; Cristina Sirangelo

Density-based clustering techniques are investigated as a basis for constructing histograms in multi-dimensional scenarios, where traditional techniques fail to provide effective data synopses. The main idea is that locating dense and sparse regions can be exploited to partition the data into homogeneous buckets, preventing dense and sparse regions from being summarized into the same aggregate data. The use of clustering techniques to support the histogram construction is investigated in the context of both static and dynamic data, where the use of incremental clustering strategies is mandatory due to the inefficiency of performing the clustering task from scratch at each data update.

IEEE Transactions on Knowledge and Data Engineering | 2010

Managing Multidimensional Historical Aggregate Data in Unstructured P2P Networks

Filippo Furfaro; Giuseppe M. Mazzeo; Andrea Pugliese

A P2P-based framework supporting the extraction of aggregates from historical multidimensional data is proposed, which provides efficient and robust query evaluation. When a data population is published, data are summarized in a synopsis, consisting of an index built on top of a set of subsynopses (storing compressed representations of distinct data portions). The index and the subsynopses are distributed across the network, and suitable replication mechanisms taking into account the query workload and network conditions are employed that provide the appropriate coverage for both the index and the subsynopses.

Information Sciences | 2014

Analysing microarray expression data through effective clustering

Elio Masciari; Giuseppe M. Mazzeo; Carlo Zaniolo

The recent advances in genomic technologies and the availability of large-scale microarray datasets call for the development of advanced data analysis techniques, such as data mining and statistical analysis, to cite a few. Among the mining techniques proposed so far, cluster analysis has become a standard method for the analysis of microarray expression data. It can be used both for initial screening of patients and for extraction of disease molecular signatures. Moreover, clustering can be profitably exploited to characterize genes of unknown function and uncover patterns that can be interpreted as indications of the status of cellular processes. Finally, clustering biological data would be useful not only for exploring the data but also for discovering implicit links between the objects. To this end, several clustering approaches have been proposed in order to obtain a good trade-off between accuracy and efficiency of the clustering process. In particular, great attention has been devoted to hierarchical clustering algorithms for their accuracy in unsupervised identification and stratification of groups of similar genes or patients, while partition-based approaches are exploited when fast computations are required. Indeed, it is well known that no existing clustering algorithm completely satisfies both accuracy and efficiency requirements, thus a good clustering algorithm has to be evaluated with respect to some external criteria that are independent of the metric being used to compute clusters. In this paper, we propose a clustering algorithm called M-CLUBS (for Microarray data CLustering Using Binary Splitting) exhibiting higher accuracy than the hierarchical ones proposed so far while allowing a faster computation with respect to partition-based approaches. Indeed, M-CLUBS is faster and more accurate than other algorithms, including k-means and its recently proposed refinements, as we will show in the experimental section.
The algorithm consists of a divisive phase and an agglomerative phase; during these two phases, the samples are repartitioned using a least quadratic distance criterion possessing unique analytical properties that we exploit to achieve a very fast computation. M-CLUBS derives good clusters without requiring input from users, and it is robust and impervious to noise, while providing better speed and accuracy than methods, such as BIRCH, that are endowed with the same critical properties. Because microarray data are represented as arrays of numeric values, M-CLUBS is well suited to analyzing them, since it is designed to perform well with Euclidean distances. To strengthen the results, the obtained clusters were interpreted by a domain expert and evaluated using quality measures specifically tailored for biological validity assessment.

Information Systems | 2011

A quad-tree based multiresolution approach for two-dimensional summary data

Francesco Buccafurri; Filippo Furfaro; Giuseppe M. Mazzeo; Domenico Saccà

Evaluating aggregate range queries by accessing a compressed representation of the data is a widely adopted solution to the problem of efficiently retrieving aggregate information from large amounts of data. Although several summarization techniques have been proposed which are effective in reducing the amount of time needed for computing aggregates, querying summary data often results in dramatically inaccurate estimates, due to the difficulty of limiting the loss of information resulting from data compression. Thus, a crucial issue regarding the definition of summarization techniques is to retain a reasonable degree of approximation in reconstructing query answers. Following the idea that an effective ad-hoc solution to this problem can be found in specific application domains, in this paper we restrict our attention to the case of two-dimensional data, which is relevant for a number of applications. Our proposal is a summarization technique where blocks of data resulting from a quad-tree based partition of the two-dimensional domain are summarized into aggregate values and possibly associated with indices, i.e., compact structures providing an approximate description of the original data inside them. Several experimental results are presented showing that our technique results in data synopses providing query estimates having error rates lower than other techniques tailored to data with a generic dimensionality, such as wavelets and various types of multi-dimensional histogram.
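The idea of summarizing quad-tree blocks into aggregate values can be sketched as follows. This is a simplified illustration under stated assumptions and not the authors' structure: the domain is a square grid whose side is a power of two, a block is split only while it is non-uniform and a depth budget remains, and leaves are queried by assuming uniformity. The per-block indices mentioned in the abstract are omitted, and the names `build` and `estimate` are hypothetical.

```python
def build(grid, x0, y0, size, max_depth):
    """Summarize the size x size block at (x0, y0); size must be a power of two."""
    cells = [grid[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    node = {"x0": x0, "y0": y0, "size": size, "sum": float(sum(cells)), "children": []}
    # Split into four quadrants only if the block is non-uniform and budget remains.
    if max_depth > 0 and size > 1 and len(set(cells)) > 1:
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                node["children"].append(build(grid, x0 + dx, y0 + dy, half, max_depth - 1))
    return node


def estimate(node, qx0, qy0, qx1, qy1):
    """Estimate the sum over the half-open query rectangle [qx0, qx1) x [qy0, qy1)."""
    x0, y0, size = node["x0"], node["y0"], node["size"]
    ox = max(0, min(x0 + size, qx1) - max(x0, qx0))
    oy = max(0, min(y0 + size, qy1) - max(y0, qy0))
    if ox == 0 or oy == 0:
        return 0.0
    # Use the stored sum for leaves (uniformity assumed) and fully covered blocks.
    if not node["children"] or ox * oy == size * size:
        return node["sum"] * ox * oy / (size * size)
    return sum(estimate(child, qx0, qy0, qx1, qy1) for child in node["children"])


grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 5, 0],
    [0, 0, 0, 0],
]
coarse = build(grid, 0, 0, 4, max_depth=1)
fine = build(grid, 0, 0, 4, max_depth=2)
print(estimate(coarse, 2, 2, 3, 3), estimate(fine, 2, 2, 3, 3))
```

The coarse tree spreads the isolated value over its whole quadrant, while the deeper tree recovers it exactly, which shows the trade-off between synopsis size and estimation error in miniature.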

pacific-asia conference on knowledge discovery and data mining | 2013

A New, Fast and Accurate Algorithm for Hierarchical Clustering on Euclidean Distances

Elio Masciari; Giuseppe M. Mazzeo; Carlo Zaniolo

A simple hierarchical clustering algorithm called CLUBS (for CLustering Using Binary Splitting) is proposed. CLUBS is faster and more accurate than existing algorithms, including k-means and its recently proposed refinements. The algorithm consists of a divisive phase and an agglomerative phase; during these two phases, the samples are repartitioned using a least quadratic distance criterion possessing unique analytical properties that we exploit to achieve a very fast computation. CLUBS derives good clusters without requiring input from users, and it is robust and impervious to noise, while providing better speed and accuracy than methods, such as BIRCH, that are endowed with the same critical properties.
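A single divisive step based on a least quadratic distance criterion might look like the sketch below. It is a naive illustration and not the CLUBS implementation: it assumes axis-parallel splits and recomputes the sum of squared distances from scratch for every candidate, whereas the abstract states that the algorithm exploits analytical properties of the criterion for speed. The names `sse` and `best_binary_split` are hypothetical.

```python
def sse(points):
    """Sum of squared Euclidean distances of the points from their centroid."""
    n = len(points)
    if n == 0:
        return 0.0
    dims = len(points[0])
    centroid = [sum(p[j] for p in points) / n for j in range(dims)]
    return sum((p[j] - centroid[j]) ** 2 for p in points for j in range(dims))


def best_binary_split(points):
    """Return (gain, dimension, threshold) for the axis-parallel split that
    most reduces the total sum of squared distances, or None if no split exists."""
    best = None
    total = sse(points)
    for dim in range(len(points[0])):
        ordered = sorted(points, key=lambda p: p[dim])
        for i in range(1, len(ordered)):
            if ordered[i - 1][dim] == ordered[i][dim]:
                continue  # equal coordinates cannot be separated on this axis
            gain = total - sse(ordered[:i]) - sse(ordered[i:])
            threshold = (ordered[i - 1][dim] + ordered[i][dim]) / 2
            if best is None or gain > best[0]:
                best = (gain, dim, threshold)
    return best


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
gain, dim, threshold = best_binary_split(points)
print(dim, threshold)  # the split separates the two groups along the first axis
```

A full divisive phase would apply such splits recursively until a stopping rule is met, and an agglomerative phase would then merge clusters that were separated unnecessarily; neither is shown here.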

Information Sciences | 2017

A fast and accurate algorithm for unsupervised clustering around centroids

Giuseppe M. Mazzeo; Elio Masciari; Carlo Zaniolo

A centroid-based clustering algorithm is proposed that works in a totally unsupervised fashion and is significantly faster and more accurate than existing algorithms. The algorithm, named CLUBS+ (for CLustering Using Binary Splitting), achieves these results by combining features of hierarchical and partition-based algorithms. Thus, CLUBS+ consists of two major phases, i.e., a divisive phase and an agglomerative phase, each followed by a refinement phase. Each major phase consists of successive steps in which the samples are repartitioned using a criterion based on least quadratic distance. This criterion possesses unique analytical properties that are elucidated in the paper and exploited by the algorithm to achieve a very fast computation. The paper presents the results of the extensive experiments performed: these confirm that the new algorithm is fast, impervious to noise, and produces results of better quality than other algorithms, such as BOOL, BIRCH, and k-means++, even when the analyst can determine the correct number of clusters, a very difficult task from which users are spared by CLUBS+.

data warehousing and knowledge discovery | 2005

Clustering-based histograms for multi-dimensional data

Filippo Furfaro; Giuseppe M. Mazzeo; Cristina Sirangelo

A new technique for constructing multi-dimensional histograms is proposed. This technique first invokes a density-based clustering algorithm to locate dense and sparse regions of the input data. Then the data distribution inside each of these regions is summarized by partitioning it into non-overlapping blocks laid onto a grid. The granularity of this grid is chosen depending on the underlying data distribution: the more homogeneous the data, the coarser the grid. Our approach is compared with state-of-the-art histograms on both synthetic and real-life data and is shown to be more effective.
