Publication


Featured research published by Andrea Pietracaprina.


ACM Symposium on Parallel Algorithms and Architectures | 1996

BSP vs LogP

Gianfranco Bilardi; Kieran T. Herley; Andrea Pietracaprina; Geppino Pucci; Paul G. Spirakis

A quantitative comparison of the BSP and LogP models for parallel computation is developed. Very efficient cross simulations between the two models are derived, showing their substantial equivalence for algorithmic design guided by asymptotic analysis. It is also shown that the two models can be implemented with similar performance on most point-to-point networks. In conclusion, within the limits of our analysis, which is mainly asymptotic in nature, BSP and LogP can be viewed as closely related variants within the bandwidth-latency framework for modeling parallel computation. BSP seems somewhat preferable due to greater simplicity and portability, and slightly greater power.
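For readers unfamiliar with the bandwidth-latency framework mentioned above, the sketch below contrasts the textbook per-superstep BSP cost w + g·h + L with a rough LogP estimate for routing the same h-relation. All numeric parameter values are hypothetical and not taken from the paper.

```python
# Textbook cost formulas behind the bandwidth-latency framework discussed
# above; all numeric parameter values below are hypothetical.

def bsp_superstep_cost(w, h, g, L):
    """BSP cost of one superstep: local work w, an h-relation at gap g,
    plus the barrier synchronization/latency term L."""
    return w + g * h + L

def logp_send_cost(h, L, o, g):
    """Rough LogP time for a processor to inject h messages (overhead o,
    gap g between injections) and for the last one to arrive (latency L)."""
    return o + (h - 1) * max(o, g) + L + o

if __name__ == "__main__":
    print(bsp_superstep_cost(w=1000, h=32, g=4.0, L=200))   # 1328.0
    print(logp_send_cost(h=32, L=50, o=2.0, g=4.0))         # 178.0
```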


European Symposium on Algorithms | 1998

Algorithms - ESA '98: 6th Annual European Symposium, Venice, Italy, August 24-26, 1998: Proceedings

Gianfranco Bilardi; Giuseppe F. Italiano; Andrea Pietracaprina; Geppino Pucci

Data sets in large applications are often too massive to fit completely inside the computer’s internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this tutorial, we survey the state of the art in the design and analysis of external memory algorithms (also known as EM algorithms or out-of-core algorithms or I/O algorithms). External memory algorithms are often designed using the parallel disk model (PDM). The three machine-independent measures of an algorithm’s performance in PDM are the number of I/O operations performed, the CPU time, and the amount of disk space used. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle cache hierarchies, hierarchical memory, and tertiary storage. We discuss a variety of problems and show how to solve them efficiently in external memory. Programming tools and environments are available for simplifying the programming task. Experiments on some newly developed algorithms for spatial databases incorporating these paradigms, implemented using TPIE (Transparent Parallel I/O programming Environment), show significant speedups over popular methods.
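As a concrete illustration of PDM accounting (my addition, not part of the proceedings), the following sketch evaluates the classic scan and sorting I/O bounds of the model; the configuration in the demo is hypothetical.

```python
import math

# Illustration of the classic PDM I/O bounds the tutorial discusses:
# N items, internal memory M, D disks, block size B.

def scan_ios(N, D, B):
    """Scanning/streaming bound: Theta(N / (D * B)) I/Os."""
    return N / (D * B)

def sort_ios(N, M, D, B):
    """Optimal sorting bound: Theta((N / (D * B)) * log_{M/B}(N / B)) I/Os."""
    return (N / (D * B)) * (math.log(N / B) / math.log(M / B))

if __name__ == "__main__":
    # Hypothetical configuration.
    N, M, D, B = 10**9, 10**6, 4, 10**3
    print(f"scan: {scan_ios(N, D, B):.3e} I/Os")
    print(f"sort: {sort_ios(N, M, D, B):.3e} I/Os")
```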


Principles of Distributed Computing | 2011

Tight bounds on information dissemination in sparse mobile networks

Alberto Pettarin; Andrea Pietracaprina; Geppino Pucci; Eli Upfal

Motivated by the growing interest in mobile systems, we study the dynamics of information dissemination between agents moving independently on a plane. Formally, we consider k mobile agents performing independent random walks on an n-node grid. At time 0, each agent is located at a random node of the grid and one agent has a rumor. The spread of the rumor is governed by a dynamic communication graph process {G_t(r) | t ≥ 0}, where two agents are connected by an edge in G_t(r) iff their distance at time t is within their transmission radius r. Modeling the physical reality that the speed of radio transmission is much faster than the motion of the agents, we assume that the rumor can travel throughout a connected component of G_t before the graph is altered by the motion. We study the broadcast time T_B of the system, which is the time it takes for all agents to know the rumor. We focus on the sparse case (below the percolation point r_c ≈ √(n/k)) where, with high probability, no connected component in G_t has more than a logarithmic number of agents and the broadcast time is dominated by the time it takes for many independent random walks to meet one another. Quite surprisingly, we show that for a system below the percolation point the broadcast time does not depend on the transmission radius. In fact, we prove that T_B = Θ(n/√k) for any 0 ≤ r < r_c, even when the transmission range is significantly larger than the mobility range in one step, giving a tight characterization up to logarithmic factors. Our result complements a recent result of Peres et al. (SODA 2011), who showed that above the percolation point the broadcast time is polylogarithmic in k.
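A toy simulation can make the model concrete. The sketch below (my own illustration, not the paper's code) runs k independent random walks on a √n × √n grid and lets the rumor spread instantly within each connected component of the radius-r communication graph; the lazy boundary behavior and all parameter values are simplifying assumptions.

```python
import math
import random

# Toy simulation of the model above: k agents perform independent random
# walks on a sqrt(n) x sqrt(n) grid; at each step the rumor spreads through
# the whole connected component of the radius-r communication graph G_t(r).

def broadcast_time(n, k, r, seed=0):
    rng = random.Random(seed)
    side = math.isqrt(n)
    pos = [(rng.randrange(side), rng.randrange(side)) for _ in range(k)]
    informed = [False] * k
    informed[0] = True                      # one agent starts with the rumor
    t = 0
    while not all(informed):
        changed = True                      # fixpoint = spread within components
        while changed:
            changed = False
            for i in range(k):
                if informed[i]:
                    continue
                if any(informed[j] and math.dist(pos[i], pos[j]) <= r
                       for j in range(k)):
                    informed[i] = changed = True
        for i, (x, y) in enumerate(pos):    # one random-walk step per agent
            dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
            nx, ny = x + dx, y + dy
            if 0 <= nx < side and 0 <= ny < side:   # stay put at the border
                pos[i] = (nx, ny)
        t += 1
    return t

if __name__ == "__main__":
    # Sparse regime: r_c ~ sqrt(n/k) = 10 here, so r = 2 is well below it.
    print(broadcast_time(n=2500, k=25, r=2.0))
```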


International Conference on Supercomputing | 2012

Space-round tradeoffs for MapReduce computations

Andrea Pietracaprina; Geppino Pucci; Matteo Riondato; Francesco Silvestri; Eli Upfal

This work explores fundamental modeling and algorithmic issues arising in the well-established MapReduce framework. First, we formally specify a computational model for MapReduce which captures the functional flavor of the paradigm by allowing for a flexible use of parallelism. Indeed, the model diverges from a traditional processor-centric view by featuring parameters which embody only global and local memory constraints, thus favoring a more data-centric view. Second, we apply the model to the fundamental computational task of matrix multiplication, presenting upper and lower bounds for both dense and sparse matrix multiplication, which highlight interesting tradeoffs between space and round complexity. Finally, building on the matrix multiplication results, we derive further space-round tradeoffs for matrix inversion and matching.
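To make the space-round accounting tangible, here is a generic one-round MapReduce-style dense matrix multiply (a textbook scheme, not the specific algorithms analyzed in the paper). It exposes the replication such models charge for: each input entry is emitted n times, so aggregate memory is Θ(n³) while each reducer needs only Θ(n) local memory.

```python
from collections import defaultdict

# Generic one-round MapReduce-style dense matrix multiply. Map emits each
# A[i][k] to every reducer key (i, j) and each B[k][j] likewise; reducer
# (i, j) then computes the dot product of row i and column j.

def map_phase(A, B, n):
    """Emit ((i, j), (tag, k, value)) pairs; each entry is replicated n times."""
    for i in range(n):
        for k in range(n):
            for j in range(n):
                yield (i, j), ("A", k, A[i][k])
    for k in range(n):
        for j in range(n):
            for i in range(n):
                yield (i, j), ("B", k, B[k][j])

def reduce_phase(pairs, n):
    """Group pairs by key and compute C[i][j]; each group holds 2n values."""
    groups = defaultdict(list)
    for key, val in pairs:
        groups[key].append(val)
    C = [[0] * n for _ in range(n)]
    for (i, j), vals in groups.items():
        row, col = [0] * n, [0] * n
        for tag, k, v in vals:
            (row if tag == "A" else col)[k] = v
        C[i][j] = sum(row[k] * col[k] for k in range(n))
    return C

if __name__ == "__main__":
    A, B = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
    print(reduce_phase(map_phase(A, B, 2), 2))  # [[19, 22], [43, 50]]
```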


Algorithmica | 1990

A new scheme for the deterministic simulation of PRAMs in VLSI

Fabrizio Luccio; Andrea Pietracaprina; Geppino Pucci

A deterministic scheme for the simulation of (n, m)-PRAM computation is devised. Each PRAM step is simulated on a bounded-degree network consisting of a mesh-of-trees (MT) of side n. The memory is subdivided into n modules, each local to a PRAM processor. The roots of the MT contain these processors and the memory modules, while the other O(n²) nodes have the mere capabilities of packet switchers and one-bit comparators. The simulation algorithm makes crucial use of pipelining on the MT, and attains a time complexity of O(log²n / log log n). The best previous time bound was O(log²n), on a different interconnection network with n processors. While previous simulation schemes use an intermediate MPC model, which is in turn simulated on a bounded-degree network, our method performs the simulation directly with a simple algorithm.
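A quick back-of-the-envelope sketch of the resources involved (my illustration, assuming n is a power of two, not figures from the paper): a side-n mesh-of-trees has an n × n grid of leaves plus a complete binary tree over every row and every column, so the node count stays O(n²) while the slowdown grows as log²n / log log n.

```python
import math

# Resource estimates for the scheme above, assuming n is a power of two.

def mesh_of_trees_nodes(n):
    """n^2 leaves + 2n trees with n - 1 internal nodes each = 3n^2 - 2n."""
    return n * n + 2 * n * (n - 1)

def slowdown_leading_term(n):
    """Leading term of the per-step slowdown, ignoring constant factors."""
    return math.log2(n) ** 2 / math.log2(math.log2(n))

if __name__ == "__main__":
    for n in (2**10, 2**16):
        print(n, mesh_of_trees_nodes(n), round(slowdown_leading_term(n), 1))
```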


Data Mining and Knowledge Discovery | 2010

Mining top-K frequent itemsets through progressive sampling

Andrea Pietracaprina; Matteo Riondato; Eli Upfal; Fabio Vandin

We study the use of sampling for efficiently mining the top-K frequent itemsets of cardinality at most w. To this purpose, we define an approximation to the top-K frequent itemsets to be a family of itemsets which includes (resp., excludes) all very frequent (resp., very infrequent) itemsets, together with an estimate of these itemsets' frequencies with a bounded error. Our first result is an upper bound on the sample size which guarantees that the top-K frequent itemsets mined from a random sample of that size approximate the actual top-K frequent itemsets, with probability larger than a specified value. We show that the upper bound is asymptotically tight when w is constant. Our main algorithmic contribution is a progressive sampling approach, combined with suitable stopping conditions, which on appropriate inputs is able to extract approximate top-K frequent itemsets from samples whose sizes are smaller than the general upper bound. In order to test the stopping conditions, this approach maintains the frequency of all itemsets encountered, which is practical only for small w. However, we show how this problem can be mitigated by using a variation of Bloom filters. A number of experiments conducted on both synthetic and real benchmark datasets show that samples substantially smaller than the original dataset (i.e., of the size defined by the upper bound or reached through the progressive sampling approach) suffice to approximate the actual top-K frequent itemsets with accuracy much higher than what is analytically proven.
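The stripped-down sketch below conveys the progressive-sampling skeleton. The stopping rule used here (the top-K lists of two successive samples coincide) is a naive stand-in for the paper's rigorous stopping conditions, and the helper top_k_itemsets, the parameter max_width (playing the role of w), and the starting sample size are all illustrative choices of mine.

```python
import random
from collections import Counter
from itertools import combinations

# Progressive-sampling skeleton with a naive stand-in stopping rule.

def top_k_itemsets(transactions, k, max_width):
    """Exhaustively count itemsets of cardinality <= max_width (small w only)."""
    counts = Counter()
    for t in transactions:
        for width in range(1, max_width + 1):
            counts.update(combinations(sorted(t), width))
    return [itemset for itemset, _ in counts.most_common(k)]

def progressive_top_k(dataset, k, max_width, start=100, seed=0):
    rng = random.Random(seed)
    size, prev = start, None
    while size <= len(dataset):
        cur = top_k_itemsets(rng.sample(dataset, size), k, max_width)
        if prev is not None and set(cur) == set(prev):
            return cur, size            # naive stopping condition met
        prev, size = cur, size * 2      # otherwise enlarge the sample
    return top_k_itemsets(dataset, k, max_width), len(dataset)

if __name__ == "__main__":
    data = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b", "c"]] * 200
    print(progressive_top_k(data, k=3, max_width=2))
```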


Workshop on Graph-Theoretic Concepts in Computer Science | 2000

On the Space and Access Complexity of Computation DAGs

Gianfranco Bilardi; Andrea Pietracaprina; Paolo D'Alberto

We study the space and the access complexity of computations represented by Computational Directed Acyclic Graphs (CDAGs) in hierarchical memory systems. First, we present a unifying framework for proving lower bounds on the space complexity, which captures most of the bounds known in the literature for relevant CDAGs, previously proved through ad-hoc arguments. Then, we expose a close relationship between the notions of space and access complexity, where the latter represents the minimum number of accesses performed by any computation of a CDAG at a given level of the memory hierarchy. Specifically, we present two general techniques to derive bounds on the access complexity of a CDAG based on the space complexity of certain subgraphs. One technique, simpler to apply, provides only lower bounds, while the other provides (almost) matching lower and upper bounds and improves upon a well-known previous result by Hong and Kung.
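The best-known concrete instance of such access-complexity results is the Hong-Kung bound for matrix multiplication, stated below for reference (this particular CDAG is a standard example, not necessarily one treated in the paper):

```latex
% Hong & Kung (1981): for the standard n x n matrix-multiplication CDAG
% on a two-level hierarchy with fast memory of size S, any computation
% must perform
\[
  Q(n, S) \;=\; \Omega\!\left( \frac{n^{3}}{\sqrt{S}} \right)
\]
% accesses to slow memory, a bound matched by blocked schedules that
% operate on $\sqrt{S} \times \sqrt{S}$ tiles.
```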


IEEE International Conference on High Performance Computing, Data, and Analytics | 2005

The potential of on-chip multiprocessing for QCD machines

Gianfranco Bilardi; Andrea Pietracaprina; Geppino Pucci; F. Schifano; R. Tripiccione

We explore the opportunities offered by current and forthcoming VLSI technologies to on-chip multiprocessing for Quantum Chromo Dynamics (QCD), a computational grand challenge for which over half a dozen specialized machines have been developed over the last two decades. Based on a careful study of the information exchange requirements of QCD both across the network and within the memory system, we derive the optimal partition of die area between storage and functional units. We show that a scalable chip organization holds the promise of delivering from hundreds to thousands of flops per cycle as VLSI feature size scales down from 90 nm to 20 nm over the next dozen years.


International Conference on Computational Science | 2001

On the Effectiveness of D-BSP as a Bridging Model of Parallel Computation

Gianfranco Bilardi; Carlo Fantozzi; Andrea Pietracaprina; Geppino Pucci

This paper surveys and places into perspective a number of results concerning the D-BSP (Decomposable Bulk Synchronous Parallel) model of computation, a variant of the popular BSP model proposed by Valiant in the early nineties. D-BSP captures part of the proximity structure of the computing platform, modeling it by suitable decompositions into clusters, each characterized by its own bandwidth and latency parameters. Quantitative evidence is provided that, when modeling realistic parallel architectures, D-BSP achieves higher effectiveness and portability than BSP, without significantly affecting the ease of use. It is also shown that D-BSP avoids some of the shortcomings of BSP which motivated the definition of other variants of the model. Finally, the paper discusses how the aspects of network proximity incorporated in the model allow for a better management of network congestion and bank contention, when supporting a shared-memory abstraction in a distributed-memory environment.
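A minimal sketch of D-BSP-style cost accounting may help: a superstep executed within level-i clusters is charged w + g_i·h + ℓ_i, with gap and latency parameters shrinking toward smaller clusters. The program and all parameter values below are hypothetical.

```python
# Hypothetical D-BSP-style cost accounting: a program is a sequence of
# supersteps (level, local_work, h); one executed within level-i clusters
# is charged w + g[i] * h + ell[i]. All parameter values are invented.

def dbsp_cost(supersteps, g, ell):
    """Total cost of a superstep sequence under level-dependent g, ell."""
    return sum(w + g[i] * h + ell[i] for i, w, h in supersteps)

if __name__ == "__main__":
    g   = [8.0, 4.0, 2.0]      # gap: whole machine .. innermost clusters
    ell = [512, 128, 32]       # latency/synchronization cost per level
    prog = [(0, 1000, 16),     # a machine-wide superstep
            (2, 1000, 16)]     # same work, kept inside small clusters
    print(dbsp_cost(prog, g, ell))   # the second superstep is much cheaper
```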


Symposium on Principles of Database Systems | 2009

An efficient rigorous approach for identifying statistically significant frequent itemsets

Adam Kirsch; Michael Mitzenmacher; Andrea Pietracaprina; Geppino Pucci; Eli Upfal; Fabio Vandin

As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset with the same number of transactions and the same individual item frequencies. These itemsets can then be flagged as statistically significant with a small false discovery rate. Our methodology hinges on a Poisson approximation to the distribution of the number of itemsets in a random dataset with support at least s, for any s greater than or equal to a minimum threshold s_min. We obtain this result through a novel application of the Chen-Stein approximation method, which is of independent interest. Based on this approximation, we develop an efficient parametric multi-hypothesis test for identifying the desired threshold s*. A crucial feature of our approach is that, unlike most previous work, it takes into account the entire dataset rather than individual discoveries. It is therefore better able to distinguish between significant observations and random fluctuations. We present extensive experimental results to substantiate the effectiveness of our methodology.
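The testing idea can be sketched as follows (my hedged illustration, assuming the Poisson mean lam for the null model is already available; estimating it is precisely what the paper's Chen-Stein machinery provides): the p-value of observing q_obs itemsets with support at least s is the Poisson upper tail.

```python
import math

# If the number of itemsets with support >= s in a random dataset is
# approximately Poisson with mean lam, then observing q_obs such itemsets
# in the real dataset has p-value P[Poisson(lam) >= q_obs].

def poisson_tail(lam, q_obs):
    """P[X >= q_obs] for X ~ Poisson(lam), via the complement sum."""
    term, cdf = math.exp(-lam), 0.0
    for k in range(q_obs):
        cdf += term                 # add P[X = k]
        term *= lam / (k + 1)       # advance to P[X = k + 1]
    return max(0.0, 1.0 - cdf)

if __name__ == "__main__":
    # Hypothetical numbers: ~3 itemsets this frequent expected by chance,
    # 15 observed, so there is strong evidence of significant structure.
    print(f"p-value: {poisson_tail(lam=3.0, q_obs=15):.2e}")
```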
