Geppino Pucci | Researchain

Publication

Featured research published by Geppino Pucci.

ACM Symposium on Parallel Algorithms and Architectures | 1996

BSP vs LogP

Gianfranco Bilardi; Kieran T. Herley; Andrea Pietracaprina; Geppino Pucci; Paul G. Spirakis

A quantitative comparison of the BSP and LogP models for parallel computation is developed. Very efficient cross simulations between the two models are derived, showing their substantial equivalence for algorithmic design guided by asymptotic analysis. It is also shown that the two models can be implemented with similar performance on most point-to-point networks. In conclusion, within the limits of our analysis that is mainly of asymptotic nature, BSP and LogP can be viewed as closely related variants within the bandwidth-latency framework for modeling parallel computation. BSP seems somewhat preferable due to greater simplicity and portability, and slightly greater power.
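As a rough illustration of the bandwidth-latency framework mentioned above, the sketch below encodes the textbook cost formulas of the two models. It does not reproduce the paper's cross simulations. The class and method names are hypothetical, and the parameter names (g and l for BSP; L, o, g, P for LogP) follow the usual conventions of the original model definitions rather than anything stated in this abstract.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BSP:
    g: float  # bandwidth parameter: cost per word of an h-relation
    l: float  # latency / barrier synchronization cost

    def superstep_cost(self, w: float, h: int) -> float:
        # Standard BSP charge for one superstep: local work w,
        # plus routing an h-relation, plus one barrier.
        return w + self.g * h + self.l


@dataclass(frozen=True)
class LogP:
    L: float  # network latency
    o: float  # per-message processor overhead (paid by sender and receiver)
    g: float  # minimum gap between consecutive messages at a processor
    P: int    # number of processors

    def message_time(self) -> float:
        # End-to-end time of a single small message.
        return self.L + 2 * self.o

    def capacity(self) -> int:
        # LogP capacity constraint: at most ceil(L/g) messages may be
        # in transit to or from any one processor.
        return math.ceil(self.L / self.g)
```

For example, `BSP(g=4, l=100).superstep_cost(w=1000, h=10)` charges 1000 + 40 + 100 time units, while `LogP(L=6, o=2, g=4, P=8).message_time()` gives 6 + 2·2.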

European Symposium on Algorithms | 1998

Algorithms - ESA '98: 6th Annual European Symposium, Venice, Italy, August 24-26, 1998: Proceedings

Gianfranco Bilardi; Giuseppe F. Italiano; Andrea Pietracaprina; Geppino Pucci

Data sets in large applications are often too massive to fit completely inside the computer’s internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this tutorial, we survey the state of the art in the design and analysis of external memory algorithms (also known as EM algorithms or out-of-core algorithms or I/O algorithms). External memory algorithms are often designed using the parallel disk model (PDM). The three machine-independent measures of an algorithm’s performance in PDM are the number of I/O operations performed, the CPU time, and the amount of disk space used. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle cache hierarchies, hierarchical memory, and tertiary storage. We discuss a variety of problems and show how to solve them efficiently in external memory. Programming tools and environments are available for simplifying the programming task. Experiments on some newly developed algorithms for spatial databases incorporating these paradigms, implemented using TPIE (Transparent Parallel I/O programming Environment), show significant speedups over popular methods.
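To make the I/O measure of the parallel disk model concrete, here is a minimal sketch of the two best-known PDM bounds: scanning N items takes about N/(DB) I/Os, and sorting takes on the order of (N/(DB)) · log_{M/B}(N/B) I/Os. The symbols N (items), M (internal memory size), B (block size), and D (number of disks) are the standard PDM parameters and are assumptions here, since the abstract does not name them. The function names are hypothetical, and constant factors are ignored.

```python
import math


def scan_ios(N: int, B: int, D: int = 1) -> int:
    """Number of parallel I/Os needed to read N items in blocks of B from D disks."""
    return math.ceil(N / (D * B))


def sort_ios(N: int, M: int, B: int, D: int = 1) -> float:
    """Order-of-magnitude I/O count for sorting: (n/D) * log_m(n),
    where n = N/B is the input size in blocks and m = M/B is the
    memory size in blocks. Constant factors are omitted."""
    n = N / B
    m = M / B
    passes = max(1.0, math.log(n) / math.log(m))
    return (n / D) * passes
```

With N = 2^30 items, M = 2^20 and B = 2^10, the input spans 2^20 blocks and the logarithm evaluates to 2, so sorting costs about twice as many I/Os as a single scan.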

Parallel Computing | 2011

Universality in VLSI Computation.

Gianfranco Bilardi; Geppino Pucci

Containing over 300 entries in an A-Z format, the Encyclopedia of Parallel Computing provides easy, intuitive access to relevant information for professionals and researchers seeking access to any aspect within the broad field of parallel computing. Topics for this comprehensive reference were selected, written, and peer-reviewed by an international pool of distinguished researchers in the field. The Encyclopedia is broad in scope, covering machine organization, programming languages, algorithms, and applications. Within each area, concepts, designs, and specific implementations are presented. The highly structured essays in this work comprise synonyms, a definition and discussion of the topic, bibliographies, and links to related literature. Extensive cross-references to other entries within the Encyclopedia support efficient, user-friendly searches for immediate access to useful information. Key concepts presented in the Encyclopedia of Parallel Computing include: laws and metrics; specific numerical and non-numerical algorithms; asynchronous algorithms; libraries of subroutines; benchmark suites; applications; sequential consistency and cache coherency; machine classes such as clusters, shared-memory multiprocessors, special-purpose machines and dataflow machines; specific machines such as Cray supercomputers, IBM's Cell processor and Intel's multicore machines; race detection and auto parallelization; parallel programming languages, synchronization primitives, collective operations, message passing libraries, checkpointing, and operating systems. Topics covered: Speedup, Efficiency, Isoefficiency, Redundancy, Amdahl's law, Computer Architecture Concepts, Parallel Machine Designs, Benchmarks, Parallel Programming concepts & design, Algorithms, Parallel applications. This authoritative reference will be published in two formats: print and online. The online edition features hyperlinks to cross-references and to additional significant research.
Related Subjects: supercomputing, high-performance computing, distributed computing

Principles of Distributed Computing | 2011

Tight bounds on information dissemination in sparse mobile networks

Alberto Pettarin; Andrea Pietracaprina; Geppino Pucci; Eli Upfal

Motivated by the growing interest in mobile systems, we study the dynamics of information dissemination between agents moving independently on a plane. Formally, we consider k mobile agents performing independent random walks on an n-node grid. At time 0, each agent is located at a random node of the grid and one agent has a rumor. The spread of the rumor is governed by a dynamic communication graph process {G_t(r) | t ≥ 0}, where two agents are connected by an edge in G_t(r) iff their distance at time t is within their transmission radius r. Modeling the physical reality that the speed of radio transmission is much faster than the motion of the agents, we assume that the rumor can travel throughout a connected component of G_t before the graph is altered by the motion. We study the broadcast time T_B of the system, which is the time it takes for all agents to know the rumor. We focus on the sparse case (below the percolation point r_c ≈ √(n/k)) where, with high probability, no connected component in G_t has more than a logarithmic number of agents and the broadcast time is dominated by the time it takes for many independent random walks to meet one another. Quite surprisingly, we show that for a system below the percolation point, the broadcast time does not depend on the transmission radius. In fact, we prove that T_B = Θ(n/√k) for any 0 ≤ r < r_c, even when the transmission range is significantly larger than the mobility range in one step, giving a tight characterization up to logarithmic factors. Our result complements a recent result of Peres et al. (SODA 2011) who showed that above the percolation point the broadcast time is polylogarithmic in k.
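The process described above can be explored with a small simulation. The following is only a sketch under several assumptions that the abstract does not fix: the grid is a side × side square with reflecting borders (so n = side²), walks are lazy (an agent stays put with probability 1/5, which avoids parity effects), and distance is measured in the Manhattan metric. The function name `broadcast_time` and all of its parameters are hypothetical.

```python
import random


def broadcast_time(side, k, r=0, seed=0, max_steps=10**6):
    """Steps until all k agents know the rumor, or None if max_steps is hit."""
    rng = random.Random(seed)
    pos = [(rng.randrange(side), rng.randrange(side)) for _ in range(k)]
    informed = [False] * k
    informed[0] = True  # agent 0 starts with the rumor
    moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

    def spread():
        # Flood the rumor through every connected component of the
        # current communication graph that contains an informed agent.
        frontier = [i for i in range(k) if informed[i]]
        while frontier:
            xi, yi = pos[frontier.pop()]
            for j in range(k):
                if not informed[j]:
                    xj, yj = pos[j]
                    if abs(xi - xj) + abs(yi - yj) <= r:
                        informed[j] = True
                        frontier.append(j)

    spread()
    t = 0
    while not all(informed):
        if t >= max_steps:
            return None
        t += 1
        # One lazy random-walk step per agent, clamped at the grid border.
        for i, (x, y) in enumerate(pos):
            dx, dy = rng.choice(moves)
            pos[i] = (min(max(x + dx, 0), side - 1),
                      min(max(y + dy, 0), side - 1))
        spread()
    return t
```

Averaging `broadcast_time(side, k, r, seed=s)` over many seeds for a few small radii gives an informal feel for how weakly the broadcast time depends on r in the sparse regime. Such runs illustrate the model only and are no substitute for the paper's analysis.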

International Conference on Supercomputing | 2012

Space-round tradeoffs for MapReduce computations

Andrea Pietracaprina; Geppino Pucci; Matteo Riondato; Francesco Silvestri; Eli Upfal

This work explores fundamental modeling and algorithmic issues arising in the well-established MapReduce framework. First, we formally specify a computational model for MapReduce which captures the functional flavor of the paradigm by allowing for a flexible use of parallelism. Indeed, the model diverges from a traditional processor-centric view by featuring parameters which embody only global and local memory constraints, thus favoring a more data-centric view. Second, we apply the model to the fundamental computation task of matrix multiplication presenting upper and lower bounds for both dense and sparse matrix multiplication, which highlight interesting tradeoffs between space and round complexity. Finally, building on the matrix multiplication results, we derive further space-round tradeoffs on matrix inversion and matching.
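For readers unfamiliar with round-based matrix multiplication, the sketch below simulates the classic two-round MapReduce scheme in plain Python: round 1 joins column k of A with row k of B and emits partial products keyed by output cell, and round 2 sums them. This is a well-known baseline, not the algorithm or the space-round tradeoff from the paper. The helper `run_round` and the function `matmul_two_rounds` are hypothetical names used only for this illustration.

```python
from collections import defaultdict


def run_round(pairs, reducer):
    """Simulate one MapReduce round: group values by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(reducer(key, values))
    return out


def matmul_two_rounds(A, B):
    n, m, p = len(A), len(B), len(B[0])
    # Input pairs keyed by the shared index k.
    inputs = [(k, ("A", i, A[i][k])) for i in range(n) for k in range(m)]
    inputs += [(k, ("B", j, B[k][j])) for k in range(m) for j in range(p)]

    def join(k, values):
        # Round 1 reducer: pair every A[i][k] with every B[k][j].
        a = [(i, v) for tag, i, v in values if tag == "A"]
        b = [(j, v) for tag, j, v in values if tag == "B"]
        return [((i, j), x * y) for i, x in a for j, y in b]

    def add(key, values):
        # Round 2 reducer: sum the partial products of one output cell.
        return [(key, sum(values))]

    partial = run_round(inputs, join)
    totals = dict(run_round(partial, add))
    return [[totals[(i, j)] for j in range(p)] for i in range(n)]
```

In this baseline each round-1 reducer holds one column of A and one row of B. Schemes that give reducers more or less local memory change the number of rounds needed, which is the kind of tradeoff the abstract refers to.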

Algorithmica | 1990

A new scheme for the deterministic simulation of PRAMs in VLSI

Fabrizio Luccio; Andrea Pietracaprina; Geppino Pucci

A deterministic scheme for the simulation of (n, m)-PRAM computation is devised. Each PRAM step is simulated on a bounded degree network consisting of a mesh-of-trees (MT) of side n. The memory is subdivided in n modules, each local to a PRAM processor. The roots of the MT contain these processors and the memory modules, while the other O(n^2) nodes have the mere capabilities of packet switchers and one-bit comparators. The simulation algorithm makes a crucial use of pipelining on the MT, and attains a time complexity of O(log^2 n / log log n). The best previous time bound was O(log^2 n) on a different interconnection network with n processors. While the previous simulation schemes use an intermediate MPC model, which is in turn simulated on a bounded degree network, our method performs the simulation directly with a simple algorithm.

IEEE Transactions on Computers | 1993

Scattering and gathering messages in networks of processors

Sandeep N. Bhatt; Geppino Pucci; Abhiram G. Ranade; Arnold L. Rosenberg

The operations of scattering and gathering in a network of processors involve one processor of the network (P_0) communicating with all other processors. In scattering, P_0 sends distinct messages to all other processors; in gathering, the other processors send distinct messages to P_0. The authors consider networks that are trees of processors. Algorithms for scattering messages from and gathering messages to the processor that resides at the root of the tree are presented. The algorithms are quite general, in that the messages transmitted can differ arbitrarily in length; quite strong, in that they send messages along noncolliding paths, and hence do not require any buffering or queueing mechanisms in the processors; and quite efficient, in that the algorithms for scattering in general trees are optimal, the algorithm for gathering in a path is optimal, and the algorithms for gathering in general trees are nearly optimal. The algorithms can easily be converted using spanning trees to efficient algorithms for scattering and gathering in networks of arbitrary topologies.

Information Processing Letters | 1991

Parallel priority queues

Maria Cristina Pinotti; Geppino Pucci

This paper introduces the Parallel Priority Queue (PPQ) abstract data type. A PPQ stores a set of integer-valued items and provides operations such as insertion of n new items or deletion of the n smallest ones. Algorithms for realizing PPQ operations on an n-processor CREW-PRAM are based on two new data structures, the n-Bandwidth-Heap (n-H) and the n-Bandwidth-Leftist-Heap (n-L), that are obtained as extensions of the well-known sequential binary-heap and leftist-heap, respectively. Using these structures, it is shown that insertion of n new items in a PPQ of m elements can be performed in parallel time O(h + log n), where h = log(m/n), while deletion of the n smallest items can be performed in time O(h + log log n).
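The PPQ interface can be illustrated with a sequential sketch. It assumes the natural reading of an n-Bandwidth-Heap: a binary heap in which every node stores n items and every item in a node is no larger than any item in its children. The class `BandwidthHeap` and its methods are hypothetical, and the code runs on a single processor with plain sorting, so it shows the structure and the two operations but not the paper's CREW-PRAM algorithms or their time bounds.

```python
class BandwidthHeap:
    """Sequential sketch: an array-based binary heap whose nodes hold n items each."""

    def __init__(self, n):
        self.n = n
        self.nodes = []  # nodes[i] is a sorted list of n items; children at 2i+1, 2i+2

    def insert(self, items):
        """Insert exactly n new items."""
        assert len(items) == self.n
        self.nodes.append(sorted(items))
        i = len(self.nodes) - 1
        while i > 0:
            # Sift up: the parent keeps the n smallest of the merged 2n items.
            p = (i - 1) // 2
            merged = sorted(self.nodes[p] + self.nodes[i])
            self.nodes[p], self.nodes[i] = merged[:self.n], merged[self.n:]
            i = p

    def delete_min(self):
        """Remove and return the n smallest items (the root node)."""
        smallest = self.nodes[0]
        last = self.nodes.pop()
        if self.nodes:
            self.nodes[0] = last
            self._sift_down(0)
        return smallest

    def _sift_down(self, i):
        n, nodes = self.n, self.nodes
        while True:
            kids = [c for c in (2 * i + 1, 2 * i + 2) if c < len(nodes)]
            if not kids:
                return
            pool = sorted(nodes[i] + [x for c in kids for x in nodes[c]])
            nodes[i] = pool[:n]
            if len(kids) == 2:
                # The child with the larger old maximum takes the middle n
                # items; they cannot exceed that maximum, so its subtree
                # stays valid and needs no further work.
                kids.sort(key=lambda c: nodes[c][-1])
                nodes[kids[1]] = pool[n:2 * n]
            # The other child takes the n largest items and is fixed next.
            nodes[kids[0]] = pool[-n:]
            i = kids[0]
```

With n = 2, inserting the batches [5, 1], [4, 8], [3, 7], [2, 6], [0, 9] and then calling `delete_min()` repeatedly returns the items two at a time in increasing order.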

IEEE International Conference on High Performance Computing, Data, and Analytics | 2005

The potential of on-chip multiprocessing for QCD machines

Gianfranco Bilardi; Andrea Pietracaprina; Geppino Pucci; F. Schifano; R. Tripiccione

We explore the opportunities offered by current and forthcoming VLSI technologies to on-chip multiprocessing for Quantum Chromo Dynamics (QCD), a computational grand challenge for which over half a dozen specialized machines have been developed over the last two decades. Based on a careful study of the information exchange requirements of QCD both across the network and within the memory system, we derive the optimal partition of die area between storage and functional units. We show that a scalable chip organization holds the promise to deliver from hundreds to thousands of flops per cycle as VLSI feature size scales down from 90 nm to 20 nm over the next dozen years.

International Conference on Computational Science | 2001

On the Effectiveness of D-BSP as a Bridging Model of Parallel Computation

Gianfranco Bilardi; Carlo Fantozzi; Andrea Pietracaprina; Geppino Pucci

This paper surveys and places into perspective a number of results concerning the D-BSP (Decomposable Bulk Synchronous Parallel) model of computation, a variant of the popular BSP model proposed by Valiant in the early nineties. D-BSP captures part of the proximity structure of the computing platform, modeling it by suitable decompositions into clusters, each characterized by its own bandwidth and latency parameters. Quantitative evidence is provided that, when modeling realistic parallel architectures, D-BSP achieves higher effectiveness and portability than BSP, without significantly affecting the ease of use. It is also shown that D-BSP avoids some of the shortcomings of BSP which motivated the definition of other variants of the model. Finally, the paper discusses how the aspects of network proximity incorporated in the model allow for a better management of network congestion and bank contention, when supporting a shared-memory abstraction in a distributed-memory environment.