Emilio Mancini | Researchain

Publication

Featured research published by Emilio Mancini.

international conference on parallel processing | 2010

Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters

Krishna Chaitanya Kandalla; Emilio Mancini; Sayantan Sur; Dhabaleswar K. Panda

Modern supercomputing systems have witnessed phenomenal growth in recent history owing to the advent of multi-core architectures and high-speed networks. However, the operational and maintenance costs of these systems have also grown rapidly. Several concepts such as Dynamic Voltage and Frequency Scaling (DVFS) and CPU throttling have been proposed to conserve the power consumed by the compute nodes during idle periods. However, it is necessary to design software stacks in a power-aware manner to minimize the amount of power drawn by the system during the execution of applications. It is also critical to minimize the performance overheads associated with power-aware algorithms, as the benefits of saving power could be lost if the application runs for a longer time. Modern multi-core architectures such as the Intel “Nehalem” allow DVFS and CPU throttling operations to be performed with little overhead. In this paper, we explore how these features can be leveraged to design algorithms that deliver fine-grained power savings during the communication phases of parallel applications. We also propose a theoretical model to analyze the power consumption characteristics of communication operations. We use microbenchmarks and application benchmarks such as NAS and CPMD to measure the performance of our proposed algorithms and to demonstrate the potential for saving power with 32 and 64 processes. We observe about 8% improvement in the overall energy consumed by these applications with little performance overhead.
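
The trade-off mentioned in this abstract (power saved during communication versus a longer run time) can be illustrated with a minimal sketch that treats energy as power multiplied by time. This is not the theoretical model proposed in the paper; the function names and all numbers below are hypothetical and chosen only for illustration.

```python
def energy_joules(compute_s, comm_s, compute_w, comm_w):
    """Energy as power x time, summed over the compute and communication phases."""
    return compute_s * compute_w + comm_s * comm_w


def relative_saving(baseline_j, power_aware_j):
    """Fraction of baseline energy saved (negative means more energy was used)."""
    return (baseline_j - power_aware_j) / baseline_j


# Hypothetical baseline: 70 s compute and 30 s communication, both at 100 W.
baseline = energy_joules(70.0, 30.0, 100.0, 100.0)

# Hypothetical power-aware run: communication drops to 80 W but takes 33 s.
mild_slowdown = energy_joules(70.0, 33.0, 100.0, 80.0)

# Same power reduction, but communication now takes 40 s.
large_slowdown = energy_joules(70.0, 40.0, 100.0, 80.0)

saving_mild = relative_saving(baseline, mild_slowdown)
saving_large = relative_saving(baseline, large_slowdown)
print(f"mild slowdown:  {saving_mild:+.1%}")
print(f"large slowdown: {saving_large:+.1%}")
```

With these made-up values the first run saves energy while the second uses more than the baseline, which is the situation the abstract warns about when the application runs for a longer time.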

international conference on parallel and distributed systems | 2005

A Simulation-Based Framework for Autonomic Web Services

Emilio Mancini; Umberto Villano; Massimiliano Rak; Roberto Torella

A possible solution to guarantee critical requirements in Web services designs is the use of an autonomic architecture, able to auto-configure and to auto-tune. This paper presents an innovative approach for the development of self-optimizing autonomic systems for Web services architectures, based on the adoption of a simulation engine for obtaining performance predictions. MAWeS (MetaPL/HeSSE Autonomic Web Services) is a framework whose aim is to support the development of self-optimizing predictive autonomic systems for Web service architectures. It adopts a simulation-based methodology, which makes it possible to predict system performance under different status and load conditions. The predicted results are used for a feedforward control of the system, which self-tunes before the new conditions and the subsequent performance losses are actually observed.

parallel, distributed and network-based processing | 2005

Performance-driven development of a Web services application using MetaPL/HeSSE

Emilio Mancini; Umberto Villano; Nicola Mazzocca; Massimiliano Rak; Roberto Torella

One of the leading programming techniques for the development of distributed applications is the use of Web services (WS). The strong point of WS technology is the definition of an operating environment that can be adopted to execute applications, independently of the original development and deployment platforms. However, whenever performance is an issue, the message overhead resulting from the XML messaging approach and from the software layers introduced to obtain system abstraction should be carefully considered. This paper presents a simulation-based methodology that makes it possible to predict Web services-based application performance, even if the execution environment of choice is not available and the application is not completely developed. This methodology can be used as the basis for performance-driven Web services development. The proposed methodology is applied to the development of a simple but realistic Web service application.

Concurrency and Computation: Practice and Experience | 2007

Cluster systems and simulation: from benchmarking to off‐line performance prediction

Beniamino Di Martino; Emilio Mancini; Massimiliano Rak; Roberto Torella; Umberto Villano

This paper describes a simulation‐based technique for the performance prediction of message‐passing applications on cluster systems by means of benchmark data. Given data measuring the performance of a target cluster in the form of standard benchmark results, along with the details of the chosen computing configuration, it is possible to build and to validate automatically a detailed simulation model. This makes it possible to predict off‐line, i.e. without resorting to the real hardware, the performance of fully developed or even of skeletal code. An XML‐based language (MetaPL) is adopted to describe the application behavior in the development stage. After a description of the approach and the illustration of the construction and validation of the simulation model, the paper presents a case study.

grid computing | 2010

An MPI-Stream Hybrid Programming Model for Computational Clusters

Emilio Mancini; Gregory Marsh; Dhabaleswar K. Panda

The MPI programming model hides network type and topology from developers, but also allows them to seamlessly distribute a computational job across multiple cores in both an intra-node and inter-node fashion. This provides for high locality performance when the cores are either on the same node or on nodes closely connected by the same network type. The streaming model splits a computational job into a linear chain of decoupled units. This decoupling allows the placement of job units on optimal nodes according to network topology. Furthermore, the links between these units can be of varying protocols when the application is distributed across a heterogeneous network. In this paper we study how to integrate the MPI and Stream programming models in order to exploit network locality and topology. We present a hybrid MPI-Stream framework that aims to take advantage of each model's strengths. We test our framework with a financial application. This application simulates an electronic market for a single financial instrument. A stream of buy and sell orders is fed into a price matching engine. The matching engine creates a stream of order confirmations, trade confirmations, and quotes based on its attempts to match buyers with sellers. Our results show that the hybrid MPI-Stream framework can deliver a 32% performance improvement at certain order transmission rates.
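
The test application described in this abstract (an order stream in, a stream of confirmations, trades, and quotes out) can be sketched as a single-process generator. This is only an illustrative price-time priority matcher, not the paper's engine, and it involves neither MPI nor a stream framework; the `Order` type, the event tuple layout, and the `match_orders` function are all hypothetical names introduced here.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Order:
    order_id: int
    side: str      # "buy" or "sell"
    price: int     # price in integer ticks
    quantity: int


def match_orders(orders: Iterable[Order]) -> Iterator[tuple]:
    """Consume an order stream and yield confirmation, trade, and quote events."""
    bids = []  # heap entries: [-price, arrival_seq, order_id, remaining]
    asks = []  # heap entries: [price, arrival_seq, order_id, remaining]
    for seq, order in enumerate(orders):
        yield ("order_confirmation", order.order_id)
        is_buy = order.side == "buy"
        own, opposite = (bids, asks) if is_buy else (asks, bids)
        remaining = order.quantity
        while remaining and opposite:
            best = opposite[0]
            best_price = best[0] if is_buy else -best[0]
            crossed = best_price <= order.price if is_buy else best_price >= order.price
            if not crossed:
                break
            traded = min(remaining, best[3])
            remaining -= traded
            best[3] -= traded  # heap order depends only on price and arrival
            buy_id, sell_id = (order.order_id, best[2]) if is_buy else (best[2], order.order_id)
            yield ("trade_confirmation", buy_id, sell_id, best_price, traded)
            if best[3] == 0:
                heapq.heappop(opposite)
        if remaining:
            # Rest the unfilled quantity on the incoming order's own side of the book.
            key = -order.price if is_buy else order.price
            heapq.heappush(own, [key, seq, order.order_id, remaining])
        best_bid = -bids[0][0] if bids else None
        best_ask = asks[0][0] if asks else None
        yield ("quote", best_bid, best_ask)


demo_orders = [
    Order(1, "sell", 101, 5),
    Order(2, "sell", 100, 5),
    Order(3, "buy", 101, 7),
]
events = list(match_orders(demo_orders))
trades = [event for event in events if event[0] == "trade_confirmation"]
for event in events:
    print(event)
```

Writing the matcher as a generator keeps it a decoupled unit with one input stream and one output stream, which is the shape the streaming model in the abstract relies on.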

Lecture Notes in Computer Science | 2003

Off-line performance prediction of message-passing applications on cluster systems

Emilio Mancini; Massimiliano Rak; Roberto Torella; Umberto Villano

This paper describes a simulation-based technique for the performance prediction of message-passing applications on cluster systems. Given data measuring the performance of a target cluster in the form of standard benchmark results, along with the details of the chosen computing configuration (e.g., the number of nodes), it is possible to build and to validate automatically a detailed simulation model. This makes it possible to predict the performance of fully-developed or skeletal code off-line, i.e., without resorting to the real hardware. The reasonable accuracy obtained makes this approach particularly useful for preliminary performance testing of parallel code on non-available hardware. After a description of the approach and of the construction and validation of the simulation model, the paper presents a case study.

instrumentation and measurement technology conference | 2002

Modelling and characterization of pipelined ADCs

Dominique Dallet; Pasquale Daponte; Emilio Mancini; Sergio Rapuano

The paper deals with the problems of modeling and testing pipeline ADCs. In particular, these problems are addressed through the realization of a modular virtual instrument. It has been developed in the Java language in order to be remotely manageable through a common Internet browser. The instrument features include: (i) a module able to model an ADC through the specialization of a simplified behavioral model, also sketched in the paper; (ii) a module executing the dynamic testing of the device in the frequency domain; (iii) a scalable database providing data sharing among multiple remote users; and (iv) some interface modules for programmable instrumentation. The paper also presents the results of the first validation phase of the instrument, carried out on two pipeline ADCs.

Future Generation Computer Systems | 2008

A grid-aware MIP solver: Implementation and case studies

Emilio Mancini; Sonya Marcarelli; Igor Vasilyev; Umberto Villano

This paper presents a grid-enabled system for solving large-scale mixed integer programming (MIP) problems. The system has been developed using Globus and MPICH-G2, and consists of two solvers and an interface portal. After a brief introduction to Branch, Cut and Price optimization algorithms, the paper focuses on the system architecture, solvers and portal user interface. The performance of the system is measured and analysed on a small-scale grid environment consisting of three clusters on a campus LAN.

advanced information networking and applications | 2006

Autonomic Web service development with MAWeS

Emilio Mancini; Umberto Villano; Massimiliano Rak

Service oriented architectures (SOA) are based on applications consisting of an aggregation of services with standard interfaces, offered on distributed hosts. The highly distributed nature and the load sensitivity of these architectures make it very difficult to guarantee performance requirements under rapidly changing load conditions. This paper deals with the development of service oriented predictive autonomic systems that are capable of optimizing themselves using a feedforward approach, by exploiting automatically generated performance predictions. The MAWeS (MetaPL/HeSSE autonomic Web services) framework allows the development of self-tuning applications that proactively optimize themselves by simulating the execution environment. An example of application development in MAWeS is covered in detail, showing the implementation of a system that exploits MAWeS services to choose dynamically among several different algorithms to meet response time constraints.
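
The feedforward idea in this abstract (choose an algorithm from predicted, not observed, response times) can be sketched as follows. This does not use the MAWeS, MetaPL, or HeSSE interfaces, which are not described here; the predictor lambdas are hypothetical stand-ins for simulation results, and `choose_algorithm` and the candidate names are invented for illustration.

```python
from typing import Callable, List, Tuple

# Candidates in preference order; each predictor maps an expected load
# (requests per second) to a predicted response time in milliseconds.
# These formulas are made up and stand in for simulator output.
Candidate = Tuple[str, Callable[[float], float]]

CANDIDATES: List[Candidate] = [
    ("exact", lambda load: 2.0 * load),
    ("approximate", lambda load: 0.5 * load + 20.0),
    ("cached", lambda load: 10.0),
]


def choose_algorithm(candidates: List[Candidate],
                     predicted_load: float,
                     deadline_ms: float) -> str:
    """Pick the most preferred candidate whose predicted time meets the deadline.

    If no candidate is predicted to meet it, fall back to the fastest prediction.
    """
    predictions = [(name, predict(predicted_load)) for name, predict in candidates]
    for name, predicted_ms in predictions:
        if predicted_ms <= deadline_ms:
            return name
    return min(predictions, key=lambda item: item[1])[0]


light = choose_algorithm(CANDIDATES, predicted_load=20.0, deadline_ms=50.0)
medium = choose_algorithm(CANDIDATES, predicted_load=40.0, deadline_ms=50.0)
heavy = choose_algorithm(CANDIDATES, predicted_load=100.0, deadline_ms=50.0)
print(light, medium, heavy)
```

The selection runs on the predicted load before that load arrives, which is what distinguishes a feedforward scheme from one that reacts to measured response times.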

international conference on parallel processing | 2010

High Performance Design and Implementation of Nemesis Communication Layer for Two-Sided and One-Sided MPI Semantics in MVAPICH2

Miao Luo; Sreeram Potluri; Ping Lai; Emilio Mancini; Hari Subramoni; Krishna Chaitanya Kandalla; Sayantan Sur; Dhabaleswar K. Panda

High End Computing (HEC) systems are being deployed with eight to sixteen compute cores, with 64 to 128 cores/node being envisioned for exascale systems. MVAPICH2 is a popular implementation of MPI-2 specifically designed and optimized for InfiniBand, iWARP and RDMA over Converged Ethernet (RoCE). MVAPICH2 is based on MPICH2 from ANL. Recently MPICH2 has been redesigned with an effort to optimize intra-node communication for future many-core systems. The new communication layer in MPICH2 is called Nemesis, which is very well optimized for shared memory message passing, with a modular design for various high-performance interconnects. In this paper we explore the challenges involved in designing the next-generation MVAPICH2 stack, leveraging the Nemesis communication layer. We observe that Nemesis does not provide abstractions for one-sided communication. We propose an extended Nemesis interface for optimized one-sided communication and provide design details. Our experimental evaluation shows that our proposed one-sided interface extensions are able to provide significantly better performance than the basic Nemesis interface. For example, inter-node MPI_Put bandwidth increased from 1,800 MB/s to 3,000 MB/s and latency for small messages went down by 13%. Additionally, with our proposed designs, we are able to demonstrate performance gains with small messages, when compared to the existing MVAPICH2 CH3 implementation. The designs proposed in this paper are a superset of the options currently available to MVAPICH2 users and provide the best combination of performance and modularity.
