Diego Melpignano
STMicroelectronics
Publication
Featured research published by Diego Melpignano.
design automation conference | 2012
Diego Melpignano; Luca Benini; Eric Flamand; Bruno Jego; Thierry Lepley; Germain Haugou; Fabien Clermidy; Denis Dutoit
P2012 is an area- and power-efficient many-core computing accelerator based on multiple globally asynchronous, locally synchronous processor clusters. Each cluster features up to 16 processors with independent instruction streams sharing a multi-banked one-cycle access L1 data memory, a multi-channel DMA engine and specialized hardware for synchronization and aggressive power management. P2012 is 3D stacking ready and can be customized to achieve extreme area and energy efficiency by adding domain-specific HW IPs to the cluster. The first P2012 SoC prototype in 28nm CMOS will sample in Q3, featuring four 16-processor clusters, a 1MB L2 memory and delivering 80GOPS (with 32 bit single precision floating point support) in 18mm2 with 2W power consumption (worst-case). P2012 can run standard OpenCL™ and proprietary Native Programming Model SW components to achieve the highest level of control on application-to-resource mapping. A dedicated version of the OpenCV vision library is provided in the P2012 SW Development Kit to enable visual analytics acceleration. This paper will discuss preliminary performance measurements of common feature extraction and tracking algorithms, parallelized on P2012, versus sequential execution on ARM CPUs.
design, automation, and test in europe | 2012
Luca Benini; Eric Flamand; Didier Fuin; Diego Melpignano
P2012 is an area- and power-efficient many-core computing fabric based on multiple globally asynchronous, locally synchronous (GALS) clusters supporting aggressive fine-grained power, reliability and variability management. Clusters feature up to 16 processors and one control processor with independent instruction streams sharing a multi-banked L1 data memory, a multi-channel DMA engine, and specialized hardware for synchronization and scheduling. P2012 achieves extreme area and energy efficiency by supporting domain-specific acceleration at the processor and cluster level through the addition of dedicated HW IPs. P2012 can run standard OpenCL and OpenMP parallel codes as well as proprietary Native Programming Model (NPM) SW components that provide the highest level of control on application-to-resource mapping. In Q3 2011 the P2012 SW Development Kit (SDK) was made available to a community of R&D users; it includes full OpenCL and NPM development environments. The first P2012 SoC prototype in 28nm CMOS will sample in Q4 2012, featuring four clusters and delivering 80GOPS (with single precision floating point support) in 15.2mm2 with 2W power consumption.
Computer Communications | 2008
Mahesh Sooriyabandara; Tim Farnham; Costas Efthymiou; Matthias Wellens; Janne Riihijärvi; Petri Mähönen; Alain Gefflaut; José Antonio Galache; Diego Melpignano; Arthur van Rooijen
We present the Unified Link Layer API (ULLA) framework: an open and extensible API framework that incorporates a number of requirements related to a wide range of applications, including multi-mode and cross-layer optimisation scenarios. This work has been mainly motivated by the complexity and interoperability problems related to the large number of wireless APIs available today. ULLA provides database and object-oriented service abstractions to applications through a generic query mechanism, a method to set up asynchronous notifications and a command interface. It encapsulates link level heterogeneity by defining a unified model for link technologies. We describe design details, various implementation options and discuss how the proposed ULLA design provides an extensible, scalable and platform independent framework, enabling seamless link access and control in various types of device platforms. Application programming using ULLA is illustrated using code examples. Numerous usage scenarios for ULLA are presented, highlighting unified access to heterogeneous link standards while encouraging application innovation.
international conference on communications | 2008
Nicola Baldo; Federico Maguolo; Simone Merlin; Andrea Zanella; Michele Zorzi; Diego Melpignano; David Siorpaes
Rate adaptation for 802.11 has been deeply investigated in the past, but the problem of achieving optimal rate adaptation with respect not only to channel-related errors but also to contention-related issues (i.e., collisions and variations in medium access times) is still unsolved. In this paper we address this issue by proposing (1) a practical definition of the medium status in a multi-user 802.11 scenario in terms of channel errors, MAC collisions and packet service times, and a method for its estimation based on measurements; (2) an analytical model of the goodput performance as a function of the Medium Status; (3) a rate adaptation algorithm, called goodput optimal rate adaptation (GORA), which is based on this model. Unlike other rate adaptation schemes proposed in literature, which require either modifications to the IEEE 802.11 standard or cooperation among nodes, GORA is totally stand-alone and standard compliant. In fact, the Medium Status Estimation used by GORA is obtained by using standard MAC counters that are commonly collected by commercial MAC drivers, and no explicit interaction with the other devices in the network is required. Therefore, GORA offers the advantage of being readily deployable on real devices. The performance of GORA is evaluated through NS2 simulations which reveal that, as expected, GORA outperforms other well-known rate adaptation algorithms in several scenarios and can be used as a new reference benchmark.
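The idea of choosing the rate that maximizes estimated goodput can be illustrated with a minimal Python sketch. It is not the analytical model or the algorithm from the paper: the `RateEstimate` type, its fields, the goodput formula (delivery probability times payload divided by mean service time) and all numbers are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateEstimate:
    """Hypothetical per-rate medium status estimate (not the paper's data model)."""
    rate_mbps: float       # nominal PHY rate
    p_success: float       # assumed probability that a frame is delivered
    service_time_s: float  # assumed mean MAC service time per frame, in seconds


def expected_goodput_bps(est: RateEstimate, payload_bits: int) -> float:
    """Delivered payload bits per second of MAC service time (toy model)."""
    return est.p_success * payload_bits / est.service_time_s


def pick_rate(estimates, payload_bits=12000):
    """Return the estimate with the highest expected goodput."""
    return max(estimates, key=lambda e: expected_goodput_bps(e, payload_bits))


# Made-up estimates: the faster rate loses half of its frames.
estimates = [
    RateEstimate(rate_mbps=54.0, p_success=0.50, service_time_s=0.0010),
    RateEstimate(rate_mbps=24.0, p_success=0.95, service_time_s=0.0012),
]
best = pick_rate(estimates)
print(best.rate_mbps)
```

With these invented figures the lower nominal rate wins, because its higher delivery probability outweighs the longer service time.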
design, automation, and test in europe | 2013
Edoardo Paone; N. Vahabi; Vittorio Zaccaria; Cristina Silvano; Diego Melpignano; Germain Haugou; Thierry Lepley
In this paper, we introduce a novel modeling technique to reduce the time associated with cycle-accurate simulation of parallel applications deployed on many-core embedded platforms. We introduce an ensemble model based on artificial neural networks that exploits (in the training phase) multiple levels of simulation abstraction, from cycle-accurate to cycle-approximate, to predict the cycle-accurate results for unknown application configurations. We show that high-level modeling can be used to significantly reduce the number of low-level model evaluations provided that a suitable artificial neural network is used to aggregate the results. We propose a methodology for the design and optimization of such an ensemble model and we assess the proposed approach for an industrial simulation framework based on STMicroelectronics STHORM (P2012) many-core computing fabric.
international conference on hardware/software codesign and system synthesis | 2012
Edoardo Paone; Gianluca Palermo; Vittorio Zaccaria; Cristina Silvano; Diego Melpignano; Germain Haugou; Thierry Lepley
Open Computing Language (OpenCL) is emerging as a standard for parallel programming of heterogeneous hardware accelerators. Compared with device-specific languages, OpenCL enables application portability but does not guarantee performance portability, eventually requiring additional tuning of the implementation to a specific platform or to unpredictable dynamic workloads. In this paper, we present a methodology to analyze the customization space of an OpenCL application in order to improve performance portability and to support dynamic adaptation. We formulate our case study by implementing an OpenCL image stereo-matching application (which computes the relative depth of objects from a pair of stereo images) customized to the STMicroelectronics Platform 2012 many-core computing fabric. In particular, we use design space exploration techniques to generate a set of operating points that represent specific configurations of the parameters allowing different trade-offs between performance and accuracy of the algorithm itself. These points give detailed knowledge about the interaction between the application parameters, the underlying architecture and the performance of the system; they could also be used by a run-time manager software layer to meet dynamic Quality-of-Service (QoS) constraints. To analyze the customization space, we use cycle-accurate simulations for the target architecture. Since the profiling phase of each configuration takes a long simulation time, we designed our methodology to reduce the overall number of simulations by exploiting some important features of the application parameters; our analysis also enables the identification of the parameters that could be explored on a high-level simulation model to reduce the simulation time. The resulting methodology is one order of magnitude more efficient than an exhaustive exploration and, given its randomized nature, it increases the probability of avoiding sub-optimal trade-offs.
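A set of operating points that trade performance against accuracy is commonly reduced to its Pareto front. The Python sketch below shows that filtering step only, under stated assumptions: the `OperatingPoint` type, its two metrics (execution time and error, both to be minimized) and the sample values are hypothetical and do not come from the paper, whose exploration strategy is not reproduced here.

```python
from typing import NamedTuple


class OperatingPoint(NamedTuple):
    """Hypothetical operating point; both metrics are to be minimized."""
    name: str
    time_ms: float  # execution time
    error: float    # accuracy loss of the algorithm


def dominates(a: OperatingPoint, b: OperatingPoint) -> bool:
    """True if a is no worse than b on both metrics and better on at least one."""
    return (a.time_ms <= b.time_ms and a.error <= b.error
            and (a.time_ms < b.time_ms or a.error < b.error))


def pareto_front(points):
    """Keep the points that no other point dominates, ordered by time."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front, key=lambda p: p.time_ms)


# Made-up measurements for illustration.
points = [
    OperatingPoint("A", 10.0, 0.30),
    OperatingPoint("B", 20.0, 0.10),
    OperatingPoint("C", 25.0, 0.12),  # slower and less accurate than B
    OperatingPoint("D", 40.0, 0.05),
]
front = pareto_front(points)
print([p.name for p in front])
```

A run-time manager of the kind mentioned in the abstract could then choose among the surviving points according to the current QoS constraint.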
formal methods | 2011
Christian Fabre; Iuliana Bacivarov; Ananda Basu; Martino Ruggiero; David Atienza; Eric Flamand; Jean-Pierre Krimm; Julien Mottin; Lars Schor; Pratyush Kumar; Hoeseok Yang; Devesh B. Chokshi; Lothar Thiele; Saddek Bensalem; Marius Bozga; Luca Benini; Mohamed M. Sabry; Yusuf Leblebici; Giovanni De Micheli; Diego Melpignano
PRO3D tackles two important 3D technologies, namely Through Silicon Via (TSV) and liquid cooling, and investigates their consequences for stacked architectures and for software development as a whole. In particular, memory hierarchies are being revisited and the thermal impact of software on the 3D stack is explored. As a key result, a software design flow based on the rigorous assembly of software components and monitoring of the thermal integrity of the 3D stack has been developed. After 30 months of research, PRO3D proposes a complete tool-chain for 3D manycore that integrates state-of-the-art tools ranging from system-level formal specification and 3D exploration to actual programming and runtime control on the 3D system. Current efforts are directed towards extensive experiments on an industrial embedded manycore platform.
Archive | 2001
Diego Melpignano; Andrea Zanella
TCP is the current dominant transport protocol, mainly used in fixed networks. It is well known that TCP performance may degrade over paths that include wireless links, where packet losses are often not related to congestion, but to the unreliability of the transmission medium. In this paper, we examine this problem considering a wireless link based on Bluetooth radio equipment. Bluetooth (BT) is a low-cost system in the unlicensed 2.4GHz band. It provides reliable data transmission using a fast frequency hopping technique and a Stop-and-Wait ARQ scheme. In our experiments, we have studied the performance of a heavy file transfer over a BT link, with different environmental conditions and BT radio packet formats. Results show that the best FTP performance in a wide range of radio channel conditions is obtained by using long non-FEC-protected radio packets. Nevertheless, in particularly hostile situations, the intermediate-length packet format appears more suitable. Furthermore, the analysis has focused on possible inefficiency due to bad interaction between the TCP and BT retransmission mechanisms.
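The trade-off between packet length and FEC protection under Stop-and-Wait ARQ can be sketched with a toy model in Python. This is an assumption-laden illustration, not the experimental setup of the paper: it assumes independent bit errors, counts only payload errors (access code, packet header and lost acknowledgements are ignored), and uses the payload sizes, slot counts and 2/3 FEC rule of the Bluetooth 1.x ACL packet types as recalled here. The function names are hypothetical.

```python
import math

SLOT_S = 625e-6  # Bluetooth slot duration in seconds
CRC_BYTES = 2    # 16-bit CRC appended to the payload

# name: (user payload bytes, slots occupied, payload header bytes, FEC-protected)
PACKETS = {
    "DM1": (17, 1, 1, True),
    "DH1": (27, 1, 1, False),
    "DM3": (121, 3, 2, True),
    "DH3": (183, 3, 2, False),
    "DM5": (224, 5, 2, True),
    "DH5": (339, 5, 2, False),
}


def p_payload_ok(name: str, ber: float) -> float:
    """Probability that the payload arrives intact (independent bit errors assumed)."""
    payload, _slots, header, fec = PACKETS[name]
    bits = (payload + header + CRC_BYTES) * 8
    if not fec:
        return (1.0 - ber) ** bits
    # 2/3 FEC: every 10 data bits travel in a 15-bit codeword that
    # tolerates a single bit error.
    blocks = math.ceil(bits / 10)
    block_ok = (1.0 - ber) ** 15 + 15 * ber * (1.0 - ber) ** 14
    return block_ok ** blocks


def throughput_kbps(name: str, ber: float) -> float:
    """Expected one-way user throughput with Stop-and-Wait retransmission."""
    payload, slots, _header, _fec = PACKETS[name]
    cycle_s = (slots + 1) * SLOT_S  # data packet plus one return slot
    return payload * 8 * p_payload_ok(name, ber) / cycle_s / 1000.0


def best_packet(ber: float) -> str:
    """Packet type with the highest expected throughput at this bit error rate."""
    return max(PACKETS, key=lambda n: throughput_kbps(n, ber))


for ber in (0.0, 1e-4, 1e-3):
    print(ber, best_packet(ber))
```

On an error-free channel the model favours the long unprotected packet, in line with the abstract's main finding. Real channels with bursty interference behave differently, so the crossover points of this sketch should not be read as results.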
international conference on communications | 2008
Nicola Baldo; Federico Maguolo; Simone Merlin; Andrea Zanella; Michele Zorzi; Diego Melpignano; David Siorpaes
In this paper we present APOS, a method for dynamically adapting the parameters of IEEE 802.11g to the estimated system state, with the aim of enhancing the quality of a voice communication between a mobile station and a remote peer node. The system state is estimated based on a number of counters that are collected by the MAC layer of the mobile station, regarding the number of successful and unsuccessful transmission/reception events, channel busy periods and idle slots. These statistics are processed to estimate the collision probability and the signal to noise ratio at the receiver side. Hence, a mathematical model is used to get the expected end-to-end network performance in terms of throughput, delay and packet error rate, for different settings of some PHY and MAC parameters, such as the modulation/coding scheme and the retransmission limit. The setting that is estimated to maximize the quality of service for the end user is then selected. Unlike other optimization mechanisms proposed in literature, APOS is totally stand-alone and standard compliant. In fact, APOS makes use of local information that can be collected from the Network Interface Card, and no explicit interaction with the other devices in the network is required.
personal, indoor and mobile radio communications | 2002
Petri Mähönen; Luis Muñoz; Diego Melpignano; George Orphanos; Zach Shelby; Timo Saarinen; Marcelo H. García
Heterogeneous networks require mechanisms for transparent protocol boosting in legacy systems and adaptive support for seamless interoperability between different wireless access methods. In this paper, we present an open performance enhancement protocol architecture that is developed to provide a generic protocol enhancing proxy (PEP) service for wireless access points and terminals. First, we describe the basic architecture and philosophy of our approach. Then some illustrative protocol boosting modules are presented with results. Finally, we describe the future work that is required in order to make the so-called wireless adaptation layer (WAL) useful in production quality environments.