Giuseppe Desoli | Researchain

Publication

Featured research published by Giuseppe Desoli.

IEEE Transactions on Computers | 2012

Variability-Aware Task Allocation for Energy-Efficient Quality of Service Provisioning in Embedded Streaming Multimedia Applications

Francesco Paterna; Andrea Acquaviva; Alberto Caprara; Francesco Papariello; Giuseppe Desoli; Luca Benini

Multimedia streaming applications running on next-generation parallel multiprocessor arrays in sub-45 nm technology face new challenges related to device and process variability, leading to performance and power variations across the cores. In this context, Quality of Service (QoS), as well as energy efficiency, could be severely impacted by variability. In this work, we propose a runtime variability-aware workload distribution technique for enhancing real-time predictability and energy efficiency based on an innovative Linear-Programming + Bin-Packing formulation which can be solved in linear time. We demonstrate our approach on the virtual prototype of a next-generation industrial multicore platform running representative multimedia applications. Experimental results confirm that our technique compensates for variability, while improving energy efficiency and minimizing deadline violations in the presence of performance and power variations across the cores. The proposed policy can save up to 33 percent of energy with respect to the state-of-the-art policies and 65 percent of energy with respect to one variability-unaware task allocation policy while providing better QoS.

Design, Automation, and Test in Europe | 2009

Adaptive idleness distribution for non-uniform aging tolerance in multiprocessor systems-on-chip

Francesco Paterna; Luca Benini; Andrea Acquaviva; Francesco Papariello; Giuseppe Desoli; Mauro Olivieri

In deep submicron designs of MultiProcessor Systems-on-Chip (MPSoC) architectures, uncompensated within-die process variations and aging effects will lead to an increasing uncertainty and unbalancing of expected core lifetimes. In this paper we present an adaptive workload allocation strategy for run-time compensation of variation- and aging-induced unbalanced core lifetimes by means of core activity duty cycling. The proposed technique regulates the percentage of idle time on short-expected-life cores to meet the platform lifetime target with minimum performance degradation. Experiments have been conducted on a multiprocessor simulator of a next-generation industrial MPSoC platform for multimedia applications made of a general purpose processor and programmable accelerators.

Computing Frontiers | 2010

Variability-tolerant run-time workload allocation for MPSoC energy minimization under real-time constraints

Francesco Paterna; Andrea Acquaviva; Alberto Caprara; Francesco Papariello; Giuseppe Desoli; Luca Benini

Multicore architectures will be adopted in the sub-50nm CMOS technology nodes for virtually all application domains with energy efficiency requirements exceeding 10GOPS/Watt. Unfortunately, future technology nodes will be increasingly affected by variation phenomena, and multicore architectures will be impacted in many ways by the variability of the underlying silicon fabrics [1, 6, 8]. Our architectural target is an advanced prototype of an industrial multicore platform for post-2014 set-top-box products, featuring a single CPU coordinator and an array of programmable VLIW hardware accelerators with multi-threading support. Next-generation set-top-boxes will support very high resolution, high-frame-rate video rendering with complex 3D GUIs and stereoscopic visualization support [2]. These applications require extensive image processing and enhancement functions which are embarrassingly parallel and will be distributed on the VLIW accelerator array as a large number of barrier-synchronized tasks. Accelerators are nominally homogeneous, but unfortunately variability causes significant perturbations on their performance and power consumption. We define a two-phase approach based on linear programming and bin packing. Thanks to these steps, the technique performs task allocation exploiting the awareness of performance and power variations of the cores, thus minimizing deadline misses and improving energy efficiency of the platform with respect to a variation-blind approach. In this work we consider variability effects acting independently on critical path delay, leakage power, and dynamic power [3]. Variability distribution data have been obtained through the VAM tool

Design, Automation, and Test in Europe | 2011

An efficient on-line task allocation algorithm for QoS and energy efficiency in multicore multimedia platforms

Francesco Paterna; Andrea Acquaviva; Alberto Caprara; Francesco Papariello; Giuseppe Desoli; Luca Benini

The impact of variability on sub-45nm CMOS multimedia platforms makes it hard to provide application QoS guarantees, as the speed variations across the cores may cause sub-optimal and sample-dependent utilization of the available resources and energy budget. These effects can be compensated by an efficient allocation of the workload at run-time. In the context of multimedia applications, a critical objective is to compensate for core speed variability while meeting time constraints without impacting the energy consumption. In this paper we present a new approach to compute optimal task allocations at run-time. The proposed strategy exploits an efficient and scalable implementation to find on-line the best possible solution in a tightly bounded time. Experimental results demonstrate the effectiveness of compensation both in terms of deadline miss rate and energy savings. Results have been compared with those obtained by applying state-of-the-art techniques on a multithreaded MPEG2 decoder. The validation has been performed on a cycle-accurate virtual prototype of a next-generation industrial multicore platform that has been extended with process variability models.

International Solid-State Circuits Conference | 2017

14.1 A 2.9TOPS/W deep convolutional neural network SoC in FD-SOI 28nm for intelligent embedded systems

Giuseppe Desoli; Nitin Chawla; Thomas Boesch; Surinder-pal Singh; Elio Guidetti; Fabio De Ambroggi; Tommaso Majo; Paolo Zambotti; Manuj Ayodhyawasi; Harvinder Singh; Nalin Aggarwal

A booming number of computer vision, speech recognition, and signal processing applications are increasingly benefiting from the use of deep convolutional neural networks (DCNN) stemming from the seminal work of Y. LeCun et al. [1] and others that led to winning the 2012 ImageNet Large Scale Visual Recognition Challenge with AlexNet [2], a DCNN significantly outperforming classical approaches for the first time. In order to deploy these technologies in mobile and wearable devices, hardware acceleration plays a critical role for real-time operation with very limited power consumption and with embedded memory overcoming the limitations of fully programmable solutions.

IEEE Transactions on Very Large Scale Integration Systems | 2007

Computing and design for software and silicon manufacturing

Davide Pandini; Giuseppe Desoli; Alessandro Cremonesi

An increasing demand for higher performance, for lower power density, and for greatly expanded functionalities will determine radical changes in the future computing architectures. These widely acknowledged emerging trends are, however, insufficient to address all the challenges introduced by advanced silicon nanometer technologies. It is well known that manufacturability for high yield, along with design productivity and predictability and system reconfigurability for reduced NRE costs and faster time-to-market, are major problems in gigascale SoC design. Therefore, only focusing the design efforts on performance, power consumption, and throughput can hinder the potential of the new computing architectures and limit the silicon yield. In this paper, we introduce an innovative architecture-to-silicon platform that, by exploiting the concept of regularity at different levels of abstraction, addresses the emerging challenges for the new computing architectures, and links system and architecture definition with silicon fabrication.

ACM Transactions on Embedded Computing Systems | 2012

Variability-tolerant workload allocation for MPSoC energy minimization under real-time constraints

Francesco Paterna; Andrea Acquaviva; Francesco Papariello; Giuseppe Desoli; Luca Benini

Sub-50nm CMOS technologies are affected by significant variability which causes power and performance variations among nominally similar cores in MPSoC platforms. This undesired heterogeneity threatens execution predictability and energy efficiency. We propose two techniques to allocate sets of barrier-synchronized tasks (representative of a wide class of image processing workloads) onto variability-affected MPSoCs. The first technique models allocation as an ILP and achieves optimal results, but requires an off-line solver. The second technique adopts a two-stage heuristic approach, and it can be adapted to work on-line. We tested our approach on the virtual prototype of a next-generation industrial multi-core platform. Experimental results demonstrate that our approach minimizes deadline violations while increasing energy efficiency.
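
The abstract above does not spell out the two-stage heuristic, so the following is only an illustrative sketch of what a two-stage allocation of barrier-synchronized tasks onto cores of unequal speed could look like, not the authors' algorithm. It assumes each task has a known cycle count and each core a known effective frequency; the function names `allocate` and `makespan` and the proportional-budget rule are hypothetical choices made for this example. Stage 1 computes a fractional per-core cycle budget, and stage 2 packs whole tasks into those budgets.

```python
def allocate(task_cycles, core_freqs):
    """Return a list mapping each task index to a core index (illustrative only)."""
    total_cycles = sum(task_cycles)
    total_freq = sum(core_freqs)

    # Stage 1 (assumed fractional relaxation): if work could be split freely,
    # all cores would reach the barrier together when each core's share of
    # cycles is proportional to its frequency.
    budgets = [total_cycles * f / total_freq for f in core_freqs]

    # Stage 2 (assumed bin packing): place whole tasks, largest first, on the
    # core that currently has the most budget left.
    remaining = budgets[:]
    assignment = [None] * len(task_cycles)
    order = sorted(range(len(task_cycles)), key=lambda i: -task_cycles[i])
    for t in order:
        core = max(range(len(core_freqs)), key=lambda c: remaining[c])
        assignment[t] = core
        remaining[core] -= task_cycles[t]
    return assignment


def makespan(task_cycles, core_freqs, assignment):
    """Time at which the slowest core reaches the barrier."""
    loads = [0.0] * len(core_freqs)
    for t, c in enumerate(assignment):
        loads[c] += task_cycles[t]
    return max(load / f for load, f in zip(loads, core_freqs))


if __name__ == "__main__":
    tasks = [4, 4, 2, 2]   # cycles per task (made-up numbers)
    freqs = [2.0, 1.0]     # core 0 runs twice as fast as core 1
    plan = allocate(tasks, freqs)
    print(plan, makespan(tasks, freqs, plan))
```

In this toy setup the faster core receives more work, so both cores reach the barrier at about the same time. Energy is not modeled here, although the paper treats it as part of the objective.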

Advanced Concepts for Intelligent Vision Systems | 2016

The Orlando Project: A 28 nm FD-SOI Low Memory Embedded Neural Network ASIC

Giuseppe Desoli; Valeria Tomaselli; Emanuele Plebani; Giulio Urlini; Danilo Pau; Viviana D’Alto; Tommaso Majo; Fabio De Ambroggi; Thomas Boesch; Surinder-pal Singh; Elio Guidetti; Nitin Chawla

The recent success of neural networks in various computer vision tasks opens the possibility to add visual intelligence to mobile and wearable devices; however, the stringent power requirements are unsuitable for networks run on embedded CPUs or GPUs. To address such challenges, STMicroelectronics developed the Orlando Project, a new low-power architecture for convolutional neural network acceleration suited for wearable devices. An important contribution to the energy usage is the storage of and access to the neural network parameters. In this paper, we show that with adequate model compression schemes based on weight quantization and pruning, a whole AlexNet network can fit in the local memory of an embedded processor, thus avoiding additional system complexity and energy usage, with no or low impact on the accuracy of the network. Moreover, the compression methods work well across different tasks, e.g. image classification and object detection.
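
As a rough illustration of the two compression ideas named in the abstract above, the sketch below applies magnitude pruning followed by uniform symmetric quantization to a small list of weights. It is a generic textbook-style example and not the Orlando Project's actual scheme. The threshold, the bit width, and the function names `prune`, `quantize`, and `dequantize` are all assumptions chosen for this example.

```python
def prune(weights, threshold):
    """Magnitude pruning: zero out weights whose absolute value is below threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize(weights, bits):
    """Uniform symmetric quantization to signed integer codes of the given width."""
    levels = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit signed codes
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / levels if max_abs > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate weights from integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.6, -0.02, 0.3, 0.01, -1.0]   # made-up values
    sparse = prune(weights, threshold=0.05)
    codes, scale = quantize(sparse, bits=4)
    print(sparse)
    print(codes)
    print(dequantize(codes, scale))
```

Pruned weights become zeros that a sparse format can skip, and the surviving weights are stored as small integers plus one shared scale. Together these reduce the memory footprint, at the cost of some approximation error.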

Archive | 2010

Data stream flow controller and computing system architecture comprising such a flow controller

Giuseppe Desoli; Jean-Philippe Cousin; Gilles Pelissier; Badr Bentaybi

Archive | 2011

ERA – Embedded Reconfigurable Architectures

Stephan Wong; Luigi Carro; Mateus B. Rutzig; Debora Matos; Roberto Giorgi; Nikola Puzovic; Stefanos Kaxiras; Marcelo Cintra; Giuseppe Desoli; Paolo Gai; Sally A. McKee; Ayal Zaks
