Daniele Bortolotti | Researchain

Publication

Featured research published by Daniele Bortolotti.

ieee international symposium on parallel & distributed processing, workshops and phd forum | 2013

VirtualSoC: A Full-System Simulation Environment for Massively Parallel Heterogeneous System-on-Chip

Daniele Bortolotti; Christian Pinto; Andrea Marongiu; Martino Ruggiero; Luca Benini

Driven by flexibility, performance and cost constraints of demanding modern applications, heterogeneous System-on-Chip (SoC) is the dominant design paradigm in the embedded system computing domain. SoC architecture and heterogeneity clearly provide a wider power/performance scaling, combining high performance and power efficient general-purpose cores along with massively parallel many-core-based accelerators. Besides the complex hardware, these platforms generally also host an advanced software ecosystem, composed of an operating system, several communication protocol stacks, and various computationally demanding user applications. The need to cope efficiently with the huge HW/SW design space provided by this scenario clearly makes the full-system simulator one of the most important design tools. We present in this paper a new emulation framework, called VirtualSoC, targeting the full-system simulation of massively parallel heterogeneous SoCs.

international symposium on low power electronics and design | 2014

Approximate compressed sensing: ultra-low power biosignal processing via aggressive voltage scaling on a hybrid memory multi-core processor

Daniele Bortolotti; Hossein Mamaghanian; Andrea Bartolini; Maryam Ashouei; Jan Stuijt; David Atienza; Pierre Vandergheynst; Luca Benini

Technology scaling enables the design of low cost biosignal processing chips suited for emerging wireless body-area sensing applications. Energy consumption severely limits such applications and memories are becoming the energy bottleneck to achieve ultra-low-power operation. When aggressive voltage scaling is used, memory operation becomes unreliable due to the lack of sufficient Static Noise Margin. This paper introduces an approximate biosignal Compressed Sensing approach. We propose a digital architecture featuring a hybrid memory (6T-SRAM/SCMEM cells) designed to control perturbations on specific data structures. Combined with a statistically robust reconstruction algorithm, the system tolerates memory errors and achieves significant energy savings with low area overhead.

design, automation, and test in europe | 2014

Hybrid memory architecture for voltage scaling in ultra-low power multi-core biomedical processors

Daniele Bortolotti; Andrea Bartolini; Christian Weis; Davide Rossi; Luca Benini

Technology scaling today enables the design of sensor-based ultra-low cost chips well suited for emerging applications such as wireless body sensor networks, urban life and environment monitoring. Energy consumption is the key limiting factor of this upcoming revolution, and memories are often the energy bottleneck, mainly due to leakage power. This paper proposes an ultra-low power multi-core architecture targeting eHealth monitoring systems, where applications involve collection of sequences of slow biomedical signals and highly parallel computations at very low voltage. We propose a hybrid memory architecture that combines 6T-SRAM and 8T-SRAM operating in the same voltage domain. At high voltage it provides normal operation; at low voltage it provides a fully reliable small memory partition (8T) while the rest of the memory (6T) is state-retentive. Our architecture offers significant energy savings with a low area overhead in typical eHealth Compressed Sensing-based applications.

international symposium on system-on-chip | 2011

Exploring instruction caching strategies for tightly-coupled shared-memory clusters

Daniele Bortolotti; Francesco Paterna; Christian Pinto; Andrea Marongiu; Martino Ruggiero; Luca Benini

Several Chip-Multiprocessor designs today leverage tightly-coupled computing clusters as a building block. These clusters consist of a fairly large number N of simple cores, featuring fast communication through a shared multibanked L1 data memory and ≈ 1 Instruction-Per-Cycle (IPC) per core. Thus, aggregated I-fetch bandwidth approaches ƒ * N, where ƒ is the cluster clock frequency. An effective instruction cache architecture is key to support this I-fetch bandwidth. In this paper we compare two main architectures for instruction caching targeting tightly coupled CMP clusters: (i) private instruction caches per core and (ii) shared instruction cache per cluster. We developed a cycle-accurate model of the tightly coupled cluster with several configurable architectural parameters for exploration, plus a programming environment targeted at efficient data-parallel computing. We conduct an in-depth study of the two architectural templates based on the use of both synthetic microbenchmarks and real program workloads. Our results provide useful insights and guidelines for designers.
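As a rough illustration of the ƒ * N estimate in the abstract above, the following sketch computes the aggregate instruction-fetch demand of a cluster. The function name and the example figures (16 cores, 500 MHz) are hypothetical and chosen only for the arithmetic; they are not taken from the paper.

```python
def aggregate_ifetch_bandwidth(num_cores, clock_hz, ipc=1.0):
    """Instructions fetched per second by the whole cluster: f * N * IPC."""
    return num_cores * clock_hz * ipc


# Hypothetical cluster: 16 cores at 500 MHz, roughly 1 IPC per core.
bandwidth = aggregate_ifetch_bandwidth(num_cores=16, clock_hz=500e6)
print(f"{bandwidth / 1e9:.1f} G instructions/s")  # prints "8.0 G instructions/s"
```

Whatever cache organization is chosen, private or shared, it has to sustain this fetch rate, which is why the demand grows linearly with the core count.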

design, automation, and test in europe | 2015

An ultra-low power dual-mode ECG monitor for healthcare and wellness

Daniele Bortolotti; Mauro Mangia; Andrea Bartolini; Riccardo Rovatti; Gianluca Setti; Luca Benini

Technology scaling today enables the design of ultra-low cost wireless body sensor networks for wearable biomedical monitors. Depending on the application domain, these devices show greatly varying tradeoffs in terms of energy consumption, resource utilization and reconstructed biosignal quality. To achieve minimal energy operation and extend battery life, several aspects must be considered, ranging from signal processing to the technological layers of the architecture. The recently proposed Rakeness-based Compressed Sensing (CS) expands the standard CS paradigm by exploiting the localization of input signal energy to further increase data compression without appreciable RSNR degradation. This improvement can be used either to optimize the usage of a non-volatile memory (NVM) that stores a record of the biosignal in the device, or to minimize the energy consumption for the transmission of the entire signal as well as some of its features. We specialize the sensing stage to achieve signal qualities suitable for both Healthcare (HC) and Wellness (WN), according to an external input (e.g. the patient). In this paper we envision a dual-operation wearable ECG monitor, considering a multi-core DSP for input biosignal compression and different technologies for either transmission or local storage. The experimental results show the effectiveness of the Rakeness approach (up to ≈ 70% more energy efficient than the baseline) and evaluate the energy gains considering different use case scenarios.

conference on design and architectures for signal and image processing | 2014

Rakeness-based compressed sensing on ultra-low power multi-core biomedical processors

Daniele Bortolotti; Mauro Mangia; Andrea Bartolini; Riccardo Rovatti; Gianluca Setti; Luca Benini

Technology scaling today enables the design of ultra-low cost wireless body sensor networks for wearable biomedical monitors. The typical behaviour of such systems consists of multi-channel input biosignal acquisition, data compression, and final output transmission or storage. To achieve minimal energy operation and extend battery life, several aspects must be considered, ranging from signal processing to architectural optimizations. The recently proposed Rakeness-based Compressed Sensing (CS) paradigm exploits the localization of input signal energy to further increase compression without appreciable RSNR degradation. Such output size reduction allows for trading off energy from the compression stage to the transmission or storage stage. In this paper we analyze such tradeoffs considering a multi-core DSP for input biosignal computation and different technologies for either transmission or local storage. The experimental results show the effectiveness of the Rakeness approach (on average ≈ 44% more efficient than the baseline) and assess the energy gains in a technological perspective.
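To make the compression stage mentioned above concrete, here is a minimal sketch of a standard CS encoder, which projects a window of n samples onto m < n measurements through a random ±1 sensing matrix. It is only an illustration of the baseline: the rakeness-based matrix design is not reproduced, and the function names, window size, measurement count and toy signal are all assumptions made for this example.

```python
import random


def make_sensing_matrix(m, n, seed=0):
    """Draw an m x n matrix of random +1/-1 entries (a common CS choice)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]


def compress(window, matrix):
    """Project an n-sample window onto m measurements: y = A x."""
    return [sum(a * x for a, x in zip(row, window)) for row in matrix]


n, m = 64, 16                          # hypothetical window and measurement sizes
window = [0.0] * n
window[20:24] = [0.2, 1.0, 0.6, 0.1]   # toy pulse: energy localized in a few samples
A = make_sensing_matrix(m, n)
y = compress(window, A)
print(len(window), "samples ->", len(y), "measurements")
```

With these assumed sizes the node stores or transmits 16 values instead of 64. Reconstructing the window from `y` needs a sparse recovery algorithm on the receiver side, which is outside the scope of this sketch.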

design, automation, and test in europe | 2012

Design of a collective communication infrastructure for barrier synchronization in cluster-based nanoscale MPSoCs

José L. Abellán; Juan C. Fernandez; Manuel E. Acacio; Davide Bertozzi; Daniele Bortolotti; Andrea Marongiu; Luca Benini

Barrier synchronization is a key programming primitive for shared memory embedded MPSoCs. As the core count increases, software implementations cannot provide the needed performance and scalability, thus making hardware acceleration critical. In this paper we describe an interconnect extension implemented with standard cells and with a mainstream industrial toolflow. We show that the area overhead is marginal with respect to the performance improvements of the resulting hardware-accelerated barriers. We integrate our HW barrier into the OpenMP programming model and discuss synchronization efficiency compared with traditional software implementations.
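For readers unfamiliar with the primitive, the sketch below shows what a barrier guarantees, using the software barrier from Python's standard `threading` module. It is not the hardware barrier or the OpenMP integration described in the abstract; the worker count and variable names are assumptions made for illustration.

```python
import threading

NUM_WORKERS = 4                      # hypothetical number of cooperating threads
barrier = threading.Barrier(NUM_WORKERS)
lock = threading.Lock()
phase1_done = []                     # ids of workers that finished phase 1
seen_at_phase2 = []                  # phase-1 results visible to each worker after the barrier


def worker(worker_id):
    with lock:
        phase1_done.append(worker_id)    # phase 1: record that this worker is done
    barrier.wait()                       # block until all NUM_WORKERS threads arrive
    with lock:
        seen_at_phase2.append(len(phase1_done))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(seen_at_phase2)  # prints [4, 4, 4, 4]
```

Every worker sees all four phase-1 results because no thread passes `barrier.wait()` until the last one has arrived. The time threads spend waiting at this point is the synchronization cost that grows with the core count.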

Microprocessors and Microsystems | 2017

Zeroing for HW-efficient compressed sensing architectures targeting data compression in wireless sensor networks

Mauro Mangia; Daniele Bortolotti; Fabio Pareschi; Andrea Bartolini; Luca Benini; Riccardo Rovatti; Gianluca Setti

The design of ultra-low cost wireless body sensor networks for wearable biomedical monitors has been made possible by today's technology scaling. In these systems, a typically multi-channel biosignal sensor takes care of the operations of acquisition, data compression and final output transmission or storage. Furthermore, since these sensors are usually battery powered, the achievement of minimal energy operation is a fundamental issue. To this aim, several aspects must be considered, ranging from signal processing to architectural optimization. In this paper we consider the recently proposed rakeness-based compressed sensing (CS) paradigm along with its zeroing companion. With respect to a standard CS-based sensor, the first approach allows us to further increase compression rate without appreciable signal quality degradation by exploiting localization of input signal energy. The latter paradigm is here formalized and applied to further reduce the energy consumption of the sensing node. The application of both rakeness and zeroing allows for trading off energy from the compression stage to the transmission or storage one. Different cases are taken into account, by considering a realistic model of an ultra-low-power multicore DSP system.

international conference on high performance computing and simulation | 2016

User-space APIs for dynamic power management in many-core ARMv8 computing nodes

Daniele Bortolotti; Simone Tinti; Piero Altoe; Andrea Bartolini

The push for energy-efficient and energy-proportional computing nodes, together with the increasing number of cores integrated in the same silicon die, has led to computing nodes with fine-grained power management capabilities. To unleash the potential of this HW design, novel user-space power management APIs are needed to bring fine-grained power management into the hands of the programmer. In this work we present a novel programming mechanism for energy efficiency, built around novel user-space power management APIs suitable to be embedded in user-space applications. We evaluated its timing and power saving performance on a novel computing node based on a Cavium ThunderX ARMv8-based many-core SoC.

international symposium on system on chip | 2015

Long-Term ECG monitoring with zeroing Compressed Sensing approach

Mauro Mangia; Daniele Bortolotti; Andrea Bartolini; Fabio Pareschi; Luca Benini; Riccardo Rovatti; Gianluca Setti

Novel low-voltage, low-latency, non-volatile memory (NVM) technologies allow long-term wearable biomedical monitors to benefit from large storage capability, avoiding costly wireless transmissions and enabling, along with proper signal processing and architectural optimization, minimal energy operation and extended battery life. The recently proposed rakeness-based Compressed Sensing (RCS) offers a high compression rate with an associated low computational power. This allows an energy trade-off between the compression stage and the storage stage. In this paper we introduce a novel approach, namely zeroing CS, which reduces RCS computational requirements to extremely low levels. The new energy trade-off is analyzed, considering a suitable multi-core DSP and different NVM technologies for local storage. According to our analysis, the proposed zeroing approach is up to 80% more efficient than a standard CS solution and 70% more efficient than RCS when the overall energy requirement is not dominated by storage.
