Davide Bruni
University of Bologna
Publication
Featured research published by Davide Bruni.
IEEE Computer | 2003
Luca Benini; Davide Bertozzi; Davide Bruni; Nicola Drago; Franco Fummi; Massimo Poncino
SystemC is an open source C/C++ simulation environment that provides several class packages for specifying hardware blocks and communication channels. The design environment specifies software algorithmically as a set of functions embedded in abstract modules that communicate with one another and with hardware components via abstract communication channels. It enables transparent integration of instruction-set simulators and prototyping boards. The authors describe a simulation environment that targets heterogeneous multiprocessor systems. They are currently working to extend their methodology to more complex on-chip architectures.
design, automation, and test in europe | 2002
Luca Benini; Davide Bruni; Alberto Macii; Enrico Macii
In this paper, we suggest hardware-assisted data compression as a tool for reducing the energy consumption of core-based embedded systems. We propose a novel and efficient architecture for on-the-fly data compression and decompression whose field of operation is the cache-to-memory path. Uncompressed cache lines are compressed before they are written back to main memory, and decompressed when cache refills take place. We explore two classes of compression methods, profile-driven and differential, since they are characterized by compact HW implementations, and we compare their performance to that of some state-of-the-art compression methods (e.g., we have considered a few variants of the Lempel-Ziv encoder). We present experimental results on memory traffic and energy consumption in the cache-to-memory path of a core-based system running standard benchmark programs. The achieved average energy savings range from 4.2% to 35.2%, depending on the selected compression algorithm.
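The differential class of methods mentioned in the abstract can be illustrated with a minimal software sketch. The encoding below (one tag bit per entry, 8-bit signed deltas, 32-bit words) and the function names are assumptions made for illustration; they are not the hardware scheme evaluated in the paper.

```python
def compress_line(words, delta_bits=8):
    """Encode a cache line as raw words or small deltas from the previous word.

    Hypothetical format: each entry is ("raw", word) or ("delta", difference).
    """
    lo = -(1 << (delta_bits - 1))
    hi = (1 << (delta_bits - 1)) - 1
    encoded = [("raw", words[0])]  # the first word is always stored in full
    prev = words[0]
    for w in words[1:]:
        d = w - prev
        if lo <= d <= hi:
            encoded.append(("delta", d))
        else:
            encoded.append(("raw", w))
        prev = w
    return encoded


def decompress_line(encoded):
    """Rebuild the original words, as would happen on a cache refill."""
    words = []
    prev = 0
    for kind, value in encoded:
        w = value if kind == "raw" else prev + value
        words.append(w)
        prev = w
    return words


def compressed_bits(encoded, word_bits=32, delta_bits=8):
    """Size of the encoded line, counting one tag bit per entry."""
    return sum(1 + (word_bits if kind == "raw" else delta_bits)
               for kind, _ in encoded)


# Example line: mostly consecutive addresses plus two unrelated values.
line = [0x1000, 0x1004, 0x1008, 0x100C,
        0x90000000, 0x90000004, 0x1010, 0x1014]
encoded = compress_line(line)
```

Lines whose words are close in value, such as pointers into the same region, shrink the most under this kind of encoding; unrelated words fall back to raw storage plus the tag bit.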
IEEE Transactions on Computers | 2003
Luca Benini; Davide Bruni; A. Macii; Enrico Macii; Massimo Poncino
Portable and wearable computers can be powered by different combinations of two or more battery packs to give the user the possibility of choosing an optimal compromise between lifetime and weight/size. Recent work on battery-driven power management has demonstrated that sequential discharge is suboptimal in multibattery systems and that lifetime can be maximized by distributing (steering) the current load across the available batteries, thereby discharging them in a partially concurrent fashion. Based on these observations, we formulate multibattery lifetime maximization as a continuous, constrained optimization problem, which can be efficiently solved by nonlinear optimizers. We show that significant lifetime extensions can be obtained with respect to standard sequential discharge (up to 160 percent), as well as with respect to previously proposed battery scheduling algorithms (up to 12 percent).
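A small numeric sketch shows why splitting the load can outlast sequential discharge. It assumes a simplified Peukert-style battery model (lifetime = capacity / current^k with k > 1), which is an assumption made here and not the battery model or optimizer used in the paper; the function names are hypothetical as well.

```python
def sequential_lifetime(capacities, load, k=1.3):
    """Each battery in turn carries the full load until it is empty."""
    return sum(c / load ** k for c in capacities)


def steering_fractions(capacities, k=1.3):
    """Constant load split under which all batteries empty at the same time."""
    weights = [c ** (1.0 / k) for c in capacities]
    total = sum(weights)
    return [w / total for w in weights]


def steered_lifetime(capacities, load, k=1.3):
    """Lifetime when the load is split with steering_fractions()."""
    fractions = steering_fractions(capacities, k)
    # All batteries deplete together, so any one of them gives the lifetime.
    return capacities[0] / (fractions[0] * load) ** k


capacities = [1.0, 2.0]   # nominal capacities, arbitrary units
load = 1.0                # total load current, arbitrary units
t_seq = sequential_lifetime(capacities, load)
t_steer = steered_lifetime(capacities, load)
```

Under this toy model each battery sees a lower current when the load is shared, so the rate-dependent capacity loss shrinks. Real batteries also show recovery effects, which is one reason the paper treats the split as an optimization problem instead of a closed-form rule.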
international conference on computer design | 2002
Luca Benini; Davide Bertozzi; Davide Bruni; Nicola Drago; Franco Fummi; Massimo Poncino
We present a co-simulation environment for multiprocessor architectures that is based on SystemC and allows a transparent integration of instruction set simulators (ISSs) within the SystemC simulation framework. The integration is based on the well-known concept of a bus wrapper, which realizes the interface between the ISS and the simulator. The proposed solution uses an ISS-wrapper interface based on the standard gdb remote debugging interface, and implements two alternative schemes that differ in the amount of communication they require. The two approaches provide different tradeoffs between simulation granularity and speed, and show significant speedup with respect to a micro-architectural, full SystemC simulation of the system description.
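The gdb remote debugging interface exchanges text packets of the form `$<payload>#<checksum>`, where the checksum is the sum of the payload bytes modulo 256 written as two hex digits. The sketch below shows only this framing; how the paper's wrapper schedules such packets is not described in the abstract, and escaping of special characters in binary payloads is left out.

```python
def checksum(payload: str) -> int:
    """Sum of the payload bytes modulo 256, as used by gdb remote packets."""
    return sum(payload.encode("ascii")) % 256


def frame_packet(payload: str) -> str:
    """Wrap a command such as 'g' (read registers) in $...#xx framing."""
    return f"${payload}#{checksum(payload):02x}"


def parse_packet(packet: str) -> str:
    """Validate the framing and checksum, then return the payload."""
    if not packet.startswith("$") or len(packet) < 4 or packet[-3] != "#":
        raise ValueError("malformed packet")
    payload = packet[1:-3]
    received = int(packet[-2:], 16)
    if checksum(payload) != received:
        raise ValueError("checksum mismatch")
    return payload


read_registers = frame_packet("g")        # 'g' asks the target for its registers
read_memory = frame_packet("m1000,4")     # 'm addr,length' reads target memory
```

A wrapper built on this interface would send such packets to the ISS and translate the replies into bus transactions on the simulator side.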
design automation conference | 2001
Davide Bruni; Alessandro Bogliolo; Luca Benini
The capability of performing semi-automated design space exploration is the main advantage of high-level synthesis with respect to RTL design. However, design space exploration performed during high-level synthesis is limited in scope, since it provides promising solutions that represent good starting points for subsequent optimizations, but it provides no insight into the overall structure of the design space. In this work we propose unsupervised Monte-Carlo design exploration and statistical characterization to capture the key features of the design space. Our analysis provides insight into how various solutions are distributed over the entire design space. In addition, we apply extreme value theory (1997) to extrapolate achievable bounds from the sampling points.
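The sampling and characterization steps can be sketched as follows. The design knobs and the cost model are invented stand-ins for a real high-level synthesis run, and the extreme-value extrapolation step is deliberately left out; only the empirical statistics are computed.

```python
import random
import statistics

# Hypothetical design knobs; a real flow would expose synthesis options here.
DESIGN_SPACE = {
    "unroll": [1, 2, 4, 8],
    "alus": [1, 2, 3, 4],
    "mults": [1, 2],
    "pipeline": [False, True],
}


def toy_cost(point):
    """Made-up latency/area model standing in for a synthesis run."""
    parallelism = min(point["unroll"], point["alus"])
    if point["pipeline"]:
        parallelism *= 1.5
    latency = 64.0 / parallelism + 8.0 / point["mults"]
    area = 10.0 * point["alus"] + 25.0 * point["mults"] + 2.0 * point["unroll"]
    return latency, area


def sample_designs(n, seed=0):
    """Draw n design points uniformly at random and evaluate each one."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        point = {knob: rng.choice(values) for knob, values in DESIGN_SPACE.items()}
        samples.append((point, *toy_cost(point)))
    return samples


def characterize(samples):
    """Summarize how latency is distributed over the sampled points."""
    latencies = sorted(lat for _, lat, _ in samples)
    return {
        "min": latencies[0],
        "median": statistics.median(latencies),
        "mean": statistics.fmean(latencies),
        "stdev": statistics.stdev(latencies),
        "max": latencies[-1],
    }


samples = sample_designs(200)
summary = characterize(samples)
```

The gap between the best sampled point and the bulk of the distribution gives a first indication of how much room a guided optimizer has left.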
design, automation, and test in europe | 2003
Pol Marchal; Davide Bruni; José Ignacio Gómez; Luca Benini; L. Pinuel; Francky Catthoor; H. Corporaal
Heterogeneous multi-processor platforms are an interesting option to satisfy the computational performance requirements of dynamic multi-media applications at a reasonable energy cost. Today, almost no support exists for managing the data of a multi-threaded application on these platforms in an energy-efficient way. In this paper we show that the assignment of data of dynamically created/deleted tasks to the shared memory has a large impact on the energy consumption. We present two dynamic memory allocators which solve the bank assignment problem for shared multi-banked SDRAM memories. Both allocators assign the tasks' data to the available SDRAM banks such that the number of page-misses is reduced. We have measured large energy savings with these allocators compared to existing dynamic memory allocators for several task-sets based on MediaBench [5].
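The bank assignment problem can be illustrated with a greedy heuristic: place each new task's data in the bank that currently serves the lowest access rate, so that fewer tasks interleave accesses within one bank. This heuristic, the `BankAllocator` class, and the access-rate figures are assumptions for illustration; they are not either of the two allocators presented in the paper.

```python
class BankAllocator:
    """Greedy least-loaded assignment of task data to SDRAM banks (sketch)."""

    def __init__(self, num_banks):
        self.load = [0.0] * num_banks   # summed access rate per bank
        self.owner = {}                 # task name -> (bank, access_rate)

    def allocate(self, task, access_rate):
        """Assign a newly created task to the least-loaded bank."""
        bank = min(range(len(self.load)), key=lambda b: self.load[b])
        self.load[bank] += access_rate
        self.owner[task] = (bank, access_rate)
        return bank

    def free(self, task):
        """Release the bank share of a deleted task."""
        bank, access_rate = self.owner.pop(task)
        self.load[bank] -= access_rate


allocator = BankAllocator(num_banks=2)
bank_a = allocator.allocate("a", access_rate=5.0)
bank_b = allocator.allocate("b", access_rate=3.0)
bank_c = allocator.allocate("c", access_rate=1.0)
allocator.free("a")
bank_d = allocator.allocate("d", access_rate=2.0)
```

Because tasks are created and deleted at run time, the allocator only uses information available at allocation time and rebalances implicitly as tasks leave.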
international symposium on circuits and systems | 2002
Luca Benini; Davide Bruni; B. Ricco; Alberto Macii; Enrico Macii
This paper proposes a data compression scheme for minimizing memory traffic in processor-based systems. Data compression and decompression are performed on-the-fly on the cache-to-memory path, that is, uncompressed cache lines are compressed before they are written back to main memory, and decompressed when cache refills take place. The distinguishing feature of the presented solution is its ability to provide high memory traffic reductions without requiring data profiling information. In other words, thanks to the self-learning mechanism it implements, the proposed scheme performs very close to special-purpose compression approaches, whose main limitation is their inapplicability when off-line data profiling is not feasible. Memory traffic reductions in the cache-to-memory path of a core-based system running standard benchmark programs are, on average, around 34%, and are thus close to those achievable with profile-driven compression.
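A self-learning scheme needs no profiling because compressor and decompressor update identical tables as data passes through. The sketch below uses a small FIFO table of recently seen words; the table size, replacement policy, and names are assumptions, not the mechanism implemented in the paper.

```python
class AdaptiveTable:
    """Small FIFO table of recently seen words, mirrored on both sides."""

    def __init__(self, size=4):
        self.size = size
        self.entries = []

    def lookup(self, word):
        """Return the table index of word, or None when it is absent."""
        return self.entries.index(word) if word in self.entries else None

    def insert(self, word):
        """Add a word, evicting the oldest entry when the table is full."""
        if len(self.entries) == self.size:
            self.entries.pop(0)
        self.entries.append(word)


def compress(words, table):
    """Emit a short table index on a hit, the full word on a miss."""
    out = []
    for w in words:
        index = table.lookup(w)
        if index is not None:
            out.append(("hit", index))
        else:
            out.append(("miss", w))
            table.insert(w)   # learn the new word
    return out


def decompress(encoded, table):
    """Replay the same table updates to recover the original words."""
    words = []
    for kind, value in encoded:
        if kind == "hit":
            words.append(table.entries[value])
        else:
            words.append(value)
            table.insert(value)
    return words


line = [0, 0, 0xFFFFFFFF, 0, 7, 0xFFFFFFFF, 0, 0]
encoded = compress(line, AdaptiveTable())
restored = decompress(encoded, AdaptiveTable())
```

Both tables start empty and change only on misses, so they stay in lockstep without any side channel between the two units.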
international symposium on low power electronics and design | 2004
Luca Benini; Davide Bruni; Alberto Macii; Enrico Macii
Storing data in compressed form is becoming common practice in high-performance systems, where memory bandwidth constitutes a serious bottleneck to program execution speed. In this paper, we suggest hardware-assisted data compression as a tool for reducing the energy consumption of processor-based systems. We propose a novel and efficient architecture for on-the-fly data compression and decompression whose field of operation is the cache-to-memory path. Uncompressed cache lines are compressed before they are written back to main memory, and decompressed when cache refills take place. We explore two classes of table-based compression schemes. The first, based on offline data profiling, is particularly suitable for embedded systems, where predictability of the data set is usually higher than in general-purpose systems. The second solution we introduce is adaptive, that is, it decides whether data words should be compressed according to the data statistics of the program being executed. We describe in detail the architecture of the compression/decompression unit and we provide insight into its implementation as a hardware (HW) block. We present experimental results concerning memory traffic and energy consumption in the cache-to-memory path of a core-based system running standard benchmark programs. The obtained energy savings range from 8% to 39% when profile-driven compression is adopted, and from 7% to 26% when the adaptive scheme is used. Performance improvements are also achieved as a by-product, showing the practical applicability of the proposed approach.
international conference on asic | 2002
Luca Benini; Davide Bruni; N. Drago; F. Fummi; Massimo Poncino
This paper presents a novel HW/SW verification methodology called virtual in-circuit emulation, which is suitable for a platform-based design paradigm, where the main objective of co-verification is to validate the interaction between an existing core processor and some application-specific peripheral system. The proposed co-verification solution shares with conventional emulation schemes the possibility of performing both functional and timing-accurate validation with the same accuracy as the hardware, and with greater speed than simulation software, yet it achieves this at a minuscule fraction of the cost of a conventional emulation system. We have validated the virtual in-circuit emulation paradigm on a real board hosting an ARM core and various hardware peripherals running an embedded application, which has been interfaced to a custom-designed I/O unit for the acquisition of data samples, described in SystemC.
Design Automation for Embedded Systems | 2002
Luca Benini; Davide Bruni; Mauro Chinosi; Cristina Silvano; Vittorio Zaccaria; Roberto Zafalon
This paper describes a technique for modeling and estimating the power consumption at the system-level for embedded VLIW (Very Long Instruction Word) architectures. The method is based on a hierarchy of dynamic power estimation engines: from the instruction-level down to the gate/transistor-level. Power macro-models have been developed for the main components of the system: the VLIW core, the register file, the instruction and data caches. The main goal is to define a system-level simulation framework for the dynamic profiling of the power behavior during the software execution, providing also a break-down of the power contributions due to the single components of the system. The proposed approach has been applied to the Lx family of scalable embedded VLIW processors, jointly designed by STMicroelectronics and HPLabs. Experimental results, carried out over a set of benchmarks for embedded multimedia applications, have demonstrated an average accuracy of 5% of the instruction-level estimation engine with respect to the RTL engine, with an average speed-up of four orders of magnitude.
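An instruction-level estimation engine can be pictured as a lookup of per-instruction energy costs accumulated over an execution trace. The sketch below uses invented energy figures, a single inter-instruction overhead term, and a scalar trace; none of these values or names come from the paper, whose macro-models for the Lx core, register file, and caches are more detailed.

```python
# Hypothetical base energy per executed operation, in nanojoules.
BASE_ENERGY_NJ = {"add": 1.0, "mul": 3.0, "load": 2.5, "nop": 0.2}

# Hypothetical extra energy when consecutive operations differ, in nanojoules.
SWITCH_OVERHEAD_NJ = 0.5


def trace_energy_nj(trace):
    """Sum base costs plus an overhead for each change of opcode."""
    energy = sum(BASE_ENERGY_NJ[op] for op in trace)
    changes = sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)
    return energy + changes * SWITCH_OVERHEAD_NJ


def average_power_w(trace, clock_ns):
    """Average power in watts, assuming one operation per clock cycle."""
    # nJ divided by ns gives watts.
    return trace_energy_nj(trace) / (len(trace) * clock_ns)


trace = ["load", "add", "add", "mul", "nop"]
energy = trace_energy_nj(trace)
power = average_power_w(trace, clock_ns=2.5)
```

Because the model only walks the instruction trace, it runs far faster than gate-level or RTL simulation, which is the tradeoff the abstract quantifies.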