Mauro Olivieri
Sapienza University of Rome
Publication
Featured research published by Mauro Olivieri.
signal processing systems | 2005
Luca Benini; Davide Bertozzi; Alessandro Bogliolo; Francesco Menichelli; Mauro Olivieri
Technology is making it feasible to integrate a large number of processors on the same silicon die. These multiprocessor systems-on-chip (MP-SoC) can provide a high degree of flexibility and represent the most efficient architectural solution for supporting multimedia applications, which demand highly parallel computation. As a consequence, simulation tools for these systems are needed at the design stage, with the distinctive requirements of simulation speed, accuracy, and support for design space exploration. We developed a complete simulation platform for an MP-SoC, called MP-ARM, based on SystemC as the modelling and simulation environment and including models of the processors, an AMBA-compliant bus communication architecture, memory models, and support for parallel programming. A fully operational Linux version for embedded systems has been ported to this platform, and a cross-toolchain has been developed as well. Our MP simulation environment proves to be a powerful tool for the MP-SoC design stage. As an example, we use it to evaluate the impact of architectural parameters and bus arbitration policies on system performance, showing that the effectiveness of a particular system configuration strongly depends on the application domain and the generated traffic profile.
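The abstract mentions comparing bus arbitration policies. The Python sketch below contrasts fixed-priority and round-robin arbitration on a shared bus. It is a minimal illustration that assumes saturated traffic (every master requests the bus every cycle) and one grant per cycle. The functions `fixed_priority`, `round_robin` and `simulate` are hypothetical and are not taken from MP-ARM.

```python
from collections import Counter

# Hypothetical sketch: two common arbitration policies for a shared bus.
# The policies and the saturated-traffic model are illustrative assumptions,
# not the MP-ARM arbiter implementation.

def fixed_priority(requests, last_grant):
    """Grant the lowest-indexed requesting master (master 0 has top priority).
    last_grant is unused; it keeps the signature uniform across policies."""
    for master, req in enumerate(requests):
        if req:
            return master
    return None

def round_robin(requests, last_grant):
    """Grant the first requesting master after the one granted last."""
    n = len(requests)
    start = 0 if last_grant is None else (last_grant + 1) % n
    for offset in range(n):
        master = (start + offset) % n
        if requests[master]:
            return master
    return None

def simulate(arbiter, n_masters, cycles):
    """Saturated traffic: every master requests the bus in every cycle."""
    grants = Counter()
    last = None
    for _ in range(cycles):
        winner = arbiter([True] * n_masters, last)
        if winner is not None:
            grants[winner] += 1
            last = winner
    return grants

print(simulate(fixed_priority, 3, 12))  # master 0 receives every grant
print(simulate(round_robin, 3, 12))     # grants are shared equally
```

Under this saturated-traffic assumption, fixed priority starves the lower-priority masters and round robin shares the bus evenly. With bursty or unbalanced traffic the comparison can change.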
compilers, architecture, and synthesis for embedded systems | 2004
Federico Angiolini; Francesco Menichelli; Alberto Ferrero; Luca Benini; Mauro Olivieri
ScratchPad Memories (SPMs) are commonly used in embedded systems because they are more energy-efficient than caches and give the application tighter control over the memory hierarchy. Optimally mapping code and data to SPMs, however, is still a challenge. This paper proposes an optimal scratchpad mapping approach for code segments. Its distinctive characteristic is that it works directly on application binaries and requires no access to either the compiler or the application source code, which is a clear advantage for legacy or proprietary, IP-protected applications. The mapping problem is solved by a Dynamic Programming algorithm applied to the execution traces of the target application. The algorithm finds the optimal set of instruction blocks to move into a dedicated SPM, minimizing either energy consumption or execution time. A patching tool, which can use the output of the optimal mapper, modifies the application binary and moves the relevant portions of its code segments to memory locations inside the SPM.
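The abstract solves SPM mapping with dynamic programming over execution traces. The Python sketch below is one possible formulation, assuming the problem can be posed as a 0/1 knapsack: each instruction block has a size and a trace-derived fetch count, and the SPM has a fixed byte capacity. `map_blocks_to_spm`, the block names and the per-fetch energies `e_main` and `e_spm` are hypothetical. The authors' actual cost model may differ.

```python
# Hypothetical sketch: SPM code mapping posed as a 0/1 knapsack solved by
# dynamic programming. Block sizes, fetch counts and per-fetch energies are
# illustrative assumptions; the paper's exact cost model may differ.

def map_blocks_to_spm(blocks, capacity, e_main, e_spm):
    """blocks: list of (name, size_bytes, fetch_count) taken from a trace.
    Returns (energy_saved, names_of_blocks_moved_to_spm)."""
    n = len(blocks)
    # Energy saved if a block is fetched from the SPM instead of main memory.
    gain = [fetches * (e_main - e_spm) for _, _, fetches in blocks]
    # best[i][c]: max saving using the first i blocks within c bytes of SPM.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        size = blocks[i - 1][1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # leave block i in main memory
            if size <= c:                # or move it into the SPM
                best[i][c] = max(best[i][c], best[i - 1][c - size] + gain[i - 1])
    # Trace back which blocks were selected.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(blocks[i - 1][0])
            c -= blocks[i - 1][1]
    return best[n][capacity], sorted(chosen)

blocks = [("loop_a", 64, 1000), ("loop_b", 128, 1500),
          ("init", 256, 10), ("isr", 32, 400)]
print(map_blocks_to_spm(blocks, capacity=192, e_main=10, e_spm=2))
# -> (20000, ['loop_a', 'loop_b'])
```

To minimize execution time instead, the per-fetch energy difference could be replaced by the cycles saved per fetch. The DP itself is unchanged.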
IEEE Transactions on Computers | 1996
A. De Gloria; Mauro Olivieri
Addition techniques can be divided into fixed-time and variable-time ones. Variable-time techniques can achieve an average addition time of log2(N) for N-bit operands. However, their hardware overhead has always made fixed-time adders, such as Carry Lookahead (CLA) and Carry Select, preferable. We present a new variable-time addition technique whose average delay is much lower than log2(N) and whose overhead is lower than that of a CLA adder. The new approach is made feasible by a proper application of VLSI dynamic logic design. We present the mathematical proof, the logic implementation, and the VLSI realization of the new adder. We also report circuit simulation results and compare them with the analytical model.
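The log2(N) average for variable-time addition comes from the statistics of carry chains. The Python sketch below measures the longest carry-propagation chain empirically, assuming uniformly random operands. `longest_carry_chain` and `average_longest_chain` are illustrative names. The sketch models only the classical argument, not the new dynamic-logic adder.

```python
import math
import random

# Sketch of the classical average-case argument behind variable-time adders:
# for random operands the longest carry-propagation chain is short, on the
# order of log2(N). Uniformly random operands are an assumption.

def longest_carry_chain(a, b, n):
    """Bit positions traversed by the longest carry in a + b (n-bit operands)."""
    longest = run = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai and bi:            # generate: a fresh carry starts here
            run = 1
        elif (ai ^ bi) and run:  # propagate: the carry in flight moves on
            run += 1
        else:                    # kill (both bits 0), or no carry to propagate
            run = 0
        longest = max(longest, run)
    return longest

def average_longest_chain(n, trials=2000, seed=1):
    rng = random.Random(seed)
    total = sum(longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n)
                for _ in range(trials))
    return total / trials

for n in (16, 32, 64):
    print(n, round(average_longest_chain(n), 2), "vs log2(N) =", math.log2(n))
```

The measured averages grow roughly like log2(N), far below the N-bit worst case. A completion-detecting adder can exploit this gap.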
international symposium on microarchitecture | 1993
A. Costa; Alessandro De Gloria; P. Faraboschi; Mauro Olivieri
Instruction-level parallelism in a single stream of code for non-numerical applications has been the subject of much recent research. This work extends the analysis to symbolic applications written as logic programs. In particular, the authors analyze how speculative execution, memory alias disambiguation, renaming and flow prediction affect performance. The results indicate that, with the proper optimizations, a sustained parallelism of 4 can be reached, comparable with imperative languages. The authors also compare statically and dynamically scheduled approaches, outlining the conditions under which a dynamic solution can achieve substantial improvements over a static one. In this way, they identify important optimizations and parameters of a dynamic scheduling approach, providing a guideline for future architectural implementations.
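Limit studies of this kind often compute an oracle dataflow schedule of an execution trace. The Python sketch below computes the available parallelism of a register trace with and without renaming. It assumes unit latency, unlimited functional units, perfect flow prediction and perfect memory disambiguation. The trace format and `dataflow_parallelism` are hypothetical.

```python
# Sketch of an oracle "limit study" on a register trace: unit latency,
# unlimited functional units, perfect flow prediction and perfect memory
# disambiguation are all assumed. Only register dependences are modeled;
# this is a generic illustration, not the paper's experimental setup.

def dataflow_parallelism(trace, rename=True):
    """trace: list of (dest_reg, [src_regs]) in program order.
    Returns instructions per cycle of the earliest possible schedule."""
    written = {}  # reg -> cycle of its most recent writer
    read = {}     # reg -> latest cycle in which a reader used it
    length = 0
    for dest, srcs in trace:
        deps = [written.get(r, 0) for r in srcs]   # true (RAW) dependences
        if not rename:
            deps.append(read.get(dest, 0))          # anti (WAR) dependence
            deps.append(written.get(dest, 0))       # output (WAW) dependence
        cycle = 1 + max(deps, default=0)
        for r in srcs:
            read[r] = max(read.get(r, 0), cycle)
        written[dest] = cycle
        length = max(length, cycle)
    return len(trace) / length if length else 0.0

# r1 is reused, creating WAR/WAW hazards that renaming removes.
trace = [("r1", []), ("r2", ["r1"]), ("r1", []), ("r3", ["r1"])]
print(dataflow_parallelism(trace, rename=False))  # 1.0
print(dataflow_parallelism(trace, rename=True))   # 2.0
```

A static scheduler would see a similar dependence graph at compile time. A dynamic one builds it at run time from the actual trace, including paths a compiler cannot predict.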
IEEE Transactions on Very Large Scale Integration Systems | 2005
Mauro Olivieri; Giuseppe Scotti; Alessandro Trifiletti
This work presents a novel approach to optimizing the yield of digital integrated circuits with respect to speed, dynamic power, and leakage power constraints. The method is based on process parameter estimation circuits and active control of body bias performed by an on-chip digital controller. The associated design flow allows us to quantitatively predict the impact of the method on the expected yield of a specific design. We present the architecture scheme, the theoretical foundation, the estimation circuits used, and two application case studies based on data from an industrial 0.13-µm CMOS process. The approach proves remarkably effective at high operating temperature. In the presented case study, initial yields below 14% are improved to 86% by using a single controller and a single set of estimation circuits per die.
international symposium on microarchitecture | 1997
A. Costa; A. De Gloria; F. Giudici; Mauro Olivieri
We propose an architecture dedicated mainly to medium-range applications that demand computational power combined with low cost for the resulting hardware system (chip and board). Our architecture is a 16-bit processor with dedicated instructions and hardware for efficient support of fuzzy logic. To make the architecture effective for control applications developed either with a traditional approach or with fuzzy logic, we equipped the processor with the general features of a microcontroller. Our design takes application characteristics into account to provide efficient hardware support for fuzzy logic. To achieve this, we first analyzed fuzzy control algorithms and derived a general model for fuzzy computation. In defining the model, we considered the large spectrum of possible inference methods, fuzzification and defuzzification mechanisms, and operators used in control applications. On this basis, we defined an instruction set that supports this computational model, together with a suitable architectural solution. We tested the system (the software model and its hardware support) by simulating different sets of general-purpose and fuzzy control benchmarks.
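The fuzzification, inference and defuzzification steps listed in the abstract can be illustrated with one common method. This Python sketch uses Mamdani min-max inference with centroid defuzzification, chosen here only as an example. The triangular sets, the three-rule base and `fuzzy_control` are hypothetical and do not describe the processor's instruction set.

```python
# Sketch of one common fuzzy-control pipeline (Mamdani min-max inference with
# centroid defuzzification). The membership functions, rule base and sampling
# resolution are hypothetical; the processor in the paper supports a broader
# family of inference and defuzzification methods.

def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

SETS = {"NEG": (-2, -1, 0), "ZERO": (-1, 0, 1), "POS": (0, 1, 2)}
RULES = [("NEG", "POS"), ("ZERO", "ZERO"), ("POS", "NEG")]  # if e is A then u is B

def fuzzy_control(e, lo=-2.0, hi=2.0, steps=400):
    # 1. Fuzzification: degree to which the crisp input belongs to each set.
    strength = {name: tri(e, *abc) for name, abc in SETS.items()}
    num = den = 0.0
    for i in range(steps + 1):
        u = lo + (hi - lo) * i / steps
        # 2. Inference: clip each consequent at its rule strength (min).
        # 3. Aggregation: combine the clipped consequents with max.
        mu = max(min(strength[a], tri(u, *SETS[b])) for a, b in RULES)
        num += u * mu
        den += mu
    # 4. Defuzzification: centroid of the sampled aggregated output set.
    return num / den if den else 0.0

print(round(fuzzy_control(0.5), 3))  # -0.5
```

The inner loop consists of membership evaluation, min/max operations and a multiply-accumulate for the centroid. Dedicated fuzzy instructions could target these kinds of operations, although the paper's actual instructions are not reproduced here.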
IEEE Transactions on Computers | 2004
Luca Benini; Francesco Menichelli; Mauro Olivieri
Compression of executable code in embedded microprocessor systems was used in the past mainly to reduce the memory footprint of embedded software. It is now gaining interest for its potential to reduce memory bus traffic and power consumption. We propose three new code compression schemes based on the concepts of static entropy (computed on the static representation of the executable) and dynamic entropy (computed on program execution traces). We compare them with a state-of-the-art compression scheme, IBM's CodePack. The proposed schemes are competitive with CodePack for static footprint compression and achieve superior results for bus traffic and energy reduction. Another interesting outcome is that static compression is not directly related to bus traffic reduction; rather, there is a trade-off between static compression and dynamic compression (i.e., traffic reduction).
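Static and dynamic entropy differ only in how instructions are weighted. The Python sketch below computes both for a tiny hypothetical program and execution trace, assuming order-0 Shannon entropy over 32-bit instruction words as the measure. The helper names are illustrative, and the paper's three compression schemes are not reproduced.

```python
import math
from collections import Counter

# Sketch of "static" vs "dynamic" entropy of executable code, using order-0
# Shannon entropy of 32-bit instruction words. The tiny program and trace are
# hypothetical; the paper's compression schemes are not reproduced here.

def entropy_bits(words):
    """Order-0 Shannon entropy (bits per symbol) of a sequence of words."""
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def static_entropy(program):
    """Every instruction in the binary image counts once."""
    return entropy_bits(program)

def dynamic_entropy(program, trace):
    """Instructions are weighted by how often they are fetched at run time."""
    return entropy_bits(program[pc] for pc in trace)

program = [0xE3A00000, 0xE2800001, 0xE3500064, 0xE12FFF1E]  # 4 distinct words
trace = [0, 1, 2, 1, 2, 1, 2, 3]                            # word indices fetched
print(static_entropy(program))                    # 2.0
print(round(dynamic_entropy(program, trace), 3))  # 1.811
```

Here the loop body dominates the trace, so the dynamic entropy is lower than the static one. A code tuned to fetch statistics can therefore reduce bus traffic in ways that a footprint-oriented code does not capture.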
IEEE Transactions on Very Large Scale Integration Systems | 2001
Mauro Olivieri
This paper presents a novel variable-latency multiplier architecture, suitable for implementation as a self-timed multiplier core or as a fully synchronous multicycle multiplier core. The architecture combines a second-order Booth algorithm with a split carry-save array pipelined organization, incorporating multiple row skipping and a completion-predicting carry-select dual adder. The paper reports the architecture and logic design, the CMOS circuit design, and the performance evaluation. In 0.35-µm CMOS, the expected sustainable cycle time for a 32-bit synchronous implementation is 2.25 ns. Instruction-level simulations estimate 54% single-cycle and 46% two-cycle operations in SPEC95 execution. Using the same CMOS process, the 32-bit asynchronous implementation is expected to reach an average throughput of 1.76 ns and a latency of 3.48 ns in SPEC95 execution.
design, automation, and test in europe | 2009
Francesco Paterna; Luca Benini; Andrea Acquaviva; Francesco Papariello; Giuseppe Desoli; Mauro Olivieri
In deep-submicron designs of MultiProcessor Systems-on-Chip (MPSoC), uncompensated within-die process variations and aging effects will make expected core lifetimes increasingly uncertain and unbalanced. In this paper we present an adaptive workload allocation strategy that uses core activity duty cycling to compensate at run time for the unbalanced core lifetimes induced by variations and aging. The proposed technique regulates the percentage of idle time on cores with a short expected life, so as to meet the platform lifetime target with minimum performance degradation. Experiments have been conducted on a multiprocessor simulator of a next-generation industrial MPSoC platform for multimedia applications, consisting of a general-purpose processor and programmable accelerators.
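The duty-cycling idea can be made concrete with a deliberately simple aging model. Assume, hypothetically, that a core wears out only while active, so duty cycle d stretches a lifetime L to L/d. Under that assumption, the Python sketch below picks the largest duty cycle per core that meets a platform lifetime target. The model, the lifetimes and the function names are illustrative assumptions, not the paper's allocation policy.

```python
# Hypothetical first-order model: a core only wears out while active, so running
# it with duty cycle d stretches its expected lifetime from L to L / d. Real
# aging mechanisms (e.g. partial recovery while idle) are more complex.

def duty_cycles(lifetimes_full_activity, target_lifetime):
    """Largest duty cycle per core that still meets the platform lifetime target."""
    return [min(1.0, life / target_lifetime) for life in lifetimes_full_activity]

def retained_capacity(duties):
    """Fraction of the always-on compute capacity kept after duty cycling."""
    return sum(duties) / len(duties)

# Expected lifetimes (years) at 100% activity are illustrative placeholders.
duties = duty_cycles([10.0, 8.0, 5.0], target_lifetime=8.0)
print(duties)                     # [1.0, 1.0, 0.625]
print(retained_capacity(duties))  # 0.875
```

In this toy model only the short-lived core is throttled. An allocator would then steer work toward the cores that still have lifetime margin.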
Microelectronics Journal | 2014
Zia Abbas; Mauro Olivieri
Leakage estimation is an important step in nanoscale-technology digital design flows. Reliable data exist on leakage trends with bulk CMOS technology scaling for stand-alone devices and circuits, but public-domain results on the effect of scaling on the leakage power of a complete standard cell set are lacking. We present an analysis of a standard cell library using a logic-level estimation model, validated against SPICE BSIM4 simulations. The logic-level model is more than 10^3 times faster than SPICE, with an average error below 1%. We then use it to explore the effects of scaling on the whole standard cell set with respect to the different leakage mechanisms (sub-threshold, body, gate) and to input pattern dependence. While body leakage appears to be dominant, sub-threshold leakage is expected to increase more than the other components with scaling. Detailed data from the whole analysis are reported for use in further research on leakage-aware digital design.
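A logic-level model can capture input-pattern dependence through per-vector leakage tables. The Python sketch below computes the expected leakage of a cell and sums it over a netlist. It assumes statistically independent inputs and uses a hypothetical two-input cell table in arbitrary units. The table values and function names are placeholders, not data or code from the paper.

```python
from itertools import product

# Sketch of input-pattern-dependent, logic-level leakage estimation: each cell
# has a per-input-vector leakage table (in practice obtained from SPICE
# characterization), and the expected leakage weights each vector by its
# probability, assuming statistically independent inputs. Table values below
# are arbitrary units, not data from the paper.

def expected_cell_leakage(table, p_one):
    """table: {input_tuple: leakage}; p_one: P(input == 1) for each input pin."""
    total = 0.0
    for vec in product((0, 1), repeat=len(p_one)):
        prob = 1.0
        for bit, p in zip(vec, p_one):
            prob *= p if bit else 1.0 - p
        total += prob * table[vec]
    return total

def circuit_leakage(instances):
    """instances: list of (table, p_one) pairs, one per cell instance."""
    return sum(expected_cell_leakage(t, p) for t, p in instances)

TWO_INPUT_CELL = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 4.0}  # hypothetical
print(expected_cell_leakage(TWO_INPUT_CELL, (0.5, 0.5)))  # 2.25
print(expected_cell_leakage(TWO_INPUT_CELL, (1.0, 1.0)))  # 4.0
```

Splitting each table entry into sub-threshold, body and gate components would allow the per-mechanism breakdown described in the abstract, using the same weighting.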