Sandro Penolazzi
Royal Institute of Technology
Publications
Featured research published by Sandro Penolazzi.
digital systems design | 2006
Sandro Penolazzi; Axel Jantsch
We propose a power model for the Nostrum NoC. For this purpose, an empirical power model of links and switches has been formulated and validated with the Synopsys Power Compiler. The model, called Nos-HPM (Nostrum High-level Power Model), allows fast power analysis and is accurate to within 5%. System simulations with Nos-HPM run up to 500 times faster than with Power Compiler for a 4×4 network. We find a maximum power consumption of 0.7 W for a 4×4 mesh and 3.5 W for an 8×8 mesh, both implemented in 0.18 µm UPC CMOS technology. In the worst case, the average energy per cycle for a 128-bit packet is 508 pJ, while it is 20 pJ for a payload byte. The power consumption of all the links is equal to or slightly higher than that of all the switches. A comparison between our results and some related work is also presented.
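As a rough illustration of how an event-based, high-level NoC power model trades gate-level detail for speed, the sketch below sums per-flit switch and link energies over a list of packets. It is not Nos-HPM: the coefficients `E_SWITCH_PJ` and `E_LINK_PJ` are hypothetical placeholders, minimal (XY-style) routing on a 2D mesh is assumed, and idle and leakage power are ignored.

```python
from dataclasses import dataclass

# Hypothetical per-flit energies in pJ; placeholders, NOT the Nos-HPM coefficients.
E_SWITCH_PJ = 10.0  # one flit passing through one switch
E_LINK_PJ = 12.0    # one flit crossing one inter-switch link


@dataclass
class Packet:
    src: tuple[int, int]  # (x, y) of the source switch
    dst: tuple[int, int]  # (x, y) of the destination switch
    flits: int            # packet length in flits


def hops(p: Packet) -> int:
    """Links traversed under minimal (e.g. XY) routing in a 2D mesh."""
    return abs(p.src[0] - p.dst[0]) + abs(p.src[1] - p.dst[1])


def traffic_energy_pj(packets: list[Packet]) -> float:
    """Sum switch and link energy over all packets by counting events."""
    total = 0.0
    for p in packets:
        h = hops(p)
        # a packet crossing h links passes through h + 1 switches
        total += p.flits * ((h + 1) * E_SWITCH_PJ + h * E_LINK_PJ)
    return total


def average_power_w(packets: list[Packet], cycles: int, f_hz: float) -> float:
    """Average dynamic power over a window of `cycles` clock cycles at `f_hz`."""
    energy_j = traffic_energy_pj(packets) * 1e-12
    return energy_j / (cycles / f_hz)
```

For example, a 2-flit packet from switch (0, 0) to (1, 1) crosses two links and three switches. In a real model the coefficients would come from characterization against a gate-level tool such as Power Compiler.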
digital systems design | 2009
Sandro Penolazzi; Luca Bolognino; Ahmed Hemani
We present a general methodology to implement a processor energy model, based on instruction-level characterization, and we apply it to a SPARC-based Leon3 processor. The model is characterized by ...
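An instruction-level energy model can be sketched as a table of per-class energies applied to an executed instruction trace. The sketch below follows the general base-cost plus inter-instruction-overhead idea known from instruction-level power analysis; the instruction classes, the values in `BASE_COST_PJ`, and the single `SWITCH_OVERHEAD_PJ` term are hypothetical and are not the Leon3 characterization from the paper.

```python
from collections import Counter

# Hypothetical per-class base energies (pJ per executed instruction).
BASE_COST_PJ = {"alu": 50.0, "load": 90.0, "store": 85.0, "branch": 60.0}
# Hypothetical extra energy when two consecutive instructions differ in class.
SWITCH_OVERHEAD_PJ = 5.0


def trace_energy_pj(trace: list[str]) -> float:
    """Estimate the energy of a trace given as a list of instruction classes."""
    counts = Counter(trace)
    base = sum(BASE_COST_PJ[cls] * n for cls, n in counts.items())
    # inter-instruction effect: count class changes between adjacent instructions
    changes = sum(1 for a, b in zip(trace, trace[1:]) if a != b)
    return base + changes * SWITCH_OVERHEAD_PJ
```

For instance, the trace `["alu", "alu", "load", "branch"]` gives a base cost of 250 pJ plus two class changes, 260 pJ in total.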
Archive | 2012
Axel Jantsch; Xiaowen Chen; Abdul Naeem; Yuang Zhang; Sandro Penolazzi; Zhonghai Lu
The memory organization and the management of the memory space are a critical part of every NoC-based platform design. We propose a Data Management Engine (DME), a block of programmable hardware that is part of every processing element. It offloads the processing element (CPU, DSP, etc.) by managing the memory space, memory accesses, and communication over the on-chip network. The DME's main functions are virtual address translation, private and shared memory management, a cache coherence protocol, support for memory consistency models, and synchronization and protection mechanisms for shared-memory communication. The DME is fully programmable and configurable, thus allowing customized support for high-level data management functions such as dynamic memory allocation and abstract data types. This chapter describes the main concepts, design, and functionality of the DME and presents case studies illustrating its usage and performance.
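One of the DME functions listed above, virtual address translation with protection, can be illustrated with a small region table. This is a hypothetical sketch, not the DME's actual design: the `Region` fields, the flat region lookup, and the rule that private regions are reachable only by their owner node are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    vbase: int    # start of the virtual address range
    size: int     # length of the range in bytes
    owner: int    # NoC node holding the physical memory
    pbase: int    # physical base address in the owner's local memory
    shared: bool  # True if other nodes may access the region


class AddressTranslator:
    """Hypothetical DME-style translator for one processing element."""

    def __init__(self, node_id: int, regions: list[Region]):
        self.node_id = node_id
        self.regions = regions

    def translate(self, vaddr: int) -> tuple[int, int]:
        """Map a virtual address to (owner node, physical address)."""
        for r in self.regions:
            if r.vbase <= vaddr < r.vbase + r.size:
                # protection check: private regions are reachable only by their owner
                if not r.shared and r.owner != self.node_id:
                    raise PermissionError(
                        f"node {self.node_id} may not access private region at {r.vbase:#x}"
                    )
                return r.owner, r.pbase + (vaddr - r.vbase)
        raise LookupError(f"unmapped virtual address {vaddr:#x}")
```

A hardware engine would more likely use a small associative table than a linear scan; the scan just keeps the sketch short.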
international conference on vlsi design | 2009
Sandro Penolazzi; Ahmed Hemani; Luca Bolognino
We present a high-level methodology for efficient and accurate estimation of energy and performance in SoCs at the Functional Untimed Level. We then validate the proposed method against gate level for accuracy and against TLM-PV for speed. We show that the method is within 15% of gate-level accuracy and on average 28× faster than TLM-PV for the selected benchmark applications.
design, automation, and test in europe | 2010
Sandro Penolazzi; Ingo Sander; Ahmed Hemani
We present a high-level method for rapidly and accurately estimating the energy and performance overhead of Real-Time Operating Systems. Unlike most other approaches, which rely on Transaction-Level Modeling (TLM), we infer the information we need directly from executing the algorithmic specification, without needing to build any high-level architectural model. Our approach has two main components: first, an accurate one-time pre-characterization of the main RTOS functionalities in terms of energy and cycles; second, an algorithm that rapidly predicts the occurrences of these RTOS functionalities. Finally, we demonstrate the feasibility of our approach by comparing it against gate level for accuracy and against TLM for speed. We obtain a worst-case energy error of 12% with a mean speedup of 36×.
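The two components described above combine naturally: once each RTOS functionality has a pre-characterized energy and cycle cost, the overhead for a run is the predicted number of occurrences times that cost. The sketch below shows only this combination step; the functionality names and the numbers in `RTOS_COST` are hypothetical, and the prediction of occurrence counts is assumed to happen elsewhere.

```python
from typing import NamedTuple


class Cost(NamedTuple):
    energy_nj: float
    cycles: int


# One-time pre-characterization table (hypothetical numbers, e.g. from gate-level runs).
RTOS_COST = {
    "context_switch": Cost(energy_nj=120.0, cycles=400),
    "sem_take": Cost(energy_nj=30.0, cycles=90),
    "sem_give": Cost(energy_nj=28.0, cycles=85),
    "tick_isr": Cost(energy_nj=15.0, cycles=50),
}


def rtos_overhead(occurrences: dict[str, int]) -> Cost:
    """Combine predicted occurrence counts with the pre-characterized costs."""
    energy = sum(RTOS_COST[f].energy_nj * n for f, n in occurrences.items())
    cycles = sum(RTOS_COST[f].cycles * n for f, n in occurrences.items())
    return Cost(energy, cycles)
```

With three context switches and ten tick interrupts, the sketch gives 510 nJ and 1700 cycles.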
norchip | 2006
Sandro Penolazzi; Ahmed Hemani
A layered approach to estimating power consumption at the highest level of abstraction is presented. This approach is accurate and fast enough to be used as a guide for exploring the algorithmic and architectural space. The layers span from the use-case level down to the gate level. Speed and accuracy come from our ability to relate parameterized transactions at the architectural level to switching activity at the gate level, and to perform architecturally aware application-level simulation for specific use cases or sweeps of use cases. This enables us to accurately recreate architectural-level transactions. Additionally, we use a preliminary floorplan to factor in physical design aspects and improve the accuracy of our estimates. We base our work on the SPIRIT industry standard for specifying IPs and platforms. Early results are also presented.
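Relating parameterized architectural-level transactions to gate-level energy can be illustrated by fitting a simple model per transaction type and then summing it over a use-case trace. In this sketch the transaction types, the gate-level samples in `SAMPLES`, and the linear dependence on burst length are all assumptions; the paper's actual parameterization is not given here.

```python
# Hypothetical gate-level characterization samples per transaction type:
# (burst lengths, measured energy in pJ). Placeholder numbers only.
SAMPLES = {
    "bus_read": ([1, 4, 8, 16], [40.0, 85.0, 145.0, 265.0]),
    "bus_write": ([1, 4, 8, 16], [45.0, 96.0, 164.0, 300.0]),
}


def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Characterize once: one (intercept, slope) pair per transaction type.
MODELS = {kind: fit_linear(xs, ys) for kind, (xs, ys) in SAMPLES.items()}


def use_case_energy_pj(trace: list[tuple[str, int]]) -> float:
    """Sum the fitted energy of each (transaction type, burst length) in a trace."""
    total = 0.0
    for kind, burst in trace:
        a, b = MODELS[kind]
        total += a + b * burst
    return total
```

Sweeping use cases then reduces to evaluating `use_case_energy_pj` over different traces, which is far cheaper than re-running gate-level simulation.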
design, automation, and test in europe | 2011
Sandro Penolazzi; Ingo Sander; Ahmed Hemani
We present a high-level method for rapidly and accurately predicting bus contention effects on energy and performance in multi-processor SoCs. Unlike most other approaches, which rely on Transaction-Level Modeling (TLM), we infer the information we need directly from executing the algorithmic specification, without needing to build any high-level architectural model. This results in higher estimation speed and allows us to keep our predictions within ∼2% of gate-level accuracy.
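One way to picture how bus contention turns into extra cycles is to replay the bus requests of several processors through an arbiter and accumulate each processor's stall time. This is a hypothetical sketch, not the paper's prediction method: it assumes a single shared bus with non-preemptive first-come-first-served arbitration and ties broken by the lower `cpu_id`.

```python
def contention_wait_cycles(requests: list[tuple[int, int, int]]) -> dict[int, int]:
    """Per-processor stall cycles on a single shared bus.

    requests: (arrival_cycle, cpu_id, occupancy_cycles) tuples.
    Assumes non-preemptive first-come-first-served arbitration;
    simultaneous arrivals are served in order of increasing cpu_id.
    """
    waits: dict[int, int] = {}
    bus_free_at = 0
    # sorting tuples orders by arrival cycle first, then by cpu_id
    for arrival, cpu, occupancy in sorted(requests):
        start = max(arrival, bus_free_at)
        waits[cpu] = waits.get(cpu, 0) + (start - arrival)
        bus_free_at = start + occupancy
    return waits
```

The resulting stall cycles could then be weighted by a per-cycle stall energy to estimate the energy effect of contention.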
international symposium on industrial embedded systems | 2010
Sandro Penolazzi; Ingo Sander; Ahmed Hemani
We present a high-level method for rapidly and accurately estimating the energy and performance cost of Real-Time Operating Systems. We investigate priority-driven scheduling and assume inter-dependent tasks competing for shared resources. Unlike most other approaches, which rely on Transaction-Level Modeling (TLM), we infer the information we need directly from executing the algorithmic specification, without needing to build any high-level architectural model. Our approach has two main components: first, an accurate one-time pre-characterization of the main RTOS functionalities in terms of energy and cycles; second, an algorithm that rapidly predicts the occurrences of these RTOS functionalities. Finally, we validate our approach by comparing it against gate level for accuracy and against TLM for speed. We obtain a worst-case error of 12% with a mean speedup of ∼30×.
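Predicting how often RTOS functionalities occur under priority-driven scheduling can be illustrated by simulating a preemptive fixed-priority scheduler and counting dispatches. The sketch below is a deliberately simple unit-time simulation, not the authors' prediction algorithm; it assumes independent jobs and does not model blocking on shared resources, which the paper does consider.

```python
def count_context_switches(jobs: list[tuple[int, int, int]]) -> int:
    """Count dispatches under preemptive fixed-priority scheduling.

    jobs: (release_time, priority, exec_time); lower number = higher priority.
    Time advances in unit steps. A dispatch is counted whenever the CPU runs
    a different job than the one it ran last (the first dispatch included;
    idle periods alone do not count).
    """
    remaining = {i: job[2] for i, job in enumerate(jobs)}
    last = None
    switches = 0
    t = 0
    while any(remaining.values()):
        ready = [i for i, (rel, _, _) in enumerate(jobs) if rel <= t and remaining[i] > 0]
        if ready:
            # highest-priority ready job; ties broken by release time, then index
            cur = min(ready, key=lambda i: (jobs[i][1], jobs[i][0], i))
            if cur != last:
                switches += 1
                last = cur
            remaining[cur] -= 1
        t += 1
    return switches
```

For example, a low-priority job released at time 0 that is preempted by a higher-priority job at time 1 and resumed afterwards yields three dispatches. Counts like this could then be multiplied by pre-characterized per-functionality costs.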
Journal of Low Power Electronics | 2009
Sandro Penolazzi; Ahmed Hemani; Luca Bolognino
We present a high-level methodology for efficient and accurate estimation of energy and performance in SoCs. Unlike the most common approaches, which rely on Transaction-Level Modeling (TLM), we infer energy and performance figures directly at the Functional Untimed Level by running the algorithmic specification natively on an ordinary host machine. We then validate the proposed method against gate level for accuracy and against TLM-PV for speed. We show that the method is within 17% of gate-level accuracy and on average 28× faster than TLM-PV for the selected benchmark applications.
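The idea of inferring energy and performance from a natively executed algorithmic specification can be sketched by annotating the algorithm with operation counters and mapping the counts to target costs afterwards. Here the FIR example, the manual annotation, and the per-operation cycle and energy figures in `OP_COST` are hypothetical; the paper's own instrumentation and characterization are not described in this abstract.

```python
from collections import Counter

# Hypothetical per-operation costs on the target: (cycles, energy in pJ).
OP_COST = {"mac": (1, 20.0), "load": (2, 35.0), "store": (2, 33.0)}


def fir(samples: list[int], taps: list[int], ops: Counter) -> list[int]:
    """FIR filter run natively on the host, annotated with operation counts."""
    out = []
    for n in range(len(taps) - 1, len(samples)):
        acc = 0
        for k, h in enumerate(taps):
            ops["load"] += 2  # one sample and one coefficient
            acc += h * samples[n - k]
            ops["mac"] += 1
        out.append(acc)
        ops["store"] += 1
    return out


def estimate(ops: Counter) -> tuple[int, float]:
    """Map the counted operations to (target cycles, energy in pJ)."""
    cycles = sum(OP_COST[op][0] * n for op, n in ops.items())
    energy = sum(OP_COST[op][1] * n for op, n in ops.items())
    return cycles, energy
```

Because the counting happens during an ordinary host run, no architectural model has to be built before an estimate is available.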
VLSI-SoC 08. Rhodes, Greece. October 13-15, 2008 | 2008
Sandro Penolazzi; Mohammad Badawi; Ahmed Hemani