Mirko Loghi
University of Udine
Publications
Featured research published by Mirko Loghi.
design, automation, and test in europe | 2004
Mirko Loghi; Federico Angiolini; Davide Bertozzi; Luca Benini; Roberto Zafalon
This work focuses on communication architecture analysis for multi-processor systems-on-chips (MPSoCs), and it leverages a SystemC-based platform to simulate a complete multi-processor system at the cycle-accurate and signal-accurate level. These features make it possible to stimulate the communication sub-system with functional traffic generated by real applications running on top of a configurable number of ARM processors. This opens up the possibility for communication infrastructure exploration and for the investigation of its impact on system performance at the highest level of accuracy. Our simulation environment proved capable of a detailed comparative analysis between two industry-standard communication architectures, under realistic workloads and different system configurations, pointing out the impact of fine-grained architectural mismatches on macroscopic performance differences.
great lakes symposium on vlsi | 2004
Mirko Loghi; Massimo Poncino; Luca Benini
Developing energy-aware software for multiprocessor systems-on-chip (MPSoCs) is a difficult task, which requires knowledge of the distribution of power consumption among several heterogeneous devices (cores, memories, buses, etc.). In this work we analyze the breakdown of power consumption for a complete MPSoC platform, under several application workloads and operating conditions. We leverage a complete-system simulation platform with accurate power models for all key hardware modules. Our analysis shows that caches and system interconnect dominate the power breakdown, pointing out how software locality is meaningful not only for performance but also for energy optimization.
ACM Transactions on Embedded Computing Systems | 2006
Mirko Loghi; Massimo Poncino; Luca Benini
Shared memory is a common interprocessor communication paradigm for single-chip multiprocessor platforms. Snoop-based cache coherence is a very successful technique that provides a clean shared-memory programming abstraction in general-purpose chip multiprocessors, but there is no consensus on its usage in resource-constrained multiprocessor systems on chips (MPSoCs) for embedded applications. This work aims at providing a comparative energy and performance analysis of cache-coherence support schemes in MPSoCs. Thanks to the use of a complete multiprocessor simulation platform, which relies on accurate technology-homogeneous power models, we were able to explore different cache-coherent shared-memory communication schemes for a number of cache configurations and workloads.
IEEE Transactions on Computers | 2010
Mirko Loghi; Olga Golubeva; Enrico Macii; Massimo Poncino
Partitioning a memory into multiple blocks that can be independently accessed is a widely used technique to reduce its dynamic power. For embedded systems, its benefits can be pushed even further by properly matching the partition to the memory access patterns. When leakage energy comes into play, however, idle memory blocks must be put into a proper low-leakage sleep state to actually save energy when not accessed. In this case, the matching becomes an instance of the power management problem, because moving to and from this sleep state requires additional energy. In this work, we propose an effective solution to the problem of the leakage-aware partitioning of a memory into disjoint subblocks; in particular, we target scratchpad memories, which are commonly used in some embedded systems as a replacement for caches. We show that, although the solution space is extremely large (for an N-block partition, all the combinations of N-1 address boundaries) and nonconvex, it is possible to prove a nontrivial property that considerably reduces the number of partition boundaries to be enumerated, thereby making exhaustive exploration feasible. We are thus able to provide an optimal solution to the leakage-aware partitioning problem. Experiments on different sets of embedded applications have shown that total energy savings larger than 60 percent on average can be obtained, with a marginal overhead in execution time, thanks to an effective implementation of the low-leakage sleep state.
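The search space described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the energy model, the constants (`e_dyn`, `e_leak`, `e_wake`, `timeout`) and the function names are all assumptions made for illustration, and the enumeration is plain brute force over all N-1 boundaries, without the pruning property the paper proves.

```python
from itertools import combinations

def partition_energy(trace, bounds, size,
                     e_dyn=1.0, e_leak=0.05, e_wake=2.0, timeout=2):
    """Total energy of one partition under a toy model (all constants assumed).

    trace  : one accessed word address per cycle
    bounds : sorted address boundaries splitting [0, size) into blocks
    """
    edges = [0, *bounds, size]
    total = 0.0
    for lo, hi in zip(edges, edges[1:]):      # half-open block [lo, hi)
        words = hi - lo
        last = None                           # cycle of the previous access
        for t, addr in enumerate(trace):
            if not lo <= addr < hi:
                continue
            total += e_dyn * words            # dynamic cost grows with block size
            if last is None:
                total += e_wake               # first access wakes the block
            elif t - last > timeout:
                # stayed awake for `timeout` cycles, slept, then paid a wake-up
                total += e_leak * words * timeout + e_wake
            else:
                total += e_leak * words * (t - last)
            last = t
    return total

def best_partition(trace, size, n_blocks, **model):
    """Brute-force search over every choice of n_blocks - 1 boundaries."""
    candidates = combinations(range(1, size), n_blocks - 1)
    return min(candidates,
               key=lambda b: partition_energy(trace, b, size, **model))

# Hypothetical trace: a hot region at addresses 0-1, one stray access to 7.
trace = [0, 1, 0, 1, 7, 0, 1, 0, 1]
best = best_partition(trace, size=8, n_blocks=2)
```

In this toy model a block that is never accessed stays asleep and costs nothing, which is why isolating the hot addresses in a small block lowers the total. The number of candidates grows combinatorially with the memory size and block count, which is the motivation for reducing the set of boundaries to enumerate.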
IEEE Transactions on Computers | 2007
Francesco Poletti; Antonio Poggiali; Davide Bertozzi; Luca Benini; Pol Marchal; Mirko Loghi; Massimo Poncino
In today's multiprocessor SoCs (MPSoCs), parallel programming models are needed to fully exploit hardware capabilities and to achieve the 100 Gops/W energy efficiency target required for ambient intelligence applications. However, mapping abstract programming models onto tightly power-constrained hardware architectures imposes overheads which might seriously compromise performance and energy efficiency. The objective of this work is to perform a comparative analysis of message passing versus shared memory as programming models for single-chip multiprocessor platforms. Our analysis is carried out from a hardware-software viewpoint: we carefully tune hardware architectures and software libraries for each programming model. We analyze representative application kernels from the multimedia domain, and identify application-level parameters that heavily influence performance and energy efficiency. Then, we formulate guidelines for the selection of the most appropriate programming model and its architectural support.
design, automation, and test in europe | 2007
Olga Golubeva; Mirko Loghi; Massimo Poncino; Enrico Macii
Partitioning a memory into multiple blocks that can be independently accessed is a widely used technique to reduce its dynamic power. For embedded systems, its benefits can be pushed even further by properly matching the partition to the memory access patterns. When leakage energy comes into play, however, idle memory blocks must be put into a proper low-leakage sleep state to actually save energy when not accessed. In this case, the matching becomes an instance of the power management problem, because moving to and from this sleep state requires additional energy. In this work, we propose an explorative solution to the problem of leakage-aware partitioning of a memory into disjoint sub-blocks. In particular, we target scratchpad memories, which are commonly used in some embedded systems as a replacement for caches. We show that the total energy (dynamic and static) cost function yields a non-convex partitioning space, making smart exploration the only viable option; we propose an effective randomized search in the solution space which closely matches the results of exhaustive exploration, when this is feasible. Experiments on different sets of embedded applications have shown that total energy savings larger than 60% on average can be obtained, with a marginal overhead in execution time, thanks to an effective implementation of the low-leakage sleep state.
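As a rough illustration of comparing a randomized search with exhaustive exploration over partition boundaries, the sketch below samples boundary sets uniformly at random and keeps the cheapest one. The abstract does not describe the actual search procedure, so uniform sampling, the function names and the `toy_cost` function are all assumptions standing in for the real energy model.

```python
import random
from itertools import combinations

def random_search(cost, size, n_blocks, samples=200, seed=0):
    """Sample random boundary sets and keep the cheapest one seen."""
    rng = random.Random(seed)                 # fixed seed for repeatability
    best, best_cost = None, float("inf")
    for _ in range(samples):
        bounds = tuple(sorted(rng.sample(range(1, size), n_blocks - 1)))
        c = cost(bounds)
        if c < best_cost:
            best, best_cost = bounds, c
    return best, best_cost

def exhaustive(cost, size, n_blocks):
    """Reference optimum, feasible only for small memories."""
    best = min(combinations(range(1, size), n_blocks - 1), key=cost)
    return best, cost(best)

def toy_cost(bounds, size=16):
    """Hypothetical non-convex cost over block sizes (illustration only)."""
    edges = [0, *bounds, size]
    return sum(((hi - lo) - 5) ** 2 % 7 for lo, hi in zip(edges, edges[1:]))

rs_bounds, rs_cost = random_search(toy_cost, size=16, n_blocks=3)
ex_bounds, ex_cost = exhaustive(toy_cost, size=16, n_blocks=3)
```

The random search can never beat the exhaustive optimum; how closely it approaches it depends on the number of samples relative to the size of the space.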
design, automation, and test in europe | 2011
Andrea Calimera; Mirko Loghi; Enrico Macii; Massimo Poncino
Conventional power management knobs such as voltage scaling or power gating have been shown to have a beneficial effect on the aging phenomena caused by Negative Bias Temperature Instability (NBTI). Such a benefit can be especially exploited in SRAM memories, which are particularly sensitive to NBTI effects: given their symmetric structure, they cannot in fact take advantage of value-dependent recovery. We propose an architectural solution that is based on the idea of partitioning a memory into multiple banks of identical size. While this organization has been widely used for reducing both dynamic and static power, its exploitation for aging benefits requires proper management of the existing idleness of the various banks. This can be achieved by means of a time-varying addressing scheme in which addresses are mapped to different banks over time in such a way that the idleness is uniformly distributed over all the banks. Experimental analysis shows that it is possible to simultaneously reduce leakage power and aging in caches, with minimal overhead and without modifying the internal structure of the SRAM arrays.
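The time-varying addressing idea can be sketched as follows, under the assumption that the mapping is a simple rotation applied once per epoch. The abstract does not specify the mapping function, so the rotation, the epoch length and the names below are hypothetical. The sketch only counts accesses per physical bank as a proxy for idleness and ignores what happens to stored data when the mapping changes.

```python
def bank_of(addr, epoch, n_banks, bank_words):
    """Map an address to a physical bank; the mapping rotates every epoch."""
    logical = (addr // bank_words) % n_banks
    return (logical + epoch) % n_banks

def bank_accesses(trace, n_banks, bank_words, epoch_len, rotate=True):
    """Count accesses per physical bank, with or without rotation."""
    hits = [0] * n_banks
    for t, addr in enumerate(trace):
        epoch = t // epoch_len if rotate else 0
        hits[bank_of(addr, epoch, n_banks, bank_words)] += 1
    return hits

# Worst case for a static mapping: every access falls in logical bank 0.
trace = [0] * 400
static = bank_accesses(trace, n_banks=4, bank_words=16, epoch_len=100,
                       rotate=False)   # [400, 0, 0, 0]
rotated = bank_accesses(trace, n_banks=4, bank_words=16, epoch_len=100)
# rotated == [100, 100, 100, 100]: activity, and hence idleness, is balanced
```

With a static mapping one bank absorbs all the activity and never gets idle time, while the other three are always idle. Rotating the mapping spreads the same activity evenly, so every bank sees the same share of idle intervals.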
international symposium on low power electronics and design | 2010
Andrea Calimera; Mirko Loghi; Enrico Macii; Massimo Poncino
Previous works have shown that the traditional implementations of power management (i.e., using power gating or voltage scaling) can also mitigate the aging effect induced by Negative Bias Temperature Instability (NBTI), due to the partial recovery that occurs during the idle intervals used by power management. However, such a potential has been exploited only partially because of the different nature of energy and aging: as a performance figure, aging is affected by the worst idleness pattern. Therefore, large potential energy savings usually turn into limited aging reductions. We address this problem in the context of caches, for which idleness is related to their access pattern. We propose a dynamic indexing scheme, in which the cache indexing function is changed over time in order to uniformly distribute the idleness over all the cache lines. In this way it is possible to fully use the leakage optimization potential and to extend the lifetime of a cache. Experimental analysis shows that it is possible to obtain caches that are effectively aging-free, without any penalty in leakage energy reduction.
great lakes symposium on vlsi | 2005
Mirko Loghi; Martin Letis; Luca Benini; Massimo Poncino
The performance of the various cache coherence protocols proposed in the literature has been extensively analyzed in the context of high-performance multi-processor systems. A similar analysis for Multi-Processor Systems-on-Chips (MPSoCs), where energy is at least as important as performance, and for which strict constraints on hardware and software resources do exist, has not been done yet. This work is a step in that direction, showing energy/performance tradeoffs for different snoop-based protocols on a realistic MPSoC architecture. The analysis leverages a multi-processor simulation platform, augmented with accurate power models, that allows cycle-accurate simulations. Our analysis shows that (i) cache write policy is actually more important than the actual cache coherence protocol, and (ii) matching the programming model and style to the architecture may have dramatic effects on the energy and performance of the system.
great lakes symposium on vlsi | 2006
Franco Fummi; Giovanni Perbellini; Mirko Loghi; Massimo Poncino
Modular design is an important requirement in modern embedded system design flows because of the widespread acceptance of new paradigms such as IP core reuse and platform-based design. Co-simulation frameworks must thus support modular design, since programmable devices, ad-hoc HW components, and the interconnect infrastructure must be easily interchangeable in order to allow design exploration while keeping the SW portion unchanged or only marginally changed. The proposed co-simulation framework implements such a modular approach to co-simulation by means of a novel paradigm in which HW models can be modified on the fly while keeping the SW parts unchanged. This is achieved through an ISS-centric co-simulation strategy in which modularity is provided in terms of (i) the replacement of HW components thanks to the use of a common interface based on the device address space, or (ii) the use of different ISSs, thanks to a re-configurable simulator. We demonstrate our approach on an industrial-strength embedded application, showing that the proposed co-simulation strategy provides both high speed and accuracy.