Martino Ruggiero
University of Bologna
Publication
Featured research published by Martino Ruggiero.
International Conference on Computer-Aided Design | 2010
Arvind Sridhar; Alessandro Vincenzi; Martino Ruggiero; Thomas Brunschwiler; David Atienza
Three-dimensional stacked integrated circuits (3D ICs) are extremely attractive for overcoming the barriers in interconnect scaling, offering an opportunity to continue the CMOS performance trends for the next decade. However, from a thermal perspective, vertical integration of high-performance ICs in the form of 3D stacks is highly demanding, since the effective areal heat dissipation increases with the number of dies (with hotspot heat fluxes up to 250 W/cm²), generating high chip temperatures. In this context, inter-tier integrated microchannel cooling is a promising and scalable solution for high heat flux removal. A robust design of a 3D IC and its subsequent thermal management depend heavily on accurate modeling of the effects of liquid cooling on the thermal behavior of the IC during the early stages of design. In this paper we present 3D-ICE, a compact transient thermal model (CTTM) for the thermal simulation of 3D ICs with multiple inter-tier microchannel liquid cooling. The proposed model is compatible with existing thermal CAD tools for ICs and offers a significant speed-up (up to 975×) over a typical commercial computational fluid dynamics simulation tool while preserving accuracy (i.e., a maximum temperature error of 3.4%). In addition, a thermal simulator built on 3D-ICE can run in parallel on multicore architectures, offering further savings in simulation time and demonstrating efficient parallelization of the proposed approach.
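Compact thermal models of this kind describe the stack as a network of thermal cells, each with a heat capacity and linked to its neighbours by thermal conductances, and integrate the resulting system of ODEs in time. The sketch below is only a minimal illustration of that general idea, not 3D-ICE itself: it assumes a hypothetical 1-D column of cells, a single uniform conductance between adjacent layers, and a coolant represented as a fixed-temperature sink attached to the bottom cell (a crude stand-in for the microchannel model described above). It advances the network with implicit (backward) Euler and solves the resulting tridiagonal system with the Thomas algorithm.

```python
def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm: row i reads sub[i]*x[i-1] + diag[i]*x[i] + sup[i]*x[i+1] = rhs[i].

    No pivoting; safe here because the thermal matrix is diagonally dominant.
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = sup[0] / diag[0] if n > 1 else 0.0
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - sub[i] * c_prime[i - 1]
        c_prime[i] = sup[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (rhs[i] - sub[i] * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def step_backward_euler(temps, power, heat_cap, g_layer, g_cool, t_cool, dt):
    """One implicit Euler step of C dT/dt = -G T + P for a hypothetical 1-D cell column.

    temps: cell temperatures (K), power: heat injected per cell (W),
    heat_cap: per-cell heat capacity (J/K), g_layer: conductance between
    adjacent cells (W/K), g_cool: conductance from cell 0 to the coolant (W/K).
    """
    n = len(temps)
    sub, diag, sup, rhs = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):
        diag[i] = heat_cap[i] / dt
        rhs[i] = heat_cap[i] / dt * temps[i] + power[i]
        if i > 0:  # conduction to the cell below
            sub[i] = -g_layer
            diag[i] += g_layer
        if i < n - 1:  # conduction to the cell above
            sup[i] = -g_layer
            diag[i] += g_layer
    # Cell 0 is the one in contact with the (assumed fixed-temperature) coolant.
    diag[0] += g_cool
    rhs[0] += g_cool * t_cool
    return solve_tridiagonal(sub, diag, sup, rhs)
```

Because backward Euler is unconditionally stable, large time steps remain usable. With power injected only in the top cell, repeated steps converge to the series-resistance steady state T_k = T_cool + P/g_cool + k·P/g_layer, which makes a convenient sanity check.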
Design, Automation, and Test in Europe | 2006
Martino Ruggiero; Alessio Guerri; Davide Bertozzi; Francesco Poletti; Michela Milano
This paper proposes a complete allocation and scheduling framework, in which an MPSoC virtual platform is used to accurately derive input parameters, validate abstract models of system components, and assess constraint satisfaction and objective function optimization. The optimizer implements an efficient and exact approach to allocation and scheduling based on problem decomposition. The allocation subproblem is solved through integer programming (IP) and the scheduling subproblem through constraint programming (CP). The two solvers interact by means of no-good generation, forming an iterative procedure that has been proven to converge to the optimal solution. Experimental results show significant speedups with respect to pure IP and CP exact solution strategies, as well as high accuracy with respect to cycle-accurate functional simulation. A case study further demonstrates the practical viability of our framework for real-life systems and applications.
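The interaction between the two solvers follows a decomposition pattern that can be illustrated on a toy instance. In the sketch below, brute-force enumeration replaces both the IP master and the CP subproblem (so it only works for a handful of tasks). The master's objective (fewest precedence edges crossing processors), the deadline, and the communication delay are hypothetical, and the no-good simply forbids the exact allocation that failed; the cuts used in the paper are not specified here and may be considerably stronger.

```python
from itertools import product


def topological_orders(preds):
    """Yield every topological order of tasks 0..n-1 (exponential: toy sizes only)."""
    n = len(preds)

    def extend(order, placed):
        if len(order) == n:
            yield list(order)
            return
        for t in range(n):
            if t not in placed and all(p in placed for p in preds[t]):
                order.append(t)
                placed.add(t)
                yield from extend(order, placed)
                order.pop()
                placed.remove(t)

    yield from extend([], set())


def min_makespan(alloc, durations, preds, comm_delay):
    """Scheduling subproblem: exact minimum makespan for a fixed allocation.

    With positive durations, any feasible non-preemptive schedule read in
    start-time order is a topological order, so trying every order and starting
    each task as early as possible finds the optimum (brute force replaces CP).
    """
    best = float("inf")
    for order in topological_orders(preds):
        finish = [0.0] * len(durations)
        proc_free = {}  # processor -> time its last scheduled task finishes
        for t in order:
            ready = max(
                (finish[p] + (comm_delay if alloc[p] != alloc[t] else 0.0) for p in preds[t]),
                default=0.0,
            )
            start = max(ready, proc_free.get(alloc[t], 0.0))
            finish[t] = start + durations[t]
            proc_free[alloc[t]] = finish[t]
        best = min(best, max(finish))
    return best


def comm_cost(alloc, preds):
    """Hypothetical master objective: number of precedence edges crossing processors."""
    return sum(1 for t, ps in enumerate(preds) for p in ps if alloc[p] != alloc[t])


def allocate_and_schedule(durations, preds, n_procs, deadline, comm_delay):
    """Alternate master (allocation) and subproblem (scheduling) with no-good cuts."""
    no_goods = set()
    iterations = 0
    while True:
        # Master: cheapest allocation not excluded by a no-good (enumeration replaces IP).
        candidates = [a for a in product(range(n_procs), repeat=len(durations)) if a not in no_goods]
        if not candidates:
            return None, iterations  # no allocation can meet the deadline
        alloc = min(candidates, key=lambda a: comm_cost(a, preds))
        iterations += 1
        # Subproblem: can this allocation be scheduled within the deadline?
        if min_makespan(alloc, durations, preds, comm_delay) <= deadline:
            return alloc, iterations
        no_goods.add(alloc)  # simplest possible no-good: forbid exactly this allocation
```

On a four-task diamond graph (`preds = [[], [0], [0], [1, 2]]`, durations `[1, 2, 2, 1]`, two processors, a deadline of 5 and a communication delay of 0.5), the master first proposes the zero-communication single-processor mappings, the subproblem rejects them, and the loop settles on a mapping that splits the two middle tasks across processors.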
IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2013
Daniele Bortolotti; Christian Pinto; Andrea Marongiu; Martino Ruggiero; Luca Benini
Driven by the flexibility, performance, and cost constraints of demanding modern applications, the heterogeneous System-on-Chip (SoC) is the dominant design paradigm in the embedded computing domain. SoC architecture and heterogeneity clearly provide wider power/performance scaling, combining high-performance and power-efficient general-purpose cores with massively parallel many-core accelerators. Besides complex hardware, these platforms generally also host an advanced software ecosystem composed of an operating system, several communication protocol stacks, and various computationally demanding user applications. The need to cope efficiently with the huge HW/SW design space of this scenario clearly makes the full-system simulator one of the most important design tools. In this paper we present a new emulation framework, called Virtual SoC, targeting the full-system simulation of massively parallel heterogeneous SoCs.
International Journal of Parallel Programming | 2008
Martino Ruggiero; Alessio Guerri; Davide Bertozzi; Michela Milano; Luca Benini
The problem of allocating and scheduling precedence-constrained tasks on the processors of a distributed real-time system is NP-hard. As such, it has traditionally been tackled by means of heuristics, which provide only approximate or near-optimal solutions. This paper proposes a complete allocation and scheduling framework and deploys an MPSoC virtual platform to validate the accuracy of modelling assumptions. The optimizer implements an efficient and exact approach to the mapping problem based on a decomposition strategy. The allocation subproblem is solved through Integer Programming (IP) and the scheduling subproblem through Constraint Programming (CP). The two solvers interact by means of an iterative procedure that has been proven to converge to the optimal solution. Experimental results show significant speed-ups with respect to pure IP and CP exact solution strategies, as well as high accuracy with respect to cycle-accurate functional simulation. Two case studies further demonstrate the practical viability of our framework for real-life applications.
International Conference on Computer Design | 2005
Martino Ruggiero; Andrea Acquaviva; Davide Bertozzi; Luca Benini
In this paper, we address the problem of selecting the optimal number of processing cores and their operating voltage/frequency for a given workload, so as to minimize overall system power under application-dependent QoS constraints. Selecting the optimal system configuration is non-trivial, since it depends on task characteristics and on system-level interaction effects among the cores. For this reason, our QoS-driven methodology for power-aware partitioning and frequency selection is based on functional, cycle-accurate simulation in a virtual platform environment. The methodology, being application-specific, is demonstrated on the DES (Data Encryption Standard) algorithm, which is representative of a wider class of streaming applications with independent input data frames and a regular workload.
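The search space described here (core count × voltage/frequency level) is small enough to enumerate exhaustively. In the methodology above, each candidate is evaluated by cycle-accurate simulation on the virtual platform; in the sketch below, a hypothetical closed-form model (Amdahl-style speedup plus a C·V²·f dynamic power term) stands in for that simulation, and every parameter value is invented for illustration. The model is passed in through the `measure` argument so it can be replaced, since the point of the paper is precisely that such simple models miss system-level interaction effects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_mhz: float
    volt: float


def toy_measure(n_cores, op, cycles_per_frame=2.0e6, serial_fraction=0.1,
                c_eff_nf=1.0, p_static_mw=5.0):
    """Hypothetical stand-in for a cycle-accurate simulation run.

    Returns (frames per second, power in mW). Assumes Amdahl-style scaling and
    that every core is active all the time (a pessimistic power estimate).
    """
    speedup = 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)
    fps = op.freq_mhz * 1e6 / cycles_per_frame * speedup
    # nF * V^2 * MHz = 1e-9 * 1e6 W = mW
    power_mw = n_cores * (c_eff_nf * op.volt ** 2 * op.freq_mhz + p_static_mw)
    return fps, power_mw


def select_configuration(max_cores, op_points, min_fps, measure=toy_measure):
    """Return (cores, operating point, power, fps) with minimum power meeting min_fps, or None."""
    best = None
    for n in range(1, max_cores + 1):
        for op in op_points:
            fps, power = measure(n, op)
            if fps >= min_fps and (best is None or power < best[2]):
                best = (n, op, power, fps)
    return best
```

With the toy numbers, a 150 frames/s target is met most cheaply by four cores at the lowest frequency, illustrating why spreading work over more, slower cores can beat fewer, faster ones; a real evaluation would substitute simulation results for `toy_measure`.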
Design, Automation, and Test in Europe | 2013
Jungsoo Kim; Martino Ruggiero; David Atienza; Marcel Lederberger
Server consolidation plays a key role in mitigating the continuous increase in datacenter power consumption. The recent advent of scale-out applications (e.g., web search and MapReduce) necessitates revisiting existing server consolidation solutions, because these applications differ distinctly from traditional high-performance computing (HPC) workloads: they are user-interactive and latency-critical, and they operate on large data sets split across many servers. This paper presents a power-saving solution for datacenters that specifically targets these characteristics of scale-out applications. More specifically, we take into account the correlation of core utilization among virtual machines (VMs) during server consolidation to lower the actual peak server utilization. We then exploit this reduction to achieve further power savings by aggressively yet safely lowering the server operating voltage and frequency. We have validated the effectiveness of the proposed solution using 1) multiple clusters of real-life scale-out application workloads based on web search and 2) utilization traces obtained from real datacenter setups. In our experiments, the proposed solution provides up to 13.7% power savings with up to 15.6% improvement in Quality-of-Service (QoS) compared to existing correlation-aware VM allocation schemes for datacenters.
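The benefit of correlation-aware placement comes from the fact that the peak of a sum of utilization traces can be much lower than the sum of their peaks when the traces are anti-correlated. The sketch below illustrates this under stated assumptions: utilization traces are hypothetical fractions of one server's capacity at maximum frequency, VMs are placed greedily on the existing server whose combined peak stays lowest (a simple heuristic, not the paper's allocation algorithm), and the frequency step assumes that required capacity scales linearly with frequency, which ignores memory-bound effects.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length traces (undefined if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def combined_peak(traces):
    """Peak of the time-aligned sum of several utilization traces."""
    return max(sum(values) for values in zip(*traces))


def consolidate(traces, capacity):
    """Greedy placement: largest-peak VM first, onto the server with the lowest resulting peak."""
    order = sorted(range(len(traces)), key=lambda i: max(traces[i]), reverse=True)
    servers = []  # each server is a list of VM indices
    for vm in order:
        best_server, best_peak = None, None
        for s, members in enumerate(servers):
            peak = combined_peak([traces[m] for m in members] + [traces[vm]])
            if peak <= capacity and (best_peak is None or peak < best_peak):
                best_server, best_peak = s, peak
        if best_server is None:
            servers.append([vm])  # open a new server
        else:
            servers[best_server].append(vm)
    return servers


def lowest_frequency_ratio(peak, capacity, ratios):
    """Smallest frequency ratio (of f_max) whose scaled capacity still covers the peak."""
    for r in sorted(ratios):
        if peak <= r * capacity:
            return r
    return None
```

For two VMs alternating between 60% and 10% utilization in opposite phase, the combined peak is 70% instead of the 120% that summing individual peaks would suggest, so both fit on one server and that server can run at a reduced frequency level.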
Design, Automation, and Test in Europe | 2012
Ahmed Yasir Dogan; Jeremy Constantin; Martino Ruggiero; Andreas Burg; David Atienza
Personal health monitoring systems can offer a cost-effective solution for human healthcare. To extend the lifetime of health monitoring systems, we propose a near-threshold ultra-low-power multi-core architecture featuring low-power cores that are nonetheless capable of executing biomedical applications, with multiple instruction and data memories tightly coupled through flexible crossbar interconnects. The architecture also includes broadcasting mechanisms for the data and instruction memories that optimize system energy consumption by tailoring memory sharing to the target application. Moreover, the architecture enables power gating of unused memory banks to lower leakage power. Our experimental results show that, compared to the state of the art, the proposed architecture achieves 39.5% power savings at high workload requirements (637 MOps/s) and 38.8% savings at low workload requirements (5 kOps/s), where leakage power consumption dominates.
International Conference on High Performance Computing and Simulation | 2012
Jungsoo Kim; Martino Ruggiero; David Atienza
Free cooling, i.e., directly using cold outside air and/or water to cool datacenters, can provide significant power savings in datacenters. However, because its cooling capability is limited and tightly coupled with climate conditions, free cooling is currently used only in a few locations (e.g., Northern Europe) and during limited periods of the year. Moreover, its applicability is further restricted by conservative assumptions about workload characteristics and by virtual machine (VM) consolidation techniques, since both require provisioning higher cooling capability. This paper presents a dynamic power management scheme that extends the applicability of free cooling by judiciously consolidating VMs, exploiting the time-varying workload characteristics of the datacenter as well as climate conditions, in order to minimize the power consumption of the entire datacenter while satisfying service-level agreement (SLA) requirements. Additionally, we propose a receding horizon control scheme to prevent frequent cooling mode transitions. Experimental results show that the proposed solution provides up to 25.7% power savings compared to conventional free cooling decision schemes, which use free cooling only when the outside temperature is lower than a predefined threshold temperature.
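Receding horizon control chooses a plan over a short look-ahead window, applies only its first step, and re-plans at the next step. A minimal sketch of that idea for cooling-mode selection follows. It assumes hypothetical per-step power costs for free cooling and chiller operation, a fixed outside-temperature limit above which free cooling is infeasible, perfect temperature forecasts, and a switching penalty as the mechanism that discourages frequent transitions; it omits the VM consolidation and SLA aspects of the scheme described above.

```python
from itertools import product

FREE, CHILLER = "free", "chiller"


def step_cost(mode, outside_temp, free_limit, p_free, p_chiller):
    """Power cost of one step; free cooling is assumed infeasible above free_limit."""
    if mode == FREE:
        return p_free if outside_temp <= free_limit else float("inf")
    return p_chiller


def plan_first_mode(current, forecast, free_limit, p_free, p_chiller, switch_penalty):
    """Optimize the whole horizon by enumeration, then return only its first mode."""
    best_cost, best_seq = float("inf"), None
    for seq in product((FREE, CHILLER), repeat=len(forecast)):
        cost, prev = 0.0, current
        for mode, temp in zip(seq, forecast):
            cost += step_cost(mode, temp, free_limit, p_free, p_chiller)
            if mode != prev:
                cost += switch_penalty  # discourage mode transitions
            prev = mode
        if cost < best_cost:  # an all-chiller plan is always finite, so best_seq gets set
            best_cost, best_seq = cost, seq
    return best_seq[0]


def run_receding_horizon(temps, horizon, start_mode, **params):
    """Re-plan at every step using the next `horizon` temperatures (perfect forecast assumed)."""
    mode, modes = start_mode, []
    for t in range(len(temps)):
        mode = plan_first_mode(mode, temps[t:t + horizon], **params)
        modes.append(mode)
    return modes


def threshold_policy(temps, free_limit):
    """Baseline: free cooling whenever the outside temperature is at or below the limit."""
    return [FREE if temp <= free_limit else CHILLER for temp in temps]


def count_switches(modes, start_mode):
    return sum(1 for prev, cur in zip([start_mode] + modes[:-1], modes) if prev != cur)
```

On an outside-temperature trace such as `[25, 21, 25, 25, 20, 20, 20, 20]` with a 22-degree limit, the threshold policy switches modes three times, whereas the receding horizon controller ignores the brief dip and switches once, when free cooling can be sustained.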
Embedded Software | 2008
Martino Ruggiero; Andrea Bartolini; Luca Benini
Almost every modern portable handheld device is equipped with a colour LCD. The LCD backlight accounts for a significant percentage of total energy consumption, so substantial energy savings can be achieved by dynamically adapting backlight intensity levels on such low-power portable devices. In this paper, we present the DBS4video framework, which allows dynamic scaling of the backlight with a negligible impact on QoS for video streaming applications. DBS4video efficiently exploits the hardware image processing unit integrated in almost every new multimedia application processor to implement hardware-assisted image compensation. Unlike CPU-intensive techniques, the proposed approach saves system power without requiring either a dedicated display technology or hardware modifications. We also introduce a new image processing kernel based on collecting multiple histograms for a single frame. We provide a real implementation of the proposed framework on a Freescale application development board based on the i.MX31 processor, and we carried out a full characterization of overall system power consumption versus QoS.
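Backlight scaling with image compensation rests on a simple relation: if perceived luminance is approximately proportional to backlight level times pixel value, dimming the backlight by a factor b can be offset by brightening pixels by 1/b, except for pixels that saturate. The sketch below assumes exactly that linear model and picks the lowest backlight level (from a hypothetical set of levels) for which the fraction of clipped pixels stays within a tolerance, using a single histogram per frame. It is a generic illustration, not DBS4video's multi-histogram kernel or its hardware-assisted implementation.

```python
def histogram(pixels):
    """256-bin histogram of 8-bit luminance values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    return hist


def choose_backlight(hist, max_clipped_fraction, levels):
    """Lowest backlight factor whose compensation clips at most the tolerated pixel fraction.

    A pixel of value v clips after compensation when v / b > 255, i.e. v > 255 * b.
    """
    total = sum(hist)
    for b in sorted(levels):
        threshold = 255 * b
        clipped = sum(count for value, count in enumerate(hist) if value > threshold)
        if clipped <= max_clipped_fraction * total:
            return b
    return 1.0  # fall back to full backlight


def compensate(pixels, b):
    """Brighten pixels by 1/b, saturating at 255 (assumes a linear luminance model)."""
    return [min(255, round(p / b)) for p in pixels]
```

For a mostly mid-grey frame (90% of pixels at 100, 10% at 250) and a 10% clipping tolerance, the backlight can be halved: the grey pixels are boosted to 200 and only the bright 10% saturate. Tightening the tolerance to 5% forces the backlight back to full level.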
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2012
Arvind Sridhar; Alessandro Vincenzi; Martino Ruggiero; David Atienza
With the rising challenges of heat removal in integrated circuits (ICs), the development of thermal-aware computing architectures and run-time management systems has become indispensable to the continuation of IC design scaling. These future thermal-aware design technologies strongly depend on the availability of efficient and accurate means for thermal modeling and analysis. Such thermal models must not only be accurate enough to capture the complex mechanisms that regulate thermal diffusion in ICs, but also offer a level of abstraction that allows fast execution for design space exploration. In this paper, we propose an innovative full-chip thermal modeling approach that addresses the scalability problem of transient heat flow simulation in large 2-D/3-D multiprocessor ICs. This is achieved by parallelizing the computation-intensive task of transient temperature tracking using neural networks and exploiting the computational power of massively parallel graphics processing units. Our results show up to 35× run-time speedup compared to state-of-the-art IC thermal simulation tools while keeping the error below 1°C. Speedups scale with the size of the 3-D multiprocessor ICs, making the proposed method a valuable design space exploration tool.