Gianluca Palermo | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Gianluca Palermo is active.

Explore More

Publication

Featured researches published by Gianluca Palermo.

international conference on information technology coding and computing | 2005

AES power attack based on induced cache miss and countermeasure

Guido Bertoni; Vittorio Zaccaria; Luca Breveglieri; Matteo Monchiero; Gianluca Palermo

This paper presents a new attack against a software implementation of the Advanced Encryption Standard. The attack aims at flushing elements of the SBOX from the cache, thus inducing a cache miss during the encryption phase. The power trace is then used to detect when the cache miss occurs; if the miss happens in the first round of the AES then the information can be used to recover part of the secret key. The attack has been simulated using the Wattch simulation framework and a simple software implementation of AES (using a single table for the SBOX). The attack can be easily extended to more sophisticated versions of AES with more than one table. Eventually, we present a simple countermeasure which does not require randomization.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2009

ReSPIR: A Response Surface-Based Pareto Iterative Refinement for Application-Specific Design Space Exploration

Gianluca Palermo; Cristina Silvano; Vittorio Zaccaria

Application-specific multiprocessor systems-on-chip (MPSoCs) are usually designed by using a platform-based approach, where a wide range of customizable parameters can be tuned to find the best tradeoff in terms of the selected figures of merit (such as energy, delay, and area). This optimization phase is called design space exploration (DSE), and it usually consists of a multiobjective optimization problem with multiple constraints. So far, several heuristic techniques have been proposed to address the DSE problem for MPSoC, but they are not efficient enough for managing the application-specific constraints and for identifying the Pareto front. In this paper, an efficient DSE methodology for application-specific MPSoC is proposed. The methodology is efficient in the sense that it is capable of finding a set of good candidate architecture configurations by minimizing the number of simulations to be executed. The methodology combines the design of experiments (DoEs) and response surface modeling (RSM) techniques for managing system-level constraints. First, the DoE phase generates an initial plan of experiments used to create a coarse view of the target design space to be explored by simulations. Then, a set of RSM techniques is used to refine the simulation-based exploration by exploiting the application-specific constraints to identify the maximum number of feasible solutions. To trade off the accuracy and efficiency of the proposed techniques, a set of experimental results for the customization of a symmetric shared-memory on-chip multiprocessor with actual workloads has been reported in this paper.

IEEE Transactions on Computers | 2008

Secure Memory Accesses on Networks-on-Chip

Leandro Fiorin; Gianluca Palermo; Slobodan Lukovic; Valerio Catalano; Cristina Silvano

Security is gaining increasing relevance in the development of embedded devices. Towards a secure system at each level of design, this paper addresses security aspects related to network-on-chip (NoC) architectures, foreseen as the communication infrastructure of next-generation embedded devices. In the context of NoC-based multiprocessor systems, we focus on the topic, not yet thoroughly faced, of data protection. In this paper, we present a secure NoC architecture composed of a set of data protection units (DPUs) implemented within the network interfaces. The run-time configuration of the programmable part of the DPUs is managed by a central unit, the network security manager (NSM). The DPU, similar to a firewall, can check and limit the access rights (none, read, write, or both) of processors accessing data and instructions in a shared memory - in particular distinguishing between the operating roles (supervisor/user and secure/unsecure) of the processing elements. We explore different alternative implementations for the DPU and demonstrate how this unit does not affect the network latency if the memory request has the appropriate rights. We also focus on the dynamic updating of the DPUs to support their utilization in dynamic environments, and on the utilization of authentication techniques to increase the level of security.

power and timing modeling optimization and simulation | 2004

PIRATE: A Framework for Power/Performance Exploration of Network-on-Chip Architectures

Gianluca Palermo; Cristina Silvano

In this paper, we address the problem of high-level exploration of Network-on-Chip (NoC) architectures to early evaluate power/performance trade-offs. The main goal of this work is to propose a methodology supported by a design framework (namely, PIRATE) to generate and to simulate a configurable NoC–IP core for the power/performance exploration of the on-chip interconnection network. The NoC–IP core is composed of a set of parameterized modules, such as interconnection elements and switches, to form different on-chip micro-network topologies. The proposed framework has been applied to explore several network topologies by varying the workload and to analyze a case study designed for cryptographic hardware acceleration in high performance web server systems.

ieee computer society annual symposium on vlsi | 2010

MULTICUBE: Multi-objective Design Space Exploration of Multi-core Architectures

Cristina Silvano; William Fornaciari; Gianluca Palermo; Vittorio Zaccaria; Fabrizio Castro; Marcos Martinez; Sara Bocchio; Roberto Zafalon; Prabhat Avasare; Geert Vanmeerbeeck; Chantal Ykman-Couvreur; Maryse Wouters; Carlos Kavka; Luka Onesti; Alessandro Turco; Umberto Bondi; Giovanni Mariani; Hector Posadas; Eugenio Villar; Chris Wu; Fan Dongrui; Zhang Hao; Tang Shibin

Technology trends enable the integration of many processor cores in a System-on-Chip (SoC). In these complex architectures, several architectural parameters can be tuned to find the best trade-off in terms of multiple metrics such as energy and delay. The main goal of the MULTICUBE project consists of the definition of an automatic Design Space Exploration framework to support the design of next generation many-core architectures.

design, automation, and test in europe | 2010

An industrial design space exploration framework for supporting run-time resource management on multi-core systems

Giovanni Mariani; Prabhat Avasare; Geert Vanmeerbeeck; Chantal Ykman-Couvreur; Gianluca Palermo; Cristina Silvano; Vittorio Zaccaria

Current multi-core design methodologies are facing increasing unpredictability in terms of quality due to the actual diversity of the workloads that characterize the deployment scenario. To this end, these systems expose a set of dynamic parameters which can be tuned at run-time to achieve a specified Quality of Service (QoS) in terms of performance. A run-time manager operating system module is in charge of matching the specified QoS with the available platform resources by manipulating the overall degree of task-level parallelism of each application as well as the frequency of operation of each of the system cores.

international conference on hardware/software codesign and system synthesis | 2008

A security monitoring service for NoCs

Leandro Fiorin; Gianluca Palermo; Cristina Silvano

As computing and communications increasingly pervade our lives, security and protection of sensitive data and systems are emerging as extremely important issues. Networks-on-Chip (NoCs) have appeared as design strategy to cope with the rapid increase in complexity of Multiprocessor Systems-on-Chip (MPSoCs), but only recently research community have addressed security on NoC-based architectures. In this paper, we present a monitoring system for NoC based architectures, whose goal is to help detect security violations carried out against the system. Information collected are sent to a central unit for efficiently counteracting actions performed by attackers. We detail the design of the basic blocks and analyse overhead associated with the ASIC implementation of the monitoring system, discussing type of security threats that it can help detect and counteract.

IEEE Transactions on Very Large Scale Integration Systems | 2006

Efficient Synchronization for Embedded On-Chip Multiprocessors

Matteo Monchiero; Gianluca Palermo; Cristina Silvano; Oreste Villa

This paper investigates optimized synchronization techniques for shared memory on-chip multiprocessors (CMPs) based on network-on-chip (NoC) and targeted at future mobile systems. The proposed solution is based on the idea of locally performing synchronization operations requiring continuous polling of a shared variable, thus, featuring large contentions (e.g., spin locks and barriers). A hardware (HW) module, the synchronization-operation buffer (SB), has been introduced to queue and to manage the requests issued by the processors. By using this mechanism, we propose a spin lock implementation requiring a constant number of network transactions and memory accesses per lock acquisition. The SB also supports an efficient implementation of barriers. Experimental validation has been carried out by using GRAPES, a cycle-accurate performance/power simulation platform for multiprocessor systems-on-chip (MPSoCs). Two different architectures have been explored to prove that the proposed approach is effective independently from caches and coherence schemes adopted. For an eight-processor target architecture, we show that the SB-based solution achieves up to 50% performance improvement and 30% energy saving with respect to synchronization based on the caching of the synchronization variables and directory-based coherence protocol. Furthermore, we prove the scalability of the proposed approach when the number of processors increases

Journal of Systems Architecture | 2007

Exploration of distributed shared memory architectures for NoC-based multiprocessors

Matteo Monchiero; Gianluca Palermo; Cristina Silvano; Oreste Villa

Multiprocessor system-on-chip (MP-SoC) platforms represent an emerging trend for embedded multimedia applications. To enable MP-SoC platforms, scalable communication-centric interconnect fabrics, such as networks-on-chip (NoCs), have been recently proposed. The shared memory represents one of the key elements in designing MP-SoCs to provide data exchange and synchronization support. This paper focuses on the energy/delay exploration of a distributed shared memory architecture, suitable for low-power on-chip multiprocessors based on NoC. A mechanism is proposed for the data allocation on the distributed shared memory space, dynamically managed by an on-chip hardware memory management unit (HwMMU). Moreover, the exploitation of the HwMMU primitives for the migration, replication, and compaction of shared data is discussed. Experimental results show the impact of different distributed shared memory configurations for a selected set of parallel benchmark applications from the power/-performance perspective. Furthermore, a case study for a graph exploration algorithm is discussed, accounting for the effects of the core mapping and the network topology on energy and performance at the system level.

Iet Computers and Digital Techniques | 2011

Linking run-time resource management of embedded multi-core platforms with automated design-time exploration

Chantal Ykman-Couvreur; Prabhat Avasare; Giovanni Mariani; Gianluca Palermo; Cristina Silvano; Vittorio Zaccaria

Nowadays, owing to unpredictable changes of the environment and workload variation, optimally running multiple applications in terms of quality, performance and power consumption on embedded multi-core platforms is a huge challenge. A lightweight run-time manager, linked with an automated design-time exploration and incorporated in the host processor of the platform, is required to dynamically and efficiently configure the applications according to the available platform resources (e.g. processing elements, memories, communication bandwidth), for minimising the cost (e.g. power consumption), while satisfying the constraints (e.g. deadlines). This study presents a flow linking a design-time design space explorer, coupled with platform simulators at two abstraction levels, with a fast and lightweight priority-based heuristic integrated in the run-time manager to select near-optimal application configurations. To illustrate its feasibility and the very low complexity of the run-time selection, the proposed flow is used to manage the processors and clock frequencies of a multiple-stream MPEG4 encoder chip dedicated to automotive cognitive safety applications.

Explore More