Simone Secchi | Researchain

Publication

Featured researches published by Simone Secchi.

IEEE Embedded Systems Letters | 2010

An FPGA-Based Framework for Technology-Aware Prototyping of Multicore Embedded Architectures

Paolo Meloni; Simone Secchi; Luigi Raffo

Using cycle-accurate software simulators as a foundation for exploring all possible full-system hardware-software (hw-sw) configurations no longer appears to be a feasible way to handle the complexity of modern embedded multicore systems. In this letter, a field-programmable gate array (FPGA)-based cycle-accurate hardware emulation framework is presented and proposed as a research accelerator for the exploration of complete multicore systems. The framework makes it possible to extract, from the automatically instantiated hardware-emulated system, a set of metrics for assessing performance and evaluating architectural tradeoffs, as well as estimates of the power and area consumption of a prospective application-specific integrated circuit (ASIC) implementation of the considered architecture.

VLSI Design | 2012

Enabling fast ASIP design space exploration: an FPGA-based runtime reconfigurable prototyper

Paolo Meloni; Sebastiano Pomata; Giuseppe Tuveri; Simone Secchi; Luigi Raffo; Menno Lindwer

Application Specific Instruction-set Processors (ASIPs) expose a large number of degrees of freedom to the designer. Accurate and rapid simulation tools are needed to explore the design space. To this end, FPGA-based emulators have recently been proposed as an alternative to pure software cycle-accurate simulators. However, the advantages of on-hardware emulation are reduced by the overhead of the RTL synthesis process that needs to be run for each configuration to be emulated. The work presented in this paper aims at mitigating this overhead by exploiting a form of software-driven platform runtime reconfiguration. We present a complete emulation toolchain that, given a set of candidate ASIP configurations, identifies and builds an overdimensioned architecture capable of being reconfigured via software at runtime, emulating all the design space points under evaluation. The approach has been validated against two different case studies, a filtering kernel and an M-JPEG encoding kernel. Moreover, the presented emulation toolchain couples FPGA emulation with activity-based physical modeling to extract area and power/energy consumption figures. We show how the adoption of the presented toolchain significantly reduces the design space exploration time, while introducing an overhead lower than 10% for the FPGA resources and lower than 0.5% in terms of operating frequency.
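
The idea of an overdimensioned architecture that covers every candidate configuration can be illustrated with a minimal sketch. It assumes, purely for illustration, that each candidate is described by a mapping from resource name to required count; the resource names and the `overdimensioned` and `covers` helpers are hypothetical and do not come from the paper's toolchain.

```python
from typing import Dict, List

# Hypothetical representation: resource name -> number of instances required.
Config = Dict[str, int]


def overdimensioned(candidates: List[Config]) -> Config:
    """Return the smallest configuration that provides, for every resource,
    at least as many instances as any single candidate needs."""
    merged: Config = {}
    for cfg in candidates:
        for resource, count in cfg.items():
            merged[resource] = max(merged.get(resource, 0), count)
    return merged


def covers(superset: Config, cfg: Config) -> bool:
    """True if `superset` has enough of every resource to emulate `cfg`."""
    return all(superset.get(r, 0) >= n for r, n in cfg.items())


# Illustrative candidates only; the values are made up.
candidates = [
    {"issue_slots": 2, "registers": 32, "multipliers": 1},
    {"issue_slots": 4, "registers": 16},
    {"issue_slots": 3, "registers": 64, "multipliers": 2},
]
superset = overdimensioned(candidates)
```

Under these assumptions, a single synthesis of `superset` would stand in for one synthesis per candidate, with the unused resources disabled in software for each design point.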

Nature Inspired Cooperative Strategies for Optimization | 2008

A surface tension and coalescence model for dynamic distributed resources allocation in Massively Parallel Processors on-Chip

Francesca Palumbo; Danilo Pani; Luigi Raffo; Simone Secchi

Massively Parallel Processors on-Chip present the same problems as their non-monolithic counterparts, exacerbated by limited on-chip resources, and are therefore among the most challenging architectures in the processor architecture domain. In this paper, a novel nature-inspired decentralized algorithm, aimed at defining clusters of processors to be assigned to different threads, is presented and evaluated. Taking inspiration from liquid surface tension and drop coalescence, the proposed solution achieves better performance than other distributed solutions, reducing fragmentation and communication latency within the clusters.

International Conference / Workshop on Embedded Computer Systems: Architectures, Modeling and Simulation | 2008

A Novel Non-exclusive Dual-Mode Architecture for MPSoCs-Oriented Network on Chip Designs

Francesca Palumbo; Simone Secchi; Danilo Pani; Luigi Raffo

Multi-Processor Systems-on-Chip (MPSoCs) are the most recent challenge for VLSI technologies, and Networks on Chip represent a high-performance alternative to traditional bus architectures. In this paper, a novel approach to the design of a dual-mode router, based on the idea of supporting both circuit and packet switching in a non-exclusive way, is presented and evaluated. This feature makes the proposed architecture suitable for MPSoCs that have to deal with heterogeneous traffic characteristics, especially in terms of data size, such as Massively Parallel Processors. Non-exclusivity enables a reduction in packet latency, which in turn implies lower task completion times, and also increases throughput.

IEEE Transactions on Parallel and Distributed Systems | 2012

Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer

Oreste Villa; Antonino Tumeo; Simone Secchi; Joseph B. Manzano

Irregular applications, such as data mining or graph-based computations, show unpredictable memory/network access patterns and control structures. Massively multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2, and XMT, appear to address irregular application requirements better than commodity clusters. However, the research on massively multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures due to issues such as size of the machines, memory footprint, simulation speed, accuracy, and customization. At the same time, Shared Memory MultiProcessors (SMPs) with multicore processors have become an attractive platform to simulate large-scale systems. This paper introduces a cycle-level simulator of the massively multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques implemented to obtain high simulation speed while maintaining high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at runtime and includes a parametric network and memory model that takes into account contention and hot spotting. On a modern 48-core SMP host, the proposed infrastructure simulates a large set of irregular applications 500 to 2,000 times slower than real time when compared to a 128-processor XMT, with an accuracy error under 10 percent. Emulation is only from 25 to 200 times slower than real time. The paper also presents a case study, where the simulation infrastructure is used to identify bottlenecks in the current XMT architecture and to estimate the performance scaling of a possible multicore design with next generation memory and network interconnect.
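
The statement that simulation speed stays constant up to the number of host cores can be illustrated with an idealized model. The sketch below is not the simulator's actual scheduling policy: it assumes each simulated processor occupies one host core and that, beyond the available cores, simulated processors are time-multiplexed in equal rounds. The function name `relative_sim_time` is hypothetical.

```python
import math


def relative_sim_time(simulated_procs: int, host_cores: int) -> int:
    """Idealized wall-clock cost of one simulated step, relative to the
    cost of simulating a single processor on a single host core.

    Assumption (not from the paper): one simulated processor per host core,
    with extra processors handled in additional equal-length rounds.
    """
    if simulated_procs < 1 or host_cores < 1:
        raise ValueError("both counts must be positive")
    return math.ceil(simulated_procs / host_cores)


# Flat cost up to the host core count, stepwise growth beyond it.
costs = [relative_sim_time(p, 48) for p in (16, 48, 49, 128)]
```

In this simplified model the cost is flat for 16 and 48 simulated processors on a 48-core host and grows in steps once the simulated processor count exceeds the host core count.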

Digital Systems Design | 2008

A Network on Chip Architecture for Heterogeneous Traffic Support with Non-Exclusive Dual-Mode Switching

Simone Secchi; Francesca Palumbo; Danilo Pani; Luigi Raffo

With the advent of the multi-core processor era, several design concerns have arisen. Interconnection layer efficiency has gained particular relevance as a crucial issue to be addressed in order to leverage the large amount of on-chip resources that today's VLSI technologies are able to provide. At the same time, as architectural parallelism continues to grow and become more fine-grained, the traffic generated by different multithreaded applications is turning out to be very wide-ranging in terms of size and burstiness. In order to adapt to this large variety of traffic, several models of dual-mode routers have been developed, implementing both packet switching and circuit switching techniques, thus supporting both best-effort and guaranteed-throughput services. This paper introduces an innovative model of non-exclusive dual-mode router, able to combine the aforementioned features in a non-exclusive way (i.e., in parallel inside the network on the same link). This feature makes this NoC architecture well-suited for multi-processor system-on-chip (MPSoC) architectures with a high level of parallelism that have to deal with heterogeneous traffic conditions, such as massively parallel processors (MPPs) and processor arrays (PAs).

IEEE Transactions on Parallel and Distributed Systems | 2017

Exploring Efficient Hardware Support for Applications with Irregular Memory Patterns on Multinode Manycore Architectures

Marco Ceriani; Simone Secchi; Oreste Villa; Antonino Tumeo; Gianluca Palermo

With computing systems becoming ubiquitous, numerous data sets of extremely large size are becoming available for analysis. Often the data collected have complex, graph-based structures, which makes them difficult to process with traditional tools. Moreover, the irregularities in the data sets, and in the analysis algorithms, hamper the scaling of performance in large distributed high-performance systems, optimized for locality exploitation and regular data structures. In this paper we present an approach to system design that enables efficient execution of applications with irregular memory patterns on a distributed, many-core architecture, based on off-the-shelf cores. We introduce a set of hardware and software components, which provide a distributed global address space, fine-grained synchronization and latency hiding of remote accesses with multithreading. An FPGA prototype has been implemented to explore the design with a set of typical irregular kernels. We finally present an analytical model that highlights the benefits of the approach and helps identify the bottlenecks in the prototype. The experimental evaluation on graph-based applications demonstrates the scalability of the architecture for different configurations of the whole system.

Computing Frontiers | 2010

Self organization on a swarm computing fabric: a new way to look at fault tolerance

Danilo Pani; Simone Secchi; Luigi Raffo

Recent studies have demonstrated the possibility of exploiting Swarm Intelligence (SI) as an inspiration for the design of scalable VLSI tiled architectures exhibiting multitasking, adaptability, absence of centralized low-level control and fault tolerance. The SI approach to fault tolerance can, in principle, be regarded as a reconfiguration-free cell-exclusion mechanism. The key elements at the basis of a reconfiguration-free solution are: loose structure of the system, homogeneity, cooperative behaviors and self-organization. In this paper, these self-organization aspects, introduced in a recently developed multi-agent VLSI tiled architecture for array processing that was expressly designed with SI as its inspiration, are presented along with some theoretical and experimental results. The architecture presents two forms of cell exclusion (bypass and block of faulty elements), implementing self-adaptive behaviors rather than reconfiguration to face faults while preserving system functionality. The proposed approach, exploiting indirect communications to spread the workload across the computing fabric, is also successful in reducing the effects of faulty elements without spare resources and with limited performance degradation.

Field-Programmable Custom Computing Machines | 2013

Exploring Manycore Multinode Systems for Irregular Applications with FPGA Prototyping

Marco Ceriani; Gianluca Palermo; Simone Secchi; Antonino Tumeo; Oreste Villa

We propose an intermediate approach between full custom hardware systems and full-software tools. Figure 1 shows the overview of the proposed architecture. We start from an off-the-shelf architecture composed of simple, in-order cores and an on-chip interconnection. The on-chip interconnection interfaces the processing cores with the memory controller for the external memory (DDR3) and the shared I/O peripherals. We add three custom components: the Global Memory Access Scheduler (GMAS), the Global Network Interface (GNI) and the Global SYNChronization module (GSYNC). The GMAS enables support for the scrambled address space. It also implements part of the support for latency tolerance, storing remote memory operations, and acts as a scheduler for lightweight software multithreading.

International Symposium on Performance Analysis of Systems and Software | 2010

Exploiting FPGAs for technology-aware system-level evaluation of multi-core architectures

Simone Secchi; Paolo Meloni; Luigi Raffo

The hardware-software co-development of modern complex MPSoC computing platforms exposes the designer to a huge complexity, resulting from the combination of vastly different architectural possibilities with strict demands posed by the target applications. To handle this complexity, highly accurate but rapid prototyping/evaluation environments need to be developed that can provide an effective measurement of the system under design as early as possible, making it possible to meet current time-to-market requirements. Since software-based fully cycle-accurate simulators no longer seem to be an adequate solution to this issue, attention has recently shifted to the adoption of hardware emulators in the early stages of the design flow. In this work, we present an emulation framework for library-based semiautomatic instantiation of complex multi-core platforms that exploits FPGA devices to provide detailed functional information on the platform under development, while at the same time using hardware execution traces with technology-related analytical models to extract, already at system level, physical metrics on power consumption, maximum operating frequency and area occupation of a prospective ASIC implementation of the system. Two prospective use case scenarios are presented to validate the usefulness of the presented framework: the first one analyzes the mapping and the scalability of a highly parallel application over a 2D homogeneous mesh architecture for an increasing number of processors, while the second one employs the emulation infrastructure inside a design space exploration flow for the configuration of some interconnection network parameters.