Davide Rizzo
STMicroelectronics
Publication
Featured research published by Davide Rizzo.
Compilers, Architecture, and Synthesis for Embedded Systems | 2003
Osvaldo Colavin; Davide Rizzo
Clustered VLIW architectures have been widely adopted in modern embedded multimedia applications for their ability to exploit high degrees of ILP with a reasonable trade-off in complexity and silicon cost. Studies have, however, shown limited performance scaling for wide-issue machines. In this paper we describe the architecture of a clustered VLIW with a runtime-reconfigurable inter-cluster bus designed to address this scalability problem. The architecture targets kernel-loop acceleration through a coprocessor approach and allows the interconnect between neighboring register files to be customized before each loop execution. We have adopted an inter-cluster communication mechanism based on a constant-complexity interconnect. Because its complexity and latency are independent of the number of clusters, scalability in issue width is preserved. To handle the limited connectivity, the interconnection resources in the inter-cluster bus are exposed to the compiler and scheduled like other resources with an adapted version of modulo scheduling. Other relevant features include the capability to define shifting queues in the register files for more effective software pipelining support. Adding a limited amount of reconfigurability to the well-established VLIW programming model results in low-overhead inter-cluster communication and a scalable ILP architecture. Simulation results show that we can achieve near-linear scalability for certain classes of kernel loops.
Design, Automation, and Test in Europe | 2002
Davide Rizzo; Osvaldo Colavin
In this paper, we investigate the benefits of a flexible, application-specific instruction set by adding a run-time Reconfigurable Functional Unit (RFU) to a VLIW processor. Preliminary results on the motion estimation stage in an MPEG4 video encoder are presented. With the RFU modeled at the functional level and under realistic assumptions on execution latency, technology scaling, and reconfiguration penalty, we explore different RFU instructions at fine-grain (instruction-level) and coarse-grain (loop-level) granularity to speed up application execution. The memory bandwidth bottleneck, typical for streaming applications, is alleviated through the combined adoption of custom prefetch pattern instructions and an extent of local memory. Performance evaluations indicate that up to an 8× improvement with loop-level optimizations can be achieved under various architectural assumptions.
Archive | 2003
Osvaldo Colavin; Davide Rizzo
Archive | 2002
Osvaldo Colavin; Davide Rizzo
Archive | 2004
Osvaldo Colavin; Davide Rizzo; Vineet Soni
Archive | 2006
Osvaldo Colavin; Davide Rizzo
Archive | 2002
Davide Rizzo; Osvaldo Colavin
Archive | 2002
Davide Rizzo; Osvaldo Colavin
Embedded Systems for Real-Time Multimedia | 2003
Davide Rizzo; Osvaldo Colavin; Shiva Navab
Archive | 2006
Osvaldo Colavin; Davide Rizzo