Carlo Galuzzi | Researchain

Publication

Featured research published by Carlo Galuzzi.

ACM Transactions on Reconfigurable Technology and Systems | 2011

The Instruction-Set Extension Problem: A Survey

Carlo Galuzzi; Koen Bertels

Extending a given instruction-set with specialized instructions has become a common technique for speeding up the execution of applications. By identifying computationally intensive portions of an application and partitioning them into segments of code to execute in software and segments of code to execute in hardware, the execution of an application can be sped up considerably. Each segment of code implemented in hardware can then be seen as a specialized application-specific instruction extending a given instruction-set. Although a number of approaches in the literature propose different methodologies to customize an instruction-set, the description of the problem consists only of sporadic comparisons limited to isolated problems. This survey presents a unique, detailed description of the problem and provides an exhaustive overview of the research of the past years in instruction-set extension. This article presents a thorough analysis of the issues involved in customizing an instruction-set by means of a set of specialized application-specific instructions. The investigation of the problem covers both instruction generation and instruction selection, and different kinds of customizations are analyzed in great detail.

International Conference on Hardware/Software Codesign and System Synthesis | 2006

Automatic selection of application-specific instruction-set extensions

Carlo Galuzzi; Elena Moscu Panainte; Yana Yankova; Koen Bertels; Stamatis Vassiliadis

In this paper, we present a general and efficient algorithm for the automatic selection of new application-specific instructions under hardware resource constraints. The instruction selection is formulated as an ILP problem, and efficient solvers can be used to find the optimal solution. An important feature of our algorithm is that it is not restricted to basic-block level, nor does it impose any limitation on the number of newly added instructions or on the number of inputs/outputs of these instructions. The presented results show that a significant overall application speedup is achieved even for large kernels (for the ADPCM decoder the speedup ranges from ×1.2 to ×3.7) and that our algorithm compares well with other state-of-the-art algorithms for automatic instruction-set extensions.

Applied Reconfigurable Computing | 2010

QUAD: a memory access pattern analyser

S. Arash Ostadzadeh; Roel Meeuws; Carlo Galuzzi; Koen Bertels

In this paper, we present the Quantitative Usage Analysis of Data (QUAD) tool, a sophisticated memory access tracing tool that provides a comprehensive quantitative analysis of the memory access patterns of an application, with the primary goal of detecting actual data dependencies at function level. As improvements in processing performance continue to outpace improvements in memory performance, tools for understanding memory access behavior are vital for optimizing the execution of data-intensive applications on heterogeneous architectures. The tool, the first of its kind, is described in detail, and its benefits and qualities are demonstrated on a real case study, the x264 benchmarking application.

International Journal of Electronics | 2008

A linear complexity algorithm for the automatic generation of convex multiple input multiple output instructions

Carlo Galuzzi; Koen Bertels; Stamatis Vassiliadis

The instruction-set extension problem has been one of the major topics in recent years; it consists of the addition of a set of new complex instructions to a given instruction-set. In its general formulation, this problem requires an exhaustive search of the design space to identify the candidate instructions, which gives the solution exponential complexity. In this paper we propose an efficient linear complexity algorithm for the automatic generation of convex multiple input multiple output instructions, whose convexity is theoretically guaranteed. The proposed approach is not restricted to basic-block level and does not impose limitations on the number of inputs or outputs, or on the number of new instructions generated. Given the linear complexity of the proposed solution, our results show a significant overall application speedup (up to ×2.9 for the ADPCM decoder), which compares well with other state-of-the-art algorithms for automatic instruction-set extensions.
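Convexity is the property that makes a cluster of operations schedulable as one atomic instruction: no data path may leave the cluster and later re-enter it. The sketch below checks that property for a set of nodes in a dataflow DAG. It is an illustration of the standard definition only, not the generation algorithm of the paper; the graph representation (a dictionary of successor lists) and the node names are assumptions.

```python
def is_convex(succ, cluster):
    """Return True if no path leaves `cluster` and re-enters it.

    succ: dict mapping each node to a list of its successors in a DAG (assumed format).
    cluster: iterable of nodes forming the candidate instruction.
    """
    cluster = set(cluster)
    # Start from every node outside the cluster that is fed directly by the cluster.
    stack = [s for n in cluster for s in succ.get(n, ()) if s not in cluster]
    seen = set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for s in succ.get(node, ()):
            if s in cluster:
                # A path went cluster -> outside -> cluster: not convex.
                return False
            stack.append(s)
    return True


# Hypothetical diamond-shaped dataflow graph: a feeds b and c, both feed d.
DIAMOND = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

if __name__ == "__main__":
    print(is_convex(DIAMOND, {"a", "b", "c"}))  # True
    print(is_convex(DIAMOND, {"a", "b", "d"}))  # False: a -> c -> d re-enters
```

Each outside node is visited at most once, so a single check runs in time linear in the size of the graph.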

International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2007

A linear complexity algorithm for the generation of multiple input single output instructions of variable size

Carlo Galuzzi; Koen Bertels; Stamatis Vassiliadis

The instruction-set extension problem has been one of the major topics in recent years; it consists of the addition of a set of new complex instructions to a given instruction-set. In its general formulation, this problem requires an exhaustive search of the design space to identify the candidate instructions, which gives the solution exponential complexity. In this paper we propose an algorithm for the generation of multiple input single output (MISO) instructions of variable size, which can be directly selected or combined for instruction-set extension. Additionally, the algorithm is suitable for inclusion in a design flow for the automatic generation of MIMO instructions. The proposed algorithm is not restricted to basic-block level, and its complexity is linear in the number of processed elements.
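The abstract does not describe how the MISO clusters are built. One common way to obtain a single-output cluster, shown here purely as an assumed illustration, is to grow backwards from a chosen output node and absorb a predecessor only when all of its consumers are already inside the cluster, so that no intermediate value escapes. The function name, graph format, and node names are hypothetical.

```python
def grow_miso(succ, root):
    """Grow a single-output cluster backwards from `root` in a dataflow DAG.

    succ: dict mapping each node to a list of its successors (assumed format).
    A predecessor joins the cluster only if every one of its successors is
    already in the cluster, so `root` remains the only output.
    """
    # Build the predecessor map from the successor lists.
    pred = {}
    for node, targets in succ.items():
        for t in targets:
            pred.setdefault(t, []).append(node)

    cluster = {root}
    worklist = list(pred.get(root, ()))
    while worklist:
        node = worklist.pop()
        if node in cluster:
            continue
        if all(s in cluster for s in succ.get(node, ())):
            cluster.add(node)
            # Newly absorbed node may make its own predecessors eligible.
            worklist.extend(pred.get(node, ()))
    return cluster


# Hypothetical graph: d also feeds f, so it cannot be absorbed into e's cluster.
GRAPH = {"a": ["c"], "b": ["c"], "c": ["e"], "d": ["e", "f"], "e": [], "f": []}

if __name__ == "__main__":
    print(sorted(grow_miso(GRAPH, "e")))  # ['a', 'b', 'c', 'e']
```

Every node is added at most once and each edge is examined a bounded number of times, so the growth step is linear in the size of the graph, in line with the complexity the abstract reports for its own algorithm.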

International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2011

High level quantitative hardware prediction modeling using statistical methods

Roel Meeuws; Carlo Galuzzi; Koen Bertels

With the increasing proliferation of heterogeneous and reconfigurable computing, it has become essential to have efficient prediction models to drive early HW-SW partitioning and co-design. In this paper, we present a high-level quantitative prediction modeling approach that accurately models the relation between hardware and software metrics, based on several statistical techniques. The proposed approach generates models that predict hardware performance indicators for reconfigurable components, such as the number of slices, the number of flip-flops, and the number of wires. It utilizes automatic model selection, artificial neural networks, (logistic) regression, and data transformations. These models take a high-level language description as input, enabling hardware prediction in the early design stages. We calibrate the models for two sets of tools targeting Xilinx and Altera FPGAs, where we report, for example, an error of 14% for the number of multipliers in the case of Xilinx and an error of only 18% for the number of wires in the case of Altera. To provide a realistic evaluation, we validate the approach using 181 kernels, in contrast to the majority of existing techniques, which use libraries of tens of kernels at most.

Design, Automation, and Test in Europe | 2009

Algorithms for the automatic extension of an instruction-set

Carlo Galuzzi; Dimitris Theodoropoulos; Roel Meeuws; Koen Bertels

In this paper, two general algorithms for the automatic generation of instruction-set extensions are presented. The basic instruction set of a reconfigurable architecture is specialized with new application-specific instructions. The paper proposes two methods for the generation of convex multiple input multiple output instructions, under hardware resource constraints, based on a two-step clustering process. Initially, the application is partitioned into single-output instructions of variable size; then, selected clusters are combined into convex multiple output clusters following different policies. Our results on well-known kernels show that the extended instruction set allows applications to execute more efficiently and in fewer cycles. A significant overall application speedup is achieved even for large kernels (for the ADPCM decoder the speedup is up to ×2.2 and for the TWOFISH encoder the speedup is up to ×5.5).

International Conference on Parallel Processing | 2010

tQUAD - Memory Bandwidth Usage Analysis

S. Arash Ostadzadeh; Marco Corina; Carlo Galuzzi; Koen Bertels

One of the main issues in heterogeneous reconfigurable computing is the well-known processor/memory bottleneck. Because memory bandwidth is limited, the execution performance of an application can increase dramatically through efficient use of the memory. In this paper, we present tQUAD, a new tool for memory bandwidth usage analysis. This tool is capable of delivering detailed temporal memory bandwidth usage information for the functions in an application through a comprehensive analysis of the memory access patterns of individual functions. This tool, the first of its kind, provides an accurate analysis of the task execution and memory bandwidth usage, which in the end leads to a sophisticated partitioning of the tasks into different phases during the execution span of an application. Together with an accurate description of the tool, the paper presents a real case study from the multimedia domain to detail all features of the proposed tool.

Applied Reconfigurable Computing | 2008

A Framework for the Automatic Generation of Instruction-Set Extensions for Reconfigurable Architectures

Carlo Galuzzi; Koen Bertels

In this paper we present a framework for the automatic identification and selection of convex MIMO instruction-set extensions for reconfigurable architectures. The framework partitions the analysis of the problem into phases of different computational complexity, and it generates instruction-set extensions of different granularity. The framework is retargetable, and additional clustering policies can be added with only small modifications to the design.

Field-Programmable Technology | 2007

The Spiral Search: A Linear Complexity Algorithm for the Generation of Convex MIMO Instruction-Set Extensions

Carlo Galuzzi; Koen Bertels; Stamatis Vassiliadis

The instruction-set extension problem has been one of the major topics in the last decade; it consists of the addition of a set of new complex instructions to a given instruction-set. In its general formulation, this problem requires an exhaustive search of the design space to identify the candidate instructions. A tradeoff between complexity and quality of the solution can be achieved by limiting this search to implementable instructions. In this paper we propose a linear complexity algorithm for the generation of convex multiple input multiple output (MIMO) instructions of variable size based on the notion of a spiral. Convex implementable MIMO clusters of instructions are identified by means of a spiral search through the levels of a graph. These new instructions can be directly selected or combined for more complex instruction-set extensions. An important feature of our algorithm is that it is not restricted to basic-block level, nor does it impose any limitation on the number of new instructions or on the number of inputs/outputs of these instructions.
