José Gabriel F. Coutinho

Publication

Featured research published by José Gabriel F. Coutinho.

aspect-oriented software development | 2012

LARA: an aspect-oriented programming language for embedded systems

João M. P. Cardoso; Tiago Carvalho; José Gabriel F. Coutinho; Wayne Luk; Ricardo Nobre; Pedro C. Diniz; Zlatko Petrov

The development of applications for high-performance embedded systems is typically a long and error-prone process. In addition to the required functions, developers must consider various and often conflicting non-functional application requirements such as performance and energy efficiency. The complexity of this process is exacerbated by the multitude of target architectures and the associated retargetable mapping tools. This paper introduces an Aspect-Oriented Programming (AOP) approach that conveys domain knowledge and non-functional requirements to optimizers and mapping tools. We describe a novel AOP language, LARA, which allows the specification of compilation strategies to enable efficient generation of software code and hardware cores for alternative target architectures. We illustrate the use of LARA for code instrumentation and analysis, and for guiding the application of compiler and hardware synthesis optimizations. An important LARA feature is its capability to deal with different join points, action models, and attributes, and to generate an aspect intermediate representation. We present examples of our aspect-oriented hardware/software design flow for mapping real-life application codes to embedded platforms based on Field Programmable Gate Array (FPGA) technology.
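
LARA has its own syntax, which the abstract does not show and which is not reproduced here. As a loose analogy only, the Python sketch below illustrates the aspect-oriented idea the abstract relies on: a pointcut selects join points (here, functions whose names match a pattern) and an advice (here, call counting for instrumentation) is woven around them without editing the functions themselves. The helpers `weave` and `count_calls` and the kernel names are hypothetical.

```python
import fnmatch
import functools
from collections import Counter
from typing import Callable, Dict

# Loose Python analogy of aspect weaving; this is not LARA syntax.
call_counts: Counter = Counter()

def count_calls(name: str, fn: Callable) -> Callable:
    """Advice: wrap fn so that each call increments call_counts[name]."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[name] += 1
        return fn(*args, **kwargs)
    return wrapper

def weave(functions: Dict[str, Callable], pointcut: str,
          advice: Callable[[str, Callable], Callable]) -> Dict[str, Callable]:
    """Apply advice to every join point (function) whose name matches pointcut."""
    return {name: advice(name, fn) if fnmatch.fnmatchcase(name, pointcut) else fn
            for name, fn in functions.items()}

# Hypothetical "application" functions standing in for source code.
def fir_kernel(xs):
    return sum(xs)

def dct_kernel(xs):
    return [2 * x for x in xs]

def main_loop(xs):
    return len(xs)

program = {"fir_kernel": fir_kernel, "dct_kernel": dct_kernel, "main_loop": main_loop}
woven = weave(program, "*_kernel", count_calls)  # instrument only the kernels

for _ in range(3):
    woven["fir_kernel"]([1, 2, 3])
woven["dct_kernel"]([1])
woven["main_loop"]([1, 2])

print(dict(call_counts))  # {'fir_kernel': 3, 'dct_kernel': 1}
```

The original functions stay untouched; the instrumentation concern lives entirely in the advice and the pointcut pattern, which is the separation of concerns that AOP aims for.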

field-programmable logic and applications | 2007

Automatic Accuracy-Guaranteed Bit-Width Optimization for Fixed and Floating-Point Systems

William George Osborne; Ray C. C. Cheung; José Gabriel F. Coutinho; Wayne Luk; Oskar Mencer

In this paper we present Minibit+, an approach that optimizes the bit-widths of fixed-point and floating-point designs while guaranteeing accuracy. Our approach adopts different levels of analysis, giving the designer the opportunity to terminate it at any stage to obtain a result. Range analysis combines affine and interval arithmetic to reduce the number of bits. Precision analysis involves a coarse-grain and a fine-grain stage. The best representation for each number, fixed-point or floating-point, is then chosen based on its range, precision and latency. Three case studies are used: discrete cosine transform, B-Splines and RGB to YCbCr color conversion. Our analysis can run over 200 times faster than current approaches to this problem while producing more accurate results, on average within 2-3% of an exhaustive search.
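
The range-analysis step can be illustrated with plain interval arithmetic. The sketch below propagates value intervals through the luma part of an RGB to YCbCr conversion and derives how many integer bits a two's-complement fixed-point signal needs to hold that range. It models interval arithmetic only; the affine-arithmetic component, the precision analysis, and the fixed/floating-point choice of Minibit+ are not modelled, and `Interval`, `const`, and `integer_bits` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] bounding every value a signal can take."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other: "Interval") -> "Interval":
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other: "Interval") -> "Interval":
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))

def const(c: float) -> Interval:
    """A constant is a degenerate interval."""
    return Interval(c, c)

def integer_bits(r: Interval) -> int:
    """Integer bits (sign bit included) for a two's-complement signal covering r.

    With I integer bits the representable range is [-2**(I-1), 2**(I-1)).
    This conservative rule returns the smallest I >= 1 with 2**(I-1) > max|r|.
    """
    magnitude = max(abs(r.lo), abs(r.hi))
    if magnitude == 0:
        return 1
    return max(1, math.floor(math.log2(magnitude)) + 2)

# Luma of RGB to YCbCr conversion, with 8-bit inputs in [0, 255].
pixel = Interval(0, 255)
y = const(0.299) * pixel + const(0.587) * pixel + const(0.114) * pixel
print(y, integer_bits(y))  # hi is about 255, so 9 integer bits (signed)

# Interval arithmetic ignores correlation between operands: x - x is not [0, 0].
print(pixel - pixel)  # Interval(lo=-255, hi=255)
```

The second print shows the dependency problem of plain interval arithmetic, which is one reason affine arithmetic, since it tracks correlations between operands, is often combined with it to obtain tighter ranges.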

ieee international soc conference | 2009

A high-level compilation toolchain for heterogeneous systems

Wayne Luk; José Gabriel F. Coutinho; Tim Todman; Yuet Ming Lam; William George Osborne; Kong Woei Susanto; Qiang Liu; W. S. Wong

This paper describes Harmonic, a toolchain that compiles a high-level C program for multiprocessor heterogeneous systems comprising different types of processing elements, such as general-purpose processors (GPPs), digital signal processors (DSPs), and field-programmable gate arrays (FPGAs). The main goal of Harmonic is to improve an application by partitioning and optimising each part of the program, and by selecting the most appropriate processing element in the system to execute each part. The core tools include a task transformation engine, a mapping selector, a data representation optimiser, and a hardware synthesiser. We also use the C language with source annotations as the intermediate representation for the toolchain, making it easier for users to understand and control the compilation process.

field-programmable custom computing machines | 2005

Interleaving behavioral and cycle-accurate descriptions for reconfigurable hardware compilation

José Gabriel F. Coutinho; Jun Jiang; Wayne Luk

This paper describes Haydn, a hardware compilation approach that aims to combine the benefits of cycle-accurate descriptions, such as ease of control and performance, with the rapid development and design exploration facilities of behavioral synthesis tools. Our approach supports two main features: deriving architectures that meet performance goals involving metrics such as resource usage and execution time, and inferring design behavior by generating, from scheduled designs such as pipeline architectures, behavioral code that is easy to verify and modify. We report four recent developments that significantly enhance the Haydn approach: (a) a design methodology that supports both cycle-accurate and behavioral levels, in which developers can move from one level to the other; (b) an extended scheduling algorithm which supports operation chaining, pipelined resources (with different latencies and initiation intervals), a forwarding technique for loop-carried dependencies, and resource sharing and control; (c) a hardware design flow that can be customized with a script language and extended simulation capabilities for the RC2000 board; and (d) an evaluation of our approach using various case studies, including 3D free-form deformation (FFD), Gouraud shading, Fibonacci series, Montgomery multiplication, and one-dimensional DCT. For instance, our approach has been used to produce various FFD designs in hardware automatically; the smallest, at 137 MHz, is 294 times faster than software on a dual AMD MP2600+ processor machine at 2.1 GHz, and is 2.7 times smaller and 10% slower than the fastest design at 153 MHz.
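
Haydn's extended scheduler is not reproduced here. As a rough illustration of the resource-sharing part of the problem only, the sketch below is a standard resource-constrained list scheduler: each cycle it issues ready operations, in a fixed priority order, onto a limited number of functional units, where a unit can accept a new operation every `ii` cycles (its initiation interval). Operation chaining and loop-carried forwarding are not modelled, and the unit parameters and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Unit:
    count: int    # number of identical functional units of this type
    latency: int  # cycles until an operation's result is available
    ii: int       # initiation interval: cycles before a unit accepts a new op

def list_schedule(ops: Dict[str, str], deps: Dict[str, List[str]],
                  units: Dict[str, Unit]) -> Dict[str, int]:
    """Return the start cycle of each operation.

    ops maps operation -> unit type, deps maps operation -> predecessors.
    The insertion order of ops serves as the priority list.
    """
    start: Dict[str, int] = {}
    free_at = {kind: [0] * u.count for kind, u in units.items()}
    # Safe bound for an acyclic graph: no schedule needs more cycles than
    # the sum of every operation's latency and initiation interval.
    horizon = sum(units[k].latency + units[k].ii for k in ops.values())
    cycle = 0
    while len(start) < len(ops):
        if cycle > horizon:
            raise ValueError("cyclic or dangling dependencies")
        for op, kind in ops.items():
            if op in start:
                continue
            ready = all(p in start and start[p] + units[ops[p]].latency <= cycle
                        for p in deps.get(op, []))
            if not ready:
                continue
            slots = free_at[kind]
            for i, t in enumerate(slots):
                if t <= cycle:  # a unit of this type can accept an op now
                    slots[i] = cycle + units[kind].ii
                    start[op] = cycle
                    break
        cycle += 1
    return start

# (a * b) + (c * d) with one multiplier (latency 2) and one adder (latency 1).
ops = {"m1": "mul", "m2": "mul", "add": "add"}
deps = {"add": ["m1", "m2"]}
pipelined = {"mul": Unit(1, 2, 1), "add": Unit(1, 1, 1)}
non_pipelined = {"mul": Unit(1, 2, 2), "add": Unit(1, 1, 1)}
print(list_schedule(ops, deps, pipelined))      # {'m1': 0, 'm2': 1, 'add': 3}
print(list_schedule(ops, deps, non_pipelined))  # {'m1': 0, 'm2': 2, 'add': 4}
```

In this toy run, sharing a single pipelined multiplier lets the addition start one cycle earlier than sharing a non-pipelined one, the kind of latency/initiation-interval trade-off a scheduler has to account for.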

field-programmable logic and applications | 2009

Optimising designs by combining model-based and pattern-based transformations

Qiang Liu; Tim Todman; José Gabriel F. Coutinho; Wayne Luk; George A. Constantinides

We present a methodology for optimising designs written in high-level descriptions, combining mathematical model-based transformations with syntax-driven pattern-matching transformations, showing how the two kinds of transformation can benefit each other. We evaluate this methodology by implementing an instance, combining a model-based transformation for data reuse with pattern-based transformations to improve its output. Results for three benchmarks show the implemented framework can improve system performance by up to 57 times.

southern conference programmable logic | 2008

Integrated Hardware/Software Codesign for Heterogeneous Computing Systems

Yuet Ming Lam; José Gabriel F. Coutinho; Wayne Luk

This paper describes a strategy that integrates the task mapping and task scheduling steps for heuristic search techniques, using multiple neighbourhood functions to reduce search time and enhance solution quality when developing heterogeneous computing systems. For case studies involving 40 randomly generated task graphs and the fast Fourier transform, experimental results show that our approach outperforms previous approaches by up to 93 times in search time and by up to 22.6% in solution quality, for a system with a microprocessor, a floating-point digital signal processor, and an FPGA.
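
The abstract does not spell out the search algorithm, so the following is only a generic sketch under stated assumptions: a hill-climbing local search over task-to-processor mappings that draws from two neighbourhood functions (move one task, swap two tasks) and scores each candidate with a simple list schedule that charges a delay for inter-processor communication. The task graph, execution times, the one-cycle communication delay, and all function names are hypothetical; the paper's actual heuristics and its integrated scheduling step are not reproduced.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical task graph: execution time of each task on each processing element.
EXEC: Dict[str, Dict[str, int]] = {
    "t0": {"cpu": 4, "dsp": 2, "fpga": 1},
    "t1": {"cpu": 6, "dsp": 3, "fpga": 2},
    "t2": {"cpu": 5, "dsp": 4, "fpga": 2},
    "t3": {"cpu": 3, "dsp": 2, "fpga": 4},
}
PREDS: Dict[str, List[str]] = {"t0": [], "t1": ["t0"], "t2": ["t0"], "t3": ["t1", "t2"]}
ORDER = ["t0", "t1", "t2", "t3"]  # a topological order of the graph
COMM = 1                          # assumed delay when producer and consumer differ in PE
PES = ["cpu", "dsp", "fpga"]

def makespan(mapping: Dict[str, str]) -> int:
    """Schedule tasks in topological order on their mapped PEs; return finish time."""
    pe_free = {pe: 0 for pe in PES}
    finish: Dict[str, int] = {}
    for t in ORDER:
        pe = mapping[t]
        ready = max((finish[p] + (COMM if mapping[p] != pe else 0) for p in PREDS[t]),
                    default=0)
        start = max(ready, pe_free[pe])
        finish[t] = start + EXEC[t][pe]
        pe_free[pe] = finish[t]
    return max(finish.values())

def move(mapping: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Neighbourhood 1: reassign one task to a different PE."""
    new = dict(mapping)
    t = rng.choice(ORDER)
    new[t] = rng.choice([pe for pe in PES if pe != mapping[t]])
    return new

def swap(mapping: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Neighbourhood 2: exchange the PEs of two tasks."""
    new = dict(mapping)
    a, b = rng.sample(ORDER, 2)
    new[a], new[b] = new[b], new[a]
    return new

def local_search(iterations: int = 500, seed: int = 0) -> Tuple[Dict[str, str], int]:
    rng = random.Random(seed)
    best = {t: "cpu" for t in ORDER}  # start with everything in software
    best_cost = makespan(best)
    for _ in range(iterations):
        neighbourhood = rng.choice([move, swap])  # pick a neighbourhood function
        cand = neighbourhood(best, rng)
        cost = makespan(cand)
        if cost <= best_cost:  # accept equal-cost moves to cross plateaus
            best, best_cost = cand, cost
    return best, best_cost

print(makespan({t: "cpu" for t in ORDER}))  # 18
mapping, cost = local_search()
print(mapping, cost)
```

Using more than one neighbourhood lets the search take both small steps (one task moves) and larger ones (two tasks exchange places). For comparison, the hand-picked mapping `{"t0": "fpga", "t1": "dsp", "t2": "fpga", "t3": "dsp"}` has a makespan of 7 under this model.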

field-programmable technology | 2007

Instrumented Multi-Stage Word-Length Optimization

William George Osborne; José Gabriel F. Coutinho; Ray C. C. Cheung; Wayne Luk; Oskar Mencer

In this paper we present a tool, LengthFinder, for optimizing word-lengths of hardware designs with fixed-point arithmetic based on analytical error models that guarantee accuracy. LengthFinder adopts a multi-stage approach, with four novel features. First, the code analysis stage selects loops to instrument, such that information about the number of iterations can be extracted to generate more accurate results. Second, aggressive heuristics are used to produce non-uniform word-lengths rapidly while meeting requirements from the guaranteed error functions. Third, a method capable of reducing the search space has been developed for data-partitioning with a variable word-length reduction. Fourth, a genetic algorithm with selective-crossover and high mutation probability is applied to obtain near-optimal results. The benefits of LengthFinder are illustrated with various case studies. We show that LengthFinder can run over 200 times faster than previous techniques (Lee et al., 2006), while producing more accurate results, relative to values obtained from integer linear programming.
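
LengthFinder's heuristics, loop instrumentation, and genetic algorithm are not described here in enough detail to reproduce. The Python sketch below only illustrates the general idea of greedily shortening word-lengths against a guaranteed analytical error bound, for the hypothetical case of a weighted sum y = sum of c_i * x_i whose inputs are rounded to nearest; products and the output are assumed exact. The function names and coefficients are illustrative.

```python
from typing import Dict

def error_bound(coeffs: Dict[str, float], frac_bits: Dict[str, int]) -> float:
    """Worst-case error of y = sum(c * x) when each input x is rounded.

    Round-to-nearest with f fractional bits adds at most 2**-(f + 1) error,
    which the coefficient scales; products and y itself are assumed exact.
    """
    return sum(abs(c) * 2.0 ** -(frac_bits[v] + 1) for v, c in coeffs.items())

def greedy_word_lengths(coeffs: Dict[str, float], max_error: float,
                        start_bits: int = 24) -> Dict[str, int]:
    """Drop one fractional bit at a time while the error bound still holds."""
    bits = {v: start_bits for v in coeffs}
    improved = True
    while improved:
        improved = False
        # Removing a bit from v doubles its error term, i.e. adds |c| * 2**-(f+1);
        # try the variables whose reduction costs the least error first.
        for v in sorted(coeffs, key=lambda name: abs(coeffs[name]) * 2.0 ** -(bits[name] + 1)):
            if bits[v] == 0:
                continue
            bits[v] -= 1
            if error_bound(coeffs, bits) <= max_error:
                improved = True
                break
            bits[v] += 1  # undo: this reduction would violate the bound
    return bits

coeffs = {"a": 1.0, "b": 0.25, "c": 0.0625}  # hypothetical filter taps
bits = greedy_word_lengths(coeffs, max_error=2.0 ** -8)
print(bits)                       # {'a': 8, 'b': 7, 'c': 5}
print(error_bound(coeffs, bits))  # 0.00390625
```

Because larger coefficients amplify quantization error, the greedy pass keeps more fractional bits for `a` than for `c`, yielding non-uniform word-lengths while the analytical bound still guarantees the accuracy target.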

field-programmable logic and applications | 2008

Mapping and scheduling with task clustering for heterogeneous computing systems

Yuet Ming Lam; José Gabriel F. Coutinho; Wayne Luk; Philip Heng Wai Leong

This paper presents a new approach for mapping task graphs to heterogeneous hardware/software computing systems using heuristic search techniques. Two techniques are proposed to enhance the quality of mapping/scheduling solutions: (1) integrating clustering, mapping, and scheduling in a single step, and (2) a multiple-neighborhood-function strategy. Our approach is demonstrated by case studies involving 40 randomly generated task graphs, as well as four real applications including signal processing and pattern recognition. Experimental results show that the proposed integrated approach outperforms a separate approach in terms of mapping/scheduling solution quality by up to 18.3% for a heterogeneous system which includes a microprocessor, a floating-point digital signal processor, and an FPGA.

field-programmable technology | 2003

Source-directed transformations for hardware compilation

José Gabriel F. Coutinho; Wayne Luk

This paper presents the Haydn-C language and its parallel programming model. They have been developed to support modular hardware design, to improve designer productivity, and to enhance design quality and maintainability. The principal innovation of Haydn-C is a framework of optional annotations to enable users to describe design constraints, and to direct source-level transformations such as scheduling and resource allocation. We have automated such transformations so that a single high-level design can be used to produce many implementations with different design trade-offs. The effectiveness of this approach has been evaluated using various case studies, including FIR filters, fractal generators, and morphological operators. For instance, the fastest morphological erosion design is 129 times faster and 3.4 times larger than the smallest design.

compiler construction | 2015

Protocols by Default

Nicholas Ng; José Gabriel F. Coutinho; Nobuko Yoshida

This paper presents a code generation framework for type-safe and deadlock-free Message Passing Interface (MPI) programs. The code generation process starts with the definition of the global topology using a protocol specification language based on parameterised multiparty session types (MPST). An MPI parallel program backbone is automatically generated from the global specification. The backbone code can then be merged with the sequential code describing the application behaviour, resulting in a complete MPI program. This merging process is fully automated through the use of an aspect-oriented compilation approach. In this way, programmers only need to supply the intended communication protocol and provide sequential code to automatically obtain parallelised programs that are guaranteed free from communication mismatches, type errors and deadlocks. The code generation framework also integrates an optimisation method that overlaps communication and computation, and can derive not only representative parallel programs with common parallel patterns (such as ring and stencil), but also distributed applications from any MPST protocol. We show that our tool generates efficient and scalable MPI applications and improves programmer productivity. For instance, our benchmarks involving representative parallel and application-specific patterns speed up sequential execution by up to 31 times and reduce programming effort by an average of 39%.
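
The session-type machinery and the MPI code generation are not reproduced here. The Python sketch below only illustrates why deriving local code from a global protocol helps: a parameterised ring protocol is projected onto each rank, and a small simulator with synchronous (rendezvous) sends, as with MPI's `MPI_Ssend`, shows that the projected send/receive order completes, whereas a naive hand-written ring in which every rank sends first deadlocks. `ring`, `project`, and `runs_to_completion` are hypothetical helpers, and projection is simplified here to keeping each rank's actions in global-protocol order.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, int, str]  # ("send" or "recv", peer rank, message label)

def ring(n: int, label: str = "data") -> List[Tuple[int, int, str]]:
    """Global protocol for a ring: process i sends to process (i + 1) mod n."""
    return [(i, (i + 1) % n, label) for i in range(n)]

def project(protocol: List[Tuple[int, int, str]], n: int) -> Dict[int, List[Action]]:
    """Each rank's local actions, kept in the order of the global protocol."""
    local: Dict[int, List[Action]] = {r: [] for r in range(n)}
    for src, dst, label in protocol:
        local[src].append(("send", dst, label))
        local[dst].append(("recv", src, label))
    return local

def runs_to_completion(local: Dict[int, List[Action]]) -> bool:
    """Simulate rendezvous (synchronous) send/recv; False means deadlock."""
    pc = {r: 0 for r in local}  # index of each rank's next action
    progress = True
    while progress:
        progress = False
        for r, acts in local.items():
            if pc[r] == len(acts) or acts[pc[r]][0] != "send":
                continue
            _, peer, label = acts[pc[r]]
            peer_acts = local[peer]
            if pc[peer] < len(peer_acts) and peer_acts[pc[peer]] == ("recv", r, label):
                pc[r] += 1     # the send and its matching receive
                pc[peer] += 1  # complete together
                progress = True
    return all(pc[r] == len(acts) for r, acts in local.items())

n = 4
print(runs_to_completion(project(ring(n), n)))  # True

# Naive hand-written ring: every rank sends first, then receives.
naive = {r: [("send", (r + 1) % n, "data"), ("recv", (r - 1) % n, "data")] for r in range(n)}
print(runs_to_completion(naive))  # False
```

In the projected version rank 0 sends first while the other ranks start by receiving, so the exchange proceeds as a chain 0, 1, 2, 3 and back to 0 instead of every rank blocking on its send at once.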