Guido Araujo | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Guido Araujo is active.

Explore More

Publication

Featured researches published by Guido Araujo.

International Journal of Parallel Programming | 2005

The ArchC architecture description language and tools

Rodolfo Azevedo; Sandro Rigo; Marcus Bartholomeu; Guido Araujo; Cristiano C. de Araujo; Edna Barros

This paper presents an architecture description language (ADL) called ArchC, which is an open-source SystemC-based language that is specialized for processor architecture description. Its main goal is to provide enough information, at the right level of abstraction, in order to allow users to explore and verify new architectures, by automatically generating software tools like simulators and co-verification interfaces. ArchC’s key features are a storage-based co-verification mechanism that automatically checks the consistency of a refined ArchC model against a reference (functional) description, memory hierarchy modeling capability, the possibility of integration with other SystemC IPs and the automatic generation of high-level SystemC simulators and assemblers. We have used ArchC to synthesize both functional and cycle-based simulators for the MIPS and Intel 8051 processors, as well as functional models of architectures like SPARC V8, TMS320C62x, XScale and PowerPC.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2005

Efficient datapath merging for partially reconfigurable architectures

Nahri Moreano; Edson Borin; Cid C. de Souza; Guido Araujo

Reconfigurable systems have been shown to achieve significant performance speedup through architectures that map the most time-consuming application kernel modules or inner loops to a reconfigurable datapath. As each portion of the application starts to execute, the system partially reconfigures the datapath so as to perform the corresponding computation. The reconfigurable datapath should have as few and simple hardware blocks and interconnections as possible, in order to reduce its cost, area, and reconfiguration overhead. To achieve that, hardware blocks and interconnections should be reused as much as possible across the application. We represent each piece of the application as a data-flow graph (DFG). The DFG merging process identifies similarities among the DFGs, and produces a single datapath that can be dynamically reconfigured and has a minimum area cost, when considering both hardware blocks and interconnections. In this paper we present a novel technique for the DFG merge problem, and we evaluate it using programs from the MediaBench benchmark. Our algorithm execution time approaches the fastest previous solution to this problem and produces datapaths with an average area reduction of 20%. When compared to the best known area solution, our approach produces datapaths with area costs equivalent to (and in many cases better than) it, while achieving impressive speedups.

symposium on computer architecture and high performance computing | 2004

ArchC: a systemC-based architecture description language

Sandro Rigo; Guido Araujo; Marcus Bartholomeu; Rodolfo Azevedo

This paper presents an architecture description language (ADL) called ArchC, which is an open-source SystemC-based language that is specialized for processor architecture description. Its main goal is to provide enough information, at the right level of abstraction, in order to allow users to explore and verify new architectures, by automatically generating software tools like simulators and co-verification interfaces. ArchCs key features are a storage-based co-verification mechanism that automatically checks the consistency of a refined ArchC model against a reference (functional) description, memory hierarchy modeling capability, the possibility of integration with other SystemC IPs and the automatic generation of high-level SystemC simulators. We have used ArchC to synthesize both functional and cycle-based simulators for the MIPS, Intel 8051 and SPARC V8 processors, as well as functional models of modern architectures like TMS320C62x, XScale and PowerPC.

symposium on code generation and optimization | 2006

Software-Based Transparent and Comprehensive Control-Flow Error Detection

Edson Borin; Cheng Wang; Youfeng Wu; Guido Araujo

Shrinking microprocessor feature size and growing transistor density may increase the soft-error rates to unacceptable levels in the near future. While reliable systems typically employ hardware techniques to address soft-errors, software-based techniques can provide a less expensive and more flexible alternative. This paper presents a control-flow error classification and proposes two new software-based comprehensive control-flow error detection techniques. The new techniques are better than the previous ones in the sense that they detect errors in all the branch-error categories. We implemented the techniques in our dynamic binary translator so that the techniques can be applied to existing x86 binaries transparently. We compared our new techniques with the previous ones and we show that our methods cover more errors while has similar performance overhead.

international symposium on systems synthesis | 1995

Optimal code generation for embedded memory non-homogeneous register architectures

Guido Araujo; Sharad Malik

Abstract: This paper examines the problem of code generation for expression trees on non-homogeneous register set architectures. It proposes and proves the optimality of an O(n) algorithm for the tasks of instruction selection, register allocation and scheduling on a class of architectures defined as the [1,/spl infin/] model. Optimality is guaranteed by sufficient conditions derived from the register transfer graph (RTG), a structural representation of the architecture which depends exclusively on the processor instruction set architecture (ISA). Experimental results using the TMS320C25 as the target processor show the efficacy of the approach.

design automation conference | 1996

Using register-transfer paths in code generation for heterogeneous memory-register architectures

Guido Araujo; Sharad Malik; Mike Tien-Chien Lee

In this paper we address the problem of code generation for basic blocks in heterogeneous memory-register DSP processors. We propose a new a technique, based on register-transfer paths, that can be used for efficiently dismantling basic block DAGs (Directed Acyclic Graphs) into expression trees. This approach builds on recent results which report optimal code generation algorithm for expression trees for these architectures. This technique has been implemented and experimentally validated for the TMS320C25, a popular fixed point DSP processor. The results show that good code quality can be obtained using the proposed technique. An analysis of the type of DAGs found in the DSPstone benchmark programs reveals that the majority of basic blocks in this benchmark set are expression trees and leaf DAGs. This leads to our claim that tree based algorithms, like the one described in this paper, should be the technique of choice for basic blocks code generation with heterogeneous memory register architectures.

international symposium on microarchitecture | 1998

Code compression based on operand factorization

Guido Araujo; Paulo Centoducatte; Mario Cartes; Ricardo Pannain

This paper proposes a code compression technique called operand factorization. The central idea of operand factorization is the separation of program expression trees into sequences of tree-patterns (opcodes) and operand patterns (registers and immediates). Using this technique, we show that tree and operand patterns have exponential frequency distributions. A set of experiments were designed to explore this feature. They reveal an average compression ratio of 43% for SPECInt95 programs. A decompression engine is proposed, which assembles tree and operand patterns into uncompressed instruction sequences. An encoding that improves the design of the decompression engine results in a 48% compression ratio. Compression ratio numbers take into consideration an estimate of the decompression engine size.

Archive | 1996

Code Generation and Optimization Techniques for Embedded Digital Signal Processors

Stan Y. Liao; Srinivas Devadas; Kurt Keutzer; Steve Tjiang; Albert R. Wang; Guido Araujo; Ashok Sudarsanam; Sharad Malik; Vojin Živojnović; Heinrich Meyr

The advent of 0.5μ processing that allows for the integration of 5 million transistors on a single integrated circuit has brought forth new challenges and opportunities in embedded-system design. This high level of integration makes it possible and desirable to integrate a processor core, a program ROM, and an ASIC together on a single IC. To justify the design costs of such an IC, these embedded-system designs must be sold in large volumes and, as a result, they are very cost-sensitive. The cost of an IC is most closely linked to its size, which is derived from the final circuit area. It is not unusual for the ROM that stores the program code to be the largest contributor to the area of such ICs. Thus the incremental value of using logic optimization to reduce the size of the ASIC is smaller because the ASIC takes up a relatively smaller percentage of the final circuit area. On the other hand, the potential for cost reduction through diminishing the size of the program ROM is great. There are also often strong real-time performance requirements on the final code; hence, there is a necessity for producing high-performance code as well.

international symposium on systems synthesis | 1996

Instruction set design and optimizations for address computation in DSP architectures

Guido Araujo; Ashok Sudarsanam; Sharad Malik

In this paper we investigate the problem of code generation for address computation for DSP processors. This work is divided into four parts. First, we propose a branch instruction design which can guarantee minimum overhead for programs that make use of implicit indirect addressing. Second, we give a formulation and propose a solution for the problem of allocating address registers (ARs) for array accesses within loop constructs. Third, we describe retargetable approaches for auto-increment (decrement) optimizations of pointer variables, and loop induction variables. Finally, we use a graph coloring technique to allocate physical ARs to the virtual ARs used in the previous phases. The results show that the combination of the above techniques considerably improves the final code quality for benchmark DSP programs.

symposium on integrated circuits and systems design | 2004

An automatic testbench generation tool for a SystemC functional verification methodology

Karina R. G. da Silva; Elmar U. K. Melcher; Guido Araujo; Valdiney Alves Pimenta

The advent of new 90 nm/130 nm VLSI technology and SoC design methodologies, has brought an explosive growth in the complexity of modern electronic circuits. As a result, functional verification has become the major bottleneck in any design flow. New methods are required that allow for easier, quicker and more reusable verification. In this paper we propose an automatic verification methodology approach that enables fast, transaction-level, coverage-driven, self-checking and random-constraint functional verification. Our approach uses the systemC verification library (SCV), to synthesize a tool capable of automatically generating testbench templates. A case study from a real MP3 design is used to show the effectiveness of our approach.

Explore More