Alexandre Solon Nery | Researchain

Publication

Featured researches published by Alexandre Solon Nery.

digital systems design | 2009

GridRT: A Massively Parallel Architecture for Ray-Tracing Using Uniform Grids

Alexandre Solon Nery; Nadia Nedjah; Felipe M. G. França

In this paper, we propose an architecture, which we call GridRT, capable of handling the main features of ray tracing used for rendering three-dimensional scenes, such as shadow and reflection effects. This architecture achieves efficient overall performance while using a simple and compact massively parallel design. The design exploits the Xilinx® Floating-Point Operator IP core and the spatial data structure of regular grids.

International Journal of High Performance Systems Architecture | 2009

A massively parallel hardware architecture for ray-tracing

Alexandre Solon Nery; Nadia Nedjah; Felipe M. G. França

Real-time performance of non-interactive rendering of three-dimensional scenes is usually unachievable. Ray tracing is one of the methods used for rendering such scenes. The performance achieved by a sequential software-based implementation of ray tracing is far from satisfactory. In contrast, many parallel implementations of ray tracing have enabled real-time performance, as the underlying algorithm can be massively parallelised. Thus, a custom parallel design in hardware is likely to achieve acceptable performance. In this paper, we propose a hardware architecture, which we call GridRT, capable of handling the main desirable features of ray tracing, such as shadow and reflection effects, imposing low requirements in terms of silicon area while achieving acceptable performance in terms of rendering time. This architecture is efficient yet compact, as it exploits the massive parallelism offered by the intrinsic structure of the algorithm. The design exploits the spatial data structure of regular grids.

international conference on algorithms and architectures for parallel processing | 2011

Massively parallel identification of intersection points for GPGPU ray tracing

Alexandre Solon Nery; Nadia Nedjah; Felipe M. G. França; Lech Józwiak

The latest advancements in computer graphics architectures, such as the replacement of some fixed stages of the pipeline with programmable stages (shaders), have enabled the development of parallel general-purpose applications on massively parallel graphics architectures (Streaming Processors). For years, the graphics processing unit (GPU) has been optimized for increasingly high throughput of massively parallel floating-point computations. However, only applications that exhibit data-level parallelism can achieve substantial acceleration on such architectures. In this paper we present a parallel implementation of the GridRT architecture for GPGPU ray tracing. This architecture can expose two levels of parallelism in ray tracing: parallel ray processing and parallel intersection tests. We also present a traditional parallel implementation of ray tracing in GPGPU, for comparison against the GridRT-GPGPU implementation.

digital systems design | 2011

A Parallel Ray Tracing Architecture Suitable for Application-Specific Hardware and GPGPU Implementations

Alexandre Solon Nery; Nadia Nedjah; Felipe M. G. França; Lech Józwiak

The Ray Tracing rendering algorithm can produce high-fidelity images of 3-D scenes, including shadow effects, as well as reflections and transparencies. This is currently done at a processing speed of at most 30 frames per second. Therefore, current implementations of the algorithm are not yet suitable for interactive real-time rendering, which is required in games and virtual-reality-based applications. Fortunately, the algorithm allows for massive parallelization of its computations. In this paper, we present a parallel architecture for ray tracing based on a uniform spatial subdivision of the scene and exploiting an embedded computation of ray-triangle intersections. This approach allows for a significant acceleration of intersection computations, as well as a reduction of the total number of required intersection checks. Furthermore, it allows for these checks to be performed in parallel and in advance for each ray. In this paper we discuss and analyze an ASIP-based implementation using FPGAs and a GPGPU-based parallel implementation of the proposed architecture. The performance of both implementations is reported and compared.
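
As a rough illustration of why a uniform spatial subdivision reduces the number of intersection checks, the sketch below lists the grid cells that a single ray pierces, so that only the triangles stored in those cells need to be tested. It uses the well-known 3D-DDA style of grid stepping, which is an assumption here and not necessarily the traversal used in the paper. The function name, its parameters, and the requirement that the ray starts inside the grid are all hypothetical simplifications.

```python
import math


def grid_cells_along_ray(origin, direction, grid_min, cell_size, resolution):
    """Return the (i, j, k) cells of a uniform grid that a ray visits, in order.

    Assumption: the ray origin lies inside the grid. A fuller version would
    first clip the ray against the grid's bounding box.
    """
    # Index of the cell that contains the ray origin.
    cell = [int((origin[a] - grid_min[a]) // cell_size[a]) for a in range(3)]
    step = [0, 0, 0]
    t_max = [math.inf] * 3    # ray parameter at the next cell boundary, per axis
    t_delta = [math.inf] * 3  # ray parameter needed to cross one whole cell
    for a in range(3):
        if direction[a] > 0:
            step[a] = 1
            boundary = grid_min[a] + (cell[a] + 1) * cell_size[a]
        elif direction[a] < 0:
            step[a] = -1
            boundary = grid_min[a] + cell[a] * cell_size[a]
        else:
            continue  # the ray never crosses a boundary on this axis
        t_max[a] = (boundary - origin[a]) / direction[a]
        t_delta[a] = cell_size[a] / abs(direction[a])

    visited = []
    while all(0 <= cell[a] < resolution[a] for a in range(3)):
        visited.append(tuple(cell))
        axis = t_max.index(min(t_max))  # axis whose boundary is reached first
        if math.isinf(t_max[axis]):
            break  # zero direction vector: the ray never leaves this cell
        cell[axis] += step[axis]
        t_max[axis] += t_delta[axis]
    return visited
```

Because the list of pierced cells can be computed before any triangle is tested, the per-cell intersection tests for one ray are independent of each other, which is the property the abstract relies on when it says the checks can be performed in parallel and in advance.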

latin american symposium on circuits and systems | 2010

A parallel architecture for Ray-Tracing

Alexandre Solon Nery; Nadia Nedjah; Felipe M. G. França

Real-time rendering of three-dimensional scenes in photorealistic detail, as with the Ray Tracing rendering algorithm, is a hard task. However, parallel implementations of Ray Tracing have enabled real-time performance, as the algorithm is embarrassingly parallel. Thus, a custom parallel design in hardware is likely to achieve acceptable performance. In this paper, we propose a parallel hardware architecture capable of handling the main desirable features of Ray Tracing, such as shadow and reflection effects, with low area cost and acceptable rendering performance.

latin american symposium on circuits and systems | 2014

Automatic complex instruction identification for efficient application mapping onto ASIPs

Alexandre Solon Nery; Nadia Nedjah; Felipe M. G. França; Lech Józwiak; Henk Corporaal

Instruction Set Customization is a well-known technique to enhance the performance and efficiency of Application-Specific Processors (ASIPs). Extensive application profiling can indicate which parts of a given application, or class of applications, are most frequently executed, enabling the implementation of such frequently executed parts in hardware as custom instructions. However, a naive ad hoc instruction set customization process may identify and select poor instruction extension candidates, which may not result in significantly improved performance with low circuit-area and energy footprints. In this paper we propose and discuss an efficient instruction set customization method and automatic tool, which exploit the maximal common subgraphs (common operation patterns) of the most frequently executed basic blocks of a given application. The speedup results from our tool for a VLIW ASIP are provided for a set of benchmark applications. The average execution time reduction ranges from 30% to 40%, with only a few custom instructions.

international conference on industrial informatics | 2014

A framework for automatic custom instruction identification on multi-issue ASIPs

Alexandre Solon Nery; Nadia Nedjah; Felipe M. G. França; Lech Józwiak; Henk Corporaal

Custom Instruction Identification is an important part of the design of efficient Application-Specific Processors (ASIPs). It consists of profiling a given application to find patterns of basic operations that are frequently executed. Operations of such patterns can be implemented together as a single custom instruction to speed up the execution of the application. Because of the problem's high complexity, several methods have been proposed for specific single-issue (RISC) processors and architectures, limiting the shape and size of custom instructions that can actually be identified and, possibly, implemented. In this paper, we propose and discuss an efficient custom instruction set identification method and corresponding automatic tool for multi-issue VLIW ASIPs, which search for the common operation patterns of the most frequently executed basic blocks of a given application, with different sizes and shapes. The speedup results for the custom instructions identified by our tool are provided for a set of benchmark applications. The speedup is up to 68%, with only a few custom instructions used.
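
To make the idea of frequently executed operation patterns concrete, the sketch below ranks two-operation producer-consumer patterns across profiled basic blocks, weighting each occurrence by how often its block executes. This is a deliberately simplified stand-in: the paper's tool searches for common patterns of different sizes and shapes, whereas this example only considers pairs. The data layout (an execution count, a node-to-opcode map, and a list of data-flow edges per block) and the function name are hypothetical.

```python
from collections import Counter


def rank_operation_pairs(basic_blocks, top_n=3):
    """Rank producer -> consumer opcode pairs by profile-weighted frequency.

    basic_blocks: iterable of (exec_count, ops, edges), where
      exec_count is how many times the block ran during profiling,
      ops maps a node id to its opcode name,
      edges lists (producer_id, consumer_id) data dependencies.
    This layout is an assumption made for the example only.
    """
    weighted = Counter()
    for exec_count, ops, edges in basic_blocks:
        for producer, consumer in edges:
            pattern = (ops[producer], ops[consumer])
            # A pattern in a hot block counts more than one in a cold block.
            weighted[pattern] += exec_count
    return weighted.most_common(top_n)


# Hypothetical profile: two hot blocks share a multiply feeding an add.
profile = [
    (1000, {0: "mul", 1: "add", 2: "shl"}, [(0, 1), (1, 2)]),
    (500, {0: "mul", 1: "add"}, [(0, 1)]),
    (10, {0: "ld", 1: "add"}, [(0, 1)]),
]
candidates = rank_operation_pairs(profile, top_n=2)
```

In this toy profile the multiply-then-add pair ranks first, which would make it a natural candidate for a fused custom instruction. Whether such a candidate is worth implementing still depends on area and energy cost, which this sketch does not model.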

Journal of Systems Architecture | 2013

Efficient hardware implementation of Ray Tracing based on an embedded software for intersection computation

Alexandre Solon Nery; Nadia Nedjah; Felipe M. G. França

Parallel implementations of Ray Tracing have enabled real-time performance, as the algorithm is embarrassingly parallel. However, in order to achieve both interactivity and real-time performance, the algorithm should run at a high frame rate, i.e. at least 60 frames per second. Thus, a custom parallel design in hardware is likely to achieve high rendering performance. In this paper, we improve the GridRT architecture presented in previous work. GridRT is capable of handling the main desirable features of Ray Tracing, such as shadow and reflection effects, with low area cost and promising rendering performance. In this work, an application-specific instruction has been added and the underlying computation embedded into the processor's microprogram in order to perform the ray-triangle intersection computations. These computations are pipelined whenever possible, yielding a considerable reduction in cycles per intersection test. The presented architecture is based on the uniform grid acceleration structure. It allows for a massive twofold parallelism: parallel ray-triangle intersection tests as well as parallel processing of many rays. A hardware implementation of the improved architecture is presented, together with the corresponding performance results and resource requirements. The rendering time is reduced by 80% using a grid configuration of eight processing elements, and each intersection computation time is reduced by 50% with respect to the original GridRT implementation.
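
For readers unfamiliar with the ray-triangle intersection test that this work embeds in hardware, the following is a minimal software sketch. The abstract does not say which intersection algorithm GridRT uses. The example assumes the widely used Möller–Trumbore formulation, and the function name and tolerance parameter are illustrative only.

```python
def ray_triangle_intersection(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the ray parameter t of the hit point, or None if there is no hit.

    Möller–Trumbore test, assumed here for illustration. Points and vectors
    are 3-tuples of floats. The hit point is origin + t * direction.
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    edge1, edge2 = sub(v1, v0), sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > eps else None  # ignore hits behind the ray origin
```

The test is a short, fixed sequence of cross products, dot products, and one division, which makes it a plausible candidate for a pipelined application-specific instruction. Each call is also independent of every other, matching the twofold parallelism the abstract describes.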

international symposium on circuits and systems | 2011

A parallel architecture for ray-tracing with an embedded intersection algorithm

Alexandre Solon Nery; Nadia Nedjah; Felipe M. G. França; Lech Józwiak

Real-time rendering of three-dimensional scenes with Ray Tracing is a hard problem. However, parallel implementations have enabled real-time performance, as the algorithm can be highly parallelized. Thus, a custom parallel design in hardware is likely to achieve good performance. In this paper, we further improve the overall performance of the GridRT architecture by embedding the ray-triangle intersection computation into the processing elements that form the architecture. Low cost and high rendering performance are the main concerns in this novel design. The results show that the execution time of each intersection computation is reduced by at least 50%, while the area cost is practically unchanged or even reduced when compared to the original GridRT implementation.

digital systems design | 2011

Hardware Reuse in Modern Application-Specific Processors and Accelerators

Alexandre Solon Nery; Lech Józwiak; Menno Lindwer; Mauro Cocco; Nadia Nedjah; Felipe M. G. França

Effective exploitation of application-specific parallel patterns and computation operations through their direct implementation in hardware is the basis for constructing high-quality (re-)configurable application-specific instruction set processors (ASIPs) and hardware accelerators for modern, highly demanding applications. Although this topic receives a lot of attention from researchers and practitioners, the very important problem of hardware reuse in ASIP and accelerator synthesis is clearly underestimated and does not get enough attention in the published research. This paper is the result of collaborative research between industry and academia. It analyses the problem of hardware sharing and shows its high practical relevance, the large influence of hardware sharing on the major circuit and system parameters, and its importance for multi-objective optimization and tradeoff exploitation. It also demonstrates that state-of-the-art synthesis tools do not sufficiently address this problem and gives several guidelines for enhancing hardware reuse.
