Gustavo Girão
Universidade Federal do Rio Grande do Sul
Publication
Featured research published by Gustavo Girão.
Symposium on Integrated Circuits and Systems Design | 2007
Gustavo Girão; Bruno Cruz de Oliveira; Rodrigo Martins Soares; Ivan Saraiva Silva
Cache coherency and cache consistency in NoC-based heterogeneous platforms are still open problems. Current works addressing platform design avoid this issue either by proposing cacheless implementations or by using snoopy protocols over buses. This paper addresses the cache coherence problem in a NoC-based MPSoC platform, focusing on communication and considering both the load overhead produced by the coherency mechanism and read/write response times. Simulations of applications written in C and compiled with GCC are presented. Simulation results indicate that the load is constant with the cache size for a given line size.
International Conference on Hardware/Software Codesign and System Synthesis | 2012
Abbas BanaiyanMofrad; Gustavo Girão; Nikil D. Dutt
Advances in technology scaling, coupled with aggressive voltage scaling, result in significant reliability challenges for emerging Chip Multiprocessor (CMP) platforms, where error-prone caches continue to dominate the chip area. Network-on-Chip (NoC) fabrics are increasingly used to manage the scalability of these CMPs. We present a novel fault-tolerant scheme for Last Level Cache (LLC) in CMP architectures that leverages the interconnection network to protect the LLC cache banks against permanent faults. During an LLC access to a faulty area, the network detects and corrects the faults, returning the fault-free data to the requesting core. By leveraging the NoC interconnection fabric, we can implement any cache fault-tolerant scheme in an efficient, modular, and scalable manner. We perform extensive design space exploration on NoC benchmarks to demonstrate the utility and efficacy of our approach. The overheads of leveraging the NoC fabric are minimal: on an 8-core, 16-cache-bank CMP we demonstrate reliable access to LLCs with additional overheads of less than 3% in area and less than 7% in power.
IEEE Transactions on Very Large Scale Integration Systems | 2009
Gustavo Girão; Daniel Barcelos; Flávio Rech Wagner
This paper presents a performance evaluation study on distinct memory hierarchies considering an NoC-based MPSoC environment. This evaluation considers two sets of experiments. The first one evaluates the performance and energy efficiency of four different memory hierarchies in a situation with no external traffic. In the second experiment, a traffic generator is responsible for the injection of synthetic traffic into the system, in order to increase the latency of the NoC and evaluate the performance of each memory model in this situation. Results show that, with no external traffic, the distributed memory presents better results for applications with a low amount of data to be transferred. On the other hand, results suggest that shared and distributed shared memories present the best results for applications with high data transfer needs. In the second experiment, with external traffic, for applications with low communication bandwidth requirements, a memory organization that is physically centralized and logically shared is shown to have a smooth performance degradation when external traffic rises up to 20% of network capacity (22% decrease for an application demanding high communication, and 34% decrease for a low communication one). In contrast, a distributed memory model presents 2% degradation in an application with high communication requirements, when traffic rises up to 20% of network capacity, and reaches 19% degradation in low communication ones. Shared and distributed shared memory models are shown to present lower tolerance to high latencies.
Symposium on Integrated Circuits and Systems Design | 2010
Leonardo Kunz; Gustavo Girão; Flávio Rech Wagner
Transactional memories have emerged in recent years as a new solution for synchronization on shared memory multiprocessors, helping to exploit the parallelism of applications while overcoming limitations of the lock mechanism. This paper presents the performance and energy evaluation of a hardware transactional memory (HTM) solution in an NoC-based MPSoC environment, comparing it to a traditional shared memory model that uses locks to provide consistency. Experiments show that transactional memory is a promising alternative to locks for future NoC-based embedded systems, resulting in performance gains of up to 30% and energy savings of up to 32%, depending on the application and on the architecture configuration.
Design, Automation, and Test in Europe | 2013
Gustavo Girão; Thiago Santini; Flávio Rech Wagner
The dramatic increase in the number of processors, memories and other components on the same chip calls for resource-aware mechanisms to improve performance. This paper proposes four different resource mapping policies for NoC-based MPSoCs that leverage distinct aspects of the parallel nature of the applications and architecture constraints, such as off-chip memory latency. Results show that the use of these policies can improve performance by up to 22.5% on average, and, in some cases, depending on the parallel programming model of each application, the improvement may reach up to 32%.
Design, Automation, and Test in Europe | 2013
Abbas BanaiyanMofrad; Nikil D. Dutt; Gustavo Girão
Advances in technology scaling increasingly make Networks-on-Chip (NoCs) more susceptible to failures that cause various reliability challenges. With increasing area occupied by different on-chip memories, strategies for maintaining fault-tolerance of distributed on-chip memories become a major design challenge. We propose a system-level design methodology for scalable fault-tolerance of distributed on-chip memories in NoCs. We introduce a novel reliability clustering model for fault-tolerance analysis and shared redundancy management of on-chip memory blocks. We perform extensive design space exploration applying the proposed reliability clustering on a block-redundancy fault-tolerant scheme to evaluate the tradeoffs between reliability, performance, and overheads. Evaluations on a 64-core chip multiprocessor (CMP) with an 8x8 mesh NoC show that distinct strategies of our case study may yield up to 20% improvement in performance and 25% improvement in energy savings across different benchmarks, and uncover interesting design configurations.
IFIP/IEEE International Conference on Very Large Scale Integration | 2009
Gustavo Girão; Daniel Barcelos; Flávio Rech Wagner
This chapter presents a study on the performance and energy consumption arising from distinct memory organizations in an NoC-based MPSoC environment. This evaluation considers three sets of experiments. The first one evaluates the performance and energy efficiency of four different memory organizations in a situation where a single application is executed. In the second experiment, a traffic generator is responsible for the injection of synthetic traffic into the system, simulating the impact of the parallel execution of additional applications and increasing the latency of the NoC. Results show that, with a low NoC latency, the distributed memory presents better results for applications with a low amount of data to be transferred. On the other hand, results suggest that shared and distributed shared memories present the best results for applications with high data transfer needs. In the second set of experiments, with higher NoC latency, for applications with low communication bandwidth requirements, a memory organization that is physically centralized and logically shared (called nDMA) is shown to have a smooth performance degradation when additional traffic rises up to 20% of the network capacity (22% degradation for an application demanding high communication, and 34% degradation for a low communication one). In contrast, a distributed memory model presents 2% degradation in an application with high communication requirements, when traffic rises up to 20% of the network capacity, and reaches 19% degradation in low communication ones. Shared and distributed shared memory models are shown to present lower tolerance to high latencies. A third set of experiments evaluates the performance of the four memory organization models in a situation of task migration, when a new application is launched and its tasks must be distributed among several nodes.
Results show that the shared memory and distributed shared memory models achieve better performance and energy savings than the distributed memory model in this situation. In addition, the nDMA memory model presents a smaller overhead when compared to the shared memory models and tends to reduce the traffic in the migration process due to the concentration of all memory modules in a single node of the network.
ACM Transactions on Embedded Computing Systems | 2014
Abbas BanaiyanMofrad; Gustavo Girão; Nikil D. Dutt
Advances in technology scaling increasingly make emerging Chip MultiProcessor (CMP) platforms more susceptible to failures that cause various reliability challenges. In such platforms, error-prone on-chip memories (caches) continue to dominate the chip area. Also, Network-on-Chip (NoC) fabrics are increasingly used to manage the scalability of these architectures. We present a novel solution for efficient implementation of fault-tolerant design of Last-Level Cache (LLC) in CMP architectures. The proposed approach leverages the interconnection network fabric to protect the LLC cache banks against permanent faults in an efficient and scalable way. During an LLC access to a faulty block, the network detects and corrects the faults, returning the fault-free data to the requesting core. Leveraging the NoC interconnection fabric, designers can implement any cache fault-tolerant scheme in an efficient, modular, and scalable manner for emerging multicore/manycore platforms. We propose four different policies for implementing a remapping-based fault-tolerant scheme leveraging the NoC fabric in different settings. The proposed policies enable design trade-offs between NoC traffic (packets sent through the network) and the intrinsic parallelism of these communication mechanisms, allowing designers to tune the system based on design constraints. We perform an extensive design space exploration on NoC benchmarks to demonstrate the usability and efficacy of our approach. In addition, we perform sensitivity analysis to observe the behavior of various policies in reaction to improvements in the NoC architecture. The overheads of leveraging the NoC fabric are minimal: on an 8-core, 16-cache-bank CMP we demonstrate reliable access to LLCs with additional overheads of less than 3% in area and less than 7% in power.
Design, Automation, and Test in Europe | 2011
Leonardo Kunz; Gustavo Girão; Flávio Rech Wagner
Transactional Memories (TM) have attracted much interest as an alternative to lock-based synchronization in shared-memory multiprocessors. Considering the use of TM on an embedded, NoC-based MPSoC, this work evaluates a LogTM implementation. It is shown that the time an aborted transaction waits before restarting its execution (the backoff delay) can seriously affect the overall performance and energy consumption of the system. This work also shows the difficulty of finding a general and optimal solution for setting this time and analyzes three backoff policies to handle it. A new solution to this issue is presented based on a handshake between transactions. Results suggest up to 20% in performance gains and up to 53% in energy savings when comparing our new solution to the best backoff delay alternative found in our experiments.
2017 International Conference on Computing Networking and Informatics (ICCNI) | 2017
Rubem Kalebe; Gustavo Girão; Itamir de Morais Barroca Filho
The evolution of the Internet of Things and the wide diversity of electronic components bring challenges and demand high-level tools to improve the application development process. Concurrent computing is useful for capturing the logical structure of a problem or solution and for handling different, independent components, which is convenient in this context. This paper therefore presents a library that supports a multithreading approach on microcontrollers, which helps improve the development process of new applications. The library benefits development time, the readability, writability and reliability of the code, and software maintenance, and it also helps with blocking tasks. Finally, the paper demonstrates how this solution is employed in a Smart Parking system.
Collaboration
Itamir de Morais Barroca Filho
Federal University of Rio Grande do Norte