Júlio C. B. de Mattos

Universidade Federal do Rio Grande do Sul

Publication

Featured research published by Júlio C. B. de Mattos.

symposium on integrated circuits and systems design | 2003

CACO-PS: a general purpose cycle-accurate configurable power simulator

Antonio C. S. Beck Filho; Júlio C. B. de Mattos; Flávio Rech Wagner; Luigi Carro

This paper presents a cycle-accurate and configurable simulator that estimates the power consumed by an embedded system. The simulator accepts as input a structural system architecture description, at a level of abstraction that can be configured by the user. The simulator has been used to study the power dissipation of different micro-architectures of a Java microcontroller while executing several applications. Thanks to the cycle-driven behavior and the structural system description, the simulator is both flexible and accurate, and it may also be fast, depending on the desired level of abstraction. It represents a valuable support in a design space exploration methodology, allowing a power consumption evaluation that can be applied to any processor and system architecture at an early design stage.
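The cycle-driven estimation described in this abstract can be pictured with a minimal sketch. It assumes each structural component has a fixed energy cost per activation and that the simulator adds up the cost of every component active in each clock cycle. The component names, the energy figures, and the trace format are all hypothetical and are not taken from CACO-PS.

```python
# Hypothetical energy cost (in nanojoules) per activation of each component.
ENERGY_PER_ACCESS_NJ = {"alu": 0.5, "register_file": 0.25, "memory": 2.0}


def estimate_energy_nj(trace, energy_table):
    """Sum the energy of every component active in every cycle.

    `trace` holds one entry per clock cycle; each entry is the set of
    component names that were active during that cycle.
    """
    total = 0.0
    for active_components in trace:
        for name in active_components:
            total += energy_table[name]
    return total


def average_power_watts(trace, energy_table, clock_hz):
    """Average power is total energy divided by elapsed time."""
    energy_joules = estimate_energy_nj(trace, energy_table) * 1e-9
    elapsed_seconds = len(trace) / clock_hz
    return energy_joules / elapsed_seconds


# A made-up four-cycle activity trace.
trace = [
    {"alu", "register_file"},
    {"memory"},
    {"alu"},
    {"register_file", "memory"},
]
total_nj = estimate_energy_nj(trace, ENERGY_PER_ACCESS_NJ)
```

A coarser or finer structural description would change only the keys of the energy table and the granularity of the trace, which is one way to read the "configurable level of abstraction" mentioned above.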

international conference on multimedia and expo | 2012

Motion Vectors Merging: Low Complexity Prediction Unit Decision Heuristic for the Inter-prediction of HEVC Encoders

Felipe Sampaio; Sergio Bampi; Mateus Grellert; Luciano Volcan Agostini; Júlio C. B. de Mattos

This paper presents the Motion Vectors Merging (MVM) heuristic, a method that reduces HEVC inter-prediction complexity by targeting the PU partition size decision. In the HM test model of the emerging HEVC standard, computational complexity is mostly concentrated in the inter-frame prediction step (up to 96% of the total encoder execution time under common test conditions). The goal of this work is to avoid several Motion Estimation (ME) calls during the PU inter-prediction decision in order to reduce the execution time of the overall encoding process. The MVM algorithm is based on merging NxN PU partitions to compose larger ones. After the best PU partition is decided, ME is called to produce the best possible rate-distortion results for the selected partitions. The proposed method was implemented in the HM test model version 3.4 and provides an execution time reduction of up to 34% with insignificant rate-distortion losses (0.08 dB drop and 1.9% bitrate increase in the worst case). Moreover, no related work in the literature proposes PU-level decision optimizations. When compared with works that target CU-level fast decision methods, MVM is competitive, achieving comparable results.
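The merging idea can be sketched as follows. The abstract does not state the merging criterion, so this sketch assumes, purely for illustration, that neighbouring NxN partitions are merged when their motion vectors are identical. The function name and the equality test are hypothetical and should not be read as the MVM algorithm itself.

```python
def merge_partitions(mv_top_left, mv_top_right, mv_bottom_left, mv_bottom_right):
    """Pick a PU partition mode from the four NxN motion vectors.

    Assumption: two NxN partitions are merged when their motion vectors
    are equal. Each motion vector is an (x, y) tuple.
    """
    top_same = mv_top_left == mv_top_right
    bottom_same = mv_bottom_left == mv_bottom_right
    left_same = mv_top_left == mv_bottom_left
    right_same = mv_top_right == mv_bottom_right

    if top_same and bottom_same and left_same:
        return "2Nx2N"  # all four vectors agree: one large partition
    if top_same and bottom_same:
        return "2NxN"   # top half and bottom half each move coherently
    if left_same and right_same:
        return "Nx2N"   # left half and right half each move coherently
    return "NxN"        # no merge possible


mode = merge_partitions((3, -1), (3, -1), (0, 2), (0, 2))
```

Under this assumption, ME only has to be run again for the partitions of the selected mode, rather than for every candidate mode.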

southern conference programmable logic | 2012

Low cost and high throughput multiplierless design of a 16 point 1-D DCT of the new HEVC video coding standard

Ricardo Jeske; José Cláudio de Souza; Gustavo Wrege; Ruhan Conceição; Mateus Grellert; Júlio C. B. de Mattos; Luciano Volcan Agostini

This paper presents the hardware design of a 16-point 1-D DCT used in the emerging video coding standard HEVC (High Efficiency Video Coding). The 1-D DCT is used by the 16×16 2-D DCT of the HEVC standard. The transform stage is one of the innovations proposed by HEVC, not only because of the variable size (from 4×4 to 32×32) but also because transforms larger than the traditional 4×4 and 8×8 are used. The hardware design presented in this work focuses on low cost and high throughput. To achieve these objectives, the 16-point algorithm from HEVC was simplified so that a more efficient hardware design could be implemented. Several strategies were used during this simplification, such as reordering operations, factoring to reduce operator bit width, turning multiplications by constants into shifts and adds, and sharing sub-expressions, among others. The architecture was designed in a fully combinational way in order to reduce hardware overhead. Synthesis results obtained using Altera FPGAs from the Cyclone II and Stratix III families showed a reduction in hardware resources of up to 72% when compared to an architecture described as a direct transcription of the non-optimized version of the algorithm. Even with a purely combinational implementation, the designed architecture achieved a throughput between 376 Msamples/s and 1.4 Gsamples/s. With these results, the architecture is capable of processing, in the worst case, more than 30 QFHD frames (3840×2160 pixels) per second. Therefore, the architecture is capable of processing very high resolution videos in real time. To the best of our knowledge, this is the first work in the literature that presents hardware results for the HEVC transforms.
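As a small illustration of the multiplierless strategy, the sketch below replaces multiplications by the constants 64, 83 and 36 with shifts and adds. These three values are the coefficients of the HEVC 4-point core transform and are used here only as convenient examples; the decompositions are one possible choice and are not taken from the paper. The second function checks the frame-rate claim under the assumption of 4:2:0 chroma subsampling (1.5 samples per pixel), which the abstract does not state.

```python
def times_64(x):
    # 64 is a power of two, so a single shift suffices.
    return x << 6


def times_83(x):
    # 83 = 64 + 16 + 2 + 1
    return (x << 6) + (x << 4) + (x << 1) + x


def times_36(x):
    # 36 = 32 + 4
    return (x << 5) + (x << 2)


def frames_per_second(samples_per_second, width, height, samples_per_pixel=1.5):
    """Frame rate sustained at a given sample throughput.

    samples_per_pixel=1.5 assumes 4:2:0 video (an assumption of this sketch).
    """
    return samples_per_second / (width * height * samples_per_pixel)


# Worst-case throughput from the abstract, applied to QFHD frames.
worst_case_fps = frames_per_second(376e6, 3840, 2160)
```

Under the 4:2:0 assumption, 376 Msamples/s corresponds to slightly more than 30 QFHD frames per second, which is consistent with the worst-case figure quoted above.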

international conference on image processing | 2013

An adaptive workload management scheme for HEVC encoding

Mateus Grellert; Muhammad Shafique; Muhammad Usman Karim Khan; Luciano Volcan Agostini; Júlio C. B. de Mattos; Jörg Henkel

Managing the complexity of the emerging HEVC standard has been a subject of academic and industrial research since its earliest versions. The sophisticated and computation-intensive tools involved in the encoding process must be managed carefully if real-time applications are considered. In this paper, we propose a workload management scheme for dynamically controlling the computational complexity of HEVC under a user-defined operating frequency and target FPS. Our scheme receives these two parameters as input and aims to meet the target FPS by adjusting different encoding parameters at run time. Experiments demonstrate that our scheme successfully meets the target FPS while introducing negligible rate-distortion losses. A comparison with the state of the art shows that our scheme is capable of achieving a time reduction of up to 43% for Full HD sequences, with a maximum loss of 0.03 dB in Y-PSNR and a 3.5% increase in bitrate.
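One way to picture run-time adjustment toward a target FPS is a simple feedback rule, sketched below. It assumes a hypothetical integer "complexity level" that stands in for the encoding parameters, and it ignores the operating-frequency input. The thresholds and names are illustrative only and do not describe the authors' scheme.

```python
def next_complexity_level(level, frame_time_s, target_fps,
                          min_level=0, max_level=4, slack=0.1):
    """Adjust a hypothetical complexity level after encoding one frame.

    The per-frame time budget is 1 / target_fps. If the last frame took
    longer than the budget, drop one level; if it finished with more than
    `slack` (as a fraction of the budget) to spare, raise one level.
    """
    budget_s = 1.0 / target_fps
    if frame_time_s > budget_s:
        return max(min_level, level - 1)
    if frame_time_s < budget_s * (1.0 - slack):
        return min(max_level, level + 1)
    return level


# Made-up frame times for a 30 FPS target, starting at level 2.
level = 2
for measured_time_s in (0.050, 0.045, 0.020, 0.032):
    level = next_complexity_level(level, measured_time_s, target_fps=30)
```

The slack band keeps the level from oscillating when the encoder is already close to its time budget.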

IFIP Working Conference on Distributed and Parallel Embedded Systems | 2004

Design Space Exploration with Automatic Generation of IP-Based Embedded Software

Júlio C. B. de Mattos; Lisane B. de Brisolara; Renato Fernandes Hentschke; Luigi Carro; Flávio Rech Wagner

Automatic embedded software generation and IP-based design are good approaches to achieving the short design cycles demanded by stringent time-to-market requirements. However, design automation must also consider application-specific requirements. This paper presents a mechanism for the automatic selection of software IP components for embedded applications, which is based on a software IP library and a design space exploration tool. The software IP library has different algorithmic implementations of several routines commonly found in different application domains. These routines have been characterized in terms of power, performance, and area for a given architectural platform. The design exploration tool allows the automatic configuration of an optimized solution for a specific application by selecting the routines whose combination best matches system requirements. Experimental results are presented and demonstrate that a very expressive design space can be explored with this approach.
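The selection step can be sketched as a small exhaustive search. The sketch assumes a library in which every routine has several implementations characterized by energy, cycle count and memory, and it picks the lowest-energy combination that satisfies memory and cycle limits. The library contents, the cost figures, and the choice of energy as the objective are all hypothetical.

```python
from itertools import product

# Hypothetical IP library: each routine has alternative implementations.
LIBRARY = {
    "fft": [
        {"name": "fft_radix2", "energy": 5, "cycles": 100, "memory": 40},
        {"name": "fft_table", "energy": 3, "cycles": 60, "memory": 90},
    ],
    "sort": [
        {"name": "sort_inplace", "energy": 2, "cycles": 50, "memory": 20},
        {"name": "sort_buffered", "energy": 1, "cycles": 30, "memory": 60},
    ],
}


def select_implementations(library, routines, max_memory, max_cycles):
    """Return (energy, [names]) of the best feasible combination, or None."""
    best = None
    for combo in product(*(library[routine] for routine in routines)):
        memory = sum(impl["memory"] for impl in combo)
        cycles = sum(impl["cycles"] for impl in combo)
        if memory > max_memory or cycles > max_cycles:
            continue  # violates a system requirement
        energy = sum(impl["energy"] for impl in combo)
        if best is None or energy < best[0]:
            best = (energy, [impl["name"] for impl in combo])
    return best


choice = select_implementations(LIBRARY, ["fft", "sort"],
                                max_memory=120, max_cycles=160)
```

Exhaustive enumeration is only practical for small libraries; a real exploration tool would likely prune or use heuristics, which this sketch does not attempt.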

symposium on integrated circuits and systems design | 2005

Making object oriented efficient for embedded system applications

Júlio C. B. de Mattos; Emilena Specht; Bruno Neves; Luigi Carro

With the growing complexity of embedded systems, it is necessary to use techniques and methodologies that increase software productivity while still handling physical embedded system constraints such as memory footprint, real-time behavior, and power dissipation. Object-oriented modeling and design is a widely known methodology in software engineering. This paradigm may satisfy software portability and maintainability requirements, but it incurs an overhead in terms of memory, performance and code size. This paper proposes a pragmatic approach that uses a design space exploration tool (the DESEJOS tool) to automatically select the best object organization. The tool tries to transform, automatically, as many dynamic objects as possible into static ones, with the goal of reducing execution time while keeping memory costs as low as possible. Experimental results demonstrate that the performance of an MP3 player can be increased by 24% at the cost of just a 0.3% increase in memory size.

international symposium on circuits and systems | 2011

A multilevel data reuse scheme for Motion Estimation and its VLSI design

Mateus Grellert; Felipe Sampaio; Júlio C. B. de Mattos; Luciano Volcan Agostini

Motion Estimation (ME) is a vital component of video coding that is demanding not only in computational complexity but also in off-chip memory bandwidth. These two issues are considered critical constraints for High Definition (HD) video coding, since a large volume of data must be processed. The multilevel data reuse scheme proposed in this paper is able to reduce the off-chip memory bandwidth, with a direct impact on throughput and energy consumption. The scheme exploits the concept of overlapped Search Windows (SW) at more than one level and does not harm video quality. Comparisons with related works show that this solution provides the best tradeoff between the use of on-chip memory and the reduction of off-chip memory bandwidth. The data reuse scheme was applied to an ME architecture, and the synthesis results show that this solution presented the lowest use of hardware resources and the highest operating frequency among related works. The proposed architecture is able to process 1080p videos at 25 fps, and the reduction ratio of off-chip memory accesses achieved by the architecture is greater than 95% when compared to the traditional method.
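The benefit of overlapped search windows can be illustrated with simple arithmetic. The sketch assumes an N×N block with a search range of ±R samples, so the search window is (N + 2R) samples on each side, and it models only single-level horizontal reuse between neighbouring blocks. It is not the multilevel scheme of the paper and does not reproduce its 95% figure; the block size and search range are made-up values.

```python
def samples_without_reuse(block_size, search_range):
    """Samples fetched per block when the whole search window is reloaded."""
    window = block_size + 2 * search_range
    return window * window


def samples_with_horizontal_reuse(block_size, search_range):
    """Samples fetched per block when the overlap with the previous
    (left) search window is kept on chip: only the `block_size` new
    columns of the window have to be loaded."""
    window = block_size + 2 * search_range
    return block_size * window


def reduction_ratio(block_size, search_range):
    """Fraction of off-chip fetches saved by horizontal reuse."""
    before = samples_without_reuse(block_size, search_range)
    after = samples_with_horizontal_reuse(block_size, search_range)
    return 1.0 - after / before


# Illustrative values: 16x16 blocks with a search range of +/-64 samples.
before = samples_without_reuse(16, 64)
after = samples_with_horizontal_reuse(16, 64)
saved = reduction_ratio(16, 64)
```

Adding reuse at further levels, as the abstract describes, would reduce the number of newly fetched samples per block even more.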

IFIP Working Conference on Distributed and Parallel Embedded Systems | 2008

On the Use of Software Quality Metrics to Improve Physical Properties of Embedded Systems

Ricardo Miotto Redin; Marcio F. da S. Oliveira; Lisane B. de Brisolara; Júlio C. B. de Mattos; Luís C. Lamb; Flávio Rech Wagner; Luigi Carro

As software production gains growing importance in the embedded systems world, quality evaluation of embedded software and its impact on the physical properties of embedded systems becomes increasingly relevant. Although there are tools for embedded software design that improve software specification and verification, we are still short of a tool that supports the designer's decisions on the best design strategy regarding low-level, physical characteristics like performance, energy, and memory footprint, which are critical in the embedded domain. In this paper, we provide an analysis of the correlation between software quality metrics and physical metrics for embedded software. By means of experiments, we investigate the impact of software engineering best practices on embedded software and show that software quality metrics can be used to guide design decisions toward improving physical properties of embedded systems.

symposium on integrated circuits and systems design | 2007

Object and method exploration for embedded systems applications

Júlio C. B. de Mattos; Luigi Carro

The growing complexity of embedded systems has called for more software solutions in order to reduce time-to-market. However, while this software decreases the development time of embedded system functionality, it must at the same time help handle the tight constraints of embedded systems, such as energy, power and memory availability. Object orientation is now a common technique for writing maintainable code, but its application to the embedded systems domain is held back by the overhead in terms of memory, performance and code size. This paper introduces a methodology to explore object-oriented embedded software, improving different levels of the software design while dealing with different embedded system requirements (power, memory area and performance). The proposed approach transforms the original OO code into optimized code, allowing the automatic configuration of a solution for a specific application. Experimental results with an MP3 player show a large design space with different solutions in terms of performance, energy and memory.

symposium on integrated circuits and systems design | 2013

Hardware design for the 32×32 IDCT of the HEVC video coding standard

Ruhan Conceicao; J. Claudio de Souza; Ricardo Jeske; Marcelo Schiavon Porto; Júlio C. B. de Mattos; Luciano Volcan Agostini

This paper focuses on the inverse transforms defined in the HEVC (High Efficiency Video Coding) video coding standard. The transform stage is one of the innovations proposed by HEVC, since it allows more transform sizes (four) and larger transforms (up to 32×32) than previous standards. The inverse DCT is performed by both the video encoder and the decoder. This paper presents an efficient hardware design for the 32×32 HEVC IDCT based on the separability principle. The hardware design was planned to reach real-time processing (at least 30 frames per second) for high-resolution videos, exploiting a high level of parallelism (32 samples consumed per clock cycle). The architecture was also planned for low latency and low cost, so it was designed in a purely combinational way using a multiplierless approach. Synthesis targeted an Altera Stratix IV FPGA. The synthesis results show that the designed architecture is capable of processing more than 30 QFHD frames (3840×2160 pixels) per second, with a latency of 33 clock cycles.
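The separability principle means a 2-D inverse transform can be computed as a 1-D inverse transform over the columns followed by a 1-D inverse transform over the rows. The sketch below checks this against a direct double sum, using the 4×4 HEVC core transform matrix as a small stand-in for the 32×32 case. It omits the intermediate shifts and rounding that the standard applies between stages, and the function names are illustrative rather than taken from the paper.

```python
# HEVC 4-point core transform matrix, used as a small stand-in for 32x32.
C4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def inverse_1d(coeffs, matrix):
    """1-D inverse transform: out[i] = sum over u of matrix[u][i] * coeffs[u]."""
    n = len(matrix)
    return [sum(matrix[u][i] * coeffs[u] for u in range(n)) for i in range(n)]


def inverse_2d_separable(block, matrix):
    """2-D inverse transform as two passes of the 1-D transform."""
    n = len(matrix)
    # Stage 1: 1-D inverse transform of every column of the block.
    columns = [inverse_1d([block[u][v] for u in range(n)], matrix)
               for v in range(n)]
    # Transpose so that each list is a row of the intermediate result.
    rows = [[columns[v][i] for v in range(n)] for i in range(n)]
    # Stage 2: 1-D inverse transform of every row.
    return [inverse_1d(row, matrix) for row in rows]


def inverse_2d_direct(block, matrix):
    """Reference: the full double sum, with no separation."""
    n = len(matrix)
    return [[sum(matrix[u][i] * block[u][v] * matrix[v][j]
                 for u in range(n) for v in range(n))
             for j in range(n)] for i in range(n)]


coefficients = [[10, -3, 0, 1], [4, 0, 0, 0], [0, 2, 0, 0], [-1, 0, 0, 0]]
result = inverse_2d_separable(coefficients, C4)
```

For an N×N block, the separable form needs on the order of N³ multiply-accumulate operations instead of N⁴, which is why it matters for a transform as large as 32×32.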
