Is this you? Create Your Porfile

Mateus B. Rutzig

Universidade Federal de Santa Maria

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Mateus B. Rutzig is active.

Explore More

Publication

Featured researches published by Mateus B. Rutzig.

design, automation, and test in europe | 2008

Transparent reconfigurable acceleration for heterogeneous embedded applications

Antonio Carlos Schneider Beck; Mateus B. Rutzig; Georgi Gaydadjiev; Luigi Carro

Embedded systems are becoming increasingly complex. Besides the additional processing capabilities, they are characterized by high diversity of computational models coexisting in a single device. Although reconfigurable architectures have already shown to be a potential solution for such systems, they just present significant speedups of very specific dataflow oriented kernels. Furthermore, reconfigurable fabric is still withheld by the need of special tools and compilers, clearly not sustaining backward software compatibility. In this paper, we propose a new technique to optimize both dataflow and control-flow oriented code in a totally transparent process, without the need of any modification in the source or binary codes. For that, we have developed a Binary Translation algorithm implemented in hardware, which works in parallel to a MIPS processor. The proposed mechanism is responsible for transforming sequences of instructions at runtime to be executed on a dynamic coarse-grain reconfigurable array, supporting speculative execution. Executing the MIBench suite, we show performance improvements of up to 2.5 times, while reducing 1.7 times the required energy, using trivial hardware resources.

applied reconfigurable computing | 2008

Run-Time Adaptable Architectures for Heterogeneous Behavior Embedded Systems

Antonio Carlos Schneider Beck; Mateus B. Rutzig; Georgi Gaydadjiev; Luigi Carro

As embedded applications are getting more complex, they are also demanding highly diverse computational capabilities. The majority of all previously proposed reconfigurable architectures targets static data stream oriented applications, optimizing very specific computational kernels, corresponding to the typical embedded systems characteristics in the past. Modern embedded devices, however, impose totally new requirements. They are expected to support a wide variety of programs on a single platform. Besides getting more heterogeneous, these applications have very distinct behaviors. In this paper we explore this trend in more detail. First, we present a study about the behavioral difference of embedded applications based on the Mibench benchmark suite. Thereafter, we analyze the potential optimizations and constraints for two different run-time dynamic reconfigurable architectures with distinct programmability strategies: a fine-grain FPGA based accelerator and a coarse-grain array composed by ordinary functional units. Finally, we demonstrate that reconfigurable systems that are focused to single data stream behavior may not suffice anymore.

international parallel and distributed processing symposium | 2009

A low cost and adaptable routing network for reconfigurable systems

Ricardo S. Ferreira; Marcone Laure; Antonio Carlos Schneider Beck; Thiago Berticelli Lo; Mateus B. Rutzig; Luigi Carro

Nowadays, scalability, parallelism and fault-tolerance are key features to take advantage of last silicon technology advances, and that is why reconfigurable architectures are in the spotlight. However, one of the major problems in designing reconfigurable and parallel processing elements concerns the design of a cost-effective interconnection network. This way, considering that Multistage Interconnection Network (MIN) has been successfully used in several computer system levels and applications in the past, in this work we propose the use of a MIN, at the word level, on a coarse-grained reconfigurable architecture. More precisely, this work presents a novel parallel self-placement and routing mechanism for MIN on the circuit-switching mode. We take into account one-to-one as well as multicast (one-to-many) permutations. Our approach is scalable and it is targeted to be used in run-time environments where dynamic routing among functional units is required. In addition, our algorithm is embedded in the switch structure, and it is independent of the interstage interconnection pattern. Our approach can handle blocking and non-blocking networks, symmetrical or asymmetrical topologies. As case study, we use the proposed technique in a dynamic reconfigurable system, showing a major area reduction of 30% without performance overhead.

Microprocessors and Microsystems | 2014

A transparent and adaptive reconfigurable system

Antonio Carlos Schneider Beck; Mateus B. Rutzig; Luigi Carro

In the current scenario, where computer systems are characterized by a high diversity of applications coexisting in a single device, and with the stagnation in frequency scaling because of the excessive power dissipation, reconfigurable systems have already proven to be very effective. However, they all present two major drawbacks, which are addressed by this work: lack of transparency (the need for special tools or compilers that changes the original code) and no ability to adapt to applications with different behaviors and characteristics, so significant gains are achieved only in very specific data stream oriented applications. Therefore, this work proposes the Dynamic Instruction Merging (DIM), a Binary Translation mechanism responsible for transforming sequences of instructions into a coarse-grained array configuration at run-time, in a totally transparent process, with support to speculative execution. The proposed system does not impose any kind of modification to the source or binary codes, so full binary compatibility is maintained. Moreover, it can optimize any application, even those that do not present specific kernels for optimization. DIM presents, on average, 2.7 times of performance gains and 2.35 times of energy savings over a MIPS processor, and a higher IPC than an out-of-order superscalar processor, running the MIBench benchmark set.

ieee international symposium on parallel distributed processing workshops and phd forum | 2010

A low-energy approach for context memory in reconfigurable systems

Thiago Berticelli Lo; Antonio Carlos Schneider Beck; Mateus B. Rutzig; Luigi Carro

In most of the works concerning reconfigurable computing, the main objective is system optimization by taking into account the known requirements of a project, such as speedup, energy or area. However, as it will be shown in this paper, although very significant, the impact of the context memory is often ignored. Since the context memory is responsible for keeping configurations of the reconfigurable unit, the word size and hence the number of output bits is orders of magnitude larger than the regular memories, considerably increasing the energy consumption and area occupation. Therefore, in this article we propose a technique to handle these issues, while maintaining system performance. Using as case study a coarse-grain architecture tightly coupled to the MIPS R3000 processor, we show that the context memory can represent up to 63% of the total system energy and, by using the proposed approach, it is possible to save 59% of this amount, without any performance penalties.

applied reconfigurable computing | 2009

Dynamically Adapted Low Power ASIPs

Mateus B. Rutzig; Antonio Carlos Schneider Beck; Luigi Carro

New generations of embedded devices, following the trend found in personal computers, are becoming computationally powerful. A current embedded scenario presents a large amount of complex and heterogeneous functionalities, which have been forcing designers to create novel solutions to increase the performance of embedded processors while, at the same time, maintain power dissipation as low as possible. Former embedded devices could have been designed to execute a defined application set. Nowadays, in the new generation of these devices, some applications are unknown at design time. For example, in portable phones, the client is able to download new applications during the product lifetime. Hence, traditional designs can fail to deliver the required performance while executing an application behavior that has not been previously defined. On the other hand, reconfigurable architectures appear to be a possible solution to increase the processor performance, but their employment in embedded devices faces two main design constraints: power and area. In this work, we propose an ASIP reconfigurable development flow that aggregates design area optimization and a run-time technique that reduces energy consumption. The coupling of both methods builds an area optimized reconfigurable architecture to provide a high-performance and energy-efficient execution of a defined application set. Moreover, thanks to the adaptability provided by the reconfigurable ASIP approach, the execution of new application not foreseen at design time still shows high speedups rates with low energy consumption.

design, automation, and test in europe | 2016

A reconfigurable heterogeneous multicore with a homogeneous ISA

Jeckson Dellagostin Souza; Luigi Carro; Mateus B. Rutzig; Antonio Carlos Schneider Beck

Given the large diversity of embedded applications one can find in current portable devices, for energy and performance reasons one must exploit both Thread- and Instruction Level Parallelism. While MPSoCs are largely used for this purpose, they fail when one considers software productivity, since it comprises different ISAs that must be programmed separately. On the other hand, general purpose multicores implement the same ISA, but are composed of a homogeneous set of very power consuming superscalar processors. In this paper we show how one can effectively use a regular fabric to provide a number of different possible heterogeneous configurations while still sustaining the same ISA. This is done by leveraging the intrinsic regularity of a reconfigurable fabric, so several different organizations can be easily built with little effort. To ensure ISA compatibility, we use a binary translation mechanism that transforms code to be executed on the fabric at run-time. Using representative benchmarks, we show that one version of the heterogeneous system can outperform its homogenous counterpart in average by 59% in performance and 10% in energy, with EDP improvements in almost every scenario.

applied reconfigurable computing | 2011

CReAMS: an embedded multiprocessor platform

Mateus B. Rutzig; Antonio Carlos Schneider Beck; Luigi Carro

As the number of embedded applications is increasing, the current strategy of the companies is to launch a new platform within short periods of time to execute them efficiently with low energy consumption. However, for each new platform deployment, new tool chains come along, with additional libraries, debuggers and compilers. This strategy implies high hardware redesign costs, breaks binary compatibility and results in a high overhead in the software development process. Therefore, focusing on area savings, low energy consumption, binary compatibility maintenance and mainly software productivity improvement, we propose the exploitation of Custom Reconfigurable Arrays for Multiprocessor System (CReAMS). CReAMS is composed of multiple adaptive reconfigurable systems to efficiently exploit Instruction and Thread Level Parallelism (ILP and TLP) at hardware level, in a totally transparent fashion. Assuming the same chip area of a multiprocessor platform, the proposed architecture shows a reduction of 37% in energy-delay product (EDP) on average, when running applications with different amounts of ILP and TLP.

design, automation, and test in europe | 2013

A transparent and energy aware reconfigurable multiprocessor platform for simultaneous ILP and TLP exploitation

Mateus B. Rutzig; Antonio Carlos Schneider Beck; Luigi Carro

As the number of embedded applications increases, companies are launching new platforms within short periods of time to efficiently execute software with the lowest possible energy consumption. However, for each new platform deployment, new tool chains, with additional libraries, debuggers and compilers must come along, breaking binary compatibility. This strategy implies in high hardware and software redesign costs. In this scenario, we propose the exploitation of Custom Reconfigurable Arrays for Multiprocessor Systems (CReAMS). CReAMS is composed of multiple adaptive reconfigurable processors that simultaneously exploit Instruction and Thread Level Parallelism. It works in a transparent fashion, so binary compatibility is maintained, with no need to change the software development process or environment. We also show that CReAMS delivers higher performance per watt in comparison to a 4-issue Superscalar processor, when the same power budget is considered for both designs.

international parallel and distributed processing symposium | 2008

Balancing reconfigurable data path resources according to application requirements

Mateus B. Rutzig; Antonio Carlos Schneider Beck; Luigi Carro

Processor architectures are changing mainly due to the excessive power dissipation and the future break of Moores law. Thus, new alternatives are necessary to sustain the performance increase of the processors, while still allowing low energy computations. Reconfigurable systems are strongly emerging as one of these solutions. However, because they are very area consuming and deal with a large number of applications with diverse behaviors, new tools must be developed to automatically handle this new problem. This way, in this work we present a tool aimed to balance the reconfigurable area occupied with the performance required by a given application, calculating the exact size and shape of a reconfigurable data path. Using as case study a tightly coupled reconfigurable array and the Mibench Benchmark set, we show that the solution found by the proposed tool saves four times area in comparison with the non-optimized version of the reconfigurable logic, with a decrease of only 5.8% on average of its original performance. This way, we open new applications for reconfigurable devices as low cost accelerators.

Explore More