Antonio Carlos Schneider Beck

Publication

Featured research published by Antonio Carlos Schneider Beck.

Design, Automation, and Test in Europe | 2008

Transparent reconfigurable acceleration for heterogeneous embedded applications

Antonio Carlos Schneider Beck; Mateus B. Rutzig; Georgi Gaydadjiev; Luigi Carro

Embedded systems are becoming increasingly complex. Besides the additional processing capabilities, they are characterized by a high diversity of computational models coexisting in a single device. Although reconfigurable architectures have already been shown to be a potential solution for such systems, they deliver significant speedups only for very specific dataflow-oriented kernels. Furthermore, reconfigurable fabric is still held back by the need for special tools and compilers, which clearly does not sustain backward software compatibility. In this paper, we propose a new technique to optimize both dataflow and control-flow oriented code in a totally transparent process, without the need for any modification to the source or binary code. To that end, we have developed a Binary Translation algorithm implemented in hardware, which works in parallel with a MIPS processor. The proposed mechanism is responsible for transforming sequences of instructions at runtime to be executed on a dynamic coarse-grain reconfigurable array, supporting speculative execution. Executing the MIBench suite, we show performance improvements of up to 2.5 times, while reducing the required energy by 1.7 times, using trivial hardware resources.

Design Automation Conference | 2005

Dynamic reconfiguration with binary translation: breaking the ILP barrier with software compatibility

Antonio Carlos Schneider Beck; Luigi Carro

In this paper we present the impact of dynamically translating any sequence of instructions into combinational logic. The proposed approach combines a reconfigurable architecture with a binary translation mechanism and is totally transparent to the software designer. Besides ensuring software compatibility, the technique allows porting the same code to different machines as technology evolves. The target processor is a Java machine able to execute Java bytecodes. Experimental results show that even code without any available parallelism can benefit from the proposed approach. Algorithms used in the embedded systems domain were accelerated 4.6 times on average, while consuming 10.89 times less energy on average. We present results regarding the impact on area and power, and compare the proposed approach with other Java machines, including a VLIW one.

IEEE Transactions on Very Large Scale Integration Systems | 2006

Low Power Java Processor for Embedded Applications

Antonio Carlos Schneider Beck; Luigi Carro

This chapter presents a low power architecture of a Java processor. We show that techniques such as pipelining and implementing the stack in a register bank instead of in main memory allow an aggressive reduction of power dissipation, with a very small area overhead. Moreover, thanks to the forwarding technique and to the specific stack machine organization, large power savings can be obtained when applying this technique to a pipelined implementation of the architecture. Several examples of embedded applications are used to show the power savings obtained through the architecture optimization.

Archive | 2012

Adaptable Embedded Systems

Antonio Carlos Schneider Beck; Carlos Arthur Lang Lisbôa; Luigi Carro

As embedded systems become more complex, designers face a number of challenges at different levels: they need to boost performance while keeping energy consumption as low as possible, they need to reuse existing software code, and at the same time they need to take advantage of the extra logic available in the chip, represented by multiple processors working together. This book describes several strategies to achieve such different and interrelated goals through the use of adaptability. Coverage includes reconfigurable systems, dynamic optimization techniques such as binary translation and trace reuse, new memory architectures including homogeneous and heterogeneous multiprocessor systems, communication issues and NoCs, fault tolerance against fabrication defects and soft errors, and finally, how one can combine several of these techniques to achieve higher levels of performance and adaptability. The discussion also includes how to employ specialized software to improve this new adaptive system, and how this new kind of software must be designed and programmed.

Symposium on Integrated Circuits and Systems Design | 2004

A VLIW low power Java processor for embedded applications

Antonio Carlos Schneider Beck; Luigi Carro

This paper presents a pioneering VLIW architecture of a native Java processor. We show that, thanks to the specific stack architecture and to the use of the VLIW technique, one is able to obtain a meaningful reduction of power dissipation, with small area overhead, when compared to other ways of executing Java in hardware. The underlying technique is based on the reuse of memory access instructions, hence reducing power during memory or cache accesses. The architecture is validated for some complex embedded applications such as IMDCT computation and other data processing benchmarks.

Applied Reconfigurable Computing | 2008

Run-Time Adaptable Architectures for Heterogeneous Behavior Embedded Systems

Antonio Carlos Schneider Beck; Mateus B. Rutzig; Georgi Gaydadjiev; Luigi Carro

As embedded applications are getting more complex, they are also demanding highly diverse computational capabilities. The majority of previously proposed reconfigurable architectures target static data stream oriented applications, optimizing very specific computational kernels, which corresponds to the typical characteristics of embedded systems in the past. Modern embedded devices, however, impose totally new requirements. They are expected to support a wide variety of programs on a single platform. Besides getting more heterogeneous, these applications have very distinct behaviors. In this paper we explore this trend in more detail. First, we present a study of the behavioral differences of embedded applications based on the MIBench benchmark suite. Thereafter, we analyze the potential optimizations and constraints for two different run-time dynamic reconfigurable architectures with distinct programmability strategies: a fine-grain FPGA-based accelerator and a coarse-grain array composed of ordinary functional units. Finally, we demonstrate that reconfigurable systems that are focused on single data stream behavior may not suffice anymore.

International Parallel and Distributed Processing Symposium | 2009

A low cost and adaptable routing network for reconfigurable systems

Ricardo S. Ferreira; Marcone Laure; Antonio Carlos Schneider Beck; Thiago Berticelli Lo; Mateus B. Rutzig; Luigi Carro

Nowadays, scalability, parallelism and fault-tolerance are key features for taking advantage of the latest advances in silicon technology, and that is why reconfigurable architectures are in the spotlight. However, one of the major problems in designing reconfigurable and parallel processing elements concerns the design of a cost-effective interconnection network. Considering that Multistage Interconnection Networks (MINs) have been successfully used at several computer system levels and in several applications in the past, in this work we propose the use of a MIN, at the word level, on a coarse-grained reconfigurable architecture. More precisely, this work presents a novel parallel self-placement and routing mechanism for MINs in circuit-switching mode. We take into account one-to-one as well as multicast (one-to-many) permutations. Our approach is scalable and is targeted at run-time environments where dynamic routing among functional units is required. In addition, our algorithm is embedded in the switch structure, and it is independent of the interstage interconnection pattern. Our approach can handle blocking and non-blocking networks, and symmetrical or asymmetrical topologies. As a case study, we use the proposed technique in a dynamic reconfigurable system, showing a major area reduction of 30% without performance overhead.

Microprocessors and Microsystems | 2014

A transparent and adaptive reconfigurable system

Antonio Carlos Schneider Beck; Mateus B. Rutzig; Luigi Carro

In the current scenario, where computer systems are characterized by a high diversity of applications coexisting in a single device, and where frequency scaling has stagnated because of excessive power dissipation, reconfigurable systems have already proven to be very effective. However, they all present two major drawbacks, which are addressed by this work: lack of transparency (the need for special tools or compilers that change the original code) and no ability to adapt to applications with different behaviors and characteristics, so significant gains are achieved only in very specific data stream oriented applications. Therefore, this work proposes Dynamic Instruction Merging (DIM), a Binary Translation mechanism responsible for transforming sequences of instructions into a coarse-grained array configuration at run-time, in a totally transparent process, with support for speculative execution. The proposed system does not impose any kind of modification on the source or binary code, so full binary compatibility is maintained. Moreover, it can optimize any application, even those that do not present specific kernels for optimization. Running the MIBench benchmark set, DIM achieves, on average, a 2.7 times performance gain and 2.35 times energy savings over a MIPS processor, and a higher IPC than an out-of-order superscalar processor.

IEEE Transactions on Very Large Scale Integration Systems | 2007

Transparent acceleration of data dependent instructions for general purpose processors

Antonio Carlos Schneider Beck; Luigi Carro

Although transistor scaling keeps following Moore’s law, and more area is available to designers, the clock frequency and ILP rate no longer grow at the same pace. New architectural alternatives are therefore necessary. Reconfigurable fabric appears to be one emerging possibility: besides exploiting the parallelism among instructions, it can also accelerate sequences of data dependent ones. However, widespread use of coarse-grain reconfiguration is still held back by the need for special tools and compilers, which clearly do not sustain the reuse of legacy code without modification. Based on these facts, this work proposes a new Binary Translation algorithm, implemented in hardware and working in parallel with the processor, responsible for transforming sequences of instructions at run-time to be executed on a dynamic coarse-grain reconfigurable array, tightly coupled to a traditional RISC machine. We can thus take advantage of pure combinational logic to optimize even control-flow oriented code in a totally transparent process, without any modification to the source or binary code. Using the SimpleScalar Toolset together with the embedded benchmark suite MIBench, we show performance improvements and an area evaluation in comparison with traditional superscalar architectures.

IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum | 2010

A low-energy approach for context memory in reconfigurable systems

Thiago Berticelli Lo; Antonio Carlos Schneider Beck; Mateus B. Rutzig; Luigi Carro

In most work on reconfigurable computing, the main objective is system optimization that takes into account the known requirements of a project, such as speedup, energy or area. However, as shown in this paper, the impact of the context memory, although very significant, is often ignored. Since the context memory is responsible for keeping the configurations of the reconfigurable unit, its word size, and hence its number of output bits, is orders of magnitude larger than in regular memories, considerably increasing energy consumption and area occupation. Therefore, in this article we propose a technique to handle these issues while maintaining system performance. Using as a case study a coarse-grain architecture tightly coupled to the MIPS R3000 processor, we show that the context memory can represent up to 63% of the total system energy and that, by using the proposed approach, it is possible to save 59% of this amount without any performance penalty.
