Pablo Balzola | Researchain

Publication

Featured researches published by Pablo Balzola.

international conference on computer design | 2001

Design alternatives for parallel saturating multioperand adders

Pablo Balzola; Michael J. Schulte; Jie Ruan; C. John Glossner; Erdem Hokenek

Parallel saturating multioperand adders significantly improve the performance of global system for mobile (GSM) speech coders by giving compilers and assembly language programmers the ability to parallelize loops containing saturating dot products, while maintaining GSM compliant results. This paper presents four designs for parallel saturating multioperand adders. These designs have at most one carry-propagate adder on their critical delay path, yet produce the same results that would be obtained if the additions were performed serially with saturation after each addition. The four parallel designs offer tradeoffs in terms of area, worst case delay, and dot product latency. Compared to a 5-input serial design, the 5-input parallel designs have delays up to 3.51 times shorter.
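The requirement that a parallel design match serial saturation can be made concrete with a small behavioral model. The sketch below is a software reference in Python only, not one of the four hardware designs from the paper; the function names and the 8-bit width used in the example are assumptions chosen for illustration. It shows that saturating once at the end is not equivalent to saturating after each addition.

```python
def saturate(value, bits):
    """Clamp value to the two's complement range of the given width."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def serial_saturating_sum(operands, bits=32):
    """Reference behavior: saturate after every single addition."""
    acc = 0
    for x in operands:
        acc = saturate(acc + x, bits)
    return acc


def saturate_once_sum(operands, bits=32):
    """Naive shortcut: add everything exactly, saturate only at the end."""
    return saturate(sum(operands), bits)


# 8-bit example: the intermediate sum 200 clips to 127 in the serial model.
operands = [100, 100, -100]
print(serial_saturating_sum(operands, bits=8))  # 27
print(saturate_once_sum(operands, bits=8))      # 100
```

A parallel adder that is meant to stay bit-compatible with serial code has to reproduce the first result, which is why it needs more than a plain multioperand sum followed by a single clamp.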

compilers, architecture, and synthesis for embedded systems | 2000

Parallel saturating multioperand adders

Michael J. Schulte; Pablo Balzola; Jie Ruan; C. John Glossner

This paper presents designs for parallel saturating multioperand adders. These adders have only a single carry-propagate adder on the critical delay path, yet produce the same results that would be obtained if the additions were performed serially with saturation after each operation. When used with parallel saturating multipliers or multiply-accumulate units, these adders significantly improve the performance of GSM speech coders. They can also easily be modified to perform either saturating or wraparound multioperand addition, based on an input control signal. Since parallel saturating multioperand adders have more area and less delay than serial saturating multioperand adders, they are suitable for high-performance digital signal processing systems.

IEEE Transactions on Very Large Scale Integration Systems | 2001

Towards a very high bandwidth wireless battery powered device

John Glossner; David Routenberg; Erdem Hokenek; Mayan Moudgill; Michael J. Schulte; Pablo Balzola; Stamatis Vassiliadis

We discuss the hardware and software challenges in building a 2 Mbit per second wireless battery powered communications device. Of primary importance is power dissipation. To achieve aggressive power targets, a host of new techniques are required at all levels of the design hierarchy. Techniques for parallelizing saturating arithmetic will become important because of the software optimizations they enable. Highly configurable programmable structures will enable multiprotocol SOC solutions. To program complex SOCs, new compiler techniques will be required. Hardware implementations will need to be intimately aware of these software techniques. In particular both signal processing code written in C and control code written in Java will drive new compilation techniques to enable broadband 3G wireless systems.

conference on advanced signal processing algorithms architectures and implementations | 2000

Combined unsigned and two's complement saturating multipliers

Michael J. Schulte; Mustafa Gok; Pablo Balzola; Robert W. Brocato

In many digital signal processing and multimedia applications, results that overflow are saturated to the most positive or most negative representable number. This paper presents efficient techniques for performing saturating n-bit integer multiplication on unsigned and two's complement numbers. Unlike conventional techniques for saturating multiplication, which compute a 2n-bit product and then examine the n most significant product bits to determine if overflow has occurred, the techniques presented in this paper compute only the (n + 1) least significant bits of the product. Specialized overflow detection units, which operate in parallel with the multiplier, determine if overflow has occurred and the product should be saturated. These techniques are applied to designs for saturating array multipliers that perform either unsigned or two's complement saturating integer multiplication, based on an input control signal. Compared to array multipliers that use conventional methods for saturation, these multipliers have about half as much area and delay.
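As a point of reference, the result that a saturating multiplier must deliver can be modeled in a few lines of Python. This sketch follows the conventional approach (form the full product, then clamp); it does not reproduce the paper's technique of computing only the (n + 1) least significant bits with parallel overflow detection. The function name, the `signed` flag standing in for the input control signal, and the 16-bit default are assumptions for illustration.

```python
def saturating_multiply(a, b, bits=16, signed=True):
    """Multiply two n-bit integers and clamp the result to n bits.

    signed=True  models two's complement operands,
    signed=False models unsigned operands.
    """
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operand does not fit in the selected format")
    product = a * b  # exact product; Python integers do not overflow
    return max(lo, min(hi, product))


print(saturating_multiply(200, 200))                 # 32767
print(saturating_multiply(-200, 200))                # -32768
print(saturating_multiply(300, 300, signed=False))   # 65535
print(saturating_multiply(100, 100))                 # 10000
```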

application-specific systems, architectures, and processors | 2004

A low-power carry skip adder with fast saturation

Michael J. Schulte; Kai Chirca; John Glossner; Haoran Wang; Suman Mamidi; Pablo Balzola; Stamatis Vassiliadis

We present the design of a carry skip adder that achieves low power dissipation and high-performance operation. The carry skip adder's delay and power dissipation are reduced by dividing the adder into variable-sized blocks that balance the delay of inputs to the carry chain. This grouping reduces active power by minimizing extraneous glitches and transitions. Each block also uses highly optimized complementing carry look-ahead logic to reduce delay. Compared to previous designs, the adder architecture decreases power consumption by reducing the number of transistors, logic levels, and glitches. A 32-bit carry skip adder design that uses our approach has been implemented in 130 nm CMOS technology. At 1.2 V and 25 °C, the 32-bit adder has a critical path delay of 921 ps and average power dissipation normalized to 600 MHz operation of 0.786 mW. We also present a technique to quickly perform saturating addition, which is useful in a variety of digital signal processing and multimedia applications. Our technique for fast saturation is based on techniques for carry select addition and works particularly well when the input and output operands can have different formats. A 40-bit carry skip adder that uses our technique for fast saturation has critical path delays of 1149 ps in 130 nm technology at 1.2 V and 25 °C and 560 ps in 90 nm technology at 1.0 V and 25 °C. The 40-bit adder's average power dissipation normalized to 600 MHz operation is 0.928 mW in 130 nm technology and 0.335 mW in 90 nm technology.
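The general carry skip idea with variable-sized blocks can be illustrated with a behavioral Python model. This is a sketch of the textbook technique only: the block sizes below are hypothetical (the abstract does not give the partition used), and the model says nothing about the complementing look-ahead logic, timing, or power of the actual design. Within each block, if every bit position propagates (the operand bits differ), the incoming carry bypasses the block unchanged.

```python
def carry_skip_add(a, b, block_sizes=(2, 3, 4, 5, 6, 5, 4, 3), carry_in=0):
    """Behavioral model of a carry skip adder.

    block_sizes is a hypothetical partition (it sums to 32 bits here).
    Returns (sum modulo 2**width, carry_out).
    """
    width = sum(block_sizes)
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    result = 0
    carry = carry_in
    pos = 0
    for size in block_sizes:
        block_mask = (1 << size) - 1
        a_blk = (a >> pos) & block_mask
        b_blk = (b >> pos) & block_mask
        # Block propagate: every bit pair differs, so a carry-in passes through.
        propagate = (a_blk ^ b_blk) == block_mask
        total = a_blk + b_blk + carry
        result |= (total & block_mask) << pos
        # Skip path forwards the carry-in; otherwise use the block's own carry-out.
        carry = carry if propagate else total >> size
        pos += size
    return result, carry


print(carry_skip_add(0xFFFFFFFF, 1))          # (0, 1)
print(carry_skip_add(123456789, 987654321))   # (1111111110, 0)
```

In hardware the skip path shortens the carry chain; in this model it only confirms that bypassing a fully propagating block gives the same carry as rippling through it.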

asilomar conference on signals, systems and computers | 2001

Efficient integer multiplication overflow detection circuits

Mustafa Gok; Michael J. Schulte; Pablo Balzola

Multiplication of two n-bit integers produces a 2n-bit product. To allow the result to be stored in the same format as the inputs, many processors return the n least significant bits of the product and an overflow flag. This paper describes methods for integer multiplication with overflow detection for unsigned and two's complement numbers. A method for combining unsigned and two's complement integer multiplication with overflow detection is also presented. The overflow detection circuits presented in this paper have O(n) gates and O(log(n)) delay, which makes them more efficient than previous overflow detection circuits.
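The interface described here, n low-order product bits plus an overflow flag, can be modeled functionally as follows. This Python sketch decides overflow by comparing against the exact product, which is the straightforward reference definition; it is not the O(n)-gate, O(log(n))-delay detection circuit from the paper. The function name, the `signed` flag, and the 16-bit default are assumptions for illustration.

```python
def multiply_with_overflow(a, b, bits=16, signed=True):
    """Return (n-bit result, overflow flag) for an n-bit multiplication.

    The result holds the n least significant product bits, interpreted as
    two's complement when signed=True and as unsigned otherwise.
    """
    mask = (1 << bits) - 1
    product = a * b               # exact product
    low = product & mask          # n least significant bits
    if signed and (low & (1 << (bits - 1))):
        result = low - (1 << bits)  # sign bit set: reinterpret as negative
    else:
        result = low
    overflow = result != product  # true when the product does not fit
    return result, overflow


print(multiply_with_overflow(100, 100))                # (10000, False)
print(multiply_with_overflow(200, 200))                # (-25536, True)
print(multiply_with_overflow(300, 300, signed=False))  # (24464, True)
```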

international conference on systems | 2009

Synchronization on heterogeneous multiprocessor systems

Mayan Moudgill; Vitaly Kalashnikov; Murugappan Senthilvelan; Umesh Srikantiah; Tak-po Li; Pablo Balzola; John Glossner

To meet the exponential increase in processing requirements of present day embedded system applications, System-On-Chip (SoC) designs increasingly have multiple processing elements on the same die. The functionality of these processing elements varies considerably, and includes hardware accelerators for specific Digital Signal Processing (DSP) kernels, high-performance DSP cores, and low-power application processors. While executing applications, these processing elements typically share system memory and peripherals, and hence need synchronization to maintain system integrity. Further complicating the issue is the fact that these processing elements can be custom designed or off-the-shelf Intellectual Property (IP) cores that are generally not designed for operation in multiprocessor environments, and consequently lack multiprocessor synchronization support. Hence there is a need for simple and elegant low-power, low-latency techniques for synchronization support that can be seamlessly integrated and require little or no modifications to the already pre-verified processing elements. In this paper, we describe synchronization counters, a mechanism that allows seamless implementation of low-latency multiprocessor synchronization with incremental hardware penalty. This mechanism is usable in heterogeneous multiprocessor environments even when the individual processing elements lack native synchronization support. The synchronization counters are implemented and verified on a four-processor SoC targeted for handheld devices, the Sandbridge Technologies SB3500. The SoC contains three special purpose DSPs and an ARM application processor, sharing system memory and peripherals.

Archive | 2004