Milos D. Ercegovac

University of California, Los Angeles

Publication

Featured research published by Milos D. Ercegovac.

International Conference on VLSI Design | 2011

Trading Accuracy for Power with an Underdesigned Multiplier Architecture

Parag Kulkarni; Puneet Gupta; Milos D. Ercegovac

We propose a novel multiplier architecture with tunable error characteristics that leverages a modified inaccurate 2x2 building block. Our inaccurate multipliers achieve an average power saving of 31.78% to 45.4% over corresponding accurate multiplier designs, for an average error of 1.39% to 3.32%. Using image filtering and JPEG compression as sample applications, we show that our architecture can achieve 2X to 8X better signal-to-noise ratio (SNR) for the same power savings when compared to recent voltage over-scaling based power-error tradeoff methods. We project the multiplier power savings to bigger designs, highlighting the fact that the benefits are strongly design dependent. We compare this circuit-centric approach to power-quality tradeoffs with a pure software adaptation approach for a JPEG example. We also enhance the design to allow for correct operation of the multiplier using a residual adder, for non-error-resilient applications.
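As a rough illustration of how an inaccurate 2x2 building block can be composed into a wider multiplier, the following Python sketch may help. The abstract does not give the block's truth table, so the sketch assumes the commonly described variant in which only 3 x 3 is wrong (7 instead of 9, so the output fits in three bits). The function names `approx_mul2x2` and `approx_mul` are hypothetical, and the recursive composition is one plausible arrangement, not necessarily the authors' design.

```python
def approx_mul2x2(a, b):
    """2x2 unsigned block, exact except for 3 * 3.

    Assumption (not stated in the abstract): 3 * 3 returns 7 instead of 9,
    which lets the block use a 3-bit output.
    """
    return 7 if (a == 3 and b == 3) else a * b


def approx_mul(a, b, bits):
    """Multiply two unsigned `bits`-wide operands using only 2x2 blocks.

    `bits` must be a power of two and at least 2. The partial products of
    the half-width operands are added exactly; only the 2x2 leaves are
    inexact.
    """
    if bits == 2:
        return approx_mul2x2(a, b)
    half = bits // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    high = approx_mul(a_hi, b_hi, half)
    cross = approx_mul(a_hi, b_lo, half) + approx_mul(a_lo, b_hi, half)
    low = approx_mul(a_lo, b_lo, half)
    return (high << bits) + (cross << half) + low
```

Under these assumptions, operands whose 2-bit slices never pair 3 with 3 are multiplied exactly (for example, 5 x 6 gives 30), while 15 x 15 gives 175 rather than 225.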

28th Annual Technical Symposium | 1984

On-Line Arithmetic: An Overview

Milos D. Ercegovac

We discuss in a tutorial manner the principles and techniques of on-line arithmetic. Several examples of on-line algorithms for the basic operations, the evaluation of vector and matrix expressions, solving linear systems and evaluating polynomials, are used to illustrate the characteristics of on-line arithmetic.

IEEE Transactions on Computers | 1990

Fast multiplication without carry-propagate addition

Milos D. Ercegovac; Tomás Lang

Conventional schemes for fast multiplication accumulate the partial products in redundant form (carry-save or signed-digit) and convert the result to conventional representation in the last step. This step requires a carry-propagate adder which is comparatively slow and occupies a significant area of the chip in a VLSI implementation. A report is presented on a multiplication scheme (left-to-right, carry-free, LRCF) that does not require this carry-propagate step. The LRCF scheme performs the multiplication most-significant bit first and produces a conventional sign-and-magnitude product (most significant n bits) by means of an on-the-fly conversion. The resulting implementation is fast and regular and is very well suited for VLSI. The LRCF scheme for general radix r and a radix-4 signed-digit implementation are presented.
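The on-the-fly conversion mentioned in this abstract can be sketched in Python as follows. This is a minimal model of the general technique for signed-digit streams delivered most-significant digit first, not the LRCF multiplier itself: two values are maintained, `q` and `qm` (always equal to `q - 1`), and each new digit is appended to one of them, so no carry ever propagates. The function name and the use of Python integers in place of hardware registers are assumptions made for illustration.

```python
def on_the_fly_convert(digits, radix=2):
    """Convert signed digits (most significant first) to an integer.

    Each digit d must satisfy |d| < radix. Invariant: qm == q - 1.
    Every step only appends one non-negative digit to q or to qm,
    which is what makes the conversion carry-free.
    """
    q, qm = 0, -1
    for d in digits:
        if abs(d) >= radix:
            raise ValueError("digit out of range for this radix")
        # New q: append d to q, or append (radix - |d|) to qm when d < 0.
        q_next = radix * q + d if d >= 0 else radix * qm + (radix - abs(d))
        # New qm: append (d - 1) to q when d > 0, else (radix - 1 - |d|) to qm.
        qm_next = radix * q + (d - 1) if d > 0 else radix * qm + (radix - 1 - abs(d))
        q, qm = q_next, qm_next
    return q
```

For example, the radix-2 digit string 1, -1, 0, -1 represents 8 - 4 + 0 - 1 = 3, and the radix-4 string 1, -2, 2 represents 16 - 8 + 2 = 10.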

International Symposium on Microarchitecture | 2007

The Art of Deception: Adaptive Precision Reduction for Area Efficient Physics Acceleration

Thomas Y. Yeh; Petros Faloutsos; Milos D. Ercegovac; Sanjay J. Patel; Glenn Reinman

Physics-based animation has enormous potential to improve the realism of interactive entertainment through dynamic, immersive content creation. Despite the massively parallel nature of physics simulation, fully exploiting this parallelism to reach interactive frame rates will require significant area to place the large number of cores. Fortunately, interactive entertainment requires believability rather than accuracy. Recent work shows that real-time physics has a remarkable tolerance for reduced precision of the significand in floating-point (FP) operations. In this paper, we describe an architecture with a hierarchical floating-point unit (FPU) that leverages dynamic precision reduction to enable efficient FPU sharing among multiple cores. This sharing reduces the area required by these cores, thereby allowing more cores to be packed into a given area and exploiting more parallelism.
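To make "reduced precision of the significand" concrete, the sketch below truncates the fraction field of an IEEE 754 single-precision value in software. It is only an illustration of the idea: the abstract does not say how the hardware reduces precision, so truncation (rather than rounding) and the helper name `reduce_significand` are assumptions.

```python
import struct


def reduce_significand(x, kept_bits):
    """Keep only the top `kept_bits` fraction bits of a float32 value.

    Assumption: precision is reduced by truncation toward zero; the
    sign and exponent fields are left untouched.
    """
    if not 0 <= kept_bits <= 23:
        raise ValueError("float32 has 23 explicit fraction bits")
    (raw,) = struct.unpack("<I", struct.pack("<f", x))
    drop = 23 - kept_bits
    raw &= ~((1 << drop) - 1) & 0xFFFFFFFF  # clear the low fraction bits
    (y,) = struct.unpack("<f", struct.pack("<I", raw))
    return y
```

With one fraction bit kept, 3.75 (binary 1.111 x 2^1) becomes 3.0 (binary 1.1 x 2^1); with all 23 bits kept, a value that is exactly representable in float32 is returned unchanged.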

IEEE Transactions on Computers | 2005

High-performance low-power left-to-right array multiplier design

Zhijun Huang; Milos D. Ercegovac

We present a high-performance low-power design of linear array multipliers based on a combination of the following techniques: signal flow optimization in [3:2] adder array for partial product reduction, left-to-right leapfrog (LRLF) signal flow, and splitting of the reduction array into upper/lower parts. The resulting upper/lower LRLF (ULLRLF) multiplier is compared with tree multipliers. From automatic layout experiments, we find that ULLRLF multipliers have similar power, delay, and area as tree multipliers for n ≤ 32. With more regularity and inherently shorter interconnects, the ULLRLF structure presents a competitive alternative to tree structures in the design of fast low-power multipliers implemented in deep submicron VLSI technology.

IEEE Transactions on Computers | 1990

Simple radix-4 division with operands scaling

Milos D. Ercegovac; Tomás Lang

A radix-4 division algorithm with operands scaling is proposed. The algorithm uses a recurrence with redundant addition (carry-save or signed-digit) and combines simple scaling with a quotient-selection function that depends only on the estimate of the partial remainder and is independent of the divisor. The scheme results in a significant speedup with respect to both radix-2 and radix-4 division without scaling.

IEEE Transactions on Computers | 2004

Algorithm and architecture for logarithm, exponential, and powering computation

José-Alejandro Piñeiro; Milos D. Ercegovac; Javier D. Bruguera

An architecture for the computation of logarithm, exponential, and powering operations is presented in this paper, based on a high-radix composite algorithm for the computation of the powering function (X^Y). The algorithm consists of a sequence of overlapped operations: 1) digit-recurrence logarithm, 2) left-to-right carry-free (LRCF) multiplication, and 3) online exponential. A redundant number system is used and the selection in 1) and 3) is done by rounding except in the first iteration, when selection by table look-up is necessary to guarantee the convergence of the recurrences. A sequential implementation of the algorithm, with a control unit which allows the independent computation of logarithm and exponential, is proposed and the execution times and hardware requirements are estimated for single and double-precision floating-point computations. These estimates are obtained for radices from r=8 to r=1,024, according to an approximate model for the delay and area of the main logic blocks, and help determine the radix values which lead to the most efficient implementations: r=32 and r=128.

Field-Programmable Custom Computing Machines | 1998

A variable long-precision arithmetic unit design for reconfigurable coprocessor architectures

Alexandre F. Tenca; Milos D. Ercegovac

This paper presents the organization of an arithmetic unit for variable long-precision (VLP) operands suitable for reconfigurable computing. The reconfigurable arithmetic coprocessor (RAC) cooperates with the host computer in the VLP tasks. The main design issues addressed in the paper are: (a) mapping of the most frequent and time-consuming operations of the VLP arithmetic algorithms to RAC, and (b) design of VLP algorithms that allow reduced reconfiguration time between arithmetic operations. The VLP arithmetic algorithms proposed cover multiplication, division and square root. In this paper we present the main building blocks used in the VLP arithmetic circuits, show the similarities of each arithmetic operator and present area/time estimates of these circuits in Xilinx FPGAs.

IEEE Transactions on Computers | 1973

Radix-16 Evaluation of Certain Elementary Functions

Milos D. Ercegovac

This paper describes a family of algorithms for evaluation of a class of elementary functions including division, logarithms, and exponentials. The main objective is to demonstrate the feasibility of higher radix implementations, in particular, radix 16, and to compare performance with radix 2. The emphasis is not on optimality of a single algorithm, but rather on the optimality of a class of algorithms. An attempt to implement a much wider class of functions than is presently done in arithmetic units is encouraged by the current level of digital technology and the existence of suitable algorithms. Besides the definitions of the algorithms, which are based on continued products and continued sums, details related to implementation are discussed.

IEEE Transactions on Computers | 2000

Improving Goldschmidt division, square root, and square root reciprocal

Milos D. Ercegovac; Laurent Imbert; David W. Matula; Jean-Michel Muller; Guoheng Wei

The aim of this paper is to accelerate division, square root, and square root reciprocal computations when the Goldschmidt method is used on a pipelined multiplier. This is done by replacing the last iteration by the addition of a correcting term that can be looked up during the early iterations. We describe several variants of the Goldschmidt algorithm, assuming a 4-cycle pipelined multiplier, and discuss the number of cycles obtained and the error achieved. Extensions to other than 4-cycle multipliers are given. If we call G_m the Goldschmidt algorithm with m iterations, our variants allow us to reach an accuracy that is between that of G_3 and that of G_4, with a number of cycles equal to that of G_3.
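For readers unfamiliar with the baseline, the plain Goldschmidt division iteration that this abstract builds on can be sketched as follows. This is the textbook form only: it omits the table-based initial approximation that practical implementations use, and it does not include the paper's correcting term. The function name and the requirement that the divisor be pre-scaled into [0.5, 1) are assumptions of this sketch.

```python
def goldschmidt_divide(n, d, iterations=4):
    """Approximate n / d with the basic Goldschmidt iteration.

    Numerator and denominator are multiplied by the same factor
    f = 2 - d each step, which drives d toward 1 and n toward n / d.
    Assumes d has been scaled into [0.5, 1); the relative error after
    k iterations is (1 - d) ** (2 ** k).
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("d must be scaled into [0.5, 1)")
    for _ in range(iterations):
        f = 2.0 - d  # correction factor for this step
        n *= f       # the two multiplications are independent,
        d *= f       # so they can overlap in a pipelined multiplier
    return n
```

For instance, with d = 0.75 the initial error 1 - d is 0.25, so three iterations leave a relative error of about 0.25^8 (roughly 1.5e-5), and each further iteration squares it.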
