Milos D. Ercegovac
University of California, Los Angeles
Publication
Featured research published by Milos D. Ercegovac.
International Conference on VLSI Design | 2011
Parag Kulkarni; Puneet Gupta; Milos D. Ercegovac
We propose a novel multiplier architecture with tunable error characteristics that leverages a modified inaccurate 2x2 building block. Our inaccurate multipliers achieve an average power saving of 31.78% to 45.4% over corresponding accurate multiplier designs, for an average error of 1.39% to 3.32%. Using image filtering and JPEG compression as sample applications, we show that our architecture can achieve 2X to 8X better signal-to-noise ratio (SNR) for the same power savings when compared to recent voltage over-scaling based power-error tradeoff methods. We project the multiplier power savings to bigger designs, highlighting the fact that the benefits are strongly design dependent. We compare this circuit-centric approach to power-quality tradeoffs with a pure software adaptation approach for a JPEG example. We also enhance the design to allow for correct operation of the multiplier using a residual adder, for applications that are not error resilient.
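As a rough illustration of how a wider multiplier can be composed from an inaccurate 2x2 block, the Python sketch below assumes the block is exact except for 3 x 3, which returns 7 so that the output fits in three bits. The abstract does not specify the block's behavior, so that choice, the function names, and the recursive decomposition are assumptions made for illustration only. Partial products are summed exactly, and the operand width must be a power of two.

```python
def approx_mul2x2(a, b):
    # ASSUMPTION: the block is exact except for 3*3, which yields 7 (0b111)
    # instead of 9 so that the result fits in three bits.
    return 7 if (a == 3 and b == 3) else a * b


def approx_mul(a, b, bits):
    """Multiply two unsigned `bits`-wide integers using only 2x2 blocks.

    `bits` must be a power of two (2, 4, 8, ...). Operands are split into
    halves recursively; the four partial products are added exactly.
    """
    if bits == 2:
        return approx_mul2x2(a, b)
    half = bits // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    return ((approx_mul(a_hi, b_hi, half) << bits)
            + ((approx_mul(a_hi, b_lo, half) + approx_mul(a_lo, b_hi, half)) << half)
            + approx_mul(a_lo, b_lo, half))


def relative_error(a, b, bits):
    # Relative error of the approximate product for nonzero operands.
    exact = a * b
    return abs(exact - approx_mul(a, b, bits)) / exact
```

For example, `approx_mul(5, 6, 4)` is exact because no 2-bit slice pair equals (3, 3), while `approx_mul(15, 15, 4)` hits the inaccurate case in all four blocks.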
28th Annual Technical Symposium | 1984
Milos D. Ercegovac
We discuss in a tutorial manner the principles and techniques of on-line arithmetic. Several examples of on-line algorithms for the basic operations, the evaluation of vector and matrix expressions, solving linear systems and evaluating polynomials, are used to illustrate the characteristics of on-line arithmetic.
IEEE Transactions on Computers | 1990
Milos D. Ercegovac; Tomás Lang
Conventional schemes for fast multiplication accumulate the partial products in redundant form (carry-save or signed-digit) and convert the result to conventional representation in the last step. This step requires a carry-propagate adder which is comparatively slow and occupies a significant area of the chip in a VLSI implementation. A report is presented on a multiplication scheme (left-to-right, carry-free, LRCF) that does not require this carry-propagate step. The LRCF scheme performs the multiplication most-significant bit first and produces a conventional sign-and-magnitude product (most significant n bits) by means of an on-the-fly conversion. The resulting implementation is fast and regular and is very well suited for VLSI. The LRCF scheme for general radix r and a radix-4 signed-digit implementation are presented.
International Symposium on Microarchitecture | 2007
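The on-the-fly conversion mentioned above can be sketched in its general textbook form: signed digits arrive most significant first, and two digit strings are maintained, Q (the value so far) and QM (that value minus one unit in the last place), so each new digit is appended by concatenation and no carry ever propagates. The Python sketch below is not the LRCF scheme itself and does not produce sign-and-magnitude output; it assumes a non-negative overall value, and the function names are hypothetical.

```python
def on_the_fly_convert(digits, radix):
    """Convert signed digits (MSD first, each in -(radix-1)..radix-1)
    to conventional digits in 0..radix-1 without carry propagation.

    ASSUMPTION: the represented value is non-negative.
    """
    q_reg, qm_reg = [], []  # Q and QM = Q - 1 (in units of the last digit)
    for d in digits:
        # New Q: append d to Q, or borrow from the precomputed QM when d < 0.
        new_q = q_reg + [d] if d >= 0 else qm_reg + [radix + d]
        # New QM: append d - 1 to Q, or wrap around using QM when d <= 0.
        new_qm = q_reg + [d - 1] if d > 0 else qm_reg + [radix - 1 + d]
        q_reg, qm_reg = new_q, new_qm
    return q_reg


def digits_to_int(digits, radix):
    # Evaluate a digit string (signed or conventional), MSD first.
    value = 0
    for d in digits:
        value = value * radix + d
    return value
```

For the radix-4 signed-digit string `[1, -2, 0, -1]` (value 64 - 32 + 0 - 1 = 31), `on_the_fly_convert` returns `[0, 1, 3, 3]`, which is 31 in conventional radix 4.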
Thomas Y. Yeh; Petros Faloutsos; Milos D. Ercegovac; Sanjay J. Patel; Glenn Reinman
Physics-based animation has enormous potential to improve the realism of interactive entertainment through dynamic, immersive content creation. Despite the massively parallel nature of physics simulation, fully exploiting this parallelism to reach interactive frame rates will require significant area to place the large number of cores. Fortunately, interactive entertainment requires believability rather than accuracy. Recent work shows that real-time physics has a remarkable tolerance for reduced precision of the significand in floating-point (FP) operations. In this paper, we describe an architecture with a hierarchical floating-point unit (FPU) that leverages dynamic precision reduction to enable efficient FPU sharing among multiple cores. This sharing reduces the area required by these cores, thereby allowing more cores to be packed into a given area and exploiting more parallelism.
IEEE Transactions on Computers | 2005
Zhijun Huang; Milos D. Ercegovac
We present a high-performance low-power design of linear array multipliers based on a combination of the following techniques: signal flow optimization in [3:2] adder array for partial product reduction, left-to-right leapfrog (LRLF) signal flow, and splitting of the reduction array into upper/lower parts. The resulting upper/lower LRLF (ULLRLF) multiplier is compared with tree multipliers. From automatic layout experiments, we find that ULLRLF multipliers have power, delay, and area similar to those of tree multipliers for n ≤ 32. With more regularity and inherently shorter interconnects, the ULLRLF structure presents a competitive alternative to tree structures in the design of fast low-power multipliers implemented in deep submicron VLSI technology.
IEEE Transactions on Computers | 1990
Milos D. Ercegovac; Tomás Lang
A radix-4 division algorithm with operand scaling is proposed. The algorithm uses a recurrence with redundant addition (carry-save or signed-digit) and combines simple scaling with a quotient-selection function that depends only on the estimate of the partial remainder and is independent of the divisor. The scheme results in a significant speedup with respect to both the radix-2 and radix-4 schemes without scaling.
IEEE Transactions on Computers | 2004
José-Alejandro Piñeiro; Milos D. Ercegovac; Javier D. Bruguera
An architecture for the computation of logarithm, exponential, and powering operations is presented in this paper, based on a high-radix composite algorithm for the computation of the powering function (X^Y). The algorithm consists of a sequence of overlapped operations: 1) digit-recurrence logarithm, 2) left-to-right carry-free (LRCF) multiplication, and 3) online exponential. A redundant number system is used and the selection in 1) and 3) is done by rounding except in the first iteration, when selection by table look-up is necessary to guarantee the convergence of the recurrences. A sequential implementation of the algorithm, with a control unit which allows the independent computation of logarithm and exponential, is proposed and the execution times and hardware requirements are estimated for single and double-precision floating-point computations. These estimates are obtained for radices from r=8 to r=1,024, according to an approximate model for the delay and area of the main logic blocks, and help determine the radix values which lead to the most efficient implementations: r=32 and r=128.
Field-Programmable Custom Computing Machines | 1998
Alexandre F. Tenca; Milos D. Ercegovac
This paper presents the organization of an arithmetic unit for variable long-precision (VLP) operands suitable for reconfigurable computing. The reconfigurable arithmetic coprocessor (RAC) cooperates with the host computer in the VLP tasks. The main design issues addressed in the paper are: (a) mapping of the most frequent and time-consuming operations of the VLP arithmetic algorithms to the RAC, and (b) design of VLP algorithms that allow reduced reconfiguration time between arithmetic operations. The VLP arithmetic algorithms proposed cover multiplication, division and square root. In this paper we present the main building blocks used in the VLP arithmetic circuits, show the similarities of each arithmetic operator and present area/time estimates of these circuits in Xilinx FPGAs.
IEEE Transactions on Computers | 1973
Milos D. Ercegovac
This paper describes a family of algorithms for evaluation of a class of elementary functions including division, logarithms, and exponentials. The main objective is to demonstrate the feasibility of higher radix implementations, in particular, radix 16, and to compare performance with radix 2. The emphasis is not on optimality of a single algorithm, but rather on the optimality of a class of algorithms. An attempt to implement a much wider class of functions than is presently done in arithmetic units is encouraged by the current level of digital technology and the existence of suitable algorithms. Besides the definitions of the algorithms, which are based on continued products and continued sums, details related to implementation are discussed.
IEEE Transactions on Computers | 2000
Milos D. Ercegovac; Laurent Imbert; David W. Matula; Jean-Michel Muller; Guoheng Wei
The aim of this paper is to accelerate division, square root, and square root reciprocal computations when the Goldschmidt method is used on a pipelined multiplier. This is done by replacing the last iteration by the addition of a correcting term that can be looked up during the early iterations. We describe several variants of the Goldschmidt algorithm, assuming a 4-cycle pipelined multiplier, and discuss the number of cycles obtained and the error achieved. Extensions to other than 4-cycle multipliers are given. If we call G_m the Goldschmidt algorithm with m iterations, our variants allow us to reach an accuracy that is between that of G_3 and that of G_4, with a number of cycles equal to that of G_3.
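For context, the baseline Goldschmidt division iteration that G_m refers to can be sketched as follows. This Python version shows only the classical iteration, not the correcting-term variants of the paper; it assumes a divisor already scaled into [0.5, 1), uses no table-lookup seed, and the function name is hypothetical. With e = 1 - d, the relative error after m iterations is e^(2^m), so each iteration roughly doubles the number of correct bits.

```python
def goldschmidt_divide(n, d, iterations):
    """Approximate n / d with `iterations` Goldschmidt steps.

    ASSUMPTION: d is pre-scaled into [0.5, 1) and no initial
    reciprocal approximation (seed table) is used.
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("divisor must be scaled into [0.5, 1)")
    num, den = n, d
    for _ in range(iterations):
        factor = 2.0 - den  # the two multiplications below are independent
        num *= factor       # numerator converges to n / d
        den *= factor       # denominator converges to 1
    return num


if __name__ == "__main__":
    for m in (3, 4):
        approx = goldschmidt_divide(1.0, 0.75, m)
        print(m, approx, abs(approx - 1.0 / 0.75))
```

The two multiplications in each step do not depend on each other, which is why the method maps well onto a pipelined multiplier.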