Darjn Esposito
Information Technology University
Publication
Featured research published by Darjn Esposito.
IEEE Transactions on Circuits and Systems | 2015
Darjn Esposito; Davide De Caro; Ettore Napoli; Nicola Petra; Antonio G. M. Strollo
Variable latency adders have recently been proposed in the literature. A variable latency adder employs speculation: the exact arithmetic function is replaced with an approximate one that is faster and gives the correct result most of the time, but not always. The approximate adder is augmented with an error detection network that asserts an error signal when speculation fails. Speculative variable latency adders have attracted strong interest thanks to their ability to reduce average delay compared to traditional architectures. This paper proposes a novel variable latency speculative adder based on the Han-Carlson parallel-prefix topology, which proved more effective than the variable latency Kogge-Stone topology. The paper describes the stages into which variable latency speculative prefix adders can be subdivided and presents a novel error detection network that reduces error probability compared to previous approaches. Several variable latency speculative adders, for various operand lengths and using both Han-Carlson and Kogge-Stone topologies, have been synthesized using the UMC 65 nm library. The results show that the proposed variable latency Han-Carlson adder outperforms both previously proposed speculative Kogge-Stone architectures and non-speculative adders when high speed is required. It is also shown that non-speculative adders remain the best choice when the speed constraint is relaxed.
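The speculation and error detection idea described in this abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the Han-Carlson circuit or the error detection network of the paper: the function names, the window size `k`, and the conservative "full window of propagate bits" detector are all hypothetical choices made for illustration.

```python
def speculative_add(a, b, n=16, k=4):
    """Behavioral model of an n-bit speculative adder (illustrative only).

    The carry into bit i is speculated from the k bits below it, assuming
    that no carry enters that window. Returns (approx_sum, error_flag).
    """
    approx = 0
    error = False
    p = a ^ b  # per-bit propagate signals
    for i in range(n):
        lo = max(0, i - k)               # lowest bit the speculation looks at
        width = i - lo
        wmask = (1 << width) - 1
        a_w = (a >> lo) & wmask
        b_w = (b >> lo) & wmask
        carry = (a_w + b_w) >> width     # carry into bit i, with carry 0 into bit lo
        approx |= (((a >> i) ^ (b >> i) ^ carry) & 1) << i
        # Conservative detector (an assumption): if the whole window
        # propagates, a carry from below the window could reach bit i.
        if lo > 0 and ((p >> lo) & wmask) == wmask:
            error = True
    return approx, error


def variable_latency_add(a, b, n=16, k=4):
    """Return (sum, cycles): 1 cycle if speculation holds, 2 if it is corrected."""
    approx, error = speculative_add(a, b, n, k)
    if error:
        return (a + b) & ((1 << n) - 1), 2
    return approx, 1
```

In this model the detector may raise false alarms (for example `variable_latency_add(5, 2, n=8, k=2)` takes two cycles although no carry occurs), but it never misses a wrong speculative result. Reducing such unnecessary error signals is the kind of improvement a better detection network would aim for.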
International Symposium on Circuits and Systems | 2016
Darjn Esposito; Gerardo Castellano; Davide De Caro; Ettore Napoli; Nicola Petra; Antonio G. M. Strollo
Approximate computing is emerging as a new paradigm to improve digital circuit performance by relaxing the requirement of performing exact calculations. Approximate adders rely on the idea that, for uniformly distributed inputs, long carry-propagation chains are rarely activated. However, this assumption on input signal statistics does not always hold; in this paper we focus on the case (often encountered in practical signal processing applications) in which the inputs have a Gaussian distribution. We show that for Gaussian inputs the error probability of previously proposed approximate adders approaches 25% for low sigma values, which is much larger than in the uniform case. On the basis of this analysis, we propose an approximate adder with a correction circuit that drastically reduces the error rate for Gaussian distributed operands. In order to investigate the performance of our approach in a real application, simulated results for a simple audio processing system are reported. Implementation results in 65 nm technology are also presented.
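One way to see why small signed operands can defeat the short-carry-chain assumption is to measure carry chains directly on two's complement encodings. The helper below is a hypothetical illustration and not the analysis carried out in the paper; the function name and the 16-bit default width are assumptions.

```python
def longest_carry_chain(a, b, n=16):
    """Longest run of bit positions where a carry travels through a propagate bit.

    a and b may be negative; they are encoded as n-bit two's complement.
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    carry = 0
    run = best = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        if (a_i ^ b_i) and carry:
            run += 1                     # carry is propagated through bit i
            best = max(best, run)
        else:
            run = 0
        carry = (a_i & b_i) | ((a_i ^ b_i) & carry)
    return best
```

For example, `longest_carry_chain(3, -2)` returns 14: the sign-extension bits of the small negative operand form one long propagate chain that an incoming carry traverses. By contrast, `longest_carry_chain(3, 2)` and `longest_carry_chain(3, -5)` both return 0.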
IEEE Transactions on Circuits and Systems | 2016
Darjn Esposito; Davide De Caro; Antonio G. M. Strollo
A variable latency adder (VLA) reduces average addition time by using speculation: the exact arithmetic function is replaced by an approximate one that is faster and gives correct results most of the time. When speculation fails, an error detection and correction circuit gives the correct result in the following clock cycle. Previous papers investigate VLAs based on Kogge-Stone, Han-Carlson or carry select topologies, speculating that carry propagation involves only a few consecutive bits. In several applications using two's complement representation, however, operands have a Gaussian distribution and a nontrivial portion of carry chains can be as long as the adder size. In this paper we propose five novel VLA architectures, based on Brent-Kung, Ladner-Fischer, Sklansky, Hybrid Han-Carlson, and Carry increment parallel-prefix topologies. Moreover, we present a new efficient error detection and correction technique that makes the proposed VLAs suitable for applications using two's complement representation. In order to investigate VLA performance, the proposed architectures have been synthesized using the UMC 65 nm library, for operand lengths ranging from 32 to 128 bits. The results show that the proposed VLAs outperform previous speculative architectures and standard (non-speculative) adders when high speed is required.
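Since a failed speculation costs one additional clock cycle, the average latency of a VLA follows directly from its error probability. The sketch below makes that arithmetic explicit; the function names and the example clock periods are hypothetical and are not figures from the paper.

```python
def average_cycles(p_error):
    """Average cycles per addition: 1 cycle, plus 1 more with probability p_error."""
    return 1.0 + p_error


def vla_is_faster(t_spec, p_error, t_exact):
    """True if the VLA's average addition time beats an exact adder.

    t_spec  : clock period achievable with the speculative adder (assumed)
    t_exact : clock period required by the exact adder (assumed)
    """
    return t_spec * average_cycles(p_error) < t_exact
```

With an assumed speculative period of 0.8 (in units of the exact adder's period), `vla_is_faster(0.8, 0.1, 1.0)` is true while `vla_is_faster(0.8, 0.5, 1.0)` is false, which shows why a low error probability on realistic operand distributions matters.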
International Symposium on Circuits and Systems | 2017
Darjn Esposito; Davide De Caro; Ettore Napoli; Nicola Petra; Antonio G. M. Strollo
Approximate computing improves digital circuit performance by relaxing the requirement of performing exact calculations. In this paper, we investigate the use of approximate adders in the final stage of a carry-save multiplier-accumulator (MAC) designed for image filtering applications. We propose a design flow based on synthesis tools, starting from an HDL description. After a first step in which an exact carry-propagate adder is used, the synthesized netlist is simulated to extract the statistics of the terms summed in the carry-propagate adder. We then design the approximate adder to meet the required error characteristics given the input statistics. The netlist is finally modified by substituting the exact adder with the approximate one, and a final synthesis and optimization is performed. The presented design example in 28 nm CMOS shows that a 14% power saving can be obtained with limited image quality degradation.
Conference on Ph.D. Research in Microelectronics and Electronics | 2017
Darjn Esposito; Antonio G. M. Strollo; Massimo Alioto
Sacrificing exact calculations to improve digital circuit performance is at the foundation of approximate computing. In this paper, an approximate multiply-and-accumulate (MAC) unit is introduced. The MAC partial product terms are compressed by using simple OR gates as approximate counters; moreover, to further save energy, selected columns of the partial product terms are not formed. A compensation term is introduced in the proposed MAC to reduce the overall approximation error. A MAC unit, specialized to perform 2D convolution, is designed following the proposed approach and implemented in TSMC 40 nm technology in four different configurations. The proposed circuits achieve power savings of more than 60% compared to a standard exact MAC, with tolerable image quality degradation.
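The effect of using OR gates in place of exact counters can be sketched on two equally weighted rows of bits. This is a simplified illustration, not the compression tree of the paper: the function names are hypothetical, and the mean error computed at the end is just one plausible way to choose a compensation constant, assuming uniform independent inputs.

```python
def or_compress(x, y):
    """Approximate x + y for two equally weighted rows with one OR gate per column."""
    return x | y


def compression_error(x, y):
    """Exact sum minus approximate sum; never negative, since x + y = (x | y) + (x & y)."""
    return (x + y) - or_compress(x, y)


def mean_error(bits):
    """Average error over all pairs of `bits`-wide inputs (uniform inputs assumed)."""
    n = 1 << bits
    total = sum(compression_error(x, y) for x in range(n) for y in range(n))
    return total / (n * n)
```

The OR approximation is wrong only in columns where both bits are 1, and it always underestimates, so the error has a positive mean (`mean_error(4)` is 3.75). A constant added to the result can offset that bias, which is the role a compensation term plays.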
Latin American Symposium on Circuits and Systems | 2016
Ettore Napoli; Gerardo Castellano; Darjn Esposito; Antonio G. M. Strollo
The generation of complex signal sources is important for the test and validation of electronic systems. With reference to noise sources, commercial systems only provide white noise sources, while the scientific literature has only recently proposed circuits that generate programmable colored noise. This paper proposes a programmable colored noise generator that, while generating noise signals with features matching the state of the art, outperforms previously proposed circuits in terms of speed (+10%) and logic resource occupation (-75%).
International Symposium on Circuits and Systems | 2017
Darjn Esposito; Antonio G. M. Strollo; Massimo Alioto
Approximate computing leverages the inherent error resiliency present in many applications to improve circuit performance. Precision-scalable systems dynamically introduce approximations to trade off power and quality, based on the application under execution and the incoming dataset. In this paper, this principle is explored for the first time in the context of latch memories by introducing the ability to scale their precision, while retaining the ability to synthesize them in an automated manner. This offers additional opportunities to reduce energy, beyond the well-known suitability of latch memories for aggressive voltage scaling. A case study based on image processing applications is presented to evaluate the quality-power trade-off in 40 nm CMOS. The analysis shows that the total power is reduced by up to 56% when the precision requirement is relaxed.
IEEE Transactions on Computers | 2017
Ettore Napoli; Gerardo Castellano; Davide De Caro; Darjn Esposito; Nicola Petra; Antonio G. M. Strollo
The paper proposes a serial-in serial-out (SISO) register circuit, functionally equivalent to a shift register, that is the optimal design choice when the input data have a reduced transition probability. The proposed circuit obtains improved performance by storing only the transitions of the input data, thus saving logic and power.
IEEE Transactions on Circuits and Systems | 2017
Ettore Napoli; Gerardo Castellano; Davide De Caro; Darjn Esposito; Nicola Petra; Antonio G. M. Strollo
The generation of complex signal sources is important for the test and validation of electronic systems. With reference to noise sources, commercial systems usually provide white noise sources, while the scientific literature has only recently proposed circuits that generate programmable colored noise. This paper proposes a filtering circuit, together with an algorithm to design it, that produces arbitrary colored electrical noise. The proposed system improves on previously proposed circuits in terms of the spectral characteristics of the output, logic resource occupation and power dissipation, with no penalty on the working frequency.
IEEE Transactions on Circuits and Systems | 2017
Davide De Caro; Ettore Napoli; Darjn Esposito; Gerardo Castellano; Nicola Petra; Antonio G. M. Strollo
Piecewise polynomial interpolation is a well-established technique for hardware function evaluation. The paper describes a novel technique to minimize the word length of the polynomial coefficients, with the aim of obtaining either exact or faithful rounding at a reduced hardware cost. The standard approaches employed in the literature subdivide the design of piecewise-polynomial interpolators into three steps (coefficient calculation, coefficient quantization and arithmetic hardware optimization) and conservatively estimate the overall approximation error as the sum of the error components arising in each step. The proposed technique, using Integer Linear Programming (ILP), optimizes the polynomial coefficients taking into account all error components simultaneously. This gives two advantages. Firstly, we can obtain exactly rounded approximations; secondly, for faithfully rounded interpolators, we avoid any overdesign due to pessimistic assumptions on error components, thereby optimizing the resulting hardware. The proposed ILP-based algorithm requires an acceptable CPU time (from a few seconds to tens of minutes) and is suited to approximations with up to 24 input bits. The results compare favorably with previously published data. We present synthesis results in 28 nm and 90 nm CMOS technologies to further assess the effectiveness of the proposed approach.
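As background for this abstract, a piecewise polynomial interpolator can be sketched as a small coefficient table indexed by the top bits of the input. The example below is a minimal first-degree version with unquantized floating-point coefficients; it does not implement the ILP-based coefficient optimization of the paper, and the bit widths, function names and the choice of 2^x as the target function are all assumptions.

```python
SEG_BITS = 3   # top input bits select one of 8 segments (assumed)
IN_BITS = 8    # input X encodes x = X / 2**IN_BITS in [0, 1) (assumed)


def build_table(f):
    """Per-segment (value at segment start, slope) for chord interpolation of f."""
    n_seg = 1 << SEG_BITS
    table = []
    for s in range(n_seg):
        x0 = s / n_seg
        x1 = (s + 1) / n_seg
        slope = (f(x1) - f(x0)) * n_seg
        table.append((f(x0), slope))
    return table


def evaluate(table, X):
    """Evaluate the interpolator for the integer input X."""
    low_bits = IN_BITS - SEG_BITS
    seg = X >> low_bits                                   # table address
    offset = (X & ((1 << low_bits) - 1)) / (1 << IN_BITS)  # x minus segment start
    c0, c1 = table[seg]
    return c0 + c1 * offset


table = build_table(lambda x: 2.0 ** x)
max_err = max(abs(evaluate(table, X) - 2.0 ** (X / 2 ** IN_BITS))
              for X in range(1 << IN_BITS))
```

Here `max_err` stays below 2e-3, in line with the usual chord-interpolation bound h^2/8 times the maximum second derivative. In a hardware design the coefficients would additionally be quantized and the arithmetic truncated, which is where the error budgeting discussed in the abstract comes in.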