Nicola Petra
University of Naples Federico II
Publication
Featured research published by Nicola Petra.
IEEE Transactions on Circuits and Systems II: Express Briefs | 2005
Antonio G. M. Strollo; Nicola Petra; Davide De Caro
In this paper, a new error-compensation network for fixed-width multipliers is proposed. The error-compensation block is composed of two summation trees which are optimally chosen in order to minimize either the mean-square error or the maximum absolute error. The new technique substantially improves error performance with respect to previously proposed approaches. Simulation results show that the new fixed-width multipliers exhibit significant improvements both in propagation delay and in power dissipation with respect to previous solutions.
IEEE Transactions on Circuits and Systems | 2010
Nicola Petra; Davide De Caro; Valeria Garofalo; Ettore Napoli; Antonio G. M. Strollo
Truncated multipliers compute the n most-significant bits of the n × n-bit product. This paper focuses on variable-correction truncated multipliers, where some partial products are discarded to reduce complexity, and a suitable compensation function is added to partly compensate for the introduced error. The optimal compensation function, which minimizes the mean-square error, is obtained in this paper in closed form for the first time. A sub-optimal compensation function, best suited for hardware implementation, is introduced. Efficient multiplier implementation based on the sub-optimal function is discussed. The proposed truncated multipliers are extensively compared with previously proposed circuits. Experimental results for a 0.18 μm technology are also presented.
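The idea of discarding partial products and adding a correction term can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions: it uses a constant correction `comp` rather than the variable-correction function derived in the paper, and the names `truncated_product_top`, `mean_square_error`, `k` and `comp` are hypothetical.

```python
def truncated_product_top(a, b, n, k, comp):
    """Behavioral model of an n x n truncated multiplier (unsigned operands).

    Only partial-product bits in columns >= n - k are summed; the lower
    columns are discarded. `comp` is a constant correction expressed in
    units of column n - k (an assumption, not the paper's function).
    Returns the n most-significant bits of the approximate product.
    """
    cut = n - k
    acc = 0
    for i in range(n):
        for j in range(n):
            if i + j >= cut and (a >> i) & 1 and (b >> j) & 1:
                acc += 1 << (i + j - cut)
    return (acc + comp) >> k


def mean_square_error(n, k, comp):
    """Exhaustive mean-square error, in output LSBs, against a*b / 2**n."""
    total = 0.0
    for a in range(1 << n):
        for b in range(1 << n):
            err = truncated_product_top(a, b, n, k, comp) - (a * b) / (1 << n)
            total += err * err
    return total / (1 << (2 * n))
```

With `k = n` and `comp = 0` no partial product is discarded and the model returns the exact truncated product; with fewer kept columns, comparing `mean_square_error` for different values of `comp` shows how a correction term trades off against the discarded bits.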
IEEE Transactions on Very Large Scale Integration Systems | 2005
Antonio G. M. Strollo; Davide De Caro; Ettore Napoli; Nicola Petra
A new sense-amplifier-based flip-flop is presented. The output latch of the proposed circuit can be considered as a hybrid solution between the standard NAND-based set/reset latch and the NC²MOS approach. The proposed flip-flop provides ratioless design, reduced short-circuit power dissipation, and glitch-free operation. The simulation results, obtained for a 0.25-μm technology, show improvements in the clock-to-output delay and the power dissipation with respect to recently proposed high-speed flip-flops. The new circuit has been successfully employed in a high-speed direct digital frequency synthesizer chip, highlighting the effectiveness of the proposed flip-flop in high-speed standard-cell-based applications.
IEEE Journal of Solid-State Circuits | 2010
Davide De Caro; Carlo Alberto Romani; Nicola Petra; Antonio G. M. Strollo; Claudio Parrella
Spread spectrum clocking is an effective solution to reduce the electromagnetic interference produced by digital chips, using a clock signal with a frequency that is intentionally swept (frequency modulated) within a certain frequency range, with a predefined modulation profile. We present the implementation of an all-digital spread spectrum clock generator. The circuit is realized by using a design flow completely based on standard cells and is able to perform clock spreading with an arbitrary modulation profile and a modulation frequency up to 5 MHz. The circuit uses two digitally controlled delay lines driven by a digital modulator to synthesize the output waveform. A replica delay line is employed in a real-time measurement circuit to track process, voltage and temperature variations. A chip has been implemented in a 65 nm CMOS technology. The chip is able to generate signals up to 1.27 GHz. The measured peak level reduction of the clock spectrum, at 750 MHz output frequency, is 20.5 dB with a 6% modulation depth. The power dissipation is 44 mW @ 1.27 GHz.
IEEE Journal of Solid-State Circuits | 2007
Antonio G. M. Strollo; Davide De Caro; Nicola Petra
The paper presents a detailed description of a direct digital frequency synthesizer (DDFS) based on the Multipartite Table Method (MTM), a lookup table compression technique. A novel algorithm to find the optimal MTM decomposition, which minimizes the ROM size while achieving a target spurious-free dynamic range (SFDR), is presented in the paper. The DDFS designed with the proposed technique is ideally suited for high clock frequency operation, requiring small lookup tables and simple multi-operand adders. Low-power operation is achieved through a power-driven synthesis, by using two flip-flop topologies in the circuit (with different power and delay performance). A test chip has been realized in a 0.25 μm, 2.5 V technology. The circuit achieves a 90 dBc SFDR and operates at a maximum clock frequency of 630 MHz, with 76 mW power dissipation. By reducing the power supply to 1.8 V, a maximum operating frequency of 430 MHz was measured, with a total power dissipation as low as 24.9 mW.
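For readers unfamiliar with DDFS operation, the basic principle (a phase accumulator addressing a sine lookup table) can be sketched as follows. This is a generic textbook model with an uncompressed table, not the MTM-based architecture of the paper; the function names and bit widths are assumptions chosen for illustration.

```python
import math


def make_sine_lut(addr_bits, out_bits):
    """Full-period sine table with 2**addr_bits signed entries of out_bits bits."""
    size = 1 << addr_bits
    scale = (1 << (out_bits - 1)) - 1
    return [round(scale * math.sin(2 * math.pi * i / size)) for i in range(size)]


def ddfs(fcw, acc_bits, lut, addr_bits, n_samples):
    """Generate n_samples of a DDFS output.

    fcw is the frequency control word; the output frequency is
    fcw * f_clk / 2**acc_bits. The top addr_bits of the phase
    accumulator address the lookup table.
    """
    mask = (1 << acc_bits) - 1
    phase = 0
    out = []
    for _ in range(n_samples):
        out.append(lut[phase >> (acc_bits - addr_bits)])
        phase = (phase + fcw) & mask  # wrap-around models the phase modulo 2*pi
    return out
```

For example, with a 16-bit accumulator and `fcw = 1 << 13`, the output repeats every 8 clock cycles, that is, at one eighth of the clock frequency.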
IEEE Transactions on Circuits and Systems | 2008
Davide De Caro; Nicola Petra; Antonio G. M. Strollo
The use of multipartite table methods (MTMs) to implement high-performance direct digital frequency synthesizers (DDFSs) is investigated in this paper. A closed-form expression for the spurious-free dynamic range (SFDR) is obtained when a single table of offset (TO) is used in the multipartite approximation. In this case, the optimal design that minimizes the storage requirement for a given SFDR can be obtained analytically. A numerical algorithm is also presented to obtain the optimal design when two or more TOs are employed in the approximation. The VLSI implementation results and the comparison with previously proposed DDFS architectures demonstrate the effectiveness of multipartite table methods for the realization of high-performance direct digital synthesizers.
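The single-TO case corresponds to a bipartite table, which can be sketched as below. The sketch assumes unquantized (floating-point) table entries and a simple midpoint construction; it is not the optimized decomposition of the paper, and the names `build_bipartite`, `bipartite_eval`, `tiv` and `to` are hypothetical. The input is split into three fields x0, x1, x2 of n0, n1, n2 bits: one table of initial values is addressed by (x0, x1) and one table of offsets by (x0, x2).

```python
import math


def build_bipartite(f, df, n0, n1, n2):
    """Build bipartite tables for f on [0, 1); df is the derivative of f."""
    n = n0 + n1 + n2
    lsb = 2.0 ** -n
    mid2 = (2 ** n2 - 1) / 2 * lsb                 # centre of the x2 range
    mid1 = (2 ** n1 - 1) / 2 * 2.0 ** -(n0 + n1)   # centre of the x1 range
    # Table of initial values: f sampled at the centre of each (x0, x1) cell.
    tiv = [f((i << n2) * lsb + mid2) for i in range(1 << (n0 + n1))]
    # Table of offsets: one slope per x0, applied to the x2 displacement.
    to = [[df((x0 << (n1 + n2)) * lsb + mid1 + mid2) * (x2 * lsb - mid2)
           for x2 in range(1 << n2)]
          for x0 in range(1 << n0)]
    return tiv, to


def bipartite_eval(x_int, tiv, to, n0, n1, n2):
    """Approximate f(x_int / 2**(n0+n1+n2)) with two lookups and one addition."""
    x0 = x_int >> (n1 + n2)
    x2 = x_int & ((1 << n2) - 1)
    return tiv[x_int >> n2] + to[x0][x2]
```

For a 12-bit input split as 4 + 4 + 4 bits, the two tables hold 256 entries each, compared with 4096 entries for a direct table, at the cost of a small approximation error.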
International Solid-State Circuits Conference | 2007
Davide De Caro; Nicola Petra; Antonio G. M. Strollo
The paper describes the implementation of a 380 MHz, 13 bit, direct digital synthesizer/mixer IC in 0.25 μm CMOS technology. The circuit employs an innovative architecture which divides the π/4 rotation operation required in quadrature synthesizer/mixers into three rotations. The first two rotations are implemented by using a CORDIC datapath completely realized in carry-save arithmetic. The directions of the CORDIC rotations are computed in parallel by using a small lookup table for the first rotation, and a multiply-by-constant and addition circuit for the second rotation. The final (third) rotation is multiplier-based, in order to reduce the circuit latency and increase the circuit performance. The CORDIC datapath is implemented with a novel approach both at the algorithmic level and at the transistor level. At the algorithmic level, the combined use of sign-extension prevention, overflow prevention and a novel rounding scheme is presented. At the transistor level, a design style that jointly uses full-CMOS and DPL to improve the circuit latency is described. The overall circuit performance is very interesting. The synthesizer/mixer IC, realized in a 0.25 μm CMOS technology, has an area occupation of 0.22 mm² and dissipates 152 mW at 380 MHz with a supply voltage of 2.5 V.
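As background, a conventional CORDIC rotation can be sketched in a few lines. This is the standard sequential algorithm in floating point, where each rotation direction depends on the residual angle of the previous step; the paper instead computes the directions in parallel and uses carry-save arithmetic, which this sketch does not attempt to model. The function name and the iteration count are assumptions.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) by `angle` radians using shift-and-add steps.

    Converges for |angle| up to roughly 1.74 rad. Multiplications by
    2**-i stand in for the hardware shifts.
    """
    # Each micro-rotation stretches the vector; accumulate the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotation direction for this step
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)        # residual angle still to rotate
    return x / gain, y / gain
```

Rotating the unit vector (1, 0) by π/4 returns values close to (cos π/4, sin π/4), with a residual angle error bounded by the last elementary angle, atan(2^-15) for 16 iterations.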
IEEE Transactions on Computers | 2011
Antonio G. M. Strollo; Davide De Caro; Nicola Petra
A novel technique for designing piecewise-polynomial interpolators for hardware implementation of elementary functions is investigated in this paper. In the proposed approach, the interval where the function is approximated is subdivided into equal-length segments and two adjacent segments are grouped in a segment pair. Suitable constraints are then imposed between the coefficients of the two interpolating polynomials in each segment pair. This reduces the total number of stored coefficients. It is found that the increase in the approximation error due to constraints between polynomial coefficients can easily be overcome by increasing the fractional bits of the coefficients. Overall, compared with standard unconstrained piecewise-polynomial approximation having the same accuracy, the proposed method results in a considerable advantage in terms of the size of the lookup table needed to store polynomial coefficients. The calculation of the coefficients of constrained polynomials and the optimization of the coefficient bit width are also investigated in this paper. Results for several elementary functions and target precision ranging from 12 to 42 bits are presented. The paper also presents VLSI implementation results, targeting a 90 nm CMOS technology, and using both direct and Horner architectures for constrained degree-1, degree-2, and degree-3 approximations.
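The unconstrained baseline that the abstract compares against can be sketched as follows: a table of degree-2 coefficients per segment, evaluated in Horner form. The sketch assumes Taylor coefficients at each segment midpoint and the reciprocal function on [1, 2) purely for illustration; it does not implement the coefficient constraints between segment pairs proposed in the paper, and the names `build_table` and `reciprocal` are hypothetical.

```python
def build_table(segments):
    """Degree-2 coefficients for 1/x on [1, 2), one triple per equal-length segment."""
    h = 1.0 / segments
    table = []
    for s in range(segments):
        m = 1.0 + (s + 0.5) * h                       # segment midpoint
        # Taylor coefficients at m: f(m), f'(m), f''(m) / 2
        table.append((1.0 / m, -1.0 / m ** 2, 1.0 / m ** 3))
    return table


def reciprocal(x, table):
    """Approximate 1/x for x in [1, 2) with a table lookup and Horner evaluation."""
    segments = len(table)
    s = min(int((x - 1.0) * segments), segments - 1)  # segment index from leading bits
    d = x - (1.0 + (s + 0.5) / segments)              # offset from the segment midpoint
    c0, c1, c2 = table[s]
    return c0 + d * (c1 + d * c2)                     # Horner form
```

With 64 segments the table stores 192 coefficients. A constrained scheme such as the one described above would aim to store fewer coefficients for the same accuracy.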
IEEE Transactions on Circuits and Systems | 2011
Nicola Petra; Davide De Caro; Valeria Garofalo; Ettore Napoli; Antonio G. M. Strollo
This paper focuses on fixed-width multipliers with a linear compensation function by investigating in detail the effect of coefficient quantization. New fixed-width multiplier topologies, with different trade-offs between accuracy and hardware complexity, are obtained by varying the quantization scheme. Two topologies in particular are selected as the most effective ones. The first one is based on a uniform coefficient quantization, while the second topology uses a nonuniform quantization scheme. The novel fixed-width multiplier topologies exhibit better accuracy with respect to previous solutions, close to the theoretical lower bound.
IEEE Transactions on Circuits and Systems | 2009
Davide De Caro; Nicola Petra; Antonio G. M. Strollo
A high-speed special function unit (SFU) is presented in this paper. The system supports the single-precision IEEE-754 floating-point standard and implements faithfully rounded reciprocal, square root, reciprocal square root, logarithm, and exponential functions. The functions are approximated by using a novel constrained piecewise quadratic interpolation technique. In this way, the lookup table size is reduced by 40% with respect to previously proposed techniques, without any loss in accuracy. Error analysis and sizing methodology are presented in the paper. The SFU has been implemented in a 0.18-μm CMOS technology. The circuit is able to operate up to 420-MHz clock frequency, with a power dissipation of 160 mW at 420 MHz. The system can be employed in programmable graphics accelerators and in other applications where high-performance function evaluation is needed.