Paolo Montuschi | Researchain

Publication

Featured research published by Paolo Montuschi.

Symposium on Computer Arithmetic | 2007

A New Family of High-Performance Parallel Decimal Multipliers

Alvaro Vazquez; Elisardo Antelo; Paolo Montuschi

This paper introduces two novel architectures for parallel decimal multipliers. Our multipliers are based on a new algorithm for decimal carry-save multioperand addition that uses a novel BCD-4221 recoding for decimal digits. It significantly improves the area and latency of the partial product reduction tree with respect to previous proposals. We also present three schemes for fast and efficient generation of partial products in parallel. The recoding of the BCD-8421 multiplier operand into minimally redundant signed-digit radix-10, radix-4 and radix-5 representations using new recoders reduces the complexity of partial product generation. In addition, SD radix-4 and radix-5 recodings allow the reuse of a conventional parallel binary radix-4 multiplier to perform combined binary/decimal multiplications. Evaluation results show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and other representative alternatives for decimal multiplication.
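
The carry-save step that this abstract builds on can be illustrated with a short Python sketch. It assumes one fixed code word per digit for BCD-4221 (and for the auxiliary BCD-5211 code used to double a digit vector); the tables and function names are illustrative, not taken from the paper. Bitwise full adders compress three 4221 digit vectors into a sum vector S and a carry vector H with A + B + C = S + 2H, and 2H is formed digit by digit by recoding H to 5211 and shifting left one bit.

```python
# Digit weights, most significant bit first.
W4221 = (4, 2, 2, 1)

# One fixed code word per decimal digit. Both codes are redundant, so these
# tables are an assumption of this sketch.
ENC4221 = {0: (0, 0, 0, 0), 1: (0, 0, 0, 1), 2: (0, 0, 1, 0), 3: (0, 0, 1, 1),
           4: (1, 0, 0, 0), 5: (1, 0, 0, 1), 6: (1, 0, 1, 0), 7: (1, 0, 1, 1),
           8: (1, 1, 1, 0), 9: (1, 1, 1, 1)}
ENC5211 = {0: (0, 0, 0, 0), 1: (0, 0, 0, 1), 2: (0, 1, 0, 0), 3: (0, 1, 0, 1),
           4: (0, 1, 1, 1), 5: (1, 0, 0, 0), 6: (1, 0, 0, 1), 7: (1, 1, 0, 0),
           8: (1, 1, 0, 1), 9: (1, 1, 1, 1)}


def digit_value(bits, weights=W4221):
    return sum(b * w for b, w in zip(bits, weights))


def to_4221(number, ndigits):
    """Encode a non-negative integer as 4221 digits, least significant digit first."""
    return [ENC4221[(number // 10**i) % 10] for i in range(ndigits)]


def value_4221(digits):
    return sum(digit_value(d) * 10**i for i, d in enumerate(digits))


def double_4221(digits):
    """Compute 2*H: recode each 4221 digit to 5211, then shift left by one bit.

    After the shift the weights are (10, 4, 2, 2); the weight-10 bit moves into
    the next digit's weight-1 position, which the shift left empty."""
    out, carry = [], 0
    for d in digits:
        b5, b2, b1a, b1b = ENC5211[digit_value(d)]
        out.append((b2, b1a, b1b, carry))   # 4*b2 + 2*b1a + 2*b1b + 1*carry
        carry = b5
    out.append((0, 0, 0, carry))            # extra digit for the last carry
    return out


def decimal_csa(a, b, c):
    """3:2 decimal carry-save addition of three equal-length 4221 digit vectors.

    Bitwise full adders give x + y + z = s + 2*h at every bit, so with the same
    weights A + B + C = S + 2*H; 2*H is then formed by double_4221."""
    s, h = [], []
    for da, db, dc in zip(a, b, c):
        s.append(tuple(x ^ y ^ z for x, y, z in zip(da, db, dc)))
        h.append(tuple((x & y) | (x & z) | (y & z) for x, y, z in zip(da, db, dc)))
    return s, double_4221(h)


a, b, c = to_4221(987, 3), to_4221(654, 3), to_4221(321, 3)
s, two_h = decimal_csa(a, b, c)
print(value_4221(s) + value_4221(two_h))  # 1962
```

Because every 4-bit pattern under the weights 4, 2, 2, 1 is a valid decimal digit (0 to 9), no decimal correction is needed after the bitwise addition, which is what makes this code convenient inside a carry-save reduction tree.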

IEEE Transactions on Computers | 2010

Improved Design of High-Performance Parallel Decimal Multipliers

Alvaro Vazquez; Elisardo Antelo; Paolo Montuschi

The new generation of high-performance decimal floating-point units (DFUs) demands efficient implementations of parallel decimal multipliers. In this paper, we describe the architectures of two parallel decimal multipliers. The parallel generation of partial products is performed using signed-digit radix-10 or radix-5 recodings of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a decimal multioperand carry-save addition algorithm that uses unconventional (non-BCD) decimal-coded number systems. We further detail these techniques and present new improvements that reduce the latency of the previous designs, including optimized digit recoders for the generation of 2^n-tuples (and 5-tuples), decimal carry-save adders (CSAs) combining different decimal-coded operands, and carry-free adders implemented by specially designed bit counters. Moreover, we detail a design methodology that combines all these techniques to obtain efficient reduction trees with different area and delay trade-offs for any number of partial products generated. Evaluation results for 16-digit operands show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.
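
As an illustration of the signed-digit radix-10 recoding mentioned above, the following Python sketch recodes each BCD digit of the multiplier into the set {-5, ..., +5}, so every partial product is a signed multiple 0x to 5x of the multiplicand. The rule used here (a digit of 5 or more becomes d - 10 and passes a transfer of 1 to the next position) is the textbook form of this recoding; the function names are hypothetical, and the final Python sum merely stands in for the carry-save reduction tree.

```python
def sd_radix10_recode(y, ndigits):
    """Recode the ndigits decimal digits of y into the signed-digit set {-5, ..., +5}.

    Returns ndigits + 1 digits, least significant first."""
    z, transfer_in = [], 0
    for i in range(ndigits):
        d = (y // 10**i) % 10
        transfer_out = 1 if d >= 5 else 0
        z.append(d - 10 * transfer_out + transfer_in)   # always in [-5, 5]
        transfer_in = transfer_out
    z.append(transfer_in)                               # extra leading digit (0 or 1)
    return z


def decimal_multiply(x, y, ndigits):
    """Multiply using one partial product per recoded digit of y.

    Each partial product needs only a multiple |z| * x with |z| <= 5 plus a sign."""
    multiples = {k: k * x for k in range(6)}   # the only multiples ever selected
    partial_products = []
    for i, z in enumerate(sd_radix10_recode(y, ndigits)):
        pp = multiples[abs(z)] * 10**i
        partial_products.append(-pp if z < 0 else pp)
    return sum(partial_products)   # stands in for a carry-save reduction tree


print(sd_radix10_recode(9876, 4))  # [-4, -2, -1, 0, 1]
```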

IEEE Transactions on Computers | 2015

Design and Analysis of Approximate Compressors for Multiplication

Amir Momeni; Jie Han; Paolo Montuschi; Fabrizio Lombardi

Inexact (or approximate) computing is an attractive paradigm for digital processing at nanometric scales, and it is particularly interesting for computer arithmetic designs. This paper deals with the analysis and design of two new approximate 4-2 compressors for use in a multiplier. These designs rely on different features of compression, so that imprecision in computation (as measured by the error rate and the so-called normalized error distance) can be traded off against circuit-based figures of merit of a design (number of transistors, delay and power consumption). Four different schemes for using the proposed approximate compressors in a Dadda multiplier are proposed and analyzed. Extensive simulation results are provided, and an application of the approximate multipliers to image processing is presented. The results show that the proposed designs achieve significant reductions in power dissipation, delay and transistor count compared to an exact design; moreover, two of the proposed multiplier designs provide excellent capabilities for image multiplication with respect to average normalized error distance and peak signal-to-noise ratio (more than 50 dB for the considered image examples).
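
The error metrics named in this abstract can be computed exhaustively for any candidate compressor. The Python sketch below does so for a deliberately simple, hypothetical approximate 4-2 compressor that drops the carry-in and carry-out signals; it is not one of the two designs proposed in the paper. The error distance here is the absolute difference from the exact count of ones; the paper's normalized error distance applies a further normalization that is not reproduced here.

```python
from itertools import product


def exact_value(x1, x2, x3, x4):
    # An exact 4-2 compressor (ignoring cin/cout) must reproduce the number of ones.
    return x1 + x2 + x3 + x4


def hypothetical_approx(x1, x2, x3, x4):
    """Hypothetical approximate compressor, NOT one of the paper's designs:
    no cin/cout, sum = (x1 XOR x2) OR (x3 XOR x4), carry = (x1 AND x2) OR (x3 AND x4)."""
    s = (x1 ^ x2) | (x3 ^ x4)
    carry = (x1 & x2) | (x3 & x4)
    return s + 2 * carry


def error_metrics(approx, exact=exact_value):
    """Error rate and mean error distance over all 16 equally likely input patterns."""
    distances = [abs(approx(*bits) - exact(*bits)) for bits in product((0, 1), repeat=4)]
    error_rate = sum(d > 0 for d in distances) / len(distances)
    mean_error_distance = sum(distances) / len(distances)
    return error_rate, mean_error_distance


print(error_metrics(hypothetical_approx))  # (0.3125, 0.375)
```

The same `error_metrics` function can be pointed at any other truth table, which is a convenient way to compare candidate simplifications before estimating their circuit cost.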

IEEE Transactions on Computers | 1996

Carry-save multiplication schemes without final addition

Luigi Ciminiera; Paolo Montuschi

Carry-save multipliers require an adder at the last step to convert the carry-sum representation of the most significant half of the result into a non-redundant form. This paper presents n × n multiplication schemes where this conversion is performed with a circuit operating in parallel with the carry-save array. The most relevant feature of the proposed multipliers is that the full 2n-bit result is produced, unlike similar multiplication schemes presented in the literature.
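
For reference, the conventional carry-save array that this paper improves on can be modeled in a few lines of Python. It is a behavioral sketch assuming unsigned operands and one partial product per step: the lower n bits of the product become final one per step, while the upper n bits remain as a sum/carry pair that needs a final carry-propagate addition. The paper's contribution, performing that conversion in parallel with the array, is not modeled here.

```python
def carry_save_multiply(a, b, n):
    """n x n unsigned multiplication with a carry-save accumulator.

    Each step adds one partial product with bitwise full adders (3:2 compression).
    The carry vector's least significant bit is 0 right after its shift, so the
    sum vector's LSB is already final and is retired into the low half.
    Returns (product, high_sum, high_carry)."""
    s, c, low = 0, 0, 0
    for i in range(n):
        pp = a if (b >> i) & 1 else 0
        s, c = s ^ c ^ pp, ((s & c) | (s & pp) | (c & pp)) << 1
        low |= (s & 1) << i          # final product bit i
        s, c = s >> 1, c >> 1        # align for the next partial product
    # The upper n bits are still redundant (s, c): a conventional design needs
    # this carry-propagate addition after the array.
    high = s + c
    return (high << n) | low, s, c


print(carry_save_multiply(13, 11, 4)[0])  # 143
```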

IEEE Transactions on Computers | 1994

Over-redundant digit sets and the design of digit-by-digit division units

Paolo Montuschi; Luigi Ciminiera

Over-redundant digit sets are defined as those ranging from −s to +s, with s ≥ B, B being the radix. This paper presents new techniques for the direct computation of division that use an over-redundant digit set for representing the quotient, instead of the merely redundant ones used previously. In particular, general criteria for synthesizing the digit selection rules and the remainder updating are given for any radix and index of redundancy. A methodology combining the use of over-redundant digit sets with the prescaling of the divisor is also studied in order to achieve radix-B division units with trivial digit selection functions. It is also shown, for the specific case of radix 4, that by using a prescaling slightly wider than in the radix-4 unit by M.D. Ercegovac and T. Lang (1990), it is possible to avoid the digit selection table. The paper also presents a modified algorithm for on-the-fly conversion of the result into the irredundant form. The proposed methodology can be considered an alternative to existing division techniques.
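
The standard digit-recurrence relations behind this kind of divider can be written as follows. This is a sketch in notation chosen here (x is the dividend, d the divisor, w[j] the partial remainder, q_j the quotient digits, ρ the redundancy factor); the paper's own notation and selection rules may differ. The last line gives the selection interval for digit k and the overlap between consecutive intervals, which exceeds d exactly when the digit set is over-redundant (ρ > 1); that extra overlap is what allows a coarse estimate of the remainder to select the digit.

```latex
w[0] = x, \qquad w[j+1] = B\,w[j] - q_{j+1}\,d, \qquad q_{j+1} \in \{-s, \dots, s\}

x = d \sum_{i=1}^{N} q_i\,B^{-i} + B^{-N}\,w[N]

\rho = \frac{s}{B-1}, \qquad s \ge B \iff \rho > 1, \qquad |w[j]| \le \rho\,d
\;\Rightarrow\; \Bigl|\tfrac{x}{d} - \textstyle\sum_{i=1}^{N} q_i B^{-i}\Bigr| \le \rho\,B^{-N}

B\,w[j] \in \bigl[(k-\rho)\,d,\;(k+\rho)\,d\bigr] \;\Rightarrow\; q_{j+1} = k \text{ keeps } |w[j+1]| \le \rho\,d,
\qquad (k-1+\rho)\,d - (k-\rho)\,d = (2\rho-1)\,d
```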

Current Medicinal Chemistry | 2013

Inhaled Muscarinic Acetylcholine Receptor Antagonists for Treatment of COPD

Paolo Montuschi; Francesco Macagno; Salvatore Valente; Leonello Fuso

Bronchodilators, generally administered via metered dose or dry powder inhalers, are the mainstays of pharmacological treatment of stable COPD. Inhaled long-acting beta-agonists (LABA) and anticholinergics are the bronchodilators primarily used in the chronic treatment of COPD. Anticholinergics act as muscarinic acetylcholine receptor antagonists and are frequently preferred over beta-agonists for their minimal cardiac stimulatory effects and greater efficacy in most studies. Their therapeutic efficacy is based on the fact that vagally mediated bronchoconstriction is the major reversible component of airflow obstruction in patients with COPD. However, bronchodilators are effective only on the reversible component of airflow obstruction, which by definition is limited, as COPD is characterized by a fixed or poorly reversible airflow obstruction. Inhaled anticholinergic antimuscarinic drugs approved for the treatment of COPD include ipratropium bromide, oxitropium bromide and tiotropium bromide. Ipratropium bromide, the prototype of anticholinergic bronchodilators, is a short-acting agent. Oxitropium bromide is administered twice a day. Tiotropium bromide, the only long-acting antimuscarinic agent (LAMA) currently approved, is administered once a day. Newer LAMAs including aclidinium bromide and glycopyrrolate bromide are currently in phase III development for treatment of COPD. Some new LAMAs, including glycopyrrolate, are suitable for once daily administration and, unlike tiotropium, have a rapid onset of action. New LAMAs and their combination with ultra-LABA and, possibly, inhaled corticosteroids, seem to open new perspectives in the management of COPD. Dual-pharmacology muscarinic antagonist-beta2 agonist (MABA) molecules present a novel approach to the treatment of COPD by combining muscarinic antagonism and beta2 agonism in a single molecule.

IEEE Transactions on Computers | 2012

An Algorithmic and Architectural Study on Montgomery Exponentiation in RNS

Filippo Gandino; Fabrizio Lamberti; Gianluca Paravati; Jean-Claude Bajard; Paolo Montuschi

Modular exponentiation of large numbers is computationally intensive. An effective way to perform this operation is to use Montgomery exponentiation in the Residue Number System (RNS). This paper presents an algorithmic and architectural study of this exponentiation approach. From the algorithmic point of view, new and state-of-the-art opportunities that come from the reorganization of operations and precomputations are considered. From the architectural perspective, the design opportunities offered by well-known computer arithmetic techniques are studied, with the aim of developing an efficient arithmetic cell architecture. Furthermore, since efficient RNS bases with a low Hamming weight are attracting growing interest, four additional cell architectures specifically tailored to these bases are developed, and the trade-off between benefits and drawbacks is carefully explored. An overall comparison among all the considered algorithmic approaches and cell architectures is presented, with the aim of providing the reader with an extensive overview of the opportunities for Montgomery exponentiation in RNS.
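
A compact Python model of RNS Montgomery multiplication and exponentiation may help fix the ideas. It is a sketch under several assumptions: two tiny bases A and B of pairwise coprime moduli (hypothetical values, not the low-Hamming-weight bases discussed in the paper), exact base extension through the Chinese Remainder Theorem instead of the cheaper methods a hardware cell would use, and the bounds stated in the docstring, which keep every intermediate result below 2N.

```python
from math import prod


def to_rns(x, base):
    return [x % m for m in base]


def crt(residues, base):
    """Exact CRT reconstruction; hardware would use a cheaper base-extension method."""
    M = prod(base)
    total = 0
    for r, m in zip(residues, base):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)
    return total % M


def base_extend(residues, src, dst):
    return to_rns(crt(residues, src), dst)


def rns_montmul(xa, xb, ya, yb, N, A, B):
    """RNS Montgomery product: r = x*y*M_A^-1 (mod N) with 0 <= r < 2N, in both bases.

    Assumes x, y < 2N, M_A > 4N, M_B > 2N and gcd(M_A, N * M_B) = 1."""
    MA = prod(A)
    sa = [u * v % m for u, v, m in zip(xa, ya, A)]
    sb = [u * v % m for u, v, m in zip(xb, yb, B)]
    qa = [-s * pow(N, -1, m) % m for s, m in zip(sa, A)]   # q = -s / N mod M_A
    qb = base_extend(qa, A, B)
    # (s + q*N) is divisible by M_A, so the exact quotient is computed in base B.
    rb = [(s + q * N) * pow(MA, -1, m) % m for s, q, m in zip(sb, qb, B)]
    ra = base_extend(rb, B, A)
    return ra, rb


def rns_modexp(x, e, N, A, B):
    """Left-to-right square-and-multiply on Montgomery representatives x*M_A mod N."""
    MA = prod(A)
    xa, xb = to_rns(x * MA % N, A), to_rns(x * MA % N, B)
    ra, rb = to_rns(MA % N, A), to_rns(MA % N, B)   # Montgomery form of 1
    for bit in bin(e)[2:]:
        ra, rb = rns_montmul(ra, rb, ra, rb, N, A, B)
        if bit == "1":
            ra, rb = rns_montmul(ra, rb, xa, xb, N, A, B)
    # Multiply by plain 1 to leave Montgomery form.
    ra, rb = rns_montmul(ra, rb, to_rns(1, A), to_rns(1, B), N, A, B)
    return crt(rb, B) % N


A, B = [13, 17, 19], [23, 29, 31]   # toy moduli: M_A = 4199, M_B = 20677
N = 1009                            # toy prime modulus with 4N < M_A
print(rns_modexp(5, 123, N, A, B) == pow(5, 123, N))  # True
```

Keeping every value in both bases is the design choice that makes the division by M_A exact and cheap: it is a multiplication by a constant in base B, at the price of two base extensions per product.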

Symposium on Computer Arithmetic | 1993

Very high radix division with selection by rounding and prescaling

Milos D. Ercegovac; Tomás Lang; Paolo Montuschi

A division algorithm in which the quotient-digit selection is performed by rounding the shifted residual in carry-save form is presented. To allow the use of this simple function, the divisor (and dividend) is prescaled to a range close to one. The implementation presented results in a fast iteration because of the use of carry-save forms and suitable recodings. The execution time is calculated, and several convenient values of the radix are selected. Comparison with other high-radix dividers is performed using the same assumptions.
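
The selection-by-rounding idea can be sketched with exact rational arithmetic in Python. Assumptions of this sketch: the divisor is normalized to [1/2, 1), the dividend satisfies 0 <= x <= d/2, the prescaling factor is simply 1/d rounded to a few bits (how the paper obtains and sizes it is not reproduced), and the rounding uses the exact residual rather than a truncated carry-save form. With these choices the prescaled divisor differs from 1 by at most 1/(8r), the residual stays within ±3/4, and every digit is bounded by r - 1.

```python
from fractions import Fraction


def divide_by_rounding(x, d, log2_radix=4, steps=8):
    """Radix-2**log2_radix division with digit selection by rounding, after prescaling.

    Assumes Fractions with 1/2 <= d < 1 and 0 <= x <= d/2 (normalization of this
    sketch). Uses the exact residual; the paper rounds a truncated carry-save one."""
    r = 2 ** log2_radix
    k = log2_radix + 2
    # Prescaling factor M: 1/d rounded to k fractional bits, so |1 - d*M| <= 2**-(k+1).
    M = Fraction(round(2**k / d), 2**k)
    z = d * M               # prescaled divisor, close to 1
    w = x * M               # prescaled dividend is the initial residual
    digits = []
    for _ in range(steps):
        q = round(r * w)    # quotient digit: the shifted residual rounded to an integer
        w = r * w - q * z   # residual update
        digits.append(q)
    quotient = sum(Fraction(q, r ** (i + 1)) for i, q in enumerate(digits))
    return quotient, digits


q, digits = divide_by_rounding(Fraction(3, 10), Fraction(7, 10))
print(digits, float(q), 3 / 7)
```

Because x·M = z·Q + r^(-N)·w[N], the quotient Q approximates x/d with an error below r^(-N) in this setting; the prescaling is what lets a plain rounding replace a selection table.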

IEEE Transactions on Computers | 1990

Higher radix square rooting

Luigi Ciminiera; Paolo Montuschi

A general discussion on nonrestoring square root algorithms is presented, showing bounds and constraints delimiting the space of feasible algorithms, for all the choices of radix, digit set and representation of the partial remainder. Two classes of algorithms are then derived from the general discussion, and it is shown how it is possible to determine two parameters with a relevant impact on the implementation: the number of radicand bits to be inspected in order to obtain a starting value, and the number of partial remainder bits to be examined for digit selection. The algorithms for the specific case of radix 4, digit set {-2, -1, 0, +1, +2}, and partial remainder represented in carry-save form are derived in order to show that the algorithms introduced can lead to better results than those obtained with algorithms previously presented.
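
The following Python sketch runs a radix-4 digit-recurrence square root with the digit set {-2, ..., +2}, keeping the partial root S and the scaled remainder w[j] = 4^j (x - S^2), which obeys w[j+1] = 4 w[j] - 2 S s - s^2 4^-(j+1). It assumes 1/4 <= x < 1 and a starting value S = 1, and it selects each digit by picking the one that minimizes |w[j+1]|; that full-precision selection is only for illustration and stands in for the paper's selection function, which examines a few bits of a carry-save partial remainder.

```python
from fractions import Fraction

DIGITS = (-2, -1, 0, 1, 2)   # radix-4, minimally redundant digit set


def sqrt_radix4(x, steps=10):
    """Digit-recurrence square root of a Fraction 1/4 <= x < 1 in radix 4.

    Invariant: w == 4**j * (x - S**2) after j digits."""
    S = Fraction(1)            # starting value S_0 = 1 (assumption of this sketch)
    w = x - S * S              # w[0]
    digits = []
    for j in range(steps):
        ulp = Fraction(1, 4 ** (j + 1))
        # w[j+1] for every candidate digit, using the current partial root S = S_j.
        candidates = [(4 * w - 2 * S * s - s * s * ulp, s) for s in DIGITS]
        w, s = min(candidates, key=lambda c: abs(c[0]))
        S += s * ulp
        digits.append(s)
    return S, digits


S, digits = sqrt_radix4(Fraction(1, 2))
print(digits, float(S))
```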

IEEE Transactions on Software Engineering | 1990

Some properties of timed token medium access protocols

Adriano Valenzano; Paolo Montuschi; Luigi Ciminiera

Timed-token protocols are used to handle, on the same local area network, both real-time and non-real-time traffic. The authors analyze this type of protocol, giving worst-case values for the throughput of non-real-time traffic and the average token rotation time. Results are obtained for synchronous traffic generated according to a generic periodic pattern under heavy conditions for non-real-time traffic and express not only theoretical lower bounds but values deriving from the analysis of some real networks. A model which addresses the asynchronous overrun problem is presented. The influence of introducing multiple priority classes for non-real-time traffic on the total throughput of this type of message is shown. It is also shown that the differences between the values obtained under worst-case assumptions are close to those obtained under best-case assumptions; the method may therefore be used to provide important guidelines in properly tuning timed-token protocol parameters for each specific network installation.
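
The timed-token rules can be explored with a small step-by-step simulation in Python. Assumptions of this sketch: every station is saturated with asynchronous traffic, synchronous traffic is a fixed time per visit (a special case of the periodic patterns studied in the paper), asynchronous transmission stops exactly when the allowance ends (so the asynchronous overrun the paper models is ignored), and the first visit to each station sends no asynchronous traffic. Under the usual condition walk_time + sum(sync_alloc) <= TTRT, no observed rotation should exceed 2·TTRT, the classic bound for the idealized protocol.

```python
def simulate_timed_token(sync_alloc, ttrt, walk_time, rotations=1000):
    """Idealized timed-token ring under saturated load.

    sync_alloc[i]: synchronous transmission time of station i per token visit.
    walk_time: total ring latency per rotation, split equally between hops.
    A station may send asynchronous traffic for TTRT - TRT when the token is
    early, TRT being the time since the token's previous arrival there.
    Returns (observed rotation times, fraction of time spent on async traffic)."""
    n = len(sync_alloc)
    hop = walk_time / n
    last_arrival = [None] * n
    t, async_time, rotation_times = 0.0, 0.0, []
    for _ in range(rotations):
        for i in range(n):
            if last_arrival[i] is None:
                early = 0.0                     # first visit: no async (assumption)
            else:
                trt = t - last_arrival[i]
                rotation_times.append(trt)
                early = max(0.0, ttrt - trt)    # async allowance when the token is early
            last_arrival[i] = t
            t += sync_alloc[i] + early + hop    # sync, then async, then pass the token
            async_time += early
    return rotation_times, async_time / t


rots, frac = simulate_timed_token([1.0, 2.0, 0.5, 1.5], ttrt=10.0, walk_time=1.0)
print(max(rots), sum(rots) / len(rots), frac)
```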
