Naofumi Takagi | Researchain

Publication

Featured research published by Naofumi Takagi.

Symposium on Computer Arithmetic | 1995

Function evaluation by table look-up and addition

Hannes Hassler; Naofumi Takagi

We describe a general approach that decomposes a function into a sum of functions, each with a smaller input size than the original. Hence, such functions can be implemented with essentially the same precision using small ROM tables and adders. We derive an easy method to compute the worst-case error for many elementary functions and an error bound for the rest. Important applications are reciprocals, logarithms, exponentials, and others.
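The idea can be illustrated with the bipartite table method, one simple instance of decomposing a function into a sum of functions of fewer input bits (our choice of decomposition, not necessarily the one used in the paper). The Python sketch below assumes $f(x) = 1/x$ on $[1, 2)$ with a 12-bit fraction split into three 4-bit fields $x_0, x_1, x_2$; the names `TABLE_A`, `TABLE_B`, and `recip_bipartite` are hypothetical. One table is indexed by $(x_0, x_1)$, the other by $(x_0, x_2)$, and their sum approximates $f$.

```python
# Hypothetical instance: f(x) = 1/x on [1, 2), operand with N = 3*K fraction bits
# split into three K-bit fields x0 (high), x1 (middle), x2 (low).
K = 4
N = 3 * K


def f(x):
    return 1.0 / x


def df(x):
    return -1.0 / (x * x)


D1 = 2.0 ** -(K + 1)      # approximate centre of the range covered by (x1, x2)
D2 = 2.0 ** -(2 * K + 1)  # approximate centre of the range covered by x2

# Table A, indexed by (x0, x1): f evaluated at the centre of the x2 interval.
TABLE_A = [[f(1 + x0 * 2.0 ** -K + x1 * 2.0 ** -(2 * K) + D2)
            for x1 in range(2 ** K)] for x0 in range(2 ** K)]

# Table B, indexed by (x0, x2): first-order correction whose slope depends
# only on x0 (x1 is dropped from the slope argument).
TABLE_B = [[df(1 + x0 * 2.0 ** -K + D1) * (x2 * 2.0 ** -(3 * K) - D2)
            for x2 in range(2 ** K)] for x0 in range(2 ** K)]


def recip_bipartite(i):
    """Approximate 1/x for x = 1 + i * 2**-N with two look-ups and one addition."""
    mask = (1 << K) - 1
    x0 = i >> (2 * K)
    x1 = (i >> K) & mask
    x2 = i & mask
    return TABLE_A[x0][x1] + TABLE_B[x0][x2]


worst = max(abs(recip_bipartite(i) - 1.0 / (1 + i * 2.0 ** -N)) for i in range(2 ** N))
print(f"tables: 2 x {2 ** (2 * K)} entries (vs {2 ** N}), worst error = {worst:.2e}")
```

Each table has $2^8 = 256$ entries instead of the $2^{12} = 4096$ a direct look-up would need; the price is one addition and an error that stays below $2^{-12}$ for this choice of field widths.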

IEEE Transactions on Computers | 2001

A fast algorithm for multiplicative inversion in $\mathrm{GF}(2^m)$ using normal basis

Naofumi Takagi; Jun-ichi Yoshiki; Kazuyoshi Takagi

A fast algorithm for multiplicative inversion in $\mathrm{GF}(2^m)$ using normal basis is proposed. It is an improvement on the algorithms proposed by Itoh and Tsujii and by Chang et al., which are based on Fermat's theorem and require $O(\log m)$ multiplications. The number of multiplications is reduced by decomposing $m-1$ into several factors and a small remainder.
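A minimal sketch of the baseline Itoh-Tsujii inversion that this paper improves on: $a^{-1} = a^{2^m-2} = (a^{2^{m-1}-1})^2$, where $\beta_k = a^{2^k-1}$ is built with the rules $\beta_{2k} = \beta_k^{2^k}\beta_k$ and $\beta_{k+1} = \beta_k^2 a$, driven by the binary expansion of $m-1$. Assumptions: it uses a polynomial basis (in the normal basis the paper uses, each squaring is just a cyclic shift), and it does not implement the paper's factorization of $m-1$. The function names are hypothetical.

```python
def gf_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m), polynomial basis; poly includes the x^m term."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:          # reduce as soon as degree m appears
            a ^= poly
    return result


def gf_sqr_times(a, k, m, poly):
    """Compute a^(2^k) by k repeated squarings."""
    for _ in range(k):
        a = gf_mul(a, a, m, poly)
    return a


def itoh_tsujii_inverse(a, m, poly):
    """Return (a^-1, number of non-squaring multiplications) for nonzero a."""
    beta, k, mults = a, 1, 0              # invariant: beta == a^(2^k - 1)
    for bit in bin(m - 1)[3:]:            # bits of m-1 after the leading one
        beta = gf_mul(gf_sqr_times(beta, k, m, poly), beta, m, poly)  # k -> 2k
        k, mults = 2 * k, mults + 1
        if bit == "1":
            beta = gf_mul(gf_mul(beta, beta, m, poly), a, m, poly)   # k -> k + 1
            k, mults = k + 1, mults + 1
    assert k == m - 1
    return gf_mul(beta, beta, m, poly), mults   # (a^(2^(m-1) - 1))^2 = a^-1


inv, count = itoh_tsujii_inverse(0x53, 8, 0x11B)   # AES field GF(2^8)
print(hex(inv), count)                              # 0xca 4
```

Only the non-squaring multiplications are counted, since squarings are nearly free in a normal basis; for $m = 8$ the chain needs 4 of them, in line with the $O(\log m)$ count quoted above.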

IEEE Transactions on Computers | 1998

Powering by a table look-up and a multiplication with operand modification

Naofumi Takagi

An efficient method for generating a power of an operand, i.e., $X^p$ for an operand $X$ and a given $p$, is proposed. It is applicable to values of $p$ of the form $\pm 2^{k}$, where $k$ is any integer, and of the form $\pm 2^{k_1} \pm 2^{-k_2}$, where $k_1$ is any integer and $k_2$ is any nonnegative integer. These include the reciprocal, the square root, the reciprocal square root, the reciprocal square, the reciprocal cube, and so forth. The method is a modification of the piecewise linear approximation. A power of an operand is generated through a table look-up and a multiplication with operand modification, achieving the same accuracy as the piecewise linear approximation. The multiplication and the addition required for the piecewise linear approximation are replaced by a single double-sized multiplication with a slight modification of the operand; hence, one clock cycle may be saved. The required table size is also reduced, because only one coefficient instead of two has to be stored.
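The core identity can be checked numerically. This is a reconstruction from the description above, not the paper's exact formulation. Write $X = X_1 + X_2$, where $X_1$ holds the leading $m$ fraction bits, and let $h = 2^{-m-1}$. The first-order Taylor expansion about $X_1 + h$ then factors as $X^p \approx (X_1+h)^p + p(X_1+h)^{p-1}(X_2-h) = (X_1+h)^{p-1}\,[X_1 + h + p(X_2 - h)]$. One table entry $(X_1+h)^{p-1}$ and one multiplication by the modified operand are therefore enough. The floating-point sketch below assumes $m = 8$, and its function names are hypothetical. It works for any $p$. The paper's restriction of $p$ to $\pm 2^k$ and $\pm 2^{k_1} \pm 2^{-k_2}$ is what makes $p(X_2 - h)$ cheap to form in hardware.

```python
def make_power_table(p, m):
    """One coefficient per m-bit leading part X1 of X in [1, 2): (X1 + h)^(p - 1)."""
    h = 2.0 ** -(m + 1)
    return [(1 + j * 2.0 ** -m + h) ** (p - 1) for j in range(2 ** m)]


def power_approx(x, p, m, table):
    """Approximate x**p for x in [1, 2): one table look-up and one multiplication."""
    h = 2.0 ** -(m + 1)
    j = int((x - 1) * 2 ** m)        # leading m fraction bits select the entry
    x1 = 1 + j * 2.0 ** -m
    x2 = x - x1                      # low-order part, 0 <= x2 < 2h
    x_mod = x1 + h + p * (x2 - h)    # modified operand
    return table[j] * x_mod


M = 8
GRID = [1 + i / 2 ** 14 for i in range(2 ** 14)]
for p in (-1, 0.5, -0.5, -2):
    table = make_power_table(p, M)
    worst = max(abs(power_approx(x, p, M, table) - x ** p) for x in GRID)
    print(f"p = {p:>4}: worst absolute error {worst:.2e}")
```

The error is the dropped second-order Taylor term, at most $\tfrac{|p(p-1)|}{2} h^2$ on $[1, 2)$ for these values of $p$, which is the same order as for the two-coefficient linear approximation.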

IEEE Transactions on Computers | 2005

A hardware algorithm for modular multiplication/division

Marcelo E. Kaihara; Naofumi Takagi

A mixed radix-4/2 algorithm for modular multiplication/division suitable for VLSI implementation is proposed. The algorithm is based on the Montgomery method for modular multiplication and on the extended binary GCD algorithm for modular division. Both algorithms are modified and combined into the proposed algorithm so that almost all the hardware components are shared. The new algorithm carries out both calculations using simple operations such as shifts, additions, and subtractions. The radix-2 signed-digit representation is used to avoid carry propagation in all additions and subtractions. A modular multiplier/divider based on the algorithm performs an n-bit modular multiplication/division in O(n) clock cycles, where the length of the clock cycle is constant and independent of n. The modular multiplier/divider has a linear array structure with a bit-slice feature and can be implemented with much less hardware than would be needed to implement a multiplier and a divider separately.
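Below is a plain radix-2 sketch of the two underlying algorithms. Each uses only shifts, additions, and subtractions, plus a conditional add of the modulus to stay in range. It is not the paper's combined mixed radix-4/2 algorithm. It uses ordinary binary integers rather than the radix-2 signed-digit representation, and the two routines share no datapath. The function names and the example modulus are hypothetical, and `pow` with a negative exponent needs Python 3.8 or later.

```python
def montgomery_multiply(a, b, modulus, n):
    """Radix-2 Montgomery multiplication: a * b * 2^-n mod modulus (odd, < 2^n)."""
    t = 0
    for i in range(n):
        t += ((a >> i) & 1) * b      # add b if bit i of a is set
        if t & 1:                    # make t even by adding the odd modulus
            t += modulus
        t >>= 1                      # exact division by 2; t stays below 2*modulus
    return t - modulus if t >= modulus else t


def modular_divide(x, y, modulus):
    """Binary extended-GCD division: x / y mod modulus (odd, gcd(y, modulus) = 1)."""
    def halve(u):                    # u / 2 mod modulus: optional add, then shift
        return (u if u % 2 == 0 else u + modulus) // 2

    a, u = y % modulus, x % modulus  # invariant: u * y == a * x (mod modulus)
    b, v = modulus, 0                # invariant: v * y == b * x (mod modulus)
    while a != 1 and b != 1:
        while a % 2 == 0:
            a, u = a // 2, halve(u)
        while b % 2 == 0:
            b, v = b // 2, halve(v)
        if a >= b:                   # subtract the smaller odd value from the larger
            a, u = a - b, (u - v) % modulus
        else:
            b, v = b - a, (v - u) % modulus
    return u if a == 1 else v


N_MOD = 1000003                      # odd (prime) modulus chosen for illustration
n = N_MOD.bit_length()
print(montgomery_multiply(123456, 654321, N_MOD, n)
      == 123456 * 654321 * pow(2, -n, N_MOD) % N_MOD)   # True
print(modular_divide(42, 97, N_MOD) * 97 % N_MOD == 42)  # True
```

Both loops run for a number of steps proportional to the operand length, which matches the O(n) cycle count stated above; the paper's signed-digit arithmetic is what keeps each step free of carry propagation.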

IEEE Transactions on Computers | 1997

Efficient initial approximation for multiplicative division and square root by a multiplication with operand modification

Masayuki Ito; Naofumi Takagi; Shuzo Yajima

An efficient initial approximation method for multiplicative division and square root is proposed. It is a modification of the piecewise linear approximation. The multiplication and the addition required for the linear approximation are replaced by only one multiplication with a slight modification of the operand. The same accuracy is achieved. The modification of the operand requires only a bit-wise inversion and a one-bit shift, and can be implemented by a very simple circuit. One clock cycle may be saved, because the addition is removed. The required table size is also reduced, because only one coefficient instead of two has to be stored.
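To show where such an initial approximation fits, here is a sketch that seeds a Newton-Raphson iteration for the square root via the reciprocal square root ($p = -1/2$). The operand modification $Y_1 + h - \tfrac{1}{2}(Y_2 - h)$ and the table entry $(Y_1 + h)^{-3/2}$ are assumptions. They follow from the first-order expansion $Y^p \approx (Y_1+h)^{p-1}[Y_1 + h + p(Y_2 - h)]$ with $h = 2^{-m-1}$, where $Y_1$ holds the leading $m$ fraction bits. The sketch assumes a table of $2^8$ entries and $Y \in [1, 2)$ and omits range reduction. The names and parameters are hypothetical.

```python
M = 8                          # index bits of the seed table (hypothetical)
H = 2.0 ** -(M + 1)
# Seed table for p = -1/2: (Y1 + h)^(-3/2), one entry per leading part Y1.
SEED_TABLE = [(1 + j * 2.0 ** -M + H) ** -1.5 for j in range(2 ** M)]


def rsqrt_seed(y):
    """Initial approximation of 1/sqrt(y), y in [1, 2): one look-up, one multiply."""
    j = int((y - 1) * 2 ** M)
    y1 = 1 + j * 2.0 ** -M
    y_mod = y1 + H - 0.5 * ((y - y1) - H)   # operand modification for p = -1/2
    return SEED_TABLE[j] * y_mod


def sqrt_newton(y, iterations=2):
    """sqrt(y) = y * (1/sqrt(y)), refining the seed with Newton-Raphson steps."""
    r = rsqrt_seed(y)
    for _ in range(iterations):
        r = r * (3.0 - y * r * r) / 2.0     # Newton step for 1/sqrt(y)
    return y * r


print(sqrt_newton(1.5), 1.5 ** 0.5)
```

Each Newton-Raphson step roughly doubles the number of correct bits, so a seed good to about 19 bits needs two steps to reach double precision.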

Symposium on Computer Arithmetic | 1997

Generating a power of an operand by a table look-up and a multiplication

Naofumi Takagi

An efficient method for generating a power of an operand, i.e., $X^p$ for an operand $X$ and a given, fixed $p$, is proposed. The method is applicable to values of $p$ of the form $\pm 2^{k}$, where $k$ is any integer, and of the form $\pm 2^{k_1} \pm 2^{-k_2}$, where $k_1$ is any integer and $k_2$ is any nonnegative integer. The reciprocal, the square root, and the reciprocal square root are included as special cases. It is a modification of the piecewise linear approximation based on the first-order Taylor expansion, and it achieves the same accuracy. A power of an operand is generated through a table look-up and a multiplication with operand modification; no addition is required. The required table size is reduced, because only one coefficient instead of two has to be stored.
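At the bit level, the operand modification for the reciprocal ($p = -1$) needs no adder. Let $X_1$ be the leading $m$ fraction bits, $X_2$ the remaining low-order part, and $h = 2^{-m-1}$. The modified operand $X_1 + 2h - X_2$ then equals the leading bits of $X$ followed by the bit-wise inverted low bits, up to one unit in the last place, which the sketch simply drops. The fixed-point sketch assumes a 20-bit fraction, an 8-bit index, and 24-bit table coefficients. These widths and the function names are hypothetical.

```python
M = 8        # index bits (hypothetical)
N = 20       # fraction bits of the fixed-point operand (hypothetical)
LOW = N - M  # width of the low part X2
C_BITS = 24  # fraction bits kept per table coefficient (hypothetical)

# Coefficient for p = -1: (X1 + 2^-(M+1))^-2, rounded to C_BITS fraction bits.
TABLE = [round((1 + j / 2 ** M + 2.0 ** -(M + 1)) ** -2 * 2 ** C_BITS)
         for j in range(2 ** M)]


def reciprocal_fixed(xi):
    """Approximate 1/X for X = 1 + xi * 2^-N: one look-up, one integer multiply."""
    j = xi >> LOW                        # X1: leading M fraction bits
    x2 = xi & ((1 << LOW) - 1)           # X2: the remaining LOW bits
    inv_x2 = ~x2 & ((1 << LOW) - 1)      # bit-wise inversion of X2
    # Modified operand X1 + 2^-M - X2 - 2^-N: high bits pass through unchanged,
    # low bits are inverted, so only wiring and inverters are needed.
    x_mod = (1 << N) | (j << LOW) | inv_x2
    return TABLE[j] * x_mod / 2 ** (C_BITS + N)


print(reciprocal_fixed(0x80000), 1 / 1.5)   # X = 1.5
```

Dropping the one-ulp correction adds at most $2^{-N}$ to the error, which is small next to the $h^2 = 2^{-18}$ Taylor term for these widths.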

IEEE Transactions on Very Large Scale Integration Systems | 2004

Systematic IEEE rounding method for high-speed floating-point multipliers

Nhon T. Quach; Naofumi Takagi; Michael J. Flynn

For performance reasons, many high-speed floating-point multipliers today precompute multiple significand values (SVs). The final normalization and rounding steps are then performed by selecting the appropriate SV. While it has speed advantages, this integrated rounding method significantly complicates the development of the rounding logic and hence calls for a systematic rounding method. The systematic rounding method presented in this paper has three steps: 1) constructing a rounding table; 2) developing a prediction scheme; and 3) performing rounding digits selection (RDS). The rounding table lists all possible SVs that need to be precomputed. Prediction reduces the number of these SVs for efficient hardware implementation, while RDS reduces the complexity of the rounding logic. Both prediction and RDS depend on the specifics of the hardware implementation. Two hardware implementations are described. The first is modeled after that reported by Santoro et al., and the second, improved one supports all IEEE rounding modes. Besides allowing systematic hardware optimization, this rounding method has the added advantage that verification and generalization are straightforward.
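As a toy model of rounding by selection, the sketch below precomputes truncated and incremented significand values for both normalization cases of a product. It then selects one of them using the guard and sticky bits, for round-to-nearest-even only. It does not reproduce the paper's rounding table, prediction scheme, or RDS. Real multipliers also derive the SVs from a redundant (carry-save) product rather than from the exact product used here. The 11-bit significand width and the function names are assumptions.

```python
from fractions import Fraction

P = 11  # significand width including the hidden bit (hypothetical)


def rne_reference(a, b):
    """Round the product of two P-bit significands to P bits, nearest-even, exactly.

    Returns (s, shift) with the rounded product equal to s * 2**shift.
    """
    prod = a * b
    shift = P if prod >> (2 * P - 1) else P - 1
    s = round(Fraction(prod, 1 << shift))        # Fraction rounds halves to even
    return (s >> 1, shift + 1) if s == (1 << P) else (s, shift)


def rne_by_selection(a, b):
    """Same result, by selecting one of several precomputed SVs."""
    prod = a * b
    # Precompute truncated and incremented SVs for both normalization cases.
    svs = {s: (prod >> s, (prod >> s) + 1) for s in (P - 1, P)}
    # The leading product bit picks the normalization case.
    shift = P if prod >> (2 * P - 1) else P - 1
    trunc, incremented = svs[shift]
    guard = (prod >> (shift - 1)) & 1
    sticky = (prod & ((1 << (shift - 1)) - 1)) != 0
    round_up = guard == 1 and (sticky or (trunc & 1) == 1)
    s = incremented if round_up else trunc
    # Rounding up can carry into a new leading bit; renormalize if so.
    return (s >> 1, shift + 1) if s == (1 << P) else (s, shift)


print(rne_by_selection(2047, 2047))   # (2046, 11)
```

The selection step only needs the guard bit, the sticky bit, and the LSB of the truncated value, which is why precomputing the candidates moves the carry-propagate addition off the critical path.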

Information Processing Letters | 2000

A fast addition algorithm for elliptic curve arithmetic in $\mathrm{GF}(2^n)$ using projective coordinates

Akira Higuchi; Naofumi Takagi

A new fast addition algorithm on an elliptic curve over $\mathrm{GF}(2^n)$ using projective coordinates with $x = X/Z$ and $y = Y/Z^2$ is proposed.
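The abstract does not state the curve form. Assuming the usual non-supersingular binary curve $y^2 + xy = x^3 + a x^2 + b$, substituting $x = X/Z$ and $y = Y/Z^2$ and multiplying through by $Z^4$ gives the projective curve equation such an addition algorithm works with. Points can then be added without field inversions until the final conversion back to affine coordinates. The paper's addition formulas themselves are not reproduced here.

```latex
% Assumed curve: y^2 + xy = x^3 + a x^2 + b over GF(2^n); substitute x = X/Z, y = Y/Z^2
\frac{Y^2}{Z^4} + \frac{XY}{Z^3} = \frac{X^3}{Z^3} + a\,\frac{X^2}{Z^2} + b
\quad\Longrightarrow\quad
Y^2 + XYZ = X^3 Z + a X^2 Z^2 + b Z^4
```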

IEEE Transactions on Applied Superconductivity | 2009

Design, Implementation and On-Chip High-Speed Test of SFQ Half-Precision Floating-Point Multiplier

Hiroshi Hara; Koji Obata; Heejoung Park; Yuki Yamanashi; Kazuhiro Taketomi; Nobuyuki Yoshikawa; Masamitsu Tanaka; Akira Fujimaki; Naofumi Takagi; Kazuyoshi Takagi; S. Nagasawa

We are developing a large-scale reconfigurable data path (LSRDP) using single-flux-quantum (SFQ) circuits as a fundamental technology that can overcome the power-consumption and memory-wall problems in CMOS microprocessors in future high-end computing systems. An SFQ LSRDP is composed of several thousand SFQ floating-point units connected by reconfigurable SFQ network switches to achieve high performance with low power consumption. In this study, we designed and implemented an SFQ floating-point multiplier (FPM), which is one of the key components of the SFQ LSRDP. We designed a systolic-array bit-serial half-precision FPM using the 2.5 kA/cm² Nb process. The resultant circuit area and number of Josephson junctions are 6.22 mm × 3.78 mm and 11044, respectively. The designed clock frequency is 25 GHz. We tested the circuit and confirmed the correct operation of the FPM by on-chip high-speed tests.
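As a functional reference for what such a multiplier computes, here is a software sketch of a half-precision multiply with a bit-serial, shift-and-add significand product, one multiplier bit per step. It assumes the IEEE 754 binary16 format, normal inputs only, and LSB-first bit order. It returns the exact 22-bit product without normalization or rounding. It is not the paper's SFQ systolic-array datapath, and the function names are hypothetical.

```python
import struct


def decode_half(x):
    """Split a normal binary16 value into (sign, biased exponent, 11-bit significand)."""
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    sign, exp, frac = bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF
    if not 0 < exp < 31:
        raise ValueError("sketch handles normal binary16 numbers only")
    return sign, exp, frac | 0x400       # restore the hidden bit


def bit_serial_multiply(a, b, width=11):
    """Shift-and-add multiply, one multiplier bit per step (LSB first)."""
    acc = 0
    for i in range(width):
        if (b >> i) & 1:
            acc += a << i
    return acc


def half_product_exact(x, y):
    """Exact (unrounded) product of two normal binary16 values, as a float."""
    sx, ex, mx = decode_half(x)
    sy, ey, my = decode_half(y)
    sign = sx ^ sy                       # sign: XOR
    exp = ex + ey - 15                   # exponent: add, remove one bias (15)
    sig = bit_serial_multiply(mx, my)    # 22-bit significand product
    # value = (-1)^sign * sig * 2^(exp - 15 - 20): 20 fraction bits, one bias left
    return (-1.0) ** sign * sig * 2.0 ** (exp - 15 - 20)


print(half_product_exact(1.5, -2.25))   # -3.375
```

In a bit-serial design the loop body corresponds to one step of the datapath, so an 11-bit significand takes 11 steps; the final normalization and rounding to 11 bits would follow as separate stages.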

IEEE Transactions on Applied Superconductivity | 2009

Planarization Process for Fabricating Multi-Layer Nb Integrated Circuits Incorporating Top Active Layer

T. Satoh; Kenji Hinode; Shuichi Nagasawa; Yoshihiro Kitagawa; Mutsuo Hidaka; Nobuyuki Yoshikawa; Hiroyuki Akaike; Akira Fujimaki; Kazuyoshi Takagi; Naofumi Takagi

We have developed an advanced process for fabricating a next-generation multi-layer Nb integrated circuit structure incorporating a top active layer. In this structure, the passive-transmission-line (PTL) layer is placed between the top active layer and a DC-bias current layer at the bottom. This structure will make it possible to flexibly design active circuits and PTL wiring, and will also enable active circuits to be effectively shielded from magnetic fields generated by a large DC-bias current. Both the DC-bias current layer and the PTL layer are planarized; however, the top active layer is fabricated without planarization. To fabricate this new structure, it was necessary to achieve a better planarization process for junctions formed over underlying Nb patterns. The combined process we developed comprising additional SiO2 deposition and additional mechanical polishing after the standard Caldera planarization process results in superior planarization for junction formation. We obtained excellent characteristics of junctions formed over underlying pattern edges when they were fabricated on surfaces planarized using this new process. Using the process, we fabricated new 10-Nb-layer integrated circuit structures and estimated the characteristics of their circuit elements.
