Tatsuo Ohtsuki | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Tatsuo Ohtsuki is active.

Explore More

Publication

Featured researches published by Tatsuo Ohtsuki.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1998

Maple-opt: a performance-oriented simultaneous technology mapping, placement, and global routing algorithm for FPGAs

Nozomu Togawa; Masao Yanagisawa; Tatsuo Ohtsuki

A new field programmable gate array (FPGA) design algorithm, Maple-opt, is proposed for technology mapping, placement, and global routing subject to a given upper bound of critical signal path delay. The basic procedure of Maple-opt is viewed as top-down hierarchical bipartition of a layout region. In each bipartitioning step, technology mapping onto logic blocks of FPGAs, their placement, and global routing are determined simultaneously, which leads to a more congestion-balanced layout for routing. In addition, Maple-opt is capable of estimating a lower bound of the delay for a constrained path and of extracting critical paths based on the difference between the lower bounds and given constraint values in each bipartitioning step. Two delay-reduction procedures for the critical paths are applied; routing delay reduction and logic-block delay reduction. The routing delay reduction is done by assigning each constrained path to a single subregion when bipartitioning a region. The logic-block delay reduction is done by mapping each constrained path onto a smaller number of logic blocks. Experimental results for benchmark circuits demonstrate that Maple-opt reduces the maximum number of tracks per channel by a maximum of 38% compared with existing algorithms while satisfying almost all the path delay constraints.

asia and south pacific design automation conference | 2001

Area/delay estimation for digital signal processor cores

Yuichiro Miyaoka; Yoshiharu Kataoka; Nozomu Togawa; Massao Yanagisawa; Tatsuo Ohtsuki

Hardware/software partitioning is one of the key processes in a hardware/software cosynthesis system for digital signal processor cores. In hardware/software partitioning, area and delay estimation of a processor core plays an important role since the hardware/software partitioning process must determine which part of a processor core should be realized by hardware units and which part should be realized by a sequence of instructions based on execution time of an input application program and area of a synthesized processor core. This paper proposes area and delay estimation equations for digital signal processor cores. For area estimation, we show that total area for a processor core can be derived from the sum of area for a processor kernel and area for additional hardware units. Area for processor kernel can be mainly obtained by minimum area for processor kernel and overheads for adding hardware units and registers. Area for a hardware unit can be mainly obtained by its type and operation bit width. For delay estimation, we show that critical path delay for a processor core can be derived from the delay of a hardware unit which is on the critical path in the processor core. Experimental results demonstrate that errors of area estimation are less than 2% and errors of delay estimation are less than 2ns when comparing estimated area and delay with logic-synthesized area and delay.

Journal of Circuits, Systems, and Computers | 1999

A simultaneous placement and global routing algorithm for FPGAs with power optimization

Nozomu Togawa; Kaoru Ukai; Masao Yanagisawa; Tatsuo Ohtsuki

This paper proposes a simultaneous placement and global routing algorithm for FPGAs with power optimization. The algorithm is based on hierarchical bipartitioning of layout regions and sets of logic-blocks. When bipartitioning a layout region, pseudo-blocks are introduced to preserve connections if there exist connections between bipartitioned logic-block sets. A global route is represented by a sequence of pseudo-blocks. Since pseudo-blocks and logic-blocks can be dealt with equally, placement and global routing are processed simultaneously. The algorithm gives weights to nets with high switching probabilities and attempts to assign the blocks connected by weighted nets to the same region. Thus their length is shortened and the power consumption of a whole circuit can be reduced. The experimental results demonstrate the effectiveness and efficiency of the algorithm.

international symposium on circuits and systems | 1994

A simultaneous placement and global routing algorithm for FPGAs

Nozomu Togawa; Masao Sato; Tatsuo Ohtsuki

An FPGA layout algorithm is presented, which deals with placement and global routing simultaneously by fully exploiting its regular structure. It is based on a simple and fast top-down hierarchical bi-partitioning, with placement and global routes represented by positions of logic-blocks and pseudo-blocks, respectively. Experimental results for several benchmark circuits demonstrates its efficiency and effectiveness.<<ETX>>

asia and south pacific design automation conference | 1999

A hardware/software partitioning algorithm for processor cores of digital signal processing

Nozomu Togawa; Takashi Sakurai; Masao Yanagisawa; Tatsuo Ohtsuki

A hardware/software cosynthesis system for processor cores of digital signal processing has been developed. This paper focuses on a hardware/software partitioning algorithm which is one of the key issues in the system. Given an input assembly code generated by the compiler in the system, the proposed hardware/software partitioning algorithm first determines the types and the numbers of required hardware units, such as multiple functional units, hardware loop units, and particular addressing units, for a processor core (initial resource allocation). Second, the hardware units determined at initial resource allocation are reduced one by one while the assembly code meets a given timing constraint (configuration of a processor core). The execution time of the assembly code becomes longer but the hardware costs for a processor core to execute it becomes smaller. Finally, it outputs an optimized assembly code and a processor configuration. Experimental results demonstrate that the system synthesizes processor cores effectively according to the features of an application program/data.

asia pacific conference on circuits and systems | 2000

A hardware/software partitioning algorithm for digital signal processor cores with two types of register files

Nozomu Togawa; Takashi Sakurai; Masao Yanagisawa; Tatsuo Ohtsuki

Given a compiled assembly code and a timing constraint of execution time, the proposed algorithm generates a processor core configuration with a new assembly code running on the generated processor core. The proposed algorithm considers two register files and determines the number of registers in each of the register files. Moreover the algorithm considers two or more functional units for each arithmetic or logical operation and assigns functional units with small area to a processor core without causing performance penalty. A generated processor core will have small area compared with processor cores which have a single register file or those which have only one functional unit for each operation. The experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.

design automation conference | 2000

An area/time optimizing algorithm in high-level synthesis for control-based hardwares

Nozomu Togawa; Masayuki Ienaga; Masao Yanagisawa; Tatsuo Ohtsuki

Given a call graph whose node corresponds to a control flow of an application program, the algorithm generates a set of state-transition graphs which represents the input call graph under area and timing constraint. In the algorithm, first state-transition graphs which satisfy only timing constraint are generated and second they are transformed so that they can satisfy area constraint. Since the algorithm is directly applied to control-flow graphs, it can deal with control flows such as bit-wise processes and conditional branches. Further, the algorithm synthesizes more than one hardware architecture candidate from a single call graph for a program. Designers of an application program can select several good hardware architectures among candidates depending on multiple design criteria. Experimental results for several control-based hardwares demonstrate effectiveness and efficiency of the algorithm.

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences | 1999