Masao Nakaya | Researchain

Publication

Featured research published by Masao Nakaya.

IEEE Journal of Solid-state Circuits | 2000

14-bit 2.2-MS/s sigma-delta ADC's

James C. Morizio; I.M. Hoke; T. Kocak; C. Geddie; C. Hughes; J. Perry; S. Madhavapeddi; M.H. Hood; G. Lynch; Harufusa Kondoh; T. Kumamoto; T. Okuda; H. Noda; M. Ishiwaki; T. Miki; Masao Nakaya

This paper presents the design and test results of a fourth-order and sixth-order 14-bit 2.2-MS/s sigma-delta analog-to-digital converter (ADC). The analog modulator and digital decimator sections were implemented in a 0.35-μm CMOS double-poly triple-level metal 3.3-V process. The design objective for these ADCs was to achieve 85 dB signal-to-noise distortion ratio (SNDR) with less than 200 mW power dissipation. Both modulators employ a cascade sigma-delta topology. The fourth-order modulator consists of two cascaded second-order stages which include 1-bit and 5-bit quantizers, respectively. The sixth-order modulator has a 2-2-2 cascade structure and 1-bit quantizer at the end of each stage. An oversampling ratio of 24 was selected to give the best SNDR and power consumption with realizable gain-matching requirements between the analog and digital sections.
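
As a rough check on the figures in this abstract, the standard ideal-quantizer relation ENOB = (SNDR - 1.76) / 6.02 maps the 85 dB SNDR target to just under 14 bits. The sketch below also assumes that 2.2 MS/s is the decimated output rate, so that the modulator clock would be the oversampling ratio times that rate. The function names are illustrative and are not taken from the paper.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits for a given SNDR (ideal sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


def modulator_clock_hz(output_rate_hz: float, oversampling_ratio: int) -> float:
    """Modulator sampling clock, ASSUMING output_rate_hz is the decimated rate."""
    return output_rate_hz * oversampling_ratio


if __name__ == "__main__":
    # Figures quoted in the abstract: 85 dB SNDR, 2.2 MS/s, oversampling ratio 24.
    print(f"ENOB at 85 dB SNDR: {enob_from_sndr(85.0):.2f} bits")
    print(f"Modulator clock: {modulator_clock_hz(2.2e6, 24) / 1e6:.1f} MHz")
```

Under these assumptions the 85 dB target corresponds to roughly 13.8 effective bits, which is consistent with the 14-bit label in the title.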

International Symposium on Computer Architecture | 1995

Unconstrained speculative execution with predicated state buffering

Hideki Ando; Chikako Nakanishi; Tetsuya Hara; Masao Nakaya

Speculative execution is the execution of instructions before it is known whether they should be executed. Compiler-based speculative execution has the potential to achieve both a high instructions-per-cycle rate and a high clock rate. Pure compiler-based approaches, however, have greatly limited instruction scheduling due to a limited ability to handle the side effects of speculative execution. Significant performance improvement is thus difficult in non-numerical applications. This paper proposes a new architectural mechanism, called predicating, which provides unconstrained speculative execution. Predicating removes restrictions which limit the compiler's ability to schedule instructions. Through our hardware support, the compiler is allowed to move instructions past multiple basic block boundaries from any succeeding control path. Predicating buffers the side effects of speculative execution with its predicate, and the buffered predicate efficiently commits or squashes the side effects. The mechanism also provides a speculative exception handling scheme. The scheme, called the future condition, properly postpones speculative exceptions and efficiently restarts the process. We show that our mechanism can be implemented through a modest amount of hardware with little complexity. The evaluation results show that our mechanism significantly improves performance, and achieves a 2.45x speedup over scalar machines.

IEEE Journal of Solid-state Circuits | 1993

A 622-Mb/s 8×8 ATM switch chip set with shared multibuffer architecture

Harufusa Kondoh; Hiromi Notani; Hideaki Yamanaka; Keiichi Higashitani; Hirotaka Saito; Isamu Hayashi; Shigeki Kohama; Yoshio Matsuda; Kazuyoshi Oshima; Masao Nakaya

An asynchronous transfer mode (ATM) switch chip set, which employs a shared multibuffer architecture, and its control method are described. This switch architecture features multiple-buffer memories located between two crosspoint switches. By controlling the input-side crosspoint switch so as to equalize the number of stored ATM cells in each buffer memory, these buffer memories can be treated as a single large shared buffer memory. Thus, buffers are used efficiently and the cell loss ratio is reduced to a minimum. Furthermore, no multiplexing or demultiplexing is required to store and restore the ATM cells by virtue of parallel access to the buffer memories via the crosspoint switches. Access time for the buffer memory is thus greatly reduced. This feature enables high-speed switch operation. A three-VLSI chip set using 0.8-μm BiCMOS process technology has been developed. Four aligner LSIs, nine bit-sliced buffer-switch LSIs, and one control LSI are combined to create a 622-Mb/s 8×8 ATM switching system that operates at 78 MHz. In the switch fabric, 155-Mb/s ATM cells can also be switched on the 622-Mb/s port using time-division multiplexing.
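
The equalizing control described in this abstract can be pictured with a small sketch. It assumes, purely for illustration, that in each cell time the controller steers every arriving cell to a different buffer memory, choosing the least-filled buffers first. The function name, the tie-breaking rule, and the list-based occupancy model are hypothetical and are not taken from the paper.

```python
from typing import List


def assign_cells_to_buffers(occupancy: List[int], num_arrivals: int) -> List[int]:
    """Pick a distinct buffer for each arriving cell, least-filled first.

    occupancy[i] is the number of cells currently stored in buffer i and is
    updated in place. Returns the indices of the buffers that were chosen.
    ASSUMPTION: each buffer accepts at most one write per cell time.
    """
    if num_arrivals > len(occupancy):
        raise ValueError("more arriving cells than buffer memories")
    # Sort buffer indices by fill level; ties are broken by the lower index.
    order = sorted(range(len(occupancy)), key=lambda i: (occupancy[i], i))
    chosen = order[:num_arrivals]
    for i in chosen:
        occupancy[i] += 1
    return chosen


if __name__ == "__main__":
    fill = [3, 1, 2, 1]
    print(assign_cells_to_buffers(fill, 2), fill)
```

Because writes always go to the emptiest buffers, the fill levels drift toward each other, which is the sense in which several small memories can behave like one large shared buffer.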

International Symposium on Computer Architecture | 1996

Performance Comparison of ILP Machines with Cycle Time Evaluation

Hideki Ando; Masao Nakaya; Chikako Nakanishi; Tetsuya Hara

Many studies have investigated performance improvement through exploiting instruction-level parallelism (ILP) with a particular architecture. Unfortunately, these studies indicate performance improvement using the number of cycles that are required to execute a program, but do not quantitatively estimate the penalty imposed on the cycle time from the architecture. Since the performance of a microprocessor must be measured by its execution time, a cycle time evaluation is required as well as a cycle count speedup evaluation. Currently, superscalar machines are widely accepted as the machines which achieve the highest performance. On the other hand, because of hardware simplicity and instruction scheduling sophistication, there is a perception that the next generation of microprocessors will be implemented with a VLIW architecture. A simple VLIW machine, however, has a serious weakness regarding speculative execution. Thus, it is a question whether a simple VLIW machine really outperforms a superscalar machine. We recently proposed a mechanism called predicating that supports speculative execution for the VLIW machine, and showed a significant cycle count speedup over a scalar machine. Although the mechanism is simple, it is unknown how much it imposes a penalty on the cycle time, and how much the performance is improved as a result. This paper evaluates both the cycle count speedup and the cycle time for three ILP machines: a superscalar machine, a simple VLIW machine, and the VLIW machine with predicating. The evaluation results show that the simple VLIW machine slightly outperforms the superscalar machine, while the VLIW machine with predicating achieves a significant speedup of 1.41x over the superscalar machine.
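
The comparison in this abstract rests on the relation execution time = cycle count × cycle time. A minimal sketch of that arithmetic follows. The machine figures in the usage example are invented placeholders, not results from the paper, and the function names are illustrative.

```python
def execution_time(cycle_count: float, cycle_time_ns: float) -> float:
    """Execution time in nanoseconds: cycles multiplied by time per cycle."""
    return cycle_count * cycle_time_ns


def speedup(base_cycles: float, base_cycle_time_ns: float,
            new_cycles: float, new_cycle_time_ns: float) -> float:
    """Net speedup of the new machine over the base machine."""
    return (execution_time(base_cycles, base_cycle_time_ns)
            / execution_time(new_cycles, new_cycle_time_ns))


if __name__ == "__main__":
    # HYPOTHETICAL numbers: the new machine needs 40% fewer cycles
    # but pays a 10% longer cycle time for its extra hardware.
    cycle_only = speedup(1.0e6, 10.0, 0.6e6, 10.0)
    net = speedup(1.0e6, 10.0, 0.6e6, 11.0)
    print(f"cycle-count speedup: {cycle_only:.2f}x, net speedup: {net:.2f}x")
```

The gap between the two printed values is the cycle time penalty that a cycle-count-only evaluation would miss.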

Symposium on VLSI Circuits | 1992

An 8×8 ATM switch LSI with shared multi-buffer architecture

Hiromi Notani; Harufusa Kondoh; Isamu Hayashi; Hideaki Yamanaka; Hirotaka Saito; Yoshio Matsuda; Masao Nakaya

An ATM switch LSI with a shared multibuffer architecture is proposed. With this architecture, a fourfold speed improvement is achieved in accessing buffer memories as compared to conventional shared-buffer-type switches, and high buffer memory utilization efficiency is also realized. This switch LSI is designed to operate at 100 MHz, using 0.8-μm BiCMOS technology. Eight switch LSIs at 78-MHz operation construct a 622-Mb/s 8×8 ATM switching system with a buffer size of 8×128 ATM cells.

International Solid-State Circuits Conference | 1981

A bipolar 2500-gate subnanosecond masterslice LSI

Masao Nakaya; S. Kato; K. Tsukamoto; H. Sakurai; T. Kondo; Yasutaka Horiba

A bipolar 2500-gate subnanosecond masterslice LSI has been developed for use in computer mainframes. A walled-emitter structure has been realized by using double boron ion implantation with an n-type epitaxial layer to obtain high performance and high packing density. A new cell composed of a pair of adjacent gates provides high utilization of input transistors. A gate delay of 0.58 ns with power dissipation of 0.54 mW/gate has been achieved. The masterslice has been applied to an 18-bit memory data register circuit consisting of 1983 internal logic gates and has been mounted on a new 224-pin plug-in package.

IEEE Journal of Solid-state Circuits | 1978

A 920 gate DSA MOS masterslice

Osamu Tomisawa; Kenji Anami; Masao Nakaya; Masashi Ohmori; I. Ohkura; Takao Nakano

A DSA MOS (diffusion self-aligned MOS) masterslice circuit with up to 920 gates and a delay of 3 ns per gate has been developed for random logic computer circuits, utilizing the performance and economic advantages of the LSI masterslice approach. To attain high packing density and high speed with conventional design rules, the DSA MOSFET technology has been used for the basic device. The chip comprises 50 by 16 gate cells and 116 input/output buffers. This LSI chip is two to three times better than bipolar S-TTL in packing density and is comparable in propagation delay time. As an example of an LSI device obtained through customized metallization, an 8-bit ALU is described which has an average delay time of 3 ns and a power dissipation of 3 W.

International Solid-State Circuits Conference | 1978

A 920 gate masterslice

T. Nakano; O. Tomisawa; Kenji Anami; Masao Nakaya; M. Ohmori; I. Ohkura

An MOS masterslice chip with up to 920 gates, 3W dissipation and 3ns/gate propagation delay time for random logic LSIs will be reported.

IEEE Journal of Solid-state Circuits | 1979

A multilevel metallized DSA MOS masterslice

I. Ohkura; Osamu Tomisawa; Masao Nakaya; Y. Ohbayashi; T. Nakano

A new LSI with high-speed capability and high-packing density for computer use has been successfully achieved within a short turnaround time by a new DSA MOS masterslice. Two-level metallization has been accomplished by the use of full plasma processes. The average gate delay time of the new masterslice was improved to 2 ns compared with 3 ns in the case of single-level metallization.

International Conference on Computer Design | 1993

Speculative execution and reducing branch penalty in a parallel issue machine

Hideki Ando; Chikako Nakanishi; Hirohisa Machida; Tetsuya Hara; Satoru Kishida; Masao Nakaya

Parallel instruction issue is essential for performance improvement in current microprocessor designs. Simply adding function units, however, is of little benefit in non-numerical applications, since control dependence severely limits exploitation of instruction-level parallelism (ILP) and frequent branches consume ILP due to their long latency. Boosting is an interesting technique to reduce control dependence. It allows general speculative execution with little cycle time penalty. From the cost/performance point of view, we propose an efficient implementation of boosting, which requires little support hardware and maximizes the performance gain from boosting within the limited hardware. We also propose a new branch scheme to reduce the branch penalty, which has a particularly big performance impact in a parallel issue machine. Our scheme fetches from both directions of the branch at small hardware cost through the integration of code movement and hardware support. We evaluate our schemes and find that they contribute significantly to performance improvement.
