Masaitsu Nakajima | Researchain

Publication

Featured research published by Masaitsu Nakajima.

IEEE Journal of Solid-state Circuits | 1987

A 32-bit CMOS microprocessor with on-chip cache and TLB

Hiroshi Kadota; Jiro Miyake; I. Okabayashi; T. Maeda; Tadashi Okamoto; Masaitsu Nakajima; K. Kagawa

A 32-bit general-purpose microprocessor has been developed using 1-µm CMOS technology. The chip, containing 372K transistors, operates at an 80-ns machine cycle time with a 5-V power supply. To support virtual and hierarchical memory systems, a 64-entry fully associative translation lookaside buffer (TLB) and a 2-kbyte instruction cache are implemented on the chip. The internal access times for the TLB and cache are 22 and 18 ns, respectively. The microarchitecture reduces the pipeline to three stages, simplifying the control path and enabling high-speed operation. The data path is also enhanced with dedicated hardware such as a barrel shifter and a multiplier/divider. Measured performance is 5.1 million instructions per second (MIPS) with a 50-ns-access main memory.
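In a fully associative TLB, a virtual page number may occupy any of the 64 entries, so a lookup compares against every entry at once. The Python sketch below models that behavior in software. The 4-kbyte page size, the LRU replacement policy, and the dictionary standing in for the page table are assumptions for illustration, not details from the paper.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size; the abstract does not state one
TLB_ENTRIES = 64  # from the abstract: 64-entry fully associative TLB


class FullyAssociativeTLB:
    """Toy model: any virtual page number can sit in any entry.

    LRU replacement is an illustrative assumption, not the chip's policy.
    """

    def __init__(self, entries=TLB_ENTRIES):
        self.entries = entries
        self.map = OrderedDict()  # vpn -> pfn, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr, page_table):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.map:
            self.hits += 1
            self.map.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            if len(self.map) >= self.entries:
                self.map.popitem(last=False)  # evict least recently used entry
            self.map[vpn] = page_table[vpn]  # stands in for a page-table walk
        return self.map[vpn] * PAGE_SIZE + offset


page_table = {vpn: vpn + 100 for vpn in range(128)}  # hypothetical mapping
tlb = FullyAssociativeTLB()
print(hex(tlb.translate(0x1234, page_table)))  # 0x65234
```

A real TLB performs the 64 tag comparisons in parallel in hardware; the dictionary lookup here only reproduces the "any entry" placement rule.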

IEEE Journal of Solid-state Circuits | 1989

A VLSI RISC with 20-MFLOPS peak, 64-bit floating-point unit

Katsuyuki Kaneko; Tadashi Okamoto; Masaitsu Nakajima; Yasuhiro Nakakura; Satoshi Gokita; Junji Nishikawa; Yuji Tanikawa; Hiroshi Kadota

A microprocessor designed as a processing element of a scientific parallel computer system is described. The chip consists of a simple integer processor core and dedicated floating-point hardware. It executes 64-bit floating-point addition, subtraction, and multiplication at a rate of one every 50 ns, and division at one every 350 ns. The processor, which employs a RISC architecture and a Harvard-style bus organization, executes most of its 47 instructions in one 50-ns cycle. The chip is fabricated in 1.2-µm n-well CMOS technology and contains 440K transistors on a 14.4 mm × 13.5 mm die. The authors provide an overview of the processor, focusing especially on the functions for a parallel system, the floating-point hardware, and the new divide algorithm.
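The 20-MFLOPS peak in the title follows directly from the issue rate: one floating-point result every 50 ns is 2 × 10^7 results per second. The small Python helper below, a hypothetical function named `mflops`, converts an issue interval into MFLOPS and applies it to the add/sub/mul and divide rates quoted in the abstract.

```python
def mflops(results_per_interval: int, interval_ns: float) -> float:
    """MFLOPS when `results_per_interval` FP results complete every `interval_ns`."""
    # results per second = results / (interval_ns * 1e-9); divide by 1e6 for MFLOPS
    return results_per_interval * 1e3 / interval_ns


print(mflops(1, 50))             # add/sub/mul: 20.0
print(round(mflops(1, 350), 2))  # back-to-back divides: 2.86
```

These are idealized steady-state figures that assume a new operation is issued every interval.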

custom integrated circuits conference | 1991

A 80 MFLOPS 64-bit microprocessor for parallel computer

Hiraku Nakano; Masaitsu Nakajima; Yasuhiro Nakakura; Tadahiro Yoshida; Yoshiyuki Goi; Yuji Nakai; Reiji Segawa; Takeshi Kishida; Hiroshi Kadota

An 80-MFLOPS 64-bit microprocessor is described that employs a superscalar architecture to execute two instructions simultaneously in one 25-ns cycle, including the combination of a 64-bit floating-point add and a floating-point multiply. The processor, implemented in 0.8-µm CMOS technology, contains 1300K transistors. It also employs a RISC (reduced instruction set computer) architecture and a Harvard-style bus organization. A division completes every 200 ns. Typical performance is 64 MFLOPS.
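The 80-MFLOPS peak can be derived from the cycle time: a 25-ns cycle is a 40-MHz clock, and pairing a floating-point add with a floating-point multiply yields two results per cycle. The short Python calculation below makes that arithmetic explicit and relates it to the 64-MFLOPS typical figure. Treating every cycle as a paired add/multiply is the idealization behind any peak number, not a claim about real workloads.

```python
CYCLE_NS = 25.0         # cycle time from the abstract
FP_OPS_PER_CYCLE = 2    # one FP add and one FP multiply issued in the same cycle
DIV_INTERVAL_NS = 200.0 # one division every 200 ns
TYPICAL_MFLOPS = 64.0   # typical performance from the abstract

clock_mhz = 1e3 / CYCLE_NS                       # 40 MHz
peak_mflops = FP_OPS_PER_CYCLE * clock_mhz       # 80 MFLOPS
divide_mflops = 1e3 / DIV_INTERVAL_NS            # 5 MFLOPS for a stream of divides
typical_fraction = TYPICAL_MFLOPS / peak_mflops  # 0.8 of peak

print(f"clock {clock_mhz:.0f} MHz, peak {peak_mflops:.0f} MFLOPS, "
      f"divide-only {divide_mflops:.0f} MFLOPS, typical {typical_fraction:.0%} of peak")
# clock 40 MHz, peak 80 MFLOPS, divide-only 5 MFLOPS, typical 80% of peak
```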

international symposium on microarchitecture | 1990

Processing element design for a parallel computer

Katsuyuki Kaneko; Masaitsu Nakajima; Yasuhiro Nakakura; Junji Nishikawa; Ichiro Okabayashi; Hiroshi Kadota

A study was made of how the cost-effectiveness gained from improvements in VLSI technology can be applied to a scientific computer system without loss of performance. The result is a parallel computer, ADENA (Alternating Direction Edition Nexus Array), whose core consists of four kinds of VLSI chips: two for the processing elements (PEs) and two for the interprocessor network (plus some memory chips). An overview of ADENA and an analysis of its performance are given, and the design considerations for the PEs incorporated in ADENA are discussed. The factors that limit performance in a parallel processing environment are analyzed, and the measures taken to address them at the LSI design level are described. The 42.6-cm² CMOS PEs reach a peak performance of 20 MFLOPS; a 256-PE ADENA has achieved 1.5 GFLOPS, and 300 to 400 MFLOPS on PDE applications.

IEEE Journal of Solid-state Circuits | 1992

An 80-MFLOPS (peak) 64-b microprocessor for parallel computer

Hiraku Nakano; Masaitsu Nakajima; Yasuhiro Nakakura; Tadahiro Yoshida; Yoshiyuki Goi; Yuji Nakai; Reiji Segawa; Takeshi Kishida; Hiroshi Kadota

An 80-MFLOPS (peak) 64-bit microprocessor that employs a superscalar architecture to execute two instructions simultaneously in one 25-ns cycle, including the combination of 64-bit floating-point add and multiply instructions, is described. The processor, implemented in 0.8-µm CMOS technology, contains 1300K transistors. It also employs a RISC architecture and a Harvard-style bus organization. The authors provide an overview of the processor, focusing especially on the processor architecture, the floating-point hardware, and performance.

international conference on supercomputing | 1991

Parallel computer ADENART—its architecture and application

Hiroshi Kadota; Katsuyuki Kaneko; Ichiro Okabayashi; Tadashi Okamoto; T. Mimura; Yasuhiro Nakakura; Akiyoshi Wakatani; Masaitsu Nakajima; Junji Nishikawa; Koji Zaiki; Tatsuo Nogi

A new parallel computer for numerical applications, ADENART (previously called ADENA), has been developed. It is composed of 256 processing elements (PEs) and an interconnection network (HXnet). Each PE consists of a dedicated floating-point processor VLSI with a sustained performance of 10 MFLOPS, a communication controller VLSI, and locally distributed memories. The peak performance of the system is therefore 2.56 GFLOPS. HXnet supports two efficient data-transfer modes, FAST mode and SLOW mode, both of which are useful for various applications. The practical performance of the ADENART system has been evaluated with several application programs; for a partial differential equation solver, the system performance was measured at 475 MFLOPS.
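The 2.56-GFLOPS system figure is the per-PE rate multiplied by the number of PEs, and the measured PDE-solver result can be expressed as a fraction of it. The Python lines below only restate the numbers given in the abstract; the resulting ratio is a derived quantity, not one reported by the authors.

```python
NUM_PES = 256               # processing elements in ADENART
PE_MFLOPS = 10.0            # per-PE floating-point rate from the abstract
MEASURED_PDE_MFLOPS = 475.0 # measured PDE-solver performance

system_mflops = NUM_PES * PE_MFLOPS                # 2560 MFLOPS = 2.56 GFLOPS
efficiency = MEASURED_PDE_MFLOPS / system_mflops   # fraction of the system figure achieved

print(f"{system_mflops / 1e3:.2f} GFLOPS, {efficiency:.1%} achieved")
# 2.56 GFLOPS, 18.6% achieved
```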

symposium on vlsi circuits | 2007

Homogenous Dual-Processor core with Shared L1 Cache for Mobile Multimedia SoC

Masaitsu Nakajima; Takao Yamamoto; Masayuki Yamasaki; Keisuke Kaneko; Tetsu Hosoki

We propose a novel dual-processor core that adopts a shared L1 cache with an active-way scheme. In this scheme, each way of the cache is owned by a specific processor, and replacement occurs only in that processor's own ways. This architecture requires only a dual-port TAG memory, not a dual-port DATA memory, to support simultaneous access from both processors, and it guarantees no cache thrashing and no snoop overhead. In addition, because the cache memory and cache controller are shared, power dissipation under heavy load is 23% lower and area is 29% smaller than those of a dual-processor core with a snoop cache.
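The active-way idea can be illustrated with a toy cache model: both processors search every way on a lookup, so a line brought in by one processor can be hit by the other, but on a miss a processor may replace only a way it owns. The Python sketch below assumes a 4-way set-associative cache with two ways per processor, 32-byte lines, 64 sets, and LRU replacement among a processor's own ways. None of these parameters come from the paper, and the dual-port TAG and timing are not modeled; the class name `SharedActiveWayCache` is hypothetical.

```python
class SharedActiveWayCache:
    """Toy set-associative L1 shared by two processors, with per-processor way ownership.

    Set count, line size, way ownership and LRU-within-owned-ways are
    illustrative assumptions, not parameters from the paper.
    """

    def __init__(self, num_sets=64, line_bytes=32, way_owner=(0, 0, 1, 1)):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        self.way_owner = list(way_owner)
        # tags[set][way] holds the stored tag, or None if the way is empty
        self.tags = [[None] * len(way_owner) for _ in range(num_sets)]
        # per-set recency order of way indices, least recently used first
        self.lru = [list(range(len(way_owner))) for _ in range(num_sets)]

    def _touch(self, s, way):
        self.lru[s].remove(way)
        self.lru[s].append(way)

    def access(self, cpu, addr):
        """Return True on a hit, False on a miss (after refilling)."""
        line = addr // self.line_bytes
        s, tag = line % self.num_sets, line // self.num_sets
        # Lookup: every way is searched, whichever processor owns it.
        for way, t in enumerate(self.tags[s]):
            if t == tag:
                self._touch(s, way)
                return True
        # Miss: the victim is the LRU way among those owned by this cpu only,
        # so one processor's misses never evict the other processor's lines.
        victim = next(w for w in self.lru[s] if self.way_owner[w] == cpu)
        self.tags[s][victim] = tag
        self._touch(s, victim)
        return False


cache = SharedActiveWayCache()
cache.access(1, 0)              # CPU 1 misses and fills one of its own ways
for k in range(1, 11):          # CPU 0 streams 10 conflicting lines into set 0
    cache.access(0, k * 64 * 32)
print(cache.access(1, 0))       # True: CPU 1's line was not evicted
print(cache.access(0, 0))       # True: CPU 0 can hit on it as well
```

In this model, processor 0 streams through ten conflicting lines in set 0, yet the line that processor 1 brought in survives, because processor 0's misses can only victimize ways 0 and 1.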

international solid-state circuits conference | 1989

A 64 b RISC microprocessor for a parallel computer system

K. Kaneko; T. Okamoto; Masaitsu Nakajima; Y. Nakakura; S. Gokita; J. Nishikawa; Y. Tanikawa; Hiroshi Kadota

A description is given of a microprocessor designed as a processing element (PE) of a parallel computer system. Owing to its pipelined structure and dedicated floating-point blocks, it executes a 64-bit floating-point ADD/SUB/MULT in 50 ns and a DIV in 350 ns. The processor employs a RISC (reduced-instruction-set-computer) architecture and executes most of its 47 instructions in one 50-ns cycle. The chip is fabricated in 1.2-µm n-well CMOS technology and contains 440K transistors on a 14.4 mm × 13.5 mm die. The processor provides high-speed double-precision floating-point operation, high reliability in data handling, communication between PEs and the host controller device, and hardware support for efficient code generation by the compiler. Its maximum performance is 20 MFLOPS (million floating-point operations per second) or 20 MIPS (million instructions per second). Typical performance, measured while executing Gaussian elimination, is 4 MFLOPS. The major characteristics and performance of the processor are summarized.

symposium on vlsi circuits | 2017

System architecture with single chip 8K HEVC decoder for 8K advanced BS receiver system

Masaitsu Nakajima; Daisuke Murakami; Hironori Kubo; Takahide Baba; Yoichiro Miki

To implement an 8K Advanced BS receiver system, an 8K HEVC decoder SoC was developed as its key component. The memory bandwidth required for 8K decoding exceeds the available physical memory bandwidth; to solve this, two multi-cast write-back schemes are introduced: reference-data multi-cast write-back and output-data multi-cast write-back. The 8K HEVC decoder chip is fabricated in 28-nm CMOS technology and packaged as a system-in-package (SiP) with eight DDR3 memories.

international solid-state circuits conference | 2011

Session 18 overview / technology directions: Organic innovations

Chris Van Hoof; Masaitsu Nakajima

Summary form only given. Implementing transistors, circuits and sensors on arbitrary substrates has great potential for a wide range of low-cost electronic products such as flexible displays, disposable biochemical sensors and large-area artificial skin. This vision is gradually becoming a reality thanks to continuing innovations in low-temperature-processed organic thin-film transistor (OTFT) technology.
