Aiichiro Inoue | Researchain

Publication

Featured research published by Aiichiro Inoue.

Design Automation Conference | 2003

A 1.3GHz fifth generation SPARC64 microprocessor

Hisashige Ando; Yuuji Yoshida; Aiichiro Inoue; Itsumi Sugiyama; Takeo Asakawa; Kuniki Morita; Toshiyuki Muta; Tsuyoshi Motokurumada; Seishi Okada; Hideo Yamashita; Yoshihiko Satsukawa; Akihiko Konmoto; Ryouichi Yamashita; Hiroyuki Sugiyama

A fifth-generation SPARC64 processor is fabricated in a 130 nm SOI CMOS process with 8 layers of Cu metallization. It runs at 1.3 GHz with 37.4 W power dissipation in the laboratory. The chip contains over 190M transistors, 19M of them in logic circuits. The chip size is 18.14 mm x 15.99 mm. An error detection and recovery mechanism is implemented for the execution units and data-path logic circuits, in addition to the on-chip arrays, to detect and recover from data logic errors. The processor was developed using mostly in-house CAD tools.

A fifth-generation SPARC64 processor implemented in a 130 nm CMOS process with 8 layers of Cu metallization operates with a 1.3 GHz clock and dissipates 34.7 W. The processor is a 4-issue out-of-order design with a 2 MB on-chip level-2 cache. Error checking is added on the data path in addition to memory. When an error is detected in the data path, the instruction is retried to correct it.

High-Performance Computer Architecture | 2003

Microarchitecture and performance analysis of a SPARC-V9 microprocessor for enterprise server systems

Mariko Sakamoto; Akira Katsuno; Aiichiro Inoue; Takeo Asakawa; Haruhiko Ueno; Kuniki Morita; Yasunori Kimura

We developed a 1.3-GHz SPARC-V9 processor: the SPARC64 V. This processor is designed to address the requirements of enterprise servers and high-performance computing. Processing speed under multiuser interactive workloads is very sensitive to system balance because of the large number of memory requests involved. Drawing on many years of experience with such workloads in mainframe system development, we place importance on designing a well-balanced communication structure. To accomplish this, a system-level performance study must begin at an early phase. We therefore developed a performance model, consisting of a detailed processor model and a detailed memory model, before hardware design started, and we updated it continuously. Once a logic simulator became available, we used it to verify the performance model and improve its accuracy. The model was quite effective in enabling us to achieve our performance goals and finish development quickly. This paper describes the SPARC64 V microarchitecture and the performance analyses carried out for hardware design.

IEICE Transactions on Electronics | 2007

A Next-Generation Enterprise Server System with Advanced Cache Coherence Chips

Mariko Sakamoto; Akira Katsuno; Go Sugizaki; Toshio Yoshida; Aiichiro Inoue; Koji Inoue; Kazuaki Murakami

Broadcast and synchronization techniques are used for cache coherence control in conventional large-scale snoop-based SMP systems. The penalty for synchronization is directly proportional to system size. Meanwhile, advances in LSI technology now make it possible to place a memory controller on a CPU die, and an on-die controller drastically reduces the latency of access to directly linked memory. Developing an enterprise server system with these CPUs gives us an opportunity to achieve higher performance. Because the synchronization penalty is incurred whenever a cache miss occurs, however, the coherence method must be improved to obtain the full benefit of this latency reduction. In this paper, we demonstrate a coherence directory organization that fits DSM enterprise server systems. The directory-based method was originally adopted in high-performance computing systems because it scales far better than the snoop-based method. Directory capacity misses and long directory access latency are the major problems of this method, but the relaxed scalability requirement of enterprise servers, together with advanced LSI technology, helps us solve them. Our proposed directory solves both problems by implementing a full bit vector level map of the coherence directory on an LSI chip. Our experimental results show that a system controlled by our proposed directory can outperform a snoop-based system on an online transaction processing (OLTP) workload, even without applying data localization optimization.

Archive | 1994