Masaaki Kondo | Researchain

Publication

Featured researches published by Masaaki Kondo.

high-performance computer architecture | 2005

A small, fast and low-power register file by bit-partitioning

Masaaki Kondo; Hiroshi Nakamura

A large multi-ported register file is indispensable for exploiting instruction-level parallelism (ILP) in today's dynamically scheduled superscalar processors. The number of ports and the size of the register file must grow as the issue width and instruction window size increase. However, a larger register file causes longer access delays and higher power consumption. To tackle these problems, we propose a bit-partitioned register file, which reduces the area, access time, and energy consumption of the register file. The proposed method relies on the fact that many operands do not need the full bit-width (typically 32 or 64 bits) of a register entry. Because the effective bit-width of most register operands is narrower than the full bit-width of a register entry, the upper bits of the register entries assigned to such narrow-width operands are wasted. Thus, we propose using these otherwise wasted upper bits for other operands by partitioning the register entries. In this paper, we show the mechanism of the proposed register file and evaluate its performance and power consumption. The evaluation results reveal that the proposed register file achieves higher instructions per cycle (IPC) in a smaller physical area, and consequently with shorter access time and lower power consumption.
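The notion of effective bit-width in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's mechanism: the function names `effective_width` and `can_share_entry` are hypothetical, and the sketch assumes two's-complement operands and an entry that can be split at any bit position, whereas the abstract does not state the partitioning granularity.

```python
def effective_width(value: int, full_width: int = 64) -> int:
    """Smallest number of bits that holds `value` in two's complement."""
    low, high = -(1 << (full_width - 1)), (1 << (full_width - 1)) - 1
    if not low <= value <= high:
        raise ValueError("value does not fit in the full register width")
    # Non-negative values need their magnitude bits plus one sign bit.
    # Negative values need the bits of the bitwise complement plus one sign bit.
    magnitude = value if value >= 0 else ~value
    return magnitude.bit_length() + 1


def can_share_entry(a: int, b: int, full_width: int = 64) -> bool:
    """Assumption: two operands may share an entry if their widths fit together."""
    return effective_width(a, full_width) + effective_width(b, full_width) <= full_width
```

For example, `effective_width(127)` is 8, so under these assumptions a loop counter and a small offset could occupy one 64-bit entry, while two 41-bit values could not.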

international conference on computer design | 2008

A fine-grain dynamic sleep control scheme in MIPS R3000

Naomi Seki; Lei Zhao; Jo Kei; Daisuke Ikebuchi; Yu Kojima; Yohei Hasegawa; Hideharu Amano; Toshihiro Kashima; Seidai Takeda; Toshiaki Shirai; Mitsutaka Nakata; Kimiyoshi Usami; Tetsuya Sunata; Jun Kanai; Mitaro Namiki; Masaaki Kondo; Hiroshi Nakamura

Fine-grain dynamic power gating is proposed for reducing leakage power in the MIPS R3000 through sleep control, and is applied to the processor pipeline. The execution unit is divided into four small units: multiplier, divider, shifter, and other (CLU). The power to each unit is cut off dynamically, depending on the operation. We taped out the prototype chip Geyser-0, which provides an R3000 core with this power reduction technique, 16 KB caches, and a translation lookaside buffer (TLB), using 90 nm CMOS technology. Evaluation results for four benchmark programs for embedded applications show that leakage power is reduced by 47% on average, with a 41% area overhead.

ieee international conference on high performance computing data and analytics | 2015

Analyzing and mitigating the impact of manufacturing variability in power-constrained supercomputing

Yuichi Inadomi; Tapasya Patki; Koji Inoue; Mutsumi Aoyagi; Barry Rountree; Martin Schulz; David K. Lowenthal; Yasutaka Wada; Keiichiro Fukazawa; Masatsugu Ueda; Masaaki Kondo; Ikuo Miyoshi

A key challenge in next-generation supercomputing is to schedule limited power resources effectively. Modern processors suffer from increasingly large power variations due to the chip manufacturing process. These variations lead to power inhomogeneity in current systems and manifest as performance inhomogeneity in power-constrained environments, drastically limiting supercomputing performance. We present a first-of-its-kind study of manufacturing variability on four production HPC systems spanning four microarchitectures, analyze its impact on HPC applications, and propose a novel variation-aware power budgeting scheme to maximize effective application performance. Our low-cost and scalable budgeting algorithm strives to achieve performance homogeneity under a power constraint by deriving application-specific, module-level power allocations. Experimental results using a 1,920-socket system show up to 5.4X speedup, with an average speedup of 1.8X across all benchmarks, compared with a variation-unaware power allocation scheme.
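The idea of giving each module its own power allocation so that all modules run at the same speed can be sketched as follows. This is an illustration only, not the authors' algorithm: it assumes a hypothetical linear power model `P_i(f) = a_i * f + b_i` per module, and the names `variation_aware_budget` and `uniform_budget_frequency` are invented for this example.

```python
def variation_aware_budget(modules, total_power, f_min, f_max):
    """Pick one common frequency and per-module power caps.

    modules: list of (a, b) pairs for the assumed model P(f) = a * f + b.
    Returns (frequency, [power allocation per module]).
    """
    sum_a = sum(a for a, _ in modules)
    sum_b = sum(b for _, b in modules)
    # Solve sum(a_i * f + b_i) = total_power for the shared frequency f.
    freq = (total_power - sum_b) / sum_a
    # Clamp to the supported range; if clamped up to f_min, the budget is exceeded.
    freq = max(f_min, min(f_max, freq))
    return freq, [a * freq + b for a, b in modules]


def uniform_budget_frequency(modules, total_power, f_min, f_max):
    """Variation-unaware baseline: equal caps, so the least efficient module sets the pace."""
    share = total_power / len(modules)
    return min(max(f_min, min(f_max, (share - b) / a)) for a, b in modules)
```

With two modules `(20.0, 10.0)` and `(30.0, 10.0)` and a 120 W budget, the variation-aware split lets both run at frequency 2.0, whereas equal 60 W caps hold a synchronized application to about 1.67, the speed of the less efficient module.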

international symposium on microarchitecture | 2011

Cool Mega-Arrays: Ultralow-Power Reconfigurable Accelerator Chips

Nobuaki Ozaki; Yoshihiro Yasuda; Yoshiki Saito; Daisuke Ikebuchi; Masayuki Kimura; Hideharu Amano; Hiroshi Nakamura; Kimiyoshi Usami; Mitaro Namiki; Masaaki Kondo

Cool Mega-Array (CMA) is an energy-efficient reconfigurable accelerator for battery-driven mobile devices. It has a large processing-element array without memory elements for mapping an application's data-flow graph, a simple programmable microcontroller for data management, and data memory. Unlike coarse-grained dynamically reconfigurable processors, CMA reduces power consumption by switching hardware context and storing intermediate data in registers.

asia and south pacific design automation conference | 2010

Geyser-1: a MIPS R3000 CPU core with fine-grained run-time power gating

Daisuke Ikebuchi; Naomi Seki; Yu Kojima; M. Kamata; Lei Zhao; Hideharu Amano; Toshiaki Shirai; Satoshi Koyama; Tatsunori Hashida; Y. Umahashi; Hiroki Masuda; Kimiyoshi Usami; Seidai Takeda; Hiroshi Nakamura; Mitaro Namiki; Masaaki Kondo

Geyser-1 is a MIPS CPU that provides fine-grained run-time power gating (PG) controlled by instructions. Unlike traditional PG, it uses special standard cells in which the virtual ground (VGND) is separated from the real ground, and a certain number of sleep transistors are inserted for quick power shut-down and wake-up. In Geyser-1, the fine-grained run-time PG is applied to computational modules in the execution stage. Power shut-down and wake-up are controlled at the architectural and software levels. This implementation is the first available CPU with this type of run-time PG technique. Geyser-1 provides PG that is fine-grained in both time and space, and it works well on a real chip.

asian solid state circuits conference | 2009

Geyser-1: A MIPS R3000 CPU core with fine grain runtime power gating

Geyser-1, a prototype MIPS R3000 CPU with fine-grain runtime PG for the major computational components in the execution stage, is available. Function units such as the CLU, shifter, multiplier, and divider are power-gated and controlled at runtime so that only the function unit in use is powered on, minimizing leakage power. Evaluation results on the real chip reveal that the fine-grain runtime PG mechanism works without electrical problems. It reduces leakage power by 7% at 25 °C and by 24% at 80 °C. Evaluation results using benchmark programs show that power consumption can be reduced by 3% at 25 °C and by 30% at 80 °C.

international conference on vlsi design | 2009

Design and Implementation of Fine-Grain Power Gating with Ground Bounce Suppression

Kimiyoshi Usami; Toshiaki Shirai; Tatsunori Hashida; Hiroki Masuda; Seidai Takeda; Mitsutaka Nakata; Naomi Seki; Hideharu Amano; Mitaro Namiki; Masashi Imai; Masaaki Kondo; Hiroshi Nakamura

This paper describes a design and implementation methodology for fine-grain power gating. Since sleep-in and wakeup are controlled at a fine granularity at run time, shortening the transition time between the sleep and active states is strongly required. In particular, shortening the wakeup time is essential because it affects the execution time and hence the performance. However, this requirement makes suppression of the ground bounce more difficult. We propose a novel technique that skews the wakeup timings of fine-grain local power domains to suppress the ground bounce. The delay of the buffers driving the power switches is skewed in the buffer tree by selectively downsizing them. We designed a MIPS R3000-based CPU core in a 90 nm CMOS technology and applied our technique to the internal function units. Simulation results showed that our technique reduces the rush current to 47% of that observed when the power switches are turned on simultaneously. This resulted in suppressing the ground bounce to 53 mV with a 3.3 ns wakeup time. Simulation results from running benchmark programs showed that the total power dissipation of the function units was reduced by up to 15% at 25 °C and by 62% at 100 °C. The effectiveness of the power savings is discussed from the viewpoint of the temperature-dependent break-even points and the consecutive idle time in the program.
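The break-even point mentioned at the end of the abstract above can be illustrated with a short Python sketch. It uses the common textbook view that gating a unit pays off only when the idle interval is long enough for the saved leakage energy to exceed the energy spent on the sleep and wakeup transitions. The function names and all numeric values are hypothetical and are not taken from the paper.

```python
def break_even_time_us(overhead_nj: float, leakage_saved_mw: float) -> float:
    """Idle time (microseconds) at which saved leakage equals transition overhead.

    Units: nJ / mW = microseconds.
    """
    if leakage_saved_mw <= 0:
        raise ValueError("leakage saving must be positive")
    return overhead_nj / leakage_saved_mw


def net_energy_saved_nj(idle_times_us, overhead_nj, leakage_saved_mw):
    """Net energy saved (nJ) if the unit is gated on every listed idle interval."""
    return sum(t * leakage_saved_mw - overhead_nj for t in idle_times_us)
```

Because leakage grows with temperature, `leakage_saved_mw` is larger on a hot chip and the break-even time is shorter, which is consistent with the larger savings the abstract reports at 100 °C than at 25 °C.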

computing frontiers | 2007

An intra-task DVFS technique based on statistical analysis of hardware events

Hiroshi Sasaki; Yoshimichi Ikeda; Masaaki Kondo; Hiroshi Nakamura

The importance of, and demand for, various types of optimization techniques for program execution are growing rapidly. In particular, dynamic optimization techniques are regarded as important. Whereas conventional techniques usually generate an execution model for dynamic optimization by qualitatively analyzing the behavior of computer systems in a knowledge-based manner, the proposed technique generates models by statistically analyzing that behavior from quantitative data on hardware events. In the present paper, a novel dynamic voltage and frequency scaling (DVFS) method based on statistical analysis is proposed. The proposed technique is a hybrid technique in which static information, such as the breakpoints of program phases, and dynamic information, such as the number of cache misses given by the performance counter, are used together. Relationships between the performance and the values of performance counters are learned statistically in advance. The compiler then inserts run-time code for predicting the performance and setting the appropriate frequency/voltage depending on the predicted performance. The proposed technique can greatly reduce energy consumption while satisfying soft timing constraints.
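The learn-offline, predict-at-run-time flow in the abstract above can be sketched in Python. This is a simplified illustration, not the paper's model: it assumes a single counter (a cache miss rate), a least-squares line that maps it to the fraction of time that does not scale with frequency, and hypothetical names `fit_line`, `predicted_slowdown`, and `choose_frequency`.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (offline training step)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predicted_slowdown(mem_fraction, freq, f_max):
    """Assumed model: only the non-memory part of the time scales with f_max / freq."""
    return mem_fraction + (1.0 - mem_fraction) * f_max / freq


def choose_frequency(miss_rate, model, freqs, max_slowdown):
    """Run-time step: lowest frequency whose predicted slowdown meets the constraint."""
    slope, intercept = model
    mem_fraction = min(1.0, max(0.0, slope * miss_rate + intercept))
    f_max = max(freqs)
    for freq in sorted(freqs):
        if predicted_slowdown(mem_fraction, freq, f_max) <= max_slowdown:
            return freq
    return f_max  # fall back to full speed if nothing else qualifies
```

Under this model a memory-bound phase with a high miss rate tolerates a lower frequency within the same slowdown limit, while a compute-bound phase stays at the maximum frequency.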

ACM Sigarch Computer Architecture News | 2007

Improving fairness, throughput and energy-efficiency on a chip multiprocessor through DVFS

Masaaki Kondo; Hiroshi Sasaki; Hiroshi Nakamura

Recently, the single-chip multiprocessor (CMP) has become an attractive architecture for improving the throughput of program execution. In CMPs, multiple processor cores share several hardware resources, such as the cache memory and the memory bus. Consequently, resource contention significantly degrades the performance of each thread and also impairs fairness between threads. In this paper, we propose a Dynamic Frequency and Voltage Scaling (DVFS) algorithm for improving the total instruction throughput, fairness, and energy efficiency of CMPs. The proposed technique periodically observes the utilization ratio of shared resources and controls the frequency and voltage of each processor core individually to balance the ratio between threads. Our evaluation results show that fairness between threads is greatly improved by the technique. Moreover, the total instruction throughput increases in many cases while energy consumption is reduced.

design, automation, and test in europe | 2007

Task Scheduling under Performance Constraints for Reducing the Energy Consumption of the GALS Multi-Processor SoC

Ryo Watanabe; Masaaki Kondo; Masashi Imai; Hiroshi Nakamura; Takashi Nanya

The present paper focuses on applications that are periodic and have both latency and throughput constraints. For these applications, pipeline scheduling is effective for reducing energy consumption. Thus, the present paper proposes a pipelined task scheduling method for minimizing the energy consumption of a GALS MP-SoC under latency and throughput constraints. First, we model the target GALS MP-SoC architecture and the application tasks. We then show that the energy optimization problem under this model belongs to the class of mixed-integer linear programming. Next, we propose a new scheduling method based on simulated annealing for the purpose of solving this problem quickly. Finally, experimental results demonstrate that the proposed method achieves a significant energy reduction on a real application under a practical architecture.
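A simulated-annealing search of the kind named in the abstract above can be sketched as follows. The model is a deliberately reduced stand-in for the paper's formulation: each task is assigned to one processor, each processor has a hypothetical speed and energy cost per cycle, and only a throughput constraint (every pipeline stage fits in one period) is checked; the latency constraint is omitted. All names and parameters are assumptions for illustration.

```python
import math
import random


def energy(assign, cycles, energy_per_cycle):
    """Total energy when task t runs on processor assign[t]."""
    return sum(cycles[t] * energy_per_cycle[p] for t, p in enumerate(assign))


def feasible(assign, cycles, speed, period):
    """Throughput constraint: each processor's stage time must fit in one period."""
    load = [0.0] * len(speed)
    for t, p in enumerate(assign):
        load[p] += cycles[t] / speed[p]
    return max(load) <= period


def anneal(cycles, speed, energy_per_cycle, period, start,
           steps=2000, temp=10.0, cooling=0.995, seed=0):
    """Return the lowest-energy feasible assignment seen during the search."""
    if not feasible(start, cycles, speed, period):
        raise ValueError("the starting assignment must be feasible")
    rng = random.Random(seed)
    current, cur_e = list(start), energy(start, cycles, energy_per_cycle)
    best, best_e = list(current), cur_e
    for _ in range(steps):
        cand = list(current)
        cand[rng.randrange(len(cand))] = rng.randrange(len(speed))  # move one task
        if feasible(cand, cycles, speed, period):
            cand_e = energy(cand, cycles, energy_per_cycle)
            # Always accept improvements; accept worse moves with a cooling probability.
            if cand_e <= cur_e or rng.random() < math.exp((cur_e - cand_e) / temp):
                current, cur_e = cand, cand_e
                if cur_e < best_e:
                    best, best_e = list(current), cur_e
        temp *= cooling
    return best, best_e
```

Infeasible candidates are simply rejected here; a penalty term in the cost function is a common alternative when feasible solutions are hard to reach from the starting point.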
