Kimiyoshi Usami
Shibaura Institute of Technology
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Kimiyoshi Usami.
international solid-state circuits conference | 1998
Masafumi Takahashi; Mototsugu Hamada; Tsuyoshi Nishikawa; Hideho Arakida; Yoshiro Tsuboi; Tetsuya Fujita; Fumitoshi Hatori; Shinji Mita; Kojiro Suzuki; Akihiko Chiba; Toshihiro Terazawa; Fumihiko Sano; Y. Watanabe; Hiroshi Momose; Kimiyoshi Usami; Mutsunori Igarashi; Takashi Ishikawa; Masahiro Kanazawa; Tadahiro Kuroda; Tohru Furuyama
This MPEG4 video codec implements essential functions in the MPEG4 committee draft. It consumes 60 mW at 30 MHz, 30% of the power dissipation of a conventional CMOS design. Measured power dissipation is summarized. 70% power reduction is achieved by low-power techniques at circuit and architectural levels. A 16b RISC processor provides software programmability. Binary shape decoding uses 20% of the computation power of the RISC processor at 30MHz clock, with negligible increase in chip power dissipation. Three-step hierarchical motion estimation reduces power dissipation.
custom integrated circuits conference | 1998
Mototsugu Hamada; Masafumi Takahashi; Hideho Arakida; Akihiko Chiba; Toshihiro Terazawa; Takashi Ishikawa; Masahiro Kanazawa; Mutsunori Igarashi; Kimiyoshi Usami; Tadahiro Kuroda
A novel design technique which combines a variable supply-voltage scheme and a clustered voltage scaling is presented (VS-CVS scheme). A theory to choose the optimum supply voltages in the VS-CVS scheme is discussed which enables us to perform chip design in a top-down fashion. Level-shifting flip-flops are developed which reduce power, delay and area penalties significantly. Application of this technique to an MPEG4 video codec saves 55% of the power dissipation without degrading circuit performance compared to a conventional CMOS design.
custom integrated circuits conference | 1997
Kimiyoshi Usami; Kazutaka Nogami; Mutsunori Igarashi; Fumihiro Minami; Yukio Kawasaki; Takashi Ishikawa; Masahiro Kanazawa; Takahiro Aoki; Midori Takano; Chiharu Mizuno; Makoto Ichida; Shinji Sonoda; Makoto Takahashi; Naoyuki Hatanaka
This paper describes an automated design technique to reduce power by making use of two supply voltages. The technique consists of structure synthesis, placement and routing. The structure synthesizer clusters the gates off the critical paths so as to supply the reduced voltage to save power. The placement and routing tool assigns either the reduced voltage or the unreduced one to each row so as to minimize the area overhead. Combining these techniques together, we applied it to the random logic modules of a media processor chip. The combined technique reduced the power by 47% on average with an area overhead of 15% at the random logic, while keeping the performance,.
international conference on computer design | 2006
Kimiyoshi Usami; Naoaki Ohkubo
Leakage power dissipation becomes a dominant component in operation power in nanometer devices. This paper describes a design methodology to implement runtime power gating in a fine-grained manner. We propose an approach to use sleep signals that are not off-chip but are extracted locally within the design. By utilizing enable signals in a gated clock design, we automatically partition the design into domains. We then choose the domains that will achieve the gain in energy savings by considering dynamic energy overhead due to turning on/off power switches. To help this decision we derive analytical formulas that estimate the break-even point. For the domains chosen, we create power gating structure by adding power switches and generating control logic to the switches. We experimentally built a design flow and evaluated with a synthesizable RTL code for a microprocessor and a 90nm CMOS device model both used in industry. Results from applying to a datapath showed that the break-even point that achieves the gain exists in the number of enables controlling the power switch. By applying the domains controlled by up to 3 enables achieved the active leakage savings by 83% at the area penalty by 20%.
design automation conference | 1998
Kimiyoshi Usami; Mutsunori Igarashi; Takashi Ishikawa; Masahiro Kanazawa; Masafumi Takahashi; Mototsugu Hamada; Hideho Arakida; Toshihiro Terazawa; Tadahiro Kuroda
This paper describes a fully automated low-power design methodology in which three different voltage-scaling techniques are combined together. Supply voltage is scaled globally, selectively, and adaptively while keeping the performance. This methodology enabled us to design an MPEG4 codec core with 58% less power than the original in three week turn-around-time.
asia and south pacific design automation conference | 2000
Kimiyoshi Usami; Mutsunori Igarashi
This paper describes a gate-level power minimization methodology using dual supply voltages. Gates and flip-flops off the critical paths are made to operate at the reduced supply voltage to save power. Core technologies are dual-V/sub DD/ circuit synthesis and P&R. We give a brief overview on existing low-power EDA technologies as background and discuss advantages and challenges of the dual-V/sub DD/ approach. Through real design examples, we will show that the approach reduces power effectively while keeping the performance at negligible area overhead.
networks on chips | 2010
Hiroki Matsutani; Michihiro Koibuchi; Daisuke Ikebuchi; Kimiyoshi Usami; Hiroshi Nakamura; Hideharu Amano
This paper proposes an ultra fine-grained run-time power gating of on-chip router, in which power supply to each router component (e.g., VC queue, crossbar MUX, and output latch) can be individually controlled in response to the applied workload.As only the router components which are just transferring a packet are activated, the leakage power of the on-chip network can be reduced to the near-optimal level.However, a certain amount of wakeup latency is required to activate the sleeping components, and the application performance will be degraded.In this paper, we estimate the wakeup latency for each component based on circuit simulations using a 65nm process.Then we propose four early wakeup methods to overcome the wakeup latency.The proposed router with the early wakeup methods is evaluated in terms of the application performance, area, and leakage power.As a result, it reduces the leakage power by 78.9%, at the expense of the 4.3% area and 4.0% performance when we assume a 1GHz operation.
international conference on computer design | 2008
Naomi Seki; Lei Zhao; Jo Kei; Daisuke Ikebuchi; Yu Kojima; Yohei Hasegawa; Hideharu Amano; Toshihiro Kashima; Seidai Takeda; Toshiaki Shirai; Mitustaka Nakata; Kimiyoshi Usami; Tetsuya Sunata; Jun Kanai; Mitaro Namiki; Masaaki Kondo; Hiroshi Nakamura
A fine-grain dynamic power gating is proposed for saving the leakage power in MIPS R3000 by sleep control and applied to a processor pipeline. An execution unit is divided into four small units: multiplier, divider, shifter and other (CLU). The power of each unit is cut off dynamically, based on the operation. We tape-outed the prototype chip Geyser-0, which provides an R3000 Core with the power reduction technique, 16 KB caches and translation lookaside buffer (TLB) using 90 nm CMOS technology. The evaluation results of four benchmark programs for embedded applications show that 47% of the leakage power is reduced on average with 41% area overhead.
international symposium on microarchitecture | 2011
Nobuaki Ozaki; Yoshihiro Yasuda; Yoshiki Saito; Daisuke Ikebuchi; Masayuki Kimura; Hideharu Amano; Hiroshi Nakamura; Kimiyoshi Usami; Mitaro Namiki; Masaaki Kondo
Cool Mega-Array (CMA) is an energy-efficient reconfigurable accelerator for battery-driven mobile devices. It has a large processing-element array without memory elements for mapping an applications data-flow graph, a simple programmable microcontroller for data management, and data memory. Unlike coarse-grained dynamically reconfigurable processors, CMA reduces power consumption by switching hardware context and storing intermediate data in registers.
asia and south pacific design automation conference | 2010
Daisuke Ikebuchi; Naomi Seki; Yu Kojima; M. Kamata; Lei Zhao; Hideharu Amano; Toshiaki Shirai; Satoshi Koyama; Tatsunori Hashida; Y. Umahashi; Hiroki Masuda; Kimiyoshi Usami; Seidai Takeda; Hiroshi Nakamura; Mitaro Namiki; Masaaki Kondo
Geyser-1 is a MIPS CPU which provides a fine-grained run-time power gating (PG) controlled by instructions. Unlike traditional PGs, it uses special standard cells in which the virtual ground (VGND) is separated from the real ground, and a certain number of the sleep transistors are inserted for quick power shut-down and wake-up. In Geyser-1, the fine-grained run-time PG is applied to computational modules in the execution stage. The power shut-down and wakeup are controlled with architectural and software level. This implementation is the first available CPU with this type of run-time PG technique. Geyser-1 has both time and spatial fine-grained PG and works well with a real chip.