Tomoyuki Nakabayashi | Researchain

Publication

Featured researches published by Tomoyuki Nakabayashi.

International Symposium on Computing and Networking | 2013

FabCache: Cache Design Automation for Heterogeneous Multi-core Processors

Takaki Okamoto; Tomoyuki Nakabayashi; Takahiro Sasaki; Toshio Kondo

Single-ISA heterogeneous multi-core architectures, which consist of diverse superscalar cores, are gaining importance in processor architecture. Matching a superscalar core to the characteristics of a program helps reduce energy consumption and improve performance. However, designing a single-ISA heterogeneous multi-core processor requires a large design and verification effort, which is multiplied by the number of different core types. In addition, the cache system for each superscalar core and the shared bus system that connects differently designed caches also increase the design effort. In particular, a heterogeneous multi-core processor may have various cache structures, such as an L1 cache only, L1 and unified L2 caches, or L1 and dedicated L2 caches, and each cache differs in its dimensions, i.e., capacity, line size, and associativity. To solve this problem, we have proposed FabHetero, a framework that automatically generates diverse heterogeneous multi-core processors using FabScalar, FabCache, and FabBus, which generate various designs of superscalar core, cache system, and flexible shared bus system, respectively. This paper presents the details of FabCache. We show that caches generated by FabCache with arbitrary parameters, such as cache capacity, line size, associativity, access latency, and line transmission width between cache hierarchies, work correctly.
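
As a rough illustration of how the cache dimensions named in this abstract (capacity, line size, associativity) relate to one another, the following Python sketch derives the number of sets and the address-field widths of a set-associative cache. It is not FabCache's interface: the `CacheGeometry` class, its field names, the 32-bit address width, and the power-of-two restriction are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical parameter bundle; not FabCache's actual interface."""

    capacity_bytes: int
    line_bytes: int
    associativity: int
    address_bits: int = 32  # assumed address width

    def __post_init__(self):
        # Simplification: require powers of two so fields are plain bit slices.
        for name in ("capacity_bytes", "line_bytes", "associativity"):
            value = getattr(self, name)
            if value <= 0 or value & (value - 1):
                raise ValueError(f"{name} must be a positive power of two")
        if self.line_bytes * self.associativity > self.capacity_bytes:
            raise ValueError("capacity is too small to hold one full set")

    @property
    def num_sets(self) -> int:
        return self.capacity_bytes // (self.line_bytes * self.associativity)

    @property
    def offset_bits(self) -> int:
        return self.line_bytes.bit_length() - 1

    @property
    def index_bits(self) -> int:
        return self.num_sets.bit_length() - 1

    @property
    def tag_bits(self) -> int:
        return self.address_bits - self.index_bits - self.offset_bits

    def split(self, address: int):
        """Split an address into (tag, set index, byte offset)."""
        offset = address & (self.line_bytes - 1)
        index = (address >> self.offset_bits) & (self.num_sets - 1)
        tag = address >> (self.offset_bits + self.index_bits)
        return tag, index, offset


# Example configuration (values chosen for illustration only).
l1 = CacheGeometry(capacity_bytes=32 * 1024, line_bytes=64, associativity=4)
print(l1.num_sets, l1.offset_bits, l1.index_bits, l1.tag_bits)
print(l1.split(0x12345678))
```

Changing any one of the three dimensions changes the index and tag widths, which hints at why supporting arbitrary combinations by hand multiplies design and verification effort.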

IET Computers and Digital Techniques | 2012

Design and evaluation of variable stages pipeline processor with low-energy techniques

Tomoyuki Nakabayashi; T. Sasaki; K. Ohno; Toshio Kondo

Enhancing mobile computers requires high-performance computing with low energy consumption. To meet this requirement, a variable stages pipeline (VSP) architecture is proposed, which reduces energy consumption and improves execution time by dynamically unifying the pipeline stages. A VSP processor uses a special pipeline register called a latch D-flip-flop selector-cell (LDS-cell) that unifies the pipeline stages and prevents glitch propagation caused by stage unification in low-energy mode. The design of a fabricated VSP processor VLSI chip in 0.18 μm CMOS technology is presented. An evaluation shows that the VSP processor consumes 13% less energy than a conventional one.

Asia and South Pacific Design Automation Conference | 2014

Co-simulation framework for streamlining microprocessor development on standard ASIC design flow

Tomoyuki Nakabayashi; Tomoyuki Sugiyama; Takahiro Sasaki; Eric Rotenberg; Toshio Kondo

In this paper, we present a practical processor co-simulation framework for not only RTL simulation but also gate/transistor level simulation, and even chip evaluation with an LSI tester. Our framework includes an off-chip system call emulation mechanism, which handles system calls to evaluate and verify the processor design with general benchmark programs without pseudo-circuits in the processor design. Therefore, our framework can be consistently used from RTL design to chip fabrication. We also propose a checkpoint mechanism that resumes a program from a pre-created checkpoint. This mechanism is not affected by the non-deterministic problem on a multi-core processor. Moreover, we propose a cache warming mechanism when resuming from a checkpoint.

International Journal of Computer and Electrical Engineering | 2013

Reducing Dynamic Energy of Variable Level Cache

Ko Watanabe; Takahiro Sasaki; Tomoyuki Nakabayashi; Kazuhiko Ohno; Toshio Kondo

Today, high-performance and low-power processors are required. Even embedded processors implement a large cache to achieve high performance, and the leakage energy of such large caches has been increasing as semiconductor devices scale down. Therefore, reducing cache leakage energy is very important. To reduce cache leakage energy, we propose a variable level cache (VLC), which reduces leakage energy with little performance degradation. Generally, the required cache size depends on program behavior. VLC dynamically estimates whether the current cache size is suitable for the running program, and if not, VLC modifies its cache structure and cache hierarchy to change the cache capacity. Since changing the cache structure and hierarchy incurs a large overhead, VLC adopts a low-overhead technique that reduces the hierarchy-changing overhead to prevent performance degradation. However, the previous VLC has the problem that dynamic energy consumption increases because the technique needs many futile accesses. To solve this problem, this paper proposes a novel technique that reduces the number of futile accesses and thereby the dynamic energy consumption. According to our simulation results, the proposed VLC technique reduces dynamic energy by 18% without performance degradation compared with the previous VLC.

International Symposium on Computing and Networking | 2014

Detail Design and Evaluation of FabCache

Takaki Okamoto; Tomoyuki Nakabayashi; Takahiro Sasaki; Toshio Kondo

Single-ISA heterogeneous multi-core architectures, which consist of diverse superscalar cores, are gaining importance in processor architecture. Matching a superscalar core to the characteristics of a program helps reduce energy consumption and improve performance. However, designing a heterogeneous multi-core processor requires a large design and verification effort. Therefore, we have proposed FabHetero, which automatically generates diverse heterogeneous multi-core processors using FabScalar, FabCache, and FabBus, which generate various designs of superscalar core, cache system, and flexible shared bus system, respectively. This paper extends our previous work and presents the details of FabCache. The previous paper did not describe the detailed design of the L1 data cache, and mechanisms for high-end performance, such as a non-blocking cache, were not implemented. In addition, the physical design and power estimation were not described. To address these problems, this paper describes the detailed design of FabCache, in particular the L1 data cache, to show its suitability for high-end processors. This paper also focuses on performance estimation and the physical design of caches generated by FabCache with arbitrary parameters, such as cache capacity, line size, associativity, access latency, and line transmission width between cache hierarchies. According to the estimation results, FabCache generates cache systems with almost the same area and power consumption as a hand-tuned cache despite the overhead of automatic generation: the ratio of the L1 instruction cache controller, including extra circuits, is only 3.5%, and the increase in power consumption compared with a hand-tuned cache is less than 0.1%.

International Symposium on Computing and Networking | 2013

Dynamic BTB Resizing for Variable Stages Superscalar Architecture

Tomoyuki Nakabayashi; Takahiro Sasaki; Toshio Kondo

To extract instruction level parallelism (ILP) and thread level parallelism (TLP), superscalar architecture has become commonly used in high-performance computers. While a deeper superscalar pipeline achieves higher performance, it consumes more energy. To reduce the energy of a deeply pipelined processor, we have proposed a variable stages pipeline (VSP) architecture, which reduces energy consumption by dynamically unifying the pipeline stages according to program behavior. Because the pipeline structure changes after pipeline unification, the hardware for extracting ILP and TLP should also be resized to balance the energy-performance trade-off. In this paper, we propose a dynamic branch target buffer (BTB) resizing technique for VSP implemented on a superscalar processor to further reduce energy consumption when the VSP unifies the pipeline stages. The proposed technique resizes the BTB along with pipeline scaling. Our evaluation results show that the proposed technique can reduce the BTB size to one-eighth after pipeline unification with only 0.02% degradation in prediction accuracy on average compared with the baseline BTB. This results in a 9.2% dynamic energy reduction of the processor core with a trivial performance loss. Furthermore, our technique reduces the leakage energy consumption of the BTB by 87.5% with a practical leakage control technique.
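
The idea of shrinking a BTB at run time can be sketched in a few lines of Python. This is not the paper's design: the `ResizableBTB` class, the direct-mapped organisation, the 4-byte instruction alignment, and the use of the full word address as the tag are all assumptions made for this example. Under the further assumption that powered-off entries leak nothing, keeping one-eighth of the entries active gates 1 - 1/8 = 87.5% of them, which happens to match the leakage figure quoted in the abstract; the abstract itself does not say how that number was obtained.

```python
class ResizableBTB:
    """Direct-mapped BTB whose active size can shrink (hypothetical sketch)."""

    def __init__(self, entries: int = 1024):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a positive power of two")
        self.entries = entries
        self.active = entries
        # Each slot holds (tag, target) or None when empty or powered off.
        self.table = [None] * entries

    def resize(self, divisor: int) -> None:
        """Keep entries/divisor slots active; gated slots lose their contents."""
        if divisor <= 0 or divisor & (divisor - 1) or divisor > self.entries:
            raise ValueError("divisor must be a power of two <= entries")
        self.active = self.entries // divisor
        for i in range(self.active, self.entries):
            self.table[i] = None

    def _slot(self, pc: int):
        word = pc >> 2  # assumes 4-byte aligned instructions
        # The full word address serves as the tag, so hits stay exact
        # regardless of how many index bits are currently in use.
        return word & (self.active - 1), word

    def update(self, pc: int, target: int) -> None:
        index, tag = self._slot(pc)
        self.table[index] = (tag, target)

    def lookup(self, pc: int):
        index, tag = self._slot(pc)
        entry = self.table[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def gated_fraction(self) -> float:
        """Fraction of entries currently powered off."""
        return 1 - self.active / self.entries


btb = ResizableBTB(1024)
btb.update(0x4000, 0x5000)   # lands in slot 0
btb.update(0x4320, 0x6000)   # lands in slot 200
btb.resize(8)                # 128 of 1024 slots stay active
print(btb.lookup(0x4000), btb.lookup(0x4320), btb.gated_fraction())
```

Branches whose entries sat in gated slots simply miss after the resize, so the cost of the smaller BTB shows up as a change in prediction accuracy rather than as incorrect targets.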

IET Computers and Digital Techniques | 2013

Design and evaluation of fine-grain-mode transition method based on dynamic memory access analysing for variable stages pipeline processor

Takahiro Sasaki; Tomoyuki Nakabayashi; Kazumasa Nomura; Kazuhiko Ohno; Toshio Kondo

This study proposes a fine-grain-mode transition method for a variable stages pipeline (VSP) processor. The method is based on dynamic memory access analysing, and it reduces energy consumption. A VSP processor varies the pipeline depth dynamically according to workload. When the workload is heavy, the processor shifts into a high-speed mode that drives a deep pipeline at a high clock frequency. When the workload is light, the processor shifts into a low-energy mode that unifies pipeline stages to make the pipeline shallower and drives it at a low clock frequency. The conventional mode transition method cannot follow sharp workload changes because it takes a long time to predict workload. The fine-grain pipeline depth control proposed in this study is based on a high-speed workload prediction mechanism using memory access frequency, and it uses a novel method to conceal the overhead of changing the pipeline depth. Simulation results show that the authors' approach can reduce the energy-delay product to 10% below that of the conventional approach.
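
The energy-delay product (EDP) used as the metric here is the product of the energy consumed and the execution time, so a scheme may trade a slightly longer run time for a larger energy saving and still come out ahead. The Python sketch below shows that arithmetic. The function names are hypothetical and the energy and delay values are made-up, normalised numbers chosen only to illustrate the calculation; they are not results from the paper.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """EDP = energy x execution time; lower is better."""
    if energy_joules < 0 or delay_seconds < 0:
        raise ValueError("energy and delay must be non-negative")
    return energy_joules * delay_seconds


def relative_edp_change(baseline, candidate) -> float:
    """Fractional EDP change of candidate vs. baseline (negative = improvement).

    Both arguments are (energy, delay) pairs.
    """
    base = energy_delay_product(*baseline)
    if base == 0:
        raise ValueError("baseline EDP must be non-zero")
    return energy_delay_product(*candidate) / base - 1.0


# Illustrative, normalised (energy, delay) pairs; not measured data.
conventional = (1.00, 1.00)
fine_grain = (0.88, 1.02)  # 12% less energy, 2% longer run time
change = relative_edp_change(conventional, fine_grain)
print(f"EDP change: {change:+.2%}")
```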

International Conference on Networking and Computing | 2012

Measurement of Low-Energy Processor Chip Using Fine-Grain Variable Stages Pipeline Architecture

Tomoyuki Nakabayashi; Takahiro Sasaki; Kazuhiko Ohno; Toshio Kondo

The increase in energy consumption caused by processor enhancement has recently become a serious problem. Dynamic voltage and frequency scaling (DVFS), which dynamically lowers the supply voltage and clock frequency, is widely used to reduce energy consumption. However, it is difficult to deliver fine-grain energy optimization with DVFS because a voltage regulator takes a long time to scale the voltage. To reduce energy consumption at fine-grain intervals, we propose a variable stages pipeline (VSP) processor. VSP reduces energy consumption by dynamically varying the pipeline depth to suit the behavior of the running program. VSP can optimize energy at a finer grain than DVFS because pipeline scaling has a small overhead. In this paper, we fabricated a VSP processor chip using 180 nm technology and evaluated the energy consumption of the chip. We show that the fabricated VSP chip dynamically varies the pipeline depth while a program is running and reduces energy consumption at shorter intervals than DVFS.

International SoC Design Conference | 2011

Low power semi-static TSPC D-FFs using split-output latch

Tomoyuki Nakabayashi; Takahiro Sasaki; Kazuhiko Ohno; Toshio Kondo

D-FFs play an important role in CMOS digital circuits because the delay, area, and power consumption of D-FFs significantly affect the performance of VLSI chips. We propose two types of semi-static TSPC D-FFs using a split-output latch, which improve on the HSTSPC D-FF. One is a double split-output semi-static TSPC D-FF (DSSTSPC D-FF), which is a speed-efficient design, and the other is a single split-output semi-static TSPC D-FF (SSSTSPC D-FF), which is a power- and area-efficient design. The former achieves 4% less delay than the HSTSPC D-FF. The latter achieves 31% smaller area and 30% lower power consumption than the conventional D-FF.

Asia and South Pacific Design Automation Conference | 2011

Design and evaluation of variable stages pipeline processor chip

Tomoyuki Nakabayashi; Takahiro Sasaki; Kazuhiko Ohno; Toshio Kondo

To reduce energy consumption in high-performance computing, a variable stages pipeline (VSP) processor is proposed, which improves execution time by dynamically unifying the pipeline stages. The VSP adopts a special pipeline register called an LDS-cell that unifies the pipeline stages and prevents glitch propagation. We fabricate the VSP chip in a Rohm 0.18 μm CMOS process and evaluate its energy consumption. The results indicate that the VSP can achieve 13% less energy consumption than the conventional approach.
