Masayoshi Mase
Waseda University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Masayoshi Mase.
international solid-state circuits conference | 2008
Masayuki Ito; Toshihiro Hattori; Yutaka Yoshida; Kiyoshi Hayase; Tomoichi Hayashi; Osamu Nishii; Yoshihiko Yasu; Atsushi Hasegawa; Masashi Takada; Hiroyuki Mizuno; Kunio Uchiyama; Toshihiko Odaka; Jun Shirako; Masayoshi Mase; Keiji Kimura; Hironori Kasahara
Power efficient SoC design for embedded applications requires several independent power-domains where the power of unused blocks can be turned off. An SoC for mobile phones defines 23 hierarchical power domains but most of the power domains are assigned for peripheral IPs that mainly use low-leakage high-Vt transistors. Since high-performance multiprocessor SoCs use leaky low-Vt transistors for CPU sections, leakage power savings of these CPU sections is a primary objective. We develop an SoC with 8 processor cores and 8 user RAMs (1 per core) targeted for power-efficient high-performance embedded applications. We assign these 16 blocks to separate power domains so that they can be independently be powered off. A resume mode is also introduced where the power of the CPU is off and the user RAM is on for fast resume operation. An automatic parallelizing compiler schedules tasks for each CPU core and also performs power management for each CPU core. With the help of this compiler, each processor core can operate at a different frequency or even dynamically stop the clock to maintain processing performance while reducing average operating power consumption. The compiler also executes power-off control of unnecessary CPU cores.
languages and compilers for parallel computing | 2009
Keiji Kimura; Masayoshi Mase; Hiroki Mikami; Takamichi Miyamoto; Jun Shirako; Hironori Kasahara
OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores to generate parallel programs for various multicores from different vendors by using the OSCAR parallelizing compiler. The OSCAR API has been developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in an METI/NEDO project entitled “Multicore Technology for Realtime Consumer Electronics.” By using the OSCAR API as an interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables hierarchical multigrain parallel processing with memory optimization under capacity restriction for cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating for various embedded multicores. In addition, a parallelized program automatically generated by the OSCAR compiler with OSCAR API can be compiled by the ordinary OpenMP compilers since the OSCAR API is designed on a subset of the OpenMP. This paper describes the OSCAR API and its compatibility with the OSCAR compiler by showing code examples. Performance evaluations of the OSCAR compiler and the OSCAR API are carried out using an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and a newly developed consumer electronics multicore chip RP2 by Renesas, Hitachi and Waseda. From the results of scalability evaluation, it is found that on an average, the OSCAR compiler with the OSCAR API can exploit 5.8 times speedup over the sequential execution on the Power5+ workstation with eight cores and 2.9 times speedup on RP2 with four cores, respectively. In addition, the OSCAR compiler can accelerate an IBM XL Fortran compiler up to 3.3 times on the Power6 SMP server. Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.
languages and compilers for parallel computing | 2010
Akihiro Hayashi; Yasutaka Wada; Takeshi Watanabe; Takeshi Sekiguchi; Masayoshi Mase; Jun Shirako; Keiji Kimura; Hironori Kasahara
Heterogeneous multicores have been attracting much attention to attain high performance keeping power consumption low in wide spread of areas. However, heterogeneous multicores force programmers very difficult programming. The long application program development period lowers product competitiveness. In order to overcome such a situation, this paper proposes a compilation framework which bridges a gap between programmers and heterogeneous multicores. In particular, this paper describes the compilation framework based on OSCAR compiler. It realizes coarse grain task parallel processing, data transfer using a DMA controller, power reduction control from user programs with DVFS and clock gating on various heterogeneous multicores from different vendors. This paper also evaluates processing performance and the power reduction by the proposed framework on a newly developed 15 core heterogeneous multicore chip named RP-X integrating 8 general purpose processor cores and 3 types of accelerator cores which was developed by Renesas Electronics, Hitachi, Tokyo Institute of Technology and Waseda University. The framework attains speedups up to 32x for an optical flow program with eight general purpose processor cores and four DRP(Dynamically Reconfigurable Processor) accelerator cores against sequential execution by a single processor core and 80% of power reduction for the real-time AAC encoding.
asia and south pacific design automation conference | 2008
Hiroaki Shikano; Masaki Ito; Kunio Uchiyama; Toshihiko Odaka; Akihiro Hayashi; Takeshi Masuura; Masayoshi Mase; Jun Shirako; Yasutaka Wada; Keiji Kimura; Hironori Kasahara
A heterogeneous multi-core processor (HMCP) architecture, which integrates general purpose processors (CPU) and accelerators (ACC) to achieve high-performance as well as low-power consumption with the support of a parallelizing compiler, was developed. The evaluation was performed using an MP3 audio encoder on a simulator that accurately models the HMCP. It showed that 16-frame encoding on the HMCP with four CPUs and four ACCs yielded 24.5-fold speed-up of performance against sequential execution on one CPU. Furthermore, power saving by the compiler reduced energy consumption of the encoding to 0.17 J, namely, by 28.4%.
languages and compilers for parallel computing | 2011
Hiroki Mikami; Shumpei Kitaki; Masayoshi Mase; Akihiro Hayashi; Mamoru Shimaoka; Keiji Kimura; Masato Edahiro; Hironori Kasahara
This paper evaluates an automatic power reduction scheme of OSCAR automatic parallelizing compiler having power reduction control capability when multiple media applications parallelized by the OSCAR compiler are executed simultaneously on RP2, a 8-core multicore processor developed by Renesas Electronics, Hitachi, and Waseda University. OSCAR compiler enables the hierarchical multigrain parallel processing and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating and power gating for each processor core using the OSCAR multi-platform API. The RP2 has eight SH4A processor cores, each of which has power control mechanisms such as DVFS, clock gating and power gating. First, multiple applications with relatively light computational load are executed simultaneously on the RP2. The average power consumption of power controlled eight AAC encoder programs, each of which was executed on one processor, was reduced by 47%, (to 1.01W), against one AAC encoder execution on one processor (from 1.89W) without power control. Second, when multiple intermediate computational load applications are executed, the power consumptions of an AAC encoder executed on four processors with the power reduction control was reduced by 57% (to 0.84W) against an AAC encoder execution on one processor (from 1.95W). Power consumptions of one MPEG2 decoder on four processors with power reduction control was reduced by 49% (to 1.01W) against one MPEG2 decoder execution on one processor (from 1.99W). Finally, when a combination of a high computational load application program and an intermediate computational load application program are executed simultaneously, the consumed power reduced by 21% by using twice number of cores for each application. This paper confirmed parallel processing and power reduction by OSCAR compiler are efficient for multiple application executions. In execution of multiple light computational load applications, power consumption increases only 12% for one application. Parallel processing being applied to intermediate computational load applications, power consumption of executing one application on one processor core (1.49W) is almost same power consumption of two applications on eight processor cores (1.46W).
Archive | 2010
Hironori Kasahara; Keiji Kimura; Masayoshi Mase
Archive | 2010
Hironori Kasahara; Keiji Kimura; Masayoshi Mase
international symposium on parallel and distributed processing and applications | 2008
Takamichi Miyamoto; Saori Asaka; Hiroki Mikami; Masayoshi Mase; Yasutaka Wada; Hirofumi Nakano; Keiji Kimura; Hironori Kasahara
IPSJ SIG Notes | 2007
Masayoshi Mase; Daisuke Baba; Harumi Nagayama; Hiroaki Tano; Takeshi Masuura; Takamichi Miyamoto; Jun Shirako; Hirofumi Nakano; Keiji Kimura; Tatsuya Kamei; Toshihiro Hattori; Atsushi Hasegawa; Masaki Ito; Makoto Sato; Kunio Uchiyama; Toshihiko Odaka; Hironori Kasahara
computer software and applications conference | 2017
Hironori Kasahara; Keiji Kimura; Boma A. Adhi; Yuhei Hosokawa; Yohei Kishimoto; Masayoshi Mase